Symmetric Queries as a Building Block for Efficient Parallel Query Evaluation

Introduction | People | Publications | Boarder Impacts

This material is based upon work supported by the National Science Foundation under Grant No. 1606557 (2015-2020)

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."

Introduction

Today's applications frequently feature massive and heterogeneous data and complicated computational requirements. There have been many efforts towards efficient parallel query processing and optimization. However, this does not mean that the parallel potentials have been realized by existing techniques and frameworks in scaling to massive datasets, especially for applications that inherently demand recursive data accesses.

  • Goal:

    The aim of the project is to offer a theoretical methodology for tackling the problem of parallel query evaluation on massive data. The PI conjectures that to maximize parallelizability of generic queries, e.g. queries that are used frequently in analytical and transactional applications, one need to examine queries that are inherently parallelizable as the basic unit of study, and identifies symmetric queries as a set of queries that are potentially highly parallelizable and use such queries as a stepping stone to study parallelizable query languages and leverage the findings to design techniques for efficient evaluation of generic queries. In particular, the project will focus on three separate yet highly related tasks: (1) design and study a set of query languages whose queries are symmetric, investigate the properties of these languages, and propose and prove theoretical bounds of the computational complexity of the languages, in terms of scaling and data skew; (2) investigate and propose data structures and algorithms for efficiently evaluating queries of these languages in a parallel manner; and (3) propose strategies including query rewrite and optimization techniques for efficient evaluation of arbitrary queries, based on the new data structures and algorithms that  result from (2).

  • Challenges:

    The challenges of this project is logistical rather than technical. The PI, Melanie Wu, moved from Indiana University, an R1 institution, to Pomona College, a liberal arts institution with only undergraduate students. She had to delay the start of the project for a year from 2014 to 2015 and conducted research mainly through collaborations. With the support from the NSF program directors, she was able to readjust the aim of the project to limit the scope to theoretical studies. However, the research is fruitful and she was able to involve undergraduate students, especially supporting URM students.

  • People

    Research Outcomes and Publications

    Boarder Impacts and Community Outreach

    The fund provided training for undergraduate students to get into research. Among the students funded by this project, Eric Campbell is currently a PhD candidate at Cornell University and a 2019 NSF Graduate Research Fellow. Many of the students are URMs and 1st-gen students.

    With the support from NSF, the PI has been a leading force in promoting diversity in the field of computer science.

    Point of Contact: Yuqing Melanie Wu

    Last updated on July 2, 2020