Non-Relational Optimization of Relational
Queries
This project was supported by NSF
under research grant IIS-0091928 from 2000 to 2003, with a no-cost extension
in 2004.
The thesis underlying this line of work is that SQL is an inherently
redundant language in which some business queries cannot be expressed in
a simple way (and sometimes, not with a single query). But since SQL
is here to stay (and databases are only growing bigger), we need to
optimize it the best way we can. This project proposes to go outside
the relational model and look for other ways to implement SQL queries
which have the potential to yield better results for some queries,
especially those used in Decision-Support environments.
We focused on queries with nested subqueries. There were two main aims to
our research:
- To deal with redundancy in complex SQL queries. The key
observation here is that sometimes, especially when aggregates are
involved, there exists some overlap between a query and its nested
subquery. Current approaches ignore this overlap and happily repeat
computations.
- To deal uniformly with all kinds of subqueries in SQL. Our view
is that much effort has gone into decorrelating query and subquery,
while the linking condition (the condition relating query and
subquery) has received little attention. However, each different
linking condition brings different semantics to the query. The
treatment of the linking conditions in current approaches is
superficial and somewhat ad-hoc.
To address the first issue, we came up with the for-loop operator,
which allows us to compute some aggregates on the fly.
To address the second issue, we developed a novel approach:
using nested relational algebra for query trees and query plans.
Details on each approach are provided in the papers below (earlier papers
at the top, newer papers at the bottom).
-
Here is the project report for the 2001 PI Workshop, which was held in
Fort Worth, TX, on April-May 2001, in
Postscript and here in PDF.
-
Here is the project report for the 2002 PI Workshop, which was held in
Arlington, Virginia, on May 5-7 2002, in
Postscript and here in PDF.
-
The project report for 2003 PI Workshop is in the following page
-
Here is a paper on this project which was written before the award: Optimization of Sequences of Relational
Decision-Support Queries, which appeared in DAWAK'99.
-
Here is a technical report based on the master thesis of Matt
Chanda ( in ps and in
pdf). He added subqueries to MySQL by rewritting queries with
WHERE-clause subqueries into queries with FROM-clause subqueries.
The code (which involves non-correlated subqueries only) will also be
made publicly available in a few days!
- Here is the paper that was published based on Matt's thesis (written
by Matt, Bin Cao and myself):
Adding subqueries to MySQL, or
What Does it Take to Have a
Decision-Support Engine?, in Proceedings of the Fifth
International Workshop on Data Warehousing and OLAP, DOLAP'02.
-
Here is a technical report on the for-loop approach in Postscrip and in PDF. A summarized
version of this report appeared in DAWAK'03.
-
Our approach to query unnesting
deals with correlations following Dayal's approach, but incorporates
the treatment of linking conditions into the approach, and does so
without adding almost any overhead. A partial version of this research
appeared in the proceedings of DAWAK'03.
- Our approach of using nested relational algebra appeared in the paper
title that appeared in SIGMOD'05.
Experimental work was carried out in the Database Laboratory
of the Department of
Computer Engineering and Computer Science of the Speed Scientific School at
the University of
Louisville, which I direct. Feel free to contact me at
abadia@louisville.edu.