Section 6.3. Small Intersection of Broad Criteria


6.3. Small Intersection of Broad Criteria

The situation we talk about in this section is that of a small result set based on the intersection of several broad criteria. Each criterion individually would produce a large result set, yet the intersection of those individual, large sets is a very small, final result set returned by the query.

Continuing on with our query example from the preceding section, if the existence test on the article that was ordered is not selective, we must necessarily apply some other criteria elsewhere (otherwise the result set would no longer be a small result set). In this case, the question of whether to use a regular join, a correlated subquery, or an uncorrelated subquery usually receives a different answer depending on both the relative "strength" of the different criteria and the existing indexes.

Let's suppose that instead of checking people who have ordered a Batmobile, admittedly not our best-selling article, we look for customers who have ordered something that I hope is much less unusual, in this case some soap, but purchased last Saturday. Our query then becomes something like this:

     select distinct orders.custid     from orders          join orderdetail             on (orderdetail.ordid = orders.ordid)          join articles             on (articles.artid = orderdetail.artid)     where articles.artname = 'SOAP'       and <selective criterion on the date in the orders table> 

Quite logically, the processing flow will be the reverse of what we had with a selective article: get the article, then the order lines that contained the article, and finally the orders. In the case we're currently discussing, that of orders for soap, we should first get the small number of orders placed during the relatively short interval of time, and then check which ones refer to the article soap. From a practical point of view, we are going to use a totally different set of indexes. In the first case, ideally, we would like to see one index on the article name and one on the article identifier in the orderdetail table, and then we would have used the index on the primary key ordid in the orders table. In the case of orders for soap, what we want to find is an index on the date in orders and then one on orderid in orderdetail, from which we can use the index on the primary key of articlesassuming, of course, that in both cases using the indexes is the best course to take.

The obvious natural choice to get customers who bought soap last Saturday would appear to be a correlated subquery:

     select distinct orders.custid     from orders     where <selective criterion on the date in the orders table>       and exists (select 1                   from orderdetail                       join articles                         on (articles.artid = orderdetail.artid)                     where articles.artname = 'SOAP'                       and orderdetails.ordid = orders.ordid) 

In this approach, we take for granted that the correlated subquery executes very quickly. Our assumption will prove true only if orderdetail is indexed on ordid (we shall then get the article through its primary key artid; therefore, there is no other issue).

You've seen in Chapter 3 that indexes are something of a luxury in transactional databases, due to their high cost of maintenance in an environment of frequent inserts, updates, and deletes. This cost may lead us to opt for a "second-best" solution. The absence of the vital index on orderdetail and good reason for not creating further indexes might prompt us to consider the following:

     select distinct orders.custid     from orders,          (select orderdetails.ordid           from orderdetail,                articles           where articles.artid = orderdetail.artid             and articles.artname = 'SOAP') as sub_q     where sub_q.ordid = orders.ordid       and <selective criterion on the date in the orders table> 

In this second approach, the index requirements are different: if we don't sell millions of articles, it is likely that the condition on the article name will perform quite satisfactorily even in the absence of any index on artname. We shall probably not need any index on the column artid of orderdetail either: if the article is popular and appears in many orders, the join between orderdetail and articles is probably performed in a more efficient manner by hash or merge join, rather than by a nested loop that would need such an index on artid. Compared to the first approach, we have here a solution that we could call a low index solution. Because we cannot afford to create indexes on each and every column in a table, and because we usually have in every application a set of "secondary" queries that are not absolutely critical but only require a decent response time, the low index approach may perform in a perfectly acceptable manner.

Adding one extra search criterion to an existing query can completely change a previous construct: a modified query is a new query.




The Art of SQL
The Art of SQL
ISBN: 0596008945
EAN: 2147483647
Year: N/A
Pages: 143

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net