6.3. Small Intersection of Broad CriteriaThe situation we talk about in this section is that of a small result set based on the intersection of several broad criteria. Each criterion individually would produce a large result set, yet the intersection of those individual, large sets is a very small, final result set returned by the query. Continuing on with our query example from the preceding section, if the existence test on the article that was ordered is not selective, we must necessarily apply some other criteria elsewhere (otherwise the result set would no longer be a small result set). In this case, the question of whether to use a regular join, a correlated subquery, or an uncorrelated subquery usually receives a different answer depending on both the relative "strength" of the different criteria and the existing indexes. Let's suppose that instead of checking people who have ordered a Batmobile, admittedly not our best-selling article, we look for customers who have ordered something that I hope is much less unusual, in this case some soap, but purchased last Saturday. Our query then becomes something like this: select distinct orders.custid from orders join orderdetail on (orderdetail.ordid = orders.ordid) join articles on (articles.artid = orderdetail.artid) where articles.artname = 'SOAP' and <selective criterion on the date in the orders table> Quite logically, the processing flow will be the reverse of what we had with a selective article: get the article, then the order lines that contained the article, and finally the orders. In the case we're currently discussing, that of orders for soap, we should first get the small number of orders placed during the relatively short interval of time, and then check which ones refer to the article soap. From a practical point of view, we are going to use a totally different set of indexes. In the first case, ideally, we would like to see one index on the article name and one on the article identifier in the orderdetail table, and then we would have used the index on the primary key ordid in the orders table. In the case of orders for soap, what we want to find is an index on the date in orders and then one on orderid in orderdetail, from which we can use the index on the primary key of articlesassuming, of course, that in both cases using the indexes is the best course to take. The obvious natural choice to get customers who bought soap last Saturday would appear to be a correlated subquery: select distinct orders.custid from orders where <selective criterion on the date in the orders table> and exists (select 1 from orderdetail join articles on (articles.artid = orderdetail.artid) where articles.artname = 'SOAP' and orderdetails.ordid = orders.ordid) In this approach, we take for granted that the correlated subquery executes very quickly. Our assumption will prove true only if orderdetail is indexed on ordid (we shall then get the article through its primary key artid; therefore, there is no other issue). You've seen in Chapter 3 that indexes are something of a luxury in transactional databases, due to their high cost of maintenance in an environment of frequent inserts, updates, and deletes. This cost may lead us to opt for a "second-best" solution. The absence of the vital index on orderdetail and good reason for not creating further indexes might prompt us to consider the following: select distinct orders.custid from orders, (select orderdetails.ordid from orderdetail, articles where articles.artid = orderdetail.artid and articles.artname = 'SOAP') as sub_q where sub_q.ordid = orders.ordid and <selective criterion on the date in the orders table> In this second approach, the index requirements are different: if we don't sell millions of articles, it is likely that the condition on the article name will perform quite satisfactorily even in the absence of any index on artname. We shall probably not need any index on the column artid of orderdetail either: if the article is popular and appears in many orders, the join between orderdetail and articles is probably performed in a more efficient manner by hash or merge join, rather than by a nested loop that would need such an index on artid. Compared to the first approach, we have here a solution that we could call a low index solution. Because we cannot afford to create indexes on each and every column in a table, and because we usually have in every application a set of "secondary" queries that are not absolutely critical but only require a decent response time, the low index approach may perform in a perfectly acceptable manner. Adding one extra search criterion to an existing query can completely change a previous construct: a modified query is a new query. |