RecipeB.6.Answering Questions Involving


Recipe B.6. Answering Questions Involving "Any" or "All"

Queries involving "any" or "all" typically require you to find rows that satisfy one or more conditions completely. For example, if you are asked to find people who eat all vegetables, you are essentially looking for people for whom there is no vegetable that they do not eat. This type of problem statement is typically categorized as relational division. With questions regarding "any," it is crucial you pay close attention to how the question is phrased. Consider the difference between these two requirements: "a student who takes any class" and "a plane faster than any train." The former implies, "find a student who takes at least one class," while the latter implies "find a plane that is faster than all trains."

Question 12

You want to find students who take all courses.

The number of courses for a student in table TAKE must be equal to the total number of courses in table COURSES. There are three courses in table COURSES. Only AARON takes all three courses and should be the only student returned. The final result set should be:

 SNO SNAME AGE --- ------  ---   1 AARON 20 

MySQL and PostgreSQL

Use the aggregate function COUNT to find students who take every course:

 1 select s.sno,s.sname,s.age 2 from student s, take t 3  where s.sno = t.sno 4  group by s.sno,s.sname,s.age 5 having count(t.cno) = (select count(*) from courses) 

DB2 and SQL Server

Use the window function COUNT OVER and an outer join instead of a subquery:

 1 select sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4  count(t.cno) 5  over (partition by s.sno) as cnt, 6  count(distinct c.title) over() as total, 7  row_number() over 8  (partition by s.sno order by c.cno) as rn 9 from courses c 10  left join take t on (c.cno = t.cno) 11  left join student s on (t.sno = s.sno) 12  ) x 13 where cnt = total 14   and rn = 1 

Oracle

Users on Oracle9i and later can use the DB2 solution. Alternatively, you can use the proprietary Oracle outer-join syntax, which is mandatory for users on 8i and earlier:

 1  select sno,sname,age 2  from ( 3  select s.sno,s.sname,s.age, 4   count(t.cno) 5   over (partition by s.sno) as cnt, 6         count(distinct c.title) over( ) as total, 7         row_number( ) over 8   (partition by s.sno order by c.cno) as rn 9   from courses c, take t, student s 10  where c.cno = t.cno (+) 11 and t.sno = s.sno (+) 12 ) 13   where cnt = total 14     and rn = 1 

Discussion

The aggregate solution uses a subquery to return the total number of courses available. The outer query keeps only students who take the same number of courses as the value returned by the subquery. The window solution takes a different approach: it uses an outer join to table COURSES instead of a subquery. The window solution also uses window functions to return the number of courses a student takes (aliased CNT) along with the total number of courses there are in table COURSES (aliased TOTAL). The query below shows the intermediate results from those window functions:

 select s.sno,s.sname,s.age, count(distinct t.cno) over (partition by s.sno) as cnt, count(distinct c.title) over( ) as total, row_number( ) over(partition by s.sno order by c.cno) as rn    from courses c left join take t on (c.cno = t.cno) left join student s on (t.sno = s.sno)   order by 1 SNO SNAME AGE  CNT TOTAL RN  --- ------ ----  ----  ----------  ----   1 AARON 20 3 3 1   1 AARON 20 3 3 2   1 AARON 20 3 3 3   2 CHUCK 21 1 3 1   3 DOUG 20 2 3 1   3 DOUG 20 2 3 2   4 MAGGIE 19 2 3 1   4 MAGGIE 19 2 3 2   5 STEVE 22 1 3 1   6 JING 18 2 3 1   6 JING 18 2 3 2 

The student who takes all courses is the one where CNT equals TOTAL. ROW_NUMBER is used instead of DISTINCT to filter out the duplicates from the final result set. Strictly speaking, the outer joins to tables TAKE and STUDENT are not necessary, as there are no courses that aren't taken by at least one student. If there is a course that no students take, CNT would not equal TOTAL, and a row with NULL values for SNO, SNAME, and AGE would be returned. The example below creates a new course that no students take. The following query demonstrates what the intermediate result set would look like if there exists a course no students take (for clarity, C.TITLE is included below):

 insert into courses values ('CS115','BIOLOGY',4) select s.sno,s.sname,s.age,c.title,    count(distinct t.cno)    over (partition by s.sno) as cnt,    count(distinct c.title) over( ) as total,    row_number( )    over(partition by s.sno order by c.cno) as rn   from courses c    left join take t on (c.cno = t.cno)    left join student s on (t.sno = s.sno)      order by 1  SNO SNAME AGE  TITLE    CNT TOTAL RN  --- ------ --- ---------- --- ----- ---  1  AARON 20 PHYSICS 3 4   1  1  AARON 20 CALCULUS 3 4   2  1   AARON 20 HISTORY 3 4   3  2   CHUCK 21 PHYSICS 1 4   1  3   DOUG 20 PHYSICS 2 4   1  3   DOUG 20 HISTORY 2 4   2  4   MAGGIE 19 PHYSICS 2 4   1  4   MAGGIE 19 CALCULUS 2 4   2  5   STEVE 22 CALCULUS 1 4   1  6   JING 18 CALCULUS 2 4   1  6   JING 18 HISTORY 2 4   2 BIOLOGY 0 4   1  

Examining these results, it's easy to see no rows will be returned when the final filters are applied. Additionally, keep in mind that window functions take effect after the WHERE clause is evaluated so it is necessary to use DISTINCT when counting the total courses available in table COURSES (otherwise you get the total from the result set, which would be the total number of courses taken by all students, i.e., select count(cno) from take).

The sample data used for this example does not have any duplicates in table TAKE, so the solution provided works fine. If there had been duplicates in TAKE, for example, a student that takes the same courses three times, the solution would fail. The workaround for dealing with duplicates in this solution is trivial; simply add DISTINCT when performing the count on T.CNO and the solution will work correctly.


Original solution

The original solution avoids aggregates by using a Cartesian product in a devilishly clever way. The query below is based on the original:

 select * from student where sno not in ( select s.sno from student s, courses c    where (s.sno,c.cno) not in (select sno,cno from take) ) 

Rozenshtein restates the problem to be "Which students are not among those for whom there is a course that they do not take?" If you look at the problem that way, you are now working with negation. Recall how Rozenshtein suggests handling negation:

Remember that real negation requires two passes: To find out "who does not," first find out "who does" and then get rid of them.

The innermost subquery returns all valid SNO/CNO combinations. The middle subquery, which uses a Cartesian product between tables STUDENT and COURSES, returns all students and all courses (i.e., every student taking every course) and filters out the valid SNO/CNO combinations (leaving only "made up" SNO/CNO combinations). The outermost query returns only the rows from table STUDENT where the SNO is not amongst those returned by the middle subquery. The following queries may make the solution a bit more clear. To keep it readable, I'll use only AARON and CHUCK (only AARON takes all courses):

 select * from student   where sno in ( 1,2 )      SNO SNAME   AGE   --- ---------- ---- 1 AARON   20 2 CHUCK   21 select * from take   where sno in ( 1,2 )   SNO CNO   --- ----- 1 CS112 1 CS113 1 CS114 2 CS112 select s.sno, c.cno from student s, courses c  where s.sno in ( 1,2 )  order by 1  SNO CNO  --- -----    1 CS112    1 CS113    1 CS114    2 CS112    2 CS113    2 CS114 

These queries show the rows from table STUDENT for AARON and CHUCK, the courses that AARON and CHUCK take, and a Cartesian product that returns AARON and CHUCK taking all courses, respectively. The result set from the Cartesian product for AARON matches the result set returned for AARON from table TAKE, but CHUCK has two "made up" rows as a result of the Cartesian product that do not match his rows in table TAKE. The following query is the middle subquery and uses NOT IN to filter out the valid SNO/CNO combinations:

 select s.sno, c.cno from student s, courses c   where s.sno in ( 1,2 ) and (s.sno,c.cno) not in (select sno,cno from take)      SNO CNO  --- ----    2 CS113    2 CS114 

Notice that AARON is not returned by the middle subquery (because AARON takes all courses). The result set of the middle subquery contains rows that exist due to the Cartesian product, not because CHUCK actually takes those courses. The outermost query then returns rows from table STUDENT where the SNO is not amongst the SNO returned by the middle subquery:

 select * from student   where sno in ( 1,2 ) and sno not in (select s.sno from student s, courses c where s.sno in ( 1,2 ) and (s.sno,c.cno) not in (select sno,cno from take)) SNO SNAME AGE --- ---------- ----- 1 AARON 20 

Question 13

Find students who are older than any other students.

You can restate the problem as "Find the oldest students." The final result set should be:

 SNO SNAME AGE --- -------- ------  5  STEVE 22 

MySQL and PostgreSQL

Use the aggregate function MAX in a subquery to find the oldest students:

 1 select * 2 from student 3  where age = (select max(age) from student) 

DB2, Oracle, and SQL Server

Use the window function MAX OVER in an inline view to find the oldest students:

 1 select sno,sname,age 2   from ( 3 select s.*, 4 max(s.age)over() as oldest 5   from student s 6 ) x 7  where age = oldest 

Discussion

Both solutions use the function MAX to find the oldest student. The subquery solution first finds the greatest age in table STUDENT and returns it to the outer query, which finds student of that age. The window version does the same as the subquery solution but returns the greatest age for each row. The intermediate results of the window query are as follows:

 select s.*, max(s.age) over() as oldest from student s SNO SNAME     AGE     OLDEST --- ---------- ---- ----------  1 AARON 20 22  2 CHUCK 21 22  3 DOUG 20 22  4 MAGGIE     19 22  5 STEVE 22 22  6 JING 18 22  7 BRIAN 21 22  8 KAY 20 22  9 GILLIAN 20 22     10 CHAD 21 22 

To find the oldest students, simply keep the rows where AGE = OLDEST.

Original solution

The original solution uses a self join on table STUDENT in a subquery to find all students who are younger than some other student. The outer query returns all students from table STUDENT who are not amongst those returned by the subquery. The operation can be rephrased as "find all students who are not amongst those students who are younger than at least one other student":

 select * from student   where age not in (select a.age from student a, student b   where a.age < b.age) 

The subquery returns use a Cartesian product to find all ages in A that are younger than all ages in B. The only age that would not be younger than any other age is the greatest age. The greatest age is not returned by the subquery. The outer query uses NOT IN to return all rows from table STUDENT where AGE is not amongst the AGE returned by the subquery (if A.AGE is returned, that means there is an AGE somewhere in table STUDENT that is greater than it). If you have trouble understanding how it works, examine the following query. Conceptually they both work in a similar way, but the following is probably more common:

 select * from student   where age >= all (select age from student) 




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net