RecipeB.5.Answering Questions Involving


Recipe B.5. Answering Questions Involving "Exactly"

You would think that answering the question of whether or not something is true would be easy. In many cases it is easy. But sometimes it can be tricky to answer questions of whether something is "exactly" true, especially when answering involves joining master/detail data. The problem stems from the exclusive nature of "exactly." It may be more helpful to think of it as "only." Consider the difference between people who wear shoes and those who wear only shoes. It is not enough to satisfy the condition; you must satisfy the condition while ensuring that no other conditions are satisfied.

Question 9

Find professors who teach exactly one course.

You can restate the problem as "Find professors who teach only one course." Which course they teach is unimportant; what matters is that only one course is taught. The final result set should be:

 LNAME      DEPT           SALARY AGE ---------- ---------- ---------- ---- POMEL      SCIENCE           500   65 

MySQL and PostgreSQL

Use the aggregate function COUNT to find the professors who teach exactly one course:

 1 select p.lname,p.dept,p.salary,p.age 2   from professor p, teach t 3  where p.lname = t.lname 4  group by p.lname,p.dept,p.salary,p.age 5 having count(*) = 1 

DB2, Oracle, and SQL Server

Use the window function COUNT OVER to find the professors who teach exactly one course:

 1 select lname, dept, salary, age 2   from ( 3 select p.lname,p.dept,p.salary,p.age, 4        count(*) over (partition by p.lname) as cnt 5   from professor p, teach t 6  where p.lname = t.lname 7        ) x 8  where cnt = 1 

Discussion

By inner joining table PROFESSOR to table TEACH you ensure that all professors who teach no courses are excluded. The aggregate solution uses the COUNT function in the HAVING clause to return only professors who teach exactly one course. The window solution uses the COUNT OVER function, but notice that the columns from table PROFESSOR that are used in the PARTITION clause of the COUNT OVER function are different from the columns that are used in the GROUP BY of the aggregate solution. In this example it is safe for the GROUP BY and PARTITION BY clauses to be different, because the last names are unique in table TEACHER, i.e., excluding P.DEPT, P.SALARY, and .PAGE from the partition does not affect the COUNT operation. In solutions prior to this one, I purposely use the same columns in the PARTITION clause of a window function solution as I use in the GROUP BY clause of an aggregate solution to show that the PARTITION is a moving, more flexible kind of GROUP BY.

Original solution

This solution uses the same technique used in Question 3: perform two passes to find the answer. The first step is to find those professors who teach two or more classes. The second step is to find those professors who teach a course and are not amongst those returned by step 1. Please refer to Question 3 for a full discussion. The solution is shown below:

 select p.*   from professor p,        teach t  where p.lname = t.lname    and p.lname not in ( select t1.lname   from teach t1,        teach t2  where t1.lname = t2.lname    and t1.cno > t2.cno ) LNAME      DEPT           SALARY       AGE ---------- ---------- ---------- ---------- POMEL      SCIENCE           500         65 

Question 10

You want to find students who take only CS112 and CS114 (exactly those two courses and no other courses), but the following query returns an empty result set:

 select s.*   from student s, take t  where s.sno = t.sno    and t.cno = 'CS112'    and t.cno = 'CS114' 

No row can have a column that is simultaneously two values (assuming simple scalar data types such as those used for table STUDENT), so the query will never work. Rozenshtein's book does a nice job of discussing how intuitive thinking when writing queries causes errors such as this one. DOUG is the only student who takes only CS112 and CS114 and should be the only student returned for this query.

MySQL and PostgreSQL

Use a CASE expression and the aggregate function COUNT to find students who take only CS112 and CS114:

 1 select s.sno, s.sname, s.age 2   from student s, take t 3  where s.sno = t.sno 4  group by s.sno, s.sname, s.age 5 having count(*) = 2 6    and max(case when cno = 'CS112' then 1 else 0 end) + 7        max(case when cno = 'CS114' then 1 else 0 end) = 2 

DB2, Oracle, and SQL Server

Use the window function COUNT OVER with a CASE expression to find students who take only CS112 and CS114:

  1 select sno,sname,age  2   from (  3 select s.sno,  4        s.sname,  5   s.age,  6   count(*) over (partition by s.sno) as cnt,  7   sum(case when t.cno in ( 'CS112', 'CS114' )  8    then 1 else 0  9   end) 10       over (partition by s.sno) as both, 11   row_number() 12   over (partition by s.sno order by s.sno) as rn 13   from student s, take t 14  where s.sno = t.sno 15        ) x 16  where cnt = 2 17    and both = 2 18    and rn = 1 

Discussion

The aggregate solution uses the same technique found in Question 1 and Question 2. The inner join from table STUDENT to table TAKE ensures that any students who take no courses are excluded. The COUNT expression in the HAVING clause keeps only students who take exactly two courses. The results of the CASE expressions counting the number of courses are summed. Only those students who take both CS112 and CS114 have a sum of 2.

The window solution uses a technique similar to the window solutions found in Question 1 and Question 2. This version is slightly different as the value of the CASE expression is returned to the window function SUM OVER. Another variation in this solution is the use of the window function ROW_NUMBER to avoid using DISTINCT. The results of the window solution without the final filters are shown below:

 select s.sno,        s.sname,    s.age,    count(*) over (partition by s.sno) as cnt,    sum(case when t.cno in ( 'CS112', 'CS114' )             then 1 else 0    end)    over (partition by s.sno) as both,    row_number( )    over (partition by s.sno order by s.sno) as rn   from student s, take t  where s.sno = t.sno    SNO SNAME   AGE  CNT BOTH   RN  --- ------ ---- ---- ---- ----   1  AARON    20    3    2    1   1  AARON    20    3    2    2   1  AARON    20    3    2    3   2  CHUCK    21    1    1    1   3  DOUG     20    2    2    1   3  DOUG     20    2    2    2   4  MAGGIE   19    2    1    1   4  MAGGIE   19    2    1    2     5  STEVE    22    1    0    1   6  JING     18    2    1    1   6  JING     18    2    1    2 

Examining these results, you can see that the final result set is the one where BOTH and CNT are 2. RN can be either 1 or 2, it doesn't matter; that column exists only to help filter out duplicates without using DISTINCT.

Original solution

This solution uses a subquery with multiple self joins to first find students who take at least three classes. The next step is to use a self join on table TAKE to find those students who take both CS112 and CS114. The final step is to keep only those students who take both CS112 and CS114 and do not take three or more classes. The solution is shown below:

 select s1.*   from student s1,    take t1,        take t2  where s1.sno = t1.sno    and s1.sno = t2.sno    and t1.cno = 'CS112'    and t2.cno = 'CS114'    and s1.sno not in ( select s2.sno   from student s2,    take t3,    take t4,    take t5  where s2.sno = t3.sno    and s2.sno = t4.sno    and s2.sno = t5.sno    and t3.cno > t4.cno    and t4.cno > t5.cno ) SNO SNAME      AGE --- ---------- ---   3 DOUG        20 

Question 11

You want to find students who are older than exactly two other students. Another way of stating the problem is that you want to find the third youngest student(s). The final result set should be:

 SNO SNAME             AGE --- ---------- ----------     1 AARON              20   3 DOUG               20   8 KAY                20   9 GILLIAN            20 

MySQL and PostgreSQL

Use the aggregate function COUNT and a correlated subquery to find the third youngest student:

 1 select s1.* 2   from student s1 3  where 2 = ( select count(*) 4                from student s2 5               where s2.age < s1.age ) 

DB2, Oracle, and SQL Server

Use the window function DENSE_RANK to find the third youngest student:

 1 select sno,sname,age 2   from ( 3 select sno,sname,age, 4        dense_rank( )over(order by age) as dr 5   from student 6        ) x 7  where dr = 3 

Discussion

The aggregate solution uses a scalar subquery to find all students who are older than two (and only two) other students. To see how this works, rewrite the solution to use a scalar subquery. In the following example, the column CNT represents the number of students that are younger than the current student:

 select s1.*,        (select count(*) from student s2          where s2.age < s1.age) as cnt   from student s1   order by 4 SNO SNAME             AGE        CNT --- ---------- ---------- ----------  6  JING               18          0  4  MAGGIE             19    1  1  AARON              20          2  3  DOUG               20          2  8  KAY                20          2  9  GILLIAN            20          2  2  CHUCK              21          6  7  BRIAN              21          6 10  CHAD               21          6  5  STEVE              22          9 

Rewriting the solution this way makes it easy to see who the third youngest students are (those whose CNT is 2).

The solution using the window function DENSE_RANK is similar to the scalar subquery example in that every row is ranked based on how many students are younger than the current student (ties are allowed and there are no gaps). The following query shows the output from the DENSE_RANK function:

 select sno,sname,age,        dense_rank()over(order by age) as dr   from student   SNO SNAME             AGE         DR   --- ---------- ---------- ----------     6 JING               18          1     4 MAGGIE             19          2     1 AARON              20          3     3 DOUG               20          3     8 KAY                20          3     9 GILLIAN            20          3     2 CHUCK              21          4 7 BRIAN              21          4    10 CHAD               21          4 5 STEVE              22          5 

The final step is to wrap the query in an inline view and keep only those rows where DR is 3.

Original solution

The original solution uses a two-pass approach: step 1, find the students who are older than three or more students; step 2, find the students who are older than two students who are not amongst the students returned by step 1. Alternatively, Rozenshtein would rephrase this as, "Find students who are older than at least two students and are not older than at least three students." The solution is shown below:

 select s5.*    from student s5, student s6, student s7   where s5.age > s6.age and s6.age > s7.age and s5.sno not in (     select s1.sno    from student s1, student s2, student s3, student s4   where s1.age > s2.age     and s2.age > s3.age     and s3.age > s4.age    ) SNO SNAME AGE --- ------ ---- 1 AARON 20 3 DOUG 20 9 GILLIAN 20 8 KAY 20 

The solution above uses the technique shown in Question 5. Refer to Question 5 for a complete discussion of how extremes are found using self joins.




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net