Recipe B.5. Answering Questions Involving "Exactly"You would think that answering the question of whether or not something is true would be easy. In many cases it is easy. But sometimes it can be tricky to answer questions of whether something is "exactly" true, especially when answering involves joining master/detail data. The problem stems from the exclusive nature of "exactly." It may be more helpful to think of it as "only." Consider the difference between people who wear shoes and those who wear only shoes. It is not enough to satisfy the condition; you must satisfy the condition while ensuring that no other conditions are satisfied. Question 9Find professors who teach exactly one course. You can restate the problem as "Find professors who teach only one course." Which course they teach is unimportant; what matters is that only one course is taught. The final result set should be: LNAME DEPT SALARY AGE ---------- ---------- ---------- ---- POMEL SCIENCE 500 65 MySQL and PostgreSQLUse the aggregate function COUNT to find the professors who teach exactly one course: 1 select p.lname,p.dept,p.salary,p.age 2 from professor p, teach t 3 where p.lname = t.lname 4 group by p.lname,p.dept,p.salary,p.age 5 having count(*) = 1 DB2, Oracle, and SQL ServerUse the window function COUNT OVER to find the professors who teach exactly one course: 1 select lname, dept, salary, age 2 from ( 3 select p.lname,p.dept,p.salary,p.age, 4 count(*) over (partition by p.lname) as cnt 5 from professor p, teach t 6 where p.lname = t.lname 7 ) x 8 where cnt = 1 DiscussionBy inner joining table PROFESSOR to table TEACH you ensure that all professors who teach no courses are excluded. The aggregate solution uses the COUNT function in the HAVING clause to return only professors who teach exactly one course. The window solution uses the COUNT OVER function, but notice that the columns from table PROFESSOR that are used in the PARTITION clause of the COUNT OVER function are different from the columns that are used in the GROUP BY of the aggregate solution. In this example it is safe for the GROUP BY and PARTITION BY clauses to be different, because the last names are unique in table TEACHER, i.e., excluding P.DEPT, P.SALARY, and .PAGE from the partition does not affect the COUNT operation. In solutions prior to this one, I purposely use the same columns in the PARTITION clause of a window function solution as I use in the GROUP BY clause of an aggregate solution to show that the PARTITION is a moving, more flexible kind of GROUP BY. Original solutionThis solution uses the same technique used in Question 3: perform two passes to find the answer. The first step is to find those professors who teach two or more classes. The second step is to find those professors who teach a course and are not amongst those returned by step 1. Please refer to Question 3 for a full discussion. The solution is shown below: select p.* from professor p, teach t where p.lname = t.lname and p.lname not in ( select t1.lname from teach t1, teach t2 where t1.lname = t2.lname and t1.cno > t2.cno ) LNAME DEPT SALARY AGE ---------- ---------- ---------- ---------- POMEL SCIENCE 500 65 Question 10You want to find students who take only CS112 and CS114 (exactly those two courses and no other courses), but the following query returns an empty result set: select s.* from student s, take t where s.sno = t.sno and t.cno = 'CS112' and t.cno = 'CS114' No row can have a column that is simultaneously two values (assuming simple scalar data types such as those used for table STUDENT), so the query will never work. Rozenshtein's book does a nice job of discussing how intuitive thinking when writing queries causes errors such as this one. DOUG is the only student who takes only CS112 and CS114 and should be the only student returned for this query. MySQL and PostgreSQLUse a CASE expression and the aggregate function COUNT to find students who take only CS112 and CS114: 1 select s.sno, s.sname, s.age 2 from student s, take t 3 where s.sno = t.sno 4 group by s.sno, s.sname, s.age 5 having count(*) = 2 6 and max(case when cno = 'CS112' then 1 else 0 end) + 7 max(case when cno = 'CS114' then 1 else 0 end) = 2 DB2, Oracle, and SQL ServerUse the window function COUNT OVER with a CASE expression to find students who take only CS112 and CS114: 1 select sno,sname,age 2 from ( 3 select s.sno, 4 s.sname, 5 s.age, 6 count(*) over (partition by s.sno) as cnt, 7 sum(case when t.cno in ( 'CS112', 'CS114' ) 8 then 1 else 0 9 end) 10 over (partition by s.sno) as both, 11 row_number() 12 over (partition by s.sno order by s.sno) as rn 13 from student s, take t 14 where s.sno = t.sno 15 ) x 16 where cnt = 2 17 and both = 2 18 and rn = 1 DiscussionThe aggregate solution uses the same technique found in Question 1 and Question 2. The inner join from table STUDENT to table TAKE ensures that any students who take no courses are excluded. The COUNT expression in the HAVING clause keeps only students who take exactly two courses. The results of the CASE expressions counting the number of courses are summed. Only those students who take both CS112 and CS114 have a sum of 2. The window solution uses a technique similar to the window solutions found in Question 1 and Question 2. This version is slightly different as the value of the CASE expression is returned to the window function SUM OVER. Another variation in this solution is the use of the window function ROW_NUMBER to avoid using DISTINCT. The results of the window solution without the final filters are shown below: select s.sno, s.sname, s.age, count(*) over (partition by s.sno) as cnt, sum(case when t.cno in ( 'CS112', 'CS114' ) then 1 else 0 end) over (partition by s.sno) as both, row_number( ) over (partition by s.sno order by s.sno) as rn from student s, take t where s.sno = t.sno SNO SNAME AGE CNT BOTH RN --- ------ ---- ---- ---- ---- 1 AARON 20 3 2 1 1 AARON 20 3 2 2 1 AARON 20 3 2 3 2 CHUCK 21 1 1 1 3 DOUG 20 2 2 1 3 DOUG 20 2 2 2 4 MAGGIE 19 2 1 1 4 MAGGIE 19 2 1 2 5 STEVE 22 1 0 1 6 JING 18 2 1 1 6 JING 18 2 1 2 Examining these results, you can see that the final result set is the one where BOTH and CNT are 2. RN can be either 1 or 2, it doesn't matter; that column exists only to help filter out duplicates without using DISTINCT. Original solutionThis solution uses a subquery with multiple self joins to first find students who take at least three classes. The next step is to use a self join on table TAKE to find those students who take both CS112 and CS114. The final step is to keep only those students who take both CS112 and CS114 and do not take three or more classes. The solution is shown below: select s1.* from student s1, take t1, take t2 where s1.sno = t1.sno and s1.sno = t2.sno and t1.cno = 'CS112' and t2.cno = 'CS114' and s1.sno not in ( select s2.sno from student s2, take t3, take t4, take t5 where s2.sno = t3.sno and s2.sno = t4.sno and s2.sno = t5.sno and t3.cno > t4.cno and t4.cno > t5.cno ) SNO SNAME AGE --- ---------- --- 3 DOUG 20 Question 11You want to find students who are older than exactly two other students. Another way of stating the problem is that you want to find the third youngest student(s). The final result set should be: SNO SNAME AGE --- ---------- ---------- 1 AARON 20 3 DOUG 20 8 KAY 20 9 GILLIAN 20 MySQL and PostgreSQLUse the aggregate function COUNT and a correlated subquery to find the third youngest student: 1 select s1.* 2 from student s1 3 where 2 = ( select count(*) 4 from student s2 5 where s2.age < s1.age ) DB2, Oracle, and SQL ServerUse the window function DENSE_RANK to find the third youngest student: 1 select sno,sname,age 2 from ( 3 select sno,sname,age, 4 dense_rank( )over(order by age) as dr 5 from student 6 ) x 7 where dr = 3 DiscussionThe aggregate solution uses a scalar subquery to find all students who are older than two (and only two) other students. To see how this works, rewrite the solution to use a scalar subquery. In the following example, the column CNT represents the number of students that are younger than the current student: select s1.*, (select count(*) from student s2 where s2.age < s1.age) as cnt from student s1 order by 4 SNO SNAME AGE CNT --- ---------- ---------- ---------- 6 JING 18 0 4 MAGGIE 19 1 1 AARON 20 2 3 DOUG 20 2 8 KAY 20 2 9 GILLIAN 20 2 2 CHUCK 21 6 7 BRIAN 21 6 10 CHAD 21 6 5 STEVE 22 9 Rewriting the solution this way makes it easy to see who the third youngest students are (those whose CNT is 2). The solution using the window function DENSE_RANK is similar to the scalar subquery example in that every row is ranked based on how many students are younger than the current student (ties are allowed and there are no gaps). The following query shows the output from the DENSE_RANK function: select sno,sname,age, dense_rank()over(order by age) as dr from student SNO SNAME AGE DR --- ---------- ---------- ---------- 6 JING 18 1 4 MAGGIE 19 2 1 AARON 20 3 3 DOUG 20 3 8 KAY 20 3 9 GILLIAN 20 3 2 CHUCK 21 4 7 BRIAN 21 4 10 CHAD 21 4 5 STEVE 22 5 The final step is to wrap the query in an inline view and keep only those rows where DR is 3. Original solutionThe original solution uses a two-pass approach: step 1, find the students who are older than three or more students; step 2, find the students who are older than two students who are not amongst the students returned by step 1. Alternatively, Rozenshtein would rephrase this as, "Find students who are older than at least two students and are not older than at least three students." The solution is shown below: select s5.* from student s5, student s6, student s7 where s5.age > s6.age and s6.age > s7.age and s5.sno not in ( select s1.sno from student s1, student s2, student s3, student s4 where s1.age > s2.age and s2.age > s3.age and s3.age > s4.age ) SNO SNAME AGE --- ------ ---- 1 AARON 20 3 DOUG 20 9 GILLIAN 20 8 KAY 20 The solution above uses the technique shown in Question 5. Refer to Question 5 for a complete discussion of how extremes are found using self joins. |