Recipe B.2. Answering Questions Involving NegationIn his book, Rozenshtein approached the teaching of SQL through an examination of the different types of fundamental problems that you are often called upon to solve, in one form or another. Negation is one such type. It is often necessary to find rows for which some condition is not true. Simple conditions are easy but, as the following questions show, some negation problems require a bit of creativity and thought to solve. Question 1You want to find students who do not take CS112, but the following query is returning the wrong results: select * from student where sno in ( select sno from take where cno != 'CS112' ) Because a student may take several courses, this query can (and does) return students who take CS112. The query is incorrect because it does not answer the question: "Who does not take CS112?" Instead, it answers the question "Who takes a course that is not CS112?" The correct result set should include students who take no courses as well as students who take courses but none of them CS112. Ultimately, you should return the following result set: SNO SNAME AGE --------- ---------- ---------- 5 STEVE 22 6 JING 18 7 BRIAN 21 8 KAY 20 9 GILLIAN 20 10 CHAD 21 MySQL and PostgreSQLUse a CASE expression with the aggregate function MAX to flag CS112 if it exists for a particular student: 1 select s.sno,s.sname,s.age 2 from student s left join take t 3 on (s.sno = t.sno) 4 group by s.sno,s.sname,s.age 5 having max(case when t.cno = 'CS112' 6 then 1 else 0 end) = 0 DB2 and SQL ServerUse a CASE expression with the window function MAX OVER to flag CS112 if it exists for a particular student: 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 max(case when t.cno = 'CS112' 5 then 1 else 0 end) 7 over(partition by s.sno,s.sname,s.age) as takes_CS112 9 from student s left join take t 10 on (s.sno = t.sno) 11 ) x 12 where takes_CS112 = 0 OracleFor users on Oracle9i Database and later, you can use the DB2 solution above. Alternatively, you can use the proprietary Oracle outer-join syntax, which is mandatory for users on Oracle8i Database and earlier: /* group by solution */ 1 select s.sno,s.sname,s.age 2 from student s, take t 3 where s.sno = t.sno (+) 4 group by s.sno,s.sname,s.age 5 having max(case when t.cno = 'CS112' 6 then 1 else 0 end) = 0 /* window solution */ 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 max(case when t.cno = 'CS112' 5 then 1 else 0 end) 7 over(partition by s.sno,s.sname,s.age) as takes_CS112 9 from student s, take t 10 where s.sno = t.sno (+) 11 ) x 12 where takes_CS112 = 0 DiscussionDespite the different syntax for each solution, the technique is the same. The idea is to create a "Boolean" column in the result set to denote whether or not a student takes CS112. If a student takes CS112, then return 1 in that column; otherwise, return 0. The following query moves the CASE expression into the SELECT list and shows the intermediate results thus far: select s.sno,s.sname,s.age, case when t.cno = 'CS112' then 1 else 0 end as takes_CS112 from student s left join take t on (s.sno=t.sno) SNO SNAME AGE TAKES_CS112 --- ---------- ---------- ----------- 1 AARON 20 1 1 AARON 20 0 1 AARON 20 0 2 CHUCK 21 1 3 DOUG 20 1 3 DOUG 20 0 4 MAGGIE 19 1 4 MAGGIE 19 0 5 STEVE 22 0 6 JING 18 0 6 JING 18 0 8 KAY 20 0 10 CHAD 21 0 7 BRIAN 21 0 9 GILLIAN 20 0 The outer join to table TAKE ensures that even students who take no courses are returned. The next step is to use MAX to take the greatest value returned by the CASE expression for each student. If a student takes CS112, the greatest value will be 1, because all other courses are 0. For the solution using GROUP BY, the final step is to use the HAVING clause to keep only students with 0 returned from the MAX/CASE expression. For the window solution, you need to wrap the query in an inline view and then reference TAKES_CS112, because window functions cannot be referenced directly in the WHERE clause. Because of how window functions work, it is also necessary to remove duplicates caused by multiple courses. Original solutionThe original solution to this problem is quite clever and is shown here: select * from student where sno not in (select sno from take where cno = 'CS112') This can be stated as: "Find the students in table TAKE who take CS112, and then return all students in table STUDENT who are not them." This technique follows the advice regarding negation found at the end of Rozenshtein's book:
Question 2You want to find students who take CS112 or CS114 but not both. The following query looks promising as a solution but returns the wrong result set: select * from student where sno in ( select sno from take where cno != 'CS112' and cno != 'CS114' ) Of the students who take courses, only students DOUG and AARON take both CS112 and CS114. Those two should be excluded. Student STEVE takes CS113, but not CS112 or CS114, and should be excluded as well. Because a student can take multiple courses, the approach here is to return a single row for each student with information regarding whether the student takes CS112 or CS114, or both. This approach allows you to easily evaluate whether or not the student takes both courses without having to make multiple passes through the data. The final result set should be: SNO SNAME AGE --- ---------- ---------- 2 CHUCK 21 4 MAGGIE 19 6 JING 18 MySQL and PostgreSQLUse a CASE expression with the aggregate function SUM to find students who take either CS112 or CS114 but not both: 1 select s.sno,s.sname,s.age 2 from student s, take t 3 where s.sno = t.sno 4 group by s.sno,s.sname,s.age 5 having sum(case when t.cno in ('CS112','CS114') 6 then 1 else 0 end) = 1 DB2, Oracle, and SQL ServerUse a CASE expression with the window function SUM OVER to find students who take either CS112 or CS114 but not both: 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4 sum(case when t.cno in ('CS112','CS114') then 1 else 0 end) 5 over (partition by s.sno,s.sname,s.age) as takes_either_or 6 from student s, take t 7 where s.sno = t.sno 8 ) x 9 where takes_either_or = 1 DiscussionThe first step in solving the problem is to use an inner join from table STUDENT to table TAKE, thus eliminating any students who do not take any courses. The next step is to use a CASE expression to denote whether a student takes each respective course. In the following query, the CASE expressions are moved into the SELECT list and return the intermediate results thus far: select s.sno,s.sname,s.age, case when t.cno in ('CS112','CS114') then 1 else 0 end as takes_either_or from student s, take t where s.sno = t.sno SNO SNAME AGE TAKES_EITHER_OR --- ---------- --- --------------- 1 AARON 20 1 1 AARON 20 0 1 AARON 20 1 2 CHUCK 21 1 3 DOUG 20 1 3 DOUG 20 1 4 MAGGIE 19 1 4 MAGGIE 19 0 5 STEVE 22 0 6 JING 18 0 6 JING 18 1 A value of 1 for TAKES_EITHER_OR signifies the student takes CS112 or CS114. Because a student can take multiple courses, the next step is to return only one row per student by using a GROUP BY with the aggregate function SUM. The function SUM will sum all the 1's for each student: select s.sno,s.sname,s.age, sum(case when t.cno in ('CS112','CS114') then 1 else 0 end) as takes_either_or from student s, take t where s.sno = t.sno group by s.sno,s.sname,s.age SNO SNAME AGE TAKES_EITHER_OR --- ---------- --- --------------- 1 AARON 20 2 2 CHUCK 21 1 3 DOUG 20 2 4 MAGGIE 19 1 5 STEVE 22 0 6 JING 18 1 Students who do not take CS112 or CS114 will have 0 for TAKES_EITHER_OR. Students who take both CS112 and CS114 will have 2 for TAKES_EITHER_OR. Thus the only students you want to return are those with a value of 1 for TAKES_EITHER_OR. The final solution uses the HAVING clause to keep only those students where the SUM of TAKES_EITHER_OR is one. For the window solution, the same technique is used. You also need to wrap the query in an inline view, and then reference the column TAKES_EITHER_OR, because window functions cannot be referenced directly in the WHERE clause (they are evaluated last in SQL processing, prior only to the ORDER BY clause). Because of how window functions work, it is necessary to remove duplicates caused by multiple courses. Original solutionThe following query is the original solution (modified slightly). The query is quite clever and uses the same approach as the original solution in Question 1. The solution uses a self join to find students who take both CS112 and CS114, and then uses a subquery to filter them out of the set of students who take either CS112 or CS114: select * from student s, take t where s.sno = t.sno and t.cno in ( 'CS112', 'CS114' ) and s.sno not in ( select a.sno from take a, take b where a.sno = b.sno and a.cno = 'CS112' and b.cno = 'CS114' ) Question 3You want to find students who take CS112 and no other courses, but the following query returns incorrect results: select s.* from student s, take t where s.sno = t.sno and t.cno = 'CS112' CHUCK is the only student who takes CS112 and no other courses, and is the only student that should be returned from the query. This question can be restated as "Find students who take only CS112." The query above finds students who take CS112, but also returns students who take other courses as well. The query should answer the question "Who takes only one course and that one course is CS112?" MySQL and PostgreSQLUse the aggregate function COUNT to ensure that students returned by the query take only one course: 1 select s.* 2 from student s, 3 take t1, 4 ( 5 select sno 6 from take 7 group by sno 8 having count(*) = 1 9 ) t2 10 where s.sno = t1.sno 11 and t1.sno = t2.sno 12 and t1.cno = 'CS112' DB2, Oracle, and SQL ServerUse the window function COUNT OVER to ensure a student takes only one course: 1 select sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age,t.cno, 4 count(t.cno) over ( 5 partition by s.sno,s.sname,s.age 6 ) as cnt 7 from student s, take t 8 where s.sno = t.sno 9 ) x 10 where cnt = 1 11 and cno = 'CS112' DiscussionThe key to the solutions is to write a query to answer both of the following questions: "Which student takes only one course?" and "Which student takes CS112?" The first approach uses inline view T2 to find students who take only one course. The next step is to join inline view T2 to table TAKE and keep only students who take CS112 (so what you are left with are students who take only one course and that one course is CS112). The query below shows the results thus far: select t1.* from take t1, ( select sno from take group by sno having count(*) = 1 ) t2 where t1.sno = t2.sno and t1.cno = 'CS112' SNO CNO --- ----- 2 CS112 The final step is to join to table STUDENT and find the students who match those returned by the join between inline view T2 and table TAKE. The window solution takes a similar approach but does so in a different (more efficient) way. Inline view X returns the students, the courses they take, and the number of courses they take (the inner join between table TAKE and table STUDENT guarantees that students who take no courses are excluded). The results are shown below: select s.sno,s.sname,s.age,t.cno, count(t.cno) over ( partition by s.sno,s.sname,s.age ) as cnt from student s, take t where s.sno = t.sno SNO SNAME AGE CNO CNT --- ---------- ---------- ----- ---------- 1 AARON 20 CS112 3 1 AARON 20 CS113 3 1 AARON 20 CS114 3 2 CHUCK 21 CS112 1 3 DOUG 20 CS112 2 3 DOUG 20 CS114 2 4 MAGGIE 19 CS112 2 4 MAGGIE 19 CS113 2 5 STEVE 22 CS113 1 6 JING 18 CS113 2 6 JING 18 CS114 2 With the course and count available, the last step is to simply keep only rows such that CNT is 1 and CNO is CS112. Original solutionThe original solution uses a subquery and double negation: select s.* from student s, take t where s.sno = t.sno and s.sno not in ( select sno from take where cno != 'CS112' ) This is an extremely clever solution, because nowhere in the query is the number of courses checked, nor is there a filter to ensure that students returned by the query actually take CS112! How does this work, then? The subquery returns all students who take a course other than CS112 and the results are shown below: select sno from take where cno != 'CS112' SNO ---- 1 1 3 4 5 6 6 The outer query returns all students who take a course (any course) and are not amongst the students returned by the subquery. Ignoring the NOT IN portion of the outer query for a moment, the results would be the following (showing all students who take a course): select s.* from student s, take t where s.sno = t.sno SNO SNAME AGE --- ---------- ---------- 1 AARON 20 1 AARON 20 1 AARON 20 2 CHUCK 21 3 DOUG 20 3 DOUG 20 4 MAGGIE 19 4 MAGGIE 19 5 STEVE 22 6 JING 18 6 JING 18 If you compare the two results sets, you see that the addition of NOT IN to the outer query effectively performs a set difference between SNO from the outer query and SNO from the subquery, returning only the student whose SNO is 2. In summary, the subquery finds all students who take a course that is not CS112. The outer query returns all students who are not amongst those that take a course other than CS112 (at this point the only available students are those who actually take CS112 or take nothing at all). The join between table STUDENT and table TAKE filters out the students who do not take any classes at all, leaving you only with the student who takes CS112 and only CS112. Set-based problem solving at its best! |