RecipeB.2.Answering Questions Involving Negation


Recipe B.2. Answering Questions Involving Negation

In his book, Rozenshtein approached the teaching of SQL through an examination of the different types of fundamental problems that you are often called upon to solve, in one form or another. Negation is one such type. It is often necessary to find rows for which some condition is not true. Simple conditions are easy but, as the following questions show, some negation problems require a bit of creativity and thought to solve.

Question 1

You want to find students who do not take CS112, but the following query is returning the wrong results:

 select *   from student  where sno in ( select sno   from take  where cno != 'CS112' ) 

Because a student may take several courses, this query can (and does) return students who take CS112. The query is incorrect because it does not answer the question: "Who does not take CS112?" Instead, it answers the question "Who takes a course that is not CS112?" The correct result set should include students who take no courses as well as students who take courses but none of them CS112. Ultimately, you should return the following result set:

 SNO   SNAME AGE --------- ---------- ---------- 5 STEVE  22 6 JING  18 7 BRIAN  21 8 KAY  20 9 GILLIAN  20    10 CHAD  21 

MySQL and PostgreSQL

Use a CASE expression with the aggregate function MAX to flag CS112 if it exists for a particular student:

 1 select s.sno,s.sname,s.age 2 from student s left join take t 3   on (s.sno = t.sno) 4  group by s.sno,s.sname,s.age 5 having max(case when t.cno = 'CS112' 6   then 1 else 0 end) = 0 

DB2 and SQL Server

Use a CASE expression with the window function MAX OVER to flag CS112 if it exists for a particular student:

 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4  max(case when t.cno = 'CS112' 5   then 1 else 0 end) 7  over(partition by s.sno,s.sname,s.age) as takes_CS112 9 from student s left join take t    10   on (s.sno = t.sno)    11  ) x    12  where takes_CS112 = 0 

Oracle

For users on Oracle9i Database and later, you can use the DB2 solution above. Alternatively, you can use the proprietary Oracle outer-join syntax, which is mandatory for users on Oracle8i Database and earlier:

 /* group by solution */ 1 select s.sno,s.sname,s.age 2 from student s, take t 3  where s.sno = t.sno (+) 4  group by s.sno,s.sname,s.age 5 having max(case when t.cno = 'CS112' 6   then 1 else 0 end) = 0 /* window solution */  1 select distinct sno,sname,age  2  from (  3 select s.sno,s.sname,s.age,  4   max(case when t.cno = 'CS112'  5    then 1 else 0 end)  7   over(partition by s.sno,s.sname,s.age) as takes_CS112  9  from student s, take t 10  where s.sno = t.sno (+) 11   ) x 12  where takes_CS112 = 0 

Discussion

Despite the different syntax for each solution, the technique is the same. The idea is to create a "Boolean" column in the result set to denote whether or not a student takes CS112. If a student takes CS112, then return 1 in that column; otherwise, return 0. The following query moves the CASE expression into the SELECT list and shows the intermediate results thus far:

 select s.sno,s.sname,s.age,    case when t.cno = 'CS112' then 1 else 0    end as takes_CS112   from student s left join take t on (s.sno=t.sno) SNO SNAME   AGE TAKES_CS112 --- ---------- ---------- ----------- 1   AARON    20 1 1 AARON    20 0 1 AARON    20 0 2 CHUCK    21 1 3 DOUG    20 1 3 DOUG    20 0 4 MAGGIE    19 1 4 MAGGIE    19 0 5 STEVE    22 0 6 JING    18 0 6 JING    18 0 8 KAY    20 0    10 CHAD    21 0 7 BRIAN    21 0 9 GILLIAN    20 0 

The outer join to table TAKE ensures that even students who take no courses are returned. The next step is to use MAX to take the greatest value returned by the CASE expression for each student. If a student takes CS112, the greatest value will be 1, because all other courses are 0. For the solution using GROUP BY, the final step is to use the HAVING clause to keep only students with 0 returned from the MAX/CASE expression. For the window solution, you need to wrap the query in an inline view and then reference TAKES_CS112, because window functions cannot be referenced directly in the WHERE clause. Because of how window functions work, it is also necessary to remove duplicates caused by multiple courses.

Original solution

The original solution to this problem is quite clever and is shown here:

 select *    from student  where sno not in (select sno from take where cno = 'CS112') 

This can be stated as: "Find the students in table TAKE who take CS112, and then return all students in table STUDENT who are not them." This technique follows the advice regarding negation found at the end of Rozenshtein's book:

Remember that real negation requires two passes: To find out "who does not," first find out "who does" and then get rid of them.

Question 2

You want to find students who take CS112 or CS114 but not both. The following query looks promising as a solution but returns the wrong result set:

 select *   from student  where sno in ( select sno   from take  where cno != 'CS112'    and cno != 'CS114' ) 

Of the students who take courses, only students DOUG and AARON take both CS112 and CS114. Those two should be excluded. Student STEVE takes CS113, but not CS112 or CS114, and should be excluded as well.

Because a student can take multiple courses, the approach here is to return a single row for each student with information regarding whether the student takes CS112 or CS114, or both. This approach allows you to easily evaluate whether or not the student takes both courses without having to make multiple passes through the data. The final result set should be:

 SNO SNAME   AGE --- ---------- ---------- 2 CHUCK    21 4 MAGGIE    19 6 JING    18 

MySQL and PostgreSQL

Use a CASE expression with the aggregate function SUM to find students who take either CS112 or CS114 but not both:

 1 select s.sno,s.sname,s.age 2 from student s, take t 3  where s.sno = t.sno 4  group by s.sno,s.sname,s.age 5 having sum(case when t.cno in ('CS112','CS114') 6   then 1 else 0 end) = 1 

DB2, Oracle, and SQL Server

Use a CASE expression with the window function SUM OVER to find students who take either CS112 or CS114 but not both:

 1 select distinct sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age, 4  sum(case when t.cno in ('CS112','CS114') then 1 else 0 end) 5  over (partition by s.sno,s.sname,s.age) as takes_either_or 6 from student s, take t 7  where s.sno = t.sno 8  ) x 9  where takes_either_or = 1 

Discussion

The first step in solving the problem is to use an inner join from table STUDENT to table TAKE, thus eliminating any students who do not take any courses. The next step is to use a CASE expression to denote whether a student takes each respective course. In the following query, the CASE expressions are moved into the SELECT list and return the intermediate results thus far:

 select s.sno,s.sname,s.age,    case when t.cno in ('CS112','CS114') then 1 else 0 end as takes_either_or   from student s, take t  where s.sno = t.sno SNO SNAME    AGE TAKES_EITHER_OR --- ---------- --- --------------- 1 AARON 20  1 1 AARON 20  0 1 AARON 20  1 2 CHUCK 21  1 3 DOUG 20  1 3 DOUG 20  1 4 MAGGIE 19  1 4 MAGGIE 19  0 5 STEVE 22  0 6 JING 18  0 6 JING 18  1 

A value of 1 for TAKES_EITHER_OR signifies the student takes CS112 or CS114. Because a student can take multiple courses, the next step is to return only one row per student by using a GROUP BY with the aggregate function SUM. The function SUM will sum all the 1's for each student:

 select s.sno,s.sname,s.age,    sum(case when t.cno in ('CS112','CS114') then 1 else 0 end) as takes_either_or   from student s, take t  where s.sno = t.sno  group by s.sno,s.sname,s.age SNO SNAME    AGE TAKES_EITHER_OR --- ---------- --- ---------------   1 AARON 20  2   2 CHUCK 21  1   3 DOUG 20  2   4 MAGGIE 19  1   5 STEVE 22  0   6 JING 18  1 

Students who do not take CS112 or CS114 will have 0 for TAKES_EITHER_OR. Students who take both CS112 and CS114 will have 2 for TAKES_EITHER_OR. Thus the only students you want to return are those with a value of 1 for TAKES_EITHER_OR. The final solution uses the HAVING clause to keep only those students where the SUM of TAKES_EITHER_OR is one.

For the window solution, the same technique is used. You also need to wrap the query in an inline view, and then reference the column TAKES_EITHER_OR, because window functions cannot be referenced directly in the WHERE clause (they are evaluated last in SQL processing, prior only to the ORDER BY clause). Because of how window functions work, it is necessary to remove duplicates caused by multiple courses.

Original solution

The following query is the original solution (modified slightly). The query is quite clever and uses the same approach as the original solution in Question 1. The solution uses a self join to find students who take both CS112 and CS114, and then uses a subquery to filter them out of the set of students who take either CS112 or CS114:

 select *   from student s, take t  where s.sno = t.sno    and t.cno in ( 'CS112', 'CS114' )    and s.sno not in ( select a.sno   from take a, take b  where a.sno = b.sno    and a.cno = 'CS112'    and b.cno = 'CS114' ) 

Question 3

You want to find students who take CS112 and no other courses, but the following query returns incorrect results:

 select s.*   from student s, take t  where s.sno = t.sno    and t.cno = 'CS112' 

CHUCK is the only student who takes CS112 and no other courses, and is the only student that should be returned from the query.

This question can be restated as "Find students who take only CS112." The query above finds students who take CS112, but also returns students who take other courses as well. The query should answer the question "Who takes only one course and that one course is CS112?"

MySQL and PostgreSQL

Use the aggregate function COUNT to ensure that students returned by the query take only one course:

 1 select s.* 2 from student s, 3  take t1, 4  ( 5 select sno 6 from take 7  group by sno 8 having count(*) = 1 9  ) t2    10  where s.sno = t1.sno    11    and t1.sno = t2.sno    12  and t1.cno = 'CS112' 

DB2, Oracle, and SQL Server

Use the window function COUNT OVER to ensure a student takes only one course:

 1 select sno,sname,age 2 from ( 3 select s.sno,s.sname,s.age,t.cno, 4  count(t.cno) over ( 5    partition by s.sno,s.sname,s.age 6  ) as cnt 7 from student s, take t 8  where s.sno = t.sno 9  ) x    10  where cnt = 1    11  and cno = 'CS112' 

Discussion

The key to the solutions is to write a query to answer both of the following questions: "Which student takes only one course?" and "Which student takes CS112?" The first approach uses inline view T2 to find students who take only one course. The next step is to join inline view T2 to table TAKE and keep only students who take CS112 (so what you are left with are students who take only one course and that one course is CS112). The query below shows the results thus far:

 select t1.*   from take t1,    ( select sno   from take  group by sno having count(*) = 1    ) t2  where t1.sno = t2.sno    and t1.cno = 'CS112' SNO CNO --- -----   2 CS112 

The final step is to join to table STUDENT and find the students who match those returned by the join between inline view T2 and table TAKE. The window solution takes a similar approach but does so in a different (more efficient) way. Inline view X returns the students, the courses they take, and the number of courses they take (the inner join between table TAKE and table STUDENT guarantees that students who take no courses are excluded). The results are shown below:

 select s.sno,s.sname,s.age,t.cno,    count(t.cno) over (      partition by s.sno,s.sname,s.age    ) as cnt   from student s, take t  where s.sno = t.sno SNO SNAME   AGE CNO    CNT --- ---------- ---------- ----- ----------   1 AARON    20 CS112  3   1 AARON    20 CS113  3   1 AARON    20 CS114  3   2 CHUCK    21 CS112  1   3 DOUG    20 CS112  2   3 DOUG    20 CS114  2   4 MAGGIE    19 CS112  2   4 MAGGIE    19 CS113  2   5 STEVE    22 CS113  1   6 JING    18 CS113  2   6 JING    18 CS114  2 

With the course and count available, the last step is to simply keep only rows such that CNT is 1 and CNO is CS112.

Original solution

The original solution uses a subquery and double negation:

 select s.*   from student s, take t  where s.sno = t.sno    and s.sno not in ( select sno from take    where cno != 'CS112' ) 

This is an extremely clever solution, because nowhere in the query is the number of courses checked, nor is there a filter to ensure that students returned by the query actually take CS112! How does this work, then? The subquery returns all students who take a course other than CS112 and the results are shown below:

 select sno   from take  where cno != 'CS112' SNO ----    1    1    3    4    5    6    6 

The outer query returns all students who take a course (any course) and are not amongst the students returned by the subquery. Ignoring the NOT IN portion of the outer query for a moment, the results would be the following (showing all students who take a course):

 select s.*   from student s, take t  where s.sno = t.sno SNO SNAME   AGE --- ---------- ----------   1 AARON    20   1 AARON    20   1 AARON    20   2 CHUCK    21   3 DOUG    20   3 DOUG    20   4 MAGGIE    19   4 MAGGIE    19   5 STEVE    22   6 JING    18   6 JING    18 

If you compare the two results sets, you see that the addition of NOT IN to the outer query effectively performs a set difference between SNO from the outer query and SNO from the subquery, returning only the student whose SNO is 2. In summary, the subquery finds all students who take a course that is not CS112. The outer query returns all students who are not amongst those that take a course other than CS112 (at this point the only available students are those who actually take CS112 or take nothing at all). The join between table STUDENT and table TAKE filters out the students who do not take any classes at all, leaving you only with the student who takes CS112 and only CS112. Set-based problem solving at its best!




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net