Recipe14.16.Testing for Existence of a Value Within a Group


Recipe 14.16. Testing for Existence of a Value Within a Group

Problem

You want to create a Boolean flag for a row depending on whether or not any row in its group contains a specific value. Consider an example of a student who has taken a certain number of exams during a period of time. A student will take three exams over three months. If a student passes one of these exams, the requirement is satisfied and a flag should be returned to express that fact. If a student did not pass any of the three tests in the three month period, then an additional flag should be returned to express that fact as well. Consider the following example (using Oracle syntax to make up rows for this example; minor modifications are necessary for DB2 and SQL Server, because both support window functions):

 create view V as select 1 student_id,        1 test_id,        2 grade_id,        1 period_id,        to_date('02/01/2005','MM/DD/YYYY') test_date,        0 pass_fail   from dual union all select 1, 2, 2, 1, to_date('03/01/2005','MM/DD/YYYY'), 1 from dual union all select 1, 3, 2, 1, to_date('04/01/2005','MM/DD/YYYY'), 0 from dual union all select 1, 4, 2, 2, to_date('05/01/2005','MM/DD/YYYY'), 0 from dual union all select 1, 5, 2, 2, to_date('06/01/2005','MM/DD/YYYY'), 0 from dual union all select 1, 6, 2, 2, to_date('07/01/2005','MM/DD/YYYY'), 0 from dual select *   from V STUDENT_ID TEST_ID GRADE_ID PERIOD_ID TEST_DATE   PASS_FAIL ---------- ------- -------- --------- ----------- ---------           1       1        2        1 01-FEB-2005         0           1       2        2        1 01-MAR-2005         1           1       3        2        1 01-APR-2005         0           1       4        2        2 01-MAY-2005         0           1       5        2        2 01-JUN-2005         0           1       6        2        2 01-JUL-2005         0 

Examining the result set above, you see that the student has taken six tests over two, three-month periods. The student has passed one test (1 means "pass"; 0 means "fail"), thus the requirement is satisfied for the entire first period. Because the student did not pass any exams during the second period (the next three months), PASS_FAIL is 0 for all three exams. You want to return a result set that highlights whether or not a student has passed a test for a given period. Ultimately you want to return the following result set:

 STUDENT_ID TEST_ID GRADE_ID PERIOD_ID TEST_DATE   METREQ IN_PROGRESS ---------- ------- -------- --------- ----------- ------ -----------          1       1        2         1 01-FEB-2005      +           0          1       2        2         1 01-MAR-2005      +           0          1       3        2         1 01-APR-2005      +           0          1       4        2         2 01-MAY-2005      -           0          1       5        2         2 01-JUN-2005      -           0          1       6        2         2 01-JUL-2005      -           1 

The values for METREQ ("met requirement") are + and -, signifying the student either has or has not satisfied the requirement of passing at least one test in a period (three-month span), respectively. The value for IN_PROGRESS should be 0 if a student has already passed a test in a given period. If a student has not passed a test for a given period, then the row that has the latest exam date for that student will have a value of 1 for IN_PROGRESS.

Solution

What makes this problem a bit tricky is the fact that you have to treat rows in a group as a group and not as individuals. Consider the values for PASS_FAIL in the problem section. If you evaluate row by row, it would seem that the value for METREQ for each row except TEST_ID 2 should be "-", when in fact that is not the case. You must ensure you evaluate the rows as a group. By using the window function MAX OVER you can easily determine whether or not a student passed at least one test during a particular period. Once you have that information, the "Boolean" values are a simple matter of using CASE expressions:

  1  select student_id,  2         test_id,  3         grade_id,  4         period_id,  5         test_date,  6         decode( grp_p_f,1,lpad('+',6),lpad('-',6) ) metreq,  7         decode( grp_p_f,1,0,  8                 decode( test_date,last_test,1,0 ) ) in_progress  9   from ( 10 select V.*, 11        max(pass_fail)over(partition by 12                      student_id,grade_id,period_id) grp_p_f, 13        max(test_date)over(partition by 14                      student_id,grade_id,period_id) last_test 15   from V 16        ) x 

Discussion

The key to the solution is using the window function MAX OVER to return the greatest value of PASS_FAIL for each group. Because the values for PASS_FAIL are only 1 or 0, if a student passed at least one exam, then MAX OVER would return 1 for the entire group. How this works is shown below:

 select V.*,        max(pass_fail)over(partition by                       student_id,grade_id,period_id) grp_pass_fail   from V STUDENT_ID TEST_ID GRADE_ID PERIOD_ID TEST_DATE   PASS_FAIL GRP_PASS_FAIL ---------- ------- -------- --------- ----------- --------- -------------          1       1        2         1 01-FEB-2005         0             1          1       2        2         1 01-MAR-2005         1             1          1       3        2         1 01-APR-2005         0             1          1       4        2         2 01-MAY-2005         0             0          1       5        2         2 01-JUN-2005         0             0          1       6        2         2 01-JUL-2005         0             0 

The result set above shows that the student passed at least one test during the first period, thus the entire group has a value of 1 or "pass." The next requirement is that if the student has not passed any tests in a period, return a value of 1 for he IN_ PROGRESS flag for the latest test date in that group. You can use the window function MAX OVER to do this as well:

 select V.*,        max(pass_fail)over(partition by                       student_id,grade_id,period_id) grp_p_f,        max(test_date)over(partition by                       student_id,grade_id,period_id) last_test   from V STUDENT_ID TEST_ID GRADE_ID PERIOD_ID TEST_DATE   PASS_FAIL GRP_P_F LAST_TEST ---------- ------- -------- --------- ----------- --------- ------- -----------          1       1        2         1 01-FEB-2005         0       1 01-APR-2005          1       2        2         1 01-MAR-2005         1       1 01-APR-2005          1       3        2         1 01-APR-2005         0       1 01-APR-2005          1       4        2         2 01-MAY-2005         0       0 01-JUL-2005          1       5        2         2 01-JUN-2005         0       0 01-JUL-2005          1       6        2         2 01-JUL-2005         0       0 01-JUL-2005 

Now that you have determined for which period the student has passed a test and what the latest test date for each period is, the last step is simply a matter of applying some formatting magic to make the result set look nice. The final solution uses Oracle's DECODE function (CASE supporters eat your hearts out) to create the METREQ and IN_PROGRESS columns. Use the LPAD function to right justify the values for METREQ:

 select student_id,        test_id,        grade_id,        period_id,        test_date,        decode( grp_p_f,1,lpad('+',6),lpad('-',6) ) metreq,        decode( grp_p_f,1,0,                decode( test_date,last_test,1,0 ) ) in_progress   from ( select V.*,        max(pass_fail)over(partition by                       student_id,grade_id,period_id) grp_p_f,        max(test_date)over(partition by                       student_id,grade_id,period_id) last_test   from V        ) x STUDENT_ID TEST_ID GRADE_ID PERIOD_ID TEST_DATE   METREQ IN_PROGRESS ---------- ------- -------- --------- ----------- ------ -----------          1       1        2         1 01-FEB-2005      +           0          1       2        2         1 01-MAR-2005      +           0          1       3        2         1 01-APR-2005      +           0          1       4        2         2 01-MAY-2005      -           0          1       5        2         2 01-JUN-2005      -           0          1       6        2         2 01-JUL-2005      -           1 




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net