Section 2.5. Implementing Partial Intersection

   

2.5 Implementing Partial Intersection

2.5.1 Problem

You have a set of sets. You want to find the elements that represent intersections between those sets, and then you want to count the number of sets to which each of those elements belongs. The intersections can be partial. In other words, it is not necessary for an element to be present in all sets. However, you do want to specify a threshold, in terms of the number of sets, so the query results will exlude an element that falls below it. As an example of this type of problem, you want to list all term papers and show how many students have submitted each one of them.

2.5.2 Solution

The solution to this problem is fairly trivial. You have to count the number of term papers that occurs in the table, so you can use a GROUP BY query as shown in the following example:

 SELECT CourseId, TermPaper, COUNT(*) NumStudents FROM Students GROUP BY TermPaper, CourseId ORDER BY  COUNT(*) DESC 

This query will return the submission count for each term paper. If you want to impose a threshold ” say that you only care about term papers that have been turned in by at least two students ” you can add a HAVING clause as follows :

 SELECT CourseId, TermPaper, COUNT(*) NumStudents FROM Students GROUP BY TermPaper, CourseId HAVING COUNT(*) >= 2 ORDER BY COUNT(*) DESC 

Run against the sample data provided with this chapter, this query will produce the following result:

 CourseId             TermPaper   NumStudents  -------------------- ----------- -----------  ACCN101              2           3 ACCN101              1           2 ACCN101              3           2 MGMT120              1           2 MGMT120              2           2 MGMT120              3           2 

2.5.3 Discussion

The GROUP BY query takes the rows from the Students table and sorts them into groups based on term papers. Term papers are uniquely identified by a combination of course ID and term paper number. The COUNT(*) in the SELECT list causes the rows in each group to be counted. The result is the number of times each paper has been submitted.

Strictly speaking, if a count is less than 2, then it doesn't represent an intersection at all. Either nobody has written the term paper yet or only one person has written it. If you need a mathematically correct intersection, specify a minimum count of 2 in the HAVING clause. That way you will only see term papers that fall into at least two sets. The following query, for example, returns the intersection between the set of term papers written by Andrew and the set written by Cindy:

 SELECT CourseId, TermPaper FROM Students WHERE StudentName IN ('Andrew','Cindy') GROUP BY TermPaper, CourseId HAVING COUNT(*) >= 2 ORDER BY  COUNT(*) DESC 

The way that this works is that the WHERE clause restricts the query to only those papers written by either Andrew or Cindy. The GROUP BY then sorts and counts by paper. A count of 2 indicates that both Andrew and Cindy must have submitted a paper. The HAVING clause further restricts the query's results to only those term papers.

All of the queries shown in this recipe are predicated on the assumption that a student cannot submit the same term paper twice or, at least, that such a double submission will not be recorded twice in the Students table.

   


Transact-SQL Cookbook
Transact-SQL Cookbook
ISBN: 1565927567
EAN: 2147483647
Year: 2005
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net