Section 2.18. Dynamic Classification System

2.18 Dynamic Classification System

2.18.1 Problem

Define a query-processing framework that will classify sets according to rules that you can define and change dynamically. The number of rules is unlimited. As an example, let's say that a special credit system has been introduced at our university as part of a new cross-disciplinary program introduced by a new dean. Students do not get credit points directly for courses they have finished, but instead get credit for different combinations of term papers. A given term paper combination does not need to represent one course and, indeed, is most likely to represent several courses. There are different categories of credits, and the categories are marked by type codes such as A, B, and C. To finish the program, a student must earn at least one credit point in each category. Table 2-2 shows the requirement matrix that controls whether or not credit can be granted in a given category.

Table 2-2. Category credit requirement matrix

Course	Paper 1	Paper 2	Paper 3
ACCN101	A,C2	A,C2	A,B
MGMT120	C1	C1	C1,B

To earn an A credit point, a student must complete the first, second, and third term paper in the ACCN101 course. There are two ways to earn a C credit point. One is to submit the first, second, and third term papers for MGMT120. The other way to earn a C credit point is to submit the first and second term papers for ACCN101.

2.18.2 Solution

This problem can not be solved directly by just applying one query. You first need to create an additional table where you store the rules. Then you can write a query that looks at both the rules and at the actual term papers that students have submitted. That query can apply the rules to the actual results and determine the proper credits to grant in each category.

2.18.2.1 Step 1: Creating the rules table

The following table can be used to store the rules for our example scenario:

 CREATE TABLE CreditRules(    RuleId INTEGER,    TermId INTEGER,    CourseId CHAR(20),    TermPaper INTEGER )

After creating the table, you need to populate it with the credit rules. As you might have observed , the rules can be directly transformed into the following Boolean logic expressions:

 Rule1: Acc1 AND Acc2 AND Acc3 Rule2: Acc3 AND Mgm3 Rule3: (Mgm1 AND Mgm2 AND Mgm3) OR (Acc1 AND Acc2)

The following data shows the representation in the CreditRules table of the rules described in Table 2-2. Each rule is identified by a unique RuleId number. Within each rule, each term is identified by a TermId number.

 RuleId  TermId  CourseId  TermPaper    ------- ------- --------- -----------  1       1       ACCN101   1 1       1       ACCN101   2 1       1       ACCN101   3 2       1       ACCN101   3 2       1       MGMT120   3 3       1       MGMT120   1 3       1       MGMT120   2 3       1       MGMT120   3 3       2       ACCN101   1 3       2       ACCN101   2

One of the rules in Table 2-2 says that to get an A credit, you must submit term papers 1 through 3 from the ACCN 101 course. You can see that rule reflected in the data as rule 1. Notice that there is nothing in the Credit Rules table to link a rule to a specific credit category. The logic to do that resides in the query that you will read about next .

Before proceeding, it's important to understand the use of the terms rule and term in this recipe. Rules are composed of terms. A term is a set of conditions that, when met, indicates that the rule has been satisfied. Each rule in our scenario corresponds to a specific credit category. Only rule 3 has more than one term. Rule 3 states that there are two different ways to get a C credit. Term 1 says that you can get a C credit by submitting the first three term papers for MGMT120. Term 2 says that you can get a C credit by submitting the first two term papers for ACCT101. Think of the terms as being combined with OR together. Satisfy any term and you've satisfied the rule. Think of the elements of a term as being combined with AND together. To satisfy a term, you must satisfy all the elements.

2.18.2.2 Step 2: Running the query

Once the CreditRules table has been created and the rules have been defined, you can use the following query to compute each student's entitled credits:

 SELECT DISTINCT s.StudentName,    (CASE WHEN c.RuleId=1 THEN 'A'        WHEN c.RuleId=2 THEN 'B'        WHEN c.RuleId=3 THEN 'C' END) credit FROM Students s JOIN CreditRules c    ON s.CourseId=c.CourseId AND s.TermPaper=c.TermPaper GROUP BY c.RuleId, c.TermId, s.StudentName HAVING COUNT(*)=(SELECT COUNT(*)            FROM CreditRules AS c1            WHERE c.RuleId=c1.RuleId AND c.TermId=c1.TermId) ORDER BY StudentName

The results from executing this query will resemble the following:

 StudentName  credit  ------------ -------  Andrew       B Andrew       C Bert         A Bert         C Cindy        C

2.18.3 Discussion

As you can see, this solution is dependent on the CreditRules table to store the rules. One advantage of storing the rules in a table like this is that you can easily change the rules without changing the query that assigns the credits. While it would be relatively trivial to embed the rules directly into a query, changes to the rules would then require changes to the query. If the query were embedded within an application, then a change to the query would necessitate recompiling and retesting that application. By using a table-driven approach, you gain the flexibility of changing the rules on the fly without having to do any extra recompiling and retesting.

You'll find a table-driven approach, such as that shown here, to be an efficient solution that can be applied to large record sets. You can deal with a large number of term papers, a large number of rules, or both. It's also a fairly easy task to write an application to allow users to manage the rules themselves , thus saving you work in the long run.

2.18.3.1 The query

Let's begin our analysis of the query by looking at the basic SELECT statement:

 SELECT DISTINCT StudentName,    (CASE WHEN c.RuleId=1 THEN 'A'        WHEN c.RuleId=2 THEN 'B'        WHEN c.RuleId=3 THEN 'C' END) credit FROM Students s, CreditRules c    ON s.CourseId=c.CourseId AND s.TermPaper=c.TermPaper

The Students and the CreditRules table are joined to generate a temporary result set where rows from the Students table are linked with all possible rule terms that could be applied to those records.

The DISTINCT clause is used because it's possible that a student may qualify for a credit in more than one way. Table 2-2, for example, shows two ways to qualify for a C credit. We don't care if a student qualifies for a credit more than once. It's enough to know that the student qualifies at all, so DISTINCT is used to prevent duplicate credits from being reported .

The CASE statement is included in the query to link each rule to a specific credit. You can easily see that the satisfaction of rule 1 results in an A credit. The CASE statement not only makes the results more readable, it insulates the program using this query from needing to know which rule numbers correspond to which credit categories. You could later add a rule number 4 that also corresponds to an A credit, and the program executing the query would not know the difference.

Now, let's look at the GROUP BY part of the statement:

 GROUP BY c.RuleId, c.TermId, s.StudentName

This grouping combines the possible matches from one rule, one term, and one student together. If the group represents a full match, having all term requirements present in the student's record, the student can receive the corresponding credit point.

The HAVING clause in the query is where the check is made to see if all the terms of a rule have been met. Compliance with a rule is indicated when all rows for one of the rule's terms are matched by a Students table record for a given student. To determine if all terms have been met, count the terms and compare that value to the number of matches for a given student. Here is where the HAVING clause comes to our aid. The following HAVING clause does this comparison for us:

 HAVING COUNT(*)=(SELECT COUNT(*)            FROM CreditRules AS c1            WHERE c.RuleId=c1.RuleId AND c.TermId=c1.TermId)

The expression in the HAVING clause is evaluated for each group ” in other words, for each rule, term, and student combination. The subquery retrieves the total number of terms defined in the rule, and the result is compared against the number of rows in the group. Each row in the group represents a match; if the counts are the same, the conditions of the term are satisfied. The student, therefore, should get the corresponding credit.