7.10 Joining Arrays with Master Tables7.10.1 ProblemYou want to join a non-normalized array-like data table with a normalized master record. The results should still be non-normalized, but instead of showing coded values from the detail table, you want them to show the corresponding values from the master table. Taking both the StudentThesis table and the Professors table, you want to join them to replace the member codes with the corresponding professor names. You want to do it so that the structure of the result corresponds to the structure of the StudentThesis table ” you want the resulting rows to have one student ID, three professor names , and one grade. 7.10.2 SolutionThe following SELECT statement reproduces the structure from the non-normalized table (StudentThesis), but replaces the professors ID numbers with their corresponding professor names. Each line of the report will list a student ID together with the names of the three dissertation committee members for that student. SELECT t.StudentId, MAX(CASE WHEN t.Member1=p.id THEN p.Name ELSE NULL END) Member1, MAX(CASE WHEN t.Member2=p.id THEN p.Name ELSE NULL END) Member2, MAX(CASE WHEN t.Member3=p.id THEN p.Name ELSE NULL END) Member3 FROM StudentThesis t, Professors p GROUP BY t.StudentId The results from executing this query will look like these: StudentId Member1 Member2 Member3 ---------- --------------- ---------------- ---------------- 1 Baird, Frances Anderson, Terry Newton, Isaac 2 Newton, Isaac Anderson, Terry Smith, John 3 Newton, Isaac Einstein, Albert Base, Samuel 4 Anderson, Terry Smith, John Einstein, Albert 7.10.3 DiscussionThe query shown in this recipe is driven by the StudentThesis table, and it uses aggregation combined with conditional logic to link each member Id number with its corresponding master record in the Professors table. You may well wonder about the need for aggregation. To explain that, let's start by looking at the following, simpler query: SELECT t.StudentId, p1.Name, p2.Name, P3.Name, t.Grade FROM StudentThesis t JOIN Professors p1 ON t.Member1 = p1.Id JOIN Professors p2 ON t.Member2 = p2.Id JOIN Professors p3 ON t.Member3 = p3.Id This query also returns professor names in place of their Id numbers, but it does so by joining one instance of the Professors table for each member Id column in the StudentThesis table. The large number of table joins, three in this case, will reduce the efficiency of the query. To understand how aggregation can help us, let's take a look at the Cartesian product that you get when you join the StudentThesis and Professor tables together. Here's an excerpt that shows the Cartesian product for one student record: StudentId Member1 Member2 Member3 Grade Id Name --------- ------- ------- ------- ----- --- -------------------- 1 234 322 456 A 123 Smith, John 1 234 322 456 A 456 Newton, Isaac 1 234 322 456 A 455 Einstein, Albert 1 234 322 456 A 344 Base, Samuel 1 234 322 456 A 322 Anderson, Terry 1 234 322 456 A 234 Baird, Frances Because you have all possible combinations of all rows, you are ensured that for any given member Id column from the StudentThesis table, you will have at least one row with the corresponding professor's name from the Professors table. The following CASE expression, which happens to be for the member1 column, takes advantage of that fact to return the appropriate professor name when the Id in the member1 column matches the Id column of the Professors table: CASE WHEN t.Member1=p.Id THEN p.Name ELSE NULL END When there's a match between the two Id values, the professor name is returned by the CASE expression. Otherwise, when there's no match, a NULL is returned. The result is that the Cartesian product now looks like this: studentid member1 member2 member3 grade id name ---------- --------- --------- --------- ------ ---- ----------------- 1 NULL NULL NULL A 123 Smith, John 1 NULL NULL Newto... A 456 Newton, Isaac 1 NULL NULL NULL A 455 Einstein, Albert 1 NULL NULL NULL A 344 Base, Samuel 1 NULL Ander... NULL A 322 Anderson, Terry 1 Baird... NULL NULL A 234 Baird, Frances This is fine, but we really want one row to be returned for each student, not six rows as shown here. We can use aggregation to collapse these six rows into one. The recipe query uses GROUP BY t.StudentId to accomplish that. The name and id columns from the professor table aren't needed, so they aren't included in the select list. The MAX function is used to extract the one relevant value from each member column. Each column has only one non-NULL value. Since aggregate functions ignore NULLs, that one value is the one returned by MAX. While the query using aggregation is more complex than the simple query shown earlier, it's more efficient because only one table-join is needed instead of three. A further advantage is that the number of joins does not increase as the number of member columns increases . |