As humans , we have a finite attention span and finite capabilities for digesting large quantities of data. To deal with that, SQL Server provides several different ways to group and summarize data. You've seen some of the ways to group data and provide basic summary information, such as counts and average values; now you're going to see some other ways to count data and arrange aggregates to make the output easier to read and more palatable to outside applications. NOTE Target Consumer Keep in mind who the target audience is for most of your reports. Usually, it's a manager or someone in marketing. They tend to be swamped with reports , email, messages, meetings ”you get the idea. The more you can summarize the data, the more useful the results will be to a large extent. In addition to the COMPUTE and COMPUTE BY operators, you can also use the CUBE and ROLLUP operators to summarize your data. Take a look at the CUBE operator first. The CUBE OperatorCUBE works in conjunction with a GROUP BY to provide summary data. The CUBE operator puts this data directly into the rowset. Here's an example: SELECT PersonID, ProductID, count(*) as Ct FROM Sales WHERE personID > 4 GROUP BY PersonID, ProductID WITH CUBE This produces the following output: PersonID ProductID Ct ----------- ----------- ----------- 9 37 1 9 38 1 9 NULL 2 10 41 1 10 NULL 1 NULL NULL 3 NULL 37 1 NULL 38 1 NULL 41 1 (9 row(s) affected) The biggest question with a CUBE operator is not how to use it ”because the syntax of WITH CUBE really is all there is to it ”but instead, how do you interpret the output? It's pretty simple if you look at it like this: anywhere there is a NULL , it is a summary. So, the third value in the preceding resultset is PersonID 9 , and ProductID NULL had a count of 2. This means that PersonID 9 across all products had 2 sales. The row that starts NULL NULL is the total summary, which means there were a total of 3 sales covered by the resultset. The row that starts NULL 41 means there was 1 sale of the product with ID 41 . In the case of a count, if you just want to mentally substitute the word "count" where you see NULL , it might make more sense ”at least if you don't have any NULL values that you're counting. You'll see in a couple of sections how to do this programmatically. So, what CUBE does, in effect, is provide total summary information for a query for a given GROUP BY . It shows the information cut across all possible dimensions, with each part of the GROUP BY returning NULL to expose the summary information for the remaining fields. If CUBE does the summary for the entire query for every combination of columns , is there a way to just get the totals for one column? Summarizing Data with ROLLUPROLLUP is similar to CUBE , but it returns fewer results. The CUBE operator returns summary information for each column mentioned in the GROUP BY list from right to left and left to right. The ROLLUP operator returns the summary information for the elements of the GROUP BY from only right to left. For example, if you had these two SELECT statements: SELECT A, B, C, D FROM MyTable GROUP BY A, B, C, D WITH ROLLUP SELECT A, B, C, D FROM MyTable GROUP BY A, B, C, D WITH CUBE The first example would return summary information for all combinations of C and D, combinations of B, C, and D, and combinations of A, B, C, and D. The second SELECT would return the same as the first, but would also return results for A and B, as well as A, B, and C. The example given in the CUBE section received all the combinations of summaries for people who bought things and products that were purchased. Sometimes, though, you want to know only who bought things and the total number of things sold. That's what ROLLUP does: SELECT PersonID, ProductID, count(*) as Ct FROM Sales WHERE personID > 4 GROUP BY PersonID, ProductID WITH ROLLUP Which yields this resultset: PersonID ProductID Ct ----------- ----------- ----------- 9 37 1 9 38 1 9 NULL 2 10 41 1 10 NULL 1 NULL NULL 3 (6 row(s) affected) In the third row, you can see the ProductID is NULL , and the count is 2, which is the total for PersonID 9 . The ROLLUP operator works just like the CUBE operator, except that there's no per-product totals, only per-person totals. You use both CUBE and ROLLUP with aggregates other than just COUNT() . For example: SELECT PersonID, ProductID, sum(QtyPurchased) as QtyPurchased FROM Sales WHERE personID > 4 GROUP BY PersonID, ProductID WITH ROLLUP Which returns the resultset: PersonID ProductID QtyPurchased ----------- ----------- ------------ 9 37 10 9 38 5 9 NULL 15 10 41 6 10 NULL 6 NULL NULL 21 (6 row(s) affected) As promised , now it's time to see how to get rid of those darn NULL s everywhere. The GROUPING KeywordThe GROUPING keyword is used to clean up the output of CUBE and ROLLUP to make the output look nicer. It provides you with a programmatic way to identify where CUBE and ROLLUP are putting NULL s, as opposed to where the NULL s are in the data being queried. It goes like this: SELECT CASE WHEN GROUPING(PersonID) = 1 THEN '(Total)' ELSE CONVERT(varchar(10), PersonID) END PersonID, CASE WHEN GROUPING(ProductID) = 1 THEN '(Total)' ELSE CONVERT(varchar(10), ProductID) END ProductID, SUM(QtyPurchased) as QtyPurchased FROM Sales WHERE personID > 4 GROUP BY PersonID, ProductID WITH ROLLUP Which yields the somewhat nicer looking resultset: PersonID ProductID QtyPurchased ---------- ---------- ------------ 9 37 10 9 38 5 9 (Total) 15 10 41 6 10 (Total) 6 (Total) (Total) 21 (6 row(s) affected) Now you don't have to mentally think "Total"; you can actually put it into the output. The GROUPING function returns a 1 whenever the column is NULL because of a grouping summarization found in a CUBE or ROLLUP . Otherwise, it returns . When to Use CUBE and ROLLUPCUBE and ROLLUP should be used instead of COMPUTE and COMPUTE BY whenever possible. Why? Well, SQL Server's internal mechanics are better at figuring out how to optimize CUBE and ROLLUP because the syntax is simpler. Of course, it's hard to beat a WITH ROLLUP or WITH CUBE as far as simplicity of syntax. Also, CUBE and ROLLUP return a simple single resultset, instead of the extra "stuff" returned by COMPUTE and COMPUTE BY . In addition, in most cases a CUBE or ROLLUP provides you with the output you need, and the queries are shorter and simpler, making them easier to debug. So, now you know how to query data on a local server using different types of joins and how to summarize the data. What happens if the data isn't on just one server, but is on several servers, in several formats? The rest of this chapter is dedicated to talking to other servers, importing and exporting data, and dealing with XML to conduct business. REVIEW BREAK: Review of Advanced Query TechniquesSo far, we've covered a lot of material, from completing the SELECT statement to aggregating and summarizing data. We're about to move on to something completely different, so here's a review of what we've done so far.
Well, you now know all the parts of a SELECT statement that you'll need to know for the exam. SELECT is the backbone of SQL; it's how things get queried in the Structured Query Language. The structures you've learned, especially the JOIN syntax, are used in the other main SQL statements ” INSERT , UPDATE , and DELETE ”and the syntax is the same as with SELECT . You'll find that CUBE and ROLLUP are useful, especially if you get into writing queries for reporting systems. |