11.3. Aggregating by Range (Bands)Some people have trouble writing SQL queries that return aggregates for bands. Such queries are actually quite easy to write using the case construct. By way of example, look at the problem of reporting on the distribution of tables by their total row counts. For instance, how many tables contain fewer than 100 rows, how many contain 100 to 10,000 rows, how many 10,000 to 1,000,000 rows, and how many tables store more than 1,000,000 rows? Information about tables is usually accessible through data dictionary views: for instance, INFORMATION_SCHEMA.TABLES, pg_statistic, and pg_tables, dba_tables, syscat.tables, sysobjects and systabstats, and so on. In my explanation here, I'll assume the general case of a view named table_info, containing, among other things, the columns table_name and row_count. Using this table, a simple use of case and the suitable group by can give us the distribution by row_count that we are after: select case when row_count < 100 then 'Under 100 rows' when row_count >= 100 and row_count < 10000 then '100 to 10000' when row_count >= 10000 and row_count < 1000000 then '10000 to 1000000' else 'Over 1000000 rows' end as range, count(*) as table_count from table_info where row_count is not null group by case when row_count < 100 then 'Under 100 rows' when row_count >= 100 and row_count < 10000 then '100 to 10000' when row_count >= 10000 and row_count < 1000000 then '10000 to 1000000' else 'Over 1000000 rows' end There is only one snag here: group by performs a sort before aggregating data. Since we are associating a label with each of our aggregates, the result is, by default, alphabetically sorted on that label: RANGE TABLE_COUNT ----------------- ------------ 100 to 10000 18 10000 to 1000000 15 Over 1000000 rows 6 Under 100 rows 24 The ordering that would be logical to a human eye in such a case is to see Under 100 rows appear first, and then each band by increasing number of rows, with Over 1,000,000 rows coming last. Rather than trying to be creative with labels, the stratagem to solve this problem consists of two steps:
Here is the query that results from applying the preceding two steps: select row_range, table_count from ( -- Build a sort key to have bands suitably ordered -- and hide it inside a subquery select case when row_count < 100 then 1 when row_count >= 100 and row_count < 10000 then 2 when row_count >= 10000 and row_count < 1000000 then 3 else 4 end as sortkey, case when row_count < 100 then 'Under 100 rows' when row_count >= 100 and row_count < 10000 then '100 to 10000' when row_count >= 10000 and row_count < 1000000 then '10000 to 1000000' else 'Over 1000000 rows' end as row_range, count(*) as table_count from table_info where row_count is not null group by case when row_count < 100 then 'Under 100 rows' when row_count >= 100 and row_count < 10000 then '100 to 10000' when row_count >= 10000 and row_count < 1000000 then '10000 to 1000000' else 'Over 1000000 rows' end, case when row_count < 100 then 1 when row_count >= 100 and row_count < 10000 then 2 when row_count >= 10000 and row_count < 1000000 then 3 else 4 end) dummy order by sortkey; And following are the results from executing that query: ROW_RANGE TABLE_COUNT ----------------- ----------- Under 100 rows 24 100 to 10000 18 10000 to 1000000 15 Over 1000000 rows 6 Aggregating by range (bands) requires building an artificial sort key to display results in desired order. |