Section 11.3. Aggregating by Range (Bands)

11.3. Aggregating by Range (Bands)

Some people have trouble writing SQL queries that return aggregates for bands. Such queries are actually quite easy to write using the case construct. By way of example, look at the problem of reporting on the distribution of tables by their total row counts. For instance, how many tables contain fewer than 100 rows, how many contain 100 to 10,000 rows, how many 10,000 to 1,000,000 rows, and how many tables store more than 1,000,000 rows?

Information about tables is usually accessible through data dictionary views: for instance, INFORMATION_SCHEMA.TABLES, pg_statistic, and pg_tables, dba_tables, syscat.tables, sysobjects and systabstats, and so on. In my explanation here, I'll assume the general case of a view named table_info, containing, among other things, the columns table_name and row_count. Using this table, a simple use of case and the suitable group by can give us the distribution by row_count that we are after:

 select case          when row_count < 100               then 'Under 100 rows'           when row_count >= 100 and row_count < 10000                then '100 to 10000'           when row_count >= 10000 and row_count < 1000000                then '10000 to 1000000'           else                'Over 1000000 rows'         end as range,         count(*) as table_count from table_info where row_count is not null group by case            when row_count < 100                 then 'Under 100 rows'            when row_count >= 100 and row_count < 10000                 then '100 to 10000'            when row_count >= 10000 and row_count < 1000000                 then '10000 to 1000000'            else                 'Over 1000000 rows'          end

There is only one snag here: group by performs a sort before aggregating data. Since we are associating a label with each of our aggregates, the result is, by default, alphabetically sorted on that label:

 RANGE             TABLE_COUNT ----------------- ------------ 100 to 10000                18 10000 to 1000000            15 Over 1000000 rows            6 Under 100 rows              24

The ordering that would be logical to a human eye in such a case is to see Under 100 rows appear first, and then each band by increasing number of rows, with Over 1,000,000 rows coming last. Rather than trying to be creative with labels, the stratagem to solve this problem consists of two steps:

Performing the group by on two, instead of one, columns, associating with each label a dummy column, the only purpose of which is to serve as a sort key
Wrapping up the query as a query within the from clause, so as to mask the sort key thus created and ensure that only the data of interest is returned

Here is the query that results from applying the preceding two steps:

 select row_range, table_count from ( -- Build a sort key to have bands suitably ordered        -- and hide it inside a subquery        select case                when row_count < 100                  then 1                when row_count >= 100 and row_count < 10000                  then 2                when row_count >= 10000 and row_count < 1000000                  then 3                else                  4              end as sortkey,              case                when row_count < 100                  then 'Under 100 rows'                when row_count >= 100 and row_count < 10000                  then '100 to 10000'                when row_count >= 10000 and row_count < 1000000                  then '10000 to 1000000'                else                  'Over 1000000 rows'              end  as row_range,              count(*) as table_count       from table_info       where row_count is not null       group by case                  when row_count < 100                    then 'Under 100 rows'                  when row_count >= 100 and row_count < 10000                    then '100 to 10000'                  when row_count >= 10000 and row_count < 1000000                    then '10000 to 1000000'                  else                    'Over 1000000 rows'                end,                case                  when row_count < 100                    then 1                  when row_count >= 100 and row_count < 10000                    then 2                  when row_count >= 10000 and row_count < 1000000                    then 3                  else                    4                end) dummy order by sortkey;

And following are the results from executing that query:

 ROW_RANGE         TABLE_COUNT ----------------- ----------- Under 100 rows             24 100 to 10000               18 10000 to 1000000           15 Over 1000000 rows           6

Aggregating by range (bands) requires building an artificial sort key to display results in desired order.