Recipe12.7.Creating Buckets of Data, of a Fixed Size


Recipe 12.7. Creating Buckets of Data, of a Fixed Size

Problem

You wish to organized data into evenly sized buckets, with a predetermined number of elements in each bucket. The total number of buckets may be unknown, but you want to ensure that each bucket has five elements. For example, you want to organize the employees in table EMP into groups of five based on the value of EMPNO, as shown in the following results:

 GRP      EMPNO ENAME --- ---------- -------   1       7369 SMITH   1       7499 ALLEN   1       7521 WARD   1       7566 JONES   1       7654 MARTIN   2       7698 BLAKE   2       7782 CLARK   2       7788 SCOTT   2       7839 KING   2       7844 TURNER   3       7876 ADAMS   3       7900 JAMES   3       7902 FORD   3       7934 MILLER 

Solution

The solution to this problem is greatly simplified if your RDBMS provides functions for ranking rows. Once rows are ranked, creating buckets of five is simply a matter of dividing and then taking the mathematical ceiling of the quotient.

DB2, Oracle, and SQL Server

Use the window function ROW_NUMBER OVER to rank each employee by EMPNO. Then divide by 5 to create the groups (SQL Server users will use CEILING, not CEIL):

 1 select ceil(row_number()over(order by empno)/5.0) grp, 2        empno, 3        ename 4   from emp 

PostgreSQL and MySQL

Use a scalar subquery to rank each EMPNO. Then divide by 5 to create the groups:

 1 select ceil(rnk/5.0) as grp, 2        empno, ename 3   from ( 4 select e.empno, e.ename, 5        (select count(*) from emp d 6          where e.empno > d.empno)+1 as rnk 7   from emp e 8        ) x 9  order by grp 

Discussion

DB2, Oracle, and SQL Server

The window function ROW_NUMBER OVER assigns a rank or "row number" to each row sorted by EMPNO:

  select row_number( )over(order by empno) rn,        empno,        ename   from emp RN      EMPNO ENAME -- ---------- ----------  1       7369 SMITH  2       7499 ALLEN  3       7521 WARD  4       7566 JONES  5       7654 MARTIN  6       7698 BLAKE  7       7782 CLARK  8       7788 SCOTT  9       7839 KING 10       7844 TURNER 11       7876 ADAMS 12       7900 JAMES 13       7902 FORD 14       7934 MILLER 

The next step is to apply the function CEIL (or CEILING) after dividing ROW_ NUMBER OVER by five. Dividing by five logically organizes the rows into groups of five, i.e., five values less than or equal to 1, five values greater than 1 but less than or equal to 2, the remaining group (composed of the last four rows since 14, the number of rows in table EMP, is not a multiple of 5) has a value greater than 2 but less than or equal to 3.

The CEIL function will return the smallest whole number greater than the value passed to it; this will create whole number groups. The results of the division and application of the CEIL are shown below. You can follow the order of operation from left to right, from RN to DIVISION to GRP:

  select row_number( )over(order by empno) rn,        row_number( )over(order by empno)/5.0 division,        ceil(row_number( )over(order by empno)/5.0) grp,        empno,        ename   from emp RN   DIVISION GRP EMPNO ENAME -- ---------- --- ----- ----------  1         .2   1  7369 SMITH  2         .4   1  7499 ALLEN  3         .6   1  7521 WARD  4         .8   1  7566 JONES  5          1   1  7654 MARTIN  6        1.2   2  7698 BLAKE  7        1.4   2  7782 CLARK  8        1.6   2  7788 SCOTT  9        1.8   2  7839 KING 10          2   2  7844 TURNER 11        2.2   3  7876 ADAMS 12        2.4   3  7900 JAMES 13        2.6   3  7902 FORD 14        2.8   3  7934 MILLER 

PostgreSQL and MySQL

The first step is to use a scalar subquery to rank each row by EMPNO:

  select (select count(*) from emp d          where e.empno < d.empno)+1 as rnk,        e.empno, e.ename   from emp e  order by 1 RNK      EMPNO ENAME --- ---------- ----------  1        7934 MILLER  2        7902 FORD  3        7900 JAMES  4        7876 ADAMS  5        7844 TURNER  6        7839 KING  7        7788 SCOTT  8        7782 CLARK  9        7698 BLAKE 10        7654 MARTIN 11        7566 JONES 12        7521 WARD 13        7499 ALLEN 14        7369 SMITH 

The next step is to apply the function CEIL after dividing RNK by 5. Dividing by 5 logically organizes the rows into groups of five, i.e., five values less than or equal to 1, five values greater than one but less than or equal to 2, the remaining group (composed of the last four rows since 14, the number of rows in table EMP, is not a multiple of 5) has a value greater than 2 but less than or equal to 3. The results of the division and application of the CEIL are shown below. You can follow the order of operation from left to right as you work your way from RNK over to GRP:

  select rnk,        rnk/5.0 as division,        ceil(rnk/5.0) as grp,        empno, ename   from ( Select e.empno, e.ename,        (select count(*) from emp d          where e.empno < d.empno)+1 as rnk   from emp e        ) x  order by 1 RNK   DIVISION GRP EMPNO ENAME --- ---------- --- ----- -------   1         .2   1  7934 MILLER   2         .4   1  7902 FORD   3         .6   1  7900 JAMES   4         .8   1  7876 ADAMS   5          1   1  7844 TURNER   6        1.2   2  7839 KING   7        1.4   2  7788 SCOTT   8        1.6   2  7782 CLARK   9        1.8   2  7698 BLAKE  10          2   2  7654 MARTIN  11        2.2   3  7566 JONES  12        2.4   3  7521 WARD  13        2.6   3  7499 ALLEN  14        2.8   3  7369 SMITH 




SQL Cookbook
SQL Cookbook (Cookbooks (OReilly))
ISBN: 0596009763
EAN: 2147483647
Year: 2005
Pages: 235

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net