Recipe 12.7. Creating Buckets of Data, of a Fixed SizeProblemYou wish to organized data into evenly sized buckets, with a predetermined number of elements in each bucket. The total number of buckets may be unknown, but you want to ensure that each bucket has five elements. For example, you want to organize the employees in table EMP into groups of five based on the value of EMPNO, as shown in the following results: GRP EMPNO ENAME --- ---------- ------- 1 7369 SMITH 1 7499 ALLEN 1 7521 WARD 1 7566 JONES 1 7654 MARTIN 2 7698 BLAKE 2 7782 CLARK 2 7788 SCOTT 2 7839 KING 2 7844 TURNER 3 7876 ADAMS 3 7900 JAMES 3 7902 FORD 3 7934 MILLER SolutionThe solution to this problem is greatly simplified if your RDBMS provides functions for ranking rows. Once rows are ranked, creating buckets of five is simply a matter of dividing and then taking the mathematical ceiling of the quotient. DB2, Oracle, and SQL ServerUse the window function ROW_NUMBER OVER to rank each employee by EMPNO. Then divide by 5 to create the groups (SQL Server users will use CEILING, not CEIL): 1 select ceil(row_number()over(order by empno)/5.0) grp, 2 empno, 3 ename 4 from emp PostgreSQL and MySQLUse a scalar subquery to rank each EMPNO. Then divide by 5 to create the groups: 1 select ceil(rnk/5.0) as grp, 2 empno, ename 3 from ( 4 select e.empno, e.ename, 5 (select count(*) from emp d 6 where e.empno > d.empno)+1 as rnk 7 from emp e 8 ) x 9 order by grp DiscussionDB2, Oracle, and SQL ServerThe window function ROW_NUMBER OVER assigns a rank or "row number" to each row sorted by EMPNO: select row_number( )over(order by empno) rn, empno, ename from emp RN EMPNO ENAME -- ---------- ---------- 1 7369 SMITH 2 7499 ALLEN 3 7521 WARD 4 7566 JONES 5 7654 MARTIN 6 7698 BLAKE 7 7782 CLARK 8 7788 SCOTT 9 7839 KING 10 7844 TURNER 11 7876 ADAMS 12 7900 JAMES 13 7902 FORD 14 7934 MILLER The next step is to apply the function CEIL (or CEILING) after dividing ROW_ NUMBER OVER by five. Dividing by five logically organizes the rows into groups of five, i.e., five values less than or equal to 1, five values greater than 1 but less than or equal to 2, the remaining group (composed of the last four rows since 14, the number of rows in table EMP, is not a multiple of 5) has a value greater than 2 but less than or equal to 3. The CEIL function will return the smallest whole number greater than the value passed to it; this will create whole number groups. The results of the division and application of the CEIL are shown below. You can follow the order of operation from left to right, from RN to DIVISION to GRP: select row_number( )over(order by empno) rn, row_number( )over(order by empno)/5.0 division, ceil(row_number( )over(order by empno)/5.0) grp, empno, ename from emp RN DIVISION GRP EMPNO ENAME -- ---------- --- ----- ---------- 1 .2 1 7369 SMITH 2 .4 1 7499 ALLEN 3 .6 1 7521 WARD 4 .8 1 7566 JONES 5 1 1 7654 MARTIN 6 1.2 2 7698 BLAKE 7 1.4 2 7782 CLARK 8 1.6 2 7788 SCOTT 9 1.8 2 7839 KING 10 2 2 7844 TURNER 11 2.2 3 7876 ADAMS 12 2.4 3 7900 JAMES 13 2.6 3 7902 FORD 14 2.8 3 7934 MILLER PostgreSQL and MySQLThe first step is to use a scalar subquery to rank each row by EMPNO: select (select count(*) from emp d where e.empno < d.empno)+1 as rnk, e.empno, e.ename from emp e order by 1 RNK EMPNO ENAME --- ---------- ---------- 1 7934 MILLER 2 7902 FORD 3 7900 JAMES 4 7876 ADAMS 5 7844 TURNER 6 7839 KING 7 7788 SCOTT 8 7782 CLARK 9 7698 BLAKE 10 7654 MARTIN 11 7566 JONES 12 7521 WARD 13 7499 ALLEN 14 7369 SMITH The next step is to apply the function CEIL after dividing RNK by 5. Dividing by 5 logically organizes the rows into groups of five, i.e., five values less than or equal to 1, five values greater than one but less than or equal to 2, the remaining group (composed of the last four rows since 14, the number of rows in table EMP, is not a multiple of 5) has a value greater than 2 but less than or equal to 3. The results of the division and application of the CEIL are shown below. You can follow the order of operation from left to right as you work your way from RNK over to GRP: select rnk, rnk/5.0 as division, ceil(rnk/5.0) as grp, empno, ename from ( Select e.empno, e.ename, (select count(*) from emp d where e.empno < d.empno)+1 as rnk from emp e ) x order by 1 RNK DIVISION GRP EMPNO ENAME --- ---------- --- ----- ------- 1 .2 1 7934 MILLER 2 .4 1 7902 FORD 3 .6 1 7900 JAMES 4 .8 1 7876 ADAMS 5 1 1 7844 TURNER 6 1.2 2 7839 KING 7 1.4 2 7788 SCOTT 8 1.6 2 7782 CLARK 9 1.8 2 7698 BLAKE 10 2 2 7654 MARTIN 11 2.2 3 7566 JONES 12 2.4 3 7521 WARD 13 2.6 3 7499 ALLEN 14 2.8 3 7369 SMITH |