Ranking Functions

SQL Server 2005 introduces a new category of built-in functions: analytical ranking functions. This category consists of four functionsROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()that you can use to generate a sequence number or a sequential number for each row or to rank rows based on some partition criteria. The main idea behind providing these functions is to reduce the complexity and amount of query code that you have to write in order to efficiently calculate row numbers or rank the rows. Let's look at an example of this and then examine each function in detail.

The goal of this example is to output a column containing a sequential number based on a unique combination of ProductID and LocationID columns in the Production.ProductInventory table in the AdventureWorks sample database. Here is how you would do it without using any ranking function:

USE AdventureWorks; GO SELECT   ( SELECT COUNT(*)     FROM Production.ProductInventory AS pi2     WHERE          pi2.ProductID < pi1.ProductID OR          (           pi2.ProductID = pi1.ProductID AND           pi2.LocationID <= pi1.LocationID          )   ) AS SequenceNo, * FROM Production.ProductInventory AS pi1 ORDER BY ProductID, LocationID; GO

This query produces the following output (some columns and rows have been omitted for brevity):

SequenceNo  ProductID   LocationID Shelf      Bin  Quantity ----------- ----------- ---------- ---------- ---- -------- 1           1           1          A          1    408 2           1           6          B          5    324 3           1           50         A          5    353 4           2           1          A          2    427 5           2           6          B          1    318 6           2           50         A          6    364

Here is how you can obtain the same results by using the new ROW_NUMBER() ranking function:

SELECT ROW_NUMBER() OVER (ORDER BY ProductID, LocationID) as SequenceNo, * FROM Production.ProductInventory; GO

Let's assume that you want to reset the sequence number and start over from number 1 when the ProductID changes. Here is how you would do it without using any ranking function:

SELECT   ( SELECT COUNT(*)     FROM Production.ProductInventory AS pi2     WHERE pi2.ProductID = pi1.ProductID AND          (pi2.ProductID < pi1.ProductID OR            (pi2.ProductID = pi1.ProductID AND             pi2.LocationID <= pi1.LocationID)          )   ) AS SequenceNo, * FROM Production.ProductInventory AS pi1 ORDER BY ProductID, LocationID; GO

This query produces the following output (some columns and rows have been omitted for brevity):

SequenceNo  ProductID   LocationID Shelf      Bin  Quantity ----------- ----------- ---------- ---------- ---- -------- 1           1           1          A          1    408 2           1           6          B          5    324 3           1           50         A          5    353 1           2           1          A          2    427 2           2           6          B          1    318 3           2           50         A          6    364 1           3           1          A          7    585 2           3           6          B          9    443

You can obtain the same results by using the new ROW_NUMBER() ranking function as shown here:

SELECT ROW_NUMBER() OVER  (PARTITION BY ProductID ORDER BY ProductID, LocationID) as SequenceNo, * FROM Production.ProductInventory; GO

As you can see in these two examples, you write less code when you use the ranking functions, the queries are simple to understand and maintain. And if you study the execution plan for these queries, you notice that ranking functions perform better than the traditional approach.

The `ROW_NUMBER` Ranking Function

As illustrated in the preceding section, the ROW_NUMBER ranking function can be used to sequentially number starting at 1 for the first row. You can partition the result set and restart the sequential numbering by using the PARTITION BY clause, also illustrated in the preceding section.

One common application where ROW_NUMBER can be very valuable is when you are building a web application and need to implement the paging in the result set:

SELECT * FROM    (SELECT ROW_NUMBER() OVER (ORDER BY ProductID, LocationID) as SequenceNo, *     FROM Production.ProductInventory) AS tblRows WHERE SequenceNo BETWEEN 101 and 120; GO

Note that you can directly use the alias for the ROW_NUMBER function; you simply have to nest the query as shown here to use the ROW_NUMBER column alias (SequenceNo in this case) in the WHERE condition.

The `RANK` and `DENSE_RANK` Ranking Functions

The RANK and DENSE_RANK ranking functions are similar to the ROW_NUMBER function as they also produce ranking values according to sorting criteria, optionally partitioned into groups of rows. However, unlike the ROW_NUMBER function, which generates a different value for each row, the RANK and DENSE_RANK functions output the same value for all rows that have the same values in the sort column. The RANK and DENSE_RANK functions differ in the way the rank number value is generated when the sort order column value changes. The following example illustrates this:

SELECT RANK() OVER (ORDER BY ProductID) as Rank,        DENSE_RANK() OVER (ORDER BY ProductID) as DenseRank,        * FROM Production.ProductInventory;

This query produces the following output (some columns and rows have been omitted for brevity):

Rank   DenseRank  ProductID   LocationID Shelf  Bin  Quantity ------ ---------- ----------- ---------- ------ ---- -------- 1      1          1           1          A      1    408 1      1          1           6          B      5    324 1      1          1           50         A      5    353 4      2          2           1          A      2    427 4      2          2           6          B      1    318 4      2          2           50         A      6    364 7      3          3           1          A      7    585 7      3          3           6          B      9    443 7      3          3           50         A      10   324 10     4          4           1          A      6    512

When ProductID changes, the DENSE_RANK function increments the rank value by 1. However, the RANK function increments the rank value by the number of rows for the preceding rank.

The `NTILE` Ranking Function

The fourth and final ranking function introduced in SQL Server 2005, NTILE can be used to assemble rows into a desired number of buckets according to specified sorting criteria, optionally within partitions. This function takes an integer value that indicates how many groups you want:

SELECT NTILE(3) OVER (ORDER BY ShipMethodID) AS Bucket, * FROM Purchasing.ShipMethod;

The Purchasing.ShipMethod table contains five rows. NTILE(3) divides those five rows into three buckets, returning values 1 through 3 in the first column, named Bucket, as the output of preceding query (some columns have been omitted for brevity):

Bucket   ShipMethodID Name -------  ------------ -------------------- 1        1            XRQ - TRUCK GROUND 1        2            ZY - EXPRESS 2        3            OVERSEAS - DELUXE 2        4            OVERNIGHT J-FAST 3        5            CARGO TRANSPORT 5

Note how the NTILE function handles the situation when even distribution of rows is not possible. It puts two rows into the first group, two rows into the second group, and the remaining one row into the third group. When even distribution is not possible, larger groups precede smaller groups. Implementing this functionality in SQL Server 2000 is possible but a little complex, and it requires more code.

The ROW_NUMBER Ranking Function

The RANK and DENSE_RANK Ranking Functions

The NTILE Ranking Function

The `ROW_NUMBER` Ranking Function

The `RANK` and `DENSE_RANK` Ranking Functions

The `NTILE` Ranking Function