T-SQL Enhancements

Now that SQL Server 2005 allows the writing of stored procedures, functions, triggers, and other database objects using .NET code, is T-SQL going away? Absolutely not! T-SQL is still the premier language for writing SQL Server data access batches, stored procedures, and so on. Even with SQLCLR or any other data-access API, the query language used is still TSQL. SQL Server 2005 introduces some very cool enhancements to T-SQL that allow you to write better queries and modules. The following sections introduce some of these enhancements, and Chapter 6, "Transact-SQL Enhancements," discusses each of these new T-SQL features in detail.

Recursive and Non-recursive Common Table Expressions (CTEs)

You can think of CTEs as a simple yet more powerful alternative to derived tables. In some cases, CTEs may be used in places where you are currently using temporary tables, table variables, or views. The three important motivations for introducing CTEs in SQL Server 2005 are recursion; to provide alternative, simplified, readable, and manageable syntax for complex SQL statements, possibly making use of derived tables; and ANSI SQL-99 compliance (CTEs are defined in SQL-99).

A CTE can be defined as a temporary named result set, which is derived from a simple query and defined within the execution scope of a SELECT, INSERT, UPDATE, or DELETE statement. It is important to note that the scope of a CTE is just the statement in which it is declared. The CTE named result set is not available after the statement in which it is declared and used.

Here is a simple example that uses a CTE to count number of direct reports for each manager in the AdventureWorks sample database:

USE AdventureWorks ; GO WITH DirReps(ManagerID, DirectReports) AS (     SELECT ManagerID, COUNT(*)     FROM HumanResources.Employee AS e     WHERE ManagerID IS NOT NULL     GROUP BY ManagerID ) SELECT * FROM DirReps ORDER BY ManagerID GO

In the scope of a SELECT statement, this query declares a result set named DirReps, and then the outer SELECT statement retrieves everything from that result set. Nothing fancy herejust a simple example to give you a taste of CTE syntax.

The following CTE example that is a bit more involved. This query is used to trend the sales data; for every year, it gets the total sales amount and compares it with the previous year's sales:

WITH YearlyOrderAmtCTE(OrderYear, TotalAmount) AS (      SELECT YEAR(OrderDate), SUM(OrderQty*UnitPrice)      FROM Sales.SalesOrderHeader AS H JOIN Sales.SalesOrderDetail AS D           ON H.SalesOrderID = D.SalesOrderID      GROUP BY YEAR(OrderDate) ), SalesTrendCTE(OrderYear, Amount, AmtYearBefore, AmtDifference, DiffPerc) AS (  SELECT thisYear.OrderYear, thisYear.TotalAmount,         lastYear.TotalAmount,         thisYear.TotalAmount - lastYear.TotalAmount,         (thisYear.TotalAmount/lastYear.TotalAmount - 1) * 100  FROM YearlyOrderAmtCTE AS thisYear       LEFT OUTER JOIN YearlyOrderAmtCTE AS lastYear       ON thisYear.OrderYear = lastYear.OrderYear + 1 ) SELECT * FROM SalesTrendCTE GO

This query essentially defines two CTEs. The first CTE, called YearlyOrderAmtCTE, groups the total sales by year. This CTE is then used in the second CTE, called SalesTrendCTE, and the outer or main query selects all the rows from SalesTrendCTE. Notice how SalesTrendCTE uses YearlyOrderAmtCTE to get the current and previous years' total sales figures.

In the previous example, the second CTE refers to the previous CTE. If a CTE refers to itself, it is then called a recursive CTE, and this is where CTEs get interesting. Without any further ado, here is an example of a recursive CTE:

WITH MgrHierarchyCTE(EmployeeID, EmployeeName, ManagerID, Level) AS (    SELECT e.EmployeeID, c.FirstName + ' ' + c.LastName, e.ManagerID, 0    FROM HumanResources.Employee AS e         JOIN Person.Contact AS c         ON c.ContactID = e.ContactID    WHERE e.EmployeeID = 111    UNION ALL    SELECT mgr.EmployeeID, co.FirstName + ' ' + co.LastName, mgr.ManagerID,           Level + 1    FROM HumanResources.Employee AS mgr         JOIN Person.Contact AS co         ON co.ContactID = mgr.ContactID         JOIN MgrHierarchyCTE AS cte         ON cte.ManagerID = mgr.EmployeeID ) SELECT * FROM MgrHierarchyCTE; GO

This recursive CTE example illustrates traversing up the management hierarchy all the way up to the topmost manager for the employee with the ID 111. The query returns a result set that contains a row for each manager in the hierarchy, starting with the immediate manager and going up to the top-level manager.

The first SELECT statement within the CTE definition finds the immediate manager details. This row is then combined, using UNION ALL, with another SELECT statement that self-references the CTE to traverse the management hierarchy.

The most common problem with recursion is infinite loops. SQL Server solves this problem by defining a serverwide recursion level limit setting called MAXRECURSION, which defaults to 100. You can also specify a MAXRECURSION hint in your CTE query to limit the recursion level for that query.

CTEs are discussed in more detail in Chapter 6.

Improved Error Handling

In previous releases, the error handling inside T-SQL scripts was done by using @@ERROR at multiple places and by using GOTO and RETURN statements. SQL Server 2005 adopts the modern structured error handling by introducing the trY...CATCH construct. If you are familiar with Visual Basic .NET or C# .NET, you can think of this as being equivalent to TRy...catch blocks; however, there is no equivalent to finally blocks.

Here is an example of the modern error handling paradigm in SQL Server 2005:

USE AdventureWorks GO BEGIN TRY     -- Generate a constraint violation error.     DELETE FROM Production.Product WHERE ProductID = 980; END TRY BEGIN CATCH     SELECT ERROR_NUMBER() AS ErrorNumber,            ERROR_SEVERITY() AS ErrorSeverity,            ERROR_STATE() as ErrorState,            ERROR_MESSAGE() as ErrorMessage; END CATCH GO

To better handle errors, all you have to do is put T-SQL statements inside BEGIN TRY and END TRY, and if an error occurs, you can obtain detailed error information by using various ERROR_XXXX functions. In addition to the previously mentioned functions, you can also use the ERROR_LINE() and ERROR_PROCEDURE() functions inside a module to find the exact line where the error occurred and the name of the module.

Limiting a Result Set by Using `TOP` and `TABLESAMPLE`

The TOP keyword is not new to SQL Server 2005. It has been available since SQL Server 7 and can be used to limit a result set to a specified exact number or percentage of rows. However, it is enhanced in this edition so that TOP can be used for DML statements (INSERT, UPDATE, and DELETE). In addition, you no longer have to hard-code the number of rows or percentage value; the TOP keyword now also accepts an expression.

In addition to TOP, SQL Server 2005 introduces the new keyword TABLESAMPLE, which you use to limit the number of rows returned. Unlike TOP, TABLESAMPLE returns a random set of rows from throughout the set of rows processed by the query, and TABLESAMPLE cannot be used with a view. As with TOP, you can specify an exact number of rows or a percentage number.

This sampling technique can be used to get better performance for queries on large databases where absolutely exact results are not desired. For example, if you want to find an approximate estimate for the average employee salary, you can use TABLESAMPLE to do aggregation on a sample of data rather than on an entire large dataset.

The SQL:2003 proposal includes two sampling methods: BERNOULLI and SYSTEM. SQL Server 2005 supports only the SYSTEM sampling method.

Here is an example script that passes a variable to the TOP keyword and illustrates the new TABLESAMPLE keyword:

USE AdventureWorks GO DECLARE @var INT; SET @var = 3; SELECT TOP (@var) * FROM Sales.Store; GO SELECT AVG(SickLeaveHours) FROM HumanResources.Employee TABLESAMPLE SYSTEM (20 PERCENT) REPEATABLE(3); GO

Like all other T-SQL topics, TOP and TABLESAMPLE are also further discussed in Chapter 6.

Ranking Functions

SQL Server 2005 introduces four very useful functionsROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()that can be used to rank the rows in a partition. For example, you can generate a sequence number, generate a different number for each row, or rank rows based on some partition criteria:

ROW_NUMBER() You can use this function to generate sequential row numbers. This can be very handy when you're implementing scenarios such as paging in web pages.
RANK() You can use this function to rank the rows within the partition of a result set. The rank of a row is 1 plus the number of ranks that precede the row in question.
DENSE_RANK() This function is similar to RANK() except that it does not leave any gaps in ranking. The rank of a row is 1 plus the number of distinct ranks that precede the row in question.
NTILE() You can use this function to distribute the rows in an ordered partition into a specified number of groups.

Here is an example of a T-SQL statement that makes use of all four of the ranking functions:

USE AdventureWorks; GO SELECT ROW_NUMBER() OVER(ORDER BY ListPrice DESC) as RowNum,        ProductID, Name, ProductNumber, ListPrice,        RANK() OVER (ORDER BY ListPrice DESC) as Rank,        DENSE_RANK() OVER (ORDER BY ListPrice DESC) as DenseRank,        NTILE(10) OVER (ORDER BY ListPrice DESC) as NTile_10,        NTILE(20) OVER (ORDER BY ListPrice DESC) as NTile_20 FROM Production.Product;

If you try out this query in the Query Editor in Management Studio, you should notice that the first column, RowNum, is a sequential number starting with 1; the Rank column starts with 1 and when the price changes, the value for Rank column in the next row is 1 plus the number of rows having rank as 1, and so on. When the price column value changes, the DenseRank column value changes to 2, then 3, and so on. The NTile_10 and NTile_20 columns contain values starting from 1 through 10 and 1 through 20, dividing the result set into 10 and 20 groups, respectively.

`PIVOT` and `UNPIVOT`

PIVOT and UNPIVOT are two new keywords introduced in SQL Server 2005 that can be used with the SELECT statement. In simple terms, PIVOT can be used to turn rows into columns, and UNPIVOT can be used to turn columns into rows.

Let's begin by looking at an example of the PIVOT keyword:

SELECT [316] AS Blade, [331] AS [Fork End] FROM    (SELECT ProductID, Quantity FROM Production.ProductInventory) AS pinv    PIVOT    (       SUM (Quantity)       FOR ProductID IN ([316], [331])    ) AS pvt; GO

This query essentially generates cross-tabulation reports to summarize the quantity for two products. You should run the query by using the Query Editor in Management Studio and notice that it returns two columns, named Blade and Fork End, and a single row that contains the total quantity available for these two products.

The counterpart of PIVOT is UNPIVOT, which can be used to turn columns into rows. Here is an example of UNPIVOT:

CREATE TABLE dbo.tblGrades   (StudentID int NOT NULL PRIMARY KEY,    Term1 CHAR(1) NOT NULL,    Term2 CHAR(1) NOT NULL,    Term3 CHAR(1) NOT NULL,    Term4 CHAR(1) NOT NULL); GO INSERT INTO dbo.tblGrades SELECT 1, 'A', 'B', 'C', 'D'; INSERT INTO dbo.tblGrades SELECT 2, 'D', 'C', 'B', 'A'; GO SELECT * FROM dbo.tblGrades; GO SELECT StudentID, Term, Grade FROM    (SELECT StudentID, Term1, Term2, Term3, Term4 FROM dbo.tblGrades) p     UNPIVOT       (Grade FOR Term IN       (Term1, Term2, Term3, Term4) )AS unpvt GO IF OBJECT_ID('dbo.tblGrades') IS NOT NULL BEGIN    DROP TABLE dbo.tblGrades; END GO

This sample T-SQL code begins by creating a table that stores student grades for four terms. Each term is a column in the table. The UNPIVOT statement is then used to turn term grade columns into rows. Figure 4.3 shows the power of the UNPIVOT keyword.

Figure 4.3. The `UNPIVOT` keyword can be used to convert columns into rows.

SQL Server 2005 introduces several other T-SQL enhancements, such as the APPLY keyword to invoke a table valued function for each row in the rowset; the OUTPUT keyword to retrieve rows affected by DML statements; referential integrity enhancements to set a default value or null when a row is deleted from the parent table; and enhancements to FOR XML and OPENXML. Refer to Chapter 6 for complete details on these enhancements.

Recursive and Non-recursive Common Table Expressions (CTEs)

Improved Error Handling

Limiting a Result Set by Using TOP and TABLESAMPLE

Ranking Functions

PIVOT and UNPIVOT

Figure 4.3. The UNPIVOT keyword can be used to convert columns into rows.

Limiting a Result Set by Using `TOP` and `TABLESAMPLE`

`PIVOT` and `UNPIVOT`

Figure 4.3. The `UNPIVOT` keyword can be used to convert columns into rows.