The TABLESAMPLE Clause


The TABLESAMPLE Clause

The previous section illustrates one way of limiting a result set by using the TOP operator. SQL Server 2005 introduces a new clause, TABLESAMPLE, that you can use in a SELECT statement to restrict a result set to a randomly selected row. This clause is useful when you don't necessarily need to process all the rows and when you don't need the exact resultsyou just need to work on a sample of rows from a table that contains millions of rows. For instance, if a sales table contains a few million rows and you need to find out the approximate average order quantity, using TABLESAMPLE will yield better performance than processing all the rows in the table.

You can specify a number that indicates how many rows or what percentage of rows should be sampled. If you specify a number instead of a percentage, SQL Server calculates a percentage value based on the total number of rows and uses that percentage value for sampling. Let's assume that this percentage number is 30. SQL Server will then return all the rows from 30% of the specified table's data pages. If a table consists of a single page, either all rows on the page are returned or none of the rows are returned.

Note

TABLESAMPLE cannot be used with views or table variables. Also, before you can use TABLESAMPLE, the database's compatibility level must be set to 90.


Let's look at an example of using the TABLESAMPLE clause to find out the approximate average line total from the Sales.SalesOrderDetail table in the AdventureWorks sample database:

USE AdventureWorks; GO SELECT AVG(LineTotal) FROM Sales.SalesOrderDetail; GO SELECT AVG(LineTotal) FROM Sales.SalesOrderDetail      TABLESAMPLE (30 PERCENT); GO

The first SELECT statement finds the exact average value by processing all the rows in the table, and on my machine it returns 905.449206. If you execute this SELECT statement multiple times, you will notice that every time it returns a different value, which shows that SQL Server 2005 selects rows from a 30% set of random pages to calculate the average. You should get values such as 919.500875, 913.139024, 907.427679, 922.164805, and so on. If you want to get the same results every time, you can use the REPEATABLE clause and specify a seed value. As long as the same seed value is passed and the data is unchanged, you should get the same result:

SELECT AVG(LineTotal) FROM Sales.SalesOrderDetail      TABLESAMPLE (30 PERCENT) REPEATABLE(2); GO

The ISO SQL-2003 standard includes two sampling methods: BERNOULLI and SYSTEM. SQL Server 2005 supports only the SYSTEM sampling method. The other sampling method might be supported in the future. You can be specific and provide the SYSTEM method as an algorithm name in the TABLESAMPLE statement, as shown here:

SELECT AVG(LineTotal) FROM Sales.SalesOrderDetail      TABLESAMPLE SYSTEM (50000 ROWS); GO

This query also illustrates how you specify the number of rows instead of a percentage value.




Microsoft SQL Server 2005(c) Changing the Paradigm
Microsoft SQL Server 2005: Changing the Paradigm (SQL Server 2005 Public Beta Edition)
ISBN: 0672327783
EAN: 2147483647
Year: 2005
Pages: 150

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net