The simplest way to obtain data from a single table is by executing the SELECT statement. This statement is used to retrieve data stored in relational databases. The following example uses the SELECT statement to obtain the content of a table that stores the headers of purchase orders.
From the Start menu, select Programs Microsoft SQL Server 2005 SQL Server Management Studio.
Connect to your Microsoft SQL Server 2005 instance.
In Object Explorer, expand the Databases node, and then select the AdventureWorks database.
Click the New Query toolbar button to display a query window.
In query window, enter the following T-SQL sentence. This sentence can be accessed from \Ch07\Samples01.sql in the sample files.
SELECT * FROM Sales.SalesOrderHeader
Click the Execute toolbar button.
Part of the result is shown below (the results are shown scrolled to the right).
The statement above returns all of the fields of the SalesOrderHeader table because the asterisk (*) requests all columns. It is typically not necessary to request all columns in a table, and doing so when not needed may cause several problems.
An application may not work properly when new columns are added to a database. The application will continue to retrieve all of the columns, and the new, unexpected columns will not be handled correctly.
The Query Optimizer will not use some indexes when you select all of the columns in a table. Selecting only the columns that you are going to use allows the Optimizer use the appropriate indexes. However, if you select all of the columns, the indexes cannot be used.
For these reasons, it is best not to use the asterisk (*) in your queries. As an example, suppose that you need data from the OrderDate, SalesOrderNumber, CustomerId, Subtotal, TaxAmt, and TotalDue columns. The following query will obtain only the columns you need.
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader
The statement above returns all of the rows in the SalesOrderHeader table from the database, comprising 31,465 rows; it can take a few seconds to return the data. When a table contains a small number of rows, the time it takes to return the data is acceptable. But if a table contains millions of OrderHeaders, returning all of this data is not an option. Therefore, you must retrieve only the rows you are going to use. When selecting your data, answer the following two questions: What columns do you need? What rows do you need? By correctly answering these questions, you can create server-friendly queries and efficiently obtain the most useful information.
Returning to the example above, suppose that the answers to these two questions are that you need the OrderDate, SalesOrderNumber, CustomerId, Subtotal, TaxAmt, and TotalDue columns, but you only need the data from those shipped on July 8, 2001. The query you must write to obtain this particular information should look like the following:
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE ShipDate='20010708'
Below are the results of executing this query (note that the OrderDate is displayed, not the ShipDate).
You obtained only forty-three of the 31,465 rows contained in the SalesOrderHeader table, and these rows contain only the columns you specifically need. With this new knowledge about writing queries, you can optimize your queries to obtain benefits on the server layer.
The previous figure displays the standard way to represent dates by using four digits for the year (yyyy), two more for the month (mm), and two more for the day (dd). This format helps to avoid internationalization issues when using dates. If necessary, you can also add hours, minutes, and seconds when using this format (yyyymmdd hh:mm:ss.ddd).
You may occasionally need to combine more than one condition to obtain the data you need. To combine conditions, you can use the AND and OR operators.
The AND operator is used when you want multiple conditions to be true. Suppose that you want to access the orders shipped on July 8, 2001, but only those orders from the customer with Customer Id number 676. You can obtain this information by combining conditions on the WHERE clause using the AND operator. The statement to obtain this data will look like the following:
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE ShipDate='20010708' and CustomerID=676
The OR operator is used when you want at least one of the conditions to be true. Suppose that you need to obtain the orders shipped on July 8, 2001, as well as those shipped on July 9, 2001. You can again obtain this information by combining restrictions on the WHERE clause using the OR operator. The statement to obtain this data will look like the following:
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE ShipDate='20010708' or ShipDate='20010709'
In other situations, you might need to combine both AND and OR operators to obtain the desired data. To combine both operators, you must pay attention to operator precedence . Operator precedence can be demonstrated with the following example. Suppose that you need to obtain the orders shipped on July 8, 2001, in which the Total Due is greater than $10K or less than $2K. The statement to obtain this data will be:
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE ShipDate='20010708' and (TotalDue<=2000 or TotalDue>=10000)
The parentheses in the example above are important because they set the operator precedence. The result becomes quite different if parentheses are not used. When parentheses are included in the statement, you obtain all of the orders shipped with a Total Due amount of less than $2,000 or greater than $10,000 from only those orders shipped on July 8, 2001. When parentheses are not included, the AND operator is evaluated first, returning all orders shipped on July 8, 2001, with a Total Due amount of less than $2,000. The OR operator is then evaluated, and you obtain all orders shipped with a Total Due amount greater than $10,000 regardless of the date the order was shipped. You can see how important it is to use parentheses to specify the operator precedence when combining AND and OR operators.
Databases allow you to store NULL values in some fields. A NULL value means that a field contains no value or that the value is unavailable. The IS NULL clause is used to find NULL values.
Suppose that you need to access all of the Order Headers that have a NULL value for the CurrencyRate field. In this situation, you can use the IS NULL sentence as shown below.
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE CurrencyRateID is null
You can also use a more familiar syntax, such as the following:
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE CurrencyRateID=null
When using the above syntax, you do not receive any rows by default because the server uses ANSI (American National Standards Institute) rules. ANSI rules specify that, when comparing two NULL values, the result is neither true nor false; therefore, there are no rows to be returned. You can change the behavior of the server to have the same behavior as the IS NULL clause by using SET ANSI_NULLS OFF. The following example demonstrates how to change the behavior of the server.
SET ANSI_NULLS OFF SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE CurrencyRateID=NULL
This query will return the same rows that you receive when using the IS NULL clause. Yet, the best way to query NULL values is to use the IS NULL clause, as ANSI recommends.
You may need to obtain only the rows in a table that have values. By combining the NOT clause with IS NULL, you can retrieve the rows that have a value for the specified columns. In the following example, you retrieve the rows that have an actual value in the CurrencyRate column from the SalesOrderHeader table.
SELECT OrderDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE NOT CurrencyRateID IS NULL
Because data stored in a database may not be in the right representation for a particular situation, you can transform it to make it useful. In the following example, the CASE clause is used to specify a new field that is related to another field stored in a database. A new type called FreightType has been created, and this type can have three valuesSoft, Normal, and Harddepending on the value of Freight stored in the database. The FreightType will be Soft if freight is less than 25, Normal if between 25 and 700, and Hard if greater than 700.
SELECT ShipDate,SalesOrderNumber,CustomerID,Subtotal,TaxAmt,TotalDue, CASE WHEN Freight<25 THEN 'Soft' WHEN Freight between 25 and 700 THEN 'Normal' ELSE 'Hard' END FreightType FROM Sales.SalesOrderHeader WHERE ShipDate='20010708'
The following figure illustrates the result of executing this query.
The CASE clause is a powerful tool to use in queries. You can also create a CASE statement for calculated fields. Because you can index the calculated columns in a database, these indexes can then be used to improve performance on some queries using these calculated values.
Suppose that you need to obtain the Subtotal, TaxAmt, and Total for all orders in the SalesOrderHeader table from July 2001. This can be accomplished in two different ways. The first is as follows :
SELECT Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE YEAR(ShipDate)=2001 AND MONTH(ShipDate)=7
|More Info|| |
System functions, such as year and month , will be introduced later in this chapter.
The other method is found below.
SELECT Subtotal,TaxAmt,TotalDue FROM Sales.SalesOrderHeader WHERE ShipDate BETWEEN '20010701' AND '20010731'
There are no differences between the results of these two queries, for both obtain the orders shipped during July 2001. However, there are differences in the execution of these queries. The first query retrieves the date and month; after obtaining all rows, the calculation and comparison are completed. The second query makes only the comparison, not the calculation. If an index exists for these columns, it will be used in the second query but not in the first one. Therefore, the second query will be faster.
You can create an index using the following command:
CREATE INDEX ix_SalesOrderHeader_shipdate ON Sales.SalesOrderHeader (Shipdate)
As shown on the following page, the first sentence needs to scan the entire index to obtain the month and year and then compare them with the constants. The second sentence performs an index seek operation, whereby only the needed records of the index will be read, thus achieving better performance and consuming less server resources.
You can compare the execution of the two sentences by selecting both sentences on SQL Server Management Studio and displaying the estimated execution plan by pressing Crtl + L or by choosing Display Estimated Execution Plan from the Query menu.
Various strategies exist to optimize T-SQL queries by using indexes. Avoiding the use of functions, operations, and transformations on fields makes these strategies more efficient. After designing your queries, you can use the database Tuning Advisor to find indexes, find partitioning strategies, and find other recommendations to improve execution performance.