3 4
You can use the SELECT statement to retrieve data from a SQL Server database and to present the data back to the user in one or more result sets. A result set is a tabular arrangement of the data that is retrieved by executing the SELECT statement. Like a table, the result set comprises columns and rows. This lesson provides an overview of the main components of a SELECT statement and how those components—when used in a complete SELECT statement—can be used to retrieve specific data from a SQL Server database and present that data in a result set.
A SELECT statement in Transact-SQL enables you to retrieve existing data from a SQL Server database. Most SELECT statements describe four primary properties of a result set:
For example, the following SELECT statement retrieves the product ID, name, and unit price of any products in the Products table whose unit price exceeds $40:
SELECT ProductID, ProductName, UnitPrice FROM Products WHERE UnitPrice > 40 ORDER BY UnitPrice ASC
The SELECT clause in the previous example defines which column values should be retrieved, and the FROM clause identifies the table that contains these columns. The WHERE clause limits the result set to those products whose UnitPrice value is greater than $40. The ORDER BY clause specifies that the result set is to be sorted in ascending sequence based on the value in the UnitPrice column.
The full syntax of the SELECT statement is complex, but the main clauses can be summarized as follows:
SELECT select_list
[INTO new_table_name]
FROM table_list
[WHERE search_conditions]
[GROUP BY group_by_list]
[HAVING search_conditions]
[ORDER BY order_list [ASC | DESC] ]
The remainder of this lesson discusses each clause in detail and provides examples of how you can define these clauses to retrieve specific data from a SQL Server database. For further details about each clause, refer to SQL Server Books Online.
The SELECT clause includes the SELECT keyword and the select list. The select list is a series of expressions separated by commas. Each expression defines a column in the result set. The columns in the result set are in the same order as the sequence of expressions in the select list.
The select list can also contain keywords that control the final format of the result set.
The DISTINCT keyword eliminates duplicate rows from a result set. For example, the Orders table in the Northwind database contains duplicate values in the ShipCity column. To get a list of the ShipCity values with duplicates removed, enter the following code:
SELECT DISTINCT ShipCity, ShipRegion FROM Orders ORDER BY ShipCity
The TOP n keyword specifies that the first n rows of the result set are to be returned. If ORDER BY is specified, the rows are selected after the result set is ordered. The n placeholder is the number of rows to return (unless the PERCENT keyword is specified). PERCENT specifies that n is the percentage of rows in the result set that are returned. For example, the following SELECT statement returns the first 10 cities in alphabetic sequence from the Orders table:
SELECT DISTINCT TOP 10 ShipCity, ShipRegion FROM Orders ORDER BY ShipCity
You can improve the readability of a SELECT statement by giving a table an alias (also known as a correlation name or range variable). A table alias can be assigned either with or without the AS keyword:
In the following example, the alias p is assigned to the Publishers table:
USE pubs SELECT p.pub_id, p.pub_name FROM publishers AS p
IMPORTANT
A select list can include many types of information, such as a simple expression or a scalar subquery. The following example shows many of the items that you can include in a select list:
SELECT FirstName + ' ' + LastName AS "Employee Name", IDENTITYCOL AS "Employee ID", HomePhone, Region FROM Northwind.dbo.Employees ORDER BY LastName, FirstName ASC
In this statement, the employees' first and last names are combined into one column. A space is added between the first and last names. The name of the column that will contain the employee names is Employee Name. The result set will also include the identity column, which will be named Employee ID in the result set; the HomePhone column; and the Region column. The result set is ordered first by last name and then by first name.
The INTO clause enables you to specify that the result set will be used to create a new table with the name defined in the clause. A SELECT...INTO statement can be used to combine data from several tables or views into one table. You can also use it to create a new table containing data selected from a linked server. The following example uses a SELECT statement to retrieve values from the FirstName and LastName columns of the Employees table:
SELECT FirstName, LastName INTO EmployeeNames FROM Employers
The result set that is generated by the statement creates the EmployeeNames table. The new table will contain the FirstName column and the LastName column, and those columns will contain the values from the Employees table. The result set is not displayed in the Results pane unless you specifically query the new table.
The FROM clause is required in every SELECT statement in which data is being retrieved from tables or views. You can use the FROM clause to list the tables and views containing the columns referenced in the select list and in the WHERE clause. You can give the table or view names aliases by using the AS clause. You can also use the FROM clause to join tables by specifying join conditions in the ON clause.
The FROM clause is a comma-separated list of table names, view names, and JOIN clauses. The following SELECT statement uses the FROM clause to specify the Shippers table:
SELECT * FROM Shippers
You can also use the FROM clause to specify joins between two tables or views. Lesson 2 will discuss joins in more detail.
The WHERE and HAVING clauses in a SELECT statement control the rows from the source tables that are used to build the result set. The WHERE and HAVING clauses are filters. They specify a series of search conditions, and only those rows that meet the terms of the search conditions are used to build the result set. Those rows that meet the search conditions are said to be qualified to participate in the result set. For example, the WHERE clause in the following SELECT statement returns only those rows where the region is Washington state:
SELECT CustomerID, CompanyName FROM Northwind.dbo.Customers WHERE Region = 'WA'
The HAVING clause is typically used in conjunction with the GROUP BY clause, although it can be specified without GROUP BY. The HAVING clause specifies more filters that are applied after the WHERE clause performs its filtering. The following SELECT statement includes a WHERE clause, a GROUP BY clause, and a HAVING clause:
SELECT OrdD1.OrderID AS OrderID, SUM(OrdD1.Quantity) AS "Units Sold", SUM(OrdD1.UnitPrice * OrdD1.Quantity) AS Revenue FROM [Order Details] AS OrdD1 WHERE OrdD1.OrderID in (SELECT DISTINCT OrdD2.OrderID FROM [Order Details] AS OrdD2 WHERE OrdD2.UnitPrice > $100) GROUP BY OrdD1.OrderID HAVING SUM(OrdD1.Quantity) > 100
In this SELECT statement, the WHERE clause returns only those orders that are selling a product with a unit price exceeding $100, and the HAVING clause further restricts the result to only those orders that include more than 100 units. The GROUP BY clause limits the rows for each distinct value in the OrderID column.
The GROUP BY clause is used to produce aggregate values for each row in the result set. When used without a GROUP BY clause, aggregate functions report only one aggregate value for a SELECT statement.
The GROUP BY keywords are followed by a list of columns, known as the grouping columns. The GROUP BY clause restricts the rows of the result set. There is only one row for each distinct value in the grouping column or columns. Each result set row contains summary data related to the specific value of its grouping columns.
SQL Server places restrictions on the items that can be specified in the select list when a SELECT statement contains a GROUP BY clause. The select list can contain the grouping columns and expressions that return only one value for each value in the grouping columns, such as aggregate functions (vector aggregates) that have a column name as one of their parameters.
Typically, the HAVING clause is used with the GROUP BY clause, although HAVING can be specified separately. You can group by an expression as long as it does not include aggregate functions.
In a GROUP BY clause, you must specify the name of a table or view column, not the name of a result set column assigned with an AS clause. You can list more than one column in the GROUP BY clause to nest groups; that is, you can group a table by any combination of columns.
Understanding the correct sequence in which the WHERE, GROUP BY, and HAVING clauses are applied helps in coding efficient queries:
For any search conditions that could be applied either before or after the grouping operation, it is more efficient to specify them in the WHERE clause. This action reduces the number of rows that have to be grouped. The only search conditions that should be specified in the HAVING clause are those search conditions that must be applied after the grouping operation has been performed.
The ORDER BY clause sorts a query result by one or more columns (up to 8060 bytes). A sort can be ascending (ASC) or descending (DESC). If neither is specified, ASC is assumed. If more than one column is named in the ORDER BY clause, sorts are nested.
The following statement sorts the rows in the Titles table, first by publisher (in descending order), then by type (in ascending order within each publisher), and finally by price (also ascending, because DESC is not specified):
USE Pubs SELECT Pub_id, Type, Title_id, Price FROM Titles ORDER BY Pub_id DESC, Type, Price
In this exercise, you will use SELECT statements to retrieve data from the Pubs database. Each SELECT statement after the first one will build on the previous one as you become more specific about the data that is included in the result set. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator.
USE Pubs SELECT * FROM Titles
In this statement, you are identifying the database containing the table you want to view and you are using a SELECT statement to retrieve all data from the Titles table in the Pubs database. The asterisk ( * ) in the select list indicates that data should be retrieved from all columns in the table.
The result set appears in the Grids tab of the Results pane.
USE Pubs SELECT Title_id, Title, Price, Ytd_sales FROM Titles
In this statement, you are using a SELECT statement to retrieve data from the Title_id, Title, Price, and Ytd_sales columns in the Pub database.
The result set appears in the Grids tab of the Results pane.
USE Pubs SELECT Title_id, Title, Price, Ytd_sales FROM Titles WHERE Price > 10
The SELECT statement will now retrieve only those rows whose value in the Price column is greater than $10.
The result set appears in the Grids tab of the Results pane.
USE Pubs SELECT Title_id, Title, Price, Ytd_sales FROM Titles WHERE Price > 10 ORDER BY Price DESC, Title
The result set returned by this SELECT statement will be ordered first by price, in descending order, and then by title, in ascending order.
The result set appears in the Grids tab of the Results pane.
USE Pubs SELECT Type, AVG(Price) AS AvgPrice FROM Titles WHERE Price > 10 GROUP BY Type ORDER BY AvgPrice DESC
The result set returned by this SELECT statement will group together those rows with the same Type value. Rows that do not meet the conditions in the WHERE clause will be eliminated before any grouping is done. The values from the Price column will be averaged for each grouping, and that average will be inserted in the result set under the AvgPrice column. The values in the AvgPrice column will be listed in descending order.
The result set appears in the Grids tab of the Results pane.
USE Pubs SELECT Type, AVG(Price) AS AvgPrice INTO TypeAvgPrice FROM Titles WHERE Price > 10 GROUP BY Type ORDER BY AvgPrice DESC
The SELECT statement will create a new table named TypeAvgPrice. The table will contain the Type column and the AvgPrice column, and those columns will contain the values from the result set.
A message appears in the Messages tab of the Results pane, providing the number of rows that have been affected.
SELECT * FROM TypeAvgPrice
The contents of the TypeAvgPrice table are displayed in the Grids tab of the Results pane.
DROP TABLE TypeAvgPrice
A message appears in the Messages tab of the Results pane, stating that the command has completed successfully.
The SELECT statement is used to retrieve data from a SQL Server database and to present the data back to the user in one or more result sets. The main clauses of a SELECT statement are the SELECT clause, the INTO clause, the FROM clause, the WHERE clause, the GROUP BY clause, the HAVING clause, and the ORDER BY clause. The SELECT clause includes the SELECT keyword and the select list. The select list defines the columns in the result set generated by a SELECT statement. The INTO clause enables you to specify that the result set will be used to create a new table with the name defined in the clause. The FROM clause is a comma-separated list of table names, view names, and JOIN clauses. You can use the FROM clause to list the tables and views containing the columns referenced in the select list and in the WHERE clause. You can also use the FROM clause to join types by specifying join conditions. The WHERE and HAVING clauses in a SELECT statement control the rows from the source tables that are used to build the result set. The GROUP BY clause is used to produce aggregate values for each row in the result set.