SELECT | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

The SELECT statement is the Swiss Army fishing pole of the SQL language. You will use it all the time in T-SQL and to build solid SQL Server solutions. The neat thing about this pole is that the entire fishing tackle tool box is wrapped up in its reel. With it you can strip, pare, screw, saw, hammer, combine, pick, scrape, dig, cut, and otherwise expose your data. For all intents and purposes, without the SELECT statement there is no such thing as a SQL query

You can be as creative as a honey badger trying to expose a beehive, but you can also get stung. As with any programming language, the longer and more often you scratch and claw at it, the better you will become. At first sight, a complex SELECT statement can seem impossible to decipher, but once broken down into its constituent parts, it becomes as exposed as the bones of a boiled carp.

The anatomy of the SELECT statement is bound by the order of the clauses. The optional clauses can be added or omitted to serve your purpose, but when used, they must be placed in the appropriate order. The abridged T-SQL SELECT, which is also used to infuse views, is as follows:

 SELECT (select_list)    INTO (new_table) FROM (table source)    WHERE (search_condition)    GROUP BY (group_by_expression)    HAVING (search_condition)    ORDER BY (order_expression) [ ASC / DESC ]

I prefer not to kick off by expanding every section in one behemoth of a SELECT syntax. I believe that if you are new at SQL queries, it makes more sense to look at the bare-bones SELECT and move up from there. After you have mastered basic selects, you can start using the optional constructions and add fat where you need to.

If you are not familiar with the T-SQL SELECT statement, the following treatise takes you through the statement line by line, clause by clause, argument f or argument. And if you are versed only in SQL-92, bear in mind that several new elements have been added, such as XML support.

SELECT 101

The SELECT keyword in the preceding code listing is marked in bold because it is the only part of the T-SQL SELECT statement that is required. All other clauses and arguments are optional. SELECT does not need a FROM clause when you are only specifying constants, variables, and arithmetic expressions in the select list. As soon as you specify column names, you need FROM, which is discussed later in this chapter. The simplest T-SQL select statement might be something as follows:

 SELECT 'Hello Universe'

As I am sure you have guessed, this statement does not need any table-based data to complete successfully. It simply returns “Hello Universe” as a single value in a one-row result set. Move from characters to integers and you can perform simple arithmetic after the SELECT, which also just puts the result into a result set, albeit just one value like this:

 SELECT 1

The words that come immediately after the SELECT statement in the query (in the preceding example, it’s the number “1”) are known as the select list. The select list contains the values or column names or domain names in a table you are querying. You can list as many values as you might need, but you must make sure the values are separated by the correct operators.

The rules of precedence also apply to the SELECT statement. For example, the SELECT statements

 SELECT 5-3/(8+9) SELECT 5-3/8+9

return different results thanks to rules of precedence (which are discussed in Chapter 10).

When you use divergent data types in the SELECT, obviously type rules apply. You certainly will obtain an error if you try something like dividing “cat” by “dog.”

You can have a SELECT statement that queries SQL Server as a system and is not “aimed” at any tables per se. For example, the line

 SELECT @@SERVERNAME

returns the name of the server you are connected to as a result set of one record (see Appendix for many functions and constructions that return result set data from SQL Server’s system tables). But the statement

 SELECT @@SERVERNAME + @@SERVICENAME

returns servernameservicename as a single value, or MCSQL00MSSQLSERVER, which is the server I am using for this little demo. In order to retrieve each value as a column, the items in the select list must be separated correctly by the comma. Thus the statement

 SELECT @@SERVERNAME, @@SERVICENAME

is returned as follows:

(No column name)	(No column name)
I MCSQL00	MSSQLSERVER

To add column names to the result set, the statement should look like this:

 SELECT 'SERVER NAME' = @@SERVERNAME, 'SERVICE NAME' = @@SERVICENAME

which will return the following result set:

SERVER NAME	SERVICE NAME
MCSQL00	MSSQLSERVER

And you can use square brackets instead of the single quotes to designate the column names, which is often preferable if you need the single quotes for various concatenations. In other words, the statement

 SELECT [SERVER NAME] = @@SERVERNAME, [SERVICE NAME] = @@SERVICENAME

returns the identical result set. The square brackets are also essential for identifying actual column names that are the combination of several words. For example, the column name total fences jumped by the counting sheep will cause SQL Server to blow a fuse unless it is enclosed in square brackets; thus,

 SELECT [Total fences jumped by the counting sheep] = 196

You would need a psych consult if you named columns like in the preceding example, which is extreme for illustration. Column names should be easy to reference in your queries.

You can also use T-SQL functions in the select list to massage the data. For example, you can trim spaces from the left and right of character data and even manipulate the character strings. You can concatenate and do snazzy stuff, like adding commas between values and so forth. Here’s another take on the preceding query before we move on to the harder stuff: The statement

 SELECT [Server and Service Name] = @@SERVERNAME + ', ' + @@SERVICENAME

will give you the following in return:

Server and Service Name
MCSQL00, MSSQLSERVER

Many Happy Returns

Just when you thought there was nothing between you and the select list just described, we come across three “qualifying” arguments that let you determine just how many rows of data satisfying the query should be returned in the result set. But these elements are essential tools used to fulfill those “wise use” goals we spoke about earlier. The arguments are listed in the following syntax in bold:

 SELECT [ ALL | DISTINCT ] [ TOP n [ PERCENT ] [ WITH TIES ] ]  . . . select_list FROM . . . table source

They must also be placed immediately after the SELECT keyword and before any item in the select list, including the * (star), which specifies all columns. The rows are returned in arbitrary order, as they were entered into the table, unless you specify a particular order.

ALL

The ALL (SELECT ALL) argument is the default and can be omitted. It specifies that all rows that satisfy the query criteria must be returned in the result set, even duplicate rows (integrity rules aside) and NULLs.

DISTINCT

The DISTINCT (SELECT DISTINCT) argument specifies that only unique rows can appear in the result set. In T-SQL, the NULL values are considered equal with respect to DISTINCT (see Chapter 10 for various discussions of NULL usage in T-SQL). NULL is never equal because an unknown or missing value can never be compared to another unknown or missing value. However, for the purposes of DISTINCT, the NULL values are returned. In other words, if you have two rows of all NULL data, only one will be returned.

TOP

In SQL Server 2000, the TOP argument (SELECT TOP) replaced the now passé SET ROWCOUNT command used in the days prior to version 7.0. The syntax for TOP is as follows:

 [     TOP (expression) [PERCENT]     [ WITH TIES ] ]

In a SELECT statement it forces the query to return only the first n rows that satisfy the query, in the order they were inserted into the table. The placeholder n must be an integer between 0 and 4294967295. For example, the query

 SELECT TOP 2 Agent_Name FROM Agents

will return A1 and A2 from the table. Be careful not to forget the name of the column you are querying, or use the * condition, because it is easy to think you can just query the table as SELECT TOP 2 FROM Agents. You can also specify more than the actual number of rows in the table; SQL Server will return all available rows less than the value of n instead of returning an error.

If you follow the TOP argument with the PERCENT option keyword, the number represented by placeholder n becomes a percentage value instead of an integer. In other words, the query

 SELECT TOP 50 PERCENT Agent_Name FROM Agents

will return 50 percent of the rows in the table (the TOP 50 percent, that is). As I am sure you have figured out, the PERCENT option is useful if you do not know how many records will be returned in the result set. Still, asking for the TOP 2 percent might be asking for more rows than you care to return to the client. With terabyte and exabyte databases fast becoming standard issue on the Internet, returning the TOP one percent might be enough rows to turn the network resources into something resembling mango compote. (Use the @@ROWCOUNT function to check the actual number of rows returned, and if there are too many rows you can suggest a more sensible percent value to the user).

And what about rules for PERCENT? The percent value must be between 0 and 100; anything above will error out.

The TOP can also be suffixed with the WITH TIES option. This option forces the query to return additional rows with the same value in ORDER BY columns appearing as the last of the TOP n (PERCENT) rows. As an essential rule, the WITH TIES qualifier can only be specified if your query specifies the ORDER BY clause. If this sounds confusing, have a look at the result set of the following SELECT TOP:

 SELECT TOP 5 Agent_Name FROM Agents

The query returns the following result set:

Agent_Name
A1
A2
A3
A4
A5

Now suppose the sixth row was also “A5” and you specified WITH TIES followed by ORDER BY on Agent_NAME like this:

  SELECT TOP 5 WITH TIES Agent_Name FROM Agents ORDER BY Agent_Name

The result set would now be as follows for a TOP 5 query:

Agent_Name
A1
A2
A3
A4
A5
A5

Since the sixth row was also A5, WITH TIES brings it with. If you selected without the WITH TIES option TOP would honor the parameter of “5” and return only the five rows shown here.

If you have reason to use TOP accompanied by WITH TIES, it is important to understand that these arguments act on the data returned in the result sets and have nothing to do with what lies below in the base table. Also, WITH TIES only works if the query is capped with the ORDER BY clause on the pertinent column and any rows that tie at the end of the result set.

If you need to, you can also tag ASC or DESC onto the end of the query to specify returning the TOP items in ascending order or descending order. The results will be very different because the column is sorted ASC or DESC on the base table and not on the final result of the query.

You can also specify DISTINCT with TOP to drop any duplicate rows in the result set.

TOP Expression as a Variable

The TOP expression indicated in the syntax described earlier can be declared as an expression in T-SQL supporting SQL Server 2005. To a declare a TOP variable the syntax is as follows:

 DECLARE @MyTopVar INT SET @MyTopVar = 25 SELECT TOP (@MyTopVar) UserID FROM CRM. Users

The TOP clause with variable expressions can also be used in INSERT, UPDATE and DELETE statements.

Understanding the Select List

As mentioned earlier, items specified in the select list, separated by commas, represent the columns from which data is selected for the result set. If we look at the syntax of the select list, we see that it can be simple, as demonstrated previously, or complex enough to give you a hernia.

 {* | { table_name | view_name | table_alias}.* | { column_name | expression | IDENTITYCOL | ROWGUIDCOL } [ [ AS ] column_alias ] | column_alias = expression } [ ,...n ]

The first * (star) operator forces the query to select all of the columns for extraction to the result set. The * should rarely be used, if ever, in SQL Server thin-client solutions. Large result sets do not sit well in browsers and devices where memory is scarce. Some exceptions might fly if the table, or a view above it, only holds a few rows and few columns, such as my Agents table shown earlier. The * scope can work for result sets held at the server, but make sure not to force it up against a table with many columns. And if you specify more than one table in the FROM clause, the * can end up returning more data than you care to deal with.

On the other hand, one big round trip to the server might be better than two dozen if the client can handle a thousand rows. It all depends on the solution you need. You thus need to balance “data ecology” with “network ecology”

Unless you specify an order (using GROUP BY) that you want to see in the result set, the columns are returned in the order in which they exist in the table or view. I will get back to the ORDER BY option later, but you should be aware that it applies to how the results are sorted, not how the columns are ordered in the result set.

The syntax table_name | view_name | table_alias. *means that you can prefix columns with the names of the tables or view, or their aliases. This practice is suggested to avoid ambiguity, especially when dealing with more than one table that has the same column names. For example, the statement SELECT Orders. CustomerID FROM Orders is the same thing as SELECT CustomerID FROM Orders. It also helps the SQL Server optimizer. (See Chapter 16 on the subject of identifiers.)

The * attached to the table name not only limits the scope of the * to the specified table or view, but it has the same result as SELECT * demonstrated earlier. The column_name placeholder is the name of a column to return in the result set, as mentioned earlier. However, the column name or names placeholder can also be a constant, a function, or any combination of column names and functions connected by operators and subqueries, or both.

IDENTITYCOL returns the identity column values to the result set. (See Chapter 2 and Chapter 15 for more information on IDENTITY columns.) If more than one table in the FROM clause has a column with the IDENTITY property, IDENTITYCOL must be qualified with the specific table name, such as T1.IDENTITYCOL.

The ROWGUIDCOL returns the values of the global unique identifier column. As in the IDENTITY column, if more than one table in the FROM clause has a ROWGUIDCOL property, the ROWGUIDCOL must be qualified with the specific table name, such as T1.ROWGUIDCOL.

The placeholder column_alias refers to an alternative name that you can use in the place of the actual column name returned in the result set. The ability to alias is useful and lets you return a more suitable column name in the result set than the actual column name in the base table. Often the base table column names are not suitable for a result set and display at the client. For example, the column name Cust_Lname is hardly a visually pleasing column name. To provide an alias for this clumsy column name, you need to precede the alias with the AS directive. For example, the statement

 SELECT Cust_Lname AS 'Last Name' FROM Customers

will return the following result set:

Last Name
Smith
Smit
Smitz

You can also use the alias to specify column names returned in the result sets of expressions. For example, the statement querying the Qty column in the order details table

 SELECT SUM(Qty) AS 'Fruit Cakes' FROM [order details] WHERE product ID = 5

will return the following result set:

Fruit Cakes
298

You can also use the alias feature in an ORDER BY clause. Also, if the query expression is part of a DECLARE CURSOR statement, the column alias cannot be used in the FOR UPDATE clause. By the way, using the equal sign (=) as follows also works:

 SELECT 'Fruit Cakes' = SUM(Qty) FROM [order details] WHERE product ID = 5

However, you might find your code easier to read using AS.

SELECT…INTO

The INTO clause of the T-SQL SELECT statement creates a new table to hold the result set data. The syntax is as follows:

 [ INTO new_table ]

The new_table placeholder specifies the name of a new table to be created. The new table is composed of the columns and rows in the result set. It does not contain any of the attributes of the base table. For example, if you used an alias for the names of result set columns, then these aliases carry to the new table.

The INTO functionality creates the new table based on the expressions in the select list. This alone determines the format of the new table, and each new column has the same name, data type, and value as the corresponding expression in the select list.

There are, however, some provisions. If you include a computed column in the select list, the corresponding column in the new table does not inherit the computed column attribute, and the values in the new column are the values that were computed at the time SELECT…INTO was executed.

If you are moving up from earlier versions of SQL Server, you’ll remember that you had to enable the select into/bulkcopy option. In SQL Server 2005, however, the select into/ bulkcopy option has no bearing on whether you can create a table with SELECT INTO; it only has a bearing on whether you can create a permanent table. The amount of logging for certain bulk operations, including SELECT INTO, depends on the recovery model in effect for the database (see Chapter 7).

You can choose to execute the system stored procedure sp_dboption to turn on the select into/bulkcopy option before executing the SELECT statement; however, you should be aware of the following behavior:

If the select into/bulkcopy option is enabled for the database in which the table is to be created, a permanent table is created. The table name must be unique in the database and must conform to the rules for identifiers as a fully qualified table name.
If the select into/bulkcopy is not enabled for the database where the table is to be created, you will not be able to create a permanent table using SELECT…INTO. The tables you will create have to be local or global temporary tables that must begin with a number sign (#).

Permissions, of course, apply, which means you can only execute a SELECT statement with the INTO clause if you have the CREATE TABLE permission in the target database. SELECT…INTO cannot be used with the COMPUTE clause or inside an explicit transaction.