Selecting Information from the Data
After data is in the database, it is inevitable that the data will need to be accessed, changed, and
SELECT
statements can be made very complex with the use of options that can join many tables together and with functions that can calculate and summarize data at the same time.
SELECT
statements also can often be as simple as one line of code that retrieves the
To start out simple, you will often be retrieving all the data from a particular table. Even if the final query is not intended to get all the data, you can often begin the data analysis by examining all the rows and
SELECT * FROM MYSERVER.Northwind.dbo.Employees
Note the asterisk used to obtain all columns from the
Employees
table. It is also worth noting the use of the four-part
Four-part
SELECT * FROM Employees SELECT * FROM dbo.Employees SELECT * FROM Northwind.dbo.Employees SELECT * FROM Northwind..Employees
Although often a query will go after all the data in a table, there are a considerable number of options available for the query statement. You can choose some of the columns of the table, provide
The exam objectives require you to know how to access data in different ways, providing for different looks into the data with user-friendly views and other reporting
Narrowing the Scope of Data RetrievalThe first portion of the SELECT statement identifies which columns will come from a table. When specifying column names, be sure to use a comma-delimited list and don't place a comma after the final column name. When specifying column names, you can use an asterisk ( * ), designating that all columns are to be returned from the specified table or tables. The asterisk can be provided by itself as seen previously or supplied in addition to columns selected. The following query selects a few of the columns from the Employees table: SELECT EmployeeID, FirstName, LastName, Title FROM Employees
You can
SELECT EmployeeID AS 'Employee ID',
'First Name' = FirstName,
LastName 'Last Name', Title
FROM Employees
SELECT EmployeeID AS 'Employee ID',
'First Name' AS FirstName,
LastName AS 'Last Name',
Title
FROM Employees
Notice that the previous column aliases have been enclosed within single quotation marks. This enclosure needs to be made when the column alias includes spaces. The alias name needs to be
Sometimes you need to show two columns as one by combining two columns together. When you do this, you are using a method called string concatenation. Concatenation can be thought of as joining strings together just as you can combine words into phrases. The operator used to perform the concatenation is the plus ( + ) sign. You can create a singular name column by combining the last name and first name values:
SELECT TitleOfCourtesy + ' ' + FirstName + ' ' + LastName
AS 'Employee Name',
Title
FROM Employees
As you can see, multiple strings can be
The
SELECT INTO
statement can perform a data insertion and create the table for the data in a single operation. The new table is
SELECT TitleOfCourtesy + ' ' + FirstName + ' ' + LastName
AS 'Employee Name',
Title
INTO HRTable
FROM Employees
The
INTO
clause creates a table, so it is important that the table does not exist when you're using the command. If the
Use of the
DISTINCT
clause eliminates duplicate rows from any resultset. A
SELECT
query may return records with equal values, but using
DISTINCT
eliminates duplicates and
Figure 5.2. Listing DISTINCT cities.
The DISTINCT keyword is optional when calculations are performed using SUM , AVG , and COUNT . When DISTINCT is used, duplicate values are eliminated before the calculation is performed. Limiting the number of rows returned from queries is common and serves a number of purposes. This narrowing of data is referred to as horizontal filtering.
The
WHERE
clause is used to limit the number of rows in the resultset based on defined restrictions. These restrictions are specified as conditional arguments, such as
Salary>10000
,
LastName LIKE 'G%'
, or
State = 'FL'
. SQL Server will always execute the
WHERE
action first to discover which rows should be
Similar to the WHERE clause is the HAVING clause. In a SELECT statement these clauses control the rows from the source tables that are used to build the resultset. WHERE and HAVING are filters. They specify a series of search conditions, and only those rows that meet the terms of the search conditions are used to build the resultset. To address how these clauses are used, we must understand the conditions that can be applied within these clauses. Conditional Filtering of Data
Filtering data is used for determining the data to be selected based on conditional requirements. Essentially, all conditions come down to one of three possible
Operators play an important part in determining the content of any conditional operations. An operator is a symbol specifying an action performed on one or more expressions. In SQL Server these operators are divided into a few elementary categories, as presented in the
Comparison OperatorsComparison operators can be used with character, numeric, or date data expressions. Their purpose is to determine a Boolean result based on the comparison of the expressions. Table 5.1 lists all comparison operators that are valid in SQL Server. Table 5.1. Comparison Operators
Whereas comparison operators evaluate the differences between two or more values, arithmetic operators aid in processing mathematical functions against values. Arithmetic OperatorsArithmetic operators perform mathematical operations on two expressions. A complete listing of the available arithmetic operators and their uses is shown in Table 5.2. Table 5.2. Arithmetic Operators
Although these operators usually perform actions against numeric data, the plus ( + ) and minus ( - ) operators can also be used to perform arithmetic operations on date values. Logical Operators
Logical operators are also known as Boolean operators. The three logical operators are
AND
,
OR
, and
NOT
. Their meanings are pretty straightforward:
AND
adds an additional filter condition to the one specified and returns
trUE
only when both or all conditions specified are met. The
OR
logical operator adds another filter condition to the existing condition as well, but it returns
trUE
when either condition is met.
NOT
Figure 5.3. Compound conditions using the logical operators AND, OR, NOT .
Brackets can significantly alter the results of operations that use logical operators. Test all conditions and alter the precedence by using brackets. According to the order of operations, bracketed comparisons are performed first. The logical operators are evaluated in the order: () first, then NOT , then AND , and finally OR . Bitwise and Unary OperatorsBitwise operators are used on int , smallint , or tinyint data. The ~ (bitwise NOT ) operator can also use bit data. All bitwise operators perform an operation on one or more specified integer values as translated to binary expressions within T-SQL statements. The bitwise NOT operator changes binary 1s to 0s and 0s to 1s. Bitwise operators perform bit manipulations between two expressions of any of the data types of the integer data type category. The set of operators is shown in Table 5.3. Table 5.3. Bitwise Operators
{% if main.adsdop %}{% include 'adsenceinline.tpl' %}{% endif %} The operands for bitwise operators can be any of the data types of the integer or binary string data type categories, with the exception that both operands cannot be any of the data types of the binary string data type category. Unary operators (see Table 5.4) perform an operation on only one expression of any of the data types of the numeric data type category. The + (positive) and - (negative) operators can be used on any expression of any of the data types of the numeric data type category. The ~ (bitwise NOT ) operator can be used only on expressions of any of the data types of the integer data type category. Table 5.4. Unary Operators
BETWEEN, IN, and EXISTS
Ranges can also be specified when using the
WHERE
clause with the help of the
BETWEEN
keyword. Simply put,
BETWEEN
provides a range of values within which the data should lie;
SELECT * FROM Products WHERE UnitPrice BETWEEN 10 AND 20 If the intent was to exclude the value 20 , the query would be written like this: SELECT * FROM Products WHERE UnitPrice BETWEEN 10.00 AND 19.99
You can also
SELECT * FROM Customers WHERE Country IN ('UK', 'USA')
This example limits the values of
Country
to only
UK
and
USA
. Customers who live in the
The use of IN is similar to that of EXISTS . When a subquery is introduced with the keyword EXISTS , it functions as an existence test. The WHERE clause of the outer query tests for the existence of rows returned by the subquery. The subquery does not actually produce any data; it returns a value of trUE or FALSE . The following is an example of the use of EXISTS to find only those customers in the United States:
SELECT DISTINCT Pub_Name FROM Publishers
WHERE EXISTS (SELECT * FROM titles
WHERE Pub_id = Publishers.Pub_id
AND Country = 'USA')
Existence tests can be useful in solving some problems, but because of the use of subqueries, the query itself is very inefficient and can usually be accomplished more
Similar Conditioning Using LIKE
You can retrieve rows that are based on portions of character strings by using the
LIKE
predicate. The
LIKE
predicate determines whether a given character string matches a specified pattern. The types of data a
LIKE
statement can work with are
char
,
varchar
,
nvarchar
,
nchar
,
datetime
,
smalldatetime
, and
text
. A pattern specified in the
LIKE
predicate can include regular characters and wildcard characters. During pattern matching, regular characters must exactly match the characters specified in the column value. Wildcard characters can be matched with any character or set of
Table 5.5. The Wildcard Characters Allowed in T-SQL
If your application repeatedly calls the
LIKE
predicate and performs
SELECT CustomerID, ContactName FROM Customers WHERE CustomerID LIKE 'A%' You can also use the NOT keyword with the LIKE predicate, which simply retrieves a query that does not contain records matching the specified elements in the LIKE clause. With character matching it is sometimes more efficient to exclude characters with the use of NOT . It is common to use a negative test in particular when looking for values that represent true data, as in NOT NULL . Selecting Rows Based on NULL Values
A
NULL
value is a value given to a field that that has no value. Many people confuse
NULL
values with
Many queries involving
NULL
tests depend on the ANSI connection settings for the session. When you
SET ANSI NULLS ON
, a comparison in which one or more of the expressions is
NULL
returns
UNKNOWN
as the result of the comparison. A value that is not known cannot be compared against another value. Use the
IS NULL
or
IS NOT NULL
instead to test for
NULL
values. T-SQL does support an extension of the traditional behavior that allows for comparison operators to return
TRUE
or
FALSE
as a comparison result. If you
SET ANSI NULLS OFF
, comparisons to
NULL
return
TRUE
when the value being compared against is also a
NULL
. The comparison returns
FALSE
for any other value. Regardless of the
ANSI NULLS
setting,
NULL
values are always
Putting It All in ORDER
Putting data in order provides for a more meaningful display of data and enables the data to be presented in a manner that meets additional reporting requirements set by most front-end applications. Data can be ordered in either
ORDER BY
determines the sequence of data based on column(s) selected and sequencing requested: ascending (
ASC
) or descending (
DESC
). Descending orders rows from highest to lowest; ascending orders rows from
After data selection has been determined, the actual data sent is ready for an ordering process if one has been defined. The ordering of data is optional, and if it is not present in a request, the data is sent in an order determined at the data source. Ordering data is a necessity when using the TOP condition. Returning TOP rowsThe TOP clause limits the number of rows returned in the resultset to a specified number or percentage at the top of a sorted range. Select top 50: Returns the top 50 rows Select top 50 percent: Returns the top 50% of the rows
There is an alternative to
TOP
. You can also limit the number of rows to return using
SET ROWCOUNT
N
. The difference between these two is that the
TOP
keyword applies to the single
SELECT
statement in which it is specified.
SET ROWCOUNT
stays in effect until another
SET ROWCOUNT
statement is executed, such as
SET ROWCOUNT 0
to
You can optionally specify that the TOP keyword is to use the WITH TIES option, in which case any number of records can possibly be displayed. WITH TIES displays all records that are equivalent to the last matching element. If you are looking for the top 10 employees and there is a tie for 10th between two employees, 11 or more records are displayed. If the tie is for 9th or a higher position, only 10 records are listed. Of course, after you begin placing data into the desired order, the next thing would be to group the output and perform calculations based on the group. Grouping allows the production of subtotals and also provides more usable output in applications that require grouped output. Displaying Groups in OutputThe GROUP BY clause of the SELECT statement creates groups within the data. These groups can be used to display data in a more orderly fashion or produce more meaningful results through the use of aggregate functions. The GROUP BY clause specifies the groups into which output is to be shown and, if aggregate functions are included, calculations of summary values are performed for each group. When GROUP BY is specified, either each column in any non-aggregate expression in the select list should be included in the GROUP BY list, or the GROUP BY expression must match exactly the select list expression. It is important to note that if the ORDER BY clause is not specified, groups returned using the GROUP BY clause are not in any particular order. It is recommended that you always use the ORDER BY clause to specify a particular ordering of the data. Data will still be collected into groups. Examine the following example: SELECT Country, Count(DISTINCT City) AS 'Number of Cities' FROM Customers GROUP BY Country
Countries are collected together and are placed in the order
SELECT Country, Count(DISTINCT City) AS 'Number of Cities' FROM Customers GROUP BY Country ORDER BY Count(DISTINCT City) DESC
You may not want all groups to be included in the output. To exclude groups from the recordset, you can utilize the
HAVING
clause, which operates against the groups of data in the same way that the
WHERE
clause acts against the individual rows. In the next example, as shown in Figure 5.4, the listing has been narrowed down by the
Figure 5.4. GROUP BY, HAVING , and ORDER BY used together.
COMPUTE and COMPUTE BY options of the SELECT statement also fall into the topic of grouping data. These two clauses can serve a purpose similar to that of the GROUP BY clause. GROUP BY will produce a single output in which there is a single row for each group. The output will contain only the grouping columns and aggregate functions, and the select list can contain only the grouping columns and aggregate functions. COMPUTE produces multiple results, the detail rows for each group containing the expressions from the select list and aggregate for the group, or the total aggregate for the statement. A COMPUTE list can contain expressions other than the grouping columns or aggregate functions, and aggregate functions are specified in the COMPUTE clause, not in the select list, as shown in the following example: SELECT DISTINCT Country, City FROM Customers ORDER BY Country, City COMPUTE Count(City) BY Country More Advanced Query GroupsCUBE and ROLLUP , along with GROUPING , can be used to effectively summarize data. The CUBE operator is primarily used in data warehousing applications. It is used to generate a resultset that is a multidimensional cube. Without getting too deep into a data warehousing discussion, a data cube is based on columns that are needed for data analysis. These columns within data cubes are more appropriately referred to as dimensions. The cube is a resultset containing a cross-tabulation of all the possible combinations of the dimensions. The CUBE operator is specified in the GROUP BY clause of a SELECT statement. The select list contains the dimensions and aggregate expressions. The resultset contains all possible combinations of the values in the dimensions, along with the aggregate values from the underlying rows. Figure 5.5 represents the output from using CUBE . Figure 5.5. A sample of CUBE output.
The query in Figure 5.5 returns a resultset that contains the units-on-hand
NULL
values can present a problem in distinguishing a
NULL
generated by the
CUBE
from a
NULL
in the actual data. You can solve this
Figure 5.6. GROUPING with CUBE .
Roll Up the Cube
The
ROLLUP
operator is similar to
CUBE
but somewhat more useful in standard applications that do not
Figure 5.7. A sample of ROLLUP output.
We have peered into a little of what data warehousing is about, but let's bring the focus back to the heart of the design exam. SQL Server is a relational database management system. This section focuses on the "relational" part. It covers how tables relate to one another. In Chapter 2, "Creating a Logical Data Model," you covered how primary keys and foreign keys are used in a database model, and how they impact storing of data. This chapter shows you how to retrieve data from a relational database. This involves using the various join types: INNER , RIGHT , LEFT , and CROSS . You're also going to learn about using a technique called derived tables to simplify query writing. |