Querying Data


You have now looked at the database concepts of normalization, relations, and joins. You have also been introduced to the Galactic database. We will use this relational database throughout the remainder of this book for our examples. Now it is time to look more specifically at how you retrieve the data from the database into a format that you can use for reporting. This is done through the database query.

A query is a request for some action on the data in one or more tables. An INSERT query adds one or more rows to a database table. An UPDATE query modifies the data in one or more existing rows of a table. A DELETE query removes one or more rows from a table. Because we are primarily interested in retrieving data for reporting, the query that we are going to concern ourselves with is the SELECT query, which reads data from one or more tables (it does not add, update, or delete data).

We will look at the various parts of the SELECT query. This is to help you become familiar with this important aspect of reporting. The good news is that Reporting Services provides a tool to guide you through the creation of queries including the SELECT query. That tool is the Query Builder.

If you are familiar with SELECT queries and are more comfortable typing your queries from scratch, you can bypass the Query Builder and type in your queries directly. If SELECT queries are new to you, the following section will help you become familiar with the SELECT query and what it can do for you. Rest assured, the Query Builder will allow you to take advantage of all the features of the SELECT query without having to memorize syntax or type a lot of code.

Note

If you have another query-creation tool you like to use instead of the Query Builder, you can create your queries with that tool and then copy them into the appropriate locations in the report definition.

The SELECT Query

The SELECT query is used to retrieve data from tables in the database. When a SELECT query is run, it returns a result set containing the data that has been selected. With very few exceptions, your reports will be built on result sets that are created by SELECT queries.

The SELECT query is often referred to as a SELECT statement. One reason for this is that it can be read very much like an English sentence or statement. As with a sentence in English, it is made up of clauses that modify the meaning of the statement.

The various parts, or clauses, of the SELECT statement allow you to control the data that is contained in the result set. Use the FROM clause to specify which table the data will be selected from. The FIELD LIST permits you to choose the columns that will appear in the result set. The JOIN clause allows you to specify additional tables that will be joined with the table in the FROM clause to contribute data to the result set. The WHERE clause enables you to set conditions that determine which rows will be included in the result set. Finally, you can use the ORDER BY clause to sort the result set, and the GROUP BY clause and the HAVING clause to combine detail rows into summary rows.

The FROM Clause

The SELECT statement in its simplest form includes only a FROM clause. Here is a SELECT statement that retrieves all rows and all columns from the Customer table:

SELECT * FROM dbo.Customer

The word “SELECT” is required to let the database know that this is going to be a SELECT query as opposed to an INSERT, UPDATE, or DELETE query. The asterisk (*) means that all columns will be included in the result set. The remainder of the statement is the FROM clause. It says that the data is to be selected from the Customer table. We will discuss the meaning of “dbo.” in a moment.

As was stated earlier, the SELECT statement can be read as if it were a sentence. This SELECT statement is read, “Select all columns from the Customer table.” If we run this SELECT statement in the Galactic database, the results would appear similar to Figure 3-24. The SELECT query is being run in the Query Builder window of Visual Studio. Note that the scroll bars on the right and on the bottom of the result set area indicate that not all the rows and columns returned can fit on the screen.

click to expand
Figure 3-24: The SELECT statement in its simplest form

You will note that the table name, Customer, has “dbo.” in front of it. The “dbo” is the name of the owner of the table. Usually this is the user who created the table. Here, “dbo” stands for “database owner,” meaning that the user who owns the database is also the user who owns the table. The dbo abbreviation is also another name for the system administrator login. In many cases, an administrative user, logged into the database, will create the database tables. Because of this, the table owner will more than likely be “dbo.”

In the Galactic database, the dbo.Customer table was created by the system administrator. If another user with a database login of “User2” also has rights to create tables in the Galactic database, they could also create a Customer table. This second table would be known as “User2.Customer.”

This situation, with two tables of the same name in the same database, does not happen very often and is probably not a great idea. It can quickly lead to confusion and errors. Even though this is a rare occurrence, the Query Builder needs to account for this situation. The Query Builder uses both the name of the table owner and the name of the table itself in the queries it builds and executes for you.

The FIELD LIST

In the previous example, the result set created by the SELECT statement contained all the columns in the table. In most cases, especially when creating reports, you will only need to work with some of the columns of a table in any given result set. Including all the columns in a result set when only a few columns are required wastes computing power and network bandwidth.

A FIELD LIST provides the capability you need to specify which columns to include in the result set. When a FIELD LIST is added to the SELECT statement,

it appears similar to the following:

SELECT CustomerNumber, Name, BillingCity FROM dbo.Customer

The bold portion of the SELECT statement indicates changes from the previous SELECT statement.

This statement returns only the Customer Number, Name, and Billing City columns from the Customer table. The result set created by this SELECT statement is shown in Figure 3-25.

click to expand
Figure 3-25: A SELECT statement with a FIELD LIST

In addition to the names of the fields to include in the result set, the FIELD LIST can contain a word that influences the number of rows in the result set. Usually, there is one row in the result set for each row in the table from which you are selecting data. However, this can be changed by adding the word “DISTINCT” at the beginning of the FIELD LIST.

When you use DISTINCT in the FIELD LIST, you are saying that you only want one row in the result set for each distinct set of values. In other words, the result set from a DISTINCT query will not have any two rows that have exactly the same values in every column. Here is an example of a DISTINCT query:

SELECT DISTINCT BillingCity FROM dbo.Customer

This query returns a list of all the billing cities in the Customer table. There are a number of customers with the same billing city, but these duplicates have been removed from the result set, as shown in Figure 3-26.

click to expand
Figure 3-26: A DISTINCT query

The JOIN Clause

When your database is properly normalized, you are likely to need data from more than one table in order to fulfill your reporting requirements. As we discussed earlier in this chapter, the way to get information from more than one table is to use a join. The JOIN clause in the SELECT statement allows you to include a join of two or more tables in your result set.

The first part of the JOIN clause specifies which table is being joined. The second part determines the two columns that are linked to create the join. Joining the Invoice Header table to the Customer table looks like this:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber 

With the Customer table and the Invoice Header table joined, you have a situation where some columns in the result set have the same name. For example, there is a Customer Number column in the Customer table and a Customer Number column in the Invoice Header table. When you use the FIELD LIST to tell the database which fields to include in the result set, you need to uniquely identify these fields using both the table name and the column name.

If you do not do this, the query will not run and you will receive an error. Nothing prevents you from using the table name in front of each column name, whether it is a duplicate or not, as in this example. Using the table name in front of each column name makes it immediately obvious where every column in the result set is selected from. The result set created by this SELECT statement is shown in Figure 3-27.

click to expand
Figure 3-27: A SELECT statement with a JOIN clause

You can add a third table to the query by adding another JOIN clause to the SELECT statement. This additional table can be joined to the table in the FROM clause or to the table in the first JOIN clause. In this statement, we will add the Loyalty Discount table and join it to the Customer table:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber INNER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber 

The result set from this SELECT statement is shown in Figure 3-28. Notice that the result set is rather small. This is because Landmark, Inc. is the only customer currently receiving a loyalty discount. Because an INNER JOIN was used to add the Loyalty Discount table, only customers that have a loyalty discount are included in the result set.

click to expand
Figure 3-28: A SELECT statement with two JOIN clauses

To make our result set a little more interesting, let’s try joining the Loyalty Discount table with an OUTER JOIN rather than an INNER JOIN. Here is the same statement, except the Customer table is joined to the Loyalty Discount table with a LEFT OUTER JOIN:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber 

The result set for this SELECT statement is shown in Figure 3-29. Notice that the value for the Discount column is NULL in the rows for all the customers except for Landmark, Inc. This is to be expected because there is no record in the Loyalty Discount table to join with these customers. When there is no value in a column, the result set will contain a NULL value.

click to expand
Figure 3-29: A SELECT statement with an INNER JOIN and an OUTER JOIN

The WHERE Clause

Up to this point, the result sets have included all the rows in the table or all the rows that result from the joins. The FIELD LIST limits which columns are being returned in the result set. Nothing, however, placed a limit on the rows.

In order to limit the number of rows in the result set, you need to add a WHERE clause to your SELECT statement. The WHERE clause includes one or more logical expressions that must be true for a row before it can be included in the result set. Here is an example of a SELECT statement with a WHERE clause:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber  LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') 

The word ‘Axelburg’ (enclosed in single quotes) is a string constant. A string constant, also known as a string literal, is an actual text value. The string constant instructs SQL Server to use the text between the single quotes as a value rather than the name of a column or a table. In this example, only customers with a value of Axelburg in their Billing City column will be included in the result set, as shown in Figure 3-30.

click to expand
Figure 30: A SELECT statement with a WHERE clause

Note

Microsoft SQL Server 2000, in its standard configuration, insists on single quotes around string constants, such as ‘Axelburg’ in the previous SELECT statement. SQL Server 2000 assumes that anything enclosed in double quotes is a field name.

To create more complex criteria for your result set, you can have multiple logical expressions in the WHERE clause. The logical expressions are linked together with an AND or an OR. When an AND is used to link logical expressions, the logical expressions on both sides of the AND must be true for a row in order for that row to be included in the result set. When an OR is used to link two logical expressions, either one or both of the logical expressions must be true for a row in order for that row to be included in the result set.

This SELECT statement has two logical expressions:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') 

Only customers with a value of Axelburg in their Billing City column and with a name that comes after C will be included in the result set. This result set is shown in Figure 3-31.

click to expand
Figure 3-31: A SELECT statement with two logical expressions in the WHERE clause

The ORDER BY Clause

Up to this point, the data in the result sets has shown up in any order that it pleases. As we discussed previously, this will probably not be acceptable for most reports. You can add an ORDER BY clause to your SELECT statement to obtain a sorted result set. This statement includes an ORDER BY clause with multiple columns:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') ORDER BY dbo.Customer.Name DESC, dbo.InvoiceHeader.InvoiceNumber 

The result set created by this SELECT statement, shown in Figure 3-32, is first sorted by the contents of the Name column in the Customer table. The “DESC” that follows “dbo.Customer.Name” in the ORDER BY clause specifies the sort order for the customer name sort. DESC means that this sort is done in descending order. In other words, the customer names will be sorted from the end of the alphabet to the beginning.

click to expand
Figure 3-32: A SELECT statement with an ORDER BY clause

Several rows have the same customer name. For this reason, a second sort column is specified. This second sort is only applied within each group of identical customer names. For example, Twillig Companies has three rows in the result set. These three rows are sorted by the second sort, which is invoice number. No sort order is specified for the invoice number sort, so this defaults to an ascending sort. In other words, the invoice numbers are sorted from lowest to highest.

Constant and Calculated Fields

Our SELECT statement examples thus far have used an asterisk symbol or a FIELD LIST that includes only columns. A FIELD LIST can, in fact, include other things as well. For example, a FIELD LIST can include a constant value, as is shown here:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount,     'AXEL' AS ProcessingCode FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') ORDER BY dbo.Customer.Name DESC, dbo.InvoiceHeader.InvoiceNumber

The string constant ‘AXEL’ has been added to the FIELD LIST. This creates a new column in the result set with the value AXEL in each row. By including “AS ProcessingCode” on this line, we give this result set column a column name of ProcessingCode. Constant values of other data types, such as dates or numbers, can also be added to the FIELD LIST. The result set for this SELECT statement is shown in Figure 3-33.

click to expand
Figure 3-33: A SELECT statement with a constant in the FIELD LIST

In addition to adding constant values, you can also include calculations in the FIELD LIST. This SELECT statement calculates the discounted invoice amount based on the total amount of the invoice and the loyalty discount:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount,     dbo.InvoiceHeader.TotalAmount -        (dbo.InvoiceHeader.TotalAmount *                dbo.LoyaltyDiscount.Discount)                       AS DiscountedTotalAmount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') ORDER BY dbo.Customer.Name DESC, dbo.InvoiceHeader.InvoiceNumber

The result set for this SELECT statement is shown in Figure 3-34. Notice that the value for the calculated column, DiscountedTotalAmount, is NULL for all the rows that are not for Landmark, Inc. This is because we are using the value of the Discount column in our calculation. The Discount column has a value of NULL for every row except for the Landmark, Inc. rows.

click to expand
Figure 3-34: A SELECT statement with a calculated column in the FIELD LIST

A NULL value cannot be used successfully in any calculation. Any time you try to add, subtract, multiply, or divide a number by NULL, the result is NULL. The only way to receive a value in these situations is to give the database a valid value to use in place of any NULLs it might encounter. This is done using the ISNULL( ) function, as shown in the following statement:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     dbo.InvoiceHeader.InvoiceNumber,     dbo.InvoiceHeader.TotalAmount,     dbo.LoyaltyDiscount.Discount,     dbo.InvoiceHeader.TotalAmount -        (dbo.InvoiceHeader.TotalAmount *           ISNULL(dbo.LoyaltyDiscount.Discount,0.00))                       AS DiscountedTotalAmount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') ORDER BY dbo.Customer.Name DESC, dbo.InvoiceHeader.InvoiceNumber

Now, when the database encounters a NULL value in the Discount column while it is performing the calculation, it substitutes a value of 0.00 and continues on with the calculation. The database only performs this substitution when it encounters a NULL value. If any other value is in the Discount column, it uses that value. The result set from this SELECT statement is shown in Figure 3-35.

click to expand
Figure 3-35: A SELECT statement using the ISNULL( ) function

The GROUP BY Clause

Our sample SELECT statement appears to resemble a run-on sentence. You have seen, however, that each of these clauses is necessary to change the meaning of the statement and provide the result set that is desired. We will add just two more clauses to the sample SELECT statement before we are done.

There are times, as you are analyzing data, that you only want to see information at a summary level rather than viewing all the detail. In other words, you want the result set to group together the information from several rows to form a summary row. Additional instructions must be added to our SELECT statement in two places in order for this to happen.

First of all, you need to specify which columns are going to be used to determine when a summary row will be created. These columns are placed in the GROUP BY clause. Consider the following SELECT statement:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     COUNT(dbo.InvoiceHeader.InvoiceNumber) AS NumberOfInvoices,     SUM(dbo.InvoiceHeader.TotalAmount) AS TotalAmount,     dbo.LoyaltyDiscount.Discount,     SUM(dbo.InvoiceHeader.TotalAmount -        (dbo.InvoiceHeader.TotalAmount *           ISNULL(dbo.LoyaltyDiscount.Discount,0.00)))                       AS DiscountedTotalAmount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') GROUP BY dbo.Customer.CustomerNumber, dbo.Customer.Name,      dbo.Customer.BillingCity, dbo.LoyaltyDiscount.Discount ORDER BY dbo.Customer.Name DESC

The Customer Number, Name, Billing City, and Discount columns are included in the GROUP BY clause. When this query is run, each unique set of values from these four columns will result in a row in the result set.

Second, you need to specify how the columns in the FIELD LIST that are not included in the GROUP BY clause are to be handled. In the sample SELECT statement, the Invoice Number and Total Amount columns are in the FIELD LIST but are not part of the GROUP BY clause. The calculated column, Discounted Total Amount, is also in the FIELD LIST but is not present in the GROUP BY clause. In the sample SELECT statement, these three columns are the non-group-by columns.

The SELECT statement is asking for the values from several rows to be combined into one summary row. The SELECT statement needs to provide a way for this combining to take place. This is done by enclosing each non-group-by column in a special function called an aggregate function, which performs a mathematical operation on values from a number of rows and returns a single result. Aggregate functions include:

  • SUM( ) Returns the sum of the values

  • AVG( ) Returns the average of the values

  • COUNT( ) Returns a count of the values

  • MAX( ) Returns the largest value

  • MIN( ) Returns the smallest value

The SELECT statement in our group by example uses the SUM( ) aggregate function to return the sum of the invoice amount and the sum of the discounted amount for each customer. It also uses the COUNT( ) aggregate function to return the number of invoices for each customer. The result set from this SELECT statement is shown in Figure 3-36. Note that when an aggregate function is placed around a column name in the FIELD LIST, the SELECT statement can no longer determine what name to use for that column in the result set. You need to supply a column name to use in the result set, as shown in this SELECT statement.

click to expand
Figure 3-36: A SELECT statement with a GROUP BY clause

Note

When you’re using a GROUP BY clause, all columns in the FIELD LIST must either be included in the GROUP BY clause or be enclosed in an aggregate function. In the sample SELECT statement, the Customer Number column is all that is necessary in the GROUP BY clause to provide the desired grouping. However, because the Name, Billing City, and Discount columns do not lend themselves to being aggregated, they are included in the GROUP BY clause along with the Customer Number column.

The HAVING Clause

The GROUP BY clause has a special clause that can be used with it to determine which grouped rows will be included in the result set. This is the HAVING clause. The HAVING clause functions similar to the WHERE clause. The WHERE clause limits the rows in the result set by checking conditions at the row level. The HAVING clause limits the rows in the result set by checking conditions at the group level.

Consider the following SELECT statement:

SELECT dbo.Customer.CustomerNumber,     dbo.Customer.Name,     dbo.Customer.BillingCity,     COUNT(dbo.InvoiceHeader.InvoiceNumber) AS NumberOfInvoices,     SUM(dbo.InvoiceHeader.TotalAmount) AS TotalAmount,     dbo.LoyaltyDiscount.Discount,     SUM(dbo.InvoiceHeader.TotalAmount -        (dbo.InvoiceHeader.TotalAmount *           ISNULL(dbo.LoyaltyDiscount.Discount,0.00)))                      AS DiscountedTotalAmount FROM dbo.Customer INNER JOIN dbo.InvoiceHeader  ON dbo.Customer.CustomerNumber = dbo.InvoiceHeader.CustomerNumber LEFT OUTER JOIN dbo.LoyaltyDiscount  ON dbo.Customer.CustomerNumber = dbo.LoyaltyDiscount.CustomerNumber WHERE (dbo.Customer.BillingCity = 'Axelburg') AND (dbo.Customer.Name > 'C') GROUP BY dbo.Customer.CustomerNumber, dbo.Customer.Name,      dbo.Customer.BillingCity, dbo.LoyaltyDiscount.Discount HAVING COUNT(dbo.InvoiceHeader.InvoiceNumber) >= 2 ORDER BY dbo.Customer.Name DESC

The WHERE clause says that a row must have a Billing City column with a value of Axelburg and a Name column with a value greater than C before it can be included in the group. The HAVING clause says that a group must contain at least two invoices before it can be included in the result set. The result set for this SELECT statement is shown in Figure 3-37.

click to expand
Figure 3-37: A SELECT statement with a HAVING clause




Microsoft SQL Server 2000 Reporting Services
Microsoft SQL Server 2000 Reporting Services Step by Step (Pro-Step by Step Developer)
ISBN: 0735621063
EAN: 2147483647
Year: 2003
Pages: 109

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net