Retrieving with SQL SELECT

< Day Day Up >

Retrieving with SQL `SELECT`

By far, the most often used SQL statement is the SELECT statement, which retrieves data so you can view or otherwise use data. SELECT, in and of itself, does not act upon the data in any way.

The SELECT statement can do a lot and is very flexible. Consequently, its syntax structure can be complex:

 SELECT ALL | DISTINCT | TOP fieldlist | * FROM datasource [WHERE condition] [GROUP BY col1[, col2...]] [HAVING condition] [ORDER BY col1 [ASC | DESC] [, col2... [ASC | DESC]]

If you omit the All | DISTINCT component, SQL assumes ALL. Similarly, ASC is assumed in an ORDER BY clause if omitted. The asterisk character (*) in the SELECT clause retrieves all columns in the underlying data source, but it's better to avoid doing so unless that's what you really mean to do. The more data you retrieve, the slower the query performs. The FROM clause is the only mandatory clause.

The simplest SELECT uses none of the optional clauses:

 SELECT * FROM datasource

The fieldlist argument represents a list of field names from the underlying data source, separated by a comma character, as follows:

 SELECT col1[AS alias][, col2 [AS alias]...]

where col1, col2, and so on represent the list of field names. You can list as many field names as you want to return. The AS clause in the SELECT clause is optional and allows you to refer to the field using some name other than the column's actual field name. For instance, the following statement refers to the LName and FName fields as LastName and FirstName:

 SELECT LName AS LastName, FName AS FirstName FROM Employees

The SQL Predicates

The ALL, DISTINCT, and TOP predicates determine which records are retrieved. ALL is the default and retrieves all the records in the underlying data source. DISTINCT limits the results to unique records in the fields listed. For instance, the following statement limits the results to only one record for each record that contained the same values in both the LName and FName fields:

 SELECT DISTINCT LName AS LastName, FName AS FirstName FROM Employees

In other words, if there were two Mary Smiths listed in the Employees table, the resulting recordset would contain only one record for both.

CAUTION

The DISTINCT predicate returns a recordset that can't be updated, so don't use this predicate when modifying data.

The difference between DISTINCT and DISTINCTROW is this: DISTINCTROW eliminates duplicates based on all the columns in the data source, regardless of the fields retrieved. DISTINCT eliminates duplicates only in the retrieved fields.

SQL's TOP predicate lets you limit the number of records retrieved using the following syntax:

 SELECT TOP n [PERCENT] col1[, col2...]

where n is the number of records you want to retrieve. For instance, the following statement retrieves the first and last name values for the first 10 records in the Employees table:

 SELECT TOP 10 LName, FName FROM Employees

The PERCENT keyword interprets n as a percentage of records instead of a literal number. The following statement retrieves 10 percent of the records:

 SELECT TOP 10 PERCENT LName, FName FROM Employees

The first example always retrieves 10 records, whereas the number of records retrieved by the second statement varies, depending upon the total number of records in the data source. If there are 50 employee records, the second statement retrieves just the first five records; if there are 1,000 records, the second statement retrieves the first 100.

The SQL `FROM` Clause

The FROM clause is mandatory and its full form follows:

 FROM datasource | onetable join manytable ON onetable.primarykey = manytable.foreignkey

where onetable and manytable refer to two related tables in a one-to-many relationship. The more complicated form lets you retrieve data from related tables. The join argument identifies the type of join as INNER JOIN, RIGHT, and LEFT. There are other SQL joins, but these are the three supported by Jet SQL. For instance, the following statement retrieves all the client records and all the projects for each client whereby ClientID and ClientIDFK match:

 FROM Clients INNER JOIN Projects ON Clients.ClientID = Projects.ClientIDFK

If a client doesn't have a project, the INNER JOIN won't retrieve that client record. Similarly, if a project isn't attached to a particular client (which really shouldn't happen), that project won't appear in the resulting recordset.

To return all clients, even those without a project, you use a LEFT join. Similarly, to return all projects, even those not assigned to a client, you use a RIGHT join. The LEFT join returns all records from the one table and any matching records from the many table. The RIGHT join returns all records from the many side and any matching records from the one table.

The SQL `WHERE` Clause

Often, the data retrieved or otherwise acted upon depends on some condition that you express in the way of criteria. In similar fashion, the SQL WHERE clause lets you narrow down the data that's retrieved, updated, deleted, and so on.

This clause takes the form

 WHERE conditionalexpression

where conditionalexpression can take many forms, but the gist is that conditionalexpression expresses some value or condition that the stored value in question must meet to be included in the query's result. This expression can be a simple comparison to a literal value such as

 SELECT * FROM Clients WHERE ClientID = 1

which returns any record whereby the ClientID value equals the value 1. On the other hand, this argument can be an extremely complex set of conditions combined by the SQL And and Or operators. For instance, let's suppose you want all the records for a specific client whose project start dates occur in the month of January, 2004. In that case, you use the statement

 SELECT * FROM Projects WHERE ClientID = 1 And StartDate Between #1/1/2004# And #1/31/2004#

Note that you need to use pound signs to delimit dates and quotation marks to delimit strings. SQL's WHERE clause is flexible enough to handle criteria that refers to columns that aren't in the SELECT clause, as long as the columns are in the underlying data source. For instance, you can limit the previous statement to just the client information, but still retrieve the same records using this statement

 SELECT ClientID, ProjectName FROM Projects WHERE ClientID = 1 And StartDate Between #1/1/2004# And #1/31/2004#

To see records that contain a Null value, you must include an expression that explicitly says so. For instance, the expression

 WHERE ClientID = 1 And StartDate Between #1/1/2004# And #1/31/2004# Or StartDate Is Null

returns records for client 1 whereby the StartDate is Null.

The SQL `ORDER BY` Clause

Grabbing the right records isn't always enough; order is important in many cases. SQL's ORDER BY clause sorts by text, numeric, and date/time values. Don't even bother referencing any other type of column, because the effort will return an error. To specify a sort order, use the ORDER BY clause in the following form:

 ORDER BY col1 [ASC | DESC][, col2 [ASC | DESC],...]

The ASC | DESC component specifies an ascending or descending sort, where ascending is the default. If you omit this component, SQL applies an ascending sort. For instance, the following statement returns client records sorted by the start date value, in ascending order:

 SELECT ClientID, ProjectName FROM Projects WHERE ClientID = 1 And StartDate Between #1/1/2004# And #1/31/2004# ORDER BY StartDate

Like the WHERE clause, you can sort by a column that isn't specified in the SELECT clause, as long as the column is in the underlying data source. Keep in mind that the sort columns take precedence from left to right.

The SQL `GROUP BY` Clause

Like an aggregate function, the GROUP BY clause defines a group and most often performs some type of calculation that summarizes that group. For instance, you might use a GROUP BY clause to return subtotals for each client rather than returning all the detail values for each client.

The syntax for the GROUP BY clause follows:

 GROUP BY col1[, col2...]

where col1, col2, and so on reference actual columns or calculated fields. The one condition with the calculated fields is that the expression cannot contain an aggregate function or constant. Another condition is that all fields in the SELECT clause must be present as follows:

An argument to an aggregate function
In the GROUP BY clause as a column reference

What isn't apparent at first is that each column in the resulting recordset either defines the group or evaluates the group. It's important to note that a column doesn't have to be part of the SELECT clause to be included in the GROUP BY clause as long as the column is in the underlying data source. Like the ORDER BY clause, precedence is given from left to right. For an example, suppose you want to group client project records by start date to count them. In this case, you might use the following statement:

 SELECT Projects.ClientID, Projects.ProjectName,  Count(Projects.StartDate) AS CountOfStartDate FROM Projects WHERE (((Projects.ClientID)=1) AND  ((Projects.StartDate) Between #1/1/2004# And #1/31/2004#)) GROUP BY Projects.ClientID, Projects.ProjectName

Notice that each column in the SELECT clause appears either as part of an aggregate function (in this case, the Count function) or is listed in the GROUP BY clause.

The SQL `HAVING` Clause

SQL's HAVING clause limits the results of a group produced by a GROUP BY clause. You'll not see a HAVING clause in an SQL statement without an accompanying GROUP BY clause. However, a statement can contain a GROUP BY clause without a HAVING clause. The most important thing to remember about the HAVING clause is that Jet applies the clause after the data is grouped. That means you can use this clause to eliminate records after the fact, which is helpful when you're summarizing a grouped column and want to use summarized values as a condition for eliminating specific records.

The HAVING clause takes the form

 HAVING condition

which is certainly simple enough. The condition argument can be one or more expressions combined using the SQL And and Or operators in the more complex form

 HAVING expression1 AND | OR expression2...

The following expression is a good example of using the HAVING clause to limit a group:

 SELECT EmployeeID, DateWorked, Sum(Hours) AS SumOfHours FROM Timeslips GROUP BY EmployeeID, DateWorked HAVING Sum(Hours) > 8

The group is based on EmployeeID and DateWorked. However, the HAVING clause retrieves only those records where the sum of the Hours field is greater than eight hours for any given day. Notice that the HAVING clause evaluates the result of the Sum aggregate function Sum(Hours) in the SELECT clause. Doing so isn't absolutely necessary, but it's a common way to use it.

TIP

You can use the WHERE and HAVING clauses to limit the results of a grouped recordset, but keep in mind that they work differently. WHERE eliminates records before they're grouped and HAVING eliminates records after they're grouped. Often, you'll get the same results using either clause, but be careful because they're not interchangeable.

< Day Day Up >