Anatomy of a SELECT Statement


The syntax definition for an SQL statement is

 SELECT [ALL | DISTINCT] result [FROM table-list] [WHERE expr] [GROUP BY expr-list] [HAVING expr] [compound-op select]* [ORDER BY sort-expr-list] [LIMIT integer [(OFFSET|,) integer]] 

The only required item in a SELECT statement is the result, which can be one of the following:

  • The * character

  • A comma-separated list of one or more column names

  • An expression

The latter two bullet points should be combined into: "A comma-separated list of one or more expressions." The original two points make it seem as if the following would be an error:

 SELECT a+1, b+1 FROM ab; 

but this would be okay:

 SELECT a, b FROM ab; 

In fact, both are valid.

Using the * character or a list of columns makes no sense without a FROM clause, but in fact an expression whose arguments are constants rather than database items can be used alone in a SELECT statement, as in the following examples.

 sqlite> SELECT (60 * 60 * 24); 86400 sqlite> SELECT max(5, 20, -4, 8.7); 20 sqlite> SELECT random(); 220860261 

If the FROM list is omitted, SQLite effectively evaluates the expression against a table that always contains a single row.

The FROM list includes one or more table names in a comma-separated list, each with an optional alias name that can be used to qualify individual column names in the result. Where aliases are not used, the table name in full can be used to qualify columns.

For instance, the two following SELECT statements are identicalthe latter uses a table alias m for mytable.

 SELECT mytable.myfield FROM myfield; SELECT m.myfield FROM mytable m; 

The WHERE Clause

The WHERE clause specifies one or more conditions used to impose restrictions on the dataset returned by a SELECT. It is used both to limit the number of rows returned and to indicate a relationship used to join two tables together.

The general usage to impose a condition on the rows in a table is as follows:

 SELECT result FROM table-list WHERE expr; 

Expression expr generally involves a comparison of some kind on a table column, as shown in the following example:

 SELECT * FROM mytable WHERE myfield = 'somevalue'; 

Table 3.1 shows the relational operators that can be used in a WHERE clause.

Table 3.1. Relational Operators

Operator

Meaning

a = b

a is equal to b

a != b

a is not equal to b

a < b

a is less than b

a > b

a is greater than b

a <= b

a is less than or equal to b

a >= b

a is greater than or equal to b

a IN (b, c)

a is equal to either b or c

a NOT IN (b, c)

a is equal to neither b nor c


When you perform a comparison using a relational operatorand particularly the greater-than and less-than operators, < and >the data type of the column comes into play.

The following example shows how comparisons between the numbers 8, 11, and 101 differ greatly when performed as string operations. As integers, the order is as you would expect, however as strings, '101' is less than '11', which is in turn less than '8'. The individual character values in the string are compared from left to right in turn to determine which is the greatest value.

 sqlite> CREATE TABLE compare (string TEXT, number INTEGER); sqlite> INSERT INTO compare (string, number) values ('101', 101); sqlite> SELECT number FROM compare WHERE number > 11; 101 sqlite> SELECT string FROM compare WHERE string > '11'; 8 

However, note that if you use a relational operator with a number argument that is not contained in quotes, an integer comparison is performed regardless of the column data type.

 sqlite> SELECT string FROM compare WHERE string > 11; 101 

Selecting from multiple tables without a WHERE clause produces a Cartesian product of the datasets and is usually not a desirable result. With two tables, each record in table1 is paired with each record in table2. The total number of rows returned is the product of the number of rows in each table in the FROM list.

The following example shows the result of a Cartesian product of three tables, each containing just two rows. In total, eight rows are returned (2x2x2).

 sqlite> SELECT table1.myfield, table2.myfield, table3.myfield    ...> FROM table1, table2, table3; table1.myfield  table2.myfield  table3.myfield --------------  --------------  -------------- Table 1 row 1   Table 2 row 1   Table 3 row 1 Table 1 row 1   Table 2 row 1   Table 3 row 2 Table 1 row 1   Table 2 row 2   Table 3 row 1 Table 1 row 1   Table 2 row 2   Table 3 row 2 Table 1 row 2   Table 2 row 1   Table 3 row 1 Table 1 row 2   Table 2 row 1   Table 3 row 2 Table 1 row 2   Table 2 row 2   Table 3 row 1 Table 1 row 2   Table 2 row 2   Table 3 row 2 

In this example each table has a field called myfield, so each column in the result has to be qualified with the appropriate table name. This is not necessary where a column name is unique across all tables in the FROM list; however, it is good practice to always qualify column names to avoid ambiguity. If a column name could refer to more than one table, SQLite will not make the decision. Instead an error is raised as shown in the following example:

 sqlite> SELECT myfield    ...> FROM table1, table2, table3; SQL error: ambiguous column name: myfield 

To join two tables on a common fieldknown as an equi-join because the relationship is an equalitythe general syntax is

 SELECT result FROM table1, table2 WHERE table1.keyfield1 = table2.keyfield2 

SQLite supports outer joins via the LEFT JOIN keyword, whereby each row in the left tablethe one specified first in the SELECT statementis combined with a row from the right table. Where the join condition does not produce a match between the two tables, rows from the left table are still returned but with NULL values for each column that should be in the right table.

The general syntax for a LEFT JOIN is as follows:

 SELECT result FROM table1 LEFT [OUTER] JOIN table2 ON table1.keyfield1 = table2.keyfield2 [WHERE expr] 

The LEFT JOIN operator can be written as LEFT OUTER JOIN as a matter of preference; the OUTER keyword is optional.

GROUP BY and Aggregate Functions

The GROUP BY clause is used to aggregate data into a single row where the value of one or more specified columns is repeated. This feature can be used to reduce the number of records to only find unique values of a column, but is particularly useful when used in conjunction with the SQLite's aggregate functions.

The GROUP BY clause takes a list of expressionsusually column names from the resultand aggregates data for each expression. In the vegetables table we created previously we had more than one green vegetable, so grouping on the color column will return each value only once.

 sqlite> SELECT color    ...> FROM vegetables    ...> GROUP BY color; color ---------- green orange 

More interesting is to use the aggregate function count() to show how many records there are for each value of color:

 sqlite> SELECT color, count(color)    ...> FROM vegetables    ...> GROUP BY color; color       count(color) ----------  ------------ green       2 orange      1 

Using count(fieldname) will return the number of rows containing a non-NULL value in that field. If you want to return a count of the total number of rows, regardless of any NULL values, count(*) will do this, as the following example shows:

 sqlite> CREATE TABLE mytable (    ...>   field1 CHAR,    ...>   field2 INTEGER    ...> ); sqlite> INSERT INTO mytable VALUES ('foo', 5); sqlite> INSERT INTO mytable VALUES ('foo', 14); sqlite> INSERT INTO mytable VALUES ('bar', 25); sqlite> INSERT INTO mytable VALUES ('bar', 8); sqlite> INSERT INTO mytable VALUES ('bar', NULL); sqlite> SELECT field1, count(field2), count(*)    ...> FROM mytable    ...> GROUP BY field1; field1      count(field2)  count(*) ----------  -------------  ---------- bar         2              3 foo         2              2 

There are also a number of aggregate functions for performing summary calculations on grouped data, as shown in the following example:

 sqlite> SELECT field1, sum(field2), min(field2), max(field2), avg(field2)    ...> FROM mytable    ...> GROUP BY field1; field1      sum(field2)  min(field2)  max(field2)  avg(field2) ----------  -----------  -----------  -----------  ----------- bar         33           8            25           16.5 foo         19           5            14           9.5 

Table 3.2 lists all the aggregate functions available in SQLite.

Table 3.2. Aggregate Functions

Function

Meaning

avg(column)

Returns the mean average of all values in column

count(column)

Returns the number of times that a non-NULL value appears in column

count(*)

Returns the total number of rows in a query, regardless of NULL values

max(column)

Returns the highest of all values in column, using the usual sort order

min(column)

Returns the lowest of all values in column, using the usual sort order

sum(column)

Returns the numeric sum of all values in column


HAVING Clause

The HAVING clause is a further condition applied after aggregation takes place. In contrast to a WHERE clause, which applies a condition to individual elements in a table, HAVING is used to restrict records based on the summary value of a grouping.

To return only rows from the vegetables table where there is more than one of the same color, we can do this:

 sqlite> SELECT color, count(*)    ...> FROM vegetables GROUP BY color    ...> HAVING count(*) > 1; color       count(*) ----------  ---------- green       2 

It is actually not necessary for count(*) to appear in the result, as shown in the following example:

 sqlite> SELECT color    ...> FROM vegetables GROUP BY color    ...> HAVING count(*) > 1; color ---------- green 

Column Aliases

The column headings displayed in the output of a SELECT statement are usually the same as the items specified in the result section. For a straight column, the name of the column is displayed. For an expression, however, the expression text is used.

Although the column headings are only displayed in sqlite when .headers is set to on, it is important to know what each column's name is so that all the columns can be referenced correctly from within a programming API. A column alias is specified with the AS keyword to explicitly give a new name to a selected column.

In the following example, we give a column alias to the result of the count(*) aggregate function. In addition to renaming the column header, we can use the alias name in the HAVING clause, which can sometimes aid readability of code.

 sqlite> SELECT color, count(*) AS num_colors    ...> FROM vegetables GROUP BY color    ...> HAVING num_colors > 1; color       num_colors ----------  ---------- green       2 



    SQLite
    SQLite
    ISBN: 067232685X
    EAN: 2147483647
    Year: 2004
    Pages: 118
    Authors: Chris Newman

    flylib.com © 2008-2017.
    If you may any questions please contact us: flylib@qtcs.net