So far, we have shown you the basics of SQL queries, but the language is far more powerful and flexible than the simple queries demonstrate. We will now introduce some of the more powerful features of the language.
The WHERE keyword lets us associate expressions to restrict the function of several SQL queries, including SELECT, UPDATE, and DELETE. While we have shown basic expressions of the form
ColumnName operator value
there are means by which expressions can be combined to form more elaborate expressions. The ones we will use are the AND and OR keywords. These behave similarly to AND and OR operators in other programming languages. The former is considered trUE if both sides of the condition are met, while the latter evaluates to trUE if either side of the condition is met. These let us refine our queries as needed. For example, to find all users with the first name Gaia who were born after 1980, we can enter this:
SELECT user_name, user_email FROM Users WHERE full_name LIKE 'Gaia%' AND birthdate >= '1980-01-01';
Similarly, we could look for all users with the name John or Jon:
SELECT user_name, birthdate FROM Users WHERE full_name LIKE 'John%' OR full_name LIKE 'Jon%';
To combine multiple expressions of this sort, parentheses can and should be used to clarify the order of evaluation:
SELECT user_name, full_name, birthdate FROM Users WHERE (full_name LIKE 'John%' OR full_name LIKE 'Jon%') AND birthdate >= '1980-01-01';
Specifying Sets and Ranges of Values
When we want to specify a range of values in our expressions, SQL provides two useful keywords for thisIN and BETWEEN.
The IN keyword allows us to specify a set of scalar values (no wildcards are permitted) against which the value of a column should be evaluated. There is a match if the column value is in the set specified:
SELECT * FROM Messages WHERE forum_id IN (1, 3, 7, 4);
The BETWEEN keyword lets us specify a range of values against which a value should match. The functionality is reasonably transparent with numbers and dates:
SELECT * FROM Users WHERE birthdate BETWEEN '1970-01-01' AND '1970-12-31'; SELECT * FROM Messages WHERE message_id BETWEEN 1000 AND 5000;
With strings, however, it can be less so. As we mentioned before, the database uses the information it has on sorting order (collation) to determine whether a string value falls within a range of strings. Depending on what collation has been specified, this tells it not only how to deal with English language ASCII letters, but also those from other languages. Thus, we can write a query
SELECT * FROM Users WHERE user_name BETWEEN 'a' AND 'm';
that returns the set of usernames between a and m. When first entering this query, you might receive a strange error about character sets and collation orders, such as the following one returned by MySQL:
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,IMPLICIT), (latin1_swedish_ci,COERCIBLE), (latin1_swedish_ci,COERCIBLE) for operation 'BETWEEN'
The most likely explanation for this error is that the client program that you are connecting to the server with is using one character set and collation (see Chapter 9, "Designing and Creating Your Database") while the database is using another. (We typically set them to be Unicode/utf-8 on the server.) To rectify this, you need to set the client program to use the same character set and collation as the database. In the MySQL client program mysql, this is done as follows:
mysql> set character_set_connection = @@character_set_database;
This setting of the character set also sets the collation to the default collation for that character set. If you have chosen something different, you can also set the collation of the connection as follows:
mysql> set collation_connection = @@collation_database;
One final complication is whether various database engines treat the range specified with BETWEEN as inclusive or exclusive. Some databases include in the set of valid matches values that are equal to the boundaries of the BETWEEN range, while others exclude values that match the boundaries. You should double check with any given database server before you rely on this functionality. (All of the DBMSes we discuss are inclusive.)
Fetching Unique Values
There are times when we only want to know the possible set of values in a column instead of all of the values (including any duplicates). In our Messages table, if we were to run the query
SELECT forum_id FROM Messages;
we would find ourselves with many potential duplicate entries for any forums with more than one posting. However, if we wanted to see a list of forums in which there was at least one message posted, we could use the SELECT DISTINCT query in SQL:
SELECT DISTINCT forum_id FROM Messages;
This query returns only individual forum_id values with the duplicates eliminated for us.
SQL Functions in Queries
In addition to scalar values and columns in your queries, SQL allows you to use various functions, including those provided by the server and those written by users on systems that allow stored procedures (user-defined functions compiled and kept on the server).
There are two classes of functions in SQL:
A function needs to be scalar to be used in an expression. Aggregate functions can be used to process the result of a query before it is finally returned to us, and thus are most commonly used with the results of a SELECT statement.
Unfortunately, these functions are another area where the various SQL servers vary greatly in implementation. While they usually have the same functionality, the name or way in which a particular function is used can differ significantly. We will endeavor to cover the most common ones here. When there is too much variation, we will show the MySQL method and refer you to Appendix B, "Database Function Equivalents," where we will briefly show how other servers might do the same thing.
When you call functions in SQL queries, no spaces are allowed between the function name and the parentheses used for arguments.
AVG(daily_precip) OK AVG (daily_precip) Not OK
We will start by introducing a few numeric functions since they are the easiest to understand and use. All of the following functions are aggregate functions that will be used to operate on a set of rows in a query and will return a single value.
The COUNT function can be used in two ways: COUNT(ColumnName) or COUNT(*). The former counts the number of values in the result set from the specified column that do not have NULL values. The second version simply counts the number of rows in the result set. For example
SELECT COUNT(*) FROM Users;
counts the number of users in our Users table. To count the number of users who do not have NULL full names (which is permitted by our table schema), we could execute the following query:
mysql> SELECT COUNT(full_name) FROM Users; +------------------+ | count(full_name) | +------------------+ | 4 | +------------------+ 1 row in set (0.00 sec)
To return a sum of the values in the result set for a given column, we will use the SUM function. If we had a table with the daily weather information for a particular place and wanted to compute the total rainfall for the year 2002, we could execute the following query:
SELECT SUM(daily_precip) FROM DailyWeatherReports WHERE date BETWEEN '2002-01-01' AND '2002-12-31';
MAX and MIN
To compute the maximum or minimum value of a column in a result set, you can use the MAX and MIN functions. To continue our weather example from the previous function, we could find the days with the highest and lowest precipitation in 2002 with the following queries:
SELECT MAX(daily_precip) FROM DailyWeatherReports WHERE date BETWEEN '2002-01-01' AND '2002-12-31'; SELECT MIN(daily_precip) FROM DailyWeatherReports WHERE date BETWEEN '2002-01-01' AND '2002-12-31';
We can combine these queries into a single query if we desire. The result set for this query would contain two columns:
SELECT MAX(daily_precip), MIN(daily_precip) FROM DailyWeatherReports WHERE date BETWEEN '2002-01-01' AND '2002-12-31';
To compute the average value of a column in a result set, the AVG function is provided. It is used as follows:
SELECT AVG(daily_precip) FROM DailyWeatherReports WHERE date BETWEEN '2002-01-01' AND '2002-12-31';
There are many useful functions for operating on strings, most of which are scalar functions that can be used in a number of places.
To extract a portion of a string in SQL, you use the SUBSTRING function (SUBSTR in Oracle). It takes three argumentsthe value (or column name) on which to operate, the index of the first character to extract, and the number of characters to extract.
If we had a table with the states and provinces of the USA and Canada, we might execute the following to get the first 5 characters of each:
SELECT SUBSTRING(name, 1, 5) FROM states_provinces;
To find the states and provinces that begin with New, we could execute the following:
SELECT * FROM states_provinces WHERE SUBSTRING(name, 1, 3) = 'New';
To convert strings to lower- or uppercase, you use the LOWER and UPPER functions.
SELECT LOWER(user_name) FROM Users; SELECT UPPER(last_name) FROM customers;
It is unlikely you will use LOWER and UPPER as part of your expressions in a WHERE clause since, as we mentioned in the previous chapter, most of the collations and comparisons are case-insensitive.
Finding a String
We need a function to find the position of a particular string within a value (or column). While all of the servers with which we will work have such functionality, the exact name and usage differs. MySQL and Oracle both use the INSTR function:
For example, we could use the following to get just the first names of our message board users from the Users table, assuming it was the first word in the name:
SELECT user_name, SUBSTRING(full_name, 1, INSTR(full_name, ' ')) FROM Users;
If the specified string is not found by the INSTR function, it returns 0.
The ability to concatenate strings together exists in SQL and varies widely from server to server. Thus, we will cover the MySQL syntax here:
CONCAT(value1, value2, ..., valuen)
If we wanted to return a formatted string with a user's username and e-mail address from our Users table, we could execute the following query:
SELECT CONCAT('User Name: ', user_name, '\tUser Email:', user_email) FROM Users;
Trimming Your Strings
trIM is a function in SQL that removes leadings and trailing whitespace from your strings and behaves much like its PHP counterpart:
SELECT user_name, TRIM(full_name), user_email FROM Users WHERE user_name LIKE 'F%';
There are a number of helpful functions for date and time data types in SQL. They are not standard across implementations, but most of the basic functionality exists in all the major servers.
To find out what the time and date is right now in MySQL and PostgreSQL, you use a function called Now.
SELECT Now(); INSERT INTO Orders (prodid, user_id, when) VALUES(445455423, 32345, Now());
Year, Month, Day
There are a number of functions to extract values of months from date values. Again, these differ significantly between implementations. In MySQL, the YEAR function takes a date and returns the four-digit year that date represents. The MONTH function returns the month for a date, and the DAYOFMONTH, DAYOFWEEK, and DAYNAME functions return the day in a slightly different format.
Formatting Dates and Time
While all databases provide functionality to format date and time values, the implementations are extremely different. In MySQL, the function used for this is the DATE_FORMAT function. It takes both the value to format and a string with the format to use.
The instructions are a sequence of characters to be substituted by the database engine, as in the following example:
mysql> SELECT full_name, DATE_FORMAT(birthdate, '%W %D %M %Y') FROM Users WHERE birthdate <> '0000-00-00'; +----------------+---------------------------------------+ | full_name | DATE_FORMAT(birthdate, '%W %D %M %Y') | +----------------+---------------------------------------+ | Klaus Mueller | Tuesday 16th December 1980 | | Akira Tanaka | Wednesday 13th September 0000 | | Petunia Durbin | Monday 31st March 1975 | +----------------+---------------------------------------+ 3 rows in set (0.00 sec)
Some of the more common and interesting format codes are presented in Table 11-1. For a complete set of values, consult the MySQL documentation for date and time functions.
Grouping Aggregate Function Data
Aggregate functions would be even more useful if SQL could group the values in a table before executing the functions on them. To continue the example of our weather database, we saw previously that we could use the AVG function to compute the average rainfall in a year. If we wanted to see the average rainfall for a few years, we would have to execute a number of queries on the different year values.
It would be nicer if there was a way for SQL to do this for us. Fortunately, the GROUP BY clause allows us to tell SQL to group the values from a certain column before executing the aggregate function on each of these groups:
SELECT YEAR(date), AVG(daily_precip) FROM DailyWeatherReports GROUP BY YEAR(date);
The GROUP BY clause, along with the scalar function YEAR, allows us to group the table by common year values and then execute the AVG function on all the row groups with common date values. We can further refine our query by using the HAVING clause, which is a bit like a WHERE clause but operates to restrict which rows the GROUP BY sub-clause sees. For example, to return the list of years in which the average rainfall exceeded 50mm, we might write our query as follows:
SELECT YEAR(date), AVG(daily_precip) FROM DailyWeatherReports GROUP BY YEAR(date) HAVING AVG(daily_precip) > 50;
The ability to combine all of these features into statements can let us write extremely powerful queries. For example, to get the average rainfall between the years 1990 and 2000 for all cities with an average annual rainfall exceeding 50mm
SELECT YEAR(date), AVG(daily_precip) FROM DailyWeatherReports WHERE YEAR(date) BETWEEN 1990 AND 2000 GROUP BY YEAR(date) HAVING AVG(daily_precip) > 50;