After we have data in our tables, we will want to start retrieving them. This is done in SQL with the SELECT query. The query returns a set of rows (potentially empty) known as a result set. Even if we execute a query to tell us information about the table, such as a request for the number of rows or the average value of a certain field in a row, the results are returned as a result set for consistency.
We will use the Users, Messages, Replies, and Forums tables we created in the previous chapter as examples:
CREATE TABLE Users ( user_id INTEGER AUTO_INCREMENT PRIMARY KEY, user_name VARCHAR(50) NOT NULL, full_name VARCHAR(150), user_email VARCHAR(200) NOT NULL, birthdate DATE, INDEX (user_name) ); CREATE TABLE Forums ( forum_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(200) NOT NULL, description TEXT, owner_id INT FOREIGN KEY (owner_id) REFERENCES Users (user_id) ); CREATE TABLE Messages ( message_id INTEGER AUTO_INCREMENT PRIMARY KEY, author_id INTEGER NOT NULL, date_posted DATETIME, title VARCHAR(150), body TEXT, forum_id INTEGER NOT NULL FOREIGN KEY (author_id) REFERENCES Users (user_id), FOREIGN KEY (forum_id) REFERENCES Forums (forum_id) ); CREATE TABLE Replies ( reply_id INTEGER AUTO_INCREMENT PRIMARY KEY, author_id INTEGER NOT NULL, message_id INTEGER NOT NULL, message_id INTEGER NOT NULL, date_posted DATETIME, title VARCHAR(150), body TEXT FOREIGN KEY (author_id) REFERENCES Users (user_id), FOREIGN KEY (message_id) REFERENCES Messages (message_id) );
In its most basic form, the query looks as follows:
SELECT what FROM location;
The what can be, among other things, one of the following:
The location where the data is fetched is typically the name of a table optionally prefixed by the database in which it resides and a period character (.).
For example, to see the list of forums available in our message board system, we might choose one of the following queries:
SELECT * FROM MessageBoard.Forums; SELECT name, description FROM Forums; SELECT Forums.forum_id,Forums.name FROM Forums;
The results are printed out for you by the client program:
mysql> SELECT forum_id, name FROM Forums; +----------+------------------------+ | forum_id | name | +----------+------------------------+ | 9 | General Discussion | | 10 | n000bs | | 11 | Technical Support | | 12 | Fans of Britney Spears | +----------+------------------------+ 4 rows in set (0.01 sec)
Refining Data Retrieval
When you do not want to fetch all of the data in a tablewhen you want to search for specific dataSQL lets you qualify the SELECT query as follows:
SELECT what FROM location WHERE column operator value;
Table 10-1 shows some of the most important operators.
Most of the operators are reasonably intuitive in their use. For example, use the following to find a list of all users born after 1990:
SELECT * FROM Users WHERE birthdate >= '1990-01-01';
Similarly, to find a list of all forums without a specified owner, we could execute the following query:
SELECT forum_id,name FROM Forums WHERE owner_id = -1;
One of the more interesting operators is the LIKE operator. This, combined with the SQL wildcard character (%)a character that matches against any number of missing characterscan create some truly powerful queries. For example, to find all the clowns in our Users table, we might execute the query:
SELECT user_name FROM Users WHERE full_name LIKE '%clown%';
String searches default to being case-insensitive, meaning that any names with Clown, CLOWN, or clown match our query.
The wildcard character can also be used with other data types, such as dates. To find all users born in 1971 (a most excellent year), we could execute
SELECT * FROM Users WHERE birthdate LIKE '1971-%-%';
(We will see in the section on functions that there are better ways to write this query.) To list all forums beginning with the letter m, we would use the following query:
SELECT forum_id,name FROM Forums WHERE name LIKE 'm%';
The LIKE keyword can be prefixed with the keyword NOT, which causes it to return all values except those that match the specified value:
SELECT * FROM Forums WHERE name NOT LIKE '%General%';
Finally, the IS NULL operator, which can also be used with the keyword NOT, lets us match those columns whose value is NULL.
SELECT * FROM Users WHERE full_name IS NOT NULL;
Combining Table Data Upon Retrieval
One of the characteristics of the normalized tables we created in the previous chapter is that only some of the information that we might like is available in every table. For example, instead of having all the data for users in our Messages table, we have a foreign key pointing to the Users table. However, when we fetch a message from the Messages table to print it on a page, we also want the username of the message poster, not just his numeric user_id. It would certainly be frustrating and inefficient if we had to execute the following code all the time:
SELECT title,body,date_posted,author_id FROM Messages WHERE message_id=594923; get author_id value and use it for: SELECT user_name FROM Users WHERE user_id = author_id_value;
Since the author_id field from the Messages table is the primary key of the Users table, we would prefer if there was a way to get the user's name at the same time we are fetching the message data.
Fortunately, SQL can do this for us via a join. In its most basic form, we can execute a full join, which returns the mathematical cross-product of the two tablesa table containing every possible combination of rows in to the two tables being joined, regardless of whether this makes sense. We do this by specifying two tables in the FROM portion of our SELECT statement:
SELECT * FROM Messages, Users;
If the Messages table has x rows and the Users table has y rows, this query returns a table with x x y rows. What we want to do is select a portion of this table to give us more useful output. For example, to get a basic list of all the messages in our system with the friendly username instead of the author_id, we would want to match all those rows where the user_id from the Users table was the same as the author_id in the Messages table:
SELECT Messages.title, Messages.date_posted, Users.user_name FROM Messages, Users WHERE Messages.author_id = Users.user_id;
By inserting a WHERE clause, we have selected only those rows from the full join that are meaningful or interesting to us (see Figure 10-1).
Figure 10-1. Joining columns from more than one table.
The latest operation is known as an inner join. It retrieves only those rows from the cross product of the two tables that satisfy a condition (for instance, where the user_id and author_id match up). For those struggling to understand how this works, SQL provides us with a more explicit syntax for joins that helps us manage how we think about them. This syntax uses a keyword called JOIN:
SELECT column1, column2, ...columnn FROM table1 INNER JOIN table2s ON table1.keyfield = table2.keyfield;
With this syntax, we specify the two tables to join, table1 and table2, and the condition to satisfy for the subset of rows we want from the join. We can rewrite the previous SELECT statement as follows:
SELECT Messages.title, Messages.date_posted, Users.user_name FROM Messages INNER JOIN Users ON Messages.author_id = Users.user_id;
If there is a row in Messages for which there is an author_id that is not in Users (that would be a bug in our web application) or there is a row in the Users table for which there is no author_id in the Messages table (that user has never posted a message), such rows will not be included in the result set.
We can still qualify these statements with the WHERE keyword. For example, to find all posts in the forum with a forum_id of 0 ("General Discussions"):
SELECT title, date_posted, Users.user_name FROM message INNER JOIN Users ON Messages.author_id = Users.user_id WHERE Messages.forum_id = 0;
In addition to inner joins, there are two other types of join you will commonly encounter called outer joins.
As an example in our message board system, we might want to fetch a list of all those users who have never posted a message. We could do this by doing a left outer join, listing the Users table as the primary table and the Messages table as the one to join. For the rows in the Users table where the user_id does not exist in the Messages table, we will get NULL in the columns from the Messages table. Thus, we could write the following query:
SELECT Users.user_id, Users.user_name FROM Users LEFT OUTER JOIN Messages ON Users.user_id = Messages.author_id WHERE Messages.message_id IS NULL;
Similarly, we could write an administrative query to find messages in the system that have an author_id of a user who does not exist. (Presumably the account has been deleted or has expired.)
SELECT Messages.message_id, Messages.author_id, Messages.title FROM Messages LEFT OUTER JOIN Users ON Messages.author_id = Users.user_id WHERE Users.user_id IS NULL;
If you take some time to think about the definitions for the two types of outer joins, you should eventually be able to see that the following two outer joins are equivalent:
SELECT columns FROM TableA LEFT OUTER JOIN TableB ON TableA.field = TableB.field; SELECT columns FROM TableB RIGHT OUTER JOIN TableA ON TableB.field = TableA.field;
If you are not comfortable with these definitions, do not worry. We will largely use INNER JOINs in this bookwe will typically only want matching results. It is worth populating a few tables with some data and playing with joins; you can switch around the primary and join table to see how the results are affected.
Sorting Retrieved Data
One of the more powerful features of modern relational database management systems is their ability to sort the results of a query. Optionally, we can include the keywords ORDER BY and information on exactly how we would like our data sorted at the end of our SELECT statements.
For example, to retrieve the title and posting date of the messages in our Messages table, sorting them by the posting date, we would run the following query:
SELECT title, date_posted FROM Messages ORDER BY date_posted;
This would return a set of data that could look as follows:
+----------------------------+---------------------+ | title | date_posted | +----------------------------+---------------------+ | This is my first message | 2004-11-20 16:12:18 | | I am writing more messages | 2004-11-21 16:16:22 | | I have to go to work now! | 2004-11-22 08:30:36 | | Any good jobs? | 2004-11-30 09:16:09 | +----------------------------+---------------------+ 4 rows in set (0.03 sec)
We can control how the sorting is done by adding the keyword ASC to indicate we would like to sort in ascending order or DESC to indicate we would like to sort in descending order after the name of the column controlling the sorting. ASC is the default in the absence of this keyword. The same result set would return the following if we sorted in descending order:
mysql> SELECT title, date_posted FROM Messages -> ORDER BY date_posted DESC; +----------------------------+---------------------+ | title | date_posted | +----------------------------+---------------------+ | Any good jobs? | 2004-11-30 09:16:09 | | I have to go to work now! | 2004-11-22 08:30:36 | | I am writing more messages | 2004-11-21 16:16:22 | | This is my first message | 2004-11-20 16:12:18 | +----------------------------+---------------------+ 4 rows in set (0.00 sec)
If we wish, we can sort on more than one field. If we wanted to sort our Users table by birth date and then sort the users alphabetically for those with the same birthday, we could execute the following query:
SELECT user_name,birthdate FROM Users ORDER BY birthdate DESC, user_name ASC;
This tells our database server to return all the records in the Users table sorted by birthdays with the most recent birthdays first, and then sort the user names alphabetically for those with the same birthday.
Fetching a Few Rows at a Time
If we have a table with tens of thousands of users and we want to display a page with those whose username starts with the letter a, we might still find ourselves with hundreds, if not thousands, of matching users. We would likely decide against showing them all at once in a single page and choose to show a certain number (for example, 25) at a time. To repeatedly execute SELECT queries that return thousands of rows and then discard all but 25 seems suboptimal.
To get around this, we can use the LIMIT clause. This lets us specify from which row in the result set to start returning rows and how many rows to return:
SELECT * FROM Users LIMIT x, y;
x is the (zero-based) index of the first row to include in the result set, and y is the number of rows to include. Thus, if we were displaying users having usernames starting with the letter a 25 names at a time and were trying to display the fourth page, we might write the following query:
SELECT user_name, full_name, birthdate FROM Users WHERE user_name LIKE 'a%' LIMIT 75, 25;
To fetch the first 10 records in the table, we would use LIMIT 0, 10. The LIMIT clause is not part of the ANSI SQL standard, but equivalent functionality is supported by most database servers.