Using a database involves defining the database, modifying the contents, and using the information. This section will show examples of SQL for these tasks.
How you create databases will depend on your Internet service provider (ISP). In the next chapter, we will suggest methods for Access and for MySQL that do not involve creating the database under program control. However, here is the SQL command:
CREATE DATABASE store
where create and database are keywords, and store would be replaced by whatever name you choose for your database. The next step is to define the tables for the database. Again, the suggested practice for Access and MySQL will differ. Here is the command for MySQL:
CREATE TABLE catalog
(prod_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
cost FLOAT(2), category CHAR(20), description CHAR(50))
This command creates a table named catalog in the current database. Establishing the current database is part of the connection process. The records in catalog have five fields. Note that the line spacing is to make this clearer and is not necessary.
The first field is named prod_id. Its datatype is an integer (whole number). The command indicates that it cannot be null. The auto_increment directive establishes that this field is to be assigned by MySQL. The last aspect of the prod_id field is that it is set to be the primary key. Telling the DBMS that this is a primary key means that the system builds an index using this field to speed up finding records using the primary key. You can designate other fields as being of type index. The designation of primary key would also put in a mechanism to check that this field contains unique values. However, since this field is also designated as auto_increment, the uniqueness is assured.
The prod_name, category, and description fields are each character strings, indicated here by CHAR. The size limits are different with the prod_name and category limited to 20 characters and the description field allowed 50. It is possible to specify a variable-length field and very large fields, but fixed-length character strings are most appropriate for this particular application.
The cost field is set to be a decimal number with two decimal places.
Field names, like variable names, are up to you, but some general rules apply. In Access, you can use spaces in the field names. However, this requires that you use square brackets in the SQL statements. For this reason, the practice here will be to use underscores and no spaces. Make names long enough to be recognizable but no longer, so that it is neither a strain to write them or a strain to remember what they represent.
In this section, you will see other instances of the create table statement. You can try to write these statements for the customers, orders, and ordered_item tables, or the course table that have been discussed. The prior discussion did not indicate the physical details such as the length of character fields, so you are on your own for deciding on these limits.
The alter statement can be used to alter the field definitions in a table, even if records have already been inserted. For example, you can add a field of information:
ALTER TABLE catalog ADD price FLOAT(2)
This particular statement would add a price field in addition to the cost field indicated in the original create table statement. Note that the existing records would not have the price field set—see the use of update statement later to set these values. New fields are added after the last field unless you add a clause to the alter statement:
ALTER TABLE catalog ADD price FLOAT(2) AFTER cost
This last alter statement positions the price right after the cost field.
The following statement directs the system to create an index for the prod_name field:
ALTER TABLE catalog ADD INDEX nameindex (prod_name)
This would be an appropriate thing to do if there were many queries against the database using the prod_name field. The use of indexes improves performance.
Let us assume that the database and its table are created. The next step is to put some records into the table. The SQL statement to add a record is INSERT. Here is an example of an insert statement, with made-up values.
INSERT INTO catalog VALUES (0, 'butterfly', 5.95, 12.50, 'animal', 'simple butterfly')
This statement inserts one record into the catalog table. The values in the parentheses following the term VALUES are the values for the fields, with the exception of the zero. The order of the values must correspond to the original definition of the table.
Recall that the prod_id field is set to be AUTO_INCREMENT. MySQL (and Access) will generate the next integer value for the prod_id field. The 5.95 is the cost for chocolate pie, and the 12.50 is the price.
In a program, the typical insert statement would use variables in place of the constants: “butterfly,” 5.95, 12.50, “animal3,” and “simple butterfly.” One of the intricacies of producing the query character strings is that you will need to insert the single quotation marks to denote character strings.
Other formats exist for the insert statement, some of which allow you to indicate only some of the fields. If there has been a lot of activity changing the definition of the table, it might be prudent to use the following form for insertions:
INSERT INTO catalog (prod_id, prod_name, cost, price, category, description) VALUES (0, 'dollar butterfly', 4.50, 9.00, 'money folds', 'dollar version of simple butterfly'
The order of the fields is up to you: that is, you need to name the fields to correspond with the order in which you give the values. In this situation, you could omit the prod_id and the system would generate the value.
What if you want to change one or more fields in an existing record or records? The update query performs that task. Let us assume that the record with prod_id equal to 5 needs to have the description changed. Yes, this is a fictional situation.
UPDATE catalog SET description='intermediate level butterfly model' WHERE prod_id=5
A much less artificial use of update would be the following:
UPDATE catalog SET price=cost*2
This sets the price field to be twice the cost field for all products. This would be appropriate if you did indeed add the price field to a table with data already in it. You could then make adjustments, such as the following showing another use of update.
UPDATE catalog SET price=price*1.07
Because no WHERE condition has been specified, this statement changes the price field of each record to 1.07 times the original price. To put it another way, the prices have been increased 7%. Another “wholesale” change would be the following. Let us assume that the catalog has more than 10 records.
UPDATE catalog set price=price*1.07 order by price limit 10
This statement orders the records by the price field. It then performs the update operation on the first 10 records (products). This has the effect of raising the prices on the lowest (cheapest) 10 items.
Of course, after adding the price field to the table, you can set the price field of new records by including a value for price in the insert statement.
The SQL statement to delete records from a database is used as follows:
DELETE FROM catalog WHERE prod_id=5
Another example would be:
DELETE FROM catalog WHERE cost<1.00
The next statement would remove all records from the table:
DELETE FROM catalog
Similarly, the following statement removes the table:
DROP TABLE catalog
As unlikely as it might seem, you might have cause to use the delete statements, especially during the development stages of implementation. However, once the system is up and running, it might be appropriate to limit the chances of making some catastrophic mistake such as deleting all the records. One approach is to make use of the permissions feature of a DBMS product to restrict what privileges users have. In MySQL, the grant statement allows you to set privileges to any or all of the following commands: ALTER, CREATE, DELETE, DROP, INSERT, SELECT, and UPDATE at a table or field level to specific users.
The select statement gathers information from the database, including making calculations involving the fields in one or more tables. The select statement has different clauses and can be quite complex as shown earlier in the explanation of joins. The simplest select query:
SELECT * FROM catalog
This will return what is called a recordset, a special datatype something like a table. It is made up of all the fields of all the records in the catalog table. You can restrict the fields chosen from the table.
SELECT prod_id, prod_name FROM catalog
The preceding returns a recordset made up of just the two fields (columns) cited.
You can also return a recordset revealing a calculated value based on the information in the table. There is a set of what are termed aggregate or summary functions for this purpose:
SELECT COUNT(*) FROM catalog
The preceding will return the number of records in the catalog table, and the next statement:
SELECT SUM(cost) FROM catalog
returns the sum of all the cost fields in the records of the catalog table. In each of these last two cases, the recordset contains exactly one row.
The select statement shows its power when you use conditions. You saw two examples in the previous section describing joins. The where clause can be used in statements involving just one table:
SELECT prod_name, price FROM catalog WHERE cost>5
This produces a list of products with their prices for all products in which the cost is over five dollars. (Note: nothing so far indicates that the currency is dollars as opposed to anything else. The choice of currency will be made apparent if and only if a display is made using dollar signs.) A where condition is made up of the keyword WHERE with a logical expression.
The examples so far have several models in the animal category.
SELECT prod_name, description FROM catalog WHERE category='animal'
You can also impose more than one condition and use expressions. The following would return the list of products, by name and description, in the animal category that, allowing for a tax rate of 8%, are over five dollars in price.
SELECT prod_name, description FROM catalog WHERE category='animal' and (price*1.08)>5
The summary functions and the functions in the conditions used so far can be found in most DBMSs. However, all such products have extensive libraries of operators and functions.
What if you wanted to have a list of all categories mentioned in your database?
SELECT DISTINCT category FROM catalog
The preceding statement will return the set of distinct categories. This is because the distinct term removes any duplicate entries. It can also be used to remove duplicate rows when you are requesting more than one field value. The rows must be duplicates in every field, so, given the presence of primary keys, this would be used when the fields mentioned did not include the primary key field.
The order of the results of a select query defaults to be the order of the items in the table. To specify the order, you use the order by clause. You have seen the case of ordering by cost. You can order by any field. The order of text fields are in alphabetical order:
SELECT prod_id, prod_name FROM catalog ORDER BY prod_name
Assuming the catalog is as given here, the recordset returned by the select query would be the following, which we present to you as an ordered list to emphasize the order. The numbers would not be part of the recordset.
1. 1, bird 2. 3, box 3. 2, crane 4. 4, ornament
If you wanted to count how many products you had in each category, you would use the group clause:
SELECT category, count(*) FROM catalog GROUP by category
This would return (assuming the original table before any butterflies were added):
1. Animal, 2 2. Box, 1 3. Modular, 1
You can change the default ascending order to descending by placing DESC at the end of the statement.
You can calculate more than one summary value for each group. The following will give you the sum of the costs for each group:
SELECT category, count(*), sum(cost) FROM catalog GROUP by category
This would produce:
1. Animal, 2, 5.50 2. Box, 1, 1 3. Modular, 1, 2
Select statements involving multiple tables have been introduced in the previous section. Here are more examples of joins. What if you wanted a list of all customers by name and date of order that ordered models in the animal category? What is the thought process for designing the select statement?
The catalog records contain the category field. That is, only the catalog table indicates anything about animal
The data we want is in the customer table: the cust_name and in the orders table: the date
Consequently, the tables required are the customer, catalog, and orders table.
This is a join of three tables. Let us do this first by where conditions. The start of the query is the following:
SELECT c.cust_name, o.date FROM customer as c, orders as o, catalog as p WHERE …
We interrupt “the thought process” to note that it makes sense to start writing down what will be parts of the query instead of trying to keep it all “in your head.”
The where conditions must connect the orders to the customers using the cust_id:
c.cust_id = o.cust_id
The where conditions must constrain the orders to the products
o.prod_id = p.prod_id
The where conditions must also constrain the products to be ones with category equal to “animal”:
p.category = 'animal'
The final select statement is:
SELECT c.cust_name, o.date FROM customer as c, orders as o, catalog as p WHERE c.cust_id = o.cust_id and o.prod_id = p.prod_id and p.category = 'animal'
You can do the following reasonability test on the select statement. It involved joining three tables. This would require at least two conditions. The third where condition came about by specifying the category to be animal.
Making this reasonability analysis puts us in a good position to create the “on” version for this task. What were “where” conditions become “on” conditions specifying how the tables are to be joined.
SELECT c.cust_name, o.date FROM customer as c JOIN orders as o ON c.cust_id = o.cust_id JOIN catalog as p ON o.prod_id = p.prod_id WHERE p.category = 'animal'
This statement will work as is, but it would be neater to use parentheses to set off one join at a time. One possibility is:
SELECT c.cust_name, o.date FROM (customer as c JOIN orders as o ON c.cust_id = o.cust_id) JOIN catalog as p ON o.prod_id = p.prod_id WHERE p.category = 'animal'
You can put a limit on the number of rows returned by the select query. The LIMIT clause works in two ways:
SELECT prod_name, prod_description FROM catalog LIMIT 2
The preceding limits the returned value to no more than two rows. If there are fewer records than the specified number, you will get the actual number of rows there actually are.
A common requirement is to produce a display with a limited number of records at a time. This is called paging. If you specify two numbers in the limit clause, you get the row indicated by the first number plus however many you indicate by the second number. The following example returns three rows, each containing the product name and the description, with the first row the third (zero indexing) from the database.
SELECT prod_name, description FROM catalog LIMIT 2, 3
If you are preparing these statements within a program, you will use variables for these two values. You would set the first variable to start at zero, and then increment it by the second value.
As stated previously, you can combine all these features of the select statement. For example, consider the statement:
SELECT p.prod_name, count(*) FROM catalog as p JOIN orders as o ON p.prod_id = o.prod_id GROUP BY o.prod_id
This produces a list of product names with the number of times they were ordered.
SELECT p.prod_name, count(o.qty) FROM catalog as p JOIN orders as o ON p.prod_id = o.prod_id GROUP BY o.prod_id
This gives a list of product names with the total quantity ordered.
What if you want to group the results, apply a summary function to each group, such as count or sum, and then apply a condition based on that result? That is, what if you want to just list the products that were ordered more than once? You cannot use a where condition because that action is done prior to grouping. Instead, SQL provides the HAVING clause feature:
SELECT p.prod_name, count(*) as pcount FROM catalog as p JOIN orders as ON p.prod_id = o.prod_id GROUP BY o.prod_id HAVING pcount>1
You can name the result of a summary function even if you do not use it in a having clause.
The previous examples demonstrated joins, using more than one table to extract information from the database. What if you need to use the same table in two different ways? The common example for this situation is a database of airline flights. In the Exercises, you will find questions using the course and prerequisite database mentioned in Chapter 5. Consider a database with two tables. One table holds the information on airports:
airport code: apc
The other table holds flight information:
from airport code: from_apc
to airport code: to_apc
scheduled departure time: depart
scheduled arrival time: arr
Note that the airport code is what we termed a naturally occurring value. Some of the codes such as JFK for JFK in New York City, New York, OAK for Oakland, California, and MIA for Miami, Florida are understandable. However, some, like EWR for Newark, NJ or MSY for New Orleans, LA, must have hidden, historical meaning. However, it would still make sense to use them in a database involving airline flights.
The departure and arrival times would either be stored in the local times or, more likely, Greenwich Mean Time. Dates and times are something that most DBMSs handle in a distinct fashion so you can compare and even do arithmetic. However, the products handle the dates and times in different ways. See the Exercises for an assignment.
What if you wanted to obtain a list of cities for the departure and the arrival for each flight? To follow the pattern of previous examples, we need to use the flights table and the airports table. If you think of joining the tables together using a common field, you notice that there is a problem! What is the common field to mention in the ON or WHERE clause? There are two possibilities for joining the tables together. The point is that it is necessary to use the airports table two distinct ways, and this is exactly what is done. The select statement to produce a list of flight codes, departure city, and arrival city is as follows:
SELECT flights.flight_code, d.city, a.city FROM flights, airports as d, airports as a WHERE flights.from_apc = d.apc AND flights.to_apc=a.apc
It is as if there were two tables. This passes the reasonability test: you have three (virtual) tables and you have two conditions connecting them. The join version of this query would be:
SELECT flights.flight_code, d.city, a.city FROM flights JOIN airports as d ON flights.from_apc = d.apc JOIN airports as a ON flights.to_apc=a.apc
Now suppose that you want to find out about the numbers of flights. Here is a way to derive the answer. The previous query gave a list of all flights with the departing and arriving cities. Is there a query to sort the list to make all the flights for a common route together? This requires use of the ORDER BY clause with two field names. Consider the following query:
SELECT flights.flight_code, d.city, a.city FROM flights, airports as d, airports as a WHERE flights.from_apc = d.apc AND flights.to_apc=a.apc ORDER BY d.city, a.city
This produces a list of flights, departing and arriving city ordered by the departing city first and then the arriving city. Therefore, all the flights for a particular route will appear together in the list.
Now, what do you need to do to get the number of flights, not the individual flight information? The answer is the use of the GROUP BY clause and the COUNT aggregate function. You can write a slight variant of the prior query to produce the number:
SELECT d.city, a.city, COUNT(*) FROM flights WHERE flights.from_apc = d.apc AND flights.to_apc=a.apc ORDER BY d.city, a.city GROUP BY d.city, a.city
Think of the GROUP BY clause as squashing the rows in a group and showing just what the group has in common or summary information. What the elements in the group have in common is what is named by the GROUP BY clause. The following would not make sense. Some systems would produce an error, but others would return an empty recordset.
SELECT flights.flight_code, d.city, a.city, COUNT(*) FROM flights WHERE flights.from_apc = d.apc AND flights.to_apc=a.apc ORDER BY d.city, a.city GROUP BY d.city, a.city
This is because the flights.flight_code information is not common across the group or derivable as a function of the group.