Subqueries | Java Oracle Database Development

SQL, as we have seen, has just a few types of DML statements and each of these has several optional clauses. This, I hope you've found, makes it easy to learn the basics. At the same time, the number of combinations of options that are possible can be overwhelming. This is one of the things that make SQL such a powerful and expressive language. One combination is especially formidable: the subquery, which is a SELECT statement within another SQL statement.

Subqueries are important in a relational database for the same reason that joins are. Because we store data in normalized tables, we frequently have to combine data from multiple tables or use data from one table to determine what data to use, change, or delete in another table.

The most important use of subqueries is with SELECT statements. Queries within subqueries are similar to joins and are, in fact, often interchangeable with them. But subqueries aren't just for queries they're much more versatile than that. They can also be used with CREATE TABLE to create and populate a new table from an existing one, with INSERT and UPDATE to copy data from one table to another and with DELETE to determine which rows in a table to delete.

Single-Value versus Multiple-Value Subqueries

Single-value subqueries are the most common and useful because they can be used almost anywhere that SQL expects a value. Single-value queries are commonly used in a condition, such as in a WHERE clause. The most common type of single-value subquery is one that uses an aggregate function, such as AVG(), MIN(), MAX(). We could use a single-value subquery to find, for example, which students scored higher than the class average.

Multiple-value subqueries are single-column, multiple-row subqueries essentially lists and can be compared with a single value, using the list operator IN or a comparison plus either the ANY or ALL operator. For example, if we have a table of people's addresses and a table of zip codes for different cities, we can find which people live in a specific city by using a multiple-value subquery; in this case, the subquery would obtain the list of zipcodes for the selected city and the outer query would compare each person's zipcode to the zipcodes in the list.

Both single-value subqueries and multiple-value subqueries are commonly used in the WHERE clauses of DELETE, UPDATE, and SELECT statements.

Multiple-column, multiple-row subqueries are essentially tables and can be used almost anywhere a table can, particularly in the FROM clause in a SELECT statement. They can also be used in place of the VALUES clause in an INSERT statement.

Correlated versus Noncorrelated Subqueries

The simplest way to use a subquery is to have it simply provide a value, multiple values, or multiple rows to the SQL statement that includes it. This type of subquery that operates entirely independently and unaware of the outer statement is called a noncorrelated subquery.

A subquery can also use values from the enclosing SQL statement. This is most common when the outer statement is a SELECT statement, as well, and the subquery compares columns from the table in the outer statement's FROM clause with columns from a table in its own FROM clause. This type of subquery is called a correlated subquery. You might think of a correlated subquery as a kind of subroutine. As each row in the outer query is processed, it calls the subquery with a new set of values, which the subquery uses to calculate values that it returns to the outer statement.

Queries with Subqueries: A SELECT within a SELECT

By far, subqueries are most commonly used with SELECT statements, usually as part of a WHERE clause. In Oracle, they can also be used as part of the select list in the FROM clause and in a HAVING clause. We'll see WHERE clauses that use single-value and multiple-value subqueries some noncorrelated and some correlated.

Single-Value Subqueries

Within a SELECT statement, the most common way to use a subquery is as a single-value subquery in the WHERE clause as part of a comparison. To ensure that the subquery will return a single value, we can either set the appropriate conditions in the subquery's WHERE clause based on a specific primary key value or our knowledge of the data or we can use an aggregate function.

We've seen that we can use joins to combine information from multiple tables. We saw the following example when we were looking at joins: The WHERE clause joins the ANTIQUE_CARS and CAR_CLUB tables on the MODEL column. Then we set an additional condition, our query condition We're looking for the owner of a Stutz:

 SQL> SELECT C.OWNER, C.MODEL, A.MAKE   2  FROM ANTIQUE_CARS A JOIN CAR_CLUB C   3  ON A.MODEL=C.MODEL   4  WHERE A.MAKE='Stutz'; OWNER        MODEL        MAKE ------------ ------------ ------------ Lou          Bearcat      Stutz

This is how we can find out the same information using a single-value query:

 SQL> SELECT OWNER   2  FROM CAR_CLUB   3  WHERE MODEL=(SELECT MODEL   4               FROM ANTIQUE_CARS   5               WHERE MAKE='Stutz'); OWNER ------------ Lou

There is one important difference between these results, and that is, when we use a join, we have access to information in both of the tables. Presumably, we wanted to know only who the owner of the Stutz is, but if we had wanted to list the MAKE from the ANTIQUE_CARS column in the results, as we did with the join, we couldn't do that with a single subquery like this. Aside from this limitation, a join and a subquery of this type are roughly comparable in terms of efficiency and ease of use. Choosing one form over the other is largely a matter of personal preference.

There are other cases, however, where a subquery is more appropriate to use than a join. This is especially true when we want to use an aggregate function.

Suppose that we wanted to list any fruits in our FRUITS table that cost less than the average. Our first thought might be that we can do this with a simple query, as follows:

 SQL> SELECT FRUIT, PRICE FROM FRUITS   2  WHERE PRICE<AVG(PRICE); WHERE PRICE<AVG(PRICE)             * ERROR at line 2: ORA-00934: group function is not allowed here

The problem, in general, is that we can't mix aggregate functions such as AVG() with nonaggregated columns. We can, however, use a subquery (a single-value, noncorrelated subquery) in the WHERE clause to calculate the average:

 SQL> SELECT FRUIT, PRICE FROM FRUITS   2  WHERE PRICE < (SELECT AVG(PRICE) FROM FRUITS); FRUIT           PRICE ---------- ---------- Apple              .5 Apple              .5 Mango             1.5 Mangosteen          2 Orange              1

The basic format for a SELECT statement with a single value subquery is:

 SELECT select_list FROM table_list WHERE expression comparison_operator       (SELECT {aggregate_function(column)}|column       FROM table_list       WHERE condition) [ORDER BY order_list];

The subquery should return a single value, either because the WHERE clause in the subquery ensures that only a single row is selected by referencing the primary key, for example or because the aggregate function rolls up the values from multiple rows into a single value.

Multiple-Value Subqueries

For some types of queries, we need to use a subquery that returns a list of values: a single-column, multiple-row result. Then we can use a list operator, such as IN, or the combination of a comparison operator and ANY or ALL to compare each row in the outer query with this list.

The basic format for a multiple-value query is:

 SELECT select_list FROM table_list WHERE expression {[NOT] IN}|                  {comparison_operator ANY|ALL}       (subquery) [ORDER BY order_list];

Let's take a look at another of the join examples and see how we can accomplish the same thing with a subquery:

 SQL> SELECT *   2  FROM ANTIQUE_CARS, CAR_CLUB   3  WHERE ANTIQUE_CARS.MODEL=CAR_CLUB.MODEL; MODEL        MAKE         OWNER        MODEL ------------ ------------ ------------ ------------ Bearcat      Stutz        Lou          Bearcat Terraplane   Hudson       Bob          Terraplane

In this example, we joined the ANTIQUE_CARS table with the CAR_CLUB table to find out who in the car club owns an antique car. We can find out the same information by using a multiple-value subquery with the IN list operator. The subquery returns a list of all the entries in the MODEL column in the ANTIQUE_CARS table, and, for each row in the CAR_CLUB table, the outer query attempts to find the MODEL in the subquery list:

 SQL> SELECT *   2  FROM CAR_CLUB   3  WHERE MODEL IN (SELECT MODEL   4                  FROM ANTIQUE_CARS); OWNER        MODEL ------------ ------------ Lou          Bearcat Bob          Terraplane

We can also find out which members of the club own cars that are not antiques; we simply add the NOT operator to IN:

 SQL> SELECT *   2  FROM CAR_CLUB   3  WHERE MODEL NOT IN (SELECT MODEL   4                      FROM ANTIQUE_CARS); OWNER        MODEL ------------ ------------ Sally        Corvette Jane         Mustang

One advantage of using subqueries is that they are more flexible than joins, and they are easier to understand and modify. If you remember, a join (at least conceptually) starts out with the Cartesian product every possible combination of rows from the two tables which the condition filters to the required (and manageable) result. Negating the join condition in the join that is equivalent to this subquery does not produce an equivalent result:

 SELECT * FROM ANTIQUE_CARS, CAR_CLUB       WHERE ANTIQUE_CARS.MODEL!=CAR_CLUB.MODEL;

The result of running this is not listed here, because this join does not return the list of club members who do not own antique cars; it returns the Cartesian product, excluding only the two rows where ANTIQUE_CARS.MODEL and CAR_CLUB.MODEL are equal 14 rows of meaningless results.

We can use subqueries with the operators ALL or ANY in combination with a comparison operator. A comparison operator plus ALL or ANY is effectively a list operator, such as IN.

ALL means, in effect, that the comparison is applied to each item in the list and the results are logically combined with AND the comparison must be true for ALL items:

X > ALL(list): is true if X is greater than every item in list; i.e., X is greater than the largest item in the list
X >= ALL(list): X is greater than or equal to the largest item in list
X < ALL(list): is true if X is less than every item in list; i.e., X is less than the smallest item in the list
X <= ALL(list): X is less than or equal to the smallest item in list
X = ALL(list): is true if X is equal to every item in list or, stating it conversely, every item in the list is equivalent to X
X != ALL(list): is true if X is not equal to any item in list; there is no item in the list equivalent to X

ANY means, in effect, that the comparison is applied to each item in the list and the results are logically combined with OR the comparison must be true for at least one item. Oracle also has the operator SOME, which is equivalent:

X > ANY(list): is true if X is greater than any (some) item in list or, stating it conversely, there is at least one item in the list smaller than X
X >= ANY(list): there is at least one item in the list smaller than or equal to X
X < ANY(list): is true if X is less than any item in list or, stating it conversely, there is at least one item in the list larger than X
X <= ANY(list): there is at least one item in the list larger than or equal to X
X = ANY(list): is true if X is equal to any item in list; this is equivalent to IN
X != ANY(list): is true if X is not equal to any item in the list. This means that there is at least one value in the list that is not equal to X.

Correlated Subqueries

In a correlated query, the subquery uses values from the outer query. The most common use of a correlated query is with the EXISTS operator. In its most basic form, it is an alternative to using IN. Here is a query with EXISTS that is equivalent to an example with IN that we saw above:

 SQL> SELECT OWNER, MODEL   2  FROM CAR_CLUB C   3  WHERE EXISTS (SELECT *   4                FROM ANTIQUE_CARS   5                WHERE MODEL=C.MODEL); OWNER        MODEL ------------ ------------ Bob          Terraplane Lou          Bearcat

Notice that we use the asterisk (*) as the select list in the subquery. That is because we aren't returning anything from the subquery to the outer query other than whether or not there is a row that matches the condition in the WHERE clause.

Also notice that in the subquery we need to qualify the columns from the outside query's table. Within the subquery, if there are columns with the same name in both the subquery's table and the outer query's table, the column is assumed to belong to the subquery's table if it isn't qualified as belonging to the outer query's table. We can use the table name or, more conveniently as we do here, an abbreviated alias.

The advantage of using a correlated subquery over an uncorrelated one is that we can use more than one column in our matching condition, and we can match on more complex conditions.

Suppose that we want to find out which fruits in the FRUIT table are ripe at least, as far as we can tell using the following FRUIT_COLORS table:

 FRUIT_COLORS FRUIT      VARIETY          RIPE_COLOR Apple      Granny Smith     Green Apple      McIntosh         Red Orange     Mandarin         Orange Banana     Cavendish        Yellow

We can do that with a correlated subquery that matches rows in the outer query to rows in the subquery. What we want to know is whether, for each set of FRUIT and COLOR in the FRUIT table, there exists a matching entry in the FRUIT_COLORS table. For example, APPLE and GREEN match, but ORANGE and GREEN do not.

 SQL> SELECT FRUIT, COLOR   2  FROM FRUITS F   3  WHERE EXISTS   4    (SELECT *   5     FROM FRUIT_COLORS   6     WHERE F.COLOR=RIPE_COLOR AND F.FRUIT=FRUIT); FRUIT      COLOR ---------- ---------- Apple      Green Apple      Red Orange     Orange

We could also use an alias, F, for the FRUITS table to make the statement less cluttered:

 SELECT FRUIT, COLOR FROM FRUITS F WHERE EXISTS   (SELECT *    FROM FRUIT_COLORS    WHERE F.COLOR=RIPE_COLOR AND F.FRUIT=FRUIT);

Also notice again that we use (*) for the columns we aren't interested in any particular column, just whether or not a row matching the WHERE condition exists.

We also could have used a join to get the same information:

 SQL> SELECT F.FRUIT, F.COLOR   2  FROM FRUITS F, FRUIT_COLORS FC   3  WHERE F.COLOR=FC.RIPE_COLOR   4  AND F.FRUIT=FC.FRUIT; FRUIT      COLOR ---------- ---------- Apple      Green Orange     Orange Apple      Red

The join is more efficient because the subquery is repeated for each row in our query. The join also has another advantage: We can easily use values from the second table in a select list. Once we've matched a fruit and color from one table with a row in the other table, we can identify the variety:

 SQL> SELECT F.FRUIT, F.COLOR, FC.VARIETY   2  FROM FRUITS F, FRUIT_COLORS FC   3  WHERE F.COLOR=FC.RIPE_COLOR   4  AND F.FRUIT=FC.FRUIT; FRUIT      COLOR      VARIETY ---------- ---------- ---------------- Apple      Green      Granny Smith Orange     Orange     Mandarin Apple      Red        McIntosh

It's possible to do this with subqueries, but it's messy. We have to use two separate subqueries one in the WHERE clause to set the condition and another similar one in the SELECT statement to get the result:

 SQL> SELECT FRUIT, COLOR,   2  (SELECT VARIETY FROM FRUIT_COLORS   3      WHERE F.COLOR=RIPE_COLOR AND F.FRUIT=FRUIT)   4      AS VARIETY   5   FROM FRUITS F   6   WHERE EXISTS   7     (SELECT *   8     FROM FRUIT_COLORS   9          WHERE F.COLOR=RIPE_COLOR AND F.FRUIT=FRUIT); FRUIT      COLOR      VARIETY ---------- ---------- ---------------- Apple      Green      Granny Smith Apple      Red        McIntosh Orange     Orange     Mandarin

Using Subqueries with CREATE, INSERT, UPDATE, and DELETE

Subqueries within CREATE, INSERT, UPDATE, and DELETE statements aren't as indispensable as they are within SELECT statements. They are sometimes convenient, however, so we'll take a brief look at a few of these uses for subqueries.

CREATE TABLE

You can substitute the column list in a CREATE TABLE statement with AS plus a subquery if you want to duplicate an existing table. The names and types of the columns returned by the subquery determine the names and types of columns in the table. The rows returned by the subquery are inserted into the table after the table is created. The simplest case, creating a copy of a table with the same columns and rows, has this basic format:

 CREATE TABLE target_table AS SELECT * from source_table;

You can specify a select list if you want to copy only some columns or want to create new columns, using functions and expression. You can add a WHERE clause if you want to copy only certain rows. (If you wanted to copy structure but no rows, you could specify an impossible WHERE clause, such as WHERE 1=2.) This more general format is:

 CREATE TABLE target_table AS SELECT select_list FROM table_list WHERE condition;

Suppose we wanted to create a new car club exclusively for owners of antique cars. While we're at it, we'll include not only the make but also the model in the table this means we'll use a join in our subquery. We could create a new table like this:

 SQL> CREATE TABLE ANTIQUE_CAR_CLUB   2     AS SELECT C.OWNER, C.MODEL, A.MAKE   3             FROM CAR_CLUB C, ANTIQUE_CARS A   4             WHERE C.MODEL=A.MODEL; Table created.

Let's take a look at our new, fully furnished table:

 SQL> SELECT * FROM ANTIQUE_CAR_CLUB; OWNER        MODEL        MAKE ------------ ------------ ------------ Lou          Bearcat      Stutz Bob          Terraplane   Hudson

An important thing to note, however, is that CREATE TABLE with a subquery copies only the rows and columns of a table. It does not copy any constraints, such as primary or foreign keys, or indexes. Another thing to note is that this statement causes an automatic commit, as is typical for DDL statements; this commit includes the rows that are inserted.

INSERT

Subqueries can be used in INSERT statements to populate one table with data from another table. This is easiest if the tables have the same structure they must have the same number of columns, compatible by types, column by column and you want to copy all the rows from one table:

 INSERT INTO target_table; SELECT * FROM source_table;

As I've warned elsewhere, it isn't a good idea to depend on the columns in a table being in a specific order. Columns sometimes get added or dropped. Although optional, it's best to specify both the column list for the insert table and the select list for the subquery. The general format is:

 INSERT INTO target_table (column list) SELECT select_list FROM table_list WHERE condition;

Let's suppose that we want to create a unified list of cars by combining the ANTIQUE_CARS and MODERN_CARS. First we'll create the table using CREATE TABLE with the subquery we just saw:

 SQL>    CREATE TABLE ALL_CARS   2     AS SELECT * FROM ANTIQUE_CARS; Table created.

It already has the cars from the ANTIQUE_CARS, so let's add the cars from MODERN_CARS:

 SQL> INSERT INTO ALL_CARS (MAKE, MODEL)   2  SELECT MAKE, MODEL FROM MODERN_CARS; 3 rows created.

Now let's see what we've got:

 SQL> select * from all_cars; MODEL        MAKE ------------ ------------ Terraplane   Hudson Firedome     DeSoto Bearcat      Stutz Firedome     DeSoto Mustang      Ford Corvette     Chevrolet Cherokee     Jeep 7 rows selected.

As we expect, this new table now contains the combined information from the two tables we used to create it.

UPDATE

We can use subqueries in two ways with UPDATE. First, we can use a single-value subquery with a SET clause to obtain values from other tables. Second, we can use a subquery in the WHERE clause to determine which record or records get updated.

Let's look at the first use, updating a field using a value obtained by a subquery. The basic format is:

 UPDATE table_name SET column_name1= {value1|subquery1}    [, column_name2= {value2|subquery2} [,...]] WHERE condition;

Before trying this and any of the remaining examples in UPDATE and DELETE, you may want to COMMIT your changes up until this point so that you can ROLLBACK the changes afterward.

Suppose that a car club member, Bob, sold his Stutz and bought a DeSoto. Assuming that, in this closed universe, the only DeSoto model is the one in the ANTIQUE_CARS table, we can update the CAR_CLUB by obtaining the MODEL from ANTIQUE_CARS using a subquery:

 SQL> UPDATE CAR_CLUB   2  SET MODEL = (SELECT MODEL   3               FROM ANTIQUE_CARS   4               WHERE MAKE='DeSoto')   5  WHERE OWNER='Bob'; 1 row updated.

We can also use a subquery in an UPDATE statement's WHERE clause. Let's double the price of each fruit in the FRUITS table that is ripe, according to the FRUIT_COLORS table. For this, we'll use a correlated subquery:

 SQL> UPDATE FRUITS F   2     SET PRICE=PRICE*2   3     WHERE EXISTS (SELECT * FROM FRUIT_COLORS C   4                   WHERE F.FRUIT=C.FRUIT   5                   AND F.COLOR=C.RIPE_COLOR); 3 rows updated. SQL> SELECT FRUIT, PRICE FROM  FRUITS; FRUIT           PRICE ---------- ---------- Apple               1 Apple               1 Mango             1.5 Mangosteen          2 Durian             15 Orange              2 6 rows selected.

DELETE

In a DELETE statement, we can use a subquery in the WHERE clause to determine which rows will be deleted. The basic format is:

 DELETE FROM table_name WHERE condition(subquery);

Let's remove any fruits from the FRUIT_COLORS table that don't appear in the FRUITS table. (Actually, there is only one, banana.)

 SQL> DELETE FROM FRUIT_COLORS   2     WHERE FRUIT NOT IN (SELECT FRUIT   3                         FROM FRUITS); 1 row deleted.