| ||
So, you need to know how to access a database using SQL. SQL is the most common form of database access. Additionally, SQL can be used to cater to merging relational databases with XML documents and other XML technology. Lets first briefly describe the basics of the SQL language.
SQL works by reading records from the tables in a relational database.
A table is the structure applied to repetitions of some data item. For example, in order to store customers in a database, you need to create a table for those customers. Those customers have names and addresses. A customer table consists of a field for the customer name and a field for the address of the customer.
A record is the name and address of each customer. Each record is a repetition of the structure of the table (the name and address fields).
SQL consists of a number of sections:
Queries: The SELECT command is used to read data from a relational database. There are various options when using the SELECT command, which can be used to alter a query:
SELECT command: Retrieve all records from a single table.
WHERE clause: Retrieve some records from a table by filtering the records using a WHERE clause. The filter can be used to include a subset of a tables rows, or exclude a subset of a tables rows, or both.
ORDER BY clause: Retrieve records from a table, but change the order that those records are stored in, using the ORDER BY clause.
GROUP BY clause: Before returning records from a database, summarize them into fewer rows using the GROUP BY clause. You can also use the HAVING clause to filter grouped records.
JOIN clause: The JOIN clause can be used to query records from more than one table. The JOIN clause allows you to merge records from two tables based on common values. There are various types of joins creating intersections, outer joins, and various other types of joins.
Subquery : A subquery allows one query to be embedded within another query, such that one query is the calling query, and the other query is executed by the calling query. It is also possible to pass values from the calling query to the subquery. Passing values from calling query to subquery allows the calling query to determine the records that the subquery retrieves from the database, for each record retrieved by the calling query.
UNION clause : The UNION clause can be used to create a composite of two queries where all records are returned from both queries as a merge of the data in two tables. Unlike a join or a subquery, both of which allow matching of values between two tables, a composite query does not allow any matching.
Changing data: INSERT , UPDATE , and DELETE commands can be used to change data in a database:
INSERT command: Add new records to a table using the INSERT command.
UPDATE command: Change existing records in a table using the UPDATE command.
DELETE command: Delete existing records in a table using the DELETE command.
Metadata change commands: Records stored in database tables are the actual data stored in the database, such as the names of customers. The metadata in a relational database is the structure of that data, such as the customer table. The table is the box used to store the customers in. In my example, the customer table contains a field for the customer name and a field for the address of the customer. So the metadata is the data about the data. There are various commands that allow changes to metadata. For example, to create a table for customers you would use a CREATE TABLE command. To change the customer table you could use an ALTER TABLE command. To drop a customer table you would use a DROP TABLE command.
Metadata commands are not covered in this book as they are too far removed from the topic of XML. This book is about XML and databases. XML documents are all about data, not changing the structure of data.
Lets go into some detail on the use of relational database access commands. Lets begin by examining queries.
A query is a question asked of a database. The answer to the question is that records are returned from tables stored in that database. There are various methods of querying data.
The simplest method of writing a query is using a basic SELECT statement, consisting of a SELECT statement and a FROM clause. The SELECT statement determines what is retrieved. The FROM clause determines where data is retrieved from, enhancing the SELECT statement. This is the syntax of the SELECT statement:
SELECT { [alias.]field expression [alias.]* [, field ] } FROM table [alias];
This query finds all records in a table called COUNTRY , in the demographics database:
SELECT * FROM COUNTRY;
The * in the SELECT statement fields list finds all fields in a table.
See Appendix B for details of the demographics database, which is used for examples throughout this book.
This is a partial result of the preceding query (both records and fields have been removed from the output):
COUNTRY_ID REGION_ID COUNTRY CO POPULATION AREA ---------- ---------- -------------------------------- -- ---------- ---------- 1 1 Algeria AG 32930091 2381741 2 1 Angola AO 12127071 1246699 3 1 Benin BN 7862944 110619 4 1 Botswana BC 1639833 585371 5 1 Burkina Faso UV 13902972 273799 6 1 Burundi BY 8090068 25649 7 1 Central African Republic CT 4303356 622980 8 1 Congo CG 62660551 2267599 9 1 Djibouti DJ 486530 21979 10 1 Equatorial Guinea EK 540109 28050 11 1 Ethiopia ET 74777981 1119683
You can also get a single field using a query like this:
SELECT COUNTRY FROM COUNTRY;
And this is a partial result:
COUNTRY -------------------------------- Afghanistan Albania Algeria American Samoa Angola Argentina Armenia Australia Austria Azerbaijan Bahamas
Or you can specify to retrieve a number of fields:
SELECT COUNTRY, POPULATION, AREA FROM COUNTRY;
Again, this is the partial result:
COUNTRY POPULATION AREA -------------------------------- ---------- ---------- Algeria 32930091 2381741 Angola 12127071 1246699 Benin 7862944 110619 Botswana 1639833 585371 Burkina Faso 13902972 273799 Burundi 8090068 25649 Central African Republic 4303356 622980 Congo 62660551 2267599 Djibouti 486530 21979 Equatorial Guinea 540109 28050 Ethiopia 74777981 1119683
You can even execute an expression, as shown previously in the syntax definition of the SELECT statement. This query finds the population density, or the number of people per square kilometer, for each country and rounds it up to zero decimal places:
SELECT COUNTRY, ROUND (POPULATION / AREA, 0) "Population Density" FROM COUNTRY;
The result is shown here. The division by zero error is caused because some of the AREA values in the table are zero. Dividing any number by zero is impossible and thus mathematically undefined:
COUNTRY Population Density -------------------------------- ------------------ Algeria 14 Angola 10 Benin 71 Botswana 3 Burkina Faso 51 Burundi 315 Central African Republic 7 Congo 28 Djibouti 22 Equatorial Guinea 19 Ethiopia 67 Gabon 6 Gambia 0 Ghana 97 Guinea 39 ERROR: ORA-01476: divisor is equal to zero
In the preceding query and query result, the field header is renamed using the quoted string Population Density , appearing after the expression. The quoted value changes the name of the header and not the field name in the query. The AS clause can be used to change a field name, within the current query. The AS clause will be discussed later on in this chapter when it is appropriate.
Before going further it seems sensible to find a simple resolution for the division by zero issue. If you are executing these examples against a database add a function something like this:
CREATE OR REPLACE FUNCTION ZD(pINT IN INTEGER) RETURN VARCHAR2 IS BEGIN RETURN NVL(NULLIF(pINT, 0), 1); END; /
The preceding function will compile for an Oracle database. If using SQL Server or a different relational database, you may have to recode for appropriate syntax.
The query result using the ZD function is as follows :
SELECT COUNTRY, ROUND (POPULATION / ZD(AREA)) "Population Density" FROM COUNTRY;
And a partial query result using the ZD function is as follows:
COUNTRY Population Density -------------------------------- ------------------ Algeria 14 Angola 10 Benin 71 Botswana 3 Burkina Faso 51 Burundi 315 Central African Republic 7 Congo 28 Djibouti 22 Equatorial Guinea 19 Ethiopia 67
The SELECT statement can also use what is called an alias . An alias allows renaming a table during the execution of a query. An alias is defined in the FROM clause, and then used in the list of fields selected to refer to the table found by the FROM clause:
SELECT C.COUNTRY, ROUND (C.POPULATION / C.AREA, 0) "Population Density" FROM COUNTRY C;
The division by zero error should appear again.
In the preceding query, the COUNTRY table is aliased as the letter C . Each of the fields retrieved from the COUNTRY table is then referenced using an alias.field notation. The result of the preceding query will be identical to the previous example. You will learn the real usefulness of using aliases when you read about join queries later on in this chapter.
A join query is a query that joins fields and records together between two tables, returning records from both tables.
The WHERE clause enhances a SELECT statement by filtering out unwanted records returned by a SELECT statement. This is the basic syntax of the WHERE clause:
SELECT ... FROM ... [ WHERE [table.alias.] { field expression } comparison [table.alias.] { field expression } ];
The SELECT statement and the FROM clause are mandatory. The WHERE clause is optional and is thus enclosed in square brackets as in [ WHERE ˆ ] .
This query finds all countries with a population of over 100 million people:
SELECT COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population" FROM COUNTRY WHERE POPULATION > 100000000;
The preceding query uses a special TO_CHAR function in order to format the output of the large num bers into a more readable format.
Also in the preceding query, note the use of the single quote (') character. Which types of quotation char acters are used, and in which types of commands, varies from one database engine to another.
This is the result of the preceding query:
COUNTRY Population -------------------------------- -------------- Nigeria 131,859,731 Mexico 107,449,525 China 1,313,973,713 Indonesia 245,452,739 Japan 127,463,611 Bangladesh 147,365,352 India 1,095,351,995 Pakistan 165,803,560 United States 298,444,215 Russia 142,893,540 Brazil 188,078,227
The WHERE clause can apply multiple filters to selected records. Multiple filters are placed into the same query by joining the two filters together using logical operators called AND , OR , and NOT . Logical operators are shown in the following syntax:
SELECT ... FROM ... [ WHERE filter [ { AND OR } [ NOT ] [table.alias.] { field expression } comparison [table.alias.] { field expression } [ { AND OR } [ NOT ] ... ] ];
This query finds all countries with a population over 100 million people and a population density of over 100 people per square kilometer. The two different comparisons are joined together using the AND logical operator (or conjunction):
SELECT COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", ROUND(POPULATION / ZD(AREA)) "Population Density" FROM COUNTRY WHERE POPULATION > 100000000 AND ROUND(POPULATION / ZD(AREA) , 0) > 100;
Notice how the expression ROUND (POPULATION / ZD(AREA), 0) is used in the WHERE clause as well as retrieved by the SELECT clause. Also, the division by zero error is removed using the AREA field filter.
This is the result of the preceding query:
COUNTRY Population Population Density -------------------------------- -------------- ------------------ Nigeria 131,859,731 145 China 1,313,973,713 141 Indonesia 245,452,739 135 Japan 127,463,611 323 Bangladesh 147,365,352 1100 India 1,095,351,995 368 Pakistan 165,803,560 213
And heres another example using the OR operator in addition to the AND operator:
SELECT COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) "Population Density" FROM COUNTRY WHERE ((POPULATION > 100000000 AND (ROUND(POPULATION / ZD(AREA), 0) > 100)) OR (POPULATION < 100000000 AND (ROUND(POPULATION / ZD(AREA), 0) > 500)));
This query finds heavily populated countries based on population density, or just high density (a smaller country with a high population density). For example, Singapore has a relatively low population but is a very small country, more like a city-state. Singapore is very densely populated because all of its people are packed like sardines onto a small island:
COUNTRY Population Km sq. Population Density -------------------------------- -------------- ---------- ------------------ Mauritius 1,240,827 1849 671 Nigeria 131,859,731 910771 145 Barbados 279,912 430 651 Bermuda 65,773 49 1342 Malta 400,214 321 1247 China 1,313,973,713 9326411 141 Indonesia 245,452,739 1811831 135 Japan 127,463,611 394744 323 Singapore 4,492,150 624 7199 Taiwan 23,036,087 32261 714 Bahrain 698,585 619 1129 Bangladesh 147,365,352 133911 1100 India 1,095,351,995 2973190 368 Pakistan 165,803,560 778720 213
There are also types of comparisons you can use as shown in the following query, including <= , >= , BETWEEN n AND m, LIKE 'pattern' , and of course = for equality:
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) "Population Density" FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100) OR (ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10)) AND REGION_ID = 1 AND COUNTRY LIKE '%a%';
The preceding query finds heavily populated countries in region 1 (the continent of Africa), as long as the country name contains a letter a somewhere in its name. The result is this:
REGION_ID COUNTRY Population Km sq. Population Density ---------- ------------------------ -------------- ---------- ------------------ 1 Angola 12,127,071 1246699 10 1 Botswana 1,639,833 585371 3 1 Central African Republic 4,303,356 622980 7 1 Equatorial Guinea 540,109 28050 19 1 Gabon 1,424,906 257669 6 1 Gambia 0 10000 0 1 Ivory Coast 0 0 0 1 Libya 5,900,754 1759540 3 1 Mali 11,716,829 1219999 10 1 Mauritania 3,177,388 1030400 3 1 Namibia 2,044,147 823291 2 1 Zaire 0 0 0 12 records
Another important factor to consider in SQL at this stage is the concept of precedence. Precedence of execution is the sequence in which the different filters are applied to the records retrieved by the SELECT statement. The previous query is duplicated here but formatted slightly differently, with everything spread out more clearly according to precedence of execution established by use of the round brackets:
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) "Population Density" FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND (POPULATION / ZD(AREA) , 0) <= 100) OR (ROUND (POPULATION / ZD(AREA) , 0) BETWEEN 0 AND 10)) AND REGION_ID = 1 AND COUNTRY LIKE '%a%';
This is the sequence in which the preceding query WHERE clause comparisons are executed, based on the way that the round brackets determine precedence of execution:
The top-most ROUND function against the <= 100 comparison is executed:
The ROUND function is executed first as determined by expression precedence. At this point, I am dis cussing WHERE clause filtering comparison precedence.
AND ROUND (POPULATION / ZD(AREA) , 0) <= 100
The second comparison ( BETWEEN 0 AND 10 ), containing the ROUND function is executed:
ROUND (POPULATION / ZD(AREA) , 0) BETWEEN 0 AND 10
The POPULATION <= 1000000 is then logically combined with the result of the comparison in Step 1:
(POPULATION <= 1000000 AND ROUND (POPULATION / ZD(AREA) , 0) <= 100)
The results of the comparisons in Steps 3 and 2 are then combined using the OR logical operator, meaning that either Step 2 or Step 3 can be true to produce a true result:
((POPULATION <= 1000000 AND ROUND (POPULATION / ZD(AREA) , 0) <= 100) OR (ROUND (POPULATION / ZD(AREA) , 0) BETWEEN 0 AND 10))
Finally the two AND logical operators, executed as the AND REGION_ID = 1 comparison, followed by the AND COUNTRY LIKE '%a%' comparison, are executed to produce a resulting true or false result for every record retrieved by the SELECT statement:
AND REGION_ID = 1 AND COUNTRY LIKE '%a%'
Technically, ZD(AREA) is computed first, but since the ZD function is used everywhere for a population density calculation, I have assumed it a part of the contained expression in this situation. The objective is to give you an overall picture of the process.
Just to demonstrate the use of round brackets and how they apply precedence, the query that follows is the same as the previous query, except that I have removed all the round brackets, which determine the precedence of execution of comparisons in the WHERE clause (excluding those encapsulating the ROUND function as an expression):
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) "Population Density" FROM COUNTRY WHERE POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100 OR ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10 AND REGION_ID = 1 AND COUNTRY LIKE '%a%';
In this situation the AND operator has higher precedence than the OR operator and thus all operators are processed as they are encountered (left to right and top to bottom). As shown in the code that follows, the result of the query is completely different. This is a partial result and even this shows extra countries, which are not even in Africa:
REGION_ID COUNTRY Population Km sq. Population Density ---------- ------------------------ -------------- ---------- ------------------ 1 Angola 12,127,071 1246699 10 1 Botswana 1,639,833 585371 3 1 Central African Republic 4,303,356 622980 7 1 Djibouti 486,530 21979 22 1 Equatorial Guinea 540,109 28050 19 1 Gabon 1,424,906 257669 6 1 Gambia 0 10000 0 1 Ivory Coast 0 0 0 1 Libya 5,900,754 1759540 3 1 Mali 11,716,829 1219999 10 1 Mauritania 3,177,388 1030400 3 1 Namibia 2,044,147 823291 2 1 Zaire 0 0 0 4 Bahamas 0 10070 0 6 Cyprus 784,301 9241 85 6 Iceland 299,388 100251 3 6 Northern Ireland 0 0 0 6 Scotland 0 0 0 6 Slovak Republic 0 48800 0 6 Yugoslavia 0 102136 0 10 Greenland 56,361 341701 0 11 Comoros Islands 0 0 0
In the preceding query, excluding comparison precedence round brackets, there are now 18 records instead of the original 9 records. Djibouti is in Africa but has too low a population, failing the LIKE operator test. The rest of the countries have too low a population, none are in Africa, some fail the LIKE test, and many fail the population density test. However, population density is not a prerequisite because it can be overridden by the low population size comparison. That clearly demonstrates why the round brackets are used to change the precedence of execution, of comparisons, in the WHERE clause.
The only other rule to remember is that the logical operators have an overriding order of precedence. NOT is executed first, followed by AND , and finally by OR . Use of NOT AND or NOT OR requires failure of a test to get a true result.
The precedence in WHERE clause comparisons determines the sequence in which those comparisons are executed. Mathematically speaking, an expression can be executed in a specific order.
An expression is a single valued, or multiple valued, calculation that when evaluated produces a single result. The number 2 is an expression because its result is 2. The string (5*10)+3 is also an expression because the result is 53. More abstractly, (x*y)+z is also an expression because it equals (x*y)+z.
The expression ((100/((x+y)*z))-5) is an expression in itself, and also contains multiple expressions within it, all producing a single result, where each result is passed back to the calling (or container) expression. This expression can be divided up as follows, where expressions proceed outwards with indentations. It begins with the expression x+y, followed by (x+y)*z, 100/((x+y)*z), (100/((x+y)*z))-5, and finally the round brackets around the entire expression. The round brackets are unnecessary but make the resulting expression look a little easier to read:
((100/ ((x+y) *z)) -5)
For an example of expression precedence, consider the following:
((100/((2+3)*2))-5) = ((100/((2+3)*2))-5) = ((100/(5*2))-5) = ((100/10)-5) = (10-5) = 5
Now remove all the brackets from the preceding expression and simple rules of mathematical expression apply, where * and / are executed before + and ˆ , but * has equal precedence to /, and + has equal precedence to ˆ . Additionally, the expression is evaluated (calculated) from left to right:
100/2+3*2-5 = 50+3*2-5 = 50+6-5 = 56-5 = 1
As you see in the preceding code, no round brackets and a different value results. As you saw in queries executed earlier, mathematical expressions are used in SQL, specifically when using the ROUND function in the ROUND (POPULATION / ZD(AREA), 0) calculation. This function executes the division first and then returns a result that is rounded up to the nearest zero. In this query the ROUND function is removed from the fields and expressions retrieved from the table:
SELECT COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", POPULATION / ZD(AREA) "Population Density" FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100) OR (ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10)) AND REGION_ID = 1 AND COUNTRY LIKE '%a%';
The resulting query contains a population density figure that is more difficult to read at first glance:
COUNTRY Population Km sq. Population Density ------------------------ -------------- ---------- ------------------ Angola 12,127,071 1246699 9.72734477 Botswana 1,639,833 585371 2.80135675 Central African Republic 4,303,356 622980 6.90769527 Equatorial Guinea 540,109 28050 19.2552228 Gabon 1,424,906 257669 5.52998615 Gambia 0 10000 0 Ivory Coast 0 0 0 Libya 5,900,754 1759540 3.35357764 Mali 11,716,829 1219999 9.60396607 Mauritania 3,177,388 1030400 3.08364519 Namibia 2,044,147 823291 2.4828973 Zaire 0 0 0
One more point about expression precedence is that any function, such as the ROUND function, has higher precedence than all the arithmetic operators *, /, +, and ˆ . Also some versions of SQL, for some database engines, will use a POWER function. The POWER function raises one number to the power of another. For example 2 3 = 8, but could also be executed as POWER (2, 3) = 8, depending on the database engine. Some database engines even have a special exponent operator for executing exponents, sometimes using a ^ character, or even a ^^ character sequence. And thus 2^3 or 2^^3 might both be equal to 8, depending on the database engine in use. In this case, the precedence of arithmetic operators would be ROUND (or any other function), followed by ^ or ^^, followed by * and /, and finally + and ˆ .
Mathematically, exponentiation is actually the conversion of large numbers to exponential notation where the number 1,000,000,000,000 can also be represented as 1*10 12 , where 12 is the exponent, and the number 1*10 12 is an exponential notational expression of the number 1,000,000,000,000. In all the database engines I have worked with in the past, the term exponent is applied to raising a number to the power of another. The exponent is actually the number that another number is raised by, not the entire expression itself. Thus, in the expression 2 3 , 2 is the number, and 3 is the exponent.
The ORDER BY clause can be used to further enhance a SELECT statement by changing the order in which records are returned. This is the basic syntax of the ORDER BY clause:
SELECT ... FROM ... [ WHERE ... ] [ ORDER BY { field expression [ASC DESC] [ , ... ] } ];
The SELECT statement and the FROM clause are mandatory. The WHERE clause and the ORDER BY clause are both optional and thus enclosed in square brackets as in [ ORDER BY ˆ ] .
The ORDER BY clause always appears last in a query. The ORDER BY clause is applied to the filtered records after the application of the WHERE clause, not before records are filtered.
This is the same query used before, except that now the records are returned in order of decreasing population density, within increasing REGION_ID value:
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) "Population Density" FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100) OR (ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10)) AND REGION_ID IN (1, 13) AND COUNTRY LIKE '%a%' ORDER BY REGION_ID ASC, ROUND(POPULATION / ZD(AREA), 0) DESC;
Like the WHERE clause, expressions can be used in the ORDER BY clause, as in ROUND(POPULATION / ZD(AREA), 0) DESC .
A new comparison is introduced in the form of REGION_ID IN (1, 13) . The IN operator allows val idation against a list of values. In this case, the REGION_ID can be 1 (Africa), or 13 (South America).
Here is the result of the preceding query with Equatorial Guinea and Bolivia being the most densely populated countries in Africa and South America, respectively:
REGION_ID COUNTRY Population Km sq. Population Density ---------- ------------------------ -------------- ---------- ------------------ 1 Equatorial Guinea 540,109 28050 19 1 Angola 12,127,071 1246699 10 1 Mali 11,716,829 1219999 10 1 Central African Republic 4,303,356 622980 7 1 Gabon 1,424,906 257669 6 1 Botswana 1,639,833 585371 3 1 Mauritania 3,177,388 1030400 3 1 Libya 5,900,754 1759540 3 1 Namibia 2,044,147 823291 2 1 Zaire 0 0 0 1 Ivory Coast 0 0 0 1 Gambia 0 10000 0 13 Bolivia 8,989,046 1084389 8 13 Guyana 767,245 196850 4 13 French Guiana 199,509 89150 2 13 Surinam 0 161471 0
The AS clause can be used to rename a field within the scope of a running query. In the previous query the quoted strings Population and Population Density were used to change only the field headers in the output. This next query changes the name of the population density expression ROUND(POPULATION / ZD(AREA), 0) , giving the expression a field name of its own. This allows access to the name of the expression in the ORDER BY clause, rather than duplicating the entire expression:
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) AS DENSITY FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100) OR (ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10)) AND REGION_ID IN (1, 13) AND COUNTRY LIKE '%a%' ORDER BY REGION_ID ASC, DENSITY DESC;
The result of the preceding query is very similar to the previous querys output, as you can see in the code that follows, except that the field name of the calculated expression is now DENSITY as opposed to Population Density :
REGION_ID COUNTRY Population Km sq. DENSITY ---------- ------------------------ -------------- ---------- ---------- 1 Equatorial Guinea 540,109 28050 19 1 Angola 12,127,071 1246699 10 1 Mali 11,716,829 1219999 10 1 Central African Republic 4,303,356 622980 7 1 Gabon 1,424,906 257669 6 1 Botswana 1,639,833 585371 3 1 Mauritania 3,177,388 1030400 3 1 Libya 5,900,754 1759540 3 1 Namibia 2,044,147 823291 2 1 Zaire 0 0 0 1 Ivory Coast 0 0 0 1 Gambia 0 10000 0 13 Bolivia 8,989,046 1084389 8 13 Guyana 767,245 196850 4 13 French Guiana 199,509 89150 2 13 Surinam 0 161471 0
You cannot, however, do as in the following query and replace the density expression in the WHERE clause. The reason is very simple. The ORDER BY clause is always performed after all records have been retrieved from the database. Therefore the calculated expression exists as DENSITY at the point of execution of the ORDER BY clause. The WHERE clause on the other hand is executed as records are retrieved from the database. Consequently, the DENSITY field does not actually exist until after the WHERE clause has been applied to database access. The following query will return an error because the WHERE clause filters cannot find the field called DENSITY :
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) AS DENSITY FROM COUNTRY WHERE ((POPULATION <= 1000000 AND DENSITY <= 100) OR ( DENSITY BETWEEN 0 AND 10)) AND REGION_ID IN (1, 13) AND COUNTRY LIKE '%a%' ORDER BY REGION_ID ASC, DENSITY DESC;
Most database engines allow ORDER BY clause specification not only by field names and expression, but also by the position of a field (or expression), in the list of items retrieved by the SELECT statement. So you could change the ORDER BY clause to that shown in the following query, and get the same result as in the previous query:
SELECT REGION_ID, COUNTRY, TO_CHAR(POPULATION, '9,999,999,990') "Population", AREA "Km sq.", ROUND(POPULATION / ZD(AREA), 0) AS DENSITY FROM COUNTRY WHERE ((POPULATION <= 1000000 AND ROUND(POPULATION / ZD(AREA), 0) <= 100) OR (ROUND(POPULATION / ZD(AREA), 0) BETWEEN 0 AND 10)) AND REGION_ID IN (1, 13) AND COUNTRY LIKE '%a%' ORDER BY 1 ASC, 5 DESC;
In the preceding query and ORDER BY clause, the REGION_ID field appears in the first position in the SELECT items list, and the population density expression appears fifth in the SELECT statement items list. You can even mix positional values in the ORDER BY clause with field names and expressions. Additionally, some databases do not require that ORDER BY clause items actually be retrieved, as in the following example:
SELECT REGION_ID, REGION FROM REGION ORDER BY ROUND(POPULATION / ZD(AREA), 0) DESC;
In the query that follows, it is clear to see that the density expression calculation is not returned by the query:
REGION_ID REGION ---------- -------------------------------- 9 Near East 4 Caribbean 7 Far East 6 Europe 2 Asia 5 Central America 8 Middle East 1 Africa 13 South America 10 North America 11 Oceania 12 Russian Federation 3 Australasia
The ORDER BY clause is simple compared with the WHERE clause.
The GROUP BY clause can be used to further enhance a SELECT statement by summarizing records returned from the SELECT statement. This is the basic syntax of the GROUP BY clause:
SELECT ... FROM ... [ WHERE ... ] [ GROUP BY expression [, ... ] ] [ ORDER BY ... ];
The SELECT statement and the FROM clause are mandatory. The WHERE clause, the ORDER BY clause, and the GROUP BY clause are all optional and thus enclosed in square brackets as in [ GROUP BY ˆ ] .
The ORDER BY clause always appears last in a query. Therefore the ORDER BY clause is applied to both filtered and summarized records, after the application of both the WHERE clause and the GROUP BY clause.
The next query reads the COUNTRY table. It finds global population figures, for all countries, in all regions . This particular query produces total regional populations sorted in decreasing order of regional population:
SELECT REGION_ID, TO_CHAR(SUM(POPULATION)/1000000000, '990.9')' billion' "Population" FROM COUNTRY GROUP BY REGION_ID ORDER BY SUM(POPULATION) DESC;
See Appendix B for details of the demographics database, which is used for examples throughout this book.
Here is the result of the preceding query including regions with relatively negligible (zero) populations:
REGION_ID Population ---------- -------------- 7 2.1 billion 9 1.5 billion 1 0.8 billion 6 0.5 billion 13 0.4 billion 10 0.3 billion 8 0.3 billion 12 0.3 billion 5 0.1 billion 2 0.0 billion 4 0.0 billion 3 0.0 billion 11 0.0 billion
The GROUP BY clause can be further extended by using the HAVING clause. The HAVING clause is applied as a filter to the results of the GROUP BY clause, allowing filtering of resulting summarized records. This is the basic syntax of the GROUP BY clause, this time with the HAVING clause added:
SELECT ... FROM ... [ WHERE ... ] [ GROUP BY expression [, ... ] [ HAVING condition ] ] [ ORDER BY ... ];
This query is adapted to remove regions with relatively negligible populations, creating a cut-off point at anything over half a billion people:
SELECT REGION_ID, TO_CHAR(SUM(POPULATION)/1000000000, '990.99')' billion' "Population" FROM COUNTRY GROUP BY REGION_ID HAVING SUM(POPULATION)/1000000000 >= 0.5 ORDER BY SUM(POPULATION) DESC;
Both the GROUP BY clause and the HAVING clause can contain expressions as well as simple field names, as in SUM(POPULATION) .
Here is the result of the preceding query showing the regions of the Far East, the Near East, and Africa:
REGION_ID Population ---------- --------------- 7 2.10 billion 9 1.50 billion 1 0.79 billion
Some database engines allow specialized clauses for the GROUP BY clause. These extensions are gener ally reserved for data warehouse OLAP (Online Application Processing). OLAP functionality is used to produce analytical projection reporting, such as sub- totaling rollups, cubic cross-tabs, and even spreadsheets.
The JOIN clause is used to merge the records of two separate queries together, based on common field values. This is the basic syntax of the JOIN clause:
SELECT ... FROM table [alias] [ JOIN table [alias] USING (common-field) ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The common-field element determines that records from two tables are joined together using a field con taining values common to records in both tables.
The following query reads both the REGION and COUNTRY tables, joining the two with the JOIN clause:
SELECT REGION, COUNTRY FROM REGION JOIN COUNTRY USING (REGION_ID);
The preceding query applies the USING clause to join the REGION and COUNTRY tables based on a REGION_ID value. Every region has a unique value for REGION_ID . Whenever a new record is added to the COUNTRY table, then that country record is allocated a REGION_ID value based on the region the country is in. For example, Cameroon is a country in Africa, Georgia is a country in the Russian Federation, and South Korea is a country in the Far East region.
See Appendix B for details of the demographics database, which is used for examples throughout this book.
Figure B-1 in Appendix B , and the description of Entity Relationship Diagrams (ERDs), also in Appendix B , show that there can be many countries within each region.
Here is a partial result of the preceding query, showing some countries within their respective regions:
REGION COUNTRY ------------------------ ----------------------- Africa Cameroon Africa Cote Divoire Russian Federation Georgia Russian Federation Kazakhstan Far East Myanmar Middle East Palestinian Territories Europe Serbia And Montenegro Far East South Korea Russian Federation Ukraine Middle East United Arab Emirates Russian Federation Uzbekistan
The JOIN clause can be expanded upon by joining more than two tables using multiple JOIN clauses. The syntax is expanded upon as shown here:
SELECT ... FROM table [alias] [ JOIN table [alias] USING (common-field) [ JOIN table [alias] USING (common-field) ] ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The following query joins regions to countries, and then countries to states using two JOIN clauses:
SELECT REGION, COUNTRY, STATE "State or Province" FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN STATE USING (COUNTRY_ID);
Again, the USING clause is applied to join the REGION and COUNTRY tables using REGION ID , plus a second USING clause is applied to join the COUNTRY and STATE tables together.
Figure B-1 in Appendix B , and the description of Entity Relationship Diagrams (ERDs), also in Appendix B , show that there can be many countries within each region and many states within each country.
Here is a partial result of the preceding query, showing some states within the United States and Canada, in North America (only North America has states in this database):
REGION COUNTRY State or Province ------------------------ ------------------------ ---------------- North America United States Washington North America United States Wisconsin North America United States West Virginia North America United States Wyoming North America United States Nebraska North America United States New Jersey North America United States Vermont North America Canada British Columbia North America Canada Nova Scotia North America Canada Ontario North America Canada Quebec North America Canada Alberta
And going just a little further, the next query joins regions to countries, and then countries to states, followed by states to cities. This time the query uses four JOIN clauses:
SELECT REGION, COUNTRY, STATE "State or Province", CITY FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN STATE USING (COUNTRY_ID) JOIN CITY USING (STATE_ID);
And once again, the USING clause is applied to join the REGION and COUNTRY tables using the REGION ID field. A second USING clause is applied to join the COUNTRY and STATE tables together using the COUNTRY_ID field. A third USING clause is applied to join the STATE and CITY tables together using the STATE_ID field.
Figure B-1 in Appendix B , and the description of Entity Relationship Diagrams (ERDs), also in Appendix B , show that there can be many cities within each state, within each country, within each region.
Here is a partial result of the preceding query, showing some cities within their respective states, in the United States and Canada, in North America:
REGION COUNTRY State or Province CITY -------------- -------------- ------------------ --------------- North America United States Texas Waco North America United States Utah Salt Lake City North America United States Virginia Norfolk North America United States Washington Seattle North America United States Washington Spokane North America United States Wisconsin Madison North America United States Wisconsin Milwaukee North America United States Wyoming Cheyenne North America Canada Alberta Burlington North America Canada Alberta Calgary North America Canada Alberta Edmonton
Another variation of JOIN clause syntax is the ON clause, substituting for the USING clause. This is the altered syntax of the JOIN clause:
SELECT ... FROM table [alias] [ JOIN table [alias] ON (left-field = right-field) [ JOIN table [alias] USING (left-field = right-field) ] ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The USING clause can be applied when there are common fields between two tables, and those fields contain common values. You use the ON clause when two fields that are named the same in two tables do not contain common values. Or, when using aliases in a join query, the field names on the joined tables are effectively different because they are altered by two different table aliases. For example, alias-1.field1 is not the same as alias-2.field1 .
In the following example query, aliases rename fields in tables within the query, effectively giving fields different names. The USING clause would not be able to perform the join because it cannot determine that r.REGION_ID is the same as c.REGION_ID :
SELECT r.REGION, c.COUNTRY FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID);
This is the resulting query:
REGION COUNTRY ------------------------ ------------------------ Africa Cameroon Africa Cote Divoire Russian Federation Georgia Russian Federation Kazakhstan Far East Myanmar Middle East Palestinian Territories Europe Serbia And Montenegro Far East South Korea Russian Federation Ukraine Middle East United Arab Emirates Russian Federation Uzbekistan
There are various different types of joins in that two tables can be joined together, finding different sets of records for each join type. These join types are generally called natural joins, inner joins, cross-joins, and various subtypes of outer joins.
A natural join uses the NATURAL keyword to allow the database to guess at the correct field to join two tables on. The syntax looks like this:
SELECT ... FROM table [alias] [ NATURAL JOIN table [alias] ... ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
A natural join does not require the ON or USING keywords.
The following query joins regions to countries, with a best guess using common field names between the two tables:
SELECT REGION, COUNTRY FROM REGION NATURAL JOIN COUNTRY;
The preceding query will not return any records. Examine the ERD in Appendix B and you will see that the REGION and COUNTRY tables have three common fields, in the form of the REGION_ID, POPULATION, and AREA fields. Obviously, population and square kilometer values for regions and countries are completely different, and the result is no records found by the preceding query.
The keyword INNER can be used to indicate an inner or intersecting join. All the join queries seen so far in this section are inner joins.
From a mathematical perspective, an intersection is made up of the common elements in two sets of values.
The syntax looks like this where either the USING or ON clause are required:
SELECT ... FROM table [alias] [ INNER JOIN table [alias] USING (common-field) INNER JOIN table [alias] ON (left-field = right-field) ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
As already stated, all working example join queries seen so far in this section are inner joins.
The INNER keyword is optional.
A special type of intersecting join is a self-join. A self-join is simply an intersection between records in the same table. You can join records within the same table. Commonly self-joins are used to join records in a table containing hierarchical data.
A cross-join returns all records from two tables, merging regardless of any common values. In other words, every record in one table is joined with every record in another table. A cross-join completely ignores common fields and common values.
Mathematically speaking, a cross-join is a Cartesian product which is a multiplication or product of all the elements in two sets.
The syntax for a cross-join looks like this:
SELECT ... FROM table [alias] [ CROSS JOIN table [alias] ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The following example, as you can see, does not relate regions with countries:
SELECT REGION, COUNTRY FROM REGION CROSS JOIN COUNTRY;
This is the result of the preceding query. The United Kingdom is not in the Near East region, and Afghanistan is definitely not in North America:
REGION COUNTRY ------------------------ --------------------- Near East United Kingdom Near East United States Near East Uruguay Near East Uzbekistan Near East Venezuela Near East Vietnam Near East Yemen Near East Yugoslavia Near East Zaire Near East Zambia Near East Zimbabwe North America Afghanistan North America Albania North America Algeria North America American Samoa
Data warehouses sometimes use cross-joins.
An outer join can be used to return both intersecting records, plus records in one or either table, but not in the other table. There are three types of outer joins: a left outer join, a right outer join, and a full outer join.
A left outer join finds all records in the table on the left, plus all records in the table on the right, where records in the left-side table do not have to exist in the right-side table. This is the syntax for a left outer join:
SELECT ... FROM table [alias] [ LEFT OUTER JOIN table [alias] USING (common-field) LEFT OUTER JOIN table [alias] ON (left-field = right-field) ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The USING or ON clause is required for outer joins.
This query finds all countries regardless of whether they have states or not:
SELECT COUNTRY, STATE "State or Province" FROM COUNTRY LEFT OUTER JOIN STATE USING (COUNTRY_ID);
This is the result of the preceding query. The United States has states, and Canada has provinces . Other countries do not have states, at least not in this database:
COUNTRY State or Province ------------------------ ----------------- United States Washington United States Wisconsin United States West Virginia United States Wyoming Canada Alberta United States Nebraska United States New Jersey United States Vermont South Korea India Maldive Islands Papua New Guinea
In an outer join, records without matching field values have non-retrievable field values set to NULL. Thus in the preceding query, State or Province values for the last four countries are all NULL values.
A right outer join finds all records in the table on the right, plus all records in the table on the left, where records in the right-side table do not have to exist in the left-side table. This is the syntax for a right outer join:
SELECT ... FROM table [alias] [ RIGHT OUTER JOIN table [alias] USING (common-field) RIGHT OUTER JOIN table [alias] ON (left-field = right-field) ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
This query simply reverses the sequence of the tables from the previous example query, again finding all countries regardless of whether they have states or not:
SELECT COUNTRY, STATE "State or Province" FROM STATE RIGHT OUTER JOIN COUNTRY USING (COUNTRY_ID);
The result for the preceding query is identical to the previous query result.
The last type of outer join is a full outer join. A full outer join finds all intersecting records, plus any records in the left-side table not in the right side, and any records in the right-side table not in the left. This is the syntax for a full outer join:
SELECT ... FROM table [alias] [ FULL OUTER JOIN table [alias] USING (common-field) FULL OUTER JOIN table [alias] ON (left-field = right-field) ] [ WHERE ... ] [ GROUP BY ... ] [ ORDER BY ... ];
The following query can potentially find all countries regardless of whether they have states or not, and vice versa:
SELECT COUNTRY, STATE "State or Province" FROM COUNTRY FULL OUTER JOIN STATE USING (COUNTRY_ID);
The database used in this book does not have any relationship that would demonstrate a full outer join, and thus showing a query result is pointless.
A subquery is a query that is embedded within, and called from, another query. The calling query is known as the calling query or the parent query. Subqueries can be nested to an arbitrary depth, depending usually on the specific database engine in use. Subqueries can return a single scalar value, multiple values, or even multiple records. A subquery can be correlated, passing values passed into a child query, which can affect records retrieved by the subquery, for each parent query record retrieved. There are various different ways that subqueries can be used and built:
The IN and EXISTS operators can contain embedded subqueries.
A subquery can be embedded in the FROM clause of a SELECT statement.
A subquery can be embedded in WHERE , ORDER BY and GROUP BY clauses.
Subqueries can be embedded into INSERT , UPDATE and DELETE statements.
INSERT , UPDATE and DELETE statements are used to add to, change and remove records in tables.
In this first case, a subquery finds all countries, as long as they have a related entry in the REGION table:
SELECT * FROM COUNTRY WHERE REGION_ID IN (SELECT REGION_ID FROM REGION);
The subquery is used in the WHERE clause in the preceding query.
This is a partial result of the preceding query. Not all fields are included in this result:
COUNTRY_ID REGION_ID COUNTRY CO POPULATION ---------- ---------- -------------------------------- -- ---------- 1 1 Algeria AG 32930091 2 1 Angola AO 12127071 3 1 Benin BN 7862944 4 1 Botswana BC 1639833 5 1 Burkina Faso UV 13902972 6 1 Burundi BY 8090068 7 1 Central African Republic CT 4303356 8 1 Congo CG 62660551 9 1 Djibouti DJ 486530 10 1 Equatorial Guinea EK 540109 11 1 Ethiopia ET 74777981
This next query embeds multiple layers of subqueries:
SELECT * FROM CITY WHERE CITY_ID IN (SELECT CITY_ID FROM STATE WHERE STATE_ID IN (SELECT STATE_ID FROM STATE WHERE COUNTRY_ID IN (SELECT COUNTRY_ID FROM COUNTRY WHERE REGION_ID IN (SELECT REGION_ID FROM REGION))));
This is a partial result of the preceding query:
CITY_ID COUNTRY_ID STATE_ID CITY POPULATION ---------- ---------- ---------- -------------------------------- ---------- 583 127 8 Phoenix 3800000 584 127 Richmond 1175000 585 127 37 Rochester 1050000 586 127 9 San Diego 3000000 587 127 29 St. Louis 2800000 588 127 14 Tampa 2650000 589 127 Virginia Beach 1675000 590 127 12 Washington 8050000 591 165 Tashkent 2400000 592 151 Barquisimeto 1150000 593 151 Maracaibo 2150000
A subquery can return a single scalar value, multiple values, or even multiple records. In this example, the subquery returns the REGION_ID value for North America only:
SELECT * FROM COUNTRY WHERE REGION_ID = (SELECT REGION_ID FROM REGION WHERE REGION='North America');
This is a partial result, finding only countries in North America:
COUNTRY_ID REGION_ID COUNTRY CO POPULATION ---------- ---------- -------------------------------- -- ---------- 125 10 Canada CA 33098932 126 10 Greenland GL 56361 127 10 United States US 298444215
In the next subquery the IN operator validates using two fields:
SELECT * FROM CITY WHERE (COUNTRY_ID, STATE_ID) IN (SELECT COUNTRY_ID, STATE_ID FROM COUNTRY JOIN STATE USING (COUNTRY_ID));
This is a partial result:
CITY_ID COUNTRY_ID STATE_ID CITY POPULATION ---------- ---------- ---------- -------------------------------- ---------- 583 127 8 Phoenix 3800000 585 127 37 Rochester 1050000 586 127 9 San Diego 3000000 587 127 29 St. Louis 2800000 588 127 14 Tampa 2650000 590 127 12 Washington 8050000 1 125 1 Vancouver 2200000 2 125 2 Halifax 3 125 3 Ottawa 1150000 4 125 3 Toronto 5150000 5 125 4 Montreal 3600000
A subquery can be correlated, using values passed into it from its parent query, which can affect records retrieved by the subquery:
SELECT * FROM COUNTRY WHERE REGION_ID IN (SELECT REGION_ID FROM REGION WHERE REGION_ID=COUNTRY.REGION_ID);
In the preceding query, the subquerys WHERE clause ( WHERE REGION_ID=COUNTRY.REGION_ID ) takes a value passed from the calling query into the subquery, determining which records the subquery retrieves for every record retrieved by the calling query. In other words, for every country found, the subquery uses the countrys REGION_ID value to find that countrys region.
This is a partial result:
COUNTRY_ID REGION_ID COUNTRY CO POPULATION ---------- ---------- -------------------------------- -- ---------- 1 1 Algeria AG 32930091 2 1 Angola AO 12127071 3 1 Benin BN 7862944 4 1 Botswana BC 1639833 5 1 Burkina Faso UV 13902972 6 1 Burundi BY 8090068 7 1 Central African Republic CT 4303356 8 1 Congo CG 62660551 9 1 Djibouti DJ 486530 10 1 Equatorial Guinea EK 540109 11 1 Ethiopia ET 74777981
Some database engines allow the use of an EXISTS operator, which can be used as a slight variation on the IN operator. The IN operator returns a value or a set of values. The EXISTS operator returns a true or false result (a Boolean). So the previous correlated query can be changed to this:
SELECT * FROM COUNTRY WHERE EXISTS (SELECT REGION_ID FROM REGION WHERE REGION_ID = COUNTRY.REGION_ID);
The result of the preceding query is the same as the result for the previous example query.
Some databases allow embedding of a subquery into the FROM clause of a SELECT statement. Unlike subqueries in other SQL statements, a FROM clause embedded subquery can return fields and records to the calling query:
SELECT r.REGION, c.COUNTRY FROM REGION r, (SELECT * FROM COUNTRY) c WHERE c.REGION_ID = r.REGION_ID;
This is the result of the query where the name of the region is pulled from the calling query, and the name of the country from the FROM clause embedded subquery:
REGION COUNTRY -------------------------------- -------------------------------- Africa Zambia Africa Zimbabwe Asia Burma Australasia Australia Australasia New Zealand Caribbean Bahamas Caribbean Barbados Caribbean Bermuda Caribbean Costa Rica
A subquery can be one of the expressions in a SELECT statement field selection list:
SELECT (SELECT r.REGION FROM REGION r WHERE r.REGION_ID = c.REGION_ID) ,c.COUNTRY FROM COUNTRY c;
The result of the preceding query will be the same as the previous query.
A subquery can even be placed into the ORDER BY clause of a SELECT statement, helping to determine the sorted order of the output of a query. What this query does is to sort regions and countries based on the currency exchange rate (in descending order), without returning the exchange rate in the query (there are other ways to do this of course):
SELECT r.REGION, c.COUNTRY FROM REGION r JOIN COUNTRY c ON (c.REGION_ID = r.REGION_ID) ORDER BY (SELECT RATE FROM COUNTRY WHERE COUNTRY_ID = c.COUNTRY_ID);
The result is shown here where the same fields are found, as in the previous query output result shown, except the sorted order changes the records returned:
REGION COUNTRY -------------------------------- -------------------------------- Europe United Kingdom Europe Switzerland North America Canada Australasia Australia Australasia New Zealand Europe Germany Far East Singapore South America Brazil Far East Malaysia
The preceding query, like most others in this chapter, is only a partial result of all records found by the query. The resort returns a completely different set of records for the first few records in the query result.
As with the ORDER BY clause, in some situations, after completion of execution of a subquery, the AS clause can be used to refer to retrieved items using an AS clause field name change. In the following query the subquery changes the names of the COUNTRY_ID field to COUNTRY , and the COUNTRY field to NAME . The renamed COUNTRY and NAME fields are then accessed in the parent query as c.COUNTRY (containing the COUNTRY_ID value), and c.NAME (containing the COUNTRY value):
SELECT r.REGION, c.COUNTRY, c.NAME FROM REGION r, (SELECT REGION_ID, COUNTRY_ID AS COUNTRY, COUNTRY AS NAME FROM COUNTRY) c WHERE c.REGION_ID = r.REGION_ID;
This is the result showing the field name changes in the header:
REGION COUNTRY NAME -------------------------------- ---------- -------------------------------- Africa 42 Zaire Africa 43 Zambia Africa 44 Zimbabwe Asia 45 Burma Australasia 46 Australia Australasia 47 New Zealand Caribbean 48 Bahamas Caribbean 49 Barbados Caribbean 50 Bermuda Caribbean 51 Costa Rica Caribbean 52 Cuba Caribbean 53 Dominican Republic
Subqueries can also be placed within INSERT , UPDATE , and DELETE statements, which allows changes to records in a database. These types of subqueries are discussed in a later section in this chapter, after the INSERT , UPDATE , and DELETE statements have been examined.
The UNION clause returns records from two queries. The resulting records can be treated as a single query in that they can have an ORDER BY clause applied to them, and be resorted. Additionally, each query can contain query clauses such as WHERE and GROUP BY clauses.
The only requirement for a UNION clause is that the number of fields in the two queries are the same, and that data types for each field are at least compatible. This query finds all regions and countries, regardless of any relationship between the two tables:
SELECT REGION_ID "ID-1", NULL "ID-2", REGION "Location" FROM REGION UNION SELECT REGION_ID "ID-1", COUNTRY_ID "ID-2", COUNTRY "Location" FROM COUNTRY ORDER BY 1, 2 NULLS FIRST;
The NULLS FIRST modifier is not available in all database engines. NULLS FIRST returns all null values before any non-null values, thus returning regions at the top of each list of countries, within that particular region.
The NULLS FIRST and NULLS LAST options will not be available in all database engines.
This is an interesting use of the UNION clause because it effectively sorts all countries within all regions without using a join, and with all records in the same query. This is the result:
ID-1 ID-2 Location ---------- ---------- -------------------------------- 2 Asia 2 45 Burma 3 Australasia 3 46 Australia 3 47 New Zealand 4 Caribbean 4 48 Bahamas 4 49 Barbados 4 50 Bermuda 4 51 Costa Rica 4 52 Cuba 4 53 Dominican Republic 4 54 Haiti 4 55 Jamaica 4 56 Martinique 4 57 Puerto Rico 5 Central America 5 58 El Salvador
Five of the tables in the demographics database (see Appendix B) are REGION, COUNTRY, LANGUAGE, POPULATION, and POPULATIONBYLANGUAGE. The tables look like this:
Table: REGION Field Null? Datatype ----------------------------- -------- -------------------- REGION_ID NOT NULL INTEGER REGION NOT NULL VARCHAR(32) POPULATION INTEGER AREA INTEGER
An Oracle Database VARCHAR2 data type is the same as VARCHAR. An Oracle Database NUMBER(38) data type is the same as an INTEGER data type.
Table: COUNTRY Field Null? Type ----------------- -------- ------------ COUNTRY_ID NOT NULL NUMBER(38) REGION_ID NOT NULL NUMBER(38) COUNTRY NOT NULL VARCHAR2(32) CODE NOT NULL CHAR(2) POPULATION NUMBER(38) AREA NUMBER(38) FXCODE CHAR(3) CURRENCY VARCHAR2(32) RATE FLOAT(126) Table: LANGUAGE Field Null? Datatype ----------------- -------- ------------ LANGUAGE_ID NOT NULL INTEGER LANGUAGE NOT NULL VARCHAR(16) Table: POPULATION Field Null? Datatype ----------------------------- -------- ---------- POPULATION_ID NOT NULL INTEGER COUNTRY_ID INTEGER YEAR INTEGER POPULATION INTEGER BIRTHS_PER_1000 FLOAT DEATHS_PER_1000 FLOAT MIGRANTS_PER_1000 FLOAT NATURAL_INCREASE_PERCENT FLOAT GROWTH_RATE FLOAT Table: POPULATIONBYLANGUAGE Field Null? Datatype ----------------------------- -------- ----------- LANGUAGE_ID NOT NULL INTEGER POPULATION_ID NOT NULL INTEGER MALE INTEGER FEMALE INTEGER
Use a SELECT statement to find records in which the female population is greater than the male population. Show both male and female populations in the year 1976, summed up for each language within each region, and sorted by each language within region. You dont need to display the year. You dont need to display the language. You dont need to return any fields from the COUNTRY and POPULATION tables, but you do need to get the name of the region from somewhere. The only numbers you should find are the male and female population values. This is how:
Create a SELECT statement with a list of fields:
SELECT REGION, LANGUAGE, FEMALE, MALE
Add the FROM clause with the first table:
SELECT REGION, LANGUAGE, FEMALE, MALE FROM REGION
Add the second table as a join to the first table:
SELECT REGION, LANGUAGE, FEMALE, MALE FROM REGION JOIN COUNTRY USING (REGION_ID)
Add a third table to the join:
SELECT REGION, LANGUAGE, FEMALE, MALE FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN POPULATION USING (COUNTRY_ID)
Add fourth and fifth tables to the join:
SELECT REGION, LANGUAGE, FEMALE, MALE FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN POPULATION USING (COUNTRY_ID) JOIN POPULATIONBYLANGUAGE USING (POPULATION_ID) JOIN LANGUAGE USING (LANGUAGE_ID)
A WHERE clause comparison ensures that you find only records in the year 1976, and where the female population exceeds the male population:
SELECT REGION, LANGUAGE, FEMALE, MALE FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN POPULATION USING (COUNTRY_ID) JOIN POPULATIONBYLANGUAGE USING (POPULATION_ID) JOIN LANGUAGE USING (LANGUAGE_ID) WHERE FEMALE > MALE AND YEAR = 1976
Summarize by language within groups by applying the SUM functions to both the FEMALE and MALE fields, and then aggregating in the GROUP BY clause using the REGION and LANGUAGE fields:
SELECT REGION, LANGUAGE, SUM(FEMALE), SUM(MALE) FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN POPULATION USING (COUNTRY_ID) JOIN POPULATIONBYLANGUAGE USING (POPULATION_ID) JOIN LANGUAGE USING (LANGUAGE_ID) WHERE FEMALE > MALE AND YEAR = 1976 GROUP BY REGION, LANGUAGE
Sort the record output by language and region:
SELECT REGION, LANGUAGE, SUM(FEMALE), SUM(MALE) FROM REGION JOIN COUNTRY USING (REGION_ID) JOIN POPULATION USING (COUNTRY_ID) JOIN POPULATIONBYLANGUAGE USING (POPULATION_ID) JOIN LANGUAGE USING (LANGUAGE_ID) WHERE FEMALE > MALE AND YEAR = 1976 GROUP BY REGION, LANGUAGE ORDER BY REGION, LANGUAGE;
Last, you need to qualify the field name in the query because with so many tables in the join, SQL will get confused trying to figure out what to join with, and where to select fields from. So you need to add aliases and change all the USING clauses to more explicit joining ON clauses. You can also make the query look a little nicer by changing field names using the AS clause:
SELECT r.REGION, l.LANGUAGE, SUM(pl.FEMALE), SUM(pl.MALE) FROM REGION r JOIN COUNTRY c ON (c.REGION_ID=r.REGION_ID) JOIN POPULATION p ON (p.COUNTRY_ID=c.COUNTRY_ID) JOIN POPULATIONBYLANGUAGE pl ON (pl.POPULATION_ID=p.POPULATION_ID) JOIN LANGUAGE l ON (l.LANGUAGE_ID=pl.LANGUAGE_ID) WHERE pl.FEMALE > pl.MALE AND p.YEAR = 1976 GROUP BY r.REGION, l.LANGUAGE ORDER BY r.REGION, l.LANGUAGE;
If you are using the demographics database as provided by the database creation process described in Appendix B, and the downloads from my website at the following URL:
http://www.oracledbaexpert.com/oracle/beginningxmldatabases/index.html
you should get a result like this:
REGION LANGUAGE FEMALES MALES -------------------------------- ---------------- ---------- ---------- Australasia English Only 5185387 5096972 Australasia French 42750 38236 Australasia Russian 4979 3676 North America English 7101880 7020890 North America Estonian 6205 5770 North America Finish 15185 13285 North America Flemish 4015 3780 North America French 2977060 2910140 North America German 239995 236720 North America Icelandic 2650 2380 North America Japanese 8045 7480 North America Lettish 5610 5545 North America Russian 12105 11375 North America Welsh 1045 1010 North America Yiddish 12325 11115 15 records
How It Works
You created a fairly complex SELECT statement that retrieved four fields from four different tables. The four tables were all joined with explicit field comparisons using the ON clause and inner (intersecting) joins. This means you only found records where all matches existed in all four tables. You also applied two filters in a WHERE clause; plus you summarized two integer values using a SUM function. Most database engines have a built-in SUM function for adding simple numbers together. Last, you altered the query by adding aliases.
Changes are made to fields, in records, in tables, in a database, using the INSERT , UPDATE , and DELETE statements.
The INSERT statement is used to add new records to tables. This is the basic syntax for an INSERT statement:
INSERT INTO table [ (field [, ... ]) ] VALUES (expression [ , ... ]);
A new region could be added to the REGION table with the following statement:
INSERT INTO REGION (REGION_ID, REGION, POPULATION, AREA) VALUES(100,'A New Region', 1000000, 1000);
Some database engines allow omission of the list of fields if the VALUES clause exactly matches the structure of the table:
INSERT INTO REGION VALUES (101,'Another New Region', 2000000, 2000);
The UPDATE statement is used to change existing records in tables and has the following basic syntax:
UPDATE table SET field = expression [, ... ] [ WHERE ... ];
The WHERE clause filter determines which records are updated.
If no WHERE clause is present in an UPDATE statement, then all records in the table will be altered with a single UPDATE statement.
In this example, both of the records created in the previous two INSERT statements are updated:
UPDATE REGION SET REGION = UPPER (REGION) WHERE REGION_ID IN (100, 101);
The DELETE statement is similar in syntax to the UPDATE statement:
DELETE FROM table [ WHERE ... ];
This statement will DELETE the two records created by the previous INSERT statements and subsequently altered by the single UPDATE statement shown previously:
DELETE FROM REGION WHERE REGION_ID IN (100, 101);
Database change statements are the INSERT , UPDATE , and DELETE statements. Subqueries can be embedded into database change statements in various ways.
A subquery can be embedded into the VALUES clause of an INSERT statement:
INSERT INTO REGION (REGION_ID, REGION, POPULATION, AREA) VALUES (102, 'East '(SELECT REGION FROM REGION WHERE REGION_ID=1), 1000000, 100);
The result is that there is a new region called East Africa, as shown in this query:
SELECT * FROM REGION; REGION_ID REGION POPULATION AREA ---------- -------------------------------- ---------- ---------- 1 Africa 789548670 26780325 2 Asia 47382633 657741 3 Australasia 24340222 7886602 4 Caribbean 40417697 268857 5 Central America 142653392 2360325 6 Europe 488674441 4583335 7 Far East 2100636517 15357441 8 Middle East 294625718 6798768 9 Near East 1499157105 4721322 10 North America 331599508 18729272 11 Oceania 9133256 536238 12 Russian Federation 258037209 21237500 13 South America 375489788 17545171 102 East Africa 1000000 100
A subquery can even become the VALUES clause of an INSERT statement:
INSERT INTO REGION SELECT COUNTRY_ID, COUNTRY, POPULATION, AREA FROM COUNTRY WHERE COUNTRY_ID > 150;
The UPDATE and DELETE statements can contain a WHERE clause. Thus the same rules apply as for the SELECT statement, such that a subquery can be embedded in the WHERE clause of UPDATE and DELETE statements.
An interesting use of embedding subqueries into an UPDATE statement is that some database engines allow more than a single field to be updated within the same UPDATE statement. The multiple fields updated must be retrieved using a subquery as in the following example:
UPDATE REGION SET (POPULATION, AREA) = (SELECT SUM(POPULATION), SUM(AREA) FROM COUNTRY WHERE REGION_ID = REGION.REGION_ID GROUP BY REGION_ID);
The preceding query will quite sensibly sum up all POPULATION and AREA values for each region from the COUNTRY table, and set those values into the same summary fields in the REGION table.
It is very important to understand what a transaction is with respect to a relational database. In a relational database, a transaction allows you to temporarily store changes. At a later point, you can choose to store the changes permanently using a COMMIT statement. Or you can also completely remove all changes you have made by using a ROLLBACK statement, as long as you have not already committed changes.
You can execute multiple database changes, such as the three INSERT statements shown here:
INSERT INTO REGION (REGION_ID, REGION, POPULATION, AREA) VALUES (111, 'Region 111', 100000, 100); INSERT INTO REGION (REGION_ID, REGION, POPULATION, AREA) VALUES (112, 'Region 112', 200000, 200); COMMIT; INSERT INTO REGION (REGION_ID, REGION, POPULATION, AREA) VALUES (113, 'Region 113', 300000, 300); ROLLBACK;
In the preceding example, the first two records will be permanently stored in the database by the COMMIT statement. The third record will be removed by the ROLLBACK statement.
Some general concepts of a relational database transaction are as follows:
A transaction is completed when a COMMIT or ROLLBACK statement is issued.
A COMMIT statement stores changes to a database.
A ROLLBACK statement removes pending changes from a database.
A pending change is a change not yet committed (a transaction not as yet terminated by a COMMIT or ROLLBACK statement).
A transaction can exist for one or more database change statements.
Before a change is committed, only the current session (database connection) can see any changes made within that current session. It follows that database changes made in other sessions (other database connections for other users) cannot be seen by the current session until those other sessions issue COMMIT statements.
The term transaction control describes the process of how SQL code can be used to control database changes made by one or more SQL statements.
Database objects such as tables and indexes define how data is stored and accessed in a database. Tables and indexes are known as metadata in that they are the data about the data and not the actual records. So in the demographics database (see Appendix B), the COUNTRY table is metadata and the actual countries are the real data. The COUNTRY table describes the structure of how the countries are stored in the database.
Changing database objects is not really within the scope of this book but it may help to have a very basic picture of some database change statements.
The following CREATE TABLE statement creates a table called MOREREGIONS:
CREATE TABLE MOREREGIONS(REGION_ID INTEGER PRIMARY KEY NOT NULL, REGION VARCHAR2(32) UNIQUE NOT NULL, POPULATION INTEGER NOT NULL, AREA INTEGER NOT NULL);
The next ALTER TABLE statement is used to reset the POPULATION and AREA fields in the MOREREGIONS table, to nullable:
ALTER TABLE MOREREGIONS MODIFY(POPULATION INTEGER, AREA INTEGER);
And because the MOREREGIONS table is actually useless for the purposes of the demographics database, the MOREREGIONS table can be dropped with this statement:
DROP TABLE MOREREGIONS;
Also commonly used in a relational database are database objects called indexes. An index simply creates a physical copy of a small section in a table and creates it in a way that allows for fast searching through the index, such as using a binary tree. In general, an index could be created using a statement like this:
CREATE INDEX XAK_REGION_POPULATION ON REGION (POPULATION);
Indexes can also be altered and dropped using the ALTER INDEX and DROP INDEX statements.
In general, most relational databases allow metadata change commands that are a lot more comprehensive than just creating, altering, and dropping simple tables and indexes. Any more on this topic is not necessary for the purposes of this book.
Some databases include database object change commands within transactions, and some databases do not. In other words, some databases allow a database object change such as a CREATE TABLE statement to be rolled back and some databases do not. Some databases even automatically commit when executing database object change statements.
| ||