A Brief Review of Database Basics

Database services under Linux that use the software discussed in this chapter are based on a client/server model. Database clients are often used to input data and to query or display query results from the server. You can use the command line or a graphical client to access a running server. Databases generally come in two forms: flat file and relational. A flat file database can be as simple as a text file with a space, tab, or some other character delimiting different parts of the information. One example of a simple flat file database is the Ubuntu /etc/passwd file. Another example could be a simple address book that might look something like this:

Doe~John~505 Some Street~Anytown~NY~12345~555-555-1212

You can use standard UNIX tools such as grep, awk, and perl to search for and extract information from this primitive database. Although this might work well for a small database such as an address book that only one person uses, flat file databases of this type have several limitations:

They do not scale well. Flat file databases cannot perform random access on data; they can perform only sequential access. This means they have to scan each line in the file, one by one, to look for specific information. As the size of the database grows, access times increase and performance decreases.
Flat file databases are unsuitable for multi-user environments. Depending on how the database is set up, it either enables only one user to access it at a time or allows two users to make changes simultaneously, in which case the changes could overwrite each other and cause data loss.
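
As a rough illustration of the sequential-access limitation, the following Python sketch searches a tilde-delimited address book like the one shown earlier. The field names, the extra sample rows, and the find_records helper are assumptions made for this example; the flat file itself carries no schema.

```python
# Hypothetical in-memory stand-in for a tilde-delimited address book file.
# Only the first row comes from the text; the others are made up.
ADDRESS_BOOK = """\
Doe~John~505 Some Street~Anytown~NY~12345~555-555-1212
Doe~Jane~505 Some Street~Anytown~NY~12345~555-555-1212
Palmer~John~205 Anystreet~Sometown~NY~55055~123-456-7890
"""

# Assumed field names; a flat file does not record them anywhere.
FIELDS = ("last_name", "first_name", "address", "city", "state", "zip", "phone")


def find_records(text, field, value):
    """Scan every line in order and return the matching records."""
    matches = []
    lines_read = 0
    for line in text.splitlines():
        lines_read += 1
        record = dict(zip(FIELDS, line.split("~")))
        if record.get(field) == value:
            matches.append(record)
    return matches, lines_read


matches, lines_read = find_records(ADDRESS_BOOK, "last_name", "Palmer")
print(matches[0]["phone"], "found after reading", lines_read, "lines")
```

Every lookup reads the whole file, so the work grows in step with the number of lines.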

These limitations obviously make the flat file database unsuitable for any kind of serious work in even a small business, much less in an enterprise environment. Relational databases, or relational database management systems (RDBMSs) to give them their full name, are good at finding the relationships between individual pieces of data. An RDBMS stores data in tables with fields much like those in spreadsheets, making the data searchable and sortable. RDBMSs are the focus of this chapter.

Oracle, DB2, Microsoft SQL Server, and the freely available PostgreSQL and MySQL are all examples of RDBMSs. The following sections discuss how relational databases work and provide a closer look at some of the basic processes involved in administering and using databases. You will also learn about SQL, the standard language used to store, retrieve, and manipulate database data.

How Relational Databases Work

An RDBMS stores data in tables, which you can visualize as spreadsheets. Each column in the table is a field; for example, a column might contain a name or an address. Each row in the table is an individual record. The table itself has a name you use to refer to that table when you want to get data out of it or put data into it. Figure 21.1 shows an example of a simple relational database that stores name and address information.

Figure 21.1. In this visualization of how an RDBMS stores data, the database stores four records (rows) that include name and address information, divided into seven fields (columns) of data.
last_name	first_name	address	city	state	zip	phone
Doe	John	501 Somestreet	Anytown	NY	55011	555-555-1212
Doe	Jane	501 Somestreet	Anytown	NY	55011	555-555-1212
Palmer	John	205 Anystreet	Sometown	NY	55055	123-456-7890
Johnson	Robert	100 Easystreet	Easytown	CT	12345	111-222-3333

In the example shown in Figure 21.1, the database contains only a single table. Most RDBMS setups are much more complex than this, with a single database containing multiple tables. Figure 21.2 shows an example of a database named sample_database that contains two tables.

Figure 21.2. A single database can contain two tables; in this case, phonebook and cd_collection.

In the sample_database example, the phonebook table contains four records (rows), and each record holds three fields (columns) of data. The cd_collection table holds eight records, divided into five fields of data.

If you are thinking that there is no logical relationship between the phonebook table and the cd_collection table in the sample_database example, you are correct. In a relational database, users can store multiple tables of data in a single database, even if the data in one table is unrelated to the data in others.

For example, suppose you run a small company that sells widgets and you have a computerized database of customers. In addition to storing each customer's name, address, and phone number, you want to be able to look up outstanding order and invoice information for any of your customers. You could use three related tables in an RDBMS to store and organize customer data for just those purposes. Figure 21.3 shows an example of such a database.

Figure 21.3. You can use three related tables to track customers, orders, and outstanding invoices.

In the example in Figure 21.3, we have added a customer ID field to each customer record. This field holds a unique customer ID number that links all other information for that customer, so you can track orders and invoices. Two customers might have the same data in their name fields, but their ID field values will never be the same. The Customer ID field data in the Orders and Overdue tables replaces the Last Name, First Name, and Shipping Address field information from the Customers table. Now, when you want to run a search for any customer's order and invoice data, you can search based on one key rather than multiple keys. You get more accurate results in faster, easier-to-conduct data searches.
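
To make the single-key idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a full RDBMS. The column names, the sample rows, and the lowercase table names are assumptions made for illustration; Figure 21.3 defines the actual layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: every table carries the same customer_id key.
    CREATE TABLE customers (
        customer_id      INTEGER PRIMARY KEY,
        last_name        TEXT NOT NULL,
        first_name       TEXT NOT NULL,
        shipping_address TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        item        TEXT NOT NULL
    );
    CREATE TABLE overdue (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount_due  REAL NOT NULL
    );
    -- Two customers share a name but never an ID.
    INSERT INTO customers VALUES (1, 'Doe', 'John', '501 Somestreet'),
                                 (2, 'Doe', 'John', '77 Otherstreet');
    INSERT INTO orders  VALUES (100, 1, 'widget'), (101, 2, 'large widget');
    INSERT INTO overdue VALUES (500, 2, 19.95);
""")

# One key is enough to find everything that belongs to a customer.
customer_id = 2
orders = conn.execute(
    "SELECT item FROM orders WHERE customer_id = ?", (customer_id,)
).fetchall()
overdue = conn.execute(
    "SELECT amount_due FROM overdue WHERE customer_id = ?", (customer_id,)
).fetchall()
print(orders, overdue)
```

Searching by name alone could not tell the two John Does apart; the ID can.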

Now that you have an idea of how data is stored in an RDBMS and how the RDBMS structure enables you to work with that data, you are ready to learn how to input and output data from the database. This is where SQL comes in.

Understanding SQL Basics

SQL (pronounced "S-Q-L") is a database query language understood by virtually all RDBMSs available today. You use SQL statements to get data into and retrieve data from a database. As with statements in any language, SQL statements have a defined structure that determines their meanings and functions.

As a DBA, you should understand the basics of SQL, even if you will not be doing any of the actual programming yourself. Fortunately, SQL is similar to standard English, so learning the basics is simple.

Creating Tables

As mentioned previously, an RDBMS stores data in tables that look similar to spreadsheets. Of course, before you can store any data in a database, you need to create the necessary tables and columns to store the data. You do this by using the CREATE statement.

For example, the cd_collection table from Figure 21.2 has five columns, or fields: id, title, artist, year, and rating.

SQL provides several column types for data that define what kind of data will be stored in the column. Some of the available types are INT, FLOAT, CHAR, and VARCHAR. Both CHAR and VARCHAR hold text strings, with the difference being that CHAR holds a fixed-length string, whereas VARCHAR holds a variable-length string.

There are also special column types, such as DATE, that only take data in a date format, and ENUMs (enumerations), which can be used to specify that only certain values are allowed. If, for example, you wanted to record the genre of your CDs, you could use an ENUM column that accepts only the values POP, ROCK, EASY_LISTENING, and so on. You will learn more about ENUM later in this chapter.

Looking at the cd_collection table, you can see that three of the columns hold numerical data and the other two hold string data. In addition, the character strings are of variable length. Based on this information, you can discern that the best type to use for the text columns is type VARCHAR, and the best type to use for the others is INT. You should notice something else about the cd_collection table: One of the CDs is missing a rating, perhaps because we have not listened to it yet. This value, therefore, is optional; it starts empty and can be filled in later.

You are now ready to create a table. As mentioned before, you do this by using the CREATE statement, which uses the following syntax:

CREATE TABLE table_name (column_name column_type(parameters) options, ...);

You should know the following about the CREATE statement:

SQL commands are not case sensitive. For example, CREATE TABLE, create table, and Create Table are all valid.
Whitespace is generally ignored. This means you should use it to make your SQL commands clearer.

The following example shows how to create the table for the cd_collection database:

CREATE TABLE cd_collection ( id INT NOT NULL, title VARCHAR(50) NOT NULL, artist VARCHAR(50) NOT NULL, year INT NOT NULL, rating INT NULL );

Notice that the statement terminates with a semicolon. This is how SQL knows you are finished with all the entries in the statement. In some cases, the semicolon can be omitted, and we will point out these cases when they arise.
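
If you want to experiment without a running database server, the following sketch runs a similar CREATE statement through Python's built-in sqlite3 module. SQLite is only a stand-in here and treats column types more loosely than MySQL or PostgreSQL, so take this as an illustration of the NOT NULL option rather than of either server's exact behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Lowercase keywords and extra whitespace are both accepted.
conn.execute("""
    create table cd_collection (
        id     INT         NOT NULL,
        title  VARCHAR(50) NOT NULL,
        artist VARCHAR(50) NOT NULL,
        year   INT         NOT NULL,
        rating INT                    -- no NOT NULL, so this is optional
    )
""")

# Column names as the engine recorded them (second field of each row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(cd_collection)")]
print(columns)

# A row without a title violates NOT NULL and is rejected.
try:
    conn.execute(
        "INSERT INTO cd_collection VALUES (1, NULL, 'Nirvana', 1991, NULL)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("row with NULL title rejected:", rejected)
```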

Tip

SQL has a number of reserved keywords that cannot be used as table names or field names without special quoting. For example, if you keep track of CDs you want to take with you on vacation, you would not be able to use the plain field name select because that is a reserved keyword. Instead, you should either choose a different name (selected?) or just prefix the field name with an f, such as fselect.

Inserting Data into Tables

After you create the tables, you can put data into them. You can insert data manually with the INSERT statement, which uses the following syntax:

INSERT INTO table_name VALUES('value1', 'value2', 'value3', ...);

This statement inserts value1, value2, and so on into the table table_name. The values that are inserted constitute one row, or record, in the database. Unless specified otherwise, values are inserted in the order in which the columns are listed in the database table. If, for some reason, you want to insert values in a different order (or if you want to insert only a few values and they are not in sequential order), you can specify which columns you want the data to go in by using the following syntax:

INSERT INTO table_name (column1,column4) VALUES('value1', 'value2');

You can also fill multiple rows with a single INSERT statement, using syntax such as the following:

INSERT INTO table_name VALUES('value1', 'value2'),('value3', 'value4');

In this statement, value1 and value2 are inserted into the first row and value3 and value4 are inserted into the second row.

The following example shows how you would insert the Nevermind entry into the cd_collection table:

INSERT INTO cd_collection VALUES(9, 'Nevermind', 'Nirvana', 1991, NULL);

MySQL requires the NULL value for the last column (rating) if you do not want to include a rating. PostgreSQL, on the other hand, lets you get away with just omitting the last column. Of course, if you had columns in the middle that were null, you would need to explicitly state NULL in the INSERT statement.
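
The three INSERT forms described in this section can be tried side by side with Python's built-in sqlite3 module, used here as a stand-in for MySQL or PostgreSQL. Apart from the Nevermind entry, the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cd_collection (
        id INT NOT NULL, title VARCHAR(50) NOT NULL,
        artist VARCHAR(50) NOT NULL, year INT NOT NULL, rating INT
    )
""")

# Positional insert: values follow the table's column order.
conn.execute(
    "INSERT INTO cd_collection VALUES (9, 'Nevermind', 'Nirvana', 1991, NULL)"
)

# Column-list insert: the order is stated explicitly and rating is left out.
conn.execute(
    "INSERT INTO cd_collection (title, id, year, artist) "
    "VALUES ('Hypothetical Album', 10, 2001, 'Example Artist')"
)

# Multi-row insert: each parenthesized group becomes one row.
conn.execute(
    "INSERT INTO cd_collection VALUES "
    "(11, 'Sample One', 'Example Artist', 2002, 4), "
    "(12, 'Sample Two', 'Example Artist', 2003, 5)"
)

rows = conn.execute(
    "SELECT id, title, rating FROM cd_collection ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The column left out of the column-list insert comes back as NULL, which Python reports as None.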

Normally, INSERT statements are coded into a front-end program so users adding data to the database do not have to worry about the SQL statements involved.
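
A front end of that kind might wrap the INSERT statement in a function such as the hypothetical add_cd helper below. This is only a sketch using Python's built-in sqlite3 module; the ? placeholders are that module's parameter style, and other database drivers use different markers.

```python
import sqlite3


def add_cd(conn, cd_id, title, artist, year, rating=None):
    """Hypothetical front-end helper: the caller never writes SQL."""
    conn.execute(
        "INSERT INTO cd_collection (id, title, artist, year, rating) "
        "VALUES (?, ?, ?, ?, ?)",
        (cd_id, title, artist, year, rating),
    )


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cd_collection (
        id INT NOT NULL, title VARCHAR(50) NOT NULL,
        artist VARCHAR(50) NOT NULL, year INT NOT NULL, rating INT
    )
""")

# The user supplies plain values; the rating is optional.
add_cd(conn, 9, "Nevermind", "Nirvana", 1991)
stored = conn.execute(
    "SELECT title, rating FROM cd_collection WHERE id = 9"
).fetchone()
print(stored)
```

Passing the values separately from the statement text also keeps quotation marks in a title from breaking the SQL.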

Retrieving Data from a Database

Of course, the main reason for storing data in a database is so you can later look up, sort, and generate reports on that data. Basic data retrieval is done with the SELECT statement, which has the following syntax:

SELECT column1, column2, column3 FROM table_name WHERE search_criteria;

The first two parts of the statement, the SELECT and FROM parts, are required. The WHERE portion of the statement is optional. If it is omitted, all rows in the table table_name are returned.

The column1, column2, column3 entries indicate the names of the columns you want to see. If you want to see all columns, you can use the wildcard * instead of listing them. For example, the following statement displays all columns from the cd_collection table:

SELECT * FROM cd_collection;

If you wanted to see only the titles of all the CDs in the table, you would use a statement such as the following:

SELECT title FROM cd_collection;

To select the title and year of a CD, you would use the following:

SELECT title, year FROM cd_collection;

If you want something a little fancier, you can use SQL to print the CD title followed by the year in parentheses, as is the convention. Both MySQL and PostgreSQL provide string concatenation functions to handle problems such as this. However, the syntax is different in the two systems.

In MySQL, you can use the CONCAT() function to combine the title and year columns into one output column, along with parentheses. The following statement is an example:

SELECT CONCAT(title," (",year, ")") AS TitleYear FROM cd_collection;

That statement lists both the title and year under one column that has the label TitleYear. Note that there are two strings in the CONCAT() function along with the fields; these add whitespace and the parentheses.

In PostgreSQL, the string concatenation function is simply a double pipe (||). The following command is the PostgreSQL equivalent of the preceding MySQL command:

SELECT (title||' ('||year||')') AS TitleYear FROM cd_collection;

Note that the parentheses are optional, but they make the statement easier to read. Once again, the strings in the middle and at the end (note the space between the quotes) are used to insert spacing and parentheses between the title and year.
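
SQLite happens to use the same double-pipe operator as PostgreSQL, so the concatenation can be tried with Python's built-in sqlite3 module. This is a sketch on a cut-down, two-column version of the table, not a test of PostgreSQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down table holding only the two columns the query needs.
conn.execute("CREATE TABLE cd_collection (title VARCHAR(50), year INT)")
conn.execute("INSERT INTO cd_collection VALUES ('Nevermind', 1991)")

cursor = conn.execute(
    "SELECT (title || ' (' || year || ')') AS TitleYear FROM cd_collection"
)
label = cursor.description[0][0]   # name of the single output column
title_year = cursor.fetchone()[0]
print(label, "=", title_year)
```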

Of course, more often than not, you do not want a list of every single row in the database. Rather, you only want to find rows that match certain characteristics. For this, you add a WHERE clause to the SELECT statement. For example, suppose you want to find all the CDs in the cd_collection table that have a rating of 5. You would use a statement like the following:

SELECT * FROM cd_collection WHERE rating = 5;

Using the table from Figure 21.2, you can see that this query would return the rows for Trouser Jazz, Life for Rent, and The Two Towers. This is a simple query, and SQL is capable of handling queries much more complex than this. Complex queries can be written using logical AND and logical OR statements. For example, suppose you want to refine the query so it lists only those CDs that were not released in 2003. You would use a query like the following:

SELECT * FROM cd_collection WHERE rating = 5 AND year != 2003;

In SQL, != means "is not equal to." So once again looking at the table from Figure 21.2, you can see that this query returns the rows for Trouser Jazz and The Two Towers but does not return the row for Life for Rent because it was released in 2003.

So, what if you want to list all the CDs that have a rating of 3 or 4 except those released in the year 2000? This time, you combine logical AND and logical OR statements. Because AND is evaluated before OR, you need parentheses to group the two rating tests:

SELECT * FROM cd_collection WHERE (rating = 3 OR rating = 4) AND year != 2000;

This query would return entries for Mind Bomb, Natural Elements, and Combat Rock. However, it wouldn't return entries for Adiemus 4 because it was released in 2000.
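
When AND and OR appear in the same WHERE clause, SQL evaluates AND first, so grouping with parentheses can change the result. The sketch below shows the difference using Python's built-in sqlite3 module; the rows are hypothetical and are not the contents of Figure 21.2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cd_collection (id INT, title VARCHAR(50), year INT, rating INT)"
)
# Hypothetical sample rows: (id, title, year, rating).
conn.executemany(
    "INSERT INTO cd_collection VALUES (?, ?, ?, ?)",
    [
        (1, "Album A", 2000, 3),
        (2, "Album B", 2000, 4),
        (3, "Album C", 1999, 3),
        (4, "Album D", 2001, 4),
        (5, "Album E", 2001, 5),
    ],
)


def ids(where_clause):
    """Return the ids of the rows matching the given WHERE clause."""
    query = "SELECT id FROM cd_collection WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(query)]


# Without parentheses: rating = 3 OR (rating = 4 AND year != 2000).
ungrouped = ids("rating = 3 OR rating = 4 AND year != 2000")
# With parentheses: the year test applies to both ratings.
grouped = ids("(rating = 3 OR rating = 4) AND year != 2000")
print(ungrouped, grouped)
```

Without the parentheses, Album A (rating 3, released in 2000) slips through because the year test is attached only to the rating = 4 condition.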

Tip

One of the most common errors among new database programmers is confusing logical AND and logical OR. For example, in everyday speech, you might say "Find me all CDs released in 2003 and 2004." At first glance, you might think that if you fed this statement to the database in SQL format, it would return the rows for For All You've Done and Life for Rent. In fact, it would return no rows at all. This is because the database interprets the statement as "Find all rows in which the CD was released in 2003 and was released in 2004." It is, of course, impossible for the same CD to be released twice, so this statement would never return any rows, no matter how many CDs were stored in the table. The correct way to form this statement is with an OR statement instead of an AND statement.
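
The difference is easy to confirm on a small table. The following sketch again uses Python's built-in sqlite3 module with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cd_collection (title VARCHAR(50), year INT)")
# Hypothetical sample rows: (title, year).
conn.executemany(
    "INSERT INTO cd_collection VALUES (?, ?)",
    [("Album X", 2003), ("Album Y", 2004), ("Album Z", 2002)],
)

# AND: a single row would have to satisfy both year tests at once.
with_and = conn.execute(
    "SELECT title FROM cd_collection WHERE year = 2003 AND year = 2004"
).fetchall()

# OR: a row qualifies if either year test is true.
with_or = conn.execute(
    "SELECT title FROM cd_collection "
    "WHERE year = 2003 OR year = 2004 ORDER BY title"
).fetchall()

print(len(with_and), "rows with AND;", len(with_or), "rows with OR")
```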

SQL is capable of far more than is demonstrated here. But as mentioned before, this section is not intended to teach you all there is to know about SQL programming; rather, it teaches you the basics so you can be a more effective DBA.