10.3 Table Management | Java Database Best Practices

Now that you have a clean, new database instance and have connected to it, it is time to create the structures that will hold your data. The most basic of these structures is the table. Before adding data to a table, you must first create it in the database.

10.3.1 The Basics of Table Creation

The act of creating a table defines the data the table holds and any constraints placed on that data. The basic elements of the table structure are the names of its columns , their data types, and their constraints. SQL data types are similar to data types in other languages. The full SQL standard allows for a large range of data types. The general form for creating a table is:

 CREATE TABLE   tblname   (   colname     type   [   modifiers   ]     [,   colname     type   [   modifiers   ]] )

A table may have any number of columns, but too many columns can render the table inefficient. Good database design can help you avoid unwieldy table structures. By creating properly normalized ^[3] tables, you can join tables together and perform searches for data across multiple tables. We discuss the mechanics of a join later in the chapter.

^[3] See Chapter 2 for a full discussion of database design and normalization.

Consider, for example, the table structure of the User table in Table 10-1.

Table 10-1. The structure for a User table

Column name	Data type	Constraints
`userID`	`INT UNSIGNED`	`NOT NULL PRIMARY KEY`
`name`	`CHAR(10)`	`NOT NULL UNIQUE INDEX`
`lastName`	`VARCHAR(30)`
`firstName`	`VARCHAR(30)`
`office`	`CHAR(2)`	`NOT NULL`

You can create the table shown in Table 10-1 using the following SQL:

 CREATE TABLE User (     userID    INT UNSIGNED     NOT NULL,     name      CHAR(10)         NOT NULL,     lastName  VARCHAR(30),     firstName VARCHAR(30),     office    CHAR(2)          NOT NULL DEFAULT `NY' )

This statement creates a table called User with five columns: userID , name , lastName , firstName , and office . After each column comes the data type for the column and some modifiers. In this example, we describe only the NOT NULL constraints as part of the CREATE statement. We will define the indexes later.

The NOT NULL modifier indicates that the column may not contain any NULL values. If you attempt to assign a NULL value to that column, the database will issue an error. A couple of exceptions exist for this rule. First, if the column is some kind of sequence column, ^[4] the database will automatically generate a unique value for the column. The second exception is when you specify a default value for a column as we did for the office column. When you insert a NULL value into a NOT NULL column with a default value, the default value is inserted in place of the NULL .

^[4] Sequence columns are columns for which the database automatically generates values. The mechanics of sequence columns vary from database to database with very little in common between any two.

To get rid of your newly created table (with the exception of Oracle users), use the DROP statement:

 DROP TABLE User

10.3.2 Data Types

In a table, each column has a data type. As I mentioned earlier, SQL data types are similar to data types in other languages. While many languages define a bare minimum set of types necessary for completeness, SQL goes out of its way to provide types such as DATE that will be useful to everyday users. You could store a date value in a column with a more basic numeric type, but having a type specifically dedicated to the nuances of date processing adds to SQL's ease of use ”one of SQL's primary goals.

In dealing with data types, you really need to know your database engine of choice. All databases share a small subset of data types and then extend beyond that core set. Furthermore, two databases may have data types of the same name that behave differently. Whatever the database you are using, however, some basic best practices can assist you in your database programming.

Before you create a table, you should know what kind of data you intend to store in the table's columns. Beyond obvious decisions about whether your data is character-based or numeric, you should also know the approximate size of the data you wish to store. If it is a numeric field, what is the maximum possible value that could make sense? What is the minimum possible value? Could that range change in the future? Answering these questions will enable you to choose a data type sufficient for storing your data without wasting disk space or RAM.

You should always strive for the smallest possible type capable of storing your value range. If, for example, you have a field that represents the population of a state, use an unsigned numeric type if your database supports unsigned types. As long as no state has a negative population, your database will operate well. Furthermore, a 32-bit numeric type will be sufficient for a state's population. It would take a state population roughly the size of the current population of Earth to get you in trouble with this choice of data type.

10.3.2.1 Numeric types

Numeric data types store uninterpreted number values. Such values can range from simple integers to complex, high-precision decimals. Your choice of numeric data type depends on what you expect the largest possible value to be, what you expect the smallest value to be, and how precise you expect that value to be. Table 10-2 shows the major numeric types for MySQL and Oracle.

Table 10-2. Numeric types in MySQL and Oracle

Database	Type	Description
MySQL	`TINYINT`	Whole 7-bit numbers in the range -128 to 127.
	`SMALLINT`	Whole 8-bit numbers in the range -32758 to 32,757.
	`MEDIUMINT`	Whole 16-bit numbers in the range -8,388,608 to 8,388,607.
	`INT`	Whole 32-bit numbers in the range -2,147,483,548 to 2,147,483,547.
	`BIGINT`	Whole 64-bit numbers in the range -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
	`DECIMAL(` `p` , `s` `)`	Decimal values with `s` as the scale and `p` as the precision.
	`DOUBLE(` `p` , `s` `)`	Double-precision values with `s` as the scale and `p` as the precision.
	`FLOAT(` `p` `)`	Floating point numbers with a precision of 8 or less.
Oracle	`INTEGER(` `n` `)`	Whole numbers capable of storing up to `n` digits.
	`NUMBER(` `p` , `s` `)`	Any number where `p` specifies the precision and `s` the scale. The precision may be between 1 and 38 and the scale between -84 and 127.
	`FLOAT(` `p` `)`	Floating point numbers with a precision up to 126.

10.3.2.2 Character types

Managing character types is much more complicated that managing numeric types. Not only do you have to worry about minimum and maximum lengths, but you have to worry about the average size, variation, and character set of the strings. Indexing, which I will cover in the next section, also complicates the choice of character type. It generally works best when you choose a fixed-length data type for indexed character columns. If your column has little or no variation in the length of its strings, the fixed-length CHAR data type is probably your best bet. An example of a solid candidate for the CHAR data type is a column holding a country code. The International Standards Organization (ISO) provides a comprehensive list of standard two-character codes for countries (e.g., US for the United States, FR for France, etc.). Because the codes are always two characters , a CHAR(2) is the best way to maintain a column holding country codes.

A value does not have to have a constant length to be held in a CHAR column. It should, however, have very little variance. Phone numbers, for example, will fit in a CHAR(13) column even though phone number lengths vary from nation to nation. The variance is small enough that there is no point in making the string variable length. The problem with a CHAR field, however, is that it always takes up the exact same amount of storage no matter what you store in it. In a CHAR(20) column, the strings "A" and "ABCDEFGHIJKLMNO" all occupy the same amount of storage space. Anything under 20 characters is padded with spaces. Though the minimal potential waste for phone number values is an acceptable trade-off for the efficiency of fixed-length searches, it is not acceptable for strings with greater variance.

Variable-length fields address the needs of strings that have a significant variance between their minimum and maximum lengths. A good, common example of such a value is a web URL. A URL can be as simple as http://www.imaginary.com or as long as http://code.law. harvard .edu/filtering/test.asp?URL=http%3A%2F%2Fwww.slashdot.org . If you create a column with a fixed-width field large enough to hold the latter URL, you will waste a lot of space with the majority of values that look more like the former.

Most character types require you to specify a maximum length for the string. In most databases, the database truncates any strings that exceed the maximum length. If, for example, you insert the string "happy birthday" into a CHAR(4) field, the database will store only "happ".

Table 10-3 contains the most common character types for MySQL and Oracle.

Table 10-3. Character types for MySQL and Oracle

Database	Type	Description
MySQL	`CHAR(` `n` `)`	Fixed-length character type that holds exactly `n` characters. Shorter strings are padded with spaces to `n` characters.
	`NCHAR(` `n` `)`	Same as `CHAR` , except for Unicode strings.
	`VARCHAR(` `n` `)`	Variable-length strings that may store up to `n` characters. Any excess characters are discarded.
	`NVARCHAR(` `n` `)`	Same as `VARCHAR` , except for Unicode strings.
Oracle	`CHAR(` `n` `)`	Fixed-length character type that holds exactly `n` characters. Shorter strings are padded with spaces to `n` characters.
	`NCHAR(` `n` `)`	Same as `CHAR` , except for Unicode strings.
	`VARCHAR2`	Variable-length strings up to 4000 characters in length.
	`NVARCHAR2`	Same as `VARCHAR2` , except for Unicode strings.

10.3.2.3 Other types

SQL supports many other types, from dates and times to binary data and more. The most recent ANSI SQL, SQL99, adds support for user-defined data types in the mold of object-oriented programming languages. You should check the documentation for your database of choice to understand the full range of data types available to you.

10.3.3 Indexing

Indexes assist the database in identifying specific rows in a table. Without an index, a search for a specific row in a table containing a million rows would require the engine to walk through every single row. An index provides the database with hints about the location of the row and how many possible matches might exist for your search criteria.

The cost of an index is storage space. The most efficient use of indexes is therefore to create indexes for the columns you intend to search on. You can create an index using the following basic syntax:

 CREATE [UNIQUE] INDEX   idxname   ON   tblname   (   colname   [,   colname   ] )

As with just about any other SQL statement, each database carries its own variations on this syntax. Nevertheless, you can create the unique index for the name column in the User table referenced earlier in any database engine using the following syntax:

 CREATE UNIQUE INDEX userName ON User ( name )

Some databases also let you create an index while creating the table:

 CREATE TABLE User (     userID    INT UNSIGNED     NOT NULL,     name      CHAR(10)         NOT NULL,     lastName  VARCHAR(30),     firstName VARCHAR(30),     office    CHAR(2)          NOT NULL DEFAULT 'NY',  UNIQUE INDEX ( name )  )

When the database now searches for a row having a specific name value, it knows how to narrow its search to a subset of the table. Furthermore, because the index is unique, it knows to stop the search when it finds the matching value.

ANSI SQL also supports a special kind of index called a primary key . A relational table can have at most a single primary key. The primary key is a unique index that signifies the preferred mechanism for uniquely identifying a row in that table. In all technical respects, the primary key is indistinguishable from a unique index. You are allowed to specify single-column primary keys on the same line as the column definition in a table CREATE statement:

 CREATE TABLE cities (id      INT  NOT NULL  PRIMARY KEY  ,                      name    VARCHAR(100),                      pop     INT,                      founded DATE)

Before you create a table, determine which fields, if any, should be keys. As I mentioned earlier, any fields that support joins are good candidates for primary keys.

You will commonly want to specify multicolumn indexes and primary keys. For example, a translation table for book titles requires the book ID and language to uniquely identify a title translation:

 CREATE TABLE BookTitleTrans (     bookID    INTEGER(9)    NOT NULL,     language  CHAR(2)       NOT NULL,     title     VARCHAR2(255) NOT NULL,     PRIMARY KEY ( bookID, language ));