Critical Skill 2.2 Plan Your Database | MySQL: Essential Skills

Before you actually create your MySQL database and tables, you need to plan how they will be set up. This planning involves choosing names for your database, tables, and the fields within the tables. You also need to decide which data type to assign to each field in your tables.

Of course, the plans you make for your database depend on the information it will contain. For the projects in this book, we will use a fictitious company and its data. Although the company may be whimsical, you will work through concrete examples that will make the learning process easier.

The fictional company is called DuckWear. As you might guess, this is an Internet-based company that sells a variety of costumes for pet ducks . Now it s time to put yourself in their shoes, or at least in the shoes of the DuckWear database administrator.

Naming Your Database

The first decision you need to make is what to call the DuckWear company s database. Database names need to be a compromise between the desire to convey information about the database s contents and the need to be concise . Naming a database too literally (like db or data , for instance) can lead to confusion later, when there are more databases running on the MySQL server or when someone else inherits the responsibility for maintaining the database.

MySQL has only a few rules for the name of a database:

The name can have a maximum of 64 characters .
The name cannot contain a forward slash (/), backslash (\), or a period (.).

We have decided it is logical to give the DuckWear company s database the name duckwear .

Naming Your Tables

Naming the individual tables (and the fields within them) in your database requires that you know what kinds of data you are going to store in them. As with the database name, the table name should tell you what it contains but also be concise.

Very specific names (like duckwear_customer_information , for instance) may be very informative, but they become increasingly annoying to type in on the command line. They take too long and give too many opportunities for typos. For most people, a name like duck_cust will be intuitive enough to be nearly as informative, yet much easier to type than the longer version. It is also common to store a brief description about the table in the comment section when creating tables by adding the syntax comment=" < comment string" > to the [table options] part of the CREATE TABLE syntax. This can help you remember, or future administrators learn, the purpose of the table.

A MySQL table name can be a maximum limit of 64 characters and cannot use the forward slash (/), backslash (\), or period (.) characters. In addition, MySQL has a list of reserved words (see Appendix B) that are not recommended for use as names for tables or columns .

Note

The MySQL reserved words, for the most part, function as data types, commands, or predefined functions, and the improper use of them can cause syntax errors. There are ways around this. For instance, you can use a reserved word as a name by surrounding it with backticks or double quotation marks. However, the easiest approach is simply to avoid using reserved words for names. Appendix B provides a listing of the MySQL reserved words.

We have decided that all the table names in the duckwear database will start with duck_ . The table of customer information will be named duck_cust .

Naming and Defining Fields

After working with the DuckWear CEO, you have come up with a list of information that should be included in the duck_cust table. It will include a customer number, customer name and address, the duck s name, and the duck s birth date. Now you need to decide on a name for each field and choose an appropriate data type and size for it.

Developing a system, or convention, that allows you to tell at a glance what table a specific field name belongs to can save you time and effort and work as a safeguard against inserting data into the wrong field by mistake. But be careful; if the system you develop is unnecessarily complicated, it may eventually become more of a hindrance than help, particularly if someone else becomes responsible for database maintenance. It needs to be logical but also clear enough that its patterns are easily recognized and emulated.

In the end, it all boils down to a matter of personal preference. Whichever convention you choose, the important thing is to stick to it. If you have a few names that defy convention, they will always be the ones you end up mistyping the most.

For the duck_cust table, all the words in field names will be separated by an underscore for easier readability, and the field names will start with the word cust .

Note

An alternative convention to enhance readability is to use uppercase letters to make the names clear, instead of using a separating underscore (for example, DuckCust for the table and CustDuckName for that field). However, if you are running your MySQL database on a Linux-based computer, remember that your OS is case-sensitive. This means that DuckCust and Duckcust are two different names.

You should write down your plans for each of your field names, type, size, and special settings. Having this information before you create a table will help you to avoid mistakes. You can alter a table s definitions after you have created it, but your work will be much easier if you don t need to make those kinds of changes later.

Table 2-10 shows the field listings for the duck_cust table and how they can best be defined.

Table 2-10: Initial Fields for the DuckWear Customer Table (duck_cust)
Information	Field Type	Field Size	Special Settings	Field Name
Customer number	MEDIUMINT	Default	NOT NULL AUTO_INCREMENT	cust_num
Customer title	TINYINT	Default		cust_title
Customer last name	CHAR	20	NOT NULL	cust_last
Customer first name	CHAR	15	NOT NULL	cust_first
Customer suffix	ENUM	Default		cust_suffix
Address	CHAR	30	NOT NULL	cust_add1
Apt. number	CHAR	10		cust_add2
City	CHAR	15	NOT NULL	cust_city
State	CHAR	2	NOT NULL	cust_state
ZIP code	CHAR	5	NOT NULL	cust_zip1
Plus 4	CHAR	4		cust_zip2
Duck's name	CHAR	25	NOT NULL	cust_duckname
Duck's birth date	DATE	Default		cust_duckbday

Requiring Information with the NOT NULL Setting

You will notice many of the fields in Table 2-10 have the additional setting of NOT NULL . This means that data is required for these fields. Even if you do not know the correct data, some sort of default entry needs to be made. For example, you might enter "n/a" if you don t know the duck s name. That way, it is clear that the data was unknown and not accidentally left out.

Although the other fields for the customer address are set as NOT NULL , the one for the plus 4 ZIP code is not. Since it is not absolutely necessary to have the plus 4 information in an address, it can be optional.

Similarly, a customer does not need to provide a preferred title like Mr. or Ms., even though DuckWear has politely decided to offer the choice in an attempt to use the correct title on a mailing label. And a customer may not have an apartment number, so this field can also be considered optional.

Automatically Incrementing Fields

The field cust_num is defined as both NOT NULL and AUTO_INCREMENT , which means that every customer must have a number and that the number put in a newly created record will be automatically incremented by a value of one from the previous number inserted in that column. Since the MEDIUMINT data type has a maximum range of over 16 million, we can be fairly safe in assuming we will never run out of customer numbers , unless duck ownership increases drastically.

Defining Field Sizes

There are several fields that Table 2-10 lists as having the default field size. MEDIUMINT , TINYINT , ENUM , and DATE data types have specific sizes. However, the CHAR , or character, types need a specific length defined, and you must decide on the lengths of these fields before you create the table.

There is no definite rule of thumb for name, address, city, and similar field lengths. In some languages, longer names are more common than in others. Hyphenated last names may be common, for example, and these require more space to store. At best, the length you pick will be a guess that you hope will cover the majority of your customers. Try to think of the longest last name you know, and make your field slightly larger than that.

In other cases, you will know the field s length. For example, since the postal codes for states are all two letters, choosing to make the state field two characters is an easy decision.

Understanding Table Types

The default table type when you create a table is MyISAM. As a beginner, the MyISAM table is recommended, and that is what you are going to use with the projects in this book. MySQL can use several other table types (also called engines), and you should be familiar with the features that distinguish them. The available table types are shown in Table 2-11.

Table 2-11: MySQL Table Types
Table Type	Distinguishing Feature
ISAM	Original storage engine
MyISAM	Binary portable storage type; replaces ISAM
HEAP	Stored only in memory
BDB or BerkeleyDB	Transaction-safe tables with page locking
InnoDB	Transaction-safe tables with row locking
MERGE	Collection of MyISAM tables used as one table

Because most databases offer only one table type, the multiple table types offered by MySQL are often overlooked, particularly in the planning phase. The MyISAM table is a good place to start and has the advantage of being the default, but there are reasons why you might want to use one of the other table types, so we will look at each of them.

Regarding table types, MySQL has an eccentricity that you need to remember. If you specify a particular table type in MySQL, and it cannot comply with your request for some reason (for instance, if you turn off BDB and then forget and try to make a BDB table), MySQL will silently make the table MyISAM; that is, it will do it automatically and not tell you. So, if a table is not acting in the way you expect it to, it may be because of a silent conversion to MyISAM. However, in most of these situations, you will never become aware that such a conversion has happened , because the database will do what you want it to do.

MyISAM Tables

MyISAM is the default table type for MySQL. It was based on the old ISAM table type and has been extensively tested . It has table-level locking, which means during updating, no other user can access the same table. MyISAM also has many useful extensions that allow for checking and repairing tables, as well as accomplishing data recovery should your computer crash while writing to a file. The MyISAM table type allows for static, dynamic, and compressed table characteristics. It is the table version of the best of all possible worlds , because it fits most requirements while providing speed, ease of use, and reasonable security.

ISAM Tables

ISAM tables are popular in other databases and are an older table type. They will no longer be supported once MySQL reaches version 5.0, so even if you are using a pre-5.0 version of MySQL, now would not be a good time to start using ISAM tables.

HEAP Tables

HEAP tables use hashed indexes and are stored in memory. This makes them very fast, but because they are stored in memory, if your MySQL server crashes, you will lose all data stored in them. This makes them unsuitable for everyday storage. However, HEAP tables are useful for temporary tables, such as those that contain real-time statistics that are calculated anew each time the web page that displays them is loaded.

Caution

Because they reside in memory rather than having to be read from the hard disk, HEAP tables are very fast, but for most uses, the increased risk of data loss in the event of system crash does not justify the faster performance.

Transactional Tables: BDB and InnoDB

Transactional tables restrict the user access to varying amounts of information in a database when another user is accessing it, which is referred to as a transaction. Transactions give the user greater control when working with data by protecting the sections of the data being used for as long as the user s transaction is occurring.

There are advantages and disadvantages to using transactional tables, and you must weigh them in concert with your particular data-storage and data-access needs. The transactional table must keep track of what data is being used and when the user is finished with it in order to refuse other users simultaneous access to the same data. This requires that certain resources, like memory, be relegated to this process, which in turn affects things like the amount of memory available and the response time to a request. This is called transaction overhead. Table 2-12 summarizes the pros and cons of using transactional tables.

Table 2-12: Pros and Cons of Using Transactional Tables (BDB and InnoDB)
Advantages	Disadvantages
Safer data because of automatic backups and recovery logs	Slower because of transaction overhead
Ability to ignore changes, if not run in auto-commit mode	Uses more disk space because of transaction overhead
Changes restored, if update fails	Uses more memory to do updates
Provides better concurrency

MySQL offers the two most commonly used transactional table types: BDB (or BerkeleyDB) and InnoDB.

BDB Tables BDB, provided by SleepyCat (www. sleepycat .com), uses page-level locking. That means if one user is accessing a page from a table, no one else can access that table until that user is finished. This makes each transaction safer, but can lead to slower response time in some situations. In order to use BDB tables, you use a binary version of MySQL that has been compiled with BDB support or configure the MySQL source using the withberkeleydb option.

Tip	If you decide not to use BDB tables, start your MySQL server with the skipbdb option. This will free a lot of memory, since the BDB library and the memory MySQL would normally specify for its use will not be reserved on startup. Of course, this means that you will not be able to use BDB tables.

Currently, BDB is not used nearly as much as the alternative transactional table type, InnoDB.

InnoDB InnoDB tables, made by Innobase Oy (www.innodb.com), are distributed under the GPL, as well as commercially. InnoDB features row-level locking (while a user is accessing a row, no one else can access that row), consistent nonlocking read in SELECT statements (to extract information from the database), and common table space for all tables (all TABLES are stored in the same area).

If you need commercial support or commercial licenses for your application and cost is a concern, you may not want to use InnoDB. Not using InnoDB will save you about 20 to 50 percent for licenses and support contracts. However, if data integrity is your primary concern, InnoDB provides MySQL with a transactional storage engine and crash-recovery capabilities.

InnoDB has been designed for maximum performance when processing large data volumes , so if your database includes large amounts of data and handles a lot of traffic, InnoDB may provide the speed and reliability you require. Also, any of the other disk-based, relational, database engines probably do not match its CPU efficiency. So, in a nutshell , most users that need a transactional database prefer the speed and features of InnoDB.

MERGE Tables

MERGE tables are two or more identical MyISAM tables joined by a UNION statement. You can only select, delete, and update from the collection of tables. If you drop the MERGE table, you are dropping only the MERGE specification. That means the MERGE table no longer exists, but the MyISAM tables it was constructed from and the data in them are still intact.

The most common reason to use MERGE tables is to get more speed. You can split a big, read-only table into several parts, and then put the different table parts on different disks. This results in faster access times, and therefore more efficient searches. Also, if you know exactly what you are looking for within the split parts , you can search in just one of the split tables for some queries, or if you need to search the entire table, use a MERGE command to access the parts as a whole.

Repairs on MERGE tables can be more efficient than on the same amount of data stored in any other table type, assuming that you have made it a MERGE table because of its size. It is faster to repair the individual files that are mapped to a MERGE file than to try to repair one huge file. MyISAM, and therefore MERGE, tables are represented as individual files on the hard drive. Using MERGE also allows you to work around your operating system s file-size limit.

However, there are some disadvantages to using MERGE tables:

You can use only identical MyISAM tables for a MERGE table.
The REPLACE command does not work.
Key reads are slower, which negates the reason for having a key (or index) to read.

Because of the way they are constructed, MERGE tables sometimes require different syntax than other tables in order to manipulate them in the same fashion. This isn t a difficulty so much as it is just a fact about MERGE tables that you should remember. Obviously, MERGE tables will only be useful under special conditions; however, if your MyISAM tables develop growth- related restrictions, a MERGE table may be the most effective answer.

Now that you have learned about data types and how to plan a database, you next need to know about the actual commands for creating MySQL databases and tables. Then you can use the commands to set up our duckwear database.

Ask the Expert

What are the main concerns I should consider in selecting a table type?

There are three main concerns you should consider when deciding on a table type: file size, data access, and data safety.

MyISAM, Merge, BDB are all subject to OS file size limits (although MERGE can to some extent work around that). HEAP and TEMPORARY (a table that is created for only as long as the current connection to the database is open ) type tables are stored in memory and therefore subject to the amount of memory your computer has available. InnoDB tables are able to get around the file size limits completely, because they create a shared area that data is written across. For instance, if your InnoDB files are set at 2GB each and you have a 6GB table, InnoDB will spread that one table over all three files.

Data access is also a primary concern. The default table type for MySQL, MyISAM, has table-level locking, which means during an update operation, no other user can access any other record of the same table. BDB uses page-level locking, and during an update operation, no other user can access any other record residing in the same database page of that table, until the locking transaction issues a COMMIT statement. InnoDB, however, uses row-level locking. Row-level locking ensures that during an update operation, no other user can access that particular row, until the locking transaction issues a COMMIT statement. Therefore, InnoDB leaves more of the data available for other users to access, while still maintaining data security for the data in the row already being accessed. That being said, any of the table types will probably work fine for a web server, but in a local area network (LAN) application, the transactional table types (BDB and InnoDB) can cause unnecessary locking- contention issues, and a database is no fun when you are locked out of it.

Most users choose MyISAM if they need speed and InnoDB if they need data integrity. If your database requires both of these things in equal measure, you can use more than one table type, or any combination of these table types, in your database.

Remember to assess the needs of each table before building it. Even though MyISAM is faster and simpler than InnoDB, InnoDB is still fast when compared to any other database engine. With InnoDB you get transactions, speed, and integrity ”three features not usually used in the same sentence . However, MyISAM is a good place to start for most of your applications, because its speed and simplicity will get your database up and running and it provides the most up-front options.

Progress Check

1.	What are the table types MySQL allows you to use?
2.	Which table type is the MySQL default?
3.	Which table type is usually used to store large amounts of data?
4.	What are the two transactional-based table types?
5.	Which table should be used only to store temporary data?

Answers

1.	The MySQL table types are ISAM, HEAP, BDB, InnoDB, MERGE, and MyISAM.
2.	MyISAM is the default table type.
3.	MERGE tables are used to store large amount of data.
4.	BDB and InnoDB are transactional table types.
5.	You should use the HEAP type only for temporary data.