Choosing the Right Storage Engine and Table Type

< Day Day Up >

MySQL's rich functionality and flexibility can be overwhelming: You have so many choices when constructing your MySQL-based solution. This is especially true when designing your database. MySQL offers an ever-increasing number of different storage engines and table types. This section reviews the various types of engines, along with how to select the best-performing option for your specific situation.

Before beginning the examination of your options, keep in mind one important MySQL flexibility feature: You can employ a mixture of different storage engine and table type configurations at the same time, usually without needing to make changes to your queries to cope with your unique combination. This allows you to experiment with varying engine arrangements to determine the best profile for your needs.

To select a storage engine for your table, you have at least two options:

Specify your choice of engine as part of your SQL data definition language (DDL):
```
 CREATE TABLE INNODB_EXAMPLE (FIELD1 INT, FIELD2 VARCHAR(30)) ENGINE = INNODB; CREATE TABLE MY_ISAM EXAMPLE (FIELD1 INT, FIELD2 VARCHAR(30)) ENGINE = MYISAM; 
```
Note that you can interchange TYPE with ENGINE, although the latter is the preferred option for newer versions of MySQL beginning with versions 4.0.18/4.1.2. In fact, the TYPE keyword will disappear in versions 5.1.0 and beyond, so it's a good idea to get used to ENGINE instead.
Choose the appropriate option in the MySQL Table Editor from within MySQL Query Browser or MySQL Administrator, as shown in Figure 4.1.
Figure 4.1. Options in the MySQL Table Editor.

InnoDB

First offered in source-code form in version 3.23.34a and binary distribution 4.0, InnoDB is MySQL's high-performance, transaction-safe engine. It provides advanced multiuser concurrency capabilities by allowing you to lock individual rows rather than entire tables.

Consider using InnoDB as your engine for those tables that will participate in critical transactions those operations in which you must guarantee that data alterations are performed in one unit. InnoDB is also a wise choice when you are faced with the need to store very large amounts of data in a table. In fact, InnoDB tables can even exceed the caps placed on file size by the file system. Note that InnoDB tables generally consume more disk space than the other most popular engine, MyISAM.

Selecting InnoDB as your database engine brings some additional configuration responsibilities, especially with regard to data and log file settings. You'll explore making the most of InnoDB's copious tuning options in Part IV.

For now, let's look at a scenario in which InnoDB is the right choice. Suppose that developers for High-Hat Airways are designing a MySQL-based online reservation system. When customers place reservations, they also request a seat assignment. This operation updates two key tables: reservations and seats, both of which are expected to become very large over time. To ensure data integrity, it's imperative that these tables are both updated together if the transaction succeeds, or both returned to their original state should the transaction fail.

In this case, it makes sense to utilize the InnoDB engine for both tables, and create a transaction to combine the separate operations into one unit. In addition, it is wise to separate the tables and their indexes onto directories located on different disk drives. Note that this separation might not be possible on some operating systems; check your MySQL documentation to be certain.

MyISAM

MyISAM is a direct descendant of the ISAM database engine the original product from MySQL. On some platforms, it is the default engine when you create a new table (unless you specify otherwise via the -default-storage-engine parameter). Among its many advantages, it excels at high-speed operations that don't require the integrity guarantees (and associated overhead) of transactions.

To illustrate this performance difference between InnoDB and MyISAM, the following are two basic sample tables, identical in all ways except for their MySQL engine:

 CREATE TABLE SAMPLE_INNODB (     col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     col2 VARCHAR(255),     col3 CHAR(80),     col4 DATETIME,     col5 FLOAT ) ENGINE = INNODB; CREATE TABLE SAMPLE_MYISAM (     col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     col2 VARCHAR(255),     col3 CHAR(80),     col4 DATETIME,     col5 FLOAT ) ENGINE = MYISAM;

Next, we repeatedly ran an ODBC-based program to insert large numbers of rows containing random data into each table. The results were conclusive: On average, the insert operations took 20% longer to complete for the InnoDB table than for the MyISAM table. This isn't surprising when you consider the overhead that InnoDB's transaction support adds to these kinds of operations. Of course, there are ways to optimize InnoDB in these kinds of situations, but it is still worthwhile to see the difference between the two engines.

MyISAM further boosts system response by letting administrators place their data and index files in separate directories, which can be stored on different disk drives. Again, check the version of MySQL for your operating system to be certain that it supports this separation. The full performance implications of disk drive management are discussed later in Part IV.

Finally, MyISAM tables offer more sophisticated indexing capabilities, including index support for BLOB and TEXT columns.

Continuing with the High-Hat Airways example, suppose that part of the customer support application will collect comments from clients about their High-Hat experience. Customers might telephone these comments, or might enter them via the website. Periodically, management wants to run queries against the comments to gauge customer opinions and search for particular problems. These queries will be free-form; there's no way to come up with a concrete set of query parameters.

In this case, it would be astute to use the MyISAM engine's full-text searching capabilities to give users the power to construct fast, flexible queries. A (simplified) version of the table would look like this:

 CREATE TABLE customer_feedback (     feedback_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     feedback_text TEXT,     feedback_date DATETIME,     FULLTEXT (feedback_text) ) ENGINE = MYISAM;

Users can now construct powerful searches against customer comments:

 SELECT * FROM customer_feedback WHERE match(feedback_text) AGAINST('irate') SELECT * FROM customer_feedback WHERE match(feedback_text) AGAINST('+luggage     +lawyer' IN BOOLEAN MODE)

Chapter 11, "MyISAM Performance Enhancement," spends much more time investigating how to keep full-text searches as fast as possible.

MEMORY

Choosing the MEMORY engine option (formerly known as HEAP) allows you to create high-speed, memory-resident tables that are accessible to other authorized users for as long as the MySQL server is running. As an added benefit, you can select either a hash or b-tree index for each of your MEMORY tables. Indexing options are discussed in more detail in Chapter 7, "Indexing Strategies."

MEMORY tables offer a fast, simple way to perform sophisticated interprocess communication. The following explores a sample scenario in which this is the right engine for the task at hand.

High-Hat Airways' CIO has just promised a new chat system that must tie into the online reservation application. This new system will provide access to live operators for real-time chat and support. The MEMORY option makes it much easier to develop this potentially complicated feature.

 CREATE TABLE conversation (     conversation_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,     conversation_start DATETIME,     conversation_end DATETIME ) ENGINE = MEMORY; CREATE TABLE message_buffer (     message_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     conversation_id INT NOT NULL,     message_sender VARCHAR(255),     message_text VARCHAR(255),     message_date DATETIME,     INDEX (conversation_id),     INDEX (message_sender) ) ENGINE = MEMORY;

Each chat client would use this table as the communication interface. To keep the table from growing indefinitely, you would also write a simple process to purge the table of old conversations. This is far simpler than the effort involved in writing and using other interprocess methods.

In this example, three main drawbacks are associated with MEMORY tables:

If the MySQL server goes down, it wipes out all active chat sessions. However, this is not likely to be a big issue given the stability of MySQL combined with the transient nature of these types of conversations.
You cannot use BLOB or TEXT columns in MEMORY tables, but VARCHAR columns are supported as of MySQL version 5.0.3.
If the max_heap_table_size parameter is not set correctly, you run the risk of consuming too much memory. You can also control MEMORY table size with the MAX_ROWS option when creating the table.

As mentioned earlier, MEMORY tables are fast. How fast? To get an idea of the speed of memory versus disk access, we created three identical copies of the message_buffer table, with the only difference being the storage engine; we selected the MYISAM, INNODB, and MEMORY options, respectively.

After the tables were created, we loaded large blocks of random data into each one. On average, these operations finished 33% faster with the MEMORY table than the MYISAM table, and 50% faster than the INNODB table.

We also constructed an identical, terribly inefficient table-scan query against each table. The results were consistent with the data insert process: The MEMORY table was significantly faster than its peers.

MERGE

MERGE tables are great for spreading the processing load from a single table onto a set of identical subtables. The chief downsides to this approach are fairly minimal:

MERGE tables must use the MYISAM engine, so they are not transaction-safe.
By definition, each subtable must have the same structure as all of its peers.
Because information is spread among subtables, there are certain situations in which the MySQL engine must perform significant additional index processing to return a result set.
Each subtable requires that the MySQL engine allocate a file descriptor for each client process that needs access to the table. File descriptors aren't free, so this might also negatively affect performance.

Despite these drawbacks, there are situations in which you will want to select a MERGE table as the optimal solution for your processing needs. The following example identifies a situation in which this would come in handy.

High-Hat Airways processes tens of millions of ticket purchase transactions each month. Periodically, a customer will call the reservation center and inquire about an old transaction. Transactions are kept in the live system database for 12 months before being archived. Unfortunately, even though the table that holds the purchase records is structured correctly and indexed appropriately, there is still a problem with latency.

In this case, a MERGE table might be the simplest way of eliminating unnecessary processing. This month's transactions would be kept in a dedicated table:

 CREATE TABLE current_month (     transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     customer_id INT,     transaction_date DATETIME,     amount FLOAT,     INDEX(customer_id) ) ENGINE = MYISAM;

Transactions for earlier months would be stored in separate tables:

 CREATE TABLE january (     transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     customer_id INT,     transaction_date DATETIME,     amount FLOAT,     INDEX(customer_id) ) ENGINE = MYISAM; CREATE TABLE february (     transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     customer_id INT,     transaction_date DATETIME,     amount FLOAT,     INDEX(customer_id) ) ENGINE = MYISAM; ... ... ... CREATE TABLE december (     transaction_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     customer_id INT,     transaction_date DATETIME,     amount FLOAT,     INDEX(customer_id) ) ENGINE = MYISAM;

At the end of each month, you would run a batch process to move the information into the correct tables. To further boost performance, these tables could be positioned on different disk drives and compressed using the myisampack utility.

Finally, you would create a MERGE table that would present a virtual view of the information found in these tables:

 CREATE TABLE transaction_history (     transaction_id INT NOT NULL AUTO_INCREMENT,     customer_id INT,     transaction_date DATETIME,     amount FLOAT,     INDEX(transaction_id),     INDEX(customer_id) ) ENGINE = MERGE UNION=(current_month, january, february, ... , december)      INSERT_METHOD = LAST;

After the MERGE table is in place, you simply run standard SQL against the new table:

 SELECT transaction_id, customer_id, transaction_date, amount FROM transaction_history WHERE customer_id = 19443;

MySQL will locate all relevant records from the monthly tables and present the results to you in a single set.

CSV

If you have enabled the CSV engine, MySQL lets you create tables that store information in comma-separated-value (CSV) format on the file system. These tables are not typically used for enhancing performance, and they have a number of important restrictions (for example, only SELECT and INSERT are available, and you cannot create indexes). However, when used properly, they can greatly simplify the process of exporting data to other applications.

For example, suppose that High-Hat's business analysts want to perform some spreadsheet-based number crunching on a particular set of information. If these analysts work against live data, this work has the potential to bog down your database server. Consequently, you need to come up with an alternative process. You have numerous options, from creating temporary tables to off-loading information to different servers. In this case, however, it might be easiest to just create a CSV table and populate it with the raw data, assuming that the data set isn't too large. You could then have the analysts load the information into their spreadsheets and work with it on their client machines.

BerkeleyDB (BDB)

The BerkeleyDB (BDB) engine is developed and maintained by Sleepycat Software. You'll find it included with the MySQL-Max binary distribution. Tables created with the BDB engine option provide sophisticated transaction support, as well as page-level locking. As with the InnoDB engine, BDB administrators have some additional configuration and tuning responsibilities.

BDB was available before InnoDB, so developers who needed transactions initially chose this engine. However, in most cases today, database developers employing transactions will likely opt for the InnoDB engine. This is not a criticism of BDB; it just reflects the current market reality.

MaxDB

Previously known as SAP DB, MaxDB is typically found in large SAP installations. Because this book is aimed at a more general audience, it does not devote large amounts of time to covering this product. However, our suggestions on table design, indexing, and good query processing apply to this engine as well.

NDB

MySQL's new clustering capabilities rely on the NDB storage engine. By spreading the data and processing load over multiple computers, clustering can greatly improve performance and reliability.

Clustering is explored in much more detail in Part V. For now, just remember that if you want a table to participate in clustering, you need to have clustering support enabled, and then specify the NDB storage engine when creating the table:

 CREATE TABLE cluster_example (     col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY,     ...     ... ) ENGINE = NDBCLUSTER;

FEDERATED

MySQL version 5.0.3 introduces a new storage engine designed for distributed computing. Specifying the FEDERATED option when creating your table tells MySQL that the table is actually resident in another database server (even if it is running on the same computer). Currently, this server must be running MySQL, but in the future other database engines will be supported.

You may wonder how MySQL knows where the remote table lives, and how it is accessed. First, you use an URL-like formatted string to tell MySQL the location of the remote table as part of the COMMENT portion of your CREATE TABLE statement. Note that the usage, format, security, and location of this connection string will likely be enhanced in future versions.

Next, MySQL reads your SQL statements, performs some internal transformations, and then accesses the FEDERATED table via its client API. The results are then presented to you as if the query was executed locally.