|< Day Day Up >|
MySQL lets you specify a number of performance-related settings when you create or alter a table. You can do this in two ways:
The following sections take a look at some of these options.
Specifying Row Format
When you create or modify a table using the MyISAM engine, you can request that the engine store rows in fixed or dynamic format. If your table contains no TEXT or BLOB columns, MySQL chooses the fixed option by default, which automatically converts VARCHAR columns to CHAR. On the other hand, if you select dynamic, MySQL converts all columns of type CHAR to VARCHAR. As a third choice, running the myisampack command instructs MySQL to compress the table into a smaller, read-only format.
Because of their consistent row size, fixed format tables translate to less work (and better response) for the MySQL engine when accessing, caching, and updating information. They are also less likely to become corrupted. If disk space is not at a premium, fixed format tables are probably the best choice.
Conversely, dynamic tables use less space, but run the risk of fragmentation and/or eventual corruption. The OPTIMIZE TABLE command is reviewed in Chapter 6, "Understanding the MySQL Optimizer," to see how to correct fragmentation problems. Note that if your table has any columns of type TEXT, BLOB, or VARCHAR, MySQL automatically chooses this option.
Finally, compressed tables use the least amount of space.
The following examines the storage implications of these types of tables. We created two identical tables, except for their ROW_FORMAT, and then inserted 100,000 rows of random data:
CREATE TABLE fixed_demo ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 CHAR(40), col4 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM ROW_FORMAT = FIXED; CREATE TABLE dynamic_demo ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 CHAR(40), col4 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM ROW_FORMAT = DYNAMIC;
As expected, the fixed_demo table used significantly more disk space than dynamic_demo (29MB data and 48MB index versus 15.5MB data and 23.3MB index).
We then ran a simple query against both tables:
SELECT COUNT(*) FROM fixed_demo WHERE col4 > 345 AND col4 < 1099; SELECT COUNT(*) FROM dynamic_demo WHERE col4 > 345 AND col4 < 1099;
Both queries ran in less than a quarter second. However, the dynamic table query took approximately 50% longer than the fixed table query. For more complex operations, the fixed table savings could be dramatic.
Finally, we ran myisampack against the fixed_demo table. This reduced the table to 18.8MB data and 37.8MB index. Rerunning the SELECT COUNT(*) query against this newly compressed table returned results 10% faster than before compression.
Specifying Table Size
When you're using a dynamic, MyISAM table, you can use the AVG_ROW_LENGTH and MAX_ROWS options to tell MySQL how large you expect your table to be. Knowing this value helps the engine construct index keys in a more optimal way. This can translate into faster performance, especially in large tables.
To test the performance benefits of specifying table size, we created two sample tables, one of which included instructions on expected table size:
CREATE TABLE size_defined ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 CHAR(40), col4 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM ROW_FORMAT = DYNAMIC AVG_ROW_LENGTH = 200 MAX_ROWS = 250000; CREATE TABLE size_not_defined ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 CHAR(40), col4 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM ROW_FORMAT = DYNAMIC;
We then loaded 100,000 rows of random data into each table. They both consumed the same amount of disk space, but simple queries ran approximately 10% faster against the size-defined table versus the size-undefined table.
Index Key Compression
When using MyISAM tables, you can request that the engine compress index keys by specifying the PACK_KEYS option when creating a table. You might find that your application's read operations improve; there might be an offsetting added cost for write operations.
Compressing index keys can save substantial disk space. To demonstrate, we created the following tables:
CREATE TABLE pack_keys_demo ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM PACK_KEYS = 1; CREATE TABLE pack_keys_demo_no ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255), col3 INT, INDEX (col2), INDEX (col3) ) ENGINE = MYISAM PACK_KEYS = 0;
We then inserted 100,000 rows of random data into each table. For this test, we made the data in col2 highly random while the data in col3 was highly duplicate: Its values only ranged between 1 and 10.
After the tables were loaded, the pack_keys_demo table consumed 13.6MB and 20.6MB of storage for data and indexes, respectively. The pack_keys_demo_no table used 13.6MB and 43.9MB of storage for data and indexes, respectively.
The impact of index compression is discussed as part of our detailed review of indexing later in this section.
Checksum Integrity Management
If you choose MyISAM for a particular table, you might elect to have MySQL compute a checksum for each row. This adds slight overhead to data update operations, but helps you locate (and then correct) corrupted tables. This value is then used by the CHECKSUM TABLE command:
mysql> CHECKSUM TABLE checksum_demo; +--------------------+-----------+ | Table | Checksum | +--------------------+-----------+ | demo.checksum_demo | 544091798 | +--------------------+-----------+ 1 row in set (0.00 sec)
To test the overhead of this option, we created two identical tables, and asked MySQL to track a checksum for only one:
CREATE TABLE checksum_demo ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255) ) ENGINE = MYISAM CHECKSUM = 1; CREATE TABLE checksum_demo_no ( col1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, col2 VARCHAR(255) ) ENGINE = MYISAM;
We then repeatedly ran tests that inserted 1,000 rows of random data into each table. The overhead averaged less than 5% for the first table: This is a small price to pay to improve data integrity.
Alternatively, you can still run CHECKSUM TABLE for a table without this option set. However, MySQL takes some time to process the results.
Column Types and Performance
The decisions you make regarding column types can have a significant impact on performance. MySQL offers numerous choices within each major column class (string, numeric, and date/time). Although it is assumed that you already understand the difference between these data types, as well as which ones to use for your application, this section takes a few moments to focus on the interplay of column type and responsiveness.
Making the right choice when deciding how to store dates and times can be confusing: The combination of the various column types, system settings, and date/time-specific SQL can be intimidating. However, because this book is focused on performance, a detailed discussion of the nuances of storing and calculating dates is avoided. Instead, this section reviews only string and numeric column-type considerations.
Before we begin, let's review a very handy utility provided by MySQL both as an example for open source developers who want to write functions to extend the product, as well as a tool for database designers to help pick optimal column types. The analyse() procedure takes the output of a SELECT statement and returns a report detailing the information shown in Table 4.1 for each column.
For example, take a look at the following table:
CREATE TABLE demo_pa ( col1 INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, col2 SMALLINT, col3 CHAR(10), col4 VARCHAR(40) ) ENGINE = INNODB;
After loading 10,000 rows of random data into the table and running the following SQL statement:
SELECT * FROM demo_page PROCEDURE analyse(4);
The first parameter for analyse() refers to the number of columns to process. Optionally, you can also specify a maximum memory that MySQL can allocate to finding unique values in each column. The results are shown in Figure 4.3.
Figure 4.3. MySQL can allocate memory to finding unique values.
What did you learn about the table from this report, based on the sample data set?
Try experimenting with analyse() for your database; you might be surprised with what you learn.
MySQL offers database designers a rich choice of options when storing text and binary-based information. The following list explores the performance implications of these choices.
Database administrators are often torn between stowing BLOBs and TEXT data in the MySQL database versus placing the information on the file system and simply storing a link to it in the database. There are arguments in favor of both approaches from the perspective of optimal performance.
You might find that your MySQL-driven applications bog down under the increased processing and storage costs of BLOBs and TEXT, especially if that data is merely ancillary to your primary database operations. Storing variable-length information also leads to eventual fragmentation, and you might also experience data truncation if you undersize your BLOB and/or TEXT columns.
On the other hand, keeping your BLOBs and TEXT data inside MySQL lets you take advantage of its backup/restore and replication features. In addition, you now can create indexes on these types of columns. You must use prefixes to make searches that employ these indexes meaningful. To improve sorting speed, you can configure the max_sort_length option. Indexing is reviewed in Chapter 6, "Understanding the MySQL Optimizer," and engine parameters are discussed in Chapter 10, "General Server Performance and Parameters Tuning," Chapter 11, "MyISAM Performance Enhancement," and Chapter 12, "InnoDB Performance Enhancement."
After struggling through the decisions among all of the string options, you'll be relieved to know that making a choice among the numeric column options requires much less mental exertion.
A good general rule of thumb is to use the smallest numeric type that suits your needs; this helps conserve space and can also make some joins and searches more efficient. The following lists some examples.
With this many choices, many database designers fall into the expedient trap of just picking a numeric type that they've used many times before, regardless of the application's needs.
For example, suppose that High-Hat Airways' customer_master table includes a field that stores the customer's age. The database designer defines this column as an INT. However, because there are not many members of the flying public greater than 255 years old, an UNSIGNED TINYINT would suffice for this column, and save three bytes per record. Considering that the table will hold tens of millions of rows, the savings could be substantial, especially if the column were used in queries, joins, and groupings.
|< Day Day Up >|