29.4. The InnoDB Engine


29.4. The InnoDB Engine

The InnoDB storage engine manages tables that have the following characteristics:

  • Each InnoDB table is represented on disk by an .frm format file in the database directory, as well as data and index storage in the InnoDB tablespace. The InnoDB tablespace is a logical single storage area that is made up of one or more files or partitions on disk. By default, InnoDB uses a single tablespace that is shared by all InnoDB tables. The tablespace is stored in machine-independent format. It is implemented such that table sizes can exceed the maximum file size allowed by the filesystem. It is also possible to configure InnoDB to create each table with its own tablespace.

  • InnoDB supports transactions, with commit and rollback. It provides full ACID (atomicity, consistency, isolation, durability) compliance. Multi-versioning is used to isolate transactions from one another.

  • InnoDB provides auto-recovery after a crash of the MySQL server or the host on which the server runs.

  • MySQL manages query contention for InnoDB tables using multi-versioning and row-level locking. Multi-versioning gives each transaction its own view of the database. This, combined with row-level locking, keeps contention to a minimum. The result is good query concurrency even if clients are performing a mix of reads and writes. However, it's possible for deadlock to occur.

  • InnoDB supports foreign keys and referential integrity, including cascaded deletes and updates.

  • The tablespace storage format is portable, so InnoDB files can be copied directly to another host and used by a server there. (The conditions for InnoDB portability are given at Section 32.3.4, "Conditions for Binary Portability.")

Support for the InnoDB storage engine is a standard feature in binary distributions. If you build MySQL from source, InnoDB is included unless you explicitly use the --without-innodb configuration option.

If a given MySQL server has the InnoDB storage engine compiled in, but you're sure that you won't need InnoDB tables, you can disable InnoDB support at runtime by starting the server with the --skip-innodb option. Disabling InnoDB reduces the server's memory re-quirements because it need not allocate any InnoDB-related data structures. Disabling InnoDB also reduces disk requirements because no InnoDB tablespace or log files need be allocated.

29.4.1. The InnoDB Tablespace and Logs

InnoDB operates using two primary disk-based resources: a tablespace for storing table contents, and a set of log files for recording transaction activity.

Each InnoDB table has a format (.frm) file in the database directory of the database to which the table belongs. This is the same as tables managed by any other MySQL storage engine, such as MyISAM. However, InnoDB manages table contents (data rows and indexes) on disk differently than does the MyISAM engine. By default, InnoDB uses a shared "tablespace," which is one or more files that form a single logical storage area. All InnoDB tables are stored together within the tablespace. There are no table-specific data files or index files for InnoDB the way there are for MyISAM tables. The tablespace also contains a rollback segment. As transactions modify rows, undo log information is stored in the rollback segment. This information is used to roll back failed transactions.

Although InnoDB treats the shared tablespace as a single logical storage area, it can consist of one file or multiple files. Each file can be a regular file or a raw partition. The final file in the shared tablespace can be configured to be auto-extending, in which case InnoDB expands it automatically if the tablespace fills up. Because the shared tablespace is used for InnoDB tables in all databases (and thus is not database specific), tablespace files are stored by default in the server's data directory, not within a particular database directory.

If you do not want to use the shared tablespace for storing table contents, you can start the server with the --innodb_file_per_table option. In this case, for each new table that InnoDB creates, it sets up an .ibd file in the database directory to accompany the table's .frm file. The .ibd file acts as the table's own tablespace file and InnoDB stores table contents in it. (The shared tablespace still is needed because it contains the InnoDB data dictionary and the rollback segment.)

Use of the --innodb_file_per_table option does not affect accessibility of any InnoDB tables that may already have been created in the shared tablespace. Those tables remain accessible.

In addition to its tablespace files, the InnoDB storage engine manages a set of InnoDB-specific log files that contain information about ongoing transactions. As a client performs a transaction, the changes that it makes are held in the InnoDB log. The more recent log contents are cached in memory. Normally, the cached log information is written and flushed to log files on disk at transaction commit time, though that may also occur earlier.

If a crash occurs while the tables are being modified, the log files are used for auto-recovery: When the MySQL server restarts, it reapplies the changes recorded in the logs, to ensure that the tables reflect all committed transactions.

InnoDB tablespace and log setup is discussed in Section 29.4.7, "Configuring and Monitoring InnoDB."

29.4.2. InnoDB and ACID Compliance

The InnoDB storage engine provides transactional capabilities. A transaction is a logical grouping of statements that is handled by the database server as a single unit. Either all the statements execute successfully to completion or all modifications made by the statements are discarded if an error occurs. Transactional systems often are described as being ACID compliant, where "ACID" stands for the following properties:

  • Atomic. All the statements execute successfully or are canceled as a unit.

  • Consistent. A database that is in a consistent state when a transaction begins is left in a consistent state by the transaction.

  • Isolated. One transaction does not affect another.

  • Durable. All the changes made by a transaction that completes successfully are recorded properly in the database. Changes are not lost.

InnoDB satisfies the conditions for ACID compliance, assuming that its log flushing behavior is set appropriately. InnoDB can be configured for log flushing that provides ACID compliance, or for flushing that gains some in performance at the risk of losing the last few transactions if a crash occurs. By default, InnoDB log flushing is set for ACID compliance. For configuration information, see Section 29.4.7.2, "Configuring InnoDB Buffers and Logs."

29.4.3. The InnoDB TRansaction Model

Multiple clients may execute transactions concurrently, but any given client performs transactions serially, one after the other. The client determines when each of its transactions begins and ends by controlling its autocommit mode. MySQL initializes each client to begin with autocommit mode enabled. This causes each statement to be committed immediately. In transactional terms, this means that each statement is a separate transaction. To group multiple statements as a single transaction so that they succeed or fail as a unit, autocommit mode must be disabled. There are two ways to do this:

  • The first method is to disable autocommit mode explicitly:

     SET AUTOCOMMIT = 0; 

    With autocommit disabled, any following statements become part of the current transaction until you end it by issuing a COMMIT statement to accept the transaction and commit its effects to the database, or a ROLLBACK statement to discard the transaction's effects.

    When you disable autocommit explicitly, it remains disabled until you enable it again as follows:

     SET AUTOCOMMIT = 1; 

  • The second method is to suspend the current autocommit mode by beginning a transaction explicitly. Any of the following statements begins a transaction:

     START TRANSACTION; BEGIN; BEGIN WORK; 

    START TRANSACTION is standard SQL syntax. The others are synonyms. (The BEGIN statement that begins a transaction is different from the BEGIN/END syntax that is used to write compound statements in stored routines and triggers. The latter is described in Section 18.5.1, "Compound Statements.")

    After beginning a transaction with any of those statements, autocommit remains disabled until you end the transaction by committing it or by rolling it back. The autocommit mode then reverts to the value it had prior to the start of the transaction.

If you disable autocommit explicitly, perform transactions like this:

 SET AUTOCOMMIT = 0; ... statements for transaction 1 ... COMMIT; ... statements for transaction 2 ... COMMIT; ... 

If you suspend autocommit by using START TRANSACTION, perform transactions like this:

 START TRANSACTION; ... statements for transaction 1 ... COMMIT; START TRANSACTION; ... statements for transaction 2 ... COMMIT; ... 

While autocommit mode is enabled, attempts to perform multiple-statement transactions are ineffective. Each statement is committed immediately, so COMMIT is superfluous and ROLLBACK has no effect.

If you want to roll back only part of a transaction, you can set a savepoint by using the SAVEPOINT statement:

 SAVEPOINT savepoint_name; 

Multiple savepoints can be set within a transaction. To roll back to a given savepoint, use this statement:

 ROLLBACK TO SAVEPOINT savepoint_name; 

The transaction rolls back to the named savepoint and you can continue from there. Any savepoints that were set after the savepoint are deleted.

Under some circumstances, the current transaction may end implicitly:

  • If you issue any of the following statements, InnoDB implicitly commits the preceding uncommitted statements of the current transaction and begins a new transaction:

     ALTER TABLE BEGIN CREATE INDEX DROP DATABASE DROP INDEX DROP TABLE RENAME TABLE TRUNCATE TABLE LOCK TABLES UNLOCK TABLES SET AUTOCOMMIT = 1 START TRANSACTION 

    UNLOCK TABLES implicitly commits only if you have explicitly locked tables with LOCK TABLES. SET AUTOCOMMIT = 1 implicitly commits only if autocommit mode wasn't already enabled.

  • If a client connection closes while the client has a transaction pending, InnoDB rolls back the transaction implicitly. This occurs regardless of whether the connection closes normally or abnormally.

Because a statement that begins a transaction implicitly commits any current transaction, transactions cannot be nested.

Transaction-control statements can affect explicit table locks. Use of START TRANSACTION or its synonyms causes an implicit UNLOCK TABLES.

The MySQL server initializes each client connection to begin with autocommit enabled. Modifications to the autocommit mode made by a client to its connection persist only to the end of the connection. If a client disconnects and reconnects, the second connection begins with autocommit enabled, regardless of its setting at the end of the first connection.

29.4.4. InnoDB Locking Characteristics

This section describes how InnoDB uses locks internally and some query modifiers you can use to affect locking.

InnoDB has the following general locking properties:

  • InnoDB does not need to set locks to achieve consistent reads because it uses multi-versioning to make them unnecessary: Transactions that modify rows see their own versions of those rows, and the undo logs allow other transactions to see the original rows. Locking reads may be performed by adding locking modifiers to SELECT statements.

  • When locks are necessary, InnoDB uses row-level locking. In conjunction with multi-versioning, this results in good query concurrency because a given table can be read and modified by different clients at the same time. Row-level concurrency properties are as follows:

    • Different clients can read the same rows simultaneously.

    • Different clients can modify different rows simultaneously.

    • Different clients cannot modify the same row at the same time. If one transaction modifies a row, other transactions cannot modify the same row until the first transaction completes. Other transactions cannot read the modified row, either, unless they are using the READ UNCOMMITTED isolation level. That is, they will see the original unmodified row.

  • During the course of a transaction, InnoDB may acquire row locks as it discovers them to be necessary. However, it never escalates a lock (for example, by converting it to a page lock or table lock). This keeps lock contention to a minimum and improves concurrency.

  • Deadlock can occur. Deadlock is a situation in which each of two transactions is waiting for the release of a lock that the other holds. For example, if two transactions each lock a different row, and then try to modify the row locked by the other, they can deadlock. Deadlock is possible because InnoDB does not acquire locks during a transaction until they are needed. When InnoDB detects a deadlock, it terminates and rolls back one of the deadlocking transactions. It tries to pick the transaction that has modified the smallest number of rows. If InnoDB does not detect deadlock, the deadlocked transactions eventually begin to time out and InnoDB rolls them back as they do.

Isolation levels and multi-versioning are discussed more fully in Section 29.4.5, "InnoDB Isolation Levels, Multi-Versioning, and Concurrency Isolation Levels, Multi-Versioning, and Concurrency."

InnoDB supports two locking modifiers that may be added to the end of SELECT statements. They acquire shared or exclusive locks and convert non-locking reads into locking reads:

  • With LOCK IN SHARE MODE, InnoDB locks each selected row with a shared lock. Other transactions can still read the selected rows, but cannot update or delete them until the first transaction releases the locks, which happens when the transaction finishes. Also, if the SELECT will select rows that have been modified in an uncommitted transaction, IN SHARE MODE will cause the SELECT to block until that transaction commits.

  • With FOR UPDATE, InnoDB locks each selected row with an exclusive lock. This is useful if you intend to select and then modify a set of rows, because it prevents other transactions from reading or writing the rows until the first transaction releases the locks, which happens when the transaction finishes.

In the REPEATABLE READ isolation level, you can add LOCK IN SHARE MODE to SELECT operations to force other transactions to wait for your transaction if they want to modify the selected rows. This is similar to operating at the SERIALIZABLE isolation level, for which InnoDB implicitly adds LOCK IN SHARE MODE to SELECT statements that have no explicit locking modifier.

29.4.5. InnoDB Isolation Levels, Multi-Versioning, and Concurrency

As mentioned earlier, multiple transactions may be executing concurrently within the server, one transaction per client. This has the potential to cause problems: If one client's transaction changes data, should transactions for other clients see those changes or should they be isolated from them? The transaction isolation level determines the level of visibility between transactions that is, the ways in which simultaneous transactions interact when accessing the same data. This section discusses the problems that can occur and how InnoDB implements isolation levels. Note that isolation level definitions vary among database servers, so the levels as implemented by InnoDB might not correspond exactly to levels as implemented in other database systems.

When multiple clients run transactions concurrently, three problems that may result are dirty reads, non-repeatable reads, and phantoms. These occur under the following circumstances:

  • A dirty read is a read by one transaction of uncommitted changes made by another. Suppose that transaction T1 modifies a row. If transaction T2 reads the row and sees the modification even though T1 has not committed it, that is a dirty read. One reason this is a problem is that if T1 rolls back, the change is undone but T2 does not know that.

  • A non-repeatable read occurs when a transaction performs the same retrieval twice but gets a different result each time. Suppose that T1 reads some rows and that T2 then changes some of those rows and commits the changes. If T1 sees the changes when it reads the rows again, it gets a different result; the initial read is non-repeatable. This is a problem because T1 does not get a consistent result from the same query.

  • A phantom is a row that appears where it was not visible before. Suppose that T1 and T2 begin, and T1 reads some rows. If T2 inserts a new row and T1 sees that row when it reads again, the row is a phantom.

InnoDB implements four isolation levels that control the visibility of changes made by one transaction to other concurrently executing transactions:

  • READ UNCOMMITTED allows a transaction to see uncommitted changes made by other transactions. This isolation level allows dirty reads, non-repeatable reads, and phantoms to occur.

  • READ COMMITTED allows a transaction to see changes made by other transactions only if they've been committed. Uncommitted changes remain invisible. This isolation level allows non-repeatable reads and phantoms to occur.

  • REPEATABLE READ ensures that if a transaction issues the same SELECT twice, it gets the same result both times, regardless of committed or uncommitted changes made by other transactions. In other words, it gets a consistent result from different executions of the same query. In some database systems, REPEATABLE READ isolation level allows phantoms, such that if another transaction inserts new rows in the interval between the SELECT statements, the second SELECT will see them. This is not true for InnoDB; phantoms do not occur for the REPEATABLE READ level.

  • SERIALIZABLE completely isolates the effects of one transaction from others. It is similar to REPEATABLE READ with the additional restriction that rows selected by one transaction cannot be changed by another until the first transaction finishes.

The essential difference between REPEATABLE READ and SERIALIZABLE is that with REPEATABLE READ, one transaction cannot modify rows another has modified, whereas with SERIALIZABLE, one transaction cannot modify rows if another has merely even read them.

Isolation levels are relevant only within the context of simultaneously executing transactions. After a given transaction has committed, its changes become visible to any transaction that begins after that.

InnoDB operates by default in REPEATABLE READ mode: Each transaction sees a view of the database that consists of all changes that have been committed by the time the transaction issues its first consistent read (such as a SELECT statement), plus any changes that it makes itself. It does not see any uncommitted changes, or committed changes made by transactions that begin later than itself.

InnoDB makes transaction isolation possible by multi-versioning. As transactions modify rows, InnoDB maintains isolation between them by maintaining multiple versions of the rows, and makes available to each transaction the appropriate version of the rows that it should see. Multiple versions of a row that has been changed can be derived from the current version of the row, plus the undo logs.

With multi-versioning, each transaction sees a view of the contents of the database that is appropriate for its isolation level. For example, with a level of REPEATABLE READ, the snapshot of the database that a transaction sees is the state of the database at its first read. One property of this isolation level is that it provides consistent reads: A given SELECT yields the same results when issued at different times during a transaction. The only changes the transaction sees are those it makes itself, not those made by other transactions. For READ COMMITTED, on the other hand, the behavior is slightly different. The view of the database that the transaction sees is updated at each read to take account of commits that have been made by other transactions since the previous read.

To set the server's default transaction isolation level at startup time, use the --transaction-isolation option. The option value should be READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ, or SERIALIZABLE. For example, to put the server in READ COMMITTED mode by default, put these lines in an option file:

 [mysqld] transaction-isolation = READ-COMMITTED 

The isolation level may also be set dynamically for a running server with the SET TRANSACTION ISOLATION LEVEL statement. The statement has three forms:

 SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level; SET SESSION TRANSACTION ISOLATION LEVEL isolation_level; SET TRANSACTION ISOLATION LEVEL isolation_level; 

The value of isolation_level should be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE. The first form of the statement sets the server's global isolation level. It applies to all new client connections established from that point on. Existing connections are unaffected. The second form sets the isolation level for the current client connection only and applies to transactions the client performs from that point on. The third form sets the isolation level only for the current client's next transaction.

Only clients that have the SUPER privilege may use the first form of the statement. Any client may use the second and third forms of the statement; they affect only its own transactions, so no special privilege is required.

29.4.6. Using Foreign Keys

The InnoDB storage engine supports the use of foreign keys. This capability enables you to declare relationships between columns in different tables, and InnoDB maintains integrity between the tables by prohibiting operations that violate those relationships. For example, you can specify requirements such as these:

  • A table must contain only records with ID values that are known in another reference table.

  • If an ID in the referenced table is changed, the ID in all matching records in the referencing table must be changed to match.

  • If a record with a given ID in the referenced table is deleted, all records with the matching ID in the referencing table must also be deleted.

The following example demonstrates some of the ways in which InnoDB provides referential integrity between tables. It shows how to define a foreign key relationship that enforces the requirements just described. The example is based on the implicit relationship between the Country and City tables in the world database:

  • Each Country record has a Code column that specifies a unique country code.

  • Each City record has a CountryCode column that matches the code for the country in which the city is located.

The relationship is only implicit because Country and City are MyISAM tables: MyISAM does not have any syntax for specifying the relationship explicitly and provides no means for enforcing it. This means that you could change a Code value in the Country table and any City records with the corresponding country code would not be changed to match. You could delete a Country table row but the corresponding City records would not be deleted. In either case, the City records would become orphaned because there is no longer any corresponding Country table record for them. InnoDB does not allow these types of referential integrity failures to occur.

To define a country-city relationship explicitly, derive a couple of InnoDB tables from the MyISAM tables. We'll call these tables CountryParent and CityChild to illustrate that the country records are the "parent" records and the city records are the "child" records that depend on them:

 mysql> CREATE TABLE CountryParent     -> (     ->     Code CHAR(3) NOT NULL,     ->     Name CHAR(52) NOT NULL,     ->     PRIMARY KEY (Code)     -> ) ENGINE = InnoDB; mysql> CREATE TABLE CityChild     -> (     ->     ID          INT NOT NULL AUTO_INCREMENT,     ->     Name        CHAR(35) NOT NULL,     ->     CountryCode CHAR(3) NOT NULL,     ->     PRIMARY KEY (ID),     ->     INDEX (CountryCode),     ->     FOREIGN KEY (CountryCode)     ->         REFERENCES CountryParent (Code)     ->         ON UPDATE CASCADE     ->         ON DELETE CASCADE     -> ) ENGINE = InnoDB; 

In these two tables, the column and PRIMARY KEY definitions are the same as in the original Country and City tables. The parts of the syntax that differ from the original tables are the ENGINE table option, which specifies the InnoDB storage engine, and the INDEX and FOREIGN KEY definitions for the CountryCode column in the CityChild table.

It's necessary to use InnoDB because that is the only storage engine that supports foreign keys. (You can specify a FOREIGN KEY clause for other table types, but it would simply be ignored.)

The FOREIGN KEY clause has several parts:

  • It names the column in the referring table (CountryCode).

  • It names the Code column in the CountryParent table as the referenced column. This column is the "foreign" key.

  • It specifies what actions to take if records are modified in the referenced table. The foreign key definition shown specifies the CASCADE action for both UPDATE and DELETE operations. This means that changes in the parent table are cascaded down to the child table. If you change a Code value in the CountryParent table, InnoDB changes any corresponding CityChild records with that value in the CountryCode column to match. If you delete a CountryParent record, InnoDB also deletes any CityChild records with the same country code. (InnoDB supports actions other than CASCADE, but they are not covered here. For details, see the MySQL Reference Manual.)

In a foreign key relationship, the referring column and the referenced column should have the same data type, and both must be indexed. (If the referring column has no index, InnoDB creates an index on it automatically.)

The ON UPDATE and ON DELETE parts are optional. If you omit them, InnoDB simply disallows attempts to update or delete Code values in the CountryParent table if there are CityChild records that refer to them.

The preceding CREATE TABLE statements define the foreign key relationship between CountryParent and CityChild. Now let's verify that InnoDB enforces it. Populate the two InnoDB tables with information from the original Country and City tables:

 mysql> INSERT INTO CountryParent SELECT Code, Name FROM Country; Query OK, 239 rows affected (0.34 sec) Records: 239  Duplicates: 0  Warnings: 0 mysql> INSERT INTO CityChild SELECT ID, Name, CountryCode FROM City; Query OK, 4079 rows affected (2.30 sec) Records: 4079  Duplicates: 0  Warnings: 0 

Examine a small set of related records from the two tables so that we can see the effect of updates and deletes on them. The following statement retrieves the country information for Croatia, and the corresponding city records for Croatian cities:

 mysql> SELECT * FROM CountryParent AS P, CityChild AS C     -> WHERE P.Code = C.CountryCode AND P.Name = 'Croatia'; +------+---------+------+--------+-------------+ | Code | Name    | ID   | Name   | CountryCode | +------+---------+------+--------+-------------+ | HRV  | Croatia | 2409 | Zagreb | HRV         | | HRV  | Croatia | 2410 | Split  | HRV         | | HRV  | Croatia | 2411 | Rijeka | HRV         | | HRV  | Croatia | 2412 | Osijek | HRV         | +------+---------+------+--------+-------------+ 

Test the effect of ON UPDATE CASCADE by changing the Croatia country code in the CountryParent table, and checking how that affects the CityChild table:

 mysql> UPDATE CountryParent SET Code = 'xxx' WHERE Name = 'Croatia'; Query OK, 1 row affected (0.21 sec) Rows matched: 1  Changed: 1  Warnings: 0 mysql> SELECT * FROM CountryParent WHERE Code = 'xxx'; +------+---------+ | Code | Name    | +------+---------+ | xxx  | Croatia | +------+---------+ 1 row in set (0.00 sec) mysql> SELECT * FROM CityChild WHERE CountryCode = 'xxx'; +------+--------+-------------+ | ID   | Name   | CountryCode | +------+--------+-------------+ | 2409 | Zagreb | xxx         | | 2410 | Split  | xxx         | | 2411 | Rijeka | xxx         | | 2412 | Osijek | xxx         | +------+--------+-------------+ 4 rows in set (0.00 sec) 

InnoDB has changed the country codes in the corresponding CityChild records to match.

Test the effect of ON DELETE CASCADE by deleting the record for Croatia from the CountryParent table (which now has a Code value of 'xxx'), and checking how that affects the CityChild table:

 mysql> DELETE FROM CountryParent WHERE Name = 'Croatia'; Query OK, 1 row affected (0.25 sec) mysql> SELECT * FROM CountryParent WHERE Code = 'xxx'; Empty set (0.00 sec) mysql> SELECT * FROM CityChild WHERE CountryCode = 'xxx'; Empty set (0.00 sec) 

InnoDB has deleted the corresponding records form the CityChild table.

The example demonstrates that foreign keys help you maintain referential integrity between tables. Because InnoDB performs the required changes in the referring table, you don't need to do so. This reduces application programming complexity.

29.4.7. Configuring and Monitoring InnoDB

A server that has InnoDB enabled uses a default configuration for its tablespace and log files unless you provide configuration options. This section describes how to configure InnoDB explicitly, and how to obtain status information from InnoDB while the server is running.

29.4.7.1 Configuring the InnoDB Tablespace

By default, the InnoDB storage engine manages the contents for all InnoDB tables in its shared tablespace. The tablespace stores data rows and indexes. It also contains a rollback segment consisting of undo log records for ongoing transactions, in case they need to be rolled back. The shared tablespace has the following general characteristics:

  • It can consist of one file or multiple files.

  • Each component file of the tablespace can be a regular file or a raw partition (a device file). A given tablespace can include both types of files.

  • Tablespace files can be on different filesystems or physical disk drives. One reason to place the files on multiple physical drives is to distribute InnoDB-related disk activity among them.

  • The tablespace size can exceed the limits that the filesystem places on maximum file size. This is true for two reasons. First, the tablespace can consist of multiple files and thus can be larger than any single file. Second, the tablespace can include raw partitions, which are not bound by filesystem limits on maximum file size. InnoDB can use the full extent of partitions, which makes it easy to configure a very large tablespace.

  • The last component of the tablespace can be auto-extending, with an optional limit on how large the file can grow.

If you don't specify any tablespace configuration options at all, InnoDB creates a shared tablespace consisting of a single 10MB auto-extending regular file named ibdata1 in the data directory. To control the tablespace configuration explicitly, use the innodb_data_file_path and innodb_data_home_dir options:

  • innodb_data_file_path names each of the files in the tablespace, their sizes, and possibly other optional information. The parts of each file specification are delimited by colons. If there are multiple files, separate their specifications by semicolons. The minimum combined size of the files is 10MB.

  • innodb_data_home_dir specifies a pathname prefix that is prepended to the pathname of each file named by innodb_data_file_path. By default, tablespace files are assumed to be located in the data directory. You can set the home directory to the empty value if you want filenames in innodb_data_file_path to be treated as absolute pathnames. This is useful when you want to place tablespace files on different filesystems or if you want to use raw partitions.

Normally, you place the settings for these options in an option file to make sure that the server uses the same tablespace configuration each time it starts. The following examples show various ways to set up an InnoDB tablespace:

  • A tablespace consisting of a single 100MB file named innodata1 located in the data directory:

     [mysqld] innodb_data_file_path = innodata1:100M 

    It's unnecessary to specify a value for the innodb_data_home_dir option in this case because the data directory is its default value.

  • A tablespace like that in the previous example, except that the file is auto-extending:

     [mysqld] innodb_data_file_path = innodata1:100M:autoextend 

  • A tablespace like that in the previous example, but with a limit of 500MB on the size to which the auto-extending file may grow:

     [mysqld] innodb_data_file_path = innodata1:100M:autoextend:max:500M 

  • A tablespace consisting of two 500MB files named innodata1 and innodata2 located in the data directory:

     [mysqld] innodb_data_file_path = innodata1:500M;innodata2:500M 

  • A tablespace like that in the previous example, but with the files stored under the E:\innodb directory rather than in the data directory.

     [mysqld] innodb_data_home_dir = E:/innodb innodb_data_file_path = innodata1:500M;innodata2:500M 

    Note that backslashes in Windows pathnames are written as forward slashes in option files.

  • A tablespace consisting of two files stored on different filesystems. Here the home directory is set to an empty value so that the file specifications can be given as absolute pathnames on different filesystems:

     [mysqld] innodb_data_home_dir = innodb_data_file_path = E:/innodata1:500M;D:/innodata2:500M 

When you first configure the tablespace, any regular (non-partition) files named by the configuration options must not exist. InnoDB will create and initialize them when you start the server.

Any raw partitions named in the configuration must exist but must have the modifier newraw listed after the size in the file specification. newraw tells InnoDB to initialize the partition when the server starts up. New partitions are treated as read-only after initialization. After InnoDB initializes the tablespace, stop the server, change newraw to raw in the partition specification, and restart the server. For example, to use a 10GB Unix partition named /dev/hdc6, begin with a configuration like this:

 [mysqld] innodb_data_home_dir = innodb_data_file_path = /dev/hdc6:10Gnewraw 

Start the server and let InnoDB initialize the tablespace. Then stop the server and change the configuration from newraw to raw:

 [mysqld] innodb_data_home_dir = innodb_data_file_path = /dev/hdc6:10Graw 

After changing the configuration, restart the server.

If you do not want to use the shared tablespace for storing table contents, you can configure InnoDB with the innodb_file_per_table option. For example:

 [mysqld] innodb_data_file_path = innodata1:100M innodb_file_per_table 

In this case, for each new table that InnoDB creates, it sets up an .ibd file to accompany the table's .frm file in the database directory. The .ibd file acts as the table's own tablespace file and InnoDB stores table contents in it. (The shared tablespace still is needed because it contains the InnoDB data dictionary and the rollback segment.)

29.4.7.2 Configuring InnoDB Buffers and Logs

InnoDB uses a buffer pool to hold information read from InnoDB tables. The buffer pool serves to reduce disk I/O for information that is frequently accessed, and a larger buffer more effectively achieves this goal. To change the size of the buffer pool, set the innodb_buffer_pool_size option. Its default value is 8MB. If your machine has the memory available, you can set the value much higher.

The InnoDB storage engine logs information about current transactions in a memory buffer. When a transaction commits or rolls back, the log buffer is flushed to disk. If the log buffer is small, it might fill up before the end of the transaction, requiring a flush to the log file before the outcome of the transaction is known. For a committed transaction, this results in multiple disk operations rather than one. For a rolled-back transaction, it results in writes that, with a larger buffer, would not need to have been made at all. To set the size of the log buffer, use the innodb_log_buffer_size option. The default value is 1MB. Typical values range from 1MB to 8MB. Values larger than 8MB are of no benefit.

By default, InnoDB creates two 5MB log files in the data directory named ib_logfile0 and ib_logfile1. To configure the InnoDB log files explicitly, use the innodb_log_files_in_group and innodb_log_file_size options. The first controls how many log files InnoDB uses and the second controls how big each file is. For example, to use two log files of 50MB each, configure the log like this:

 [mysqld] innodb_log_files_in_group = 2 innodb_log_file_size = 50M 

The product of the two values is the total size of the InnoDB log files. Information is logged in circular fashion, with old information at the front of the log being overwritten when the log fills up. However, the log entries cannot be overwritten if the changes they refer to have not yet been recorded in the tablespace. Consequently, a larger log allows InnoDB to run longer without having to force changes recorded in the logs to be applied to the tablespace on disk.

The innodb_flush_log_at_trx_commit setting affects how InnoDB TRansfers log information from the log buffer in memory to the log files on disk. The buffer contains information about committed transactions, so it is important that it be written properly: It is one thing to perform a write operation, and another to make sure that the operating system actually has written the information to disk. Operating systems typically buffer writes in the filesystem cache briefly and do not actually perform the write to disk immediately. To ensure that buffered information has been recorded on disk, InnoDB must perform a write operation to initiate a disk transfer and a flush operation to force the transfer to complete.

InnoDB TRies to flush the log approximately once a second in any case, but the innodb_flush_log_at_trx_commit option can be set to determine how log writing and flushing occurs in addition. The setting of this option is directly related to the ACID durability property and to performance as follows:

  • If you set innodb_flush_log_at_trx_commit to 1, changes are written from the log buffer to the log file and the log file is flushed to disk for each commit. This guarantees that the changes will not be lost even in the event of a crash. This is the safest setting, and is also the required setting if you need ACID durability. However, this setting also produces slowest performance.

  • A setting of 0 causes the log file to be written and flushed to disk approximately once a second, but not after each commit. On a busy system, this can reduce log-related disk activity significantly, but in the event of a crash can result in a loss of about a second's worth of committed changes.

  • A setting of 2 causes the log buffer to be written to the log file after each commit, but file writes are flushed to disk approximately once a second. This is somewhat slower than a setting of 0. However, the committed changes will not be lost if it is only the MySQL server that crashes and not the operating system or server host: The machine continues to run, so the changes written to the log file are in the filesystem cache and eventually will be flushed normally.

The tradeoff controlled by the innodb_flush_log_at_trx_commit setting therefore is between durability and performance. If ACID durability is required, a setting of 1 is necessary. If a slight risk to durability is acceptable to achieve better performance, a value of 0 or 2 may be used.

29.4.7.3 Viewing InnoDB Status Information

You can ask the InnoDB storage engine to provide information about itself by means of SHOW statements.

SHOW ENGINE INNODB STATUS requires the SUPER privilege and displays extensive information about InnoDB's operation:

 mysql> SHOW ENGINE INNODB STATUS\G *************************** 1. row *************************** Status: ===================================== 030914 17:44:57 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 35 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 65, signal count 65 Mutex spin waits 1487, rounds 28720, OS waits 51 RW-shared spins 28, OS waits 13; RW-excl spins 1, OS waits 1 ------------ TRANSACTIONS ------------ Trx id counter 0 31923 Purge done for trx's n:o < 0 21287 undo n:o < 0 0 Total number of lock structs in row lock hash table 0 LIST OF TRANSACTIONS FOR EACH SESSION: -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 0, aio writes: 0,  ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 77 OS file reads, 10959 OS file writes, 5620 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 83.20 writes/s, 41.88 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 0, seg size 2, 0 inserts, 0 merged recs, 0 merges Hash table size 34679, used cells 1, node heap has 1 buffer(s) 6.06 hash searches/s, 36.68 non-hash searches/s --- LOG --- Log sequence number 0 1520665 Log flushed up to   0 1520665 Last checkpoint at  0 1520665 0 pending log writes, 0 pending chkp writes 10892 log i/o's done, 82.80 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 18373254; in additional pool allocated 725632 Buffer pool size   512 Free buffers       447 Database pages     64 Modified db pages  0 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages read 22, created 42, written 141 0.00 reads/s, 0.46 creates/s, 1.49 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue Main thread id 10836480, state: waiting for server activity Number of rows inserted 5305, updated 3, deleted 0, read 10 41.08 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ 

SHOW TABLE STATUS, when used with any InnoDB table, displays in the Comment field of the output the approximate amount of free space available in the InnoDB tablespace:

 mysql> SHOW TABLE STATUS LIKE 'CountryList'\G *************************** 1. row ***************************            Name: CountryList            Type: InnoDB      Row_format: Fixed            Rows: 171  Avg_row_length: 287     Data_length: 49152 Max_data_length: NULL    Index_length: 0       Data_free: 0  Auto_increment: NULL     Create_time: NULL     Update_time: NULL      Check_time: NULL  Create_options:         Comment: InnoDB free: 13312 kB 

The free space value applies to the shared tablespace if that is where the table is stored. If the table has its own per-table tablespace (an .ibd file), the value applies to its own tablespace.

The information displayed by SHOW TABLE STATUS also is available in the TABLES table of the INFORMATION_SCHEMA database.



MySQL 5 Certification Study Guide
MySQL 5.0 Certification Study Guide
ISBN: 0672328127
EAN: 2147483647
Year: 2006
Pages: 312

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net