29.4. The InnoDB EngineThe InnoDB storage engine manages tables that have the following characteristics:
Support for the InnoDB storage engine is a standard feature in binary distributions. If you build MySQL from source, InnoDB is included unless you explicitly use the --without-innodb configuration option. If a given MySQL server has the InnoDB storage engine compiled in, but you're sure that you won't need InnoDB tables, you can disable InnoDB support at runtime by starting the server with the --skip-innodb option. Disabling InnoDB reduces the server's memory re-quirements because it need not allocate any InnoDB-related data structures. Disabling InnoDB also reduces disk requirements because no InnoDB tablespace or log files need be allocated. 29.4.1. The InnoDB Tablespace and LogsInnoDB operates using two primary disk-based resources: a tablespace for storing table contents, and a set of log files for recording transaction activity. Each InnoDB table has a format (.frm) file in the database directory of the database to which the table belongs. This is the same as tables managed by any other MySQL storage engine, such as MyISAM. However, InnoDB manages table contents (data rows and indexes) on disk differently than does the MyISAM engine. By default, InnoDB uses a shared "tablespace," which is one or more files that form a single logical storage area. All InnoDB tables are stored together within the tablespace. There are no table-specific data files or index files for InnoDB the way there are for MyISAM tables. The tablespace also contains a rollback segment. As transactions modify rows, undo log information is stored in the rollback segment. This information is used to roll back failed transactions. Although InnoDB treats the shared tablespace as a single logical storage area, it can consist of one file or multiple files. Each file can be a regular file or a raw partition. The final file in the shared tablespace can be configured to be auto-extending, in which case InnoDB expands it automatically if the tablespace fills up. Because the shared tablespace is used for InnoDB tables in all databases (and thus is not database specific), tablespace files are stored by default in the server's data directory, not within a particular database directory. If you do not want to use the shared tablespace for storing table contents, you can start the server with the --innodb_file_per_table option. In this case, for each new table that InnoDB creates, it sets up an .ibd file in the database directory to accompany the table's .frm file. The .ibd file acts as the table's own tablespace file and InnoDB stores table contents in it. (The shared tablespace still is needed because it contains the InnoDB data dictionary and the rollback segment.) Use of the --innodb_file_per_table option does not affect accessibility of any InnoDB tables that may already have been created in the shared tablespace. Those tables remain accessible. In addition to its tablespace files, the InnoDB storage engine manages a set of InnoDB-specific log files that contain information about ongoing transactions. As a client performs a transaction, the changes that it makes are held in the InnoDB log. The more recent log contents are cached in memory. Normally, the cached log information is written and flushed to log files on disk at transaction commit time, though that may also occur earlier. If a crash occurs while the tables are being modified, the log files are used for auto-recovery: When the MySQL server restarts, it reapplies the changes recorded in the logs, to ensure that the tables reflect all committed transactions. InnoDB tablespace and log setup is discussed in Section 29.4.7, "Configuring and Monitoring InnoDB." 29.4.2. InnoDB and ACID ComplianceThe InnoDB storage engine provides transactional capabilities. A transaction is a logical grouping of statements that is handled by the database server as a single unit. Either all the statements execute successfully to completion or all modifications made by the statements are discarded if an error occurs. Transactional systems often are described as being ACID compliant, where "ACID" stands for the following properties:
InnoDB satisfies the conditions for ACID compliance, assuming that its log flushing behavior is set appropriately. InnoDB can be configured for log flushing that provides ACID compliance, or for flushing that gains some in performance at the risk of losing the last few transactions if a crash occurs. By default, InnoDB log flushing is set for ACID compliance. For configuration information, see Section 29.4.7.2, "Configuring InnoDB Buffers and Logs." 29.4.3. The InnoDB TRansaction ModelMultiple clients may execute transactions concurrently, but any given client performs transactions serially, one after the other. The client determines when each of its transactions begins and ends by controlling its autocommit mode. MySQL initializes each client to begin with autocommit mode enabled. This causes each statement to be committed immediately. In transactional terms, this means that each statement is a separate transaction. To group multiple statements as a single transaction so that they succeed or fail as a unit, autocommit mode must be disabled. There are two ways to do this:
If you disable autocommit explicitly, perform transactions like this: SET AUTOCOMMIT = 0; ... statements for transaction 1 ... COMMIT; ... statements for transaction 2 ... COMMIT; ... If you suspend autocommit by using START TRANSACTION, perform transactions like this: START TRANSACTION; ... statements for transaction 1 ... COMMIT; START TRANSACTION; ... statements for transaction 2 ... COMMIT; ... While autocommit mode is enabled, attempts to perform multiple-statement transactions are ineffective. Each statement is committed immediately, so COMMIT is superfluous and ROLLBACK has no effect. If you want to roll back only part of a transaction, you can set a savepoint by using the SAVEPOINT statement: SAVEPOINT savepoint_name; Multiple savepoints can be set within a transaction. To roll back to a given savepoint, use this statement: ROLLBACK TO SAVEPOINT savepoint_name; The transaction rolls back to the named savepoint and you can continue from there. Any savepoints that were set after the savepoint are deleted. Under some circumstances, the current transaction may end implicitly:
Because a statement that begins a transaction implicitly commits any current transaction, transactions cannot be nested. Transaction-control statements can affect explicit table locks. Use of START TRANSACTION or its synonyms causes an implicit UNLOCK TABLES. The MySQL server initializes each client connection to begin with autocommit enabled. Modifications to the autocommit mode made by a client to its connection persist only to the end of the connection. If a client disconnects and reconnects, the second connection begins with autocommit enabled, regardless of its setting at the end of the first connection. 29.4.4. InnoDB Locking CharacteristicsThis section describes how InnoDB uses locks internally and some query modifiers you can use to affect locking. InnoDB has the following general locking properties:
Isolation levels and multi-versioning are discussed more fully in Section 29.4.5, "InnoDB Isolation Levels, Multi-Versioning, and Concurrency Isolation Levels, Multi-Versioning, and Concurrency." InnoDB supports two locking modifiers that may be added to the end of SELECT statements. They acquire shared or exclusive locks and convert non-locking reads into locking reads:
In the REPEATABLE READ isolation level, you can add LOCK IN SHARE MODE to SELECT operations to force other transactions to wait for your transaction if they want to modify the selected rows. This is similar to operating at the SERIALIZABLE isolation level, for which InnoDB implicitly adds LOCK IN SHARE MODE to SELECT statements that have no explicit locking modifier. 29.4.5. InnoDB Isolation Levels, Multi-Versioning, and ConcurrencyAs mentioned earlier, multiple transactions may be executing concurrently within the server, one transaction per client. This has the potential to cause problems: If one client's transaction changes data, should transactions for other clients see those changes or should they be isolated from them? The transaction isolation level determines the level of visibility between transactions that is, the ways in which simultaneous transactions interact when accessing the same data. This section discusses the problems that can occur and how InnoDB implements isolation levels. Note that isolation level definitions vary among database servers, so the levels as implemented by InnoDB might not correspond exactly to levels as implemented in other database systems. When multiple clients run transactions concurrently, three problems that may result are dirty reads, non-repeatable reads, and phantoms. These occur under the following circumstances:
InnoDB implements four isolation levels that control the visibility of changes made by one transaction to other concurrently executing transactions:
The essential difference between REPEATABLE READ and SERIALIZABLE is that with REPEATABLE READ, one transaction cannot modify rows another has modified, whereas with SERIALIZABLE, one transaction cannot modify rows if another has merely even read them. Isolation levels are relevant only within the context of simultaneously executing transactions. After a given transaction has committed, its changes become visible to any transaction that begins after that. InnoDB operates by default in REPEATABLE READ mode: Each transaction sees a view of the database that consists of all changes that have been committed by the time the transaction issues its first consistent read (such as a SELECT statement), plus any changes that it makes itself. It does not see any uncommitted changes, or committed changes made by transactions that begin later than itself. InnoDB makes transaction isolation possible by multi-versioning. As transactions modify rows, InnoDB maintains isolation between them by maintaining multiple versions of the rows, and makes available to each transaction the appropriate version of the rows that it should see. Multiple versions of a row that has been changed can be derived from the current version of the row, plus the undo logs. With multi-versioning, each transaction sees a view of the contents of the database that is appropriate for its isolation level. For example, with a level of REPEATABLE READ, the snapshot of the database that a transaction sees is the state of the database at its first read. One property of this isolation level is that it provides consistent reads: A given SELECT yields the same results when issued at different times during a transaction. The only changes the transaction sees are those it makes itself, not those made by other transactions. For READ COMMITTED, on the other hand, the behavior is slightly different. The view of the database that the transaction sees is updated at each read to take account of commits that have been made by other transactions since the previous read. To set the server's default transaction isolation level at startup time, use the --transaction-isolation option. The option value should be READ-UNCOMMITTED, READ-COMMITTED, REPEATABLE-READ, or SERIALIZABLE. For example, to put the server in READ COMMITTED mode by default, put these lines in an option file: [mysqld] transaction-isolation = READ-COMMITTED The isolation level may also be set dynamically for a running server with the SET TRANSACTION ISOLATION LEVEL statement. The statement has three forms: SET GLOBAL TRANSACTION ISOLATION LEVEL isolation_level; SET SESSION TRANSACTION ISOLATION LEVEL isolation_level; SET TRANSACTION ISOLATION LEVEL isolation_level; The value of isolation_level should be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE. The first form of the statement sets the server's global isolation level. It applies to all new client connections established from that point on. Existing connections are unaffected. The second form sets the isolation level for the current client connection only and applies to transactions the client performs from that point on. The third form sets the isolation level only for the current client's next transaction. Only clients that have the SUPER privilege may use the first form of the statement. Any client may use the second and third forms of the statement; they affect only its own transactions, so no special privilege is required. 29.4.6. Using Foreign KeysThe InnoDB storage engine supports the use of foreign keys. This capability enables you to declare relationships between columns in different tables, and InnoDB maintains integrity between the tables by prohibiting operations that violate those relationships. For example, you can specify requirements such as these:
The following example demonstrates some of the ways in which InnoDB provides referential integrity between tables. It shows how to define a foreign key relationship that enforces the requirements just described. The example is based on the implicit relationship between the Country and City tables in the world database:
The relationship is only implicit because Country and City are MyISAM tables: MyISAM does not have any syntax for specifying the relationship explicitly and provides no means for enforcing it. This means that you could change a Code value in the Country table and any City records with the corresponding country code would not be changed to match. You could delete a Country table row but the corresponding City records would not be deleted. In either case, the City records would become orphaned because there is no longer any corresponding Country table record for them. InnoDB does not allow these types of referential integrity failures to occur. To define a country-city relationship explicitly, derive a couple of InnoDB tables from the MyISAM tables. We'll call these tables CountryParent and CityChild to illustrate that the country records are the "parent" records and the city records are the "child" records that depend on them: mysql> CREATE TABLE CountryParent -> ( -> Code CHAR(3) NOT NULL, -> Name CHAR(52) NOT NULL, -> PRIMARY KEY (Code) -> ) ENGINE = InnoDB; mysql> CREATE TABLE CityChild -> ( -> ID INT NOT NULL AUTO_INCREMENT, -> Name CHAR(35) NOT NULL, -> CountryCode CHAR(3) NOT NULL, -> PRIMARY KEY (ID), -> INDEX (CountryCode), -> FOREIGN KEY (CountryCode) -> REFERENCES CountryParent (Code) -> ON UPDATE CASCADE -> ON DELETE CASCADE -> ) ENGINE = InnoDB; In these two tables, the column and PRIMARY KEY definitions are the same as in the original Country and City tables. The parts of the syntax that differ from the original tables are the ENGINE table option, which specifies the InnoDB storage engine, and the INDEX and FOREIGN KEY definitions for the CountryCode column in the CityChild table. It's necessary to use InnoDB because that is the only storage engine that supports foreign keys. (You can specify a FOREIGN KEY clause for other table types, but it would simply be ignored.) The FOREIGN KEY clause has several parts:
In a foreign key relationship, the referring column and the referenced column should have the same data type, and both must be indexed. (If the referring column has no index, InnoDB creates an index on it automatically.) The ON UPDATE and ON DELETE parts are optional. If you omit them, InnoDB simply disallows attempts to update or delete Code values in the CountryParent table if there are CityChild records that refer to them. The preceding CREATE TABLE statements define the foreign key relationship between CountryParent and CityChild. Now let's verify that InnoDB enforces it. Populate the two InnoDB tables with information from the original Country and City tables: mysql> INSERT INTO CountryParent SELECT Code, Name FROM Country; Query OK, 239 rows affected (0.34 sec) Records: 239 Duplicates: 0 Warnings: 0 mysql> INSERT INTO CityChild SELECT ID, Name, CountryCode FROM City; Query OK, 4079 rows affected (2.30 sec) Records: 4079 Duplicates: 0 Warnings: 0 Examine a small set of related records from the two tables so that we can see the effect of updates and deletes on them. The following statement retrieves the country information for Croatia, and the corresponding city records for Croatian cities: mysql> SELECT * FROM CountryParent AS P, CityChild AS C -> WHERE P.Code = C.CountryCode AND P.Name = 'Croatia'; +------+---------+------+--------+-------------+ | Code | Name | ID | Name | CountryCode | +------+---------+------+--------+-------------+ | HRV | Croatia | 2409 | Zagreb | HRV | | HRV | Croatia | 2410 | Split | HRV | | HRV | Croatia | 2411 | Rijeka | HRV | | HRV | Croatia | 2412 | Osijek | HRV | +------+---------+------+--------+-------------+ Test the effect of ON UPDATE CASCADE by changing the Croatia country code in the CountryParent table, and checking how that affects the CityChild table: mysql> UPDATE CountryParent SET Code = 'xxx' WHERE Name = 'Croatia'; Query OK, 1 row affected (0.21 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> SELECT * FROM CountryParent WHERE Code = 'xxx'; +------+---------+ | Code | Name | +------+---------+ | xxx | Croatia | +------+---------+ 1 row in set (0.00 sec) mysql> SELECT * FROM CityChild WHERE CountryCode = 'xxx'; +------+--------+-------------+ | ID | Name | CountryCode | +------+--------+-------------+ | 2409 | Zagreb | xxx | | 2410 | Split | xxx | | 2411 | Rijeka | xxx | | 2412 | Osijek | xxx | +------+--------+-------------+ 4 rows in set (0.00 sec) InnoDB has changed the country codes in the corresponding CityChild records to match. Test the effect of ON DELETE CASCADE by deleting the record for Croatia from the CountryParent table (which now has a Code value of 'xxx'), and checking how that affects the CityChild table: mysql> DELETE FROM CountryParent WHERE Name = 'Croatia'; Query OK, 1 row affected (0.25 sec) mysql> SELECT * FROM CountryParent WHERE Code = 'xxx'; Empty set (0.00 sec) mysql> SELECT * FROM CityChild WHERE CountryCode = 'xxx'; Empty set (0.00 sec) InnoDB has deleted the corresponding records form the CityChild table. The example demonstrates that foreign keys help you maintain referential integrity between tables. Because InnoDB performs the required changes in the referring table, you don't need to do so. This reduces application programming complexity. 29.4.7. Configuring and Monitoring InnoDBA server that has InnoDB enabled uses a default configuration for its tablespace and log files unless you provide configuration options. This section describes how to configure InnoDB explicitly, and how to obtain status information from InnoDB while the server is running. 29.4.7.1 Configuring the InnoDB TablespaceBy default, the InnoDB storage engine manages the contents for all InnoDB tables in its shared tablespace. The tablespace stores data rows and indexes. It also contains a rollback segment consisting of undo log records for ongoing transactions, in case they need to be rolled back. The shared tablespace has the following general characteristics:
If you don't specify any tablespace configuration options at all, InnoDB creates a shared tablespace consisting of a single 10MB auto-extending regular file named ibdata1 in the data directory. To control the tablespace configuration explicitly, use the innodb_data_file_path and innodb_data_home_dir options:
Normally, you place the settings for these options in an option file to make sure that the server uses the same tablespace configuration each time it starts. The following examples show various ways to set up an InnoDB tablespace:
When you first configure the tablespace, any regular (non-partition) files named by the configuration options must not exist. InnoDB will create and initialize them when you start the server. Any raw partitions named in the configuration must exist but must have the modifier newraw listed after the size in the file specification. newraw tells InnoDB to initialize the partition when the server starts up. New partitions are treated as read-only after initialization. After InnoDB initializes the tablespace, stop the server, change newraw to raw in the partition specification, and restart the server. For example, to use a 10GB Unix partition named /dev/hdc6, begin with a configuration like this: [mysqld] innodb_data_home_dir = innodb_data_file_path = /dev/hdc6:10Gnewraw Start the server and let InnoDB initialize the tablespace. Then stop the server and change the configuration from newraw to raw: [mysqld] innodb_data_home_dir = innodb_data_file_path = /dev/hdc6:10Graw After changing the configuration, restart the server. If you do not want to use the shared tablespace for storing table contents, you can configure InnoDB with the innodb_file_per_table option. For example: [mysqld] innodb_data_file_path = innodata1:100M innodb_file_per_table In this case, for each new table that InnoDB creates, it sets up an .ibd file to accompany the table's .frm file in the database directory. The .ibd file acts as the table's own tablespace file and InnoDB stores table contents in it. (The shared tablespace still is needed because it contains the InnoDB data dictionary and the rollback segment.) 29.4.7.2 Configuring InnoDB Buffers and LogsInnoDB uses a buffer pool to hold information read from InnoDB tables. The buffer pool serves to reduce disk I/O for information that is frequently accessed, and a larger buffer more effectively achieves this goal. To change the size of the buffer pool, set the innodb_buffer_pool_size option. Its default value is 8MB. If your machine has the memory available, you can set the value much higher. The InnoDB storage engine logs information about current transactions in a memory buffer. When a transaction commits or rolls back, the log buffer is flushed to disk. If the log buffer is small, it might fill up before the end of the transaction, requiring a flush to the log file before the outcome of the transaction is known. For a committed transaction, this results in multiple disk operations rather than one. For a rolled-back transaction, it results in writes that, with a larger buffer, would not need to have been made at all. To set the size of the log buffer, use the innodb_log_buffer_size option. The default value is 1MB. Typical values range from 1MB to 8MB. Values larger than 8MB are of no benefit. By default, InnoDB creates two 5MB log files in the data directory named ib_logfile0 and ib_logfile1. To configure the InnoDB log files explicitly, use the innodb_log_files_in_group and innodb_log_file_size options. The first controls how many log files InnoDB uses and the second controls how big each file is. For example, to use two log files of 50MB each, configure the log like this: [mysqld] innodb_log_files_in_group = 2 innodb_log_file_size = 50M The product of the two values is the total size of the InnoDB log files. Information is logged in circular fashion, with old information at the front of the log being overwritten when the log fills up. However, the log entries cannot be overwritten if the changes they refer to have not yet been recorded in the tablespace. Consequently, a larger log allows InnoDB to run longer without having to force changes recorded in the logs to be applied to the tablespace on disk. The innodb_flush_log_at_trx_commit setting affects how InnoDB TRansfers log information from the log buffer in memory to the log files on disk. The buffer contains information about committed transactions, so it is important that it be written properly: It is one thing to perform a write operation, and another to make sure that the operating system actually has written the information to disk. Operating systems typically buffer writes in the filesystem cache briefly and do not actually perform the write to disk immediately. To ensure that buffered information has been recorded on disk, InnoDB must perform a write operation to initiate a disk transfer and a flush operation to force the transfer to complete. InnoDB TRies to flush the log approximately once a second in any case, but the innodb_flush_log_at_trx_commit option can be set to determine how log writing and flushing occurs in addition. The setting of this option is directly related to the ACID durability property and to performance as follows:
The tradeoff controlled by the innodb_flush_log_at_trx_commit setting therefore is between durability and performance. If ACID durability is required, a setting of 1 is necessary. If a slight risk to durability is acceptable to achieve better performance, a value of 0 or 2 may be used. 29.4.7.3 Viewing InnoDB Status InformationYou can ask the InnoDB storage engine to provide information about itself by means of SHOW statements. SHOW ENGINE INNODB STATUS requires the SUPER privilege and displays extensive information about InnoDB's operation: mysql> SHOW ENGINE INNODB STATUS\G *************************** 1. row *************************** Status: ===================================== 030914 17:44:57 INNODB MONITOR OUTPUT ===================================== Per second averages calculated from the last 35 seconds ---------- SEMAPHORES ---------- OS WAIT ARRAY INFO: reservation count 65, signal count 65 Mutex spin waits 1487, rounds 28720, OS waits 51 RW-shared spins 28, OS waits 13; RW-excl spins 1, OS waits 1 ------------ TRANSACTIONS ------------ Trx id counter 0 31923 Purge done for trx's n:o < 0 21287 undo n:o < 0 0 Total number of lock structs in row lock hash table 0 LIST OF TRANSACTIONS FOR EACH SESSION: -------- FILE I/O -------- I/O thread 0 state: waiting for i/o request (insert buffer thread) I/O thread 1 state: waiting for i/o request (log thread) I/O thread 2 state: waiting for i/o request (read thread) I/O thread 3 state: waiting for i/o request (write thread) Pending normal aio reads: 0, aio writes: 0, ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0 Pending flushes (fsync) log: 0; buffer pool: 0 77 OS file reads, 10959 OS file writes, 5620 OS fsyncs 0.00 reads/s, 0 avg bytes/read, 83.20 writes/s, 41.88 fsyncs/s ------------------------------------- INSERT BUFFER AND ADAPTIVE HASH INDEX ------------------------------------- Ibuf for space 0: size 1, free list len 0, seg size 2, 0 inserts, 0 merged recs, 0 merges Hash table size 34679, used cells 1, node heap has 1 buffer(s) 6.06 hash searches/s, 36.68 non-hash searches/s --- LOG --- Log sequence number 0 1520665 Log flushed up to 0 1520665 Last checkpoint at 0 1520665 0 pending log writes, 0 pending chkp writes 10892 log i/o's done, 82.80 log i/o's/second ---------------------- BUFFER POOL AND MEMORY ---------------------- Total memory allocated 18373254; in additional pool allocated 725632 Buffer pool size 512 Free buffers 447 Database pages 64 Modified db pages 0 Pending reads 0 Pending writes: LRU 0, flush list 0, single page 0 Pages read 22, created 42, written 141 0.00 reads/s, 0.46 creates/s, 1.49 writes/s Buffer pool hit rate 1000 / 1000 -------------- ROW OPERATIONS -------------- 0 queries inside InnoDB, 0 queries in queue Main thread id 10836480, state: waiting for server activity Number of rows inserted 5305, updated 3, deleted 0, read 10 41.08 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s ---------------------------- END OF INNODB MONITOR OUTPUT ============================ SHOW TABLE STATUS, when used with any InnoDB table, displays in the Comment field of the output the approximate amount of free space available in the InnoDB tablespace: mysql> SHOW TABLE STATUS LIKE 'CountryList'\G *************************** 1. row *************************** Name: CountryList Type: InnoDB Row_format: Fixed Rows: 171 Avg_row_length: 287 Data_length: 49152 Max_data_length: NULL Index_length: 0 Data_free: 0 Auto_increment: NULL Create_time: NULL Update_time: NULL Check_time: NULL Create_options: Comment: InnoDB free: 13312 kB The free space value applies to the shared tablespace if that is where the table is stored. If the table has its own per-table tablespace (an .ibd file), the value applies to its own tablespace. The information displayed by SHOW TABLE STATUS also is available in the TABLES table of the INFORMATION_SCHEMA database. |