Transaction Management > Journal Management
A journal is a repository of information that is used to recover a database when aborting a transaction or statement subtransaction, and also when recovering after an application, system, or power failure. SQLite uses a single journal file per database. (It does not use journal files for in-memory databases.) It assures only rollback (undo, and not redo) of transactions, and the journal file is often called the rollback journal. The journal always resides in the same directory as the database file does, and has the same name, but with -journal appended.
SQLite permits at most one write-transaction on a database file at a time. It creates the journal file on the fly for every write-transaction, and deletes the file when the transaction is complete.
SQLite partitions the rollback journal file into variable size log segments. Each segment starts with a segment header record followed by one or more log records. The format of the segment header is shown in Figure 4-4. The header begins with a magic number containing eight bytes in this sequence: 0xD9, 0xD5, 0x05, 0xF9, 0x20, 0xA1, 0x63, and 0xD7. The magic number is used for a sanity check only, and has no special significance. The number of records (nRec, for short) component specifies how many valid log records are in this log segment. The value of the random number component is used to estimate checksums for individual log records. Different segments may have different random numbers. The initial database page count component notes how many pages were in the database file when the current journaling started. The sector size is the size of the disk sector where the journal file resides. The segment header occupies a complete disk sector. The unused space in the header is kept there as filler.
SQLite supports asynchronous transactions that are faster than normal transactions. SQLite does not recommend using asynchronous transactions, but you can set the asynchronous mode by executing an SQLite pragma command. This mode is normally used at application development time to reduce the development time. This mode is also satisfactory for some testing applications that do not test for recovery from failures. Asynchronous transactions neither flush the journal nor the database file. The journal file will have only one log segment. The nRec value is -1 (i.e., 0xFFFFFFFF as a signed value), and the actual value is derived from the size of the file.
A rollback journal file normally contains a single log segment. But, in some scenarios, it is a multisegment file, and SQLite writes the segment header record multiple times in the file. (You will see such scenarios in the Cache Flush section later.) Each time the header record is written, it is written at the sector boundary. In a multisegment journal file, the nRec field in any segment header cannot be 0xFFFFFFFF.
Non-SELECT statements from the current write-transaction produce log records. SQLite uses an old value logging technique at the page level granularity. Before altering any page for the first time, the original content of that page (along with its page number) are written into the journal in a new log record (see Figure 4-5). The record also contains a 32-bit checksum. The checksum covers both the page number and the page image. The 32-bit random value that appears in the log segment header (see the third component in Figure 4-4) is used as the checksum key. The random number is important because garbage data that appears at the end of a journal is likely data that was once in other files that have now been deleted. If the garbage data came from an obsolete journal file, the checksums might be correct. But, by initializing the checksum to a random value that is different for different journal file, SQLite minimizes that risk.
You may recall that an application can attach additional databases to an open connection by executing the SQLite ATTACH command. If a transaction modifies multiple databases, then each database has its own rollback journal. They are independent rollback journals, and not aware of one another. To bridge the gap, SQLite additionally maintains a separate aggregate journal called the master journal. The journal does not contain any log records that are used for rollback purposes. Instead, it contains the names of all the individual rollback journals that participate in the transaction. Each individual rollback journal also contains the name of the master journal as shown in Figure 4-6. If there is no attached database, or if no attached database is participating in the current transaction for update purpose, no master journal is created, and the normal rollback journal does not contain any information about the master journal.
The master journal always resides in the same directory as the main database file does, and has the same name, but with -mj appended, followed by eight randomly chosen alphanumeric characters. It is also a transient file. It is created when the transaction attempts to commit, and deleted when the commit processing is complete.
When in a user transaction, SQLite maintains a statement subjournal for the latest non-SELECT statement execution. The journal is required to recover the database from the statement failure. A statement journal is a separate, ordinary rollback journal file. It is an arbitrary named temporary file (prefixed by sqlite_). The file is not required for crash recovery operation; it is only needed for statement aborts. SQLite deletes the file when the statement is complete. The journal does not have a segment header record. The nRec (number of log records) value is kept in an in-memory data structure, and so is the database file size as of the start of the statement execution. These log records do not have any checksum information.
SQLite follows write-ahead logging (WAL) protocol to ensure durability of database changes, and recoverability of databases in the occurrences of application, system, or power failures. Writing log records in the journal file is lazy: SQLite does not force them to the disk surface immediately. However, before writing the next page in the database file, it forces all log records to disk. This is called flushing the journal. Journal flushing is done to make sure all the pages that have been written to the journal have actually reached the disk surface. It is not safe to modify the database file until after the journal has been flushed. If the database is modified before the journal is flushed, and a power failure occurs, the unflushed log records would be lost, and SQLite would not be able to completely roll back the transaction's effects from the database, resulting in database corruption.
The default commit logic is forced-log-at-commit and forced-database-at-commit. When an application commits a transaction, SQLite makes sure that all log records in the rollback journal are in the disk. At the end of the commit, the rollback journal file is deleted, and the transaction is complete. If the system fails before reaching that point, the transaction commit has failed, and it will be rolled back when the database is read next time. However, before deleting the rollback journal file, all changes to the database file are written to the disk. This is done to make sure that the database has received all updates from the transaction before the journal is deleted.
SQLite does not perform journal or database flushing at commit for asynchronous transactions. Thus, upon a failure, the database might be corrupt. Asynchronous transaction writers have been warned!