Section 15.5. Database Structure | Backup & Recovery: Inexpensive Backup Solutions for Open Systems

15.5. Database Structure

There are many terms thrown around when discussing RDBMSs. The good news is that you don't need to know all of them to properly back up and recover databases. You do need to know some of them, thoughabout 20 individual terms. It's helpful to know:

What all the different storage elements are and what they are called
How these elements are logically organized within the RDBMS
What facilities are in place to protect and back up the data

This information can be complex because it depends a lot on how you look at the data. This chapter presents this information from first a power user's, then a DBA's, point of view. The various building blocks of a database are defined, although we may have to go up and down the building a bit before we're done!

What About Exchange?

Chapter 20 covers Exchange, but Exchange is kind of the "odd man out" here. While Exchange is at its heart actually a relational database, Microsoft does a pretty good job of hiding that from its administrators and users. They don't use any of the usual database terminology when talking about Exchange. This chapter covers Exchange where it can, but Exchange is not covered as much as the other products.

15.5.1. The Power User's View: Logical Elements of a Database

Before looking at how databases are stored on disk, let's look at the "power user's" view of a database. This is necessary because some of these terms are used in the definition of the storage elements. We're calling it the "power user's" view because many users will have little or no knowledge of any of these terms. But unless they want to start doing the DBA's job of putting a database together, these terms may be all that they will ever need. The terms are presented in no particular order because it is very difficult to define one term without using another one. Therefore, it may help some readers to study this section more than once.

This view also could be called the "logical" view, because many of the elements described in this view don't exist in a physical sense. That fact is one of the many things that differentiate an RDBMS from a simple spreadsheet. A spreadsheet has one table that resides in one physical file. An RDBMS table, on the other hand, gives the appearance that its data is all sitting in one place, but it may be spread out all over the system.

15.5.1.1. Instance

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Instance	Server	Instance	Instance	Instance	Cluster	Instance	Server

Instance is probably the most difficult term to explain because it means different things to different peopleand to different database platforms. The simplest definition is that an instance is one or more processes on one or more machines, through which the databases on that machine (or set machines) communicate with shared memory. There can be multiple databases within an instance, and a database also can be distributed across multiple instances on the same machine or on separate machines within a cluster. Therefore, an instance and a database are two entirely different concepts.

The Sybase term server (or dataserver) stems from the original intent that each machine/server would have one Sybase instance/server on it, although it is now quite common to have more than one instance on each machine. Informix occasionally uses the term in this manner, and it can be rather confusing. They tend to use server when speaking about the software and instance when speaking about a running environment, especially when discussing running multiple instantiations of the server.

If an instance needs to be shut down and restarted for any reason, all databases within that instance are unavailable during the shutdown. This may help you understand the original definition, because all the databases within an instance have a single connection to shared memory, which is provided by the instance. If the instance is shut down, that connection is no longer available. (See Figure 15-1 for a graphical representation of an instance.)

15.5.1.2. Database

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Database	Database	Database	Database	Database	Database	Database	Database

A database is a collection of database objects. It may be a very simple database with one table and no indexes, or it could contain many tables, indexes, and other database objects. (All database products can have more than one database object and more than one type of database object.) For example, the "customer" database may contain a table that has customer addresses and an index for that table. It also may contain a binary large object (BLOB) table that contains a scanned-in image of the customer's contract, a regular table that contains the data from that contract, and an index for that table. Then there might be another database that keeps track of all the widgets that your company sells. (See Figure 15-1 for a graphical representation of a database.)

Figure 15-1. A database instance

15.5.1.3. Table

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Table	N/A	Table	Table	Table	Table	Table	Table

A table is a grouping of related information. (That is why it is called a relation in formal database terminology.) Information is typically grouped in such a way that data is not replicated between tables except when necessary. In the previous example, the customer database could contain the customer information table, with each customer having a unique account number. The BLOB table that stores the customer's signed contract would then need to store only the customer's account number to be able to tie the two pieces of information together. (BLOB data is discussed later in this chapter.) That method takes up less space than storing the customer's whole name with the contract. The order table would contain that account number as well, and it might list what a customer ordered only by part number. If you wanted to see the details about that part number, the instance could reference the "parts" database using that part number. (See Figures 15-2 and 15-3 for graphical representations of a table.)

Figure 15-2. Table layout

Figure 15-3. Tablespace layout

A related term is a view, which usually refers to a virtual table. For example, you may put together a view that references the customer, order, and parts tables to present a united view of information about a given customer. Data often is replicated in multiple views, but because it's a virtual table, it doesn't take up extra storage space on the disk. A view normally is constructed on the fly from the results of a SELECT statement.

15.5.1.4. Index

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Index	N/A	Index	Index	Index	Index	Index	Index

An index is a special-purpose object that allows for quicker lookups of a normal table. A table is indexed by the value that you normally would use to look up a record (row). For example, the customer database might be indexed by last name if customers frequently call in and do not know their account number. It has a unique ability when recovering a database, because you can usually recreate an index from an existing table instead of recovering it.

15.5.1.5. LOBs

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
BLOB, CLOB, DBCLOB	Email	BLOBspace	blob and text datatypes	BLOB, CLOB, BFILE	ByteA, BLOB, and text	Ntext, text, and image datatypes	Character or image datatype

Large object (LOB) data has grown in popularity within the last few years. It refers to anything that does not fit into a "normal" table. This may range from a large piece of text data to a scanned-in image. Most databases differentiate between character LOB (text) and binary LOB (graphics and the like) data.

If the LOB data is stored inside the database, it presents no unique backup requirements. However, the use of datatypes that allow the storage of the LOB data outside the database (i.e., in the filesystem) are a different story. While they may provide many performance enhancements, they do present a unique backup challenge. The BLOB data needs to be backed up in sync with the database data because the database is keeping track of what files are where. Suppose a piece of BLOB data was inserted at 11:00. If you back up the filesystem at 10:00 and the database at 12:00, the database will know about a file that exists out on the filesystem, but that file will not be found on your filesystem backup. Even more confusion may be added if the filesystem backup spans the time of the database backup. In other words, it begins at 10:00 and ends at 4:00, although the database backup begins at 12:00 and is done by 2:00. (PostgreSQL does allow you to store LOB data in the filesystem, but it will be backed up by default in later versions of PostgreSQL.)

There are only two ways to resolve this conflict. The easiest way is to shut down the database or put it in read-only mode during the entire time of your filesystem backup. This may be impractical for many environments. The second way is to use the snapshot concept, which allows you to take a "snapshot" of the entire filesystem in just a few seconds and then take all night to back it up. This provides a consistent picture of the filesystem at a certain point in time.

Microsoft does not really talk about emails being LOBs. But if you think about it, that's what they are. An email can be anything from a simple line of text to a huge attachment, and all of these are stored in the Exchange database. Therefore, they must be treating these as LOBs from a database perspective.

15.5.1.6. Object

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Object	N/A	Object	Object	Object	Object	Object	Object

This generic term refers to any element managed by a database, and there are several object types, including tables, indexes, stored procedures, functions, synonyms, and triggers. So we use the term object to describe any type of element that may be in a database. This includes, but is not limited to, simple tables, indexes, BLOB tables, stored procedures, packages, and triggers.

15.5.1.7. Row

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Tuple	N/A	Tuple	Tuple	Tuple	Tuple	Row	Tuple

A row (called a tuple in formal database terminology) is a collection of related attributes. For example, there may be a row that contains all the basic information about a customer, such as her name, address, account number, and phone number (this is also similar to a row in a spreadsheet). Each row typically has at least one unique attribute, such as the account number, to distinguish it from other rows. A row is also sometimes called a record. (See Figure 15-2 for a graphical representation of a row.)

15.5.1.8. Attribute

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Attribute	N/A	Attribute	Attribute	Attribute	Attribute	Attribute	Attribute

An attribute is the basic element of data within a table. (See Figure 15-2 for a graphical representation of an attribute.) It is a single value, such as a customer's name or zip code. An attribute may be very small, such as a zip code, or very large, such as a BLOB. An attribute is the value that a database user changes when performing a transaction. Transactions are covered later in this chapter.

15.5.2. The DBA's View: Physical Elements of a Database Environment

The DBA has to know quite a bit more about the database than even the most sophisticated power user, because the DBA must create databases, tune them, back them up, and recover them in the case of failure. DBAs also know some variation of a programming language called SQL that allows them to construct precise types of queries that increase the usability of the database. Good DBAs also need to know quite a bit about operating system technology in order to efficiently use the storage capabilities of their operating environment.

The good news is that, unless you're a DBA, you don't need to know all of that to properly back up and recover databases. This section describes the physical elements of database storage and how they are combined with the logical elements discussed earlier.

The bad news is that, unlike the terms in the user's view, the elements in the database view are called something different in almost every product. Often, the same term presented in the previous section is used to describe different types of elements in different products. It took quite a bit of work to be able to discuss them all in a single chapter. It often was very difficult to find a generic term that would apply to all of them without confusing things. Therefore, the terms that serve as headings for these sections are often words that were coined just for this purpose. This generic term is used throughout this chapter when discussing the different types of storage elements and how they fit together. The product-specific terms are used only when referring to the products themselves.

Some of the coined terms that follow may be useful only when discussing things with someone else who has read this chapter. If you are discussing storage elements with a particular DBA who has not read this chapter, be sure to use the appropriate terms for the product that DBA knows.

15.5.2.1. Page

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Page	Page	Page	Page	Block	Page	Page	Page

The page, also called a block, is the basic building block of every database. It is the smallest amount of data that is moved in an I/O operation. It is similar to, although not exactly the same as, a filesystem block. (You can have a 2 K block inside a database that sits on a filesystem with an 8 K block size.^[]) Page sizes tend to range from 2 K to 32 K in size, but some database products allow you to specify a custom page size for your environment. Whatever size the page is, it is the smallest atomic entity within a database. When you modify a table within a database, it eventually modifies one or more pages stored somewhere on disk. (See Figures 15-2 and 15-3 for graphical representations of a page.)

] That is why I prefer the term "page" over the Oracle term, "block." It helps to differentiate between filesystem blocks and database blocks, or pages. I have seen DBAs and SAs confuse each other talking about what block size a given Oracle database should have.

15.5.2.2. Datafile

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Container	Datafile	Chunk	Datafile	Datafile	Datafile	Datafile	Device

A datafile is where the data is stored. This may be a raw device (e.g., /dev/hda1 in Linux), or a cooked file (e.g., /oracle/data/dbs01.dbf or c:\database\somefile.dbf). Some products require the use of raw partitions, while others merely suggest it. Some products allow a mixture of raw and cooked files. The only real difference to the DBA is how they are initially created and how they are backed up. Other than that, they look the same within the database. (See Figure 15-3 for graphical representations of a page.)

15.5.2.3. Extents

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Extent	N/A	Extent	Extent	Extent	Extent	Extent	Extent

An extent is a number of pages that are logically grouped together and are considered logically contiguous. They may or may not be physically contiguous. (If the pages in an extent are physically contiguous, that means that they are physically next to each other.) Informix extents are physically contiguous, while others may or may not be, depending on when and how they were created. All extents are considered logically contiguous, because they are treated as a single block of storage that is allocated to a table. Extents do not span more than one datafile, but a datafile may contain any number of extents. The actual size of an extent is determined by the database platform. (See Figures 15-1 and 15-3 for graphical representations of an extent.)

Informix DBAs: the term tablespace in this chapter refers to an Informix dbspace. A tablespace as defined here is "the space into which you insert tables." Informix uses the term tablespace (and a similar word tblspace) to mean something different. As you can see in Figure 15-1, a tablespace can consist of extents that are in different datafiles. When a table spans datafiles like this, Informix uses the term tblspace to refer to the part of a table that resides within a single datafile, and the term tablespace to refer to the amount of space a table takes up.

15.5.2.4. Tablespace

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Tablespace	Store or storage group	Dbspace	Tablespace (InnoDB and NDB)	Tablespace	Tablespace	Filegroup	Segment

A tablespace is a collection of one or more datafiles and is the space into which you insert tables. (See Figure 15-3 for a graphical depiction of a tablespace.) A table is created in a tablespace (e.g., create table in tablespace alpha). When creating an instance in most database products, you specify which datafile will be the main (or system) tablespace. The database product then creates this tablespace for you automatically.

All Sybase databases have at least two segments: default and system. They're automatically created when you run the create database command. You can also create your own segments that consist of one or more Sybase devices and can specify those segments in the create table command. System tables reside in the system segment. If you don't create your own custom segments, all user tables will be placed in the defaultsegment. This means that segments are the same as tablespaces; however, since many Sybase DBAs do not define custom segments, they do not think of segments as tablespaces. SQL Server calls its tablespaces filegroups: the primary filegroup stores system tables, and the default filegroup is the default location for tables. You can also create your own filegroups and create tables in them as well. Since a given user's data in Exchange is stored within a single storage group, and there is a table of users, we say that a store or storage group is equivalent to a tablespace. In MySQL, only tables stored in the InnoDB and NDB storage engines have tablespaces.

15.5.2.5. Partition

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Partition	N/A	Fragment	Partition	Partition	Partition	Partition	Partition

One of the biggest advancements in RDBMS technology is the ability to spread, or partition, a table across multiple resources. Historically, a table had to be contained within a tablespace, as described earlier. Now some database products allow you to specify that a table is partitioned across multiple tablespaces based on the values of certain attributes. For example, you could create a table that is partitioned across tablespaces A and B. You could specify that all records with value 1100 in attribute A go to tablespace A, and all records with value 101200 in attribute A go to tablespace B. (See Figure 15-1 for a graphical depiction of a partitioned table.) A DB2 partition allows you to allocate a table to multiple CPUs. A partitioned table does not present any unique backup requirements.

15.5.2.6. Master database

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Catalog	Private and public information store databases	Sysmaster, onconfig file, rootdb	MySQL database	Control file	System tables	Master database	Master database

Each instance has some way to keep track of all the storage elements it has at its disposal. This master database keeps track of all the devices and their status, and any information that all the databases need to have access to. If multiple databases are allowed, it needs to track them as well. Sybase has a special database to do this, and Oracle has what it calls a control file, which keeps track of this information. Informix also has a special database, called the Sysmaster, that tracks the status of every object within an instance. However, some information is tracked by the onconfig file and reserved pages within the rootdb.

15.5.2.7. Transaction

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Transaction	Transaction	Transaction	Transaction	Transaction	Transaction	Transaction	Transaction

A transaction is an activity within a database that changes one or more attributes within one or more tables. If a user changes a customer's address, that is a transaction. There are two types of transactions, a simple transaction and a complex transaction. A simple transaction is done in one statement (e.g., update attribute X in table Y to 100). A complex transaction may be much longer, and opens with a begin transaction statement and closes with an end transaction statement. There may be a number of simple statements in between the open and close statements, or there may be a complicated SQL program that updates hundreds of values based on certain parameters. For example, it has become relatively common to change someone's area code as a city grows and splits into multiple area codes. A complex transaction could be designed that would scan all customer phone numbers and change their area code based on their three-digit exchange. A complex transaction is treated as an "atomic" eventit's all or nothing. Both the start transaction and end transaction statements are recorded in the transaction log (defined later), and if anything happens before the end transaction statement is recorded, all changes that were made by that transaction are rolled back, or undone. Things that can make that happen include the user logging out in the middle, the database being shut down, the system crashing, or even the user changing his mind.

Not all storage engines in MySQL support transactions. In those ACID-compliant tables that support them, they would be called transactions. However, the default storage engine (MyISAM) does not support transactions.

15.5.2.8. Rollback log

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Transaction log	Transaction log	Physical log	Rollback segment (InnoDB)	Undo segment, rollback segment	Internal table	Transaction log	Transaction log

From a data integrity standpoint, it is important to realize what a transaction does. While to a user's eyes a transaction changes a record in the database, it actually changes one or many pages. It is on the page level that transaction recovery is done. If a given transaction modifies 100 pages, and the transaction does not complete, those 100 pages must be returned to what they looked like before the transaction occurred. This is referred to as rolling back the page to its before image. (The before image is what the page looked like before it was changed.) The following elements describe the facilities that databases have to ensure that this (and other data-integrity activities) occurs properly.

The rollback log is the place where the database stores this before image. Informix and Oracle have a dedicated log just for this purpose. The before image of each changed page is stored in this log until the transaction is complete, or committed. Sybase, SQL Server, and DB2 record both the before image and the transaction data in the transaction log. It is important to note that this before image must be physically written to disk before any pages are to be physically changed. That ensures that the before image is available if the system crashes. How this before image is actually used varies widely between database products. Oracle has changed how rollback works, and now uses the undo segment instead of the rollback segment. PostgreSQL stores the previous records for changed rows in the table itself. At some point, previous records are deleted via a vacuum process.

15.5.2.9. Transaction log

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Transaction log	Transaction log	Logical log	Transaction log	Redo log	Write ahead log	Transaction log	Transaction log

Suppose that a system were to crash in such a way that it needed to be recovered from your latest database backup. If there were no way to redo the transactions that occurred since the last backup, all such transactions would be lost. A transaction log records each transaction and what pages it changed. This information is used in case a system crash requires reentering those transactions. The master database knows what state each datafile is in, and it looks at each of them upon starting up. If it detects any that are corrupt, you have to restore those datafiles from your backup. The master database then looks at the datafile and realizes that it was restored from an earlier point in time. It then goes to the transaction log to "redo" all the transactions that have been recorded since that time. Uncommitted transactions are then rolled back. The actual order of this process differs from product to product. However, the main purpose remains the same. The rollback and transaction logs work together to ensure that all pages are returned to their proper state after a crash or reboot.

Many people have difficulty understanding the difference between the rollback log and the transaction log. Informix's terminology helps in this case, because its terms physical and logical log illustrate exactly what the logs contain. The physical log contains a physical "image" of a page prior to it being changed by a transaction. It doesn't know or care how the page was changed; it just knows what it looked like before it was changed. The logical log, on the other hand, keeps track of how the page was changed, so that the database can recreate this change after a recovery. Oracle's terminology helps here as well. An undo or rollback log is the log that will allow you to "undo" or "remove" the changes done by a transaction (that were not yet committed). Redo logs (aka transaction logs) allow you to repeat or "redo" the committed transactions.

MySQL does have the binary log that contains SQL statements that can be replayed against a consistent database to redo those SQL statements. However, it is not used for crash recovery. Each storage engine has its own transaction log for this purpose.

15.5.2.10. Checkpoint

DB2	Exchange	Informix	MySQL	Oracle	PostgreSQL	SQL Server	Sybase
Checkpoint	Checkpoint	Checkpoint	Checkpoint	Checkpoint	Checkpoint	Checkpoint	Checkpoint

In order to increase performance, databases keep a lot of data in memory: recently changed pages, commonly accessed pages, before images of modified pages, and the transactions themselves. This means that if the system crashes at some point, some data will be lost, because RAM is volatile. The database needs some way to go back to a time that it knows everything was on disk and nothing was in memory. This point in time is called the checkpoint.

At certain intervals, the database flushes everything to disk. All datafiles and logfiles are therefore in a consistent state. If the system were to crash without damaging the datafiles, the database would revert to this checkpoint, then replay any completed transactions that have been recorded since that checkpoint, and finally roll back any incomplete transactions. This ensures that the database can always be backed up in a consistent state.