Choosing a Database: MySQL Versus PostgreSQL

< Day Day Up >

If you are just starting out and learning about using a database with Linux, the first logical step is to research which database will best serve your needs. Many database software packages are available for Linux; some are free, and others cost hundreds of thousands of dollars. Expensive commercial databases, such as Oracle, are beyond the scope of this book. Instead, this chapter focuses on two freely available databases: MySQL and PostgreSQL.

Both of these high-quality databases are included on this book's CD-ROMs and DVD. But just because MySQL and PostgreSQL are free does not mean they lack power. Both of these databases (PostgreSQL in particular) can handle very complex projects. They are used in commercial organizations (like Yahoo!), government agencies (such as NASA), research institutions, and educational institutions.

Both of these databases are quite capable, and either one could probably serve your needs. However, each database has a unique set of features and capabilities that might serve your needs better or make developing database applications easier for you. The following sections look at some of the key features of any database and discuss how those features are implemented in MySQL and PostgreSQL. You should use the information in these sections to compare the two databases so you can choose the one that's right for your project.

Speed

Until recently, the speed choice was simple: If the speed of performing queries was paramount to your application, you used MySQL. MySQL has a reputation for being an extremely fast database. Until recently, PostgreSQL was quite slow by comparison.

Newer versions of PostgreSQL have improved in terms of speed (when it comes to disk access, sorting, and so on). In certain situations, such as periods of heavy simultaneous access, PostgreSQL can be significantly faster than MySQL, as you will see in the next section. However, MySQL is still extremely fast when compared to many other databases.

Data Locking

To prevent data corruption, a database needs to put a lock on data while it is being accessed. As long as the lock is on, no other process can access the data until the first process has released the lock. This means that any other processes trying to access the data have to wait until the current process completes. The next process in line then locks the data until it is finished, and the remaining processes have to wait their turn, and so on.

Of course, operations on a database generally complete quickly, so in environments with a small number of users simultaneously accessing the database, the locks are usually of such short duration that they do not cause any significant delays. However, in environments in which many people are accessing the database simultaneously, locking can create performance problems as people wait their turn to access the database.

MySQL uses a data-locking method that is fundamentally different from the data-locking method used by PostgreSQL.

Older versions of MySQL lock data at the table level, which can be considered a bottleneck for updates during periods of heavy access. This means that when someone writes a row of data in the table, the entire table is locked so no one else can enter data. If your table has 500,000 rows (or records) in it, all 500,000 rows are locked any time 1 row is accessed. Once again, in environments with a relatively small number of simultaneous users, this doesn't cause serious performance problems because most operations complete so quickly that the lock time is extremely short. However, in environments in which many people are accessing the data simultaneously, MySQL's table-level locking can be a significant performance bottleneck.

PostgreSQL, on the other hand, locks data at the row level. In PostgreSQL, only the row currently being accessed is locked. The rest of the table can be accessed by other users. This row-level locking significantly reduces the performance impact of locking in environments that have a large number of simultaneous users. Therefore, as a general rule, PostgreSQL is better suited for high-load environments than MySQL.

The MySQL release bundled with Fedora Core gives you the choice of using tables with table-level or row-level locking. In MySQL terminology, MyISAM tables use table-level locking and InnoDB tables use row-level locking.

NOTE

MySQL's data locking methods are discussed in more depth at http://www.mysql.com/doc/en/Internal_locking.html.

You can find more information on PostgreSQL's locking at http://www.postgresql.org/idocs/index.php?locking-tables.html.

ACID Compliance in Transaction Processing to Protect Data Integrity

Another way MySQL and PostgreSQL differ is in the amount of protection they provide for keeping data from becoming corrupted. The acronym ACID is commonly used to describe several aspects of data protection:

Atomicity A term derived from the word atom, which, as you might know, means "indivisible." In the database world, this means that several database operations are treated as an atomic unit, often called a transaction. In a transaction, either all unit operations are carried out or none of them are. In other words, if any operation in the atomic unit fails, the entire atomic unit is canceled.
If you have a power failure or server crash after an original record has been deleted, for example, but before the updated one has been added, you don't lose the record because atomic transactions ensure that the original record isn't deleted if the update portion of the atomic unit fails.
Consistency Ensures that no transaction can cause the database to be left in an inconsistent state. Inconsistent states can be caused by database client crashes, network failures, and similar situations. Consistency ensures that, in such a situation, any transaction or partially completed transaction that would cause the database to be left in an inconsistent state is rolled back, or undone. This prevents, for example, the deletion of one critical field of a database record when all other fields for that record remain (referred to as orphaned data). Incomplete transactions are rolled back to maintain consistency.
Isolation Ensures that multiple transactions operating on the same data are completely isolated from each other. This prevents data corruption if two users try to write to the same record at the same time. The way isolation is handled can generally be configured by the database programmer. One way that isolation can be handled is through locking, as discussed previously.
Durability Ensures that, after a transaction has been committed to the database, it cannot be lost in the event of a system crash, network failure, or other problem. This is usually accomplished through transaction logs. Durability means, for example, that if the server crashes, the database can examine the logs when it comes back up and it can commit any transactions that were not yet complete into the database.

PostgreSQL is ACID-compliant, but again MySQL gives you the choice of using ACID-compliant tables or not. MyISAM tables are not ACID-compliant, whereas InnoDB tables are. Note that ACID compliancy is no easy task: All the extra precautions incur a performance overhead.

SQL Subqueries

Subqueries allow you to combine several operations into one atomic unit, and they enable those operations to access each other's data. By using SQL subqueries, you can perform some extremely complex operations on a database. In addition, using SQL subqueries eliminates the potential problem of data changing between two operations as a result of another user performing some operation on the same set of data. Both PostgreSQL and MySQL have support for subqueries in this release of Fedora, but this was not true in earlier releases. MySQL has only recently implemented the technology.

Importing Text Files into Tables

In MySQL and PostgreSQL, the SQL statements used to create a database table can be entered directly into the command-line clients. However, doing so introduces a lot of opportunity for error; if you make a mistake, you have to type the entire table again from the beginning. Fortunately, both MySQL and PostgreSQL can read a list of SQL statements from a plain-text file. You could also enter SQL statements into a text editor and save the file with the filename extension .sql. (The .sql extension is helpful because it makes it easy to see that this file contains SQL statements when doing a directory listing.)

Statements saved in a text file can be imported into the database with the following commands in MySQL and PostgreSQL, respectively, where mammal.sql is the name of the saved text file:

 $ mysql database_name < mammal.sql $ psql database_name < mammal.sql

Of course, for this to work, mysql or psql must be invoked by a user who has permission to create tables in the database represented by database_name. For more information, see the sections "Granting and Revoking Privileges in MySQL" and "Creating a Database in MySQL," later in this chapter.

Procedural Languages and Triggers

A procedural language is an external programming language that can be used to write functions and procedures. This allows you to do things that aren't supported by simple SQL. A trigger allows you to define an event that will invoke the external function or procedure you have written. For example, a trigger can be used to cause an exception if an INSERT statement containing an unexpected or out-of-range value for a column is given.

For example, in the CD tracking database, you could use a trigger to cause an exception if a user entered data that did not make sense. PostgreSQL has a procedural language called PL/pgSQL. Although MySQL has support for a limited number of built-in procedures and triggers, it does not have any procedural language. This means you cannot create custom procedures or triggers in MySQL, although the same effects can often be achieved through creative client-side programming.

Getting Started with Databases and Fedora

A number of steps are required before database services can be used with Fedora. The first step is to ensure that the database server has been properly started (usually at boot time). The root operator needs to make sure that Fedora starts the server during any anticipated system runlevel (see Chapter 14, "Automating Tasks," for more information). A root database user must then be created, and the database server needs to be initialized. A root database user password is then created, along with one or more databases. The root database user can then grant authorized users various levels of access to each database.

< Day Day Up >