Chapter 16, "Optimal Replication," examined the costs and benefits of replication and a number of requirements that you could use to decide whether these benefits were worth the effort. It's now time to do the same with regard to clustering.
High availability The data stored on your MySQL server needs to be accessible 24 x 7.
Abundant, low-performance computers Your organization might not have the fastest computers, but they have lots of them.
Reliability You are very interested in keeping your information as dependable as possible; the loss of even a single record in an aborted transaction is a big concern.
Speed As a reader of this book, interest in this final criteria will likely apply to you. However, you should note that a MySQL Cluster based database's read performance might be slower than a corresponding standalone system's when the 4.1.x series is deployed. On the other hand, write performance might be faster. In any case, version 5.x should yield dramatically faster results.
Clustering Versus Replication
Given that both clustering and replication spread the data and processing loads among multiple systems, how can you determine which technology to employ? Note that these two approaches are not necessarily mutually exclusive, but for simplicity's sake, this chapter treats them as if they are.
The Case for Replication
The following list highlights some of the advantages of choosing a replication strategy rather than employing clustering. This is only a partial list, primarily focusing on performance differences.
Simplicity Although setting up a MySQL cluster is not difficult if you plan and follow instructions, deploying replication is easier and generally requires less work.
Broader operating system support Currently, the list of operating systems supported for clustering is smaller than those supported for replication. One glaring omission is Windows; if your MySQL installation relies on this operating system, you need to wait before rolling out clustering.
Query cache support As you saw in Chapter 10, "General Server Performance and Parameters Tuning," the query cache can leverage the power of memory-based processing by storing already-run queries and their results in RAM. This feature is not supported in a MySQL cluster.
More optimal query plans In certain conditions, you might need to coach the query optimizer into recommending a better query plan via the FORCE INDEX or USE INDEX commands. Furthermore, the MySQL query optimizer was not designed for distributed operations, so there can be performance-affecting issues when clustering.
Better range scans InnoDB and MyISAM are both more efficient when processing range scans. These scans might make up a significant portion of your database activity.
FULLTEXT indexing support As you saw in Chapter 11, "MyISAM Performance Enhancement," FULLTEXT indexes can make information stored in your MyISAM-based tables much more accessible.
The Case for Clustering
The preceding section makes a compelling case for using replication rather than clustering to achieve your distributed database goals. What are some arguments in favor of clustering?
Availability Although replicated (slave) servers contain copies of the master's data, these slaves are generally not meant to be updated. Consequently, the loss of the master can disrupt the transactional nature of most database-reliant applications. Conversely, a clustered environment can more gracefully handle interruptions to individual servers, or even groups of servers. This helps increase overall availability, yielding a more dependable solution.
Better speed By storing all database information in memory and reducing the overall amount of disk access, MySQL Cluster can significantly increase performance. In version 4.1.x, this is especially true for write-intensive applications; read-intensive applications might see degraded performance. Version 5.x will introduce numerous features to address this problem.
Consistency MySQL Cluster delivers enhanced consistency because updates to one data node are reflected nearly instantaneously on all other data nodes within the cluster.
Transaction reliability As you saw in the preceding item, the fact that updates to one data node are reflected elsewhere translates into more reliable transactions. Unlike replication, in which there might be some latency between an alteration on the master and its reflection on slaves, a clustered environment should see its transactions created and executed in multiple places at once.