Creating a Clustering Strategy | MySQL Database Design and Tuning


With major clustering terms and concepts defined, it's now time to look at strategies and best practices you can employ to set up the right clustering environment.

This section provides an important discussion about the performance-related capabilities of the 4.1.x and 5.x series of MySQL Cluster, followed by some ideas on defining the right cluster topology.

Choosing the Right Version

As noted several times in this chapter, performance differs dramatically between versions of MySQL Cluster. Perhaps the easiest way to view these differences is as follows: version 4.1.x delivers high availability but might hurt performance, whereas version 5.x will address these performance issues through a collection of enhancements. These enhancements include the following:

Parallelism: MySQL 5.x will do a much better job of using data nodes in parallel, retrieving information much faster than before.
Data node-based filtering: Prior to version 5.x, the data nodes returned raw results to the SQL node, which then filtered them. Newer versions will do more of this filtering on the data node itself, which will greatly reduce expensive network traffic.
Better range queries: In Chapter 6, "Understanding the MySQL Optimizer," you saw how the version 5.x series can use multiple indexes to retrieve results faster. The same holds true for queries across a cluster.
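
To make the benefit of data node filtering concrete, the following is a minimal Python sketch of a toy model, not a reproduction of MySQL Cluster internals. The function names, the round-robin distribution (a stand-in for the real partitioning scheme), and the sample rows are all assumptions made for illustration. It simply counts how many rows cross the network under each strategy.

```python
# Toy model: count how many rows cross the network under each strategy.
# Illustrative only; names and data are hypothetical, not MySQL Cluster APIs.

def make_fragments(num_data_nodes, rows):
    """Spread rows across data nodes round-robin (stand-in for real partitioning)."""
    fragments = [[] for _ in range(num_data_nodes)]
    for i, row in enumerate(rows):
        fragments[i % num_data_nodes].append(row)
    return fragments

def filter_on_sql_node(fragments, predicate):
    """Data nodes ship every row; the SQL node applies the predicate afterward."""
    shipped = [row for frag in fragments for row in frag]
    result = [row for row in shipped if predicate(row)]
    return result, len(shipped)

def filter_on_data_nodes(fragments, predicate):
    """Each data node applies the predicate and ships only the matching rows."""
    shipped = [row for frag in fragments for row in frag if predicate(row)]
    return shipped, len(shipped)

# 1,000 sample rows, one in ten of which matches the filter.
rows = [{"id": i, "status": "open" if i % 10 == 0 else "closed"} for i in range(1000)]
fragments = make_fragments(4, rows)

def is_open(row):
    return row["status"] == "open"

old_result, old_shipped = filter_on_sql_node(fragments, is_open)
new_result, new_shipped = filter_on_data_nodes(fragments, is_open)

print(old_shipped, new_shipped)  # prints: 1000 100
```

Both strategies return the same rows, but in this toy model the second one ships only a tenth as many across the network. The real saving depends on how selective the filter is.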

In spite of these performance challenges with version 4.1.x, it's still worthwhile to explore MySQL Cluster, if for no other reason than to gain experience and prepare for upgrading to version 5.x.

Cluster Topology

Whenever you configure a distributed computing environment, it's natural to wonder how to allocate servers, or in the case of MySQL Cluster, hosts and nodes.

This is yet another example of the site-specific nature of performance tuning: What works for a read-intensive search engine might be anathema to a site that processes thousands of data-modifying transactions per second. Administrators must strike a balance between the high-availability benefits of deploying numerous redundant data nodes and the added network costs that these additional nodes incur.

With that said, consider a few suggestions that should be appropriate in the majority of situations in which MySQL Cluster is deployed.

Define at least two management nodes: As you just saw, the management node is responsible for critical oversight of the MySQL Cluster; defining an additional node provides supplementary protection should the first management node fail.
Define at least two data nodes: One technique used by MySQL Cluster to achieve high availability is to break tables into fragments and then distribute those fragments among multiple data nodes. This is only possible if enough data nodes have been defined.
Define enough SQL nodes to support your workload: Recall that the SQL node is simply a standard MySQL database server that is able to leverage the added capabilities of the NDB storage engine. Although these nodes don't have the intense workload of the data nodes, it's still important to configure enough of them to support the expected demands from your clients. Note that in future versions, additional processing work might be pushed down to the data node level from the SQL node; it's likely that this will change the ideal ratio among nodes.
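
The three suggestions above can be sketched as a small Python helper that writes out a skeleton cluster configuration file. This is a rough illustration under stated assumptions: the helper name and the host names are hypothetical, and the output includes only the section headers plus the HostName and NoOfReplicas settings. A real configuration needs further settings (data directories, memory sizes, and so on), which should be checked against the reference manual for the version in use.

```python
# Sketch: generate a skeleton MySQL Cluster config.ini for a given topology.
# The helper and all host names are hypothetical; only a minimal set of
# settings is emitted, so treat the output as a starting point, not a
# complete configuration.

def build_cluster_config(mgm_hosts, data_hosts, sql_hosts, replicas=2):
    """Return config text with one section per management, data, and SQL node."""
    if len(mgm_hosts) < 2:
        raise ValueError("define at least two management nodes")
    if len(data_hosts) < 2:
        raise ValueError("define at least two data nodes")
    if len(data_hosts) % replicas != 0:
        raise ValueError("data node count must be a multiple of NoOfReplicas")

    # Defaults shared by every data node.
    lines = ["[ndbd default]", f"NoOfReplicas={replicas}", ""]
    for host in mgm_hosts:
        lines += ["[ndb_mgmd]", f"HostName={host}", ""]
    for host in data_hosts:
        lines += ["[ndbd]", f"HostName={host}", ""]
    for host in sql_hosts:
        lines += ["[mysqld]", f"HostName={host}", ""]
    return "\n".join(lines)

config = build_cluster_config(
    mgm_hosts=["mgm1.example.com", "mgm2.example.com"],
    data_hosts=["data1.example.com", "data2.example.com"],
    sql_hosts=["sql1.example.com", "sql2.example.com", "sql3.example.com"],
)
print(config)
```

The number of SQL nodes is left entirely to the caller because, as noted above, the right count depends on client demand rather than on a fixed rule.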
