Clustering | MySQL Database Design and Tuning

< Day Day Up >

Recall Chapter 19's description of what went wrong when High-Hat launched its web-only airfare sale. The reduced number of servers could not handle the load. In fact, some crashed, which led to severe performance and availability problems. At that time, you recommended using MySQL's clustering capabilities to help improve response and availability.

Although availability is definitely improved, users are noticing considerable query delays. From their perspective, not much has improved, so your recommendation needs to be revisited. This is not surprising to you because you know that every clustering situation is different and is subject to periodic optimization review.

Diagnosis

As you know from Chapter 17, "Optimal Clustering," MySQL Cluster employs a sophisticated network of specialized nodes; each node is dedicated to one of three roles:

Management As its name implies, this node is responsible for administrative and other tasks that are vital to the health of the cluster.
SQL This node, which services the actual users for the cluster, is nothing more than a standard MySQL server that happens to use the NDB storage engine to store its information on the data node.
Data This node looks after the actual information, including distributing redundant copies across the cluster onto other data nodes.

High-Hat elected to place each node on its own computer, or host. This is the safest approach from the perspective of high availability because a crash of a particular host won't take down more than one node.

To get an idea of what is happening in the cluster, you use the tools provided by MySQL to look for any error conditions or other anomalies. These include the management client (ndb_mgm) as well as evaluating the cluster logs. You also employ network management tools to get an idea about any potential bandwidth issues.

Observations

Poring through MySQL's clustering diagnostics doesn't show anything remarkable: Everything appears normal. However, monitoring the network's traffic levels presents a different picture.

Traffic appears to be directly correlated with the number of queries under way at one time. Even a simple query causes a spike in network traffic. When multiplied by the number of active users on the network during the workday, this leads to severely degraded performance.

Solution

This is an all-too-common case in which there is no "magic bullet" that will solve all problems. The fundamental issue is that in version 4.x, MySQL and MySQL Cluster still contain components that are not fully "distributed-aware." For example, there is currently no filtering on data nodes. This means that running a simple query from a SQL node causes one or more data nodes to send all rows to the SQL node for filtering. This is very inefficient: Imagine the extra traffic from the following basic, primary key driven query that is run against a three-million row table:

 SELECT * FROM customer_master WHERE customer_id = 129212;

In this case, the data nodes gather and send all rows from the customer_master table back to the requesting SQL node. The SQL node, in turn, filters out the 2,999,999 extraneous rows, returning the one correct row to the requesting client. Version 5.x of MySQL Cluster will feature data node based filtering, which should go a long way toward improving query performance. The same holds true for MySQL Cluster's parallelism capabilities, which are used to take advantage of the parallel processing power afforded by the multiple computers that usually make up a cluster.

However, you need to make some recommendations given the current configuration and software version, so they are broken out as discussed in the following sections.

Node Realignment

Given that version 4.x sends massive amounts of query-driven traffic between the data and SQL nodes, why not put the two nodes on the same computer? That way, this communication occurs in shared memory. However, the price of this transformation is paid in availability: A host failure now takes out two nodes rather than one.

Transport Alteration

Chapter 17 described the various transports available for MySQL Cluster traffic. One interesting new transport is Scalable Coherent Interface (SCI), which uses new hardware to improve network communication speed by up to 10 times. This is still slower than shared memory, but allows High-Hat to maintain a separation of nodes onto dedicated hosts.

Using Replicated Slave Servers for Queries

The potentially least-intrusive tactic is to direct query-hungry users and applications to a replicated server. Remember that MySQL Cluster and replication are not mutually exclusive: They can both coexist in the same environment. These read-only operations could take place on a replicated slave server, which would deliver all of the built-in performance benefits offered by MySQL; users would still need to make their data updates elsewhere, however.

< Day Day Up >