Before beginning an exploration of how to employ this technology to enhance performance, it's worthwhile to define some key clustering terms and concepts.
Although you can visualize a MySQL cluster in many ways, at its core it is simply a group of computers all working together to provide you with information as quickly as possible, with much higher levels of availability and failure resistance than a single database server could offer. Each computer fulfills at least one of three specialized roles, which are discussed momentarily.
In the context of a MySQL cluster, a physical, tangible computer is known as a "host." The role that it plays is described as a "node," which is a logical rather than a physical concept. Three classes of node are available, as described in the following three sections.
Previously known as a client, or API, node, this is merely a MySQL database server, just like the one you use for all of your other MySQL needs. The difference here is that this database process uses a specialized storage engine (NDB), which is responsible for the physical storage of information on another type of server: the data node, which may also be referred to as the "storage" node. In effect, the SQL node has outsourced the task of looking after data to the data node.
Clients (such as enterprise applications, query editors, tools, and so forth) continue to contact the SQL node to serve their data needs; the SQL node, in turn, looks to the data nodes for help. To these clients, there is no difference between a "normal" MySQL installation and a clustered installation, except that information access should be more reliable when clustering is employed.
This node, alternatively described as the storage node and formerly known as the database or DB node, has the important job of keeping information in memory as well as periodically making any data alterations permanent on disk. As you have seen throughout the book, any time you can take advantage of in-memory processing, performance usually improves: This is a key benefit of clustering.
In addition, data nodes work together to keep redundant copies of information on different nodes. This also serves as one of the cornerstones of clustering, and helps give it its high availability and fault-tolerant reputation.
This node has the least glamorous job: administration. It is responsible for launching and shutting down all other types of nodes, keeping track of their configuration, logging data alterations, and backing up and possibly restoring information.
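The management node learns about the cluster's layout from a configuration file, conventionally named config.ini, that it reads at startup. The following sketch shows one plausible minimal layout; the section names and parameters are standard MySQL Cluster settings, but the hostnames and memory sizes are illustrative placeholders, not recommendations.

```ini
# config.ini -- read by the management node (ndb_mgmd) at startup.
# One management node, two data nodes, one SQL node.

[NDBD DEFAULT]
NoOfReplicas=2          # keep two copies of each piece of data
DataMemory=512M         # memory set aside for row data (illustrative)
IndexMemory=64M         # memory set aside for hash indexes (illustrative)

[NDB_MGMD]              # the management node itself
HostName=mgmt.example.com

[NDBD]                  # first data node
HostName=data1.example.com

[NDBD]                  # second data node
HostName=data2.example.com

[MYSQLD]                # SQL node
HostName=sql1.example.com
```

Each node process, when launched, contacts the management node to retrieve its portion of this configuration, which is how the nodes become aware of one another.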
Security and permission checks are one task not performed during communication among the nodes; they still happen within the SQL node. This distinction is important, and serves to highlight an assertion you will see shortly: Clustering should happen on its own dedicated network.
This term does not refer to the communal attitudes and beliefs of a group of stingy loners; instead, it accurately describes the clustering architecture scenario in which each node runs on its own separate host. The host is dedicated to this node and is not shared by any other nodes. Although not mandatory, it offers the highest degree of protection from unanticipated problems: The loss of one node does not translate into a catastrophic failure unless that node was the only one of its kind.
Whether written by you using one of the many MySQL-friendly languages or connectors, purchased commercially, or acquired via open source, these standard database-accessing applications don't need to know that they are addressing a cluster. At the end of the day, these cluster clients are the ultimate beneficiaries of your investment in a MySQL cluster, reaping the performance and reliability gains from your distributed computing strategy.
Chapter 4, "Designing for Speed," described the specialized roles that each MySQL storage engine plays. In the case of MySQL Cluster, the NDB storage engine is the only choice. This engine is designed for distributed processing, and provides full transactional support.
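From the developer's point of view, placing a table in the cluster is simply a matter of choosing the NDB storage engine when the table is created. The table below is a made-up example; only the ENGINE clause distinguishes it from an ordinary MySQL table definition.

```sql
-- An illustrative clustered table: the ENGINE clause is the only
-- difference from a conventional MySQL table definition.
CREATE TABLE customer (
    customer_id INT NOT NULL PRIMARY KEY,
    name        VARCHAR(60) NOT NULL
) ENGINE=NDB;   -- ENGINE=NDBCLUSTER is an accepted synonym
```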
Now that you've seen what role each node plays along with where the cluster client fits in, you might wonder how these nodes are made aware of each other's existence, and how they communicate. This is the job of the transporter, which is simply the designated communication protocol used to transmit information within the cluster. Note that the transporter is not responsible for communication between the SQL node and actual clients: This is a separate configuration setting.
There are currently four supported transporter varieties, briefly summarized in the following sections. Configuring these transporters is reviewed a little later.
Although beneficial from a performance perspective, running multiple types of nodes on a single host is not desirable from the viewpoint of fault tolerance. In these circumstances, however, MySQL Cluster is able to employ a shared memory transporter, which provides for extremely fast internode communication.
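A shared memory transporter is declared in the cluster configuration file with an [SHM] section naming the two nodes it connects. The sketch below uses standard parameter names; the node IDs, key, and size are illustrative values only.

```ini
# Shared memory transporter between two nodes on the same host.
[SHM]
NodeId1=2        # first endpoint (a data node, for example)
NodeId2=3        # second endpoint
ShmKey=1234      # unique key identifying the shared memory segment
ShmSize=1M       # size of the shared memory segment (illustrative)
```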
If multiple nodes are running on the same computer, you can elect to use the TCP/IP protocol as the transporter for local communication among these nodes.
Naturally, TCP/IP support is also available to serve those clusters when the nodes are spread among multiple hosts. It should be noted, however, that node-to-node communication in the 4.1.x series of production servers is not optimized; this will be much better in the 5.x series of MySQL products.
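By default, MySQL Cluster sets up TCP/IP connections between nodes automatically, so an explicit [TCP] section is only needed to override settings for a particular pair of nodes. The following fragment is a sketch using documented parameter names; the node IDs and buffer size are illustrative.

```ini
# Explicit TCP/IP transporter settings for one pair of nodes;
# omit this section entirely to accept the automatic defaults.
[TCP]
NodeId1=2
NodeId2=3
SendBufferMemory=2M   # per-connection send buffer (illustrative size)
```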
Scalable Coherent Interface (SCI)
This new technology shatters the 100-Mbps speed limit; computers can now communicate at speeds up to 10 times faster. This speed comes at a price, however: Additional hardware and a more complex configuration must be in place before this transporter is ready for your nodes to use.
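Part of that extra configuration is an [SCI] section identifying the adapters on each host. The fragment below follows the parameter names in the MySQL Cluster documentation; the node and adapter IDs shown are illustrative, and actual values depend on your SCI hardware setup.

```ini
# SCI transporter between two nodes; requires SCI adapters and drivers.
[SCI]
NodeId1=2
NodeId2=3
Host1SciId0=8     # SCI adapter ID on the first host (illustrative)
Host2SciId0=12    # SCI adapter ID on the second host (illustrative)
```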
One technique that the NDB storage engine and MySQL Cluster use to greatly boost fault tolerance and availability is to automatically spread data among multiple data nodes. The next section explores how this information is allocated.
To facilitate this data distribution, tables are divvied up into chunks that are known as fragments in a process that is invisible to the administrator. These fragments (also known as partitions) are then available for distribution for redundancy's sake. In a moment, you will see how this works.
After the NDB storage engine has created and populated fragments from a given table, copies of these fragments are then distributed among multiple data nodes, assuming, of course, that you have deployed more than one data node. These copies are known as replicas, and MySQL Cluster currently allows up to four replicas for a given fragment.
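The replica count is set cluster-wide with the NoOfReplicas parameter in the data node defaults of the configuration file, as sketched below. Note that the total number of data nodes should be a multiple of this value, so that replicas can be spread evenly.

```ini
# Replica count applies to all data nodes, so it belongs in the
# [NDBD DEFAULT] section of config.ini.
[NDBD DEFAULT]
NoOfReplicas=2    # each fragment is kept on two different data nodes
```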
In any distributed computing environment, there needs to be a mechanism to establish a commonly agreed upon system state. Without this mechanism, each computer would have its own interpretation of reality. Although this freedom might boost the computers' individuality and self-esteem, it's not likely to improve the mood of the users.
Fortunately, MySQL Cluster uses an event known as a checkpoint, which is responsible for ensuring consistency among the nodes participating in the cluster. There are two types of checkpoints, described in the following sections.
A local checkpoint is responsible for guaranteeing that data alterations on a given node are, in fact, written to disk, making them permanent. A little later in this chapter, you'll see ways to help control the frequency of these events.
Just as a local checkpoint's scope is the transactions within a single node, a global checkpoint is in charge of making sure that transactions across the cluster are in a consistent state.
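Both checkpoint types can be tuned in the data node defaults of the configuration file. The parameter names below are standard; the values shown are the documented defaults, included purely for illustration, and are revisited later in the chapter.

```ini
# Checkpoint tuning knobs, set cluster-wide in config.ini.
[NDBD DEFAULT]
# Local checkpoint frequency is expressed as a base-2 logarithm of
# write volume; the default of 20 corresponds to roughly 4MB of
# write operations between local checkpoints.
TimeBetweenLocalCheckpoints=20
# Global checkpoints group-commit the cluster's transactions to disk
# at a fixed interval, expressed in milliseconds.
TimeBetweenGlobalCheckpoints=2000
```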