RAM Usage

MySQL Cluster is an in-memory database in version 5.0. At the time of this writing, disk-based tables in version 5.1 are in the fairly early stages of testing and are unlikely to reach a generally available release, and therefore production use, for quite some time. This means that RAM usage is an important consideration when you are deciding whether MySQL Cluster is the right software package for you. If you have an 80GB database, you are not going to be able to use a cluster in MySQL Cluster unless you have an extremely large budget.

There are several important things to bear in mind about MySQL Cluster. The first is a general rule that your table will occupy significantly more space in NDB than in MyISAM, for the following reasons:

  • NDB is fixed width in versions 4.1 and 5.0, so if you have variable-width fields, such as VARCHARs, you will find that the size of the table increases dramatically. Version 5.1 will have true VARCHARs, however.
  • NDB stores more indexes than MyISAM. To start with, there is always a primary key in an NDB table; if you don't define one, NDB will create one for you (and hide it from view).

Memory Usage

If you are considering deploying a cluster in MySQL Cluster, in order to determine how many nodes you need and how much RAM you will need installed in each, you need to be able to calculate the memory usage per table.

The first step is calculating the amount of RAM that each row uses in each table. You then multiply the size of each row by the number of rows to get the size of each table. Then you add all the tables together to get the size of the database. Finally, you work out the exact memory usage on each storage node. To finish off, you can test your prediction and see how much RAM the cluster in MySQL Cluster is actually using.

If, like many administrators, you have never actually thought about how much space each table occupies, this process can be quite daunting. Something that confuses many new administrators and that you should always bear in mind is the difference between bits and bytes: There are 8 bits in 1 byte. In almost all cases, you will be dealing with bytes.

Calculating the RAM Usage of a Table

It is important to remember that MySQL Cluster does not support variable-width fields, which means all rows occupy the same space. This means that VARCHAR fields are actually stored as CHAR fields internally (note that this should change in version 5.1). Many poorly designed tables have massive VARCHAR fields (for example, VARCHAR(200) for email addresses). This is because when using storage engines that support variable-width columns, such as MyISAM, it is easier to make the field far too large than to risk making it too small. However, in MySQL Cluster, such tables take up a colossal quantity of RAM. It is always worth attempting to reduce the size of variable-width fields if at all possible.

The calculation of the memory requirement for NDB tables follows the same principles as for the other table engines, with some differences. We now list the memory requirements for different types of fields.

In the following tables, M refers to the defined size of the string field (for example, in a VARCHAR(200) field, M = 200). Note that character sets are 1 to 3 bytes in size. If you are unsure how big your character set is, you can find the information by using the SQL command SHOW CHARACTER SET and viewing the Maxlen field.
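For example, to check the single-byte latin1 character set and the multibyte utf8 character set, you could run something like the following from the mysql client and read the Maxlen column of each result:

mysql> SHOW CHARACTER SET LIKE 'latin1';
mysql> SHOW CHARACTER SET LIKE 'utf8';

For latin1 the Maxlen value is 1; for utf8 it is 3.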

Table 1.2. String Types

  String Type                     Size
  ------------------------------  --------------------------------------------------------------
  VARCHAR(M)                      M x character set size + 2 bytes, rounded up to the next multiple of 4
  CHAR(M)                         M x character set size bytes, rounded up to the next multiple of 4
  BINARY(M), VARBINARY(M)         Same as CHAR(M); VARBINARY(M) is the same as VARCHAR(M)
  ENUM('value1','value2',...)     1 or 2 bytes, depending on the number of options
  SET('value1','value2',...)      1 to 8 bytes, depending on the number of set members
  BLOB, TEXT plus derivatives     If M < 256: M bytes; otherwise: (M - 256) bytes rounded up to the next multiple of 2,000, plus 256 bytes
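As a quick illustration of the VARCHAR rule, the following throwaway query (it reads no tables; it simply types the formula above into the mysql client) shows the per-row cost of a VARCHAR(200) column in latin1 (1 byte per character) and in utf8 (3 bytes per character):

mysql> SELECT CEILING((200 * 1 + 2) / 4) * 4 AS varchar200_latin1_bytes,
    ->        CEILING((200 * 3 + 2) / 4) * 4 AS varchar200_utf8_bytes;

This returns 204 and 604 bytes, respectively, so moving a large VARCHAR column to a multibyte character set roughly triples its in-memory size in NDB.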

Table 1.3. Numeric Types

  Numeric Type                Size (Round Up to the Nearest 4 Bytes)
  --------------------------  -------------------------------------------------------------------
  TINYINT                     1 byte
  SMALLINT                    2 bytes
  MEDIUMINT                   3 bytes
  INT, INTEGER                4 bytes
  BIGINT                      8 bytes
  FLOAT(p)                    4 bytes if 0 <= p <= 24; 8 bytes if 25 <= p <= 53
  FLOAT                       4 bytes
  DOUBLE [PRECISION], REAL    8 bytes
  BIT(M)                      Approximately (M + 7) / 8 bytes
  DECIMAL(A, B)               Complex; see http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

Table 1.4. Date Types

  Date Type    Size (Round Up to the Nearest 4 Bytes)
  -----------  --------------------------------------
  DATE         3 bytes
  DATETIME     8 bytes
  TIMESTAMP    4 bytes
  TIME         3 bytes
  YEAR         1 byte

To calculate DataMemory, you first need to work out how many bytes the fields are going to use. Then you add a fixed row overhead, which is 16 bytes, or 12 bytes if you declare all your columns as NOT NULL. Then, for each ordered index, 10 bytes of storage per row are used. You get an ordered index for each INDEX you define. In addition, you automatically get one ordered index for each primary key and each unique key you specify, unless you tell MySQL not to create one in the CREATE TABLE syntax. Remember that you can change tables at a later time by using ALTER TABLE or CREATE/DROP INDEX commands within MySQL; this, of course, will change your memory allocation.

However, this is not the complete story if you are trying to work out how much RAM you are going to use; you must take into account the storage space used by primary keys. Each primary key (if you have defined it) or hash index (which NDB automatically creates if you have not defined a primary key) occupies 25 bytes of storage (25 bytes + the size of the key in MySQL 4.1). You add this number to the number you calculated earlier and write this number down as the size of each row.

In addition, for each unique constraint in your schema other than the primary key, an additional hidden table is created to enforce the constraint. This hidden table has at least two columns: the columns you declared as unique, which are treated as the hidden table's primary key, and the columns that make up the primary key of the base table. The hidden table has the same space requirements as a regular table (that is, 12 bytes of overhead per row). You can see that unique constraints therefore add quite a bit of extra size overhead. However, this setup does allow for additional speed in accessing constraints, which is discussed further in Chapter 5, "Performance."

You then have to consider the way that NDB stores information: in pages. All data is stored in pages (a page is a chunk of RAM), and each page holds 32,768 bytes of data. Each page can store data from only one table (or hidden table). You therefore have to work out how many rows fit in a page, which in turn tells you how many pages you need. You divide 32,768 by the row size you calculated earlier and round the result down; this tells you how many rows you will get per page.

You next divide the number of rows by the number of rows per page to get the number of pages you need. This is important because you will need to add a fixed overhead per page.

You can now calculate the total memory requirement, in bytes, for one copy (replica) of your table:

(Size of each row x Number of Rows) + (Number of pages x 128)

You repeat this calculation for all tables in your cluster to get a figure for the total memory requirement for one copy of your tables.

Of course, you also want to know how much RAM you need on each storage node. You use the following calculation for this:

(Total memory requirement for one copy of your tables x Number of replicas[3]) / Number of storage nodes

[3] You will come across this parameter in detail in Chapter 2. It determines how many copies of each piece of data the cluster holds; 2 is a typical value.

You divide this figure by 1,024 to get kilobytes and finally by another 1,024 to get megabytes.

Remember that this calculation gives you an estimate for the RAM required for data storage. More RAM is required for buffers, temporary storage, and so on; exactly how much depends entirely on the environment, so you need to experiment.

A Sample Calculation of the RAM Required for a Table

Because calculating the RAM usage of a table is so complex, this section gives a sample calculation for the following table, which contains both integer and string types, as well as variable- and fixed-width fields:

mysql> desc testtable;

+-------+---------------+------+-----+---------+-------+
| Field | Type          | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| id    | int(11)       |      | PRI | 0       |       |
| name  | varchar(60)   |      |     |         |       |
| email | varchar(80)   |      |     |         |       |
| sex   | enum('M','F') |      |     | M       |       |
+-------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

This example assumes that the table uses latin1 (so the size of the character set is 1 byte). It also assumes that this table has 500 rows.

The first step is to calculate the size of each row:

  • Each INT field occupies 4 bytes.
  • Each VARCHAR(60) field occupies 64 bytes (60 + 2, rounded to next multiple of 4).
  • Each VARCHAR(80) field occupies 84 bytes (80 + 2, rounded to next multiple of 4).
  • Each small ENUM field occupies 1 byte, rounded up to the next multiple of 4 (that is, 4 bytes).
  • There are two indexes, so there is a 20-byte overhead per row.

This makes the total size of each row 176 bytes.

You can now divide 32,768 (the maximum size of a page) by 176 to get 186.2, so you know that each page will hold 186 rows. You can therefore work out that for 500 rows, you will need 3 pages. You can now work out the extra overhead of 128 bytes per page: 384 bytes.

Therefore, the RAM required for data storage will be (176 x 500) [Size of row x Number of rows] + (16 x 500) [Number of rows x Fixed overhead of 16 bytes per row] + 384 [Page overhead]. This comes out to 96,384 bytes, or roughly 94KB.

You now have to work out the RAM requirement for indexes, which is easy: 25 bytes per row, so in this case, 12,500 bytes, or 12.2KB.

This gives you the data and index RAM requirements for this one very small, very simple table. Of course, you need to take into account your total number of nodes and the number of replicas; if you have four nodes and two replicas, you divide these figures in half to work out the storage requirement per node (because each node holds half of each table).
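If you would rather let MySQL do the arithmetic, the following throwaway query (it reads no tables and simply repeats the figures from this example: a 176-byte row, 16 bytes of fixed row overhead, 500 rows, 32,768-byte pages with 128 bytes of overhead each, 25 bytes of index per row, two replicas, and four storage nodes) reproduces the numbers just calculated:

mysql> SELECT 176 * 500 + 16 * 500
    ->          + CEILING(500 / FLOOR(32768 / 176)) * 128 AS data_bytes,
    ->        25 * 500 AS index_bytes,
    ->        (176 * 500 + 16 * 500
    ->          + CEILING(500 / FLOOR(32768 / 176)) * 128
    ->          + 25 * 500) * 2 / 4 AS bytes_per_node;

This returns 96,384 bytes of data, 12,500 bytes of index, and roughly 54,442 bytes per storage node.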

Of course, there are significant overheads, and it is always a good idea to allow at least an extra 10% for temporary storage and so on, and for a small table such as this one, you should add a lot more.

Automatic Methods of Calculating RAM Requirements

At the time of this writing, MySQL has just released a script written by Stewart Smith, ndb_size, that aims to calculate memory usage for a table automatically. We anticipate that by the time this book is published, this script will be included with MySQL. However, if it is not, you will be able to download it from this book's website, www.mysql-cluster.com.

The idea is that you run the ndb_size script on a table before you convert it to NDB so that you can tell if you have enough RAM and get a good idea about what values to set for several configuration options.

To use this script, you need to download the script ndb_size.pl as well as the HTML template ndb_size.tmpl. You need the following Perl modules installed: DBD::mysql and HTML::Template. After you download the script, you should execute it with the following parameters:

[root@host] perl ndb_size.pl db-name host user password

It will create an HTML file that you can view in any web browser.

For example, to calculate the size of the test database on the localhost where the password for the root account is mypass, you would use this command:

[root@host] perl ndb_size.pl test localhost root mypass > /path/to/file.htm

We do not repeat the output here, but it is split into sections, all of which are useful. The first section, "Parameter Settings," gives estimates of what you should set the parameters in config.ini to. These tend to be low but give a pretty good estimation. If you want to calculate your RAM usage, you add up the first two lines in the table (DataMemory (kb) and IndexMemory (kb)) and ignore the rest.

The rest of the report gives a breakdown of memory usage by attributes (tables, indexes, and so on) and then a further breakdown by table.

It is worth noting that the MySQL Cluster version 5.1 figures may change by the time that version is released.

Using MySQL Cluster on 64-Bit Operating Systems

So far we have concentrated on standard systems. Over the past 18 months, the hype surrounding 64-bit processors has increased massively, and many new servers are now able to run in either 64-bit or 32-bit modes.

From the point of view of MySQL Cluster, 64-bit processors have a massive advantage: They can address more RAM. If you have a 32-bit system, you have two problems: First, there is no possibility of the system addressing (that is, using) more than about 4GB of RAM, and second, a storage node will generally fail to start if it attempts to allocate more than a few gigabytes of RAM in one chunk, with an error such as this:

Message: Memory allocation failure
Fault ID: 2327
Problem data: DBTUP could not allocate memory for Page

The obvious solution to this is to upgrade to 64-bit hardware and a 64-bit operating system. However, if you want to address the full 4GB of RAM and keep your 32-bit hardware, you can simply run multiple ndbd nodes on the same machines, with DataMemory plus IndexMemory for each ndbd process set to the maximum that you can safely address (say, 2GB). If you do this, you must specify which node group each storage node is a member of, as explained earlier in this chapter, in the section "An Example of a config.ini File." You must ensure that you do not have all the nodes in any node group running on the same physical machine; otherwise, failure of that machine will take the whole cluster down.
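As a purely illustrative sketch (the host names and memory figures here are invented, and we assume node groups are formed from the storage nodes in the order they are defined, as described earlier in this chapter), a config.ini for two 32-bit machines, each running two ndbd processes of roughly 2GB apiece, might look something like this. Note how the [NDBD] sections alternate between the two hosts so that the two replicas in each node group end up on different machines:

[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=1800M
IndexMemory=200M

[NDB_MGMD]
HostName=192.168.0.10

# Node group 0: one replica on each physical machine
[NDBD]
HostName=192.168.0.1
[NDBD]
HostName=192.168.0.2

# Node group 1: again, one replica on each physical machine
[NDBD]
HostName=192.168.0.1
[NDBD]
HostName=192.168.0.2

[MYSQLD]
[MYSQLD]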

Calculating the DataMemory and IndexMemory Parameters

You will see in Chapter 2 that there are actually two separately configurable options that control RAM usage: DataMemory and IndexMemory. DataMemory controls everything except the primary key or hash index that every single row has. (If you do not create a primary key, NDB will create a hash index for you and hide it.) DataMemory therefore includes all other regular indexes. IndexMemory can be calculated as 25 bytes per row for the primary key, plus an additional 25 bytes per row for each unique key (due to the hidden table). Additional hidden tables created by secondary hash indexes use DataMemory, not IndexMemory.
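For example, a hypothetical 1,000,000-row table with a primary key and one unique key would need roughly (25 + 25) x 1,000,000 bytes of IndexMemory, which you can check with a throwaway calculation such as this:

mysql> SELECT (25 + 25) * 1000000 / 1024 / 1024 AS indexmemory_mb;

That works out to a little under 48MB of IndexMemory for that one table.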

Reducing Memory Usage

If you discover that your current table structure is too large to fit viably into RAM, you have several options. The first, and often most viable, is to normalize the data to eliminate duplicate data. For example, say you have a table like this:

PEOPLE
ID INT(2)
Name VARCHAR (200)
Favorite_color VARCHAR (100)

You might find that you have perhaps five colors, which make up 95% of the colors that people select. If you have, say, 1,000,000 rows, you could save yourself a vast amount of RAM by changing the structure to two tables:

PEOPLE
ID INT(2)
Name VARCHAR (200)
COLOR_ID TINYINT (1)

COLORS
COLOR_ID TINYINT(1)
Favorite_color VARCHAR (100)

You can then make queries with a simple join.
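Here is a minimal sketch of that reworked schema and the join, using the tables outlined above (the column sizes are simply the ones shown; you would pick sizes appropriate to your own data):

mysql> CREATE TABLE colors (
    ->   color_id TINYINT NOT NULL PRIMARY KEY,
    ->   favorite_color VARCHAR(100)
    -> ) ENGINE=NDBCLUSTER;
mysql> CREATE TABLE people (
    ->   id INT NOT NULL PRIMARY KEY,
    ->   name VARCHAR(200),
    ->   color_id TINYINT NOT NULL
    -> ) ENGINE=NDBCLUSTER;
mysql> SELECT p.name, c.favorite_color
    ->   FROM people p JOIN colors c ON p.color_id = c.color_id;

Each row of people now stores a TINYINT (1 byte, rounded up to 4) instead of a VARCHAR(100), which occupies 104 bytes per row in a single-byte character set.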

The second commonly used trick is to reduce the size of variable-width fields that are often set far too large. For example, tables often have a very large number of VARCHAR(225) fields defined because whoever produced the table wanted to err on the side of caution, and it worked at the time. However, the largest value actually stored is often much smaller than that, perhaps 50 bytes. In table engines that support variable-width fields, this is not a problem, but as we have seen, NDB not only does not support variable-width fields (so each VARCHAR(225) field is stored as a CHAR(225) with additional overhead) but also stores at least two copies of each piece of data.

To illustrate this point, if we have a VARCHAR(225) field in a table with 100,000 rows and an average field length of 50 bytes, the following gives an idea of the space the field will use:

  • MyISAM 0.6MB
  • NDB 2.7MB x NumberOfReplicas (+ Overheads)

This represents a 350% increase from MyISAM to NDB before you account for the fact that NDB requires at least two copies of each piece of data. If you started off with a very large MyISAM table with lots of these VARCHAR fields, you could require such a ridiculous amount of RAM that the cluster in MySQL Cluster would cease to be a viable and cost-effective solution.

The best solution to this problem is to go back through the applications that use the database and establish what the actual maximum acceptable size is (if they have any validation before inserts or updates) and either reduce it to something sensible or add validation at a sensible level. You can then change the table structure to this new lower size, which makes an enormous difference.


