Benchmarking is a tool that, when used correctly, can help you plan for scalability, test throughput, and measure response time. When used incorrectly, benchmarking can give you a badly misleading picture of these things.
In a good benchmark, you want to simulate your production environment as closely as possible. This includes things such as hardware available, software being used, and usage patterns.
The following sections describe some common problems with benchmarks.
Using the Wrong Data Size
When doing a benchmark, you need to have about the same amount of data you plan on having in the system. Doing a benchmark with 50MB of data when you plan on having 10GB in production is not a useful benchmark, because a data set that small can behave very differently from the full-size one.
Using a Data Set That Is Not Representative
If you are generating data, you need to make sure it isn't too uniformly random. In real life, most data has patterns that cause certain values to repeat more often than others. For example, imagine that you have a set of categories for something. In real life, certain categories are likely to be much more common than other categories. If your generated data gives every category exactly the same share, this can distort the benchmark.
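One way to avoid an even spread is to weight the categories when generating rows. The following Python sketch is only an illustration: the category names, the helper functions zipf_weights and generate_categories, and the choice of a Zipf-like weighting (rank r gets weight 1/r) are all assumptions, not something taken from a real data set. If you have production data, measure its actual distribution and use that instead.

```python
import random
from collections import Counter


def zipf_weights(n, exponent=1.0):
    """Return n weights where the item at rank r gets weight 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n + 1)]


def generate_categories(categories, count, exponent=1.0, seed=None):
    """Draw `count` labels so that earlier categories are much more common."""
    rng = random.Random(seed)  # private generator so runs are repeatable
    weights = zipf_weights(len(categories), exponent)
    return rng.choices(categories, weights=weights, k=count)


if __name__ == "__main__":
    # Hypothetical category names, ordered from most to least popular.
    cats = ["fiction", "children", "cooking", "history", "latin"]
    rows = generate_categories(cats, 10_000, seed=42)
    for name, n in Counter(rows).most_common():
        print(f"{name:10s} {n}")
```

Raising the exponent makes the skew stronger; setting it to 0 gives every category the same weight, which is the even distribution the text warns about.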
Using Inaccurate Data Access Patterns
Using inaccurate data access patterns is similar to using a data set that is not representative, but it relates to how the data is accessed rather than to the data itself. For example, if your benchmark searches for "Latin dictionary" as often as it searches for "Harry Potter," its results will differ from what you would see in production. Some things are much more commonly accessed than others, and your benchmarks need to take that into account.
Failing to Consider Cache Effects
There are two ways that failing to consider cache effects can distort your benchmarking. First, you can run a benchmark that makes overly heavy use of caches. For example, if you run the same query over and over, caching plays a very big role in the result. Second, you can run completely different queries every time, which makes caches less effective than they would be in production. This relates to the previously mentioned concept of data access patterns. "Harry Potter" would most likely have a high cache hit rate, but "Latin dictionary" wouldn't make as much use of the cache, and this can be difficult to measure in benchmarks.
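To get a feel for how strongly the access pattern drives the cache hit rate, you can simulate it. The following Python sketch is a toy model under stated assumptions: it uses a simple least-recently-used (LRU) cache, which is not necessarily how any particular database caches data, and the LRUCache class, the simulate function, the key counts, and the 1/rank popularity weighting are all made up for illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache that only counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(keys, weights, requests, capacity, seed=0):
    """Replay `requests` weighted random accesses and return the hit rate."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    for key in rng.choices(keys, weights=weights, k=requests):
        cache.access(key)
    return cache.hit_rate()


if __name__ == "__main__":
    keys = list(range(1000))
    skewed = [1.0 / (rank + 1) for rank in keys]  # a few very popular keys
    uniform = [1.0] * len(keys)                   # every key equally likely
    print("same query repeated:", simulate([0], [1.0], 10_000, 100))
    print("skewed access:      ", simulate(keys, skewed, 10_000, 100))
    print("uniform access:     ", simulate(keys, uniform, 10_000, 100))
```

In this model, repeating a single query produces a hit rate close to 100%, uniform access produces a low one, and the skewed pattern lands in between. A benchmark that uses either extreme will not reflect a production workload that looks like the skewed case.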
Using Too Little Load or Too Few Users
In order for a benchmark to be accurate, it needs to reflect the number of users who will be accessing your system. A very common problem with MySQL Cluster benchmarks is attempting to benchmark with only a single user. MySQL Cluster does quite poorly in that case because of its bad single-user response time, but it has great scalability and throughput.
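If you end up writing your own load generator, the key point is to issue requests from several simulated users at once and to report throughput as well as response time. The following Python sketch shows one possible shape for such a harness. The run_benchmark function and the fake_query placeholder are hypothetical; fake_query just sleeps for 5 ms and would need to be replaced with a real query against your own system.

```python
import statistics
import threading
import time


def run_benchmark(operation, users, requests_per_user):
    """Call `operation` from `users` concurrent threads and time each call."""
    latencies = []
    lock = threading.Lock()

    def worker():
        local = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            operation()
            local.append(time.perf_counter() - start)
        with lock:  # merge this user's timings into the shared list
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(users)]
    wall_start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - wall_start

    return {
        "users": users,
        "requests": len(latencies),
        "throughput_per_s": len(latencies) / elapsed,
        "mean_latency_s": statistics.mean(latencies),
    }


def fake_query():
    # Placeholder for a real database call (an assumption for this sketch).
    time.sleep(0.005)


if __name__ == "__main__":
    for users in (1, 4, 16):
        print(run_benchmark(fake_query, users, 20))
```

With a real system behind it, comparing the single-user run against the multiuser runs shows whether throughput keeps rising as users are added, which a single-user benchmark cannot reveal.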
Now that you know some of the problems with benchmarking, how can you work around them? One of the easiest ways is to benchmark against your actual application. For some application types, such as web applications, this is very easy to do. Two commonly used web benchmarking applications that are available for free are httperf (www.hpl.hp.com/research/linux/httperf/) and Microsoft Web Application Stress Tool (www.microsoft.com/technet/archive/itsolutions/intranet/downloads/webstres.mspx). For some application types, such as embedded applications, benchmarking is more difficult, and you may need to create a tool yourself.
The other solution is to use a benchmarking tool that can more closely mimic your application. Two tools in this category are Super Smack (http://vegan.net/tony/supersmack/) and mybench (http://jeremy.zawodny.com/mysql/mybench/). With both of these tools, you can more accurately represent the query, user, and data patterns of your application. They require quite a bit of customization, but they can help you avoid some of the common pitfalls.