8.1. A Simple Process for Tuning Your First Deployment
You've built your application, created all your unit tests, and set up a staging server. After months of development, you're now ready to put the thing out in the wild and watch it sink or swim.
To give your application a fighting chance, you'll need to get it properly installed and working really well, and then you'll want to tune it. "Tuning" has historically meant "apply a list of tips and it'll be fast." We're redefining tuning to mean "measuring, then applying changes that make statistically significant improvements in measured performance." Let's break this definition down into its pieces to make it clearer:
8.1.1. Setting Your Goals
Performance is to programmers what a room full of gold-coated slot machines is to gambling addicts. They will sit in front of a program tuning it to the complete detriment of the project. What's worse is this happens to everyone. The second you start talking about performance, every programmer becomes interested. It's a sickness.
You must have a goal to escape this trap. Without a target performance goal that is easily measurable, you can spend weeks working on a problem that may not even be solvable. Imagine spending a week tuning a deployment that only three people actually use. Or worse, you tune it, and only after a week of effort do you realize it'll never be fast enough.
There are a few key components to your goal, though:
"Measurable" means that you can equate the goal to some metric that you can collect. Requests per second (req/sec) is usually the best one. No matter what people tell you, it is not possible to measure "users" for performance. You can have 10 users that are massive porn hounds and download at 1,000 req/sec for weeks solid and you can have 10 million users that visit once a week each with 1 req/sec.
"Exactly defined" means that you have to use numbers. It has to be a goal that's defined in terms of req/sec numbers, bytes/sec transfer, etc. Not using numbers means you have no idea whether your measurements are getting you to the goal.
Finally, you can't just pull a goal out of nowhere. Management will typically do this, but you'll have to be ready to do your own analysis and go back to them and tell them it's not possible if their goals are unrealistic. Having solid metrics and numbers that are easily verified will help with this.
A very simple example goal is to simply state the number of requests per second the site should service in the best and worst cases. For example, "The site should consistently serve 120 req/sec for best case (fastest) actions /foo and /bar, and serve 55 req/sec for worst case actions /lamz and /flarz." With this kind of measurable goal, you can get started and then refine the goal as you go, adding more conditions or even reducing the expectations. It also means you can go back to management and tell them early on whether this is possible.
8.1.2. Gathering Your Tools
The preferred tool for many people is either httperf or Apache Bench (ab). httperf's statistics and results are better defined and more accurate, but ab uses far fewer resources during the test. We've found that ab is fine for quick, informal tests (and it even aborts connections without you having to tell it), but that httperf is better for serious production measurement.
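As a rough sketch (the server name, port, and URI here are hypothetical; substitute your own staging setup), a typical httperf run and the roughly equivalent ab run look like this:

```shell
# Hypothetical target -- substitute your own staging server and action.
# httperf: open 1000 connections at 100 connections/sec and report
# full statistics, including mean and standard deviation of reply rates.
httperf --server staging.example.com --port 80 --uri /foo \
        --num-conns 1000 --rate 100

# ab: roughly the same workload with 10 concurrent requests; lighter
# on client-side resources, but with sparser statistics in the report.
ab -n 1000 -c 10 http://staging.example.com/foo
```

httperf reports its reply rate as min/avg/max with a standard deviation, which is exactly the kind of output you need for the statistical comparisons discussed below.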
Avoid JMeter, Siege, and any tool that claims to test "users," produces graphs (these are generally useless), or doesn't report common statistical measurements like the mean and standard deviation. In fact, if you run the tool once and don't see the words "mean" and "standard deviation," reject it immediately. Without these basic statistics you can't even use the tool to tell whether one measurement is statistically better than a previous one.
8.1.3. Collect Baseline Data
Your next task is to collect a baseline of the very best performance you can expect. This is most easily done by taking your existing deployed application (deployed exactly how it will exist in production) and testing a simple "tester" action. I typically set up the following:

- A bare Mongrel handler that returns a trivial response, with no framework involved.
- A Rails action that renders a trivial page without touching the database.
- A Rails action that runs a simple database query before responding.

Hit each of these with httperf a few times and record the results as the best possible performance you can get. The bare Mongrel handler is the ceiling for possible performance; the two Rails handlers are the ceilings for a plain action and for an action with some database behind it.
It's also good to confirm that these baseline numbers match what you expect, and to investigate any configuration problems if they come out slower than they should.
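A baseline run is worth scripting so it's repeatable. This sketch assumes httperf is installed and that your tester actions are mounted at the hypothetical URIs /test, /empty, and /db on a hypothetical staging host:

```shell
# Hit each tester action a few times and append the results to a log,
# so later runs can be compared against this baseline.
# Host, port, and URIs are placeholders -- adjust to your deployment.
for uri in /test /empty /db; do
  for run in 1 2 3; do
    echo "=== $uri run $run $(date) ===" >> baseline.log
    httperf --server staging.example.com --port 80 \
            --uri "$uri" --num-conns 500 --rate 50 >> baseline.log
  done
done
```

Keeping the log around means that after every deployment you can re-run the same script and compare the new numbers against the recorded baseline.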
8.1.4. The Tuning Process
After you have your baseline, the process for making your application faster is very simple:

1. Measure the current performance.
2. Make one change that you believe will improve it.
3. Measure again under the same conditions.
4. If the change produced a statistically significant improvement, keep it; otherwise, back it out and try something else.
It's also important to re-run this process after each deployment to make sure that nothing broke during development or the last configuration change.
What's all this insistence on "statistically significant differences"? Humans are really bad at figuring out whether two things actually differ, especially when presented with large amounts of information. Statisticians have figured out clever ways to summarize the distributions of large sets of numbers and then use these "summaries" to determine if there actually was a difference worth mentioning. In science this is important because you can't claim to have found a new cure for cancer unless you have some significant results showing the drug actually makes a difference.
Most performance-tuning books never mention this part of statistics. A few mention the mean, and fewer still the standard deviation, but I haven't found any that actually use common statistical significance tests to determine whether one measurement differs from another. The main reason (besides stupidity and ignorance) is that significance testing alone can eat up entire books, so we can't even cover it properly in this one.
If you use a tool like httperf and aren't inclined to learn the math needed for a real significance test, then just look for changes that move the mean consistently in the right direction. If a change is only a few percent different, it's probably not worth the effort and most likely isn't a real difference.
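If you record the req/sec number from each run in a file, even a small awk script gives you the mean and standard deviation needed for that comparison. This is a minimal sketch; the function name and the sample numbers are made up, not from a real test:

```shell
# Print the mean and standard deviation of a column of req/sec samples
# (one number per line on stdin), labeled with the given name.
summarize() {
  awk -v name="$1" '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
      mean = sum / n
      sd = sqrt(sumsq / n - mean * mean)
      printf "%s: mean=%.2f sd=%.2f (n=%d)\n", name, mean, sd, n
    }'
}

# Five hypothetical runs before and after a configuration change.
printf '118\n121\n117\n122\n119\n' | summarize before
printf '124\n126\n123\n127\n125\n' | summarize after
```

If the "after" mean sits above the "before" mean by several standard deviations, and stays there across repeated runs, the change is probably worth keeping; if the two ranges overlap heavily, you haven't really changed anything.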