As we all know, server-based applications put a priority on speed. Every operation you undertake has to be carefully considered, because the longer something takes, the less scalability you'll get out of the application. Many of us are working on applications that have to handle thousands, or even millions, of individual requests, and any extraneous work your application has to manage can have huge consequences. What makes a server application even more "fun" is that it's highly multithreaded, so sometimes it's not even obvious where the application is losing performance.
It's tough enough to write server applications, but it's doubly hard to debug them. With client applications, you can spot some bugs, especially performance bottlenecks, just by watching the application run. With server applications, however, you're left groping at a blob of code running in the dark recesses of memory. You can see it only tangentially, through response times and by poking at it with tools like debuggers and PerfMon, which show you only snippets of the application's overall health. To make matters even worse, none of the nontrivial bugs you'll encounter will ever show up in the nice controlled environment of your QA systems; they show up only in the jungles of production.
To debug these server applications, everyone resorts to the old standby of tracing. This is the only way you'll be able to see the big picture, especially on live production systems. I've worked on applications that have some of the most incredible tracing systems you could ever imagine. A lot of thought was put into them so that the developers could truly see what was happening in production, giving them a good chance of finding and solving bugs.
Unfortunately, the balance between "debuggability" and performance is extremely delicate when it comes to server applications. In fact, several times in our consulting business, we've been called in to work on performance problems only to find that the team's own tracing system was the performance problem. What makes these cases even more interesting is that the teams didn't even realize their tracing system was the bottleneck.
Since I've had numerous opportunities to wrestle with poorly performing tracing systems, I wanted to solve the problem once and for all. For this chapter, I wrote a tool, FastTrace, which should allow you to have as much tracing as you'd like without killing your performance. Before I discuss using FastTrace and its implementation, I want to explain why most of the tracing techniques that people use are such a problem.