Before we get into performance tuning, we need to discuss what performance means, because it means different things to different people. Many metrics can go into measuring performance and scalability, but the overall definition we use in this book is "ensuring that every client's needs are satisfied as quickly as possible." Beyond this definition, several related topics are worth discussing, such as response time, throughput, and scalability.
Response time indicates how long someone has to wait to be served. Several metrics based on it are worth monitoring. The first is maximum response time, which indicates how long anyone has to wait in the worst-case scenario. The second is average response time, which indicates how long a typical client has to wait. Keep in mind that response time normally consists of two parts: wait time and execution time. Wait time, or queue time, is how long the client has to block (that is, wait in a queue for access) before execution can begin. This wait normally occurs because of lock contention. Execution time is the amount of time actually spent running (that is, executing) statements.
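As a minimal sketch of how these pieces fit together, the following Python snippet treats each request's response time as the sum of its wait time and execution time, and then derives the maximum and average. The `Request` class, the `summarize` function, and the sample timings are hypothetical and exist only for illustration; in practice the numbers would come from whatever monitoring your database server provides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One client request; both fields are hypothetical measurements in seconds."""
    wait_time: float       # time spent blocked in a queue (for example, on a lock)
    execution_time: float  # time spent actually executing statements

    @property
    def response_time(self) -> float:
        # Response time is the sum of wait time and execution time.
        return self.wait_time + self.execution_time


def summarize(requests):
    """Return the maximum and average response time for a non-empty list of requests."""
    times = [r.response_time for r in requests]
    return {"max": max(times), "avg": sum(times) / len(times)}


# Made-up sample data: three requests with different amounts of queueing.
samples = [
    Request(wait_time=0.25, execution_time=0.5),
    Request(wait_time=0.0, execution_time=0.25),
    Request(wait_time=1.5, execution_time=0.5),
]

summary = summarize(samples)
print(summary)  # {'max': 2.0, 'avg': 1.0}
```

Notice that the slowest request in this sample spends most of its time waiting rather than executing, which is why it helps to track the two components separately.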
Throughput measures how many clients and requests you serve over a given time period. It is typically expressed as queries per second or transactions per minute.
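The arithmetic behind a throughput figure is simple enough to sketch in a few lines of Python. The `throughput` function and the counts below are hypothetical; they assume you already have a count of completed queries and the length of the measurement window.

```python
def throughput(completed: int, elapsed_seconds: float) -> float:
    """Return completed requests per second over the measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed / elapsed_seconds


# Made-up example: 1,200 queries completed during a 60-second window.
queries_per_second = throughput(1200, 60.0)
queries_per_minute = queries_per_second * 60

print(queries_per_second)  # 20.0
print(queries_per_minute)  # 1200.0
```

The same calculation applies to transactions; only the unit being counted and the reporting interval change.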
Finally, scalability is how well response time and throughput can be maintained as the overall load increases. This increase in load might be tied to the number of database users, or it could be independent of that. Sometimes, multiple dimensions of load increase at the same time. For example, as the number of users increases, the amount of data and the number of queries may both increase. Scalability is often one of the hardest things to be sure of, because it is difficult to simulate the exact effects of these increases without them actually occurring.
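One possible way to put numbers on this idea is to compare measurements taken at two load levels. The sketch below is only an illustration under stated assumptions: the `Measurement` class, the `compare` function, the "efficiency" ratio, and all of the figures are hypothetical, and load is represented here only by the number of clients, even though other dimensions can grow at the same time.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical readings taken at one load level."""
    clients: int              # number of concurrent clients (one dimension of load)
    queries_per_second: float
    avg_response_time: float  # seconds


def compare(baseline: Measurement, loaded: Measurement) -> dict:
    """Compare a higher-load measurement against a baseline.

    efficiency: throughput growth divided by load growth; 1.0 would mean
                throughput grew in step with the number of clients.
    response_ratio: how many times longer the average response time became.
    """
    load_factor = loaded.clients / baseline.clients
    throughput_factor = loaded.queries_per_second / baseline.queries_per_second
    return {
        "efficiency": throughput_factor / load_factor,
        "response_ratio": loaded.avg_response_time / baseline.avg_response_time,
    }


# Made-up figures: four times the clients, three times the throughput,
# and average response time doubled.
baseline = Measurement(clients=10, queries_per_second=100.0, avg_response_time=0.5)
loaded = Measurement(clients=40, queries_per_second=300.0, avg_response_time=1.0)

result = compare(baseline, loaded)
print(result)  # {'efficiency': 0.75, 'response_ratio': 2.0}
```

In this made-up case, throughput grew more slowly than the load and response time doubled, so neither value was fully maintained as the load increased.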