As any good English dictionary will tell you, performance has several possible meanings. In computer science, performance most often refers to the functioning efficiency of software and can relate to both speed and scalability. There are established standards and benchmarks to measure and compare the performance of products and services. Why care about performance? Well, consider your job performance; assuming you are an employee, your job review and, therefore, salary raise and bonus, will hinge on how well you do your job. In order to get the best work out of you, your manager needs to know what your interests are, what motivates you, and then assign appropriate tasks to get the maximum performance from you. Your manager will be rewarding you for your performance. Usually in the currency you like most, cash.
It follows that if you are a data warehouse designer using Analysis Services you need to know how to get the best performance from the system so as to satisfy the customers. Just like your boss can push certain buttons to motivate you, Analysis Services provides various parameters that can be set to achieve maximum performance. As for server products such as Analysis Services, one can attribute performance results to how well server properties are tuned in the context of speed and scalability requirements.
The graph shown in Figure 13-1 is a typical server scalability graph. The query throughput lines show the server throughput (queries served per minute) as more users are using the system concurrently for two different hardware configurations. For Hardware 1, up to about 50 users, the server throughput increases linearly. That means the server has sufficient resources to support 50 concurrent users. Then after about 50 users, the throughput starts to flatten out. In the 50-to-100 user range, the server doesn't have enough resources (CPU or memory or disk) to serve requests of all concurrent users. In this circumstance, some user requests would be queued in the system request queue to keep the system from slowing down all the user requests. At about 100 users, the system is running at maximum capacity. The curve flattens off at high loads because internally, the server only executes a few queries concurrently, and queues the rest. This is so that with many outstanding queries, new users can still get reasonable response time while connecting to the server and executing non-query commands.
From the perspective of a user, when a server is under an extreme load with many outstanding queries, some of the execution time is spent waiting in the queue, and some time is spent actually working on the query. You might compare it to waiting in line for a food in a crowded baseball stadium. When the load is high, the wait time can easily exceed the time to actually do the business. Hardware 2 has better resources (CPUs, memory, network) as compared to Hardware 1. Hence if you run the server on Hardware 2 you obviously get a better throughput since the saturation to maximum users occurs at about 200 users.
If you have a well architected server then Figure 13-2 shows the average query response under load. The average response time starts to increase, and eventually the average response time will increase linearly. If your system only needs to support 10 to 50 members, you don't have to do anything. But if your system needs to support 100+ users, you need to identify the system bottlenecks. Typically servers will expose performance monitoring counters to expose internal values. You can use the task manager and performance counters to identify bottlenecks; whether the system is CPU-bound (such that the Server CPU is pegged at 100%) or memory-bound (memory usage is constantly maxed out) or disk-bound (reads or writes to disk). By removing the system hardware bottleneck and adding more CPU or memory to the server, you should be able to get performance improvements and support more concurrent users as shown in Figure 13-2 for Hardware 2.
Note that while we expect to see the general shape of the curves described above, the limits will be different for your hardware, your cube design, the queries your users execute, and the frequency with which queries are executed. With respect to Analysis Services, in a typical query scenario, CPU, memory, and disk read speed can all be potential bottlenecks for your system. In a cube or dimension processing scenario, CPU, memory, disk writing speed, and network speed between the relational source and Analysis Services can all be candidate areas for system optimization work.
For Analysis Services performance we refer to four main areas — database design and processing, querying, and configuring Analysis Services (setting properties). The first involves design optimization to facilitate the optimized processing of dimensions and cubes. The second relates to optimization to speed up MDX queries. The third involves fine tuning Analysis Services or appropriate hardware acquisition based on requirements. Does performance imply both scalability and optimization of Analysis Services? Depending on whom you ask, one or both are true. The bottom line is that you need best query performance with regard to your OLAP cubes. That is true regardless of size, and however you can get it that doesn't involve the violation of federal or state laws is fine. In the next section you will learn some of the design techniques that can help in improving your Analysis Services performance.