How to Identify Bottlenecks

Given a high enough load, any software application will eventually exhaust some hardware capacity of the system. Hardware bottlenecks can generally be categorized as CPU, memory, disk, or network capacity limits. Let's examine each of these in turn, starting with the most common bottlenecks.

Disk Bottlenecks

Relatively speaking, disk I/O is the most overused system resource. Not only will an application read data from and write data to the disk, so too will the operating system. All modern operating systems employ "virtual memory" techniques. Virtual memory works by moving data from physical memory to a special disk area reserved for that purpose, also called swap space. Thus many disk bottlenecks may actually be memory bottlenecks in disguise. These will be saved for later discussion. The best way to architect your software application with respect to disk sizing is to recognize the performance and capacity attributes of disks and allow for the correct amount of each.

Every disk drive has a raw data capacity. Calculating capacity is generally the easiest part of disk sizing. However, remember that a 9 Gbyte disk may not give you 9 Gbytes of usable storage. For starters, the operating system may reserve up to 10% of a disk drive for file system use. Secondly, a RAID file system layout will tack on another 20% to 100% overhead depending on the type of striping, parity, or mirroring used.
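The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name, the default percentages, and the reading of RAID overhead as extra raw space needed per unit of stored data are assumptions, not figures for any particular file system or RAID product.

```python
def usable_capacity_gb(raw_gb, fs_reserve=0.10, raid_overhead=0.20):
    """Estimate usable storage for a given raw capacity.

    fs_reserve    -- fraction of space held back by the file system (assumed)
    raid_overhead -- extra raw space per unit of data (assumed): 0.20 for
                     one parity disk per five data disks, 1.00 for mirroring
    """
    after_raid = raw_gb / (1.0 + raid_overhead)
    return after_raid * (1.0 - fs_reserve)


if __name__ == "__main__":
    for overhead in (0.20, 1.00):
        usable = usable_capacity_gb(9, raid_overhead=overhead)
        print(f"9 Gbytes raw, {overhead:.0%} RAID overhead: "
              f"{usable:.2f} Gbytes usable")
```

Under these assumptions a mirrored 9 Gbyte disk yields well under half its raw capacity, which is why capacity should be sized from the usable figure rather than the label on the drive.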

Every disk drive also has a throughput capacity and is connected to a controller with its own throughput capacity. Consider a modern disk drive with a 5 Mbyte/second throughput. An Ultra-SCSI controller is rated at 40 Mbytes a second and can physically connect up to 15 disk drives. A 40 Mbyte/second controller, obviously, cannot handle 15*5 = 75 Mbytes/second, which is what would be generated by 15 drives all running at full throughput. If your application is data intensive and requires high throughput, then you need to closely consider the number and type of both disk drives and controllers.
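The controller arithmetic can be checked with a short Python sketch. The function names are hypothetical; the 5 Mbyte/second, 40 Mbyte/second, and 15-drive figures are the ones used in the paragraph above.

```python
def aggregate_demand_mb_s(drive_count, drive_mb_s):
    """Throughput generated if every drive runs at full speed."""
    return drive_count * drive_mb_s


def max_drives_at_full_speed(drive_mb_s, controller_mb_s):
    """Largest number of drives the controller can serve at full speed."""
    return int(controller_mb_s // drive_mb_s)


if __name__ == "__main__":
    demand = aggregate_demand_mb_s(15, 5)
    print(f"15 drives generate {demand} Mbytes/second "
          f"against a 40 Mbytes/second controller")
    print(f"Drives per controller at full speed: "
          f"{max_drives_at_full_speed(5, 40)}")
```

With these numbers the controller saturates at 8 drives, so a throughput-bound application would need roughly two controllers to keep 15 such drives busy.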

Another factor limiting disk performance is the number of random I/O operations a disk can perform per second. This is controlled by a combination of the platter rotational speed (RPM) and the seek time of the read/write head. Applications such as online transaction processing typically stress a disk's I/O operations per second capacity long before total storage capacity or throughput capacity is reached.
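A rough way to see how rotational speed and seek time combine is the common approximation that one random I/O costs an average seek plus half a rotation. The sketch below uses that approximation; it ignores transfer time, caching, and command queuing, and the sample drive figures are hypothetical rather than taken from any data sheet.

```python
def estimated_random_iops(rpm, avg_seek_ms):
    """Approximate random I/O operations per second for one disk.

    Assumes each operation costs one average seek plus an average
    rotational latency of half a revolution.
    """
    rotational_latency_ms = 0.5 * 60000.0 / rpm
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Hypothetical drives: (rpm, average seek time in milliseconds)
    for rpm, seek_ms in ((7200, 8.0), (10000, 5.0)):
        iops = estimated_random_iops(rpm, seek_ms)
        print(f"{rpm} RPM, {seek_ms} ms seek: about {iops:.0f} IOPS")
```

Even an optimistic estimate lands at on the order of a hundred operations per second per drive, which is why transaction-heavy workloads tend to run out of I/O operations before they run out of space.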

While the performance attributes of SCSI disk drives are fairly well understood, newer technologies such as FC-AL (fiber channel arbitrated loop) introduce their own unique performance variables into the equation. For instance, one of the reasons for implementing FC-AL technology is that its throughput (100 Mbytes/second) is more than twice that of Ultra-SCSI. Another benefit of FC-AL technology is that fiber channel loops are not limited to Ultra-SCSI's 15 drives. FC-AL hubs and switches allow even greater numbers of drives to be connected in various combinations. It is therefore easy to build disk subsystems so complex that they are very difficult to diagnose once a performance problem arises.

In sizing disk requirements for your application, you therefore need to consider total storage capacity requirements, total throughput requirements, and total I/O operation capacity. Generally speaking, the larger a disk the greater its cost per I/O operation and the smaller a disk the greater its cost per megabyte of capacity. You will need to consider both when selecting the disk subsystem for your application.
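One way to combine the three requirements is to compute how many drives each one demands and take the largest. The following Python sketch does that under simplifying assumptions: requirements spread evenly across identical drives, and no RAID or controller limits. All names and sample numbers are hypothetical.

```python
import math


def disks_required(need_gb, need_mb_s, need_iops,
                   disk_gb, disk_mb_s, disk_iops):
    """Return drives needed per requirement and the overall count."""
    per_requirement = {
        "capacity": math.ceil(need_gb / disk_gb),
        "throughput": math.ceil(need_mb_s / disk_mb_s),
        "iops": math.ceil(need_iops / disk_iops),
    }
    # The requirement demanding the most drives sets the subsystem size.
    limiting = max(per_requirement, key=per_requirement.get)
    return per_requirement, limiting, per_requirement[limiting]


if __name__ == "__main__":
    counts, limiting, total = disks_required(
        need_gb=90, need_mb_s=30, need_iops=1500,
        disk_gb=9, disk_mb_s=5, disk_iops=100)
    print(counts)
    print(f"Limiting factor: {limiting}, drives needed: {total}")
```

In this example the I/O operation rate, not capacity, determines the drive count, which illustrates why many smaller drives can be a better fit than a few large ones for transaction-oriented workloads.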

CPU Bottlenecks

In purely CPU-intensive applications, CPU bottlenecks are easy to identify. On larger multiprocessor systems, however, CPU bottlenecks may not always be as readily identifiable. Most operating systems further break down CPU time into user application time, operating system time, wait time, and idle time. If a system is routinely spending more than 20% of its time executing the operating system, there may very well be another software bottleneck. On a multiprocessor system this will typically be a result of resource contention issues caused by multithreaded code. Similarly, large amounts of wait time are probably the result of a multithreading bottleneck either in an application or in the operating system. Unlike idle time, when the CPU has no jobs to run, wait time is measured as the time when a CPU has jobs to run but cannot run any because it is waiting for a shared resource that is currently held by another CPU. Chapter 19 on multithreading further explains these kinds of bottlenecks.
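The rules of thumb above can be expressed as a small check over a CPU time breakdown. In this Python sketch the percentages are supplied by hand, since how they are collected depends on the operating system. The 20% operating system limit comes from the text; the wait-time limit is a placeholder assumption, because the text says only "large amounts".

```python
def cpu_hints(user, system, wait, idle, system_limit=20.0, wait_limit=20.0):
    """Return warnings for a CPU time breakdown given in percent.

    system_limit follows the 20% rule of thumb from the text;
    wait_limit is an assumed placeholder for "large amounts" of wait time.
    """
    hints = []
    if system > system_limit:
        hints.append(f"system time {system:.0f}% exceeds {system_limit:.0f}%: "
                     "possible software bottleneck or resource contention")
    if wait > wait_limit:
        hints.append(f"wait time {wait:.0f}% exceeds {wait_limit:.0f}%: "
                     "possible multithreading bottleneck")
    return hints


if __name__ == "__main__":
    # Hypothetical sample: mostly user time, but heavy operating system time.
    for hint in cpu_hints(user=50, system=30, wait=5, idle=15):
        print(hint)
```

A single sample proves little; the text's condition is that the system is routinely above the limit, so in practice the check would be applied to averages over a representative period.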

Memory Bottlenecks

Memory bottlenecks are often the hardest local problems to detect. One reason is that modern virtual memory systems use memory as a cache and keep data in memory as long as possible. Simply monitoring how much free memory a system has will therefore tell you very little about its memory utilization. Correctly determining whether a system has enough memory requires at least a basic understanding of the underlying operating system's virtual memory, paging, and swapping algorithms.

Network Bottlenecks

In today's network-centric environments, a system's network interface can easily be a cause of application performance issues. Consider that on most platforms, a single CPU can drive a 100 Mbit/sec network interface card at full speed. On larger, multi-CPU systems, therefore, either faster interfaces or more interfaces are often required. Assuming your network infrastructure can handle it, a faster network interface card such as gigabit ethernet or OC-12 (622 Mbit/sec) ATM may solve a network bottleneck. However, many corporate networks do not yet support such standards. Another approach is to spread the network load among multiple network interface cards. This often works well for server applications that serve multiple clients. There are two basic ways to utilize multiple network interface cards in a single server. The first method is simply to use subnetting in your network and connect a single network interface card to each subnet. If you need all the network capacity between hosts on a single network, trunking is another solution. In trunking, multiple network interface cards are assigned a single IP address and the operating system handles the interleaving of data between network interface cards. Cisco refers to trunking as "Fast EtherChannel" technology.
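To compare the options above, it can help to estimate how many interfaces of each type a given load would need. The Python sketch below assumes, as a simplification, that network load scales linearly with CPU count at 100 Mbit/sec per CPU and that each interface delivers its full rated speed. The 8-CPU server is a hypothetical example.

```python
import math


def interfaces_needed(load_mbit_s, nic_mbit_s):
    """Number of interfaces of a given rated speed needed to carry a load."""
    return math.ceil(load_mbit_s / nic_mbit_s)


if __name__ == "__main__":
    cpus = 8                      # hypothetical multi-CPU server
    load = cpus * 100             # assumes 100 Mbit/sec per CPU, linear scaling
    for name, speed in (("100 Mbit ethernet", 100),
                        ("OC-12 ATM", 622),
                        ("gigabit ethernet", 1000)):
        print(f"{name}: {interfaces_needed(load, speed)} interface(s)")
```

Whether the resulting interfaces are placed on separate subnets or trunked together is a separate decision that depends on where the clients sit on the network.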