5.3 Common Performance Problems

There are three common disk performance problems: high I/O skew, disk overloading caused by paging, and inexplicably high service times on idle drives.

5.3.1 High I/O Skew

It is easy to overload a disk with too many accesses. A condition called high I/O skew exists when some disks carry much higher loads than others. It is hard to overestimate the impact of excessive resource utilization. Diagnosing high I/O skew is easy, if subjective: use the iostat tool (discussed in greater detail in Section 5.5.4 later in this chapter) and look for a wide range in the service times and busy percentages of the disks in the system.
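As a rough illustration of the "wide range" test, the sketch below takes per-disk service times and busy percentages, transcribed by hand from iostat output, and flags any disk that sits well above the average on either measure. The function names, disk names, sample figures, and the 2x threshold are all hypothetical; the judgment remains subjective, and the code only makes the comparison explicit.

```python
from statistics import mean


def skewed_disks(stats, ratio=2.0):
    """Return disks whose service time or %busy exceeds ratio x the mean.

    stats maps a disk name to a (service_time_ms, percent_busy) tuple.
    The default ratio of 2.0 is an arbitrary illustrative threshold.
    """
    avg_svc = mean(svc for svc, _ in stats.values())
    avg_busy = mean(busy for _, busy in stats.values())
    return sorted(
        disk
        for disk, (svc, busy) in stats.items()
        if svc > ratio * avg_svc or busy > ratio * avg_busy
    )


def busy_range(stats):
    """Return the (lowest, highest) %busy figures across all disks."""
    busy = [b for _, b in stats.values()]
    return min(busy), max(busy)


# Hypothetical figures, as if copied from iostat's service-time and %busy columns.
sample = {
    "sd0": (45.0, 92.0),
    "sd1": (8.0, 6.0),
    "sd2": (9.5, 4.0),
    "sd3": (7.0, 5.0),
}

if __name__ == "__main__":
    print(busy_range(sample))     # (4.0, 92.0)
    print(skewed_disks(sample))   # ['sd0']
```

Here one disk is nearly saturated while the other three are close to idle, which is exactly the pattern that calls for spreading the load.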

The solution to this problem is very simple in theory: spread the workload over several disks. There are quite a few ways of accomplishing this:

  • Split the workload on functional grounds. For example, if the overloaded disk contains two large databases, move one to another disk. This assumes a stable workload and isn't often appropriate in the real world.

  • Split the workload by users. If the overloaded disk contains the home directories for both the engineering group and the documentation group, move one group to another disk. As with a functional split, this assumes a stable workload.

  • Allocate disk space on a round-robin basis. This works well but requires application-level support; the Solaris kernel uses this method to distribute accesses across multiple swap spaces.

  • Concatenate several disks. This method tends to concentrate accesses on a single "hot" disk, leaving the rest of the disks relatively idle.

  • Configure several disks into a stripe (a RAID 0 array; we'll discuss this in greater detail in Section 6.2.1). The interlace size largely determines performance: too large an interlace fails to spread the data across multiple disks, while too small an interlace causes even small accesses to activate multiple disks. Stripes cannot be grown online, however, which limits future expansion, and because they have no redundancy, the failure of any single disk loses the data on the whole stripe.

  • Configure multiple disks into a combined mirror/stripe array. This is the fastest, safest storage configuration, but it can be very expensive.

  • Configure a parity-protected array (typically RAID 5; see Section 6.2.6 in Chapter 6). This is cheap and safe, but may suffer substantial performance penalties.
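To see why the interlace size matters for a stripe, the following sketch models a simple RAID 0 layout in which the volume is cut into interlace-sized chunks placed on the disks in turn (chunk k on disk k mod N), and reports which disks a single request touches. This is a simplified model for illustration, not the layout of any particular volume manager, and the function name is hypothetical.

```python
def disks_touched(offset, length, interlace, ndisks):
    """Return the sorted disk indices one request touches in a simple stripe.

    Model (an assumption): the volume is cut into chunks of `interlace`
    bytes, and chunk k is stored on disk k % ndisks.
    """
    if length <= 0:
        return []
    first_chunk = offset // interlace
    last_chunk = (offset + length - 1) // interlace
    if last_chunk - first_chunk + 1 >= ndisks:
        # The request spans at least one full rotation: every disk is involved.
        return list(range(ndisks))
    return sorted({c % ndisks for c in range(first_chunk, last_chunk + 1)})


KB = 1024

if __name__ == "__main__":
    # Four-disk stripe, 8 KB request at the start of the volume.
    print(disks_touched(0, 8 * KB, 512, 4))            # [0, 1, 2, 3]
    print(disks_touched(0, 8 * KB, 64 * KB, 4))        # [0]
    # A request straddling a chunk boundary touches two disks.
    print(disks_touched(60 * KB, 8 * KB, 64 * KB, 4))  # [0, 1]
    # A 512 KB hot region stays on one disk at a 1 MB interlace.
    print(disks_touched(0, 512 * KB, 1024 * KB, 4))    # [0]
```

Under this model, a 512-byte interlace makes every small request busy all four disks, while a 1 MB interlace lets a hot region of half a megabyte stay on a single disk; a useful interlace sits between those extremes for the workload's typical request size.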

There is no easy solution to this problem; the right choice depends on the specifics of your installation and your budget.

5.3.2 Memory-Disk Interactions

Paging and swapping free memory by using disk space as a backing store, which in essence transforms an immediate memory problem into a possible disk problem. (For more details on the mechanics of paging, see Section 4.3.) Because memory is far faster than disk, when the system is forced to use paging space heavily, the relatively slow disk subsystem can become a serious bottleneck. Ideally, paging space should be located on a fast disk, attached to its own controller, that is not being used for anything else.

If you are using multiple swap areas, the Solaris kernel provides a simple, automatic, and effective round-robin mechanism that distributes swap I/O across multiple disks. The Linux kernel performs a similar optimization across swap areas of equal priority. These /etc/fstab entries result in a three-way stripe of swap space under Linux:

 /dev/sda1       swap      swap     pri=1
 /dev/sdb1       swap      swap     pri=1
 /dev/sdc1       swap      swap     pri=1
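One way to confirm that the kernel actually picked up equal priorities is to read /proc/swaps, whose last column gives each active area's priority. As the swapon(8) documentation describes, pages are allocated from the highest-priority areas first, round-robin among areas that share that priority. The sketch below parses that text and returns the highest-priority group; the helper names and the sample contents are hypothetical.

```python
from collections import defaultdict


def swap_priority_groups(proc_swaps_text):
    """Map each swap priority to the list of areas that use it."""
    groups = defaultdict(list)
    lines = proc_swaps_text.strip().splitlines()
    for line in lines[1:]:  # skip the "Filename Type Size Used Priority" header
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore anything that is not a complete entry
        groups[int(fields[-1])].append(fields[0])
    return dict(groups)


def first_choice_areas(proc_swaps_text):
    """Return the areas in the highest-priority group, which are used first."""
    groups = swap_priority_groups(proc_swaps_text)
    return groups[max(groups)] if groups else []


# Hypothetical /proc/swaps contents for the three fstab entries above.
SAMPLE = """\
Filename    Type       Size     Used  Priority
/dev/sda1   partition  2097148  0     1
/dev/sdb1   partition  2097148  0     1
/dev/sdc1   partition  2097148  0     1
"""

if __name__ == "__main__":
    print(first_choice_areas(SAMPLE))  # ['/dev/sda1', '/dev/sdb1', '/dev/sdc1']
    # On a live Linux system, read the real file instead:
    # with open("/proc/swaps") as f:
    #     print(first_choice_areas(f.read()))
```

If the highest-priority group holds only one device, swap I/O is not being striped, and the pri= values in /etc/fstab are worth rechecking.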

Mirroring swap space, on the other hand, may provide substantial reliability improvements. This reliability exacts a toll on performance, however: every page written to swap must be written to both sides of the mirror, so more writes are issued.

5.3.3 High Service Times

Sometimes disks do very odd things. One of these is that lightly used disks often show extremely large service times (hundreds of milliseconds). The disk can't possibly be overloaded, so why does this happen? This situation has been conjectured to be caused by rounding errors at low activity levels, thermal recalibration, or problems with the filesystem layout. However, Adrian Cockcroft of Sun Microsystems analyzed the problem in some detail, and the root cause is the fsflush (Solaris) or bdflush (Linux) process flushing memory to disk (see Section 4.4.2). These flushes arrive as infrequent bursts of writes, and on a disk that sees little other activity, the bursts dominate the average service time. Although it looks unusual, it is not a problem.
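A little arithmetic shows how an occasional flush can produce such numbers. In the simplified model below, a burst of n writes is queued at once and served one at a time, so the ith write completes after i service times and the burst averages s(n + 1)/2. The model assumes FIFO service and that the reported figure is an average response time that includes time spent queued; the 10 ms per-write figure and the I/O counts are illustrative, not measurements, and the function names are hypothetical.

```python
def burst_mean_response(n_writes, service_ms):
    """Mean response time of n writes queued at once, served one at a time."""
    if n_writes <= 0:
        raise ValueError("n_writes must be positive")
    # Write i (1-based) waits for the i - 1 writes queued ahead of it and
    # then takes service_ms itself, so it completes at i * service_ms.
    total = sum(i * service_ms for i in range(1, n_writes + 1))
    return total / n_writes


def mixed_mean_response(burst_n, service_ms, other_n, other_ms):
    """Average over one burst plus other I/Os that never wait in a queue."""
    burst_total = burst_mean_response(burst_n, service_ms) * burst_n
    # Other I/Os are assumed to arrive alone, so each one's response
    # time is just its own service time.
    return (burst_total + other_n * other_ms) / (burst_n + other_n)


if __name__ == "__main__":
    print(burst_mean_response(20, 10))                      # 105.0
    print(mixed_mean_response(20, 10, 5, 10))               # 86.0
    print(round(mixed_mean_response(20, 10, 1000, 10), 1))  # 11.9
```

Each write in the model takes only 10 ms, yet a nearly idle disk reports an average above 100 ms; the same burst on a disk handling a thousand other requests barely moves the average.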



System Performance Tuning, 2002
Year: 2004
Pages: 97
