15.5 Disk I/O Performance Issues

Disk I/O is the third major performance bottleneck that can affect a system or an individual job. This section will look first at the tools for monitoring disk I/O and then consider some of the factors that can affect disk I/O performance.

15.5.1 Monitoring Disk I/O Performance

Unfortunately, Unix tools for monitoring disk I/O are few and rather poor. BSD-like systems provide the iostat command (all but Linux have some version of it). Here is an example of its output from a FreeBSD system experiencing moderate usage on one of its two disks:

$ iostat 6
      tty             ad0              ad1              cd0            cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   13 31.10  71  2.16   0.00   0  0.00   0.00   0  0.00   0  0 11  2 87
   0   13 62.67  46  2.80   0.00   0  0.00   0.00   0  0.00   0  0 10  2 88
   0   13  9.03  64  0.56   0.00   0  0.00   0.00   0  0.00   1  0  7  1 91
   0   13  1.91  63  0.12   0.00   0  0.00   0.00   0  0.00   2  0  4  2 92
   0   13  2.29  64  0.14   0.00   0  0.00   0.00   0  0.00   2  0  5  1 92

The command parameter specifies the interval between reports (and we've omitted the first, summary one, as usual). The columns headed by disk names are the most useful for our present purposes. They show current disk usage as the number of transfers/sec (tps) and MB/sec.

System V-based systems offer the sar command, and it can be used to monitor disk I/O. Its syntax in this mode is:

$ sar -d interval [count]

interval is the number of seconds between reports, and count is the total number of reports to produce (the default is one). In general, sar's options specify what data to include in its report. sar is available for AIX, HP-UX, Linux, and Solaris. However, it requires that process accounting be set up before it will return any data. This report shows the current disk usage on a Linux system:

$ sar -d 5 10
Linux 2.4.7-10 (dalton)     05/29/2002

07:59:34 PM   DEV       tps     blks/s
07:59:39 PM   dev3-0    9.00    70.80
07:59:39 PM   dev22-0   0.40    1.60

07:59:39 PM   DEV       tps     blks/s
07:59:44 PM   dev3-0    61.80   494.40
07:59:44 PM   dev22-0   10.80   43.20

07:59:44 PM   DEV       tps     blks/s
07:59:49 PM   dev3-0    96.60   772.80
07:59:49 PM   dev22-0   0.00    0.00

Average:      DEV       tps     blks/s
Average:      dev3-0    78.90   671.80
Average:      dev22-0   1.12    4.48

The first column of every sar report is a timestamp. The other columns give the transfer operations per second and blocks transferred per second for each disk. Note that devices are specified by their major and minor device numbers; in this case, we are examining two hard disks.
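If you want a quick summary for a single device, sar's output is easy to post-process. Here is a minimal sketch using awk (the field positions assume the Linux output format shown above, and dev3-0 is simply the device from our example):

$ sar -d 5 10 | awk '/dev3-0/ && !/^Average/ { n++; sum += $4 }
    END { if (n) printf "dev3-0 average tps: %.2f\n", sum/n }'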
15.5.2 Getting the Most From the Disk Subsystem

Disk performance is something that results more effectively from installation-time planning and configuration than from after-the-fact tuning. Different techniques are most effective for optimizing different kinds of I/O, which means that you'll need to understand the I/O performed by the applications/typical workload on the system. There are two sorts of disk I/O:

sequential access
    The data in a file is read or written in order, from beginning to end.

random access
    Data is read from or written to arbitrary locations within a file.

Three major factors affect disk I/O performance in general: the disk hardware itself, the distribution of data among the available disks, and the placement of data on each disk. We consider each of them in the following subsections.
15.5.2.1 Disk hardware

In general, the best advice is to choose the best hardware you can afford when disk I/O performance is an important consideration. Remember that the best SCSI disks are many times faster than the fastest EIDE ones, and also many times more expensive. These are some other points to keep in mind:
15.5.2.2 Distributing the data among the available disks

The next issue to consider after a system's hardware configuration is planning data distribution among the available disks: in other words, deciding which files will go on which disk. The basic principle to take into account in such planning is to distribute the anticipated disk I/O across controllers and disks as evenly as possible, in an attempt to prevent any one resource from becoming a performance bottleneck. In its simplest form, this means spreading the files with the highest activity across two or more disks. Here are some example scenarios that illustrate this principle:
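For instance, a single high-traffic directory can be relocated to a second disk and symlinked back into place (a hypothetical sketch; the news spool path and the target disk are assumed for illustration, not taken from a particular system):

# mv /var/spool/news /disk2/news
# ln -s /disk2/news /var/spool/news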
Of course, placing heavily accessed files on network rather than local drives is almost always a guarantee of poor performance. Finally, it is also almost always a good idea to use a separate disk for the operating system filesystem(s), provided you can afford to do so, to isolate the effects of the operating system's own I/O operations from user processes.

15.5.2.3 Data placement on disk

The final disk I/O performance factor that we will consider is the physical placement of files on disk. The following general considerations apply to the relationship between file access patterns, physical disk location, and disk I/O performance:
15.5.3 Tuning Disk I/O Performance

Some systems offer a few hooks for tuning disk I/O performance. We'll look at the most useful of them in this subsection.

15.5.3.1 Sequential read-ahead

Some operating systems attempt to determine when a process is accessing data files in a sequential manner. When the operating system decides that this is the access pattern being used, it attempts to aid the process by performing read-ahead operations: reading more pages from the file than the process has actually requested. For example, it might begin by retrieving two pages instead of one. As long as sequential access of the file continues, the operating system might double the number of pages read with each operation before settling at some maximum value. The advantage of this heuristic is that data has often already been read in from disk by the time the process asks for it, so much of the process's I/O wait time is eliminated because no physical disk operation need take place.

15.5.3.1.1 AIX

AIX provides this functionality. You can alter the default threshold values of 2 and 8 pages using these vmtune options:

-r minpgahead
    The number of pages read ahead when sequential access is first detected (default 2).

-R maxpgahead
    The maximum number of pages read ahead in a single operation (default 8).
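For example, a command like the following would double the maximum read-ahead to 16 pages (a sketch; on AIX versions of this era, vmtune is typically installed as /usr/samples/kernel/vmtune and must be run as root):

# /usr/samples/kernel/vmtune -R 16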
Both parameters must be a power of 2.

15.5.3.1.2 Linux

Linux provides some kernel parameters related to read-ahead behavior. They may be accessed via these files in /proc/sys/vm:

min-readahead
    The minimum number of pages of read-ahead (the 2.4 kernel default is 3).

max-readahead
    The maximum number of pages of read-ahead (the 2.4 kernel default is 31).
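For instance, on a 2.4 kernel you might enlarge the maximum read-ahead window like this (an illustrative sketch; 63 is an arbitrary value, and the change does not survive a reboot unless made permanent, e.g., via /etc/sysctl.conf):

# echo 63 > /proc/sys/vm/max-readahead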
Finally, the Linux Logical Volume Manager allows you to specify the read-ahead size when you create a logical volume with lvcreate, via its -r option. For example, this command specifies a read-ahead size of 8 sectors and also creates a contiguous logical volume:

# lvcreate -L 800M -n bio_lv -r 8 -C y vg1

The valid range for -r is 2 to 120.

15.5.3.2 Disk I/O pacing

AIX also provides a facility designed to prevent general interactive performance from being adversely affected by large I/O operations. By default, write requests are serviced by the operating system in the order in which they are made (queued). A very large I/O operation can generate many pending I/O requests, and users needing disk access can be forced to wait for them to complete. This occurs most frequently when an application computes a large amount of new data to be written to disk (rather than processing a data set by reading it in and then writing it back out). You can experience this effect by copying a large file (32 MB or more) in the background and then running an ls command on a random directory you have not accessed recently on the same physical disk. You'll notice an appreciable wait before the ls output appears.

Disk I/O pacing is designed to prevent large I/O operations from degrading interactive performance. It is disabled by default; consider enabling it only under circumstances like those just described. This feature may be activated by changing the values of the minpout and maxpout system parameters using the chdev command. When these parameters are nonzero, a process that tries to write to a file for which there are already maxpout or more pending write operations is suspended until the number of pending requests falls below minpout. maxpout must be one more than a multiple of 4: 5, 9, 13, and so on (i.e., of the form 4x+1). minpout must be a multiple of 4 and at least 4 less than maxpout. The AIX documentation suggests starting with values of 33 and 16, respectively, and observing the effects. The following command will set them to these values:

# chdev -l sys0 -a maxpout=33 -a minpout=16

If interactive performance is still not as responsive as you want it to be, try decreasing these parameters; on the other hand, if the performance of the job doing the large write operation suffers more than you want it to, increase them. Note that their values persist across boots because they are stored in the ODM.
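You can verify the values currently in effect by listing the corresponding attributes of sys0 (a quick check using the standard AIX lsattr command):

# lsattr -E -l sys0 -a maxpout -a minpout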