| 6.3. Disk DevicesDisk devices fill a central role in the UNIX kernel and thus have additional features and capabilities beyond those of the typical character device driver. Historically, UNIX provided two interfaces to disks. The first was a character-device interface that provided direct access to the disk in its raw form. This interface is still available in FreeBSD 5.2 and is described in Section 6.2. The second was a block-device interface that converted from the user abstraction of a disk as an array of bytes to the structure imposed by the underlying physical medium. Block devices were accessible directly through appropriate device special files. Block devices were eliminated in FreeBSD 5.2 because they were not needed by any common applications and added considerable complexity to the kernel. Entry Points for Disk Device DriversDevice drivers for disk devices contain all the usual character device entry points described in Section 6.2. In addition to those entry points there are three entry points that are used only for disk devices: | strategy | Start a read or write operation, and return immediately. I/O requests to or from filesystems located on a device are translated by the system into calls to the block I/O routines bread() and bwrite(). These block I/O routines in turn call the device's strategy routine to read or write data not in the memory cache. Each call to the strategy routine specifies a pointer to a buf structure containing the parameters for an I/O request. If the request is synchronous, the caller must sleep (on the address of the buf structure) until I/O completes. |  | dump | Write all physical memory to the device. The dump entry point saves the contents of memory on secondary storage. The system automatically takes a dump when it detects an unrecoverable error and is about to crash. The dump is used in postmortem analysis of the problem that caused the system to crash. The dump routine is invoked with context switching and interrupts disabled; thus, the device driver must poll for device status rather than wait for interrupts. At least one disk device is expected to support this entry point. | 
 
 Sorting of Disk I/O RequestsThe kernel provides a generic disksort() routine that can be used by all the disk device drivers to sort I/O requests into a drive's request queue using an elevator sorting algorithm. This algorithm sorts requests in a cyclic, ascending, block order, so that requests can be serviced with minimal one-way scans over the drive. This ordering was originally designed to support the normal read-ahead requested by the filesystem and also to counteract the filesystem's random placement of data on a drive. With the improved placement algorithms in the current filesystem, the effect of the disksort() routine is less noticeable; disksort() produces the largest effect when there are multiple simultaneous users of a drive. The disksort() algorithm is shown in Figure 6.3. A drive's request queue is made up of two lists of requests ordered by block number. The first is the active list; the second is the next-pass list. The request at the front of the active list shows the current position of the drive. If next-pass list is not empty, it is made up of requests that lie before the current position. Each new request is sorted into either the active or the next-pass list, according to the request's location. When the heads reach the end of the active list, the next-pass list becomes the active list, an empty next-pass list is created, and the drive begins servicing the new active list. Figure 6.3. Algorithm for disksort(). void disksort(     drive queue *dq,     buffer *bp); {     if (active list is empty) {         place the buffer at the front of the active list;         return;     }     if (request lies before the first active request) {         locate the beginning of the next-pass list;         sort by into the next-pass list;     } else         sort by into the active list; } 
 Disk sorting can also be important on machines that have a fast processor but that do not sort requests within the device driver. Here, if a write of several Mbyte is honored in order of queueing, it can block other processes from accessing the disk while it completes. Sorting requests provides some scheduling, which more fairly distributes accesses to the disk controller. Most modern disk controllers accept several concurrent I/O requests. The controller then sorts these requests to minimize the time needed to service them. If the controller could always manage all outstanding I/O requests, then there would be no need to have the kernel do any sorting. However, most controllers can handle only about 15 outstanding requests. Since a busy system can easily generate bursts of activity that exceed the number that the disk controller can manage simultaneously, disk sorting by the kernel is still necessary. Disk LabelsA disk may be broken up into several partitions, each of which may be used for a separate filesystem or swap area. A disk label contains information about the partition layout and usage including type of filesystem, swap partition, or unused. For the fast filesystem, the partition usage contains enough additional information to enable the filesystem check program (fsck) to locate the alternate superblocks for the filesystem. The disk label also contains any other driver-specific information. Having labels on each disk means that partition information can be different for each disk and that it carries over when the disk is moved from one system to another. It also means that, when previously unknown types of disks are connected to the system, the system administrator can use them without changing the disk driver, recompiling, and rebooting the system. The label is located near the beginning of each drive usually, in block zero. It must be located near the beginning of the disk to enable it to be used in the first-level bootstrap. Most architectures have hardware (or first-level) bootstrap code stored in read-only memory (ROM). When the machine is powered up or the reset button is pressed, the CPU executes the hardware bootstrap code from the ROM. The hardware bootstrap code typically reads the first few sectors on the disk into the main memory, then branches to the address of the first location that it read. The program stored in these first few sectors is the second-level bootstrap. Having the disk label stored in the part of the disk read as part of the hardware bootstrap allows the second-level bootstrap to have the disk-label information. This information gives it the ability to find the root filesystem and hence the files, such as the kernel, needed to bring up FreeBSD. The size and location of the second-level bootstrap are dependent on the requirements of the hardware bootstrap code. Since there is no standard for disk-label formats and the hardware bootstrap code usually understands only the vendor label, it is usually necessary to support both the vendor and the FreeBSD disk labels. Here, the vendor label must be placed where the hardware bootstrap ROM code expects it; the FreeBSD label must be placed out of the way of the vendor label but within the area that is read in by the hardware bootstrap code so that it will be available to the second-level bootstrap. For example, on the PC architecture, the BIOS expects sector 0 of the disk to contain boot code, a slice table, and a magic number. BIOS slices can be used to break the disk up into several pieces. The BIOS brings in sector 0 and verifies the magic number. The sector 0 boot code then searches the slice table to determine which slice is marked active. This boot code then brings in the operating-system specific bootstrap from the active slice and, if marked bootable, runs it. This operating-system specific bootstrap includes the disk-label described above and the code to interpret it. |