6.1. I/O Mapping from User to Device

Computers store and retrieve data through supporting peripheral I/O devices. These devices typically include mass-storage devices, such as disk drives, archival-storage devices, and network interfaces. Storage devices such as disks are accessed through I/O controllers that manage the operation of their attached devices according to I/O requests from the CPU.

Many hardware device peculiarities are hidden from the user by high-level kernel facilities, such as the filesystem and socket interfaces. Other such peculiarities are hidden from the bulk of the kernel itself by the I/O system. The I/O system consists of buffer-caching systems, general device-driver code, and drivers for specific hardware devices that must finally address peculiarities of the specific devices. An overview of the entire kernel is shown in Figure 6.1. The bottom third of the figure comprises the various I/O systems.

Figure 6.1. Kernel I/O structure.

There are three main kinds of I/O in FreeBSD: the character-device interface, the filesystem, and the socket interface with its related network devices. The character interface appears in the filesystem name space and provides unstructured access to the underlying hardware. The network devices do not appear in the filesystem; they are accessible only through the socket interface. Character devices are described in Section 6.2. The filesystem is described in Chapter 8. Sockets are described in Chapter 11.

A character-device interface comes in two styles that depend on the characteristics of the underlying hardware device. For some character-oriented hardware devices, such as terminal multiplexers, the interface is truly character oriented, although higher-level software, such as the terminal driver, may provide a line-oriented interface to applications. However, for block-oriented devices such as disks, a character-device interface is an unstructured or raw interface.
For this interface, I/O operations do not go through the filesystem or the page cache; instead, they are made directly between the device and buffers in the application's virtual address space. Consequently, the size of the operations must be a multiple of the underlying block size required by the device, and, on some machines, the application's I/O buffer must be aligned on a suitable boundary.

Internal to the system, I/O devices are accessed through a set of entry points provided by each device's device driver. For a character-device interface, the system accesses these entry points through a cdevsw structure. A cdevsw structure is created for each device as the device is configured, either when the system is booted or later when the device is attached to the system.

Devices are identified by a device number that is constructed from a major and a minor device number. The major device number uniquely identifies the type of device (really, the device driver). Historically, it was used as the index of the device's entry in the character-device table. FreeBSD 5.2 has no character-device table. As devices are configured, entries are created for them in the /dev filesystem, and each entry in the /dev filesystem has a direct reference to its corresponding cdevsw entry. For compatibility with applications that examine it, FreeBSD 5.2 still assigns a unique major device number to each device when the device is configured, but that number is not used internally by the kernel or the device driver.

The minor device number is selected and interpreted solely by the device driver and is used by the driver to identify which of potentially many hardware devices an I/O request refers to. For disks, for example, minor device numbers identify a specific controller, disk drive, and partition. The minor device number may also specify a section of a device (for example, a channel of a multiplexed device) or optional handling parameters.
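The raw-interface constraints above can be expressed as a check that a program might make before issuing a transfer. This is a minimal sketch: `raw_request_ok` is a hypothetical helper, the block size and alignment are passed in rather than queried from a real device, and the requirement that the starting offset also be block-aligned is an added assumption that is typical of disks but not stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: would a transfer of len bytes at byte offset
 * `offset`, to or from `buf`, satisfy a raw disk interface whose block
 * size is `blksize` and whose required buffer alignment is `align`?
 */
static bool
raw_request_ok(const void *buf, size_t len, off_t offset,
    size_t blksize, size_t align)
{
	assert(blksize != 0 && align != 0);

	/* No page cache in between: only whole device blocks can move. */
	if (len == 0 || len % blksize != 0)
		return false;
	/* Assumption: the transfer must also start on a block boundary. */
	if (offset < 0 || (uintmax_t)offset % blksize != 0)
		return false;
	/* On some machines the user buffer must be suitably aligned. */
	if ((uintptr_t)buf % align != 0)
		return false;
	return true;
}
```

For example, with a 512-byte block size, a 1024-byte transfer at offset 0 from an aligned buffer passes, whereas a 1000-byte transfer is rejected.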
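The relationship among a /dev entry, its cdevsw, and the minor number can be modeled in a few lines. This is a simplified sketch rather than FreeBSD's real declarations: `struct toy_cdevsw`, `struct toy_devnode`, the `toydisk` driver, and the minor-number layout (3 bits of partition, 3 bits of drive, the remaining bits for the controller) are all hypothetical. What it illustrates is that the kernel reaches the driver by following the entry's pointer rather than by indexing a table with the major number, and that only the driver decodes the minor number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down cdevsw: just a name and one entry point. */
struct toy_cdevsw {
	const char *d_name;
	int (*d_read)(unsigned minor, char *buf, size_t len);
};

/* A /dev entry points straight at its driver's cdevsw. */
struct toy_devnode {
	const char *name;
	unsigned major;                /* kept only for applications */
	unsigned minor;                /* meaningful only to the driver */
	const struct toy_cdevsw *sw;
};

/* Hypothetical disk minor layout: controller bits, then ddd, then ppp. */
#define PART_BITS  3
#define DRIVE_BITS 3

static unsigned
disk_minor(unsigned ctlr, unsigned drive, unsigned part)
{
	assert(drive < (1u << DRIVE_BITS) && part < (1u << PART_BITS));
	return (ctlr << (PART_BITS + DRIVE_BITS)) | (drive << PART_BITS) | part;
}

static unsigned disk_partition(unsigned m) { return m & ((1u << PART_BITS) - 1); }
static unsigned disk_drive(unsigned m) { return (m >> PART_BITS) & ((1u << DRIVE_BITS) - 1); }
static unsigned disk_controller(unsigned m) { return m >> (PART_BITS + DRIVE_BITS); }

/* The driver's read entry point: it alone interprets the minor number. */
static int
toydisk_read(unsigned minor, char *buf, size_t len)
{
	return snprintf(buf, len, "ctlr %u drive %u part %c",
	    disk_controller(minor), disk_drive(minor),
	    (int)('a' + disk_partition(minor)));
}

static const struct toy_cdevsw toydisk_cdevsw = { "toydisk", toydisk_read };

/* Kernel-side dispatch: follow the pointer; the major number is ignored. */
static int
dev_read(const struct toy_devnode *dn, char *buf, size_t len)
{
	return dn->sw->d_read(dn->minor, buf, len);
}
```

With this layout, controller 1, drive 2, partition 4 encodes as minor 84, and the driver reports it as partition `e`.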
Device Drivers

A device driver is divided into three main sections:

- Autoconfiguration and initialization routines
- Routines for servicing I/O requests (the top half)
- Interrupt service routines (the bottom half)

The autoconfiguration portion of a driver is responsible for probing for a hardware device to see whether it is present and for initializing the device and any associated software state that the device driver requires. This portion of the driver is typically called only once, either when the system is initialized or, for transient devices, when they are connected to the system. Autoconfiguration is described in Section 14.4.

The section of a driver that services I/O requests is invoked as a result of system calls or by the virtual-memory system. This portion of the device driver executes synchronously in the top half of the kernel and is permitted to block by calling the sleep() routine. We commonly refer to this body of code as the top half of a device driver.

Interrupt service routines are invoked when the system fields an interrupt from a device. Because an interrupt can arrive while any process is running, these routines cannot depend on any per-process state. Historically, they did not have a thread context of their own, so they could not block. In FreeBSD 5.2, an interrupt has its own thread context, so it can block if it needs to do so. However, the cost of extra thread switches is sufficiently high that, for good performance, device drivers should attempt to avoid blocking. We commonly refer to a device driver's interrupt service routines as the bottom half of a device driver.

In addition to these three sections of a device driver, an optional crash-dump routine may be provided. This routine, if present, is invoked when the system recognizes an unrecoverable error and wishes to record the contents of physical memory for use in postmortem analysis. Most device drivers for disk controllers provide a crash-dump routine. The use of the crash-dump routine is described in Section 14.6.
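The division of a driver into sections can be pictured as a table of routines that the rest of the system calls at different times. The sketch below is hypothetical throughout (`struct toy_driver`, `struct toy_softc`, and the `toy_*` routines are not FreeBSD interfaces), and it runs in a single thread, so the top half never actually sleeps; it only shows which section is called when, with the optional crash-dump routine left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device software state. */
struct toy_softc {
	bool present;     /* would the probe find hardware? */
	bool attached;
	int  attach_count;
	int  pending;     /* requests started but not yet completed */
	int  completed;
};

/* Hypothetical grouping of a driver's sections. */
struct toy_driver {
	int  (*probe)(struct toy_softc *);          /* autoconfiguration */
	int  (*attach)(struct toy_softc *);         /* autoconfiguration */
	int  (*strategy)(struct toy_softc *, int);  /* top half */
	void (*intr)(struct toy_softc *);           /* bottom half */
	int  (*dump)(struct toy_softc *);           /* optional; may be NULL */
};

static int
toy_probe(struct toy_softc *sc)
{
	return sc->present ? 0 : -1;
}

static int
toy_attach(struct toy_softc *sc)
{
	sc->attached = true;
	sc->attach_count++;
	return 0;
}

/* Top half: runs on behalf of a requester; a real driver could sleep here. */
static int
toy_strategy(struct toy_softc *sc, int request)
{
	(void)request;
	assert(sc->attached);
	sc->pending++;
	return 0;
}

/* Bottom half: records a completion; it uses no per-process state. */
static void
toy_intr(struct toy_softc *sc)
{
	if (sc->pending > 0) {
		sc->pending--;
		sc->completed++;
	}
}

static const struct toy_driver toy_drv = {
	toy_probe, toy_attach, toy_strategy, toy_intr, NULL /* no dump */
};

/* Autoconfiguration: attach only if the probe finds the device. */
static int
autoconf_device(const struct toy_driver *drv, struct toy_softc *sc)
{
	int error = drv->probe(sc);

	return error != 0 ? error : drv->attach(sc);
}
```

A caller that wants a crash dump would first test `toy_drv.dump != NULL`, since the routine is optional.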
I/O Queueing

Device drivers typically manage one or more queues of I/O requests in their normal operation. When an input or output request is received by the top half of the driver, it is recorded in a data structure that is placed on a per-device queue for processing. When an input or output operation completes, the device driver receives an interrupt from the controller. The interrupt service routine removes the appropriate request from the device's queue, notifies the requester that the command has completed, and then starts the next request from the queue. The I/O queues are the primary means of communication between the top and bottom halves of a device driver.

Because I/O queues are shared among asynchronous routines, access to the queues must be synchronized. Routines in both the top and bottom halves of the device driver must acquire the mutex associated with the queue before manipulating it, to avoid corruption from simultaneous modifications (mutexes were described in Section 4.3). Without the mutex, for example, a bottom-half interrupt routine might try to remove an entry that had not yet been fully linked in by the top half. Multiple processes starting I/O requests are also serialized through the mutex associated with the queue.

Interrupt Handling

Interrupts are generated by devices to signal that an operation has completed or that a change in status has occurred. On receiving a device interrupt, the system schedules the appropriate device-driver interrupt service routine with one or more parameters that uniquely identify the device that requires service. These parameters are needed because device drivers typically support multiple devices of the same type. If the interrupting device's identity were not supplied with each interrupt, the driver would be forced to poll all the potential devices to identify the device that interrupted.
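Returning to the I/O queue described under I/O Queueing, the protocol can be modeled in user space with a POSIX mutex standing in for the kernel mutex and a condition variable standing in for sleeping until a request completes. This is an analogy, not kernel code: `struct ioreq`, `struct devqueue`, and the routines are hypothetical, and the "device" is simulated by calling `bottom_half_intr` directly. The request being serviced stays at the head of the queue until its completion interrupt removes it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct ioreq {
	int           blkno;
	bool          done;
	struct ioreq *next;
};

struct devqueue {
	pthread_mutex_t mtx;    /* protects every field below */
	pthread_cond_t  cv;     /* broadcast when a request completes */
	struct ioreq   *head;   /* the active request when busy */
	struct ioreq   *tail;
	bool            busy;   /* device is working on head */
};

static void
devqueue_init(struct devqueue *q)
{
	pthread_mutex_init(&q->mtx, NULL);
	pthread_cond_init(&q->cv, NULL);
	q->head = q->tail = NULL;
	q->busy = false;
}

/* Caller holds q->mtx: if the device is idle, start the request at the head. */
static void
dev_start_locked(struct devqueue *q)
{
	if (!q->busy && q->head != NULL)
		q->busy = true;  /* a real driver would program the controller */
}

/* Top half: link the request in under the mutex, then start the device. */
static void
top_half_start(struct devqueue *q, struct ioreq *r)
{
	r->done = false;
	r->next = NULL;
	pthread_mutex_lock(&q->mtx);
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
	dev_start_locked(q);
	pthread_mutex_unlock(&q->mtx);
}

/* Top half: sleep until the bottom half marks the request done. */
static void
top_half_wait(struct devqueue *q, struct ioreq *r)
{
	pthread_mutex_lock(&q->mtx);
	while (!r->done)
		pthread_cond_wait(&q->cv, &q->mtx);
	pthread_mutex_unlock(&q->mtx);
}

/* Bottom half: remove the finished request, notify, start the next one. */
static void
bottom_half_intr(struct devqueue *q)
{
	pthread_mutex_lock(&q->mtx);
	if (q->busy) {
		struct ioreq *r = q->head;

		assert(r != NULL);  /* busy implies a request at the head */
		q->head = r->next;
		if (q->head == NULL)
			q->tail = NULL;
		q->busy = false;
		r->done = true;
		pthread_cond_broadcast(&q->cv);
		dev_start_locked(q);
	}
	pthread_mutex_unlock(&q->mtx);
}
```

Broadcasting rather than signaling keeps the sketch simple when several requesters share one condition variable: each wakes, rechecks its own `done` flag, and goes back to sleep if its request is still outstanding.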
The system arranges for the unit-number parameter to be passed to the interrupt service routine for each device by installing the address of an auxiliary glue routine in the interrupt-vector table. This glue routine, rather than the actual interrupt service routine, is invoked to service the interrupt; it takes the following actions:

- Collects the relevant hardware parameters and places them in the space reserved for them by the device.
- Updates statistics on device interrupts.
- Schedules the interrupt service thread for the device.
- Clears the interrupt-pending flag in the hardware.
- Returns from the interrupt.

Because a glue routine is interposed between the interrupt-vector table and the interrupt service routine, the special-purpose instructions that the hardware needs to support interrupts, which cannot be generated from C, can be kept out of the device driver. This interposition of a glue routine permits device drivers to be written without assembly language.
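The glue-routine arrangement can be modeled by keeping the driver's routine and its unit number together in each vector entry. In this sketch, `struct intr_vec`, `intr_establish`, `intr_glue`, `ithread_run`, and the `xx` driver are hypothetical; the glue routine is ordinary C here, whereas the real one contains the machine-specific instructions, and the step of collecting hardware parameters is omitted. Because the unit number travels with the vector, one interrupt service routine can serve several devices without polling them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NVEC 8

/* Hypothetical per-vector state owned by the system, not the driver. */
struct intr_vec {
	void        (*handler)(int unit);  /* driver's interrupt service routine */
	int           unit;                /* which device this vector belongs to */
	unsigned long count;               /* interrupt statistics */
	bool          ithread_runnable;    /* interrupt thread has been scheduled */
	bool          hw_pending;          /* stands in for the hardware flag */
};

static struct intr_vec vectors[NVEC];

/* Bind a driver routine and a unit number to a vector when a device attaches. */
static void
intr_establish(int vec, void (*handler)(int), int unit)
{
	assert(vec >= 0 && vec < NVEC);
	vectors[vec].handler = handler;
	vectors[vec].unit = unit;
}

/* The glue routine: the interrupt-vector table points here, not at the driver. */
static void
intr_glue(int vec)
{
	struct intr_vec *iv = &vectors[vec];

	iv->count++;                  /* update statistics */
	iv->ithread_runnable = true;  /* schedule the interrupt thread */
	iv->hw_pending = false;       /* clear the interrupt-pending flag */
}                                 /* return from the interrupt */

/* Later, the interrupt thread calls the driver with the right unit number. */
static void
ithread_run(int vec)
{
	struct intr_vec *iv = &vectors[vec];

	if (iv->ithread_runnable && iv->handler != NULL) {
		iv->ithread_runnable = false;
		iv->handler(iv->unit);
	}
}

/* A hypothetical driver "xx" with one service routine for all its units. */
static int xx_last_unit = -1;

static void
xx_intr(int unit)
{
	xx_last_unit = unit;  /* a real driver would service device `unit` */
}
```

Both units of `xx` share `xx_intr`; the vector entry alone tells the routine which unit interrupted.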