I/O Framework

Prior to the 10.0 release of HP-UX, there were two distinct families of HP products that used the HP-UX operating system. One was the workstation family, or S700 systems. These systems evolved from the Motorola 68000-based single-user systems. The other family was the server family, or S800 systems. These were the original PA-RISC machines, using the hardware developed both for the HP-UX servers and for systems running MPE, Hewlett-Packard's proprietary operating system. Over time, the S700 systems were moved to the PA-RISC platform.

It became clear that a single operating system that ran on both workstations and servers made good sense. With release 7.0, HP-UX had merged most of the kernel functionality between the two platforms. Hewlett-Packard no longer had different releases for the two platforms. However, the I/O systems still remained incompatible: a workstation had to run the workstation version of 7.0, and a server had to run the server version of 7.0.

Release 10.0 finally converged the two I/O systems. This coincided with the release of the K-class system, the first system to have I/O hardware from both the server and workstation platforms. This convergence was made possible by creating a framework in which the various drivers could coexist despite having been developed in very different environments. Figure 10-4 shows the block diagram of this I/O framework.

Figure 10-4. Converged Workstation and Server I/O Systems

The key components of the framework are the general I/O (GIO) system and the context-dependent I/O (CDIO) systems. GIO handles all the functions that are done on a systemwide level. The CDIOs provide environments for the drivers to work in. The SIO CDIO provides the environment for the server I/O subsystem, and the WSIO CDIO provides the environment for the workstation I/O subsystem. In addition, there are CDIOs for other driver environments and hardware bus types. One special case of the CDIO is the central bus CDIO (CB-CDIO).
This CDIO is responsible for system platform hardware. Within this CB-CDIO environment is a set of platform support modules (PSMs). As new functionality is added to the system hardware (online addition and replacement, or OLAR, for example), new PSMs are created to support that functionality. A system that supports OLAR has the OLAR PSM included in the kernel to provide the software interface.

General I/O

As mentioned, the GIO subsystem provides global functionality to the I/O system. Every system has GIO in the kernel. Some of the functions of the GIO include the following:

Management of I/O Configuration: The overall configuration of the I/O system is a global concept and is managed in the GIO. The GIO maintains a structure called the iotree, which describes the relationship between the modules in the system.

Algorithms for Device Discovery: At system boot time (and when requested by the ioscan command), the system must probe the I/O modules to determine what is present and bind the hardware to the correct drivers. The GIO oversees this process, although the individual CDIOs do the actual interrogation of the modules.

System Administration Interface: Users and system administrators must see a consistent view of all I/O modules. The GIO provides this through a pseudo-driver called devconfig.

Services: The GIO provides services that can be used by all of the CDIOs. These services include such things as inter-CDIO communication and dynamic module loading and unloading.

The GIO is not directly involved in individual I/O transactions or with individual devices. It provides a global environment in which the CDIOs operate and a uniform interface to the user.

I/O Global Data Structures

One of the most important data structures provided by the GIO is the iotree. The data structure of the iotree is an "object" in that the GIO provides procedures for manipulating the tree structure, and the structure may only be manipulated by those procedures.
The basic building block of the iotree is an iotree node. Each node in the tree has a pointer to its parent, a pointer to its first child (the one with the lowest address), and a pointer to its first sibling. Each node also has information about the driver that manages the node. Figure 10-5 is a diagram of a small iotree, showing the pointer structure.

Figure 10-5. Iotree Example

The figure illustrates that each node in the tree represents a module and that for each of these modules we record the driver name, the class, the hardware address, and the CDIO that corresponds to that module.

The process of building the iotree starts with the routine io_virt_mode_config(), which is called by init_main() at boot time. This routine first initializes the GIO, then calls io_scan() to begin scanning the I/O busses. io_scan() puts all the nodes in the tree into the SCAN state, then calls gio_scan_subtree(), which starts configuring the central bus. Each CDIO in the system is given a chance to claim each device on the bus. Information about each device is passed to a scan routine in the CDIO. If the CDIO recognizes the device, it claims it.

This scanning of the bus continues recursively. As each CDIO is told to scan a module and given a chance to claim it, it also calls the claim routines for the interface card drivers within that CDIO. Those drivers in turn get a chance to claim any interface cards they support on the bus. As each module gets claimed by a driver, its iotree node gets put into the CLAIMED state. Once the entire scan is complete, any nodes that did not get claimed by a driver are still in the SCAN state. These are then moved to the UNCLAIMED state, indicating that there is no driver in the kernel that supports them.

Another important data structure maintained by the GIO is the pair of device switch tables. These tables are used to associate device files in the system with device drivers.
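Before going further into the device switch tables, the iotree node layout and the SCAN, CLAIMED, and UNCLAIMED states described above can be pictured with a small C sketch. This is only an illustration under stated assumptions: the struct layout, the function names, and the driver name "bus_adapter" are all hypothetical, and the tree is built by hand rather than discovered by probing hardware as the real scan does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* States named in the text; the enum itself is hypothetical. */
enum node_state { NODE_SCAN, NODE_CLAIMED, NODE_UNCLAIMED };

/* Hypothetical node layout: three links plus per-module information. */
struct io_node {
    struct io_node *parent;
    struct io_node *child;    /* first child */
    struct io_node *sibling;  /* next node under the same parent */
    const char     *hw_id;    /* stand-in for the hardware identity */
    const char     *driver;   /* name of the claiming driver, or NULL */
    enum node_state state;
};

/* A claim routine returns a driver name if it recognizes the module. */
typedef const char *(*claim_fn)(const struct io_node *);

/* Link child as the last child of parent. */
static void node_add_child(struct io_node *parent, struct io_node *child)
{
    struct io_node **link = &parent->child;
    while (*link != NULL)
        link = &(*link)->sibling;
    child->parent = parent;
    child->sibling = NULL;
    *link = child;
}

/* Step 1: put every node into the SCAN state. */
static void mark_scan(struct io_node *n)
{
    for (; n != NULL; n = n->sibling) {
        n->state = NODE_SCAN;
        n->driver = NULL;
        mark_scan(n->child);
    }
}

/* Step 2: offer each node to every claim routine, recursing into children. */
static void offer_subtree(struct io_node *n, const claim_fn *claimers, size_t count)
{
    for (; n != NULL; n = n->sibling) {
        for (size_t i = 0; i < count && n->state == NODE_SCAN; i++) {
            const char *drv = claimers[i](n);
            if (drv != NULL) {
                n->driver = drv;
                n->state = NODE_CLAIMED;
            }
        }
        offer_subtree(n->child, claimers, count);
    }
}

/* Step 3: any node still in SCAN has no supporting driver. */
static void sweep_unclaimed(struct io_node *n)
{
    for (; n != NULL; n = n->sibling) {
        if (n->state == NODE_SCAN)
            n->state = NODE_UNCLAIMED;
        sweep_unclaimed(n->child);
    }
}

static void scan_tree(struct io_node *root, const claim_fn *claimers, size_t count)
{
    assert(root != NULL);
    mark_scan(root);
    offer_subtree(root, claimers, count);
    sweep_unclaimed(root);
}

/* Count the nodes in a subtree that are in state s. */
static int count_state(const struct io_node *n, enum node_state s)
{
    int total = 0;
    for (; n != NULL; n = n->sibling)
        total += (n->state == s) + count_state(n->child, s);
    return total;
}

/* Two made-up claim routines for the demonstration. */
static const char *claim_bus(const struct io_node *n)
{
    return strcmp(n->hw_id, "bus") == 0 ? "bus_adapter" : NULL;
}

static const char *claim_tape(const struct io_node *n)
{
    return strcmp(n->hw_id, "tape") == 0 ? "stape" : NULL;
}

/* Build a three-node tree, scan it, and count nodes in the wanted state. */
int demo_count_state(enum node_state wanted)
{
    struct io_node root = { .hw_id = "bus" };
    struct io_node tape = { .hw_id = "tape" };
    struct io_node odd  = { .hw_id = "mystery" };
    const claim_fn claimers[] = { claim_bus, claim_tape };

    node_add_child(&root, &tape);
    node_add_child(&root, &odd);
    scan_tree(&root, claimers, 2);
    return count_state(&root, wanted);
}
```

In this sketch the bus and tape nodes end up CLAIMED, while the node that no claim routine recognizes is swept into UNCLAIMED at the end of the scan.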
Each device file (also known as a special file) has three characteristics: the type (block or character), the major number, and the minor number. When a special file is accessed, the system uses either the block device switch (bdevsw) or the character device switch (cdevsw), depending on the type of the file. It then uses the major number as an index into that table to get a structure containing the information about the driver. The structures for block and for character devices are slightly different, but they contain similar information. Here's the declaration for the character device switch:

    struct cdevsw {
        d_open_t     d_open;
        d_close_t    d_close;
        d_read_t     d_read;
        d_write_t    d_write;
        d_ioctl_t    d_ioctl;
        d_select_t   d_select;
        d_option1_t  d_option1;
        int          d_flags;
        void        *d_drv_info;
        pfilter_t   *d_pfilter_s;
        aio_ops_t   *d_aio_ops;
    };

The declaration for the bdevsw looks like this:

    struct bdevsw {
        d_open_t      d_open;
        d_close_t     d_close;
        d_strategy_t  d_strategy;
        d_dump_t      d_dump;
        d_psize_t     d_psize;
        int           d_flags;
        int         (*reserved) __(());
        void         *d_drv_info;
        pfilter_t    *d_pfilter_s;
        aio_ops_t    *d_aio_ops;
    };

With the exception of d_flags, all of these fields are pointers either to the functions that handle certain operations or to information about the driver. As an example, the major number for the stape driver is 205. The entry at index 205 in the cdevsw table looks something like this:

    {stape_open, stape_close, stape_read, stape_write, stape_ioctl,
     NULL, NULL, stape_info, NULL, C_ALLCLOSES | C_MGR_IS_MP }

The first five of these entries are pointers to the routines that handle opens, closes, and so on for SCSI tapes. The last entry, the d_flags field, indicates that the system should call the stape_close routine for all closes and that the driver is aware of multi-processor systems and can be used in that environment. The stape_info entry is a pointer to the drv_info_t for this driver.
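The lookup described above, in which the major number selects a table entry and the entry supplies the driver's routines, can be sketched as follows. This is a minimal illustration, not the kernel's code: the cut-down struct demo_cdevsw, the table size of 256, and every demo_ name are assumptions made for the example. Only the major number 205 for the tape driver comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCHRDEV 256   /* assumed table size for this sketch */

typedef int (*demo_open_t)(int minor);

/* Cut-down, hypothetical stand-in for struct cdevsw. */
struct demo_cdevsw {
    demo_open_t d_open;   /* routine that handles opens for this driver */
    int         d_flags;  /* driver flags, unused in this sketch */
};

/* The character device switch: one slot per major number. */
static struct demo_cdevsw demo_cdevsw_table[DEMO_NCHRDEV];

/* Counts how often the demo tape open routine has run. */
int demo_tape_opens = 0;

/* Hypothetical open routine for a tape driver. */
static int demo_tape_open(int minor)
{
    (void)minor;          /* a real driver would select the unit here */
    demo_tape_opens++;
    return 0;
}

/* Place a driver's entry in the slot for its major number. */
int demo_install(int major, const struct demo_cdevsw *entry)
{
    assert(entry != NULL);
    if (major < 0 || major >= DEMO_NCHRDEV)
        return -1;
    demo_cdevsw_table[major] = *entry;
    return 0;
}

/* Open a character device: index by major, call through d_open. */
int demo_chr_open(int major, int minor)
{
    if (major < 0 || major >= DEMO_NCHRDEV)
        return -1;                        /* no such slot */
    if (demo_cdevsw_table[major].d_open == NULL)
        return -1;                        /* no driver installed */
    return demo_cdevsw_table[major].d_open(minor);
}

/* Install the demo tape driver at major number 205. */
int demo_install_tape(void)
{
    struct demo_cdevsw entry = { demo_tape_open, 0 };
    return demo_install(205, &entry);
}
```

After demo_install_tape() runs, demo_chr_open(205, 0) reaches demo_tape_open() through the table, while an open on a major number with no installed driver fails without calling anything.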
Exploring the system:

Load up all of the cdevsw entries: q4> load struct cdevsw from &cdevsw max 256
Keep the entry for major number 205: q4> keep indexof == 205
Print the cdevsw data: q4> print tx
Get the name of the open routine: q4> ex d_open using a
The block and character switch tables are populated during the I/O discovery process described above. As each device is claimed by a driver, the GIO puts that driver's information into the appropriate device switch.

Context-Dependent I/O

CDIOs encapsulate functionality that is either specific to a particular type of bus or provides a driver environment. Not all CDIOs need be present in a kernel. The kernel contains only those CDIOs that are required by the particular hardware for which it was built. Thus, an S700 system would not have the SIO CDIO because that hardware doesn't exist in S700 systems, and an older S800 system such as the G-class wouldn't have the WSIO CDIO for the same reason.

Each CDIO must provide a number of interface functions that allow the GIO to communicate with it. These functions are called by the GIO to initialize the CDIO and also as part of the device discovery process. The following are the functions used for CDIO initialization:

install(): The first call made to a CDIO. This function is called while the system is still in real mode, before virtual memory translations have been turned on. Generally it just registers the CDIO with the GIO.

module_init(): This function too is called in real mode. This is a second place for CDIOs to perform tasks that must be done before virtual memory translations are started.

init_begin(): This is the first function in the CDIO that is called in virtual mode. This allows the CDIO to perform any configuration that needs to be done with virtual translations on.

install_drv(): This function is used only by the WSIO CDIO. It allows the CDIO to set up structures that are needed before drivers are installed.

init_middle(): This function is called after all drivers have been installed. The CDIO can expect that no more driver installation will take place until after the system is up and running.

init_end(): This is the final configuration call made to the CDIO.
CDIOs can do any post-configuration cleanup in this routine.

Once the CDIO is initialized, there are three functions used as part of the device discovery process. These functions allow for the configuration of drivers within the CDIO:

scan(): This function is called by the GIO to allow the CDIO to claim I/O modules. The GIO passes an ionode, which represents a particular hardware path, to the CDIO using this call. The CDIO probes the hardware modules connected to that hardware path, looking for any modules that it wishes to claim. If it finds hardware to claim, it creates a node in the iotree for the module.

config(): This function is called by the GIO to tell the CDIO to configure the driver associated with a node.

unconfig(): This function is called by the GIO to tell the CDIO to unconfigure a driver. This may be because a particular hardware module is no longer available, or it may be in preparation for unloading a dynamically loadable driver.

The names of these routines are typically preceded by the name of the CDIO, so the config routine for WSIO is wsio_config(), for SIO it's sio_config(), for CORE it's core_config(), and so on.

Of course, the remaining functionality in the CDIO is unique to each CDIO. For example, the SIO CDIO provides a routine called io_send(), which is used by all SIO drivers to send messages to other parts of the SIO subsystem. The WSIO CDIO provides DMA services that are used by all WSIO drivers to simplify doing DMA transfers. Each CDIO provides a set of functionality suited to the particular drivers that run under that environment.

Central Bus CDIO

The CB-CDIO is a special case of a CDIO. It is installed like any other CDIO and interfaces with GIO in the same way. But its job is to manage the central bus of the system rather than a particular I/O environment or set of drivers. Every system has the CB-CDIO compiled into it.
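Since the CB-CDIO plugs into the GIO through the same entry points as every other CDIO, it may help to picture that common interface as a table of function pointers, one table per CDIO. The sketch below covers only the six initialization calls; discovery entry points such as scan(), config(), and unconfig() could be added as further members. The struct, the phase-by-phase calling order across CDIOs, and the stub routines are assumptions made for illustration, with stub names that simply follow the prefix convention described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*cdio_hook)(void);

/* Hypothetical per-CDIO table of initialization entry points. */
struct cdio_ops {
    const char *name;
    cdio_hook install;      /* real mode: register with the GIO */
    cdio_hook module_init;  /* real mode: remaining early setup */
    cdio_hook init_begin;   /* first call in virtual mode */
    cdio_hook install_drv;  /* WSIO only; NULL for other CDIOs */
    cdio_hook init_middle;  /* after all drivers are installed */
    cdio_hook init_end;     /* final configuration call */
};

enum phase { PH_INSTALL, PH_MODULE_INIT, PH_INIT_BEGIN,
             PH_INSTALL_DRV, PH_INIT_MIDDLE, PH_INIT_END, PH_COUNT };

/* Record of the calls made, so the order can be inspected. */
static char trace_buf[256];

static void trace(const char *name)
{
    if (trace_buf[0] != '\0')
        strncat(trace_buf, " ", sizeof trace_buf - strlen(trace_buf) - 1);
    strncat(trace_buf, name, sizeof trace_buf - strlen(trace_buf) - 1);
}

/* Each stub only records its own name. */
#define STUB(fn) static void fn(void) { trace(#fn); }
STUB(wsio_install) STUB(wsio_module_init) STUB(wsio_init_begin)
STUB(wsio_install_drv) STUB(wsio_init_middle) STUB(wsio_init_end)
STUB(sio_install) STUB(sio_module_init) STUB(sio_init_begin)
STUB(sio_init_middle) STUB(sio_init_end)

static const struct cdio_ops demo_cdios[] = {
    { .name = "wsio", .install = wsio_install,
      .module_init = wsio_module_init, .init_begin = wsio_init_begin,
      .install_drv = wsio_install_drv, .init_middle = wsio_init_middle,
      .init_end = wsio_init_end },
    { .name = "sio", .install = sio_install,
      .module_init = sio_module_init, .init_begin = sio_init_begin,
      .init_middle = sio_init_middle, .init_end = sio_init_end },
};

/* Select the hook a CDIO provides for a given phase. */
static cdio_hook hook_for(const struct cdio_ops *c, enum phase p)
{
    switch (p) {
    case PH_INSTALL:     return c->install;
    case PH_MODULE_INIT: return c->module_init;
    case PH_INIT_BEGIN:  return c->init_begin;
    case PH_INSTALL_DRV: return c->install_drv;
    case PH_INIT_MIDDLE: return c->init_middle;
    case PH_INIT_END:    return c->init_end;
    default:             return NULL;
    }
}

/* Assumed ordering: run each phase across all CDIOs before the next. */
void run_cdio_init(const struct cdio_ops *cdios, size_t count)
{
    assert(cdios != NULL);
    for (int p = 0; p < PH_COUNT; p++) {
        for (size_t i = 0; i < count; i++) {
            cdio_hook hook = hook_for(&cdios[i], (enum phase)p);
            if (hook != NULL)
                hook();   /* skipped when the CDIO has no such hook */
        }
    }
}

/* Run the sequence for the two demo CDIOs and return the call trace. */
const char *demo_init_trace(void)
{
    trace_buf[0] = '\0';
    run_cdio_init(demo_cdios, sizeof demo_cdios / sizeof demo_cdios[0]);
    return trace_buf;
}
```

Leaving install_drv as NULL in the SIO table mirrors the statement that only the WSIO CDIO uses that call, and a table-driven layout like this is one way a framework could add a new CDIO without changing the code that drives initialization.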
Up through release 11.0 of HP-UX, much of this functionality was provided by a CDIO called the Precision Architecture CDIO (PA-CDIO). This CDIO supported the PA-RISC-specific portions of the kernel. When Hewlett-Packard began looking at porting the HP-UX system to other platforms, it became apparent that a more generic interface was needed to support the platform hardware. So, with HP-UX 11.0 release 99OP, Hewlett-Packard removed the PA-CDIO and replaced it with the CB-CDIO.

One of the key features of the CB-CDIO is the support for PSMs. A PSM is a piece of code that installs into the CB-CDIO just as a device driver installs into other CDIOs. But rather than supporting a particular type of device, a PSM supports a particular platform's specific functionality. A PSM can support a particular function, such as the olar_psm, which supports OLAR, or it can support particular system hardware, such as the sapic_psm, which supports a particular programmable interrupt controller (PIC). Providing PSMs as modules that can be loaded as if they were device drivers allows the kernel to include only those PSMs needed on a particular platform. In fact, it is possible for PSMs to be dynamically loaded after boot time, just as a driver is, to enable new platform functionality.

The CB-CDIO provides a wide range of functionality to the kernel:

GIO Interface: The CB-CDIO has to follow all the rules of any CDIO, so it must implement the standard interface to the GIO.

PIK Interface: The platform-independent kernel (PIK) interface contains routines that provide services to kernel functions that are the same across all platforms.

PDK Interface: The platform-dependent kernel (PDK) interface provides services to those parts of the kernel that are unique to a particular platform.

PSI Switch Table: The platform support interface (PSI) switch table provides the access to the underlying PSM.

Another important feature provided by the CB-CDIO is I/O forwarding.
I/O forwarding is the process of having I/O requests issued by the processor that will receive the interrupt when the I/O completes. Remember that each I/O module is told at boot time what interrupt it should generate and which processor it should interrupt. In this sense, each module is "owned" by a particular processor. Ideally, we want the I/O to complete on the same processor on which it started. This improves performance because the data associated with the I/O is more likely to be in that processor's cache.

The CB-CDIO manages this by maintaining an I/O queue for each processor, called mp_io_queue. When a driver is ready to queue an I/O request to be started, it calls ioforw_sched() to have the I/O placed on the queue of the correct processor. CB-CDIO maintains a map, called io_forw_map, which maps devices to the processors that handle the interrupts for those devices. ioforw_sched() places the I/O onto the mp_io_queue for the correct processor, and that processor then starts the I/O. This way, when the interrupt comes back from the device at I/O completion, that processor is likely to have the relevant data in its cache.
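The forwarding scheme just described can be sketched as a device-to-processor map plus one queue per processor. The code below is an illustration under stated assumptions, not the kernel's implementation: the demo_ names only echo io_forw_map, mp_io_queue, and ioforw_sched() from the text, and the data layouts, signatures, sizes, and fixed-size ring buffer are invented for the example. It also omits the locking that a real multiprocessor kernel would need.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NCPU   4    /* assumed processor count */
#define DEMO_NDEV   8    /* assumed device count */
#define DEMO_QDEPTH 16   /* assumed per-processor queue depth */

/* Hypothetical I/O request: just a target device and a tag. */
struct demo_io_request {
    int dev;
    int tag;
};

/* Fixed-size ring buffer of pending requests for one processor. */
struct demo_cpu_queue {
    struct demo_io_request *items[DEMO_QDEPTH];
    size_t head;
    size_t count;
};

/* Device index -> processor that receives that device's interrupts. */
static int demo_io_forw_map[DEMO_NDEV];

/* One queue of I/O waiting to be started on each processor. */
static struct demo_cpu_queue demo_mp_io_queue[DEMO_NCPU];

/* Record which processor "owns" a device (done at boot in the text). */
int demo_bind_device(int dev, int cpu)
{
    if (dev < 0 || dev >= DEMO_NDEV || cpu < 0 || cpu >= DEMO_NCPU)
        return -1;
    demo_io_forw_map[dev] = cpu;
    return 0;
}

/* Queue a request on the owning processor; return that processor or -1. */
int demo_ioforw_sched(struct demo_io_request *req)
{
    assert(req != NULL);
    if (req->dev < 0 || req->dev >= DEMO_NDEV)
        return -1;

    int cpu = demo_io_forw_map[req->dev];
    struct demo_cpu_queue *q = &demo_mp_io_queue[cpu];
    if (q->count == DEMO_QDEPTH)
        return -1;                       /* queue full */

    q->items[(q->head + q->count) % DEMO_QDEPTH] = req;
    q->count++;
    return cpu;
}

/* Called on a processor to fetch the next I/O it should start. */
struct demo_io_request *demo_cpu_next_io(int cpu)
{
    if (cpu < 0 || cpu >= DEMO_NCPU)
        return NULL;

    struct demo_cpu_queue *q = &demo_mp_io_queue[cpu];
    if (q->count == 0)
        return NULL;                     /* nothing pending */

    struct demo_io_request *req = q->items[q->head];
    q->head = (q->head + 1) % DEMO_QDEPTH;
    q->count--;
    return req;
}
```

If device 3 is bound to processor 2, a request for device 3 is queued for processor 2 no matter which processor calls demo_ioforw_sched(), so the processor that starts the I/O is the same one that will later take the completion interrupt.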