I/O Components That Affect Performance

The various I/O subsystems have to communicate with each other as efficiently and effectively as possible. In smaller systems, an I/O bus serves as the shared communication link among the I/O subsystems, while a memory bus links the CPUs to the memory subsystem. The two major advantages of a bus-based architecture are low cost and versatility. By defining a single interconnect, new devices can easily be added or even moved among computer systems that support a common I/O bus architecture.

The main drawback of a bus-based system is that the bus may create a communication bottleneck, possibly limiting the maximum I/O throughput. Larger systems bypass this potential bottleneck by incorporating multiple I/O buses. A refinement of a memory bus architecture that is designed to eliminate the potential bottleneck of the single shared path is known as a crossbar. A crossbar is a series of single buses arranged to provide multiple paths among the CPUs and the memory subsystem. The use of multiple buses in a two- or three-dimensional network implies that each CPU can have a unique access path to any part of the memory subsystem. This vastly reduces the potential for any bandwidth contention.
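To make the contrast concrete, the following C sketch models one cycle of arbitration: on a single shared bus only one CPU-to-memory transfer can proceed at a time, whereas a crossbar lets every CPU proceed in parallel as long as no two CPUs target the same memory bank. The CPU and bank counts and the access pattern are illustrative assumptions, not a description of real hardware.

/*
 * Sketch: one cycle on a crossbar. Each CPU has its own path to any
 * memory bank, so transfers only conflict when two CPUs want the same
 * bank. A single shared bus would allow exactly one transfer per cycle.
 */
#include <stdio.h>

#define NUM_CPUS  4
#define NUM_BANKS 4

int main(void)
{
    /* Bank each CPU wants to access this cycle (illustrative pattern). */
    int target_bank[NUM_CPUS] = { 0, 1, 1, 3 };
    int bank_busy[NUM_BANKS]  = { 0 };
    int granted = 0;

    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        int bank = target_bank[cpu];
        if (!bank_busy[bank]) {         /* unique CPU-to-bank path is free */
            bank_busy[bank] = 1;
            granted++;
            printf("CPU %d granted path to bank %d\n", cpu, bank);
        } else {
            printf("CPU %d stalls: bank %d conflict\n", cpu, bank);
        }
    }

    printf("Crossbar: %d transfers this cycle; shared bus: 1\n", granted);
    return 0;
}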

Two main factors heavily affect overall system performance: the number of bus masters and the bus clocking mechanism, specifically whether the bus is clocked synchronously or asynchronously.

Bus masters are devices that can initiate a read() or write() request. A CPU is always considered a bus master. If multiple CPUs are configured, an arbitration scheme is required to decide which bus master gets the bus next. In the case of multiple bus masters, a bus usually offers higher throughput when incorporating a split transaction technology, also referred to as a packet-switched bus technology. As an example, in a split transaction paradigm, a read() request is decomposed into a read() request transaction that contains the address and a memory reply transaction that holds the actual data. Each transaction has to be tagged so that the CPU and the memory subsystem can track the transaction.
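The following C sketch illustrates the tagging idea: a read() is split into a request transaction that carries the address and a tag, and a later reply transaction that carries the data under the same tag, letting the CPU match the reply to its outstanding request while the bus is free for other masters in between. The struct and field names are illustrative assumptions, not a real bus interface.

/*
 * Sketch: a tagged split-transaction read. The request and the reply are
 * separate bus transactions matched by tag, so the bus is not held while
 * the memory subsystem services the request.
 */
#include <stdio.h>

struct bus_txn {
    int      tag;        /* identifies the outstanding request  */
    unsigned addr;       /* valid in the request transaction    */
    unsigned data;       /* valid in the reply transaction      */
};

int main(void)
{
    /* CPU issues a tagged read request; the bus is then released. */
    struct bus_txn request = { .tag = 7, .addr = 0x1000 };
    printf("request: tag=%d addr=0x%x (bus released)\n",
           request.tag, request.addr);

    /* ... other bus masters may use the bus here ... */

    /* Memory later posts a reply carrying the same tag. */
    struct bus_txn reply = { .tag = 7, .data = 0xdeadbeef };
    if (reply.tag == request.tag)
        printf("reply:   tag=%d data=0x%x matched to request\n",
               reply.tag, reply.data);

    return 0;
}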

In a split transaction design, the bus is made available to other bus masters while the memory subsystem services the request. A split transaction bus normally provides a higher throughput but also incurs a higher latency than a bus that is held throughout a transaction, referred to as a circuit-switched bus. Bus clocking implementation varies, depending on whether a bus is synchronous or asynchronous. In the synchronous case, the bus includes a clock in the control lines and utilizes a fixed protocol for addresses and data. Because little or no logic is needed to decide what to do next, synchronous buses are fast and inexpensive.

The two major disadvantages of a synchronous bus are that every device on the bus has to run at the same clock rate and that, because of clock skew, the bus cannot be long. An asynchronous bus is not clocked. Instead, self-timed handshaking protocols are used between a bus sender and a bus receiver. An asynchronous design easily accommodates a wide variety of devices and allows the bus to be lengthened without encountering clock skew or synchronization issues.
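The following C sketch models a self-timed, four-phase handshake of the kind an asynchronous bus uses: the sender drives the data and asserts a request line, the receiver latches the data and asserts an acknowledge line, and both lines then return to idle. The variables stand in for physical control and data lines; the names are illustrative assumptions, not a real bus specification.

/*
 * Sketch: a four-phase req/ack handshake between a sender and a receiver
 * on an asynchronous bus. No shared clock is involved; each side reacts
 * to the other's control line.
 */
#include <stdio.h>

static int req, ack;           /* control lines */
static unsigned data_line;     /* data lines    */

static void sender(unsigned value)
{
    data_line = value;         /* drive data                  */
    req = 1;                   /* 1. assert req               */
}

static void receiver(void)
{
    if (req && !ack) {
        printf("receiver latched 0x%x\n", data_line);
        ack = 1;               /* 2. assert ack               */
    }
}

static void complete(void)
{
    if (ack)  req = 0;         /* 3. sender drops req         */
    if (!req) ack = 0;         /* 4. receiver drops ack: idle */
}

int main(void)
{
    sender(0xabcd);
    receiver();
    complete();
    printf("handshake complete: req=%d ack=%d\n", req, ack);
    return 0;
}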
