2.2 Single CPU Computer

Pull apart any computer and you will find some common hardware components. There is a Central Processing Unit (CPU), which is responsible for executing instructions. It is typically considered to be the brains of the computer.

There is also some physical memory. Memory holds the code the CPU is currently executing and the data it is currently accessing. The CPU can execute only code that resides in memory. If the code is still out on disk, the system will complain (raise an exception), and a page fault takes place. By the time the page fault has been handled, the on-disk page containing the faulting instruction has been brought into memory and placed in a page frame (in physical memory). The instruction that caused the fault is then re-executed.

There will also be some I/O capability. This can vary wildly from machine to machine but usually consists of disks, tapes, printers, network interfaces, and other hardware items.

Figure 2-2 depicts a single CPU computer with some physical memory and I/O capability.

Figure 2-2: Computer Hardware Components

The hardware components listed above say nothing about most of the features discussed in this chapter, because Tru64 UNIX and TruCluster Server are primarily software mechanisms. The software charged with managing the hardware is the operating system (Tru64 UNIX in our case). If you are wondering why we're taking you through a review of basic computer concepts, be patient. Our goal is to raise the issues and problems involved in implementing a cluster and to show how the TruCluster Server software and components solve these problems.

The first step is to consider a standalone system. It allows no access to its resources other than by software running on the system itself. The resources in this case may be memory, disks, other I/O components, or even the CPU itself. The coordination and synchronization necessary to keep the various components of the system from stepping on each other (or causing corruption of some sort) are achieved through the use of processing modes and System Priority Levels (SPLs).

Note

SPLs are sometimes referred to as Interrupt Priority Levels (IPLs).

In essence, the CPU is always running at one of eight SPLs (0 through 7), with higher SPLs taking precedence over lower ones.

The system will also be in one of two possible processing modes: user mode or kernel mode. Most of the standard processing in the machine will take place at SPL 0 and in user mode. If a more important event needs to take precedence over the current processing, an interrupt is issued, and an Interrupt Service Routine (ISR) executes in kernel mode.

You may be thinking that this is related to process priorities (or thread priorities). Nope. All of the process/thread activities (with certain kernel mode exceptions) take place at SPL 0 and in user mode. Even the payroll program that generates your paycheck executes at SPL 0. So what exactly is more important than the program generating your paycheck? Bear in mind that the system sees the payroll program as just another user-mode program. The more important processing is the software that must execute in response to an interrupt, along with other kernel mode processing that may need access to system support code and data such as device drivers and kernel routines.

The ISR must run in such a way that it will not be interrupted by subsequent interrupts at the same level or below. This is achieved by setting bits in a CPU register (the Processor Status, or PS, register) to indicate the SPL at which the processor is currently running. Another bit in the PS register indicates kernel or user mode. Note carefully that the SPL is a per-CPU phenomenon, since every CPU has its own PS register.

For the purposes of our discussion, interrupts are generated by hardware devices to indicate that an event has taken place (I/O completion, for example). Each device can therefore be assigned an SPL at which its ISR runs. The ISR of a device that interrupts at a higher SPL takes precedence over the ISR of a device that interrupts at a lower SPL. When all of the ISRs have finished processing, the system lowers the SPL to the level it occupied before the interrupt occurred.

If you're wondering how this pertains to a clustered environment, keep in mind that a cluster includes multiple systems with many I/O devices. Each system has one or more CPUs, and each CPU has its own PS register. Therefore, at any one instant, each CPU in a cluster may be running at a different SPL from the other CPUs, whether they belong to the same member or to other cluster members. The point is that other synchronization mechanisms must be available, both to processes and to the systems themselves, in order to maintain a level of sanity and synchronization in a cluster. The next few sections introduce a number of system mechanisms that handle synchronization and communication as the complexity of the system approaches full-blown clusterhood.