4.1 Implementations of Physical Memory

Let's start by looking at how memory is physically implemented in modern systems. All modern, fast memory implementations are built from semiconductors, ^[2] of which there are two major types: dynamic random access memory (DRAM) and static random access memory (SRAM). The difference between them is how each memory cell is designed. Dynamic cells are charge-based: each bit is represented by a charge stored in a tiny capacitor. The charge leaks away in a short period of time, so the memory must be continually refreshed to prevent data loss. The act of reading a bit also drains the capacitor, so it's not possible to read that bit again until it has been refreshed. Static cells, however, are based on gates, and each bit is stored in four or six connected transistors. SRAM retains data as long as it has power; refreshing is not required. In general, DRAM is substantially cheaper and offers the highest densities of cells per chip; it is smaller, less power-intensive, and runs cooler. However, SRAM is as much as an order of magnitude faster, and is therefore used in high-performance environments. Interestingly, the Cray-1S supercomputer had a main memory constructed entirely from SRAM. The heat generated by the memory subsystem was the primary reason that system was liquid-cooled.

^[2] In some cases, such as where resistance to ionizing radiation is required, magnetic core memory is still used.

There are two primary performance specifications for memory. The first represents the amount of time required to read or write a given location in memory, and is called the memory access time. The second, the memory cycle time, describes how frequently you can repeat a memory reference. They sound identical, but they are often quite different due to phenomena such as the need to refresh DRAM cells.

There is quite a gap between the speed of memory and the speed of microprocessors. In the early 1980s, the access time of commonly available DRAM was about 200 ns, which was shorter than the clock cycle of the commonly used 4.77 MHz (210 ns) microprocessors of the day. Fast-forwarding two decades, the clock cycle time of the average home microprocessor is down to about a nanosecond (1 GHz), but memory access times are hovering around 50 ns.
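To make the size of that gap concrete, here is a minimal sketch of the arithmetic, using only the figures quoted above. The function names are my own, chosen for illustration; the calculation simply converts a clock rate to a cycle time and divides the memory access time by it.

```python
def cycle_time_ns(clock_hz):
    """Return the length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def cycles_per_access(access_ns, clock_hz):
    """Return how many CPU clock cycles elapse during one memory access."""
    return access_ns / cycle_time_ns(clock_hz)


# Early 1980s: 200 ns DRAM behind a 4.77 MHz processor.
early_1980s = cycles_per_access(200, 4.77e6)

# Two decades later: 50 ns DRAM behind a 1 GHz processor.
two_decades_on = cycles_per_access(50, 1e9)

print(f"4.77 MHz cycle time: {cycle_time_ns(4.77e6):.0f} ns")
print(f"early 1980s: {early_1980s:.2f} cycles per access")
print(f"two decades on: {two_decades_on:.0f} cycles per access")
```

In the first case a memory access completes in slightly less than one processor cycle; in the second, the processor could have gone through roughly fifty cycles while it waited.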

Many different kinds of memory modules have been developed to improve system performance. I'll present a few here.

The oldest option currently in use is fast-page mode (FPM) memory. It implements the ability to read a full page (4 KB or 8 KB) of data during a single memory access cycle.
Extended data output (EDO) memory is a refinement of this technique. The refinements are largely based on electrical modifications.
A more revolutionary change was implemented in synchronous DRAM (SDRAM) memory, which uses a clock to synchronize the input and output of signals. This clock is coordinated with the CPU clock, so the timings of all the components are synchronized. SDRAM also implements two memory banks on each module, which essentially doubles the memory throughput; it also allows multiple memory requests to be pending at once. A variation on SDRAM, called double-data rate SDRAM (DDR SDRAM), is able to read data on both the rising and falling edges of the clock, which doubles the data rate of the memory chip.

SDRAM modules are usually described as PC66, PC100, or PC133, which refers to the clock rate at which they are driven: 66 MHz (15 ns), 100 MHz (10 ns), or 133 MHz (7.5 ns), respectively. DDR SDRAM modules, however, are usually referred to by their peak throughput; a PC2100 module, for example, is theoretically capable of about 2.1 GB/sec peak throughput.
Direct Rambus (RDRAM) memory, however, takes an entirely different approach. It uses a narrow (16 bits wide) but extremely fast (600-800 MHz) path to memory, and allows sophisticated pipelining of operations. It is widely used in high-performance embedded applications (e.g., Sony's PlayStation 2 and the Nintendo 64 consoles). RDRAM modules are usually called PC600, PC700, or PC800, which refers to their clock rate.
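The peak throughput figures behind these names can be worked out from the clock rate and the width of the data path. The sketch below rests on two assumptions that the text above does not state: that SDRAM and DDR SDRAM modules present a 64-bit data path, and that the RDRAM figure of 800 MHz can be treated as an effective transfer rate. The function name is mine, for illustration only.

```python
def peak_mb_per_sec(clock_mhz, bus_bits, transfers_per_clock=1):
    """Theoretical peak throughput in MB/sec (1 MB = 10**6 bytes).

    clock_mhz           -- clock rate driving the module, in MHz
    bus_bits            -- width of the data path in bits
    transfers_per_clock -- 1 for SDRAM, 2 for DDR (both clock edges)
    """
    return clock_mhz * transfers_per_clock * bus_bits / 8


# Assumption: a 64-bit data path for SDRAM and DDR SDRAM modules.
pc100 = peak_mb_per_sec(100, 64)
pc133 = peak_mb_per_sec(133, 64)
pc2100 = peak_mb_per_sec(133, 64, transfers_per_clock=2)

# RDRAM: 16-bit path; 800 MHz treated as the effective transfer rate.
pc800 = peak_mb_per_sec(800, 16)

for name, rate in [("PC100", pc100), ("PC133", pc133),
                   ("PC2100", pc2100), ("PC800", pc800)]:
    print(f"{name}: {rate:.0f} MB/sec peak")
```

Under these assumptions the DDR module at 133 MHz comes out at 2,128 MB/sec, which is where the "2100" in PC2100 comes from. These are theoretical peaks; sustained throughput is lower.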

Workstations and servers tend to be able to interleave memory across banks, which is conceptually similar to disk striping (see Section 6.2.1); if we have 128 MB of memory configured as four 32 MB modules, we store one bit on each module in turn, rather than the first 32 MB on the first module, the second 32 MB on the second module, etc. This allows much higher performance: consecutive references go to different modules, so one module can recover while the next is being accessed, which avoids cycle time delays. Interleaving is almost always done in power-of-two increments. ^[3]

^[3] One exception is the Sun Ultra Enterprise hardware line, which will create 6-way interleaves.
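The difference between interleaved and linear placement comes down to how an address is mapped to a module. The following is a minimal sketch of the two mappings, not a description of any particular memory controller. The function names and the `unit` parameter (the interleave granularity) are hypothetical; with `unit=1`, consecutive addresses rotate through the modules one at a time, as in the description above.

```python
def interleaved_location(address, modules, unit=1):
    """Map an address to (module, offset within module) under interleaving.

    unit is the hypothetical interleave granularity, expressed in the
    same units as address.
    """
    block, within = divmod(address, unit)
    module = block % modules                    # rotate through the modules
    offset = (block // modules) * unit + within
    return module, offset


def linear_location(address, module_size):
    """Map an address to (module, offset) when modules are filled in order."""
    return divmod(address, module_size)


MODULE_SIZE = 32 * 2**20  # four 32 MB modules, as in the text

# Eight consecutive addresses: interleaving spreads them over all four
# modules, while linear placement keeps them all on module 0.
interleaved = [interleaved_location(a, 4)[0] for a in range(8)]
linear = [linear_location(a, MODULE_SIZE)[0] for a in range(8)]
print("interleaved:", interleaved)
print("linear:     ", linear)
```

With interleaving, a sequential scan returns to any given module only on every fourth reference, which gives that module time to complete its cycle before it is needed again.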

Interleaving can also lead to confusion. Let's say we have a system with seven memory banks filled. The memory controller may well decide to make a four-way interleave, a two-way interleave, and a one-way interleave, which would mean that a memory-intensive process's performance would depend on where exactly it fell in memory. The best resource on how to configure your system's memory for optimum interleaving is the hardware vendor.
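One way to picture how seven banks end up as a four-way, a two-way, and a one-way interleave is a greedy split into power-of-two groups. The sketch below is purely illustrative; it is an assumption about how such a split could be made, not the algorithm of any real memory controller, and it does not model exceptions such as the 6-way interleaves mentioned in the footnote.

```python
def interleave_groups(banks):
    """Split a bank count into power-of-two interleave groups, largest first.

    Hypothetical greedy strategy: repeatedly take the largest power of
    two that still fits in the remaining banks.
    """
    groups = []
    size = 1
    while size * 2 <= banks:   # largest power of two not exceeding banks
        size *= 2
    while banks > 0:
        if size <= banks:
            groups.append(size)
            banks -= size
        size //= 2
    return groups


for filled in (4, 6, 7, 8):
    print(f"{filled} banks -> interleave groups {interleave_groups(filled)}")
```

A system with four or eight banks filled gets a single uniform interleave, whereas seven banks are split into three regions with different performance characteristics, which is the situation described above.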