4.3. Improving Memory Performance


Although processor speeds have increased dramatically, memory speeds continue to lag behind. One way that engineers try to improve memory performance is to shorten memory latency. Memory latency is the length of time it takes for a DRAM module to return data to the memory controller. Two main factors affect memory latency: access time and cycle time.

Access time is the amount of time it takes for data to appear on the data bus after the row is activated by RAS.

At the beginning of a read operation, the memory controller lowers the RAS voltage, which latches the row address into the row address latch. The delay from when RAS falls until the data is available is referred to as tRAC. The longer tRAC is, the longer the access time will be.

Later in the read operation, the memory controller lowers the CAS voltage, which latches the column address into the column address latch. The delay from when CAS falls until the data is available is referred to as tCAC. As with tRAC, the longer tCAC is, the longer the access time will be.

Cycle time is the minimum amount of time between the start of one read operation and the start of the next.

At the end of a read operation, the memory controller increases the voltage on the RAS and CAS lines to indicate the end of the operation. The time it takes to increase the voltage on both lines to a predetermined high level is called the precharge delay. The next operation cannot occur until after the voltage on both lines is high enough. The longer the precharge delay, the longer the cycle time.
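To make these relationships concrete, the following sketch models the two quantities with hypothetical timing values (the numbers are illustrative, not taken from any real DRAM datasheet):

    # Illustrative model of DRAM access time vs. cycle time.
    # All nanosecond values below are hypothetical.
    T_RAC = 60        # ns: delay from RAS falling until data is available
    T_PRECHARGE = 35  # ns: delay to raise RAS and CAS back to the high level

    access_time = T_RAC                     # how long a read takes to return data
    cycle_time = access_time + T_PRECHARGE  # next read must wait for the precharge

    print(f"Access time: {access_time} ns")  # 60 ns
    print(f"Cycle time:  {cycle_time} ns")   # 95 ns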

4.3.1 CAS Latency

CAS latency is the ratio of the column access time to the clock cycle time, rounded up to the next whole number. The lower the CAS latency, the less time is required for the first memory access in a data burst transfer.
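As a rough sketch, CAS latency can be computed from a module's column access time and the bus clock; the timing values below are hypothetical:

    import math

    def cas_latency(t_cac_ns: float, bus_mhz: float) -> int:
        """Column access time divided by the clock period, rounded up."""
        clock_period_ns = 1000.0 / bus_mhz  # a 100MHz clock has a 10 ns period
        return math.ceil(t_cac_ns / clock_period_ns)

    # A hypothetical 25 ns column access time on a 100MHz bus:
    print(cas_latency(25, 100))  # 25 / 10 = 2.5, rounded up to CL3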

4.3.2 Fast Page Mode

When a processor requests data, its next request is usually for data in the same row, or page. Therefore, to speed data to the processor, a technique called fast page mode (FPM) was developed. In FPM, when a processor makes a request for data, the memory controller retrieves the requested data plus an additional three columns of data from each memory chip.

The memory controller then assembles the data into four 1-byte chunks. Together these chunks are called a word. The memory controller sends, or bursts, the word to the cache, which stores it in a cache line. The cache then sends the requested byte of data to the processor.

If the next processor request is for data from the same page, the request can be filled quickly from the cache line. This is known as a page hit. If the requested data is not in the same page, the data is retrieved from memory. This is known as a page miss. A page miss can more than double the number of cycles it takes to retrieve the data.

FPM decreases latency by eliminating the RAS cycle time for the last three columns of data.
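The cost of page hits versus page misses can be sketched with a toy timing model; the cycle counts below are invented for illustration:

    # Toy timing model of fast page mode; cycle counts are hypothetical.
    CYCLES_PAGE_HIT = 3    # CAS-only access within the already-open page
    CYCLES_PAGE_MISS = 8   # precharge + new RAS + CAS for a different page

    def fpm_read_cost(addresses, cols_per_page=1024):
        """Total cycles to service a sequence of column addresses."""
        open_page = None
        total = 0
        for addr in addresses:
            page = addr // cols_per_page
            total += CYCLES_PAGE_HIT if page == open_page else CYCLES_PAGE_MISS
            open_page = page
        return total

    # Sequential reads mostly hit the open page; scattered reads mostly miss.
    print(fpm_read_cost(range(16)))                   # 1 miss + 15 hits = 53 cycles
    print(fpm_read_cost(range(0, 16 * 4096, 4096)))   # 16 misses = 128 cycles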

4.3.3 Extended Data Out DRAM

A drawback of FPM was that the data from one column had to be taken off the Data Out pins before the next column could be activated. Memory designers overcame this limitation in 1996 by introducing Extended Data Out (EDO) DRAM.

EDO decreases latency by activating the next column while data is still on the Data Out pins. This change to DRAM timing results in a 20% to 30% decrease in the amount of time it takes to get data from the memory module to the memory controller.
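The overlap can be illustrated with a simple model in which FPM must hold each column's data on the Data Out pins before starting the next column, while EDO overlaps the two; all timing values are hypothetical:

    T_CAC = 15   # ns: hypothetical column access time
    T_DOUT = 10  # ns: hypothetical time data must stay on the Data Out pins

    def fpm_burst_ns(columns: int) -> int:
        # FPM: the next column cannot be activated until Data Out is released.
        return columns * (T_CAC + T_DOUT)

    def edo_burst_ns(columns: int) -> int:
        # EDO: the next column access overlaps the previous data's output
        # time, so only the final output hold is exposed.
        return columns * T_CAC + T_DOUT

    for cols in (4, 8):
        fpm, edo = fpm_burst_ns(cols), edo_burst_ns(cols)
        print(f"{cols} columns: FPM {fpm} ns, EDO {edo} ns "
              f"({100 * (fpm - edo) // fpm}% less)")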

4.3.4 Synchronous DRAM

FPM and EDO are both asynchronous technologies because they do not operate in step with the system clock. Instead, they rely on their own timing mechanisms to coordinate memory reads and writes with the rest of the system.

In 1997, synchronous DRAM (SDRAM) was introduced. SDRAM relies on the same clock used by the memory bus. This development eliminated the need for the special timing mechanisms.

SDRAM differs from asynchronous RAM in another significant way. SDRAM DIMMs contain multiple banks of chips. While the memory controller is receiving data from one bank, it can precharge a row in the other bank. This process reduces the amount of time that the controller has to wait for data to be available.
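A rough sketch of that overlap, using invented cycle counts:

    TRANSFER = 4   # cycles to burst data out of one bank (hypothetical)
    PRECHARGE = 3  # cycles to precharge a row (hypothetical)

    def single_bank_cycles(reads: int) -> int:
        # One bank: every read waits for the previous row's precharge.
        return reads * (TRANSFER + PRECHARGE)

    def two_bank_cycles(reads: int) -> int:
        # Two banks: while one bank transfers, the other precharges, so
        # only the first precharge is ever visible on the bus.
        return PRECHARGE + reads * TRANSFER

    print(single_bank_cycles(8))  # 56 cycles
    print(two_bank_cycles(8))     # 35 cycles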

4.3.5 Double Data Rate RAM

Double Data Rate (DDR) memory effectively doubles the number of data transfers per clock cycle by sending data on both the rising and falling edge of each clock signal. Engineers often call this transfer method a double-pumped bus. This technique is illustrated in Figure 4-9.

Figure 4-9. DDR RAM can transfer data on both the rising and falling edge of each clock cycle.


The net effect is that DDR RAM can send or receive twice as much data to or from the processor in one clock cycle. Therefore, although a memory bus might run at 100MHz or 133MHz, it can have an effective data transfer rate equal to that of a standard 200MHz or 266MHz bus, respectively.
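The arithmetic behind those effective rates is straightforward; a short sketch:

    # Effective transfer rate of a double-pumped (DDR) bus.
    def effective_rate(bus_mhz: float, transfers_per_clock: int = 2) -> float:
        return bus_mhz * transfers_per_clock  # millions of transfers per second

    for mhz in (100, 133):
        print(f"{mhz}MHz DDR bus -> {effective_rate(mhz):.0f} MT/s")
    # 100MHz -> 200 MT/s and 133MHz -> 266 MT/s, matching standard
    # 200MHz and 266MHz single-pumped buses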

A clock cycle is best represented as a wave, with a rising edge, a falling edge, and a plateau between the edges. SDRAM DIMMs transfer data only on the rising edge of the clock cycle.

DDR RAM was introduced in ProLiant ML530 G2 and DL580 G2 servers in 2002.

4.3.6 Comparing DDR Memory and SDRAM Memory Technologies

DDR memory architecture enhances performance by transferring data on both edges of the clock.

The naming convention for DDR RAM refers to the module's peak bandwidth, rather than to the bus clock rate used in naming PC100 and PC133 SDRAM.

DDR PC1600 SDRAM has the same data bus width as PC100 and PC133 SDRAM (64 bits plus ECC bits), but because it transfers data twice per clock cycle, PC1600 memory has twice the effective data transfer rate of PC100 DIMMs.

Although both PC133 SDRAM DIMMs and DDR PC2100 SDRAM DIMMs are used on a 133MHz memory bus, PC2100 SDRAM has twice the effective data transfer rate of PC133 DIMMs because of DDR technology on a double-pumped bus.
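The module names fall out of the peak-bandwidth calculation (64-bit bus, two transfers per clock); a short sketch:

    # DDR module names encode peak bandwidth in MB/s:
    # bus width (bytes) x bus clock (MHz) x 2 transfers per clock.
    BUS_BYTES = 8  # 64-bit data bus, not counting ECC bits

    def peak_bandwidth_mb_s(bus_mhz: float) -> float:
        return BUS_BYTES * bus_mhz * 2

    print(peak_bandwidth_mb_s(100))    # 1600.0 -> "PC1600"
    print(peak_bandwidth_mb_s(133.3))  # ~2133  -> marketed as "PC2100"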

In addition, CAS latency on PC1600 DIMMs is 2 or 2.5 clocks and on PC2100 DIMMs is 2.5 clocks, a half-clock faster than the 3-clock CAS latency of PC100 and PC133 DIMMs.

Physically, the DIMM types look different, as shown in Figure 4-10. DDR DIMMs are smaller, only 1.2 inches high, compared to 1.7 inches for PC133 and PC100 DIMMs. In addition, DDR DIMMs have a single notch in the gold connectors; PC100 and PC133 DIMMs have a double notch.

Figure 4-10. A comparison between a PC133 SDRAM DIMM and a PC2100 SDRAM DIMM.


4.3.7 Comparing DDR SDRAM to RDRAM

Most industry-standard servers have not implemented Rambus DRAM (RDRAM), which uses a narrower data bus running at a much faster clock speed than SDRAM. One reason is that no enterprise-class chipset is available for RDRAM memory systems.

Other reasons that SDRAM is implemented in servers instead of RDRAM include the following:

  • Because of architectural limitations, the maximum memory supported by RDRAM chipsets is much lower than that of chipsets designed for servers using SDRAM.

  • RDRAM DIMMs are much more expensive than comparably sized SDRAM DIMMs.

  • Latency in RDRAM is 14 to 18 clocks (compared to 2 or 2.5 clocks for PC1600 SDRAM).

  • RDRAM DIMMs consume more power and consequently produce more heat than SDRAM DIMMs.

4.3.8 Pumped Buses

DDR memory works on a double-pumped bus. A pumped bus sends data more than once per clock cycle by using more than one clock signal. The clocks are out of phase with each other, and data is sent at each point where their strobes intersect.

When four clocks are used, the bus sends data four times per clock cycle, because there are four intersections per cycle. This is known as a quad-pumped bus. Figure 4-11 illustrates the differences between buses.

Figure 4-11. Data transfer cycles for pumped buses.


4.3.8.1 HOW DOES THAT COMPUTE?

The maximum data transfer rate is calculated using the following formula:

    Maximum transfer rate (MB/s) = bus width (bytes) x bus speed (MHz) x transfers per clock

In a 64-bit, 100MHz system, data is sent across the system bus once every clock cycle:

    (64 bits / 8) x 100MHz x 1 = 800MB/s

In a quad-pumped Xeon system, data is sent across the system bus four times every clock cycle:

    (64 bits / 8) x 100MHz x 4 = 3200MB/s (3.2GB/s)

Therefore, a quad-pumped Xeon system has the same data transfer rate as a 400MHz single-pumped bus:

    (64 bits / 8) x 400MHz x 1 = 3200MB/s (3.2GB/s)
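The same calculation, written as a small reusable function:

    def transfer_rate_mb_s(bus_bits: int, bus_mhz: float, pumps: int) -> float:
        """Bus width in bytes x clock speed x data transfers per clock."""
        return (bus_bits / 8) * bus_mhz * pumps

    print(transfer_rate_mb_s(64, 100, 1))  #  800 MB/s: single-pumped 100MHz bus
    print(transfer_rate_mb_s(64, 100, 4))  # 3200 MB/s: quad-pumped Xeon bus
    print(transfer_rate_mb_s(64, 400, 1))  # 3200 MB/s: single-pumped 400MHz bus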
4.3.9 Interleaved Memory

Interleaved memory technology increases the amount of data obtained in a single memory access.

When data is written to memory, the memory controller distributes the data across DIMMs in a bank, as shown in Figure 4-12. When the processor sends a read request to the memory controller, the memory controller sends the request to all DIMMs in the bank simultaneously. The data at the requested address is returned along with data from subsequent sequential addresses. The memory controller interleaves the data from all the DIMMs to put it back in its original order.

Figure 4-12. Interleaved memory technology.


Because more than one DIMM is used in this transaction, the amount of data that can be written or read is larger than if a single DIMM were used. For example, in dual-interleaved memory, where two DIMMs are used, the processor can read and write twice the amount of data in one memory access. In four-way interleaved memory, the processor can read and write four times the amount of data in one memory access.
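A minimal sketch of two-way interleaving, showing how sequential words are spread across DIMMs on a write and put back in order on a read (the helper names are invented for illustration):

    # Two-way interleaving: consecutive words alternate between DIMMs,
    # so one access returns data from both modules at once.

    def write_interleaved(data, n_dimms=2):
        """Distribute consecutive words across DIMMs round-robin."""
        dimms = [[] for _ in range(n_dimms)]
        for i, word in enumerate(data):
            dimms[i % n_dimms].append(word)
        return dimms

    def read_interleaved(dimms, start, count):
        """Read sequential words back, re-interleaving into original order."""
        n = len(dimms)
        return [dimms[(start + i) % n][(start + i) // n] for i in range(count)]

    dimms = write_interleaved(list(range(16)))
    print(dimms)                          # [[0, 2, 4, ...], [1, 3, 5, ...]]
    print(read_interleaved(dimms, 0, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]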

The data is sent to the processor cache in anticipation of future data requirements. The processor uses the faster cache, which now contains much more sequential data than in a non-interleaved system, to fulfill subsequent requests until it cannot find the data it needs. At that point, it sends another request to the memory controller.

Using interleaved memory, the processor cache can meet the data requests from the processor more than 98% of the time, which provides two benefits. First, the bus between the processor and the cache runs at processor speed, whereas the bus between the cache and memory runs much slower; because interleaved memory delivers more sequential data per access, the processor fills its cache faster. Second, because most requests are filled from cache, fewer requests are sent to the memory controller. As a result, in a multiprocessor server there is less chance that a second processor will have to wait to access the memory controller.
