Memory Modules


The CPU and motherboard architecture (chipset) dictates a particular computer's physical memory capacity and the types and forms of memory that can be installed. Over the years, two main changes have occurred in computer memory: It has gradually become faster and wider. The CPU and the memory controller circuitry determine the speed and width requirements. In most cases, the server chipset's North Bridge contains the memory controller. For this reason, Intel chipsets based on hub architecture call this chip the memory controller hub chip. However, AMD Opteron processors incorporate the memory controller into the processor itself.

Even though a system might physically support a given amount of memory, the type of software you run could dictate whether all the memory can be used.

The first modern server-class processors (Pentium and Pentium-MMX) had 32 address lines, enabling them to use up to 4GB of memory; the Pentium Pro, Pentium II/III, Pentium 4, and Xeon, as well as the AMD Athlon family (including the Athlon 64), have 36 address lines and can manage an impressive 64GB. The Opteron uses 40-bit addressing, allowing up to 1TB of physical RAM. Itanium and Itanium 2 processors feature 44-bit addressing, which allows for up to 16TB (terabytes) of physical RAM!

See "Server Processor Specifications," p. 32.


In reality, the actual limit on today's server-class processors isn't the processor's memory-address capability. Instead, it's the cost of memory, in addition to the limitations of real-world server and memory design.

SIMMs, DIMMs, and RIMMs

Starting in the mid-1980s, motherboard designs using socketed or soldered individual memory chips (often referred to as dual inline package [DIP] chips) on systems began to be replaced by motherboards designed to use small circuit boards that had multiple memory chips soldered to them. The benefits of this approach included faster system assembly, increased reliability, and easier replacement of failed modules.

The first generation of memory modules used a design known as a single inline memory module (SIMM). For memory storage, most modern systems have adopted SIMMs, DIMMs, or RIMMs as an alternative to individual memory chips. These small boards plug in to special connectors on a motherboard or memory card. The individual memory chips are soldered to the module, so removing and replacing them is impossible. Instead, you must replace the entire module if any part of it fails. The module is treated as though it were one large memory chip.

One type of SIMM, three main types of DIMMs, and one type of RIMM have been used in servers. The various types are often described by their pin count, memory row width, or memory type.

SIMMs, for example, were available in two main physical types: 30-pin (8 bits plus an option for 1 additional parity bit) and 72-pin (32 bits plus an option for 4 additional parity bits), with various capacities and other specifications. Most servers used the 72-pin version. The 30-pin SIMMs were physically smaller than the 72-pin versions, and either version could have chips on one or both sides. SIMMs were widely used from the late 1980s to the late 1990s and have become obsolete.

DIMMs are available in three main types. DIMMs usually hold standard SDRAM, DDR SDRAM, or DDR2 SDRAM chips and are distinguished by different physical characteristics. A standard DIMM has 168 pins, one notch on either side, and two notches along the contact area. The notches enabled the module to be keyed for motherboards using SDRAM DIMMs or for the much rarer EDO DRAM DIMMs. A DDR DIMM, on the other hand, has 184 pins, two notches on each side, and only one offset notch along the contact area. A DDR2 DIMM has 240 pins, two notches on each side, and one notch in the center of the contact area. All DIMMs are either 64 bits (non-ECC/parity) or 72 bits (parity or ECC) wide (data paths). The main physical difference between SIMMs and DIMMs is that DIMMs have different signal pins on each side of the module. That is why they are called dual inline memory modules, and why with only 1 inch of additional length, they have many more pins than a SIMM.

Double-Sided Memory Modules

There is confusion among users and even in the industry regarding the terms single-sided and double-sided with respect to memory modules. In truth, the single- or double-sided designation actually has nothing to do with whether chips are physically located on one or both sides of the module, and it has nothing to do with whether the module is a SIMM or DIMM (meaning whether the connection pins are single- or double-inline). Instead, the terms single-sided and double-sided are used to indicate whether the module has one or two banks of memory chips installed. A double-banked DIMM module has two complete 64-bit-wide banks of chips logically stacked so that the module is twice as deep (that is, has twice as many 64-bit rows). In most (but not all) cases, this requires chips to be on both sides of the module; therefore, the term double-sided has often been used to indicate that a module has two banks, even though that is technically incorrect. Single-banked modules (incorrectly referred to as single-sided) can have chips physically mounted on both sides of the module, and double-banked modules (incorrectly referred to as double-sided) can have chips physically mounted on only one side. It's a good idea to use the terms single-banked and double-banked instead of single-sided and double-sided because they are much more accurate and easily understood. Note that some systems cannot use double-banked modules.


RIMMs also have different signal pins on each side. Three different physical types of RIMMs are available: a 16/18-bit version with 184 pins, a 32/36-bit version with 232 pins, and a 64/72-bit version with 326 pins. Each of these plugs in to the same sized connector, but the notches in the connectors and RIMMs are different, to prevent mismatches. A given board will accept only one type. By far the most common type is the 16/18-bit version. The 32-bit version was introduced in late 2002, and the 64-bit version was announced for 2004 but was never produced.

A standard 16/18-bit RIMM has 184 pins, one notch on either side, and two notches centrally located in the contact area. 16-bit versions are used for non-ECC applications, whereas the 18-bit versions incorporate the additional bits necessary for ECC. Servers using RIMMs normally use the 18-bit versions.

Figures 5.4 through 5.9 show a typical 30-pin (8-bit) SIMM (seldom used in servers, by the way), 72-pin (32-bit) SIMM, 168-pin SDRAM DIMM, 184-pin DDR SDRAM (64-bit) DIMM, 240-pin DDR2 DIMM, and 184-pin RIMM, respectively. The pins are numbered from left to right and are connected through to both sides of the module on the SIMMs. The pins on the DIMM are different on each side, but on a SIMM, each side is the same as the other and the connections carry through. Note that all dimensions are in both inches and millimeters (in parentheses), and modules are generally available in ECC versions with 1 extra ECC (or parity) bit for every 8 data bits (multiples of 9 in data width) or versions that do not include ECC support (multiples of 8 in data width).

Figure 5.4. A typical 30-pin SIMM.


Figure 5.5. A typical 72-pin SIMM.


Figure 5.6. A typical 168-pin SDRAM DIMM.


Figure 5.7. A typical 184-pin DDR SDRAM DIMM.


Figure 5.8. A typical 240-pin DDR2 DIMM.


Figure 5.9. A typical 184-pin RIMM.


All these memory modules are fairly compact, considering the amount of memory they hold, and are available in several capacities and speeds. Table 5.10 lists the various capacities available for SIMMs, DIMMs, and RIMMs.

Table 5.10. SIMM, DIMM, and RIMM Capacities

Capacity     Standard      Parity/ECC

30-Pin SIMM
256KB        256KBx8       256KBx9
1MB          1MBx8         1MBx9
4MB          4MBx8         4MBx9
16MB         16MBx8        16MBx9

72-Pin SIMM
1MB          256KBx32      256KBx36
2MB          512KBx32      512KBx36
4MB          1MBx32        1MBx36
8MB          2MBx32        2MBx36
16MB         4MBx32        4MBx36
32MB         8MBx32        8MBx36
64MB         16MBx32       16MBx36
128MB        32MBx32       32MBx36

168/184-Pin DIMM/DDR DIMM
8MB          1MBx64        1MBx72
16MB         2MBx64        2MBx72
32MB         4MBx64        4MBx72
64MB         8MBx64        8MBx72
128MB        16MBx64       16MBx72
256MB        32MBx64       32MBx72
512MB        64MBx64       64MBx72
1,024MB      128MBx64      128MBx72
2,048MB      256MBx64      256MBx72
4,096MB      512MBx64      512MBx72

240-Pin DDR2 DIMM
256MB        32MBx64       32MBx72
512MB        64MBx64       64MBx72
1,024MB      128MBx64      128MBx72
2,048MB      256MBx64      256MBx72
4,096MB      512MBx64      512MBx72

184-Pin RIMM
64MB         32MBx16       32MBx18
128MB        64MBx16       64MBx18
256MB        128MBx16      128MBx18
512MB        256MBx16      256MBx18
1,024MB      512MBx16      512MBx18


SIMMs, DIMMs, DDR/DDR2 DIMMs, and RIMMs of each type and capacity are available in various speed ratings. You should consult your motherboard documentation for the correct memory speed and type for your system. It is usually best for the memory speed (also called throughput or bandwidth) to match the speed of the processor data bus (also called the FSB).

If a system requires a specific speed, you can almost always substitute faster speeds if the one specified is not available. Generally, no problems occur in mixing module speeds, as long as you use modules equal to or faster than what the system requires. Because there's little price difference between the various speed versions, buying faster modules than are necessary for a particular application might make them more usable in a future system that could require the faster speed.

Because DIMMs and RIMMs have onboard SPD that reports their speed and timing parameters to the system, most systems run the memory controller and memory bus at the speed matching the slowest DIMM/RIMM installed. Most DIMMs contain either SDRAM or DDR SDRAM memory chips.

Note

A bank is the smallest amount of memory needed to form a single row of memory addressable by the processor. It is the minimum amount of physical memory that is read or written by the processor at one time and usually corresponds to the data bus width of the processor. If a processor has a 64-bit data bus, a bank of memory is also 64 bits wide. If the memory is interleaved or runs dual-channel, a virtual bank is formed that is twice the absolute data bus width of the processor.


You can't always replace a module with a higher-capacity unit and expect it to work. Systems might have specific design limitations for the maximum capacity of module they can take. A larger-capacity module works only if the motherboard is designed to accept it in the first place. You should consult your system documentation to determine the correct capacity and speed to use.

Registered SDRAM and DDR DIMMs

SDRAM and DDR DIMMs are available in unbuffered and registered versions. Most desktop and some entry-level server motherboards are designed to use unbuffered modules, which allow the memory controller signals to pass directly to the memory chips on the module with no interference. This is not only the least expensive design but also the fastest and most efficient. Its only drawback is that the motherboard designer must place limits on how many modules (that is, module sockets) can be installed on the board and possibly also limit how many chips can be on a module. So-called double-sided modules that really have two banks of chips (twice as many as normal) on board might be restricted on some systems in certain combinations.

Systems designed to accept extremely large amounts of RAM, including most servers, often require registered modules. A registered module uses an architecture that has register chips on the module that act as an interface between the actual RAM chips and the chipset. The registers temporarily hold data passing to and from the memory chips and enable many more RAM chips to be driven or otherwise placed on the module than the chipset could normally support. This allows for motherboard designs that can support many modules and enables each module to have a larger number of chips. In general, registered modules are required by server or workstation motherboards designed to support more than 1GB or 2GB of RAM. The important thing to note is that you can use only the type of module your motherboard (or chipset) is designed to support.

To provide the space needed for the buffer chip, a registered DIMM is often taller than a standard DIMM. Figure 5.10 compares a typical registered DIMM to a typical unbuffered DIMM.

Figure 5.10. A typical registered DIMM is taller than a typical unbuffered DIMM to provide room for buffer and parity/ECC chips.


Tip

If you are installing registered DIMMs in a slim-line or blade server, clearance between the top of the DIMM and the case might be a problem in some situations. Some vendors sell low-profile registered DIMMs that are about the same height as an unbuffered DIMM. You should use this type of DIMM if your system does not have enough headroom for standard registered DIMMs. Some vendors sell only this type of DIMM for particular systems.


SIMM Pinouts

Table 5.11 shows the interface connector pinouts for standard 72-pin SIMMs. Also included is a special presence detect table (Table 5.12) that shows the configuration of the presence detect pins on various 72-pin SIMMs. The motherboard uses the presence detect pins to detect exactly what size and speed of SIMM is installed. Industry-standard 30-pin SIMMs do not have a presence detect feature, but IBM did add this capability to its modified 30-pin configuration. Note that all SIMMs have the same pins on both sides of the module.

Table 5.11. Standard 72-Pin SIMM Pinout

Pin   SIMM Signal Name             Pin   SIMM Signal Name
1     Ground                       37    Parity Data Bit 1
2     Data Bit 0                   38    Parity Data Bit 3
3     Data Bit 16                  39    Ground
4     Data Bit 1                   40    Column Address Strobe 0
5     Data Bit 17                  41    Column Address Strobe 2
6     Data Bit 2                   42    Column Address Strobe 3
7     Data Bit 18                  43    Column Address Strobe 1
8     Data Bit 3                   44    Row Address Strobe 0
9     Data Bit 19                  45    Row Address Strobe 1
10    +5 Vdc                       46    Reserved
11    Presence Detect 5            47    Write Enable
12    Address Bit 0                48    ECC Optimized
13    Address Bit 1                49    Data Bit 8
14    Address Bit 2                50    Data Bit 24
15    Address Bit 3                51    Data Bit 9
16    Address Bit 4                52    Data Bit 25
17    Address Bit 5                53    Data Bit 10
18    Address Bit 6                54    Data Bit 26
19    Address Bit 10               55    Data Bit 11
20    Data Bit 4                   56    Data Bit 27
21    Data Bit 20                  57    Data Bit 12
22    Data Bit 5                   58    Data Bit 28
23    Data Bit 21                  59    +5 Vdc
24    Data Bit 6                   60    Data Bit 29
25    Data Bit 22                  61    Data Bit 13
26    Data Bit 7                   62    Data Bit 30
27    Data Bit 23                  63    Data Bit 14
28    Address Bit 7                64    Data Bit 31
29    Address Bit 11               65    Data Bit 15
30    +5 Vdc                       66    EDO
31    Address Bit 8                67    Presence Detect 1
32    Address Bit 9                68    Presence Detect 2
33    Address Bit 12               69    Presence Detect 3
34    Address Bit 13               70    Presence Detect 4
35    Parity Data Bit 2            71    Reserved
36    Parity Data Bit 0            72    Ground (Gnd)


Notice that a 72-pin SIMM uses a set of four or five pins to indicate its type to the motherboard. These presence detect pins are either grounded or not connected to indicate the type of SIMM to the motherboard. Presence detect outputs must be tied to ground through a 0-ohm resistor or jumper on the SIMM to generate a high logic level when the pin is open or a low logic level when the motherboard grounds the pin. This produces signals the memory interface logic can decode. If the motherboard uses presence detect signals, a POST procedure can determine the size and speed of the installed SIMMs and adjust control and addressing signals automatically. This enables autodetection of the memory size and speed.

In many ways, the presence detect pin function is similar to the industry-standard DX coding used on modern 35mm film rolls to indicate the ASA (speed) rating of the film to the camera. When you drop the film into the camera, electrical contacts can read the film's speed rating via an industry-standard configuration.

Table 5.12 shows the JEDEC industry-standard presence detect configuration listing for the 72-pin SIMM family. As discussed earlier in this chapter, JEDEC is an organization of U.S. semiconductor manufacturers and users that sets semiconductor standards.

Table 5.12. Presence Detect Pin Configurations for 72-Pin SIMMs[1]

Size    Speed    Pin 67   Pin 68   Pin 69   Pin 70   Pin 11
1MB     100ns    Gnd      -        Gnd      Gnd      -
1MB     80ns     Gnd      -        -        Gnd      -
1MB     70ns     Gnd      -        Gnd      -        -
1MB     60ns     Gnd      -        -        -        -
2MB     100ns    -        Gnd      Gnd      Gnd      -
2MB     80ns     -        Gnd      -        Gnd      -
2MB     70ns     -        Gnd      Gnd      -        -
2MB     60ns     -        Gnd      -        -        -
4MB     100ns    Gnd      Gnd      Gnd      Gnd      -
4MB     80ns     Gnd      Gnd      -        Gnd      -
4MB     70ns     Gnd      Gnd      Gnd      -        -
4MB     60ns     Gnd      Gnd      -        -        -
8MB     100ns    -        -        Gnd      Gnd      -
8MB     80ns     -        -        -        Gnd      -
8MB     70ns     -        -        Gnd      -        -
8MB     60ns     -        -        -        -        -
16MB    80ns     Gnd      -        -        Gnd      Gnd
16MB    70ns     Gnd      -        Gnd      -        Gnd
16MB    60ns     Gnd      -        -        -        Gnd
16MB    50ns     Gnd      -        Gnd      Gnd      Gnd
32MB    80ns     -        Gnd      -        Gnd      Gnd
32MB    70ns     -        Gnd      Gnd      -        Gnd
32MB    60ns     -        Gnd      -        -        Gnd
32MB    50ns     -        Gnd      Gnd      Gnd      Gnd

[1] Key: - = no connection (open); Gnd = ground; Pin 67 = presence detect 1; Pin 68 = presence detect 2; Pin 69 = presence detect 3; Pin 70 = presence detect 4; and Pin 11 = presence detect 5.
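Because the JEDEC scheme encodes module size on pins 67 and 68 (with pin 11 distinguishing the 16MB and 32MB capacities) and speed on pins 69 and 70, decoding it amounts to a table lookup. The following Python sketch is illustrative only: the lookup table transcribes a few rows of Table 5.12, and the names are hypothetical rather than part of any real BIOS.

# Illustrative decoder for the JEDEC 72-pin SIMM presence detect code
# (Table 5.12). A grounded pin reads as a low logic level; an open
# (unconnected) pin reads high.
GND, OPEN = 0, 1

# (pin 67, pin 68, pin 69, pin 70, pin 11) -> (size, speed)
PRESENCE_DETECT = {
    (GND, OPEN, GND, GND, OPEN): ("1MB", "100ns"),
    (GND, OPEN, OPEN, GND, OPEN): ("1MB", "80ns"),
    (GND, OPEN, GND, OPEN, OPEN): ("1MB", "70ns"),
    (GND, OPEN, OPEN, OPEN, OPEN): ("1MB", "60ns"),
    (GND, GND, GND, GND, OPEN): ("4MB", "100ns"),
    (GND, OPEN, OPEN, GND, GND): ("16MB", "80ns"),
    # ... the remaining rows follow Table 5.12 in the same way
}

def decode_presence_detect(pins):
    """Map a 5-tuple of presence detect pin states to (size, speed)."""
    try:
        return PRESENCE_DETECT[pins]
    except KeyError:
        raise ValueError("nonstandard or unknown presence detect code")

print(decode_presence_detect((GND, OPEN, GND, GND, OPEN)))  # ('1MB', '100ns')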

Unfortunately, unlike in the photographic film industry, not everybody in the computer industry follows established standards, and presence detect signaling is not a standard throughout the PC industry. Different system manufacturers sometimes use different configurations for what is expected on these four pins. Many Compaq, IBM PS/2, and Hewlett-Packard (HP) systems that used 72-pin SIMMs had nonstandard definitions for these pins. If you service very old servers that use 72-pin memory, don't assume that the memory can always be interchanged. Table 5.13 shows how IBM defined these pins.

Table 5.13. Presence Detect Pins for IBM 72-Pin SIMMs[1]

67     68     69     70     SIMM Type           IBM Part Number
-      -      -      -      Not a valid SIMM    N/A
-      -      -      Gnd    1MB 120ns           N/A
-      -      Gnd    -      2MB 120ns           N/A
-      -      Gnd    Gnd    2MB 70ns            92F0102
-      Gnd    -      -      8MB 70ns            64F3606
-      Gnd    -      Gnd    Reserved            N/A
-      Gnd    Gnd    -      2MB 80ns            92F0103
-      Gnd    Gnd    Gnd    8MB 80ns            64F3607
Gnd    -      -      -      Reserved            N/A
Gnd    -      -      Gnd    1MB 85ns            90X8624
Gnd    -      Gnd    -      2MB 85ns            92F0104
Gnd    -      Gnd    Gnd    4MB 70ns            92F0105
Gnd    Gnd    -      -      4MB 85ns            79F1003 (square notch) L40-SX
Gnd    Gnd    -      Gnd    1MB 100ns           N/A
Gnd    Gnd    Gnd    -      8MB 80ns            79F1004 (square notch) L40-SX
Gnd    Gnd    Gnd    -      2MB 100ns           N/A
Gnd    Gnd    Gnd    Gnd    4MB 80ns            87F9980
Gnd    Gnd    Gnd    Gnd    2MB 85ns            79F1003 (square notch) L40SX

[1] Key: - = no connection (open); Gnd = ground; Pin 67 = presence detect 1; Pin 68 = presence detect 2; Pin 69 = presence detect 3; and Pin 70 = presence detect 4.

Although servers that use SIMMs are most likely to be outdated, you should keep these differences in mind if you are salvaging parts to keep an older server in service or if you must order memory for a server that uses SIMMs.

SIMM pins might be tin- or gold-plated. The plating on the module pins must match that on the socket pins, or corrosion will result.

Caution

To have the most reliable system, you must install modules with gold-plated contacts into gold-plated sockets and modules with tin-plated contacts into tin-plated sockets only. If you mix gold contacts with tin sockets, or vice versa, you are likely to experience memory failures from six months to one year after initial installation because a type of corrosion known as fretting takes place. This was a major problem with 72-pin SIMM-based systems because some memory and motherboard vendors opted for tin sockets and connectors, while others opted for gold. According to connector manufacturer AMP's "Golden Rules: Guidelines for the Use of Gold on Connector Contacts" (available at www.amp.com/products/technology/aurulrep.pdf) and "The Tin Commandments: Guidelines for the Use of Tin on Connector Contacts" (available at www.amp.com/products/technology/sncomrep.pdf), you should match connector metals. Commandment 7 from the Tin Commandments specifically states "Mating of tin-coated contacts to gold-coated contacts is not recommended."

If you are maintaining systems with mixed tin/gold contacts in which fretting has already taken place, use a wet contact cleaner. After cleaning, to improve electrical contacts and help prevent corrosion, you should use a liquid contact enhancer and lubricant called Stabilant 22 from D.W. Electrochemicals when installing SIMMs or DIMMs. Its website (www.stabilant.com/llsting.htm) has detailed application notes on this subject that provide more technical details.


DIMM Pinouts

Table 5.14 shows the pinout configuration of a 168-pin registered SDRAM DIMM. Note again that the pins on each side of the DIMM are different. All pins should be gold-plated.

Table 5.14. 168-Pin SDRAM DIMM Pinouts[1]

Pin   Signal                   Pin   Signal
1     Gnd                      85    Gnd
2     Data Bit 0               86    Data Bit 32
3     Data Bit 1               87    Data Bit 33
4     Data Bit 2               88    Data Bit 34
5     Data Bit 3               89    Data Bit 35
6     +3.3V                    90    +3.3V
7     Data Bit 4               91    Data Bit 36
8     Data Bit 5               92    Data Bit 37
9     Data Bit 6               93    Data Bit 38
10    Data Bit 7               94    Data Bit 39
11    Data Bit 8               95    Data Bit 40
12    Gnd                      96    Gnd
13    Data Bit 9               97    Data Bit 41
14    Data Bit 10              98    Data Bit 42
15    Data Bit 11              99    Data Bit 43
16    Data Bit 12              100   Data Bit 44
17    Data Bit 13              101   Data Bit 45
18    +3.3V                    102   +3.3V
19    Data Bit 14              103   Data Bit 46
20    Data Bit 15              104   Data Bit 47
21    Parity Bit 0             105   Parity Bit 4
22    Parity Bit 1             106   Parity Bit 5
23    Gnd                      107   Gnd
24    -                        108   -
25    -                        109   -
26    +3.3V                    110   +3.3V
27    WE#                      111   CAS#
28    I/O Mask 0               112   I/O Mask 4
29    I/O Mask 1               113   I/O Mask 5
30    Chip Select 0#           114   Chip Select 1#
31    Do Not Use               115   RAS#
32    Gnd                      116   Gnd
33    Address Bit 0            117   Address Bit 1
34    Address Bit 2            118   Address Bit 3
35    Address Bit 4            119   Address Bit 5
36    Address Bit 6            120   Address Bit 7
37    Address Bit 8            121   Address Bit 9
38    Address Bit 10           122   Bank Address 0
39    Bank Address 1           123   Address Bit 11
40    +3.3V                    124   +3.3V
41    +3.3V                    125   Clock 1
42    Clock 0                  126   Reserved
43    Gnd                      127   Gnd
44    Do Not Use               128   Clock Enable 0
45    Chip Select 2#           129   Chip Select 3#
46    I/O Mask 2               130   I/O Mask 6
47    I/O Mask 3               131   I/O Mask 7
48    Do Not Use               132   Reserved
49    +3.3V                    133   +3.3V
50    -                        134   -
51    -                        135   -
52    Parity Bit 2             136   Parity Bit 6
53    Parity Bit 3             137   Parity Bit 7
54    Gnd                      138   Gnd
55    Data Bit 16              139   Data Bit 48
56    Data Bit 17              140   Data Bit 49
57    Data Bit 18              141   Data Bit 50
58    Data Bit 19              142   Data Bit 51
59    +3.3V                    143   +3.3V
60    Data Bit 20              144   Data Bit 52
61    -                        145   -
62    -                        146   -
63    Clock Enable 1           147   REGE[2]
64    Gnd                      148   Gnd
65    Data Bit 21              149   Data Bit 53
66    Data Bit 22              150   Data Bit 54
67    Data Bit 23              151   Data Bit 55
68    Gnd                      152   Gnd
69    Data Bit 24              153   Data Bit 56
70    Data Bit 25              154   Data Bit 57
71    Data Bit 26              155   Data Bit 58
72    Data Bit 27              156   Data Bit 59
73    +3.3V                    157   +3.3V
74    Data Bit 28              158   Data Bit 60
75    Data Bit 29              159   Data Bit 61
76    Data Bit 30              160   Data Bit 62
77    Data Bit 31              161   Data Bit 63
78    Gnd                      162   Gnd
79    Clock 2                  163   Clock 3
80    -                        164   -
81    SPD Write Protect        165   SPD Address 0
82    SPD Data                 166   SPD Address 1
83    SPD Clock                167   SPD Address 2
84    +3.3V                    168   +3.3V


[1] Key: Gnd = ground; SPD = serial presence detect; - = no connection; and REGE = register enable.

[2] No connection in an unbuffered DIMM.

A DIMM uses a completely different type of presence detect than a SIMM, called SPD. SPD consists of a small EEPROM, or flash memory, chip on the DIMM that contains specially formatted data indicating the DIMM's features. This serial data can be read via the serial data pins on the DIMM, and it enables the motherboard to autoconfigure to the exact type of DIMM installed.
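The first few SPD bytes are standardized by JEDEC, so firmware (or a diagnostic utility) can identify a module by reading them over the system's SMBus. The following Python sketch decodes a handful of the well-known early SPD bytes from a raw dump; it is a simplified illustration rather than a complete SPD parser, and the sample byte values are hypothetical.

# Simplified decoder for a few standard SPD bytes (SDRAM/DDR-era layout).
# Byte 2 identifies the memory type; bytes 3-7 describe the organization.
MEMORY_TYPES = {0x04: "SDRAM", 0x07: "DDR SDRAM", 0x08: "DDR2 SDRAM"}

def decode_spd(spd):
    return {
        "memory_type": MEMORY_TYPES.get(spd[2], "unknown"),
        "row_address_bits": spd[3],                 # byte 3: row addresses
        "column_address_bits": spd[4],              # byte 4: column addresses
        "module_banks": spd[5],                     # byte 5: banks on the module
        "data_width_bits": spd[6] | (spd[7] << 8),  # bytes 6-7: module width
    }

# Hypothetical dump of the first 8 SPD bytes from a registered DDR DIMM:
sample = bytes([0x80, 0x08, 0x07, 0x0D, 0x0B, 0x01, 0x48, 0x00])
info = decode_spd(sample)
print(info["memory_type"], info["data_width_bits"])  # DDR SDRAM 72
# A 72-bit width (64 data bits + 8 check bits) marks a parity/ECC module.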

DIMMs for PC-based servers use 3.3V power and might be unbuffered or registered. DIMMs made for Macintosh computers use a 5V buffered design. Keying in the socket and on the DIMM prevents the insertion of 5V DIMMs into a 3.3V slot or vice versa. See Figure 5.11.

Figure 5.11. 168-pin DRAM DIMM notch key definitions.


DDR DIMM Pinouts

Table 5.15 shows the pinout configuration of a 184-pin DDR SDRAM DIMM. Note again that the pins on each side of the DIMM are different. All pins are typically gold-plated.

Table 5.15. 184-Pin DDR DIMM Pinouts[1]

Pin   Signal                   Pin   Signal
1     Reference +1.25V         93    Gnd
2     Data Bit 0               94    Data Bit 4
3     Gnd                      95    Data Bit 5
4     Data Bit 1               96    I/O +2.5V
5     Data Strobe 0            97    Data Strobe 9
6     Data Bit 2               98    Data Bit 6
7     +2.5V                    99    Data Bit 7
8     Data Bit 3               100   Gnd
9     -                        101   -
10    -                        102   -
11    Gnd                      103   Address Bit 13
12    Data Bit 8               104   I/O +2.5V
13    Data Bit 9               105   Data Bit 12
14    Data Strobe 1            106   Data Bit 13
15    I/O +2.5V                107   Data Strobe 10
16    Clock 1                  108   +2.5V
17    Clock 1#                 109   Data Bit 14
18    Gnd                      110   Data Bit 15
19    Data Bit 10              111   Clock Enable 1
20    Data Bit 11              112   I/O +2.5V
21    Clock Enable 0           113   Bank Address 2
22    I/O +2.5V                114   Data Bit 20
23    Data Bit 16              115   Address Bit 12
24    Data Bit 17              116   Gnd
25    Data Strobe 2            117   Data Bit 21
26    Gnd                      118   Address Bit 11
27    Address Bit 9            119   Data Strobe 11
28    Data Bit 18              120   +2.5V
29    Address Bit 7            121   Data Bit 22
30    I/O +2.5V                122   Address Bit 8
31    Data Bit 19              123   Data Bit 23
32    Address Bit 5            124   Gnd
33    Data Bit 24              125   Address Bit 6
34    Gnd                      126   Data Bit 28
35    Data Bit 25              127   Data Bit 29
36    Data Strobe 3            128   I/O +2.5V
37    Address Bit 4            129   Data Strobe 12
38    +2.5V                    130   Address Bit 3
39    Data Bit 26              131   Data Bit 30
40    Data Bit 27              132   Gnd
41    Address Bit 2            133   Data Bit 31
42    Gnd                      134   Parity Bit 4
43    Address Bit 1            135   Parity Bit 5
44    Parity Bit 0             136   I/O +2.5V
45    Parity Bit 1             137   Clock 0
46    +2.5V                    138   Clock 0#
47    Data Strobe 8            139   Gnd
48    Address Bit 0            140   Data Strobe 17
49    Parity Bit 2             141   Address Bit 10
50    Gnd                      142   Parity Bit 6
51    Parity Bit 3             143   I/O +2.5V
52    Bank Address 1           144   Parity Bit 7
53    Data Bit 32              145   Gnd
54    I/O +2.5V                146   Data Bit 36
55    Data Bit 33              147   Data Bit 37
56    Data Strobe 4            148   +2.5V
57    Data Bit 34              149   Data Strobe 13
58    Gnd                      150   Data Bit 38
59    Bank Address 0           151   Data Bit 39
60    Data Bit 35              152   Gnd
61    Data Bit 40              153   Data Bit 44
62    I/O +2.5V                154   RAS#
63    WE#                      155   Data Bit 45
64    Data Bit 41              156   I/O +2.5V
65    CAS#                     157   S0#
66    Gnd                      158   S1#
67    Data Strobe 5            159   Data Strobe 14
68    Data Bit 42              160   Gnd
69    Data Bit 43              161   Data Bit 46
70    +2.5V                    162   Data Bit 47
71    S2#                      163   S3#
72    Data Bit 48              164   I/O +2.5V
73    Data Bit 49              165   Data Bit 52
74    Gnd                      166   Data Bit 53
75    Clock 2#                 167   FETEN
76    Clock 2                  168   +2.5V
77    I/O +2.5V                169   Data Strobe 15
78    Data Strobe 6            170   Data Bit 54
79    Data Bit 50              171   Data Bit 55
80    Data Bit 51              172   I/O +2.5V
81    Gnd                      173   -
82    +2.5VID                  174   Data Bit 60
83    Data Bit 56              175   Data Bit 61
84    Data Bit 57              176   Gnd
85    +2.5V                    177   Data Strobe 16
86    Data Strobe 7            178   Data Bit 62
87    Data Bit 58              179   Data Bit 63
88    Data Bit 59              180   I/O +2.5V
89    Gnd                      181   SPD Address 0
90    SPD Write Protect        182   SPD Address 1
91    SPD Data                 183   SPD Address 2
92    SPD Clock                184   SPD +2.5V


[1] Key: Gnd = ground; SPD = serial presence detect; and - = no connection.

DDR DIMMs use a single key notch to indicate voltage, as shown in Figure 5.12.

Figure 5.12. 184-pin DDR SDRAM DIMM keying.


A 184-pin DDR DIMM uses two notches on each side to enable compatibility with both low- and high-profile latched sockets. Note that the key position is offset with respect to the center of the DIMM to prevent it from being inserted in the socket backward. The key notch is positioned to the left, centered, or to the right of the area between pins 52 and 53. The position indicates the I/O voltage for the DDR DIMM and prevents the installation of the wrong type into a socket that might damage the DIMM.

DDR2 DIMM Pinouts

Table 5.16 shows the pinout configuration of a 240-pin DDR2 SDRAM DIMM. Pins 1 through 120 are on the front side, and pins 121 through 240 are on the back. All pins should be gold-plated.

Table 5.16. 240-Pin DDR2 DIMM Pinouts

Pin   Signal        Pin   Signal
1     VREF          121   VSS
2     VSS           122   DQ4
3     DQ0           123   DQ5
4     DQ1           124   VSS
5     VSS           125   DM0
6     -DQS0         126   -
7     DQS0          127   VSS
8     VSS           128   DQ6
9     DQ2           129   DQ7
10    DQ3           130   VSS
11    VSS           131   DQ12
12    DQ8           132   DQ13
13    DQ9           133   VSS
14    VSS           134   DM1
15    -DQS1         135   -
16    DQS1          136   VSS
17    VSS           137   CK1
18    -             138   -CK1
19    -             139   VSS
20    VSS           140   DQ14
21    DQ10          141   DQ15
22    DQ11          142   VSS
23    VSS           143   DQ20
24    DQ16          144   DQ21
25    DQ17          145   VSS
26    VSS           146   DM2
27    -DQS2         147   -
28    DQS2          148   VSS
29    VSS           149   DQ22
30    DQ18          150   DQ23
31    DQ19          151   VSS
32    VSS           152   DQ28
33    DQ24          153   DQ29
34    DQ25          154   VSS
35    VSS           155   DM3
36    -DQS3         156   -
37    DQS3          157   VSS
38    VSS           158   DQ30
39    DQ26          159   DQ31
40    DQ27          160   VSS
41    VSS           161   -
42    -             162   -
43    -             163   VSS
44    VSS           164   -
45    -             165   -
46    -             166   VSS
47    VSS           167   -
48    -             168   -
49    -             169   VSS
50    VSS           170   VDDQ
51    VDDQ          171   CKE1
52    CKE0          172   VDD
53    VDD           173   -
54    -             174   -
55    -             175   VDDQ
56    VDDQ          176   A12
57    A11           177   A9
58    A7            178   VDD
59    VDD           179   A8
60    A5            180   A6
61    A4            181   VDDQ
62    VDDQ          182   A3
63    A2            183   A1
64    VDD           184   VDD
65    VSS           185   CK0
66    VSS           186   -CK0
67    VDD           187   VDD
68    -             188   A0
69    VDD           189   VDD
70    A10/-AP       190   BA1
71    BA0           191   VDDQ
72    VDDQ          192   -RAS
73    -WE           193   -CS0
74    -CAS          194   VDDQ
75    VDDQ          195   ODT0
76    -CS1          196   A13
77    ODT1          197   VDD
78    VDDQ          198   VSS
79    VSS           199   DQ36
80    DQ32          200   DQ37
81    DQ33          201   VSS
82    VSS           202   DM4
83    -DQS4         203   -
84    DQS4          204   VSS
85    VSS           205   DQ38
86    DQ34          206   DQ39
87    DQ35          207   VSS
88    VSS           208   DQ44
89    DQ40          209   DQ45
90    DQ41          210   VSS
91    VSS           211   DM5
92    -DQS5         212   -
93    DQS5          213   VSS
94    VSS           214   DQ46
95    DQ42          215   DQ47
96    DQ43          216   VSS
97    VSS           217   DQ52
98    DQ48          218   DQ53
99    DQ49          219   VSS
100   VSS           220   CK2
101   SA2           221   -CK2
102   -             222   VSS
103   VSS           223   DM6
104   -DQS6         224   -
105   DQS6          225   VSS
106   VSS           226   DQ54
107   DQ50          227   DQ55
108   DQ51          228   VSS
109   VSS           229   DQ60
110   DQ56          230   DQ61
111   DQ57          231   VSS
112   VSS           232   DM7
113   -DQS7         233   -
114   DQS7          234   VSS
115   VSS           235   DQ62
116   DQ58          236   DQ63
117   DQ59          237   VSS
118   VSS           238   VDDSPD
119   SDA           239   SA0
120   SCL           240   SA1


A 240-pin DDR2 DIMM uses two notches on each side to enable compatibility with both low- and high-profile latched sockets. The connector key is offset with respect to the center of the DIMM to prevent it from being inserted into the socket backward. The key notch is positioned in the center of the area between pins 64 and 65 on the front (184/185 on the back), and there is no voltage keying because all DDR2 DIMMs run on 1.8V.

RIMM Pinouts

RIMM modules and sockets are gold-plated and designed for 25 insertion/removal cycles. A 16/18-bit RIMM has 184 pins, split into two groups of 92 pins on opposite ends and sides of the module. Table 5.17 shows the pinout configuration of a RIMM.

Table 5.17. RIMM Pinout[1]

Pin   Signal                   Pin   Signal
A1    Gnd                      B1    Gnd
A2    LData Bit A8             B2    LData Bit A7
A3    Gnd                      B3    Gnd
A4    LData Bit A6             B4    LData Bit A5
A5    Gnd                      B5    Gnd
A6    LData Bit A4             B6    LData Bit A3
A7    Gnd                      B7    Gnd
A8    LData Bit A2             B8    LData Bit A1
A9    Gnd                      B9    Gnd
A10   LData Bit A0             B10   Interface Clock+
A11   Gnd                      B11   Gnd
A12   LCTMN                    B12   Interface Clock-
A13   Gnd                      B13   Gnd
A14   LCTM                     B14   -
A15   Gnd                      B15   Gnd
A16   -                        B16   LROW2
A17   Gnd                      B17   Gnd
A18   LROW1                    B18   LROW0
A19   Gnd                      B19   Gnd
A20   LCOL4                    B20   LCOL3
A21   Gnd                      B21   Gnd
A22   LCOL2                    B22   LCOL1
A23   Gnd                      B23   Gnd
A24   LCOL0                    B24   LData Bit B0
A25   Gnd                      B25   Gnd
A26   LData Bit B1             B26   LData Bit B2
A27   Gnd                      B27   Gnd
A28   LData Bit B3             B28   LData Bit B4
A29   Gnd                      B29   Gnd
A30   LData Bit B5             B30   LData Bit B6
A31   Gnd                      B31   Gnd
A32   LData Bit B7             B32   LData Bit B8
A33   Gnd                      B33   Gnd
A34   LSCK                     B34   LCMD
A35   VCMOS                    B35   VCMOS
A36   SOUT                     B36   SIN
A37   VCMOS                    B37   VCMOS
A38   -                        B38   -
A39   Gnd                      B39   Gnd
A40   -                        B40   -
A41   +2.5V                    B41   +2.5V
A42   +2.5V                    B42   +2.5V
A43   -                        B43   -
A44   -                        B44   -
A45   -                        B45   -
A46   -                        B46   -
A47   -                        B47   -
A48   -                        B48   -
A49   -                        B49   -
A50   -                        B50   -
A51   VREF                     B51   VREF
A52   Gnd                      B52   Gnd
A53   SPD Clock                B53   SPD Address 0
A54   +2.5V                    B54   +2.5V
A55   SDA                      B55   SPD Address 1
A56   SVDD                     B56   SVDD
A57   SPD Write Protect        B57   SPD Address 2
A58   +2.5V                    B58   +2.5V
A59   RSCK                     B59   RCMD
A60   Gnd                      B60   Gnd
A61   RData Bit B7             B61   RData Bit B8
A62   Gnd                      B62   Gnd
A63   RData Bit B5             B63   RData Bit B6
A64   Gnd                      B64   Gnd
A65   RData Bit B3             B65   RData Bit B4
A66   Gnd                      B66   Gnd
A67   RData Bit B1             B67   RData Bit B2
A68   Gnd                      B68   Gnd
A69   RCOL0                    B69   RData Bit B0
A70   Gnd                      B70   Gnd
A71   RCOL2                    B71   RCOL1
A72   Gnd                      B72   Gnd
A73   RCOL4                    B73   RCOL3
A74   Gnd                      B74   Gnd
A75   RROW1                    B75   RROW0
A76   Gnd                      B76   Gnd
A77   -                        B77   RROW2
A78   Gnd                      B78   Gnd
A79   RCTM                     B79   -
A80   Gnd                      B80   Gnd
A81   RCTMN                    B81   RCFMN
A82   Gnd                      B82   Gnd
A83   RData Bit A0             B83   RCFM
A84   Gnd                      B84   Gnd
A85   RData Bit A2             B85   RData Bit A1
A86   Gnd                      B86   Gnd
A87   RData Bit A4             B87   RData Bit A3
A88   Gnd                      B88   Gnd
A89   RData Bit A6             B89   RData Bit A5
A90   Gnd                      B90   Gnd
A91   RData Bit A8             B91   RData Bit A7
A92   Gnd                      B92   Gnd


[1] Key: - = no connection (open); Gnd = ground.

A 16/18-bit RIMM is keyed with two notches in the center. This prevents backward insertion and prevents the wrong type (voltage) of RIMM from being used in a system. To allow for changes in RIMM designs, three keying options are possible in the design (see Figure 5.13). The left key (indicated as "DATUM A" in Figure 5.13) is fixed in position, but the center key can be in three different positions, spaced 1mm or 2mm to the right, indicating different types of RIMMs. The current default is Option A, as shown in Figure 5.13 and Table 5.18, which corresponds to 2.5V operation.

Figure 5.13. RIMM keying options.


Table 5.18. Possible Keying Options for RIMMs

Option   Notch Separation   Description
A        11.5mm             2.5V RIMM
B        12.5mm             Reserved
C        13.5mm             Reserved

A RIMM incorporates an SPD device, which is essentially a flash ROM onboard. This ROM contains information about the RIMM's size and type, including detailed timing information for the memory controller. The memory controller automatically reads the data from the SPD ROM to configure the system to match the RIMMs installed.

Figure 5.14 shows a typical RIMM installation. Note that RIMM sockets not occupied by a module cannot be left empty but must be filled with a continuity module (essentially a RIMM module without memory). This enables the memory bus to remain continuous from the controller through each module (and, therefore, each RDRAM device on the module) until the bus finally terminates on the motherboard.

Figure 5.14. Typical RDRAM bus layout showing a RIMM and one continuity module.


Determining a Memory Module's Size and Features

Most memory modules are labeled with a sticker indicating the module's type, speed rating, and manufacturer. If you are attempting to determine whether existing memory can be used in a new server, or if you need to replace memory in an existing server, this information can be very useful.

If you have memory modules that are not labeled, however, you can still determine the module type, speed, and capacity if the memory chips on the module are clearly labeled. For example, assume that you have a memory module with chips labeled thus:

MT46V64M8TG-75

By using an Internet search engine such as Google and entering the number from one of the memory chips, you can usually find the datasheet for the memory chips. Note for a registered memory module that you want to look up the part number for the memory chips (usually eight or more chips) rather than the buffer chips on the module (one to three, depending on the module design).

In this example, the part number turns out to be a Micron memory chip that decodes like this:

MT = Micron Technologies (the memory chip maker)

46 = DDR SDRAM

V = 2.5V DC

64M8 = 64 Meg x 8 organization (16 Meg x 8 x 4 internal banks, for 512Mb total; often written as 64M x 8)

TG = 66-pin TSOP chip package

75 = 7.5ns @ CL2 latency (DDR 266)

The full datasheet for this example is located at http://download.micron.com/pdf/datasheets/dram/ddr/512MBDDRx4x8x16.pdf.
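This sort of decoding is mechanical enough to script. The Python fragment below pulls the fields out of a Micron-style part number using the decode rules just described; the lookup tables are abbreviated to the codes in this example, and the function is a hypothetical sketch, not a complete catalog of Micron's numbering.

import re

# Abbreviated decode tables based on the example above; real part-number
# catalogs cover many more codes.
FAMILY  = {"46": "DDR SDRAM"}
VOLTAGE = {"V": "2.5V DC"}
PACKAGE = {"TG": "66-pin TSOP"}
SPEED   = {"75": "7.5ns @ CL2 (DDR266)"}

def decode_micron(part):
    m = re.match(r"MT(\d{2})([A-Z])(\d+M\d+)([A-Z]{2})-(\d+)", part)
    if not m:
        raise ValueError("unrecognized part number format")
    fam, volt, org, pkg, spd = m.groups()
    return {
        "maker": "Micron Technologies",
        "family": FAMILY.get(fam, fam),
        "voltage": VOLTAGE.get(volt, volt),
        "organization": org,            # e.g., 64M8 = 64 Meg x 8
        "package": PACKAGE.get(pkg, pkg),
        "speed": SPEED.get(spd, spd),
    }

print(decode_micron("MT46V64M8TG-75"))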

From this information, we can determine that the module has the following characteristics:

  • The module runs at DDR266 speeds, using standard 2.5V DC voltage.

  • The module has a latency of CL2, so it can be used on any system that requires CL2 or slower latencies.

  • Each chip has a capacity of 512Mb (64x8=512).

  • Each chip is 8 bits wide. It takes 8 bits to make a byte, so the capacity of the module can be calculated by grouping the memory chips on the module into groups of 8. If each chip contains 512Mb, a group of 8 means that the module has a size of 512MB (512Mb x 8 = 512MB). A dual-bank module has two groups of 8 chips for a capacity of 1GB (512Mb x 8 x 2 = 1,024MB, or 1GB).

If the module has 9, instead of 8, memory chips (or 18 instead of 16), the additional chips are used for parity checking and support ECC error correction on servers with this feature.

To determine the size of the module in megabytes or gigabytes and to determine whether the module supports ECC, count the memory chips on the module and compare this number to Table 5.19. Note that the size of each memory chip in Mb is the same as the size in MB if the memory chips use an 8-bit design.

Table 5.19. Module Capacity Using 512Mb (64M x 8) Chips

Number of Chips   Number of Bits in Each Bank   Module Size   Supports ECC   Single- or Dual-Bank
8                 64                            512MB         No             Single
9                 72                            512MB         Yes            Single
16                64                            1GB           No             Dual
18                72                            1GB           Yes            Dual


The additional chip used by each group of 8 chips provides parity checking, which is used by the ECC function on most server motherboards to correct single-bit errors.
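The capacity arithmetic just described is easy to verify in a few lines of code. The sketch below assumes the common case from Table 5.19 (8-bit-wide chips grouped into 64-bit banks, with one extra chip per bank for parity/ECC); the function name and structure are illustrative, not taken from any real tool.

def module_info(chip_megabits, chip_width_bits, chip_count):
    """Infer module capacity and ECC support from chip organization and count."""
    per_bank = 64 // chip_width_bits            # data chips in one 64-bit bank
    per_bank_ecc = 72 // chip_width_bits        # data + parity chips per bank
    if chip_count % per_bank_ecc == 0:
        banks, ecc = chip_count // per_bank_ecc, True
    elif chip_count % per_bank == 0:
        banks, ecc = chip_count // per_bank, False
    else:
        raise ValueError("unexpected chip count for this chip width")
    depth_meg = chip_megabits // chip_width_bits  # chip depth in mega-locations
    size_mb = depth_meg * 8 * banks               # each 64-bit bank is 8 bytes wide
    return size_mb, banks, ecc

print(module_info(512, 8, 9))   # (512, 1, True):   512MB single-bank ECC module
print(module_info(512, 8, 16))  # (1024, 2, False): 1GB dual-bank non-ECC module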

A registered module will contain full-sized memory chips plus additional chips for ECC/parity and buffering. These chips are usually smaller in size and located near the center of the module, as shown in Figure 5.10.

Note

Some modules use 16-bit-wide memory chips. In such cases, only 4 chips are needed for single-bank memory (5 with parity/ECC support) and 8 for double-bank memory (10 with parity/ECC support). These memory chips use a design listed as capacity x 16, such as 256Mbx16.


You can also see this information if you look up the manufacturer, the memory type, and the organization in a search engine. For example, this web search:

Micron "64 Meg x 8" DDR DIMM

locates a parts list for Micron's 512MB and 1GB modules at www.micron.com/products/modules/ddrsdram/partlist.aspx?pincount=184-pin&version=Registered&package=VLP%20DIMM. The Comp. Config column lists the chip design for each chip on the module.

As you can see, with a little detective work, you can determine the size, speed, and type of a memory module, even if the module isn't marked, as long as the markings on the memory chips themselves are legible.

Tip

If you are unable to decipher a chip part number, you can use the HWiNFO or SiSoftware Sandra program to identify your memory module, as well as many other facts about your computer, including chipset, processor, empty memory sockets, and much more. You can download shareware versions of HWiNFO from www.hwinfo.com and SiSoftware Sandra from www.sisoftware.net.


Memory Banks

Memory chips (DIPs, SIMMs, SIPPs, and DIMMs) are organized in banks on motherboards and memory cards. You should know your memory bank layout and position on the motherboard and memory cards.

You need to know the bank layout when adding memory to a system. In addition, memory diagnostics report error locations by byte and bit addresses, and you must use these numbers to locate which bank in your system contains the problem.

The banks usually correspond to the data bus capacity of the system's microprocessor. Table 5.20 shows the widths of individual banks, based on the type of server processor used and whether the chipset operates in single-channel or dual-channel mode.

Table 5.20. Memory Bank Widths on Various Systems

Pentium Pro, PII, Xeon, Pentium III, PIII Xeon, P4, Athlon MP, Xeon (single-channel mode), Itanium:
  Data bus: 64-bit. Memory bank size: 64 bits (no parity) or 72 bits (parity/ECC).
  Modules per bank: eight 30-pin SIMMs[1], two 72-pin SIMMs[2], or one DIMM.

P4, Xeon, Opteron (dual-channel mode), Itanium 2:
  Data bus: 64-bit. Memory bank size: 128 bits (no parity)[3] or 144 bits (parity/ECC)[3].
  Modules per bank: two DIMMs.


[1] Very few, if any, motherboards using this type of memory were made for these processors.

[2] 72-pin SIMMs were used by some systems running Pentium, Pentium Pro, Pentium II, and Pentium II Xeon processors; they were replaced by SDRAM and newer types of DIMMs.

[3] These systems require matched pairs of memory to operate in dual-channel mode. If a single module or two different-sized modules are used, the system runs in single-channel mode.

DIMMs are ideal for Pentium and higher systems because the 64-bit width of the DIMM exactly matches the 64-bit width of the Pentium processor data bus. Therefore, each DIMM represents an individual bank, and DIMMs can be added or removed, one at a time. Note that for dual-channel operation, matched pairs of DIMMs must be inserted into the appropriate slots on the motherboard. Note that the Itanium 2 runs only in dual-channel mode.

The physical orientation and numbering of the memory module sockets used on a motherboard are arbitrary and determined by the board's designers, so keep the documentation covering your system or card handy, particularly if you want to take advantage of the additional performance available from running recent server designs in dual-channel mode.

Memory Module Speed

When you replace a failed memory module or install a new module as an upgrade, you typically must install a module of the same type and speed as the others in the system. You can substitute a module with a different speed, but only if the replacement module's speed is equal to or faster than that of the other modules in the system.

Some people have had problems when mixing modules of different speeds. With the wide variety of motherboards, chipsets, and memory types, few ironclad rules exist. When in doubt as to which speed module to install in your system, you should consult the motherboard documentation for more information.

Substituting faster memory of the same type doesn't result in improved performance if the system still operates the memory at the same speed. Systems that use DIMMs or RIMMs can read the speed and timing features of the module from a special SPD ROM installed on the module and then set chipset (memory controller) timing accordingly. In these systems, you might see an increase in performance by installing faster modules, to the limit of what the chipset will support.

Intel and JEDEC standards govern the timing and reliability requirements that each memory type must meet. A number of common symptoms result when the system memory has failed or is simply not fast enough for the system's timing. The usual symptoms are frequent parity check errors or a system that does not operate at all. The POST might report errors, too. If you're unsure of which chips to buy for your system, you should contact the system manufacturer or a reputable chip supplier.

See "Parity Checking," p. 390.


Parity and ECC

Part of the nature of memory is that it inevitably fails. Memory failures are usually classified as two basic types: hard fails and soft errors.

The best understood memory failures are hard fails, in which the chip is working and then, because of some flaw, physical damage, or other event, becomes damaged and experiences a permanent failure. Fixing this type of failure normally requires replacing some part of the memory hardware, such as the chip, SIMM, or DIMM. Hard error rates are known as HERs.

The other, more insidious, type of failure is the soft error, which is a nonpermanent failure that might never recur or could occur only at infrequent intervals. (Soft fails are effectively "fixed" by powering the system off and back on.) Soft error rates are known as SERs.

Problems with early memory chips were caused by alpha particles, a very weak form of radiation coming from trace radioactive elements in chip packaging. Although this cause of soft errors was eliminated years ago, cosmic rays have proven to be a major cause of soft errors, particularly as memory chip densities have increased.

Although cosmic rays and other radiation events are the biggest cause of soft errors, soft errors can also be caused by the following:

  • Power glitches or noise on the line: This can be caused by a defective power supply in the system or by defective power at the outlet.

  • Incorrect type or speed rating: The memory must be the correct type for the chipset and match the system access speed.

  • Radio frequency interference (RFI): RFI is caused by radio transmitters in close proximity to the system, which can generate electrical signals in system wiring and circuits. Keep in mind that the increased use of wireless networks, keyboards, and mouse devices can lead to a greater risk of RFI.

  • Static discharges: Static discharge causes momentary power spikes, which alter data.

  • Timing glitches: In a timing glitch, data doesn't arrive at the proper place at the proper time, causing errors. They are often caused by improper settings in the BIOS Setup, by memory that is rated slower than the system requires, or by overclocked processors and other system components.

  • Heat buildup: High-speed memory modules run hotter than older modules. Note that RDRAM RIMM modules were the first memory to include integrated heat spreaders, and many high-performance DDR and DDR2 memory modules now include heat spreaders to help fight heat buildup.

Most of these problems don't cause chips to permanently fail (although bad power or static can damage chips permanently), but they can cause momentary problems with data.

Although soft errors are regarded as an unavoidable consequence of desktop and portable computer operation, system lockups are absolutely unacceptable for servers and other mission-critical systems. The best way to deal with this problem is to increase the system's fault tolerance. This means implementing ways of detecting and possibly correcting errors in PC systems.

Historically, early PCs and servers used a type of fault tolerance known as parity checking, while more recent servers use fault-tolerance methods that actually correct memory errors.

Parity Checking

One standard that IBM set for the industry is that the memory chips in a bank of nine each handle 1 bit of data: 8 bits per character plus 1 extra bit, called the parity bit. The parity bit enables memory-control circuitry to keep tabs on the other 8 bits, providing a built-in cross-check for the integrity of each byte in the system. If the circuitry detects an error, the computer stops and displays a message, informing the user of the malfunction. In a GUI operating system, a parity error generally manifests itself as a locked system. Upon reboot, the BIOS should detect the error and display the appropriate error message.

SIMMs and DIMMs are available both with and without parity bits. Originally, all PC systems used parity-checked memory to ensure accuracy. Although desktop PCs began to abandon parity-checking in 1994 (saving 10% to 15% on memory costs), servers continue to use parity checking. Parity can't correct system errors, but because parity can detect errors, it can make the user aware of memory errors when they happen. This has two basic benefits:

  • Parity guards against the consequences of faulty calculations based on incorrect data.

  • Parity pinpoints the source of errors, which helps with problem resolution, thus improving system serviceability.

Let's look at how parity checking works and then examine in more detail the successor to parity checking, called ECC, which can not only detect but correct memory errors on-the-fly.

How Parity Checking Works

IBM originally established the odd parity standard for error checking. As the 8 individual bits in a byte are stored in memory, a parity generator/checker, which is either part of the CPU or located in a special chip on the motherboard, evaluates the data bits by adding up the number of 1s in the byte. If an even number of 1s is found, the parity generator/checker creates a 1 and stores it as the ninth bit (the parity bit) in the parity memory chip. That makes the sum for all 9 bits (including the parity bit) an odd number. If the original sum of the 8 data bits is an odd number, the parity bit created would be a 0, keeping the sum for all 9 bits an odd number. The basic rule is that the value of the parity bit is always chosen so that the sum of all 9 bits (8 data bits plus 1 parity bit) is stored as an odd number. If the system used even parity, the example would be the same, except the parity bit would ensure an even sum. It doesn't matter whether even or odd parity is used; the system uses one or the other, and it is completely transparent to the memory chips involved. Remember that the 8 data bits in a byte are numbered 0 1 2 3 4 5 6 7. The following examples might make it easier to understand:

Data bit number:  0 1 2 3 4 5 6 7   Parity bit
Data bit value:   1 0 1 1 0 0 1 1   0


In this example, because the total number of data bits with a value of 1 is an odd number (5), the parity bit must have a value of 0 to ensure an odd sum for all 9 bits.

Here is another example:

Data bit number:  0 1 2 3 4 5 6 7   Parity bit
Data bit value:   1 1 1 1 0 0 1 1   1


In this example, because the total number of data bits with a value of 1 is an even number (6), the parity bit must have a value of 1 to create an odd sum for all 9 bits.

When a system reads memory back from storage, it checks the parity information. If a (9-bit) byte has an even number of 1 bits, that byte must have an error. The system can't tell which bit has changed or whether only a single bit has changed. If 3 bits changed, for example, the byte still flags a parity-check error; if 2 bits changed, however, the bad byte could pass unnoticed. Because multiple bit errors (in a single byte) are rare, this scheme gives a reasonable and inexpensive ongoing indication that memory is good or bad.
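In code, odd parity is just a matter of counting 1 bits. This short Python sketch generates the parity bit for a byte and then checks a stored byte-plus-parity-bit pair the same way the memory logic does; it is a conceptual illustration, not a model of any particular chipset.

def odd_parity_bit(byte):
    """Return the parity bit that makes the 9-bit total an odd number of 1s."""
    ones = bin(byte & 0xFF).count("1")
    return 0 if ones % 2 == 1 else 1

def parity_ok(byte, parity_bit):
    """True if the stored 9 bits still contain an odd number of 1s."""
    return (bin(byte & 0xFF).count("1") + parity_bit) % 2 == 1

data = 0b11001101                   # bits 0-7 = 1 0 1 1 0 0 1 1 (five 1s)
p = odd_parity_bit(data)            # -> 0, matching the first example above
print(parity_ok(data, p))           # True: memory contents check out
print(parity_ok(data ^ 0b0100, p))  # False: a single flipped bit is detected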

Parity error messages vary by system, but usually include a reference to parity check or NMI (non-maskable interrupt). Most systems that use parity checking do not halt the CPU when a parity error is detected; instead, they display an error message and offer the choice of rebooting the system or continuing as though nothing happened. Although you don't need to reboot a system after a parity error, it makes sense to do so because the contents of memory might be corrupted. Obviously, parity checking is not sufficient fault tolerance for servers.

ECC

ECC goes a big step beyond simple parity error detection. Instead of just detecting an error, ECC allows a single-bit error to be corrected, which means the system can continue without interruption and without corrupting data. Older implementations of ECC can only detect, not correct, double-bit errors. Because studies have indicated that approximately 98% of memory errors are the single-bit variety, the most commonly used type of ECC is one in which the attendant memory controller detects and corrects single-bit errors in an accessed data word (double-bit errors can be detected but not corrected). This type of ECC is known as single-bit error correction, double-bit error detection (SEC-DED) and requires an additional 7 check bits over 32 bits in a 4-byte system and an additional 8 check bits over 64 bits in an 8-byte system. Consequently, you can use parity-checked (36-bit SIMM or 72-bit DIMM) memory in any system that supports ECC memory (as most recent servers do), and the system will use the parity bits for ECC mode. RIMMs are installed in singles or pairs, depending on the chipset and motherboard. They must be 18-bit or 36-bit versions if parity/ECC is desired.

ECC entails the memory controller calculating the check bits on a memory-write operation, comparing the stored and recalculated check bits on a read operation, and, if necessary, correcting bad bits. ECC has a slight effect on memory performance: writes must wait for the check bits to be calculated, and reads must wait whenever data is being corrected. On a partial-word write, the entire word must first be read, the affected byte(s) rewritten, and then new check bits calculated. This turns partial-word write operations into slower read-modify-write operations. Fortunately, this performance hit is very small, on the order of a few percent at maximum, so the tradeoff for increased reliability is good.
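To make the SEC-DED mechanism concrete, here is a toy Hamming-style encoder/decoder for a single byte: 8 data bits protected by 4 check bits plus 1 overall parity bit. Real chipsets apply the same idea to 64-bit words with 8 check bits; this is a from-scratch teaching sketch in Python, not an implementation of any controller's logic.

def secded_encode(byte):
    """Encode 8 data bits as a 13-bit SEC-DED codeword (list indexes 1-12,
    plus an overall parity bit in index 0)."""
    word = [0] * 13
    data_positions = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-2 positions
    for i, pos in enumerate(data_positions):
        word[pos] = (byte >> i) & 1
    for p in (1, 2, 4, 8):                          # check bits at powers of 2
        for pos in range(1, 13):
            if pos != p and pos & p:
                word[p] ^= word[pos]
    for pos in range(1, 13):                        # overall parity bit enables
        word[0] ^= word[pos]                        # double-error detection
    return word

def secded_decode(word):
    """Return (byte, status); corrects single-bit errors, flags double-bit errors."""
    syndrome = 0
    for p in (1, 2, 4, 8):
        s = 0
        for pos in range(1, 13):
            if pos & p:
                s ^= word[pos]
        if s:
            syndrome |= p
    overall = 0
    for bit in word:
        overall ^= bit
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:                  # parity mismatch: single-bit error;
        word = word[:]                  # syndrome 0 means the overall bit itself
        word[syndrome] ^= 1
        status = "corrected"
    else:                               # syndrome set but overall parity OK
        return None, "double-bit error detected"
    byte = 0
    for i, pos in enumerate([3, 5, 6, 7, 9, 10, 11, 12]):
        byte |= word[pos] << i
    return byte, status

codeword = secded_encode(0xB3)
codeword[6] ^= 1                        # inject a single-bit error
print(secded_decode(codeword))          # (179, 'corrected'); 179 == 0xB3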

An ECC-based system is a good choice for servers, workstations, or mission-critical applications in which the cost of a potential memory error outweighs the additional memory and system cost required to correct it, and in which nothing should be allowed to detract from system reliability. If you value your data and use your system for important (to you) tasks, you want ECC memory, assuming, of course, that your system supports it. You should check the specifications for a new server or server motherboard to ensure that it supports ECC.

Advanced Error Correction Technologies

Although single-bit ECC is useful for entry-level servers with memory below 4GB, today's high-capacity servers (some of which have memory sizes up to 64GB) and higher memory module capacities (1GB and larger) need more effective error correction technologies.

Many recent servers support Advanced ECC (also known as ChipKill), which differs from standard ECC in its ability to correct up to 4-bit errors that take place within the same memory module. Early versions of Advanced ECC/ChipKill required special Advanced ECC memory modules, but current implementations support standard parity/ECC memory modules.

Another method used on high-end servers is hot-plug RAID memory. This technology uses five memory controllers to create a memory array, similar in concept to a RAID 5 disk array. Four of the controllers store memory data in a striped fashion, while the fifth controller stores parity information. If memory connected to one of the memory controllers used for data fails, it can be removed and replaced without taking down the server. The contents of the original memory are rebuilt from the striped data and parity information in the other modules.

Memory scrubbing is another technique many recent servers use. Memory scrubbing tests memory during idle periods for errors and, if possible, corrects them. If correction is not possible, the system informs the operator which memory module has failed.

If you are building or purchasing a midrange or high-end server, you should find out which types of advanced memory error correction technologies are used and choose hardware that provides the best combination of performance and reliability for your needs.



