A.4 Main Memory

     

We have discussed at length the performance of SRAM in relation to DRAM. Main memory is a collection of DRAM cells. If we were to draw a simple diagram of the relationship between processor, cache, and memory, it would look something like Figure A-9.

Figure A-9. Simple processor model with cache.



With both instructions and data stored in main memory, we will eventually reach a point where pipelined instructions need to access memory. Were we to have a single pool of memory, as in Figure A-9, we would quickly run into memory contention, with both data and instructions being fetched over the same data path. The result would be that our pipeline stalls for longer than we would like. One solution is to split the pool of memory into multiple blocks with independent paths. This is a common practice because it is not overly expensive, and it increases overall memory bandwidth by providing multiple locations in which to store instructions and data. The blocks of memory are known as memory banks, and the practice of using the banks simultaneously is known as interleaving. The idea here is that when the memory subsystem (the circuitry between the cache and memory) needs to transfer a cache line from main memory, it goes to a specific memory bank to retrieve it. Loading the very next cache line requires the memory subsystem to go to the next memory bank; hence, we interleave between memory banks and reduce overall access time by avoiding memory contention. Figure A-10 shows a memory subsystem with four memory banks (we have left off an L2 cache for simplicity).

Figure A-10. A cache with interleaved main memory.
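To make the interleaving scheme concrete, here is a minimal sketch (our own illustration, not from the original text) of how a memory subsystem might select a bank for each cache-line fetch. The 32-byte line size, the four banks, and the bank_for_address() helper are assumptions made for this example; in real hardware this selection is performed by the memory controller, not by software.

#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE 32   /* bytes per cache line (assumed)     */
#define NUM_BANKS        4   /* interleaved memory banks (assumed) */

/* Consecutive cache lines map to consecutive banks, so back-to-back
   line fetches can proceed in parallel instead of queuing on one bank. */
static unsigned bank_for_address(uintptr_t addr)
{
    uintptr_t line = addr / CACHE_LINE_SIZE;  /* which cache line       */
    return (unsigned)(line % NUM_BANKS);      /* round-robin bank index */
}

int main(void)
{
    /* Four sequential cache lines land in four different banks. */
    for (uintptr_t addr = 0; addr < 4 * CACHE_LINE_SIZE; addr += CACHE_LINE_SIZE)
        printf("line at offset 0x%03lx -> bank %u\n",
               (unsigned long)addr, bank_for_address(addr));
    return 0;
}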

The reason we have gone to this level of detail in showing you memory banks and the cache line size is to reiterate the influence that programmers can have on overall system performance. Let's assume that we have data elements of varying size, but each is a multiple of the 4-byte word size; with a 32-byte cache line, a single cache line can accommodate eight words. Our program has used a structure or array to store these data elements. Unfortunately, two things have conspired against us. First, the design of our structure/array looks like this:

 

#include <stdint.h>                     /* uint32_t: one 4-byte word */

struct employee {
    uint32_t name[3];                   /* 3 words */
    uint32_t employee_number;           /* 1 word  */
    uint32_t age;                       /* 1 word  */
    uint32_t address[6];                /* 6 words */
    uint32_t marital_status;            /* 1 word  */
    uint32_t job_code[2];               /* 2 words */
    uint32_t job_title[3];              /* 3 words */
    uint32_t job_location[3];           /* 3 words */
    uint32_t department[2];             /* 2 words */
    uint32_t manager;                   /* 1 word  */
    uint32_t performance_rank;          /* 1 word  */
    uint32_t salary[2];                 /* 2 words */
    uint32_t salary_bank_sortcode[2];   /* 2 words */
    uint32_t salary_bank_account[2];    /* 2 words */
    uint32_t social_security_number[2]; /* 2 words */
};                                      /* 32 words in total */
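As a quick sanity check (our own addition, assuming the 4-byte word and 32-byte cache line above), the size of the structure can be verified at compile time:

#include <assert.h>

/* 32 words x 4 bytes = 128 bytes = four 32-byte cache lines */
static_assert(sizeof(struct employee) == 128,
              "employee should span exactly four cache lines");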

We will assume that storage of these data elements is sequential in memory and that we can pack words together one after another. In total, we have a structure that is 32 words in size. The second thing conspiring against us is that we have four memory banks. If we were to load an entire employee structure from memory, it would take up four cache lines. Suppose that we wanted to look up all employee_numbers: the employee_number is stored in cache lines 1, 5, 9, and so on. Data access patterns like this highlight what is known as bank contention. Every access to a subsequent employee_number means that we are fetching the next data item from the same memory bank. In this way, we are not utilizing the benefits that interleaving promised us. We would need to conduct a detailed study to uncover whether this form of access pattern was common. If so, we would have to modify our data model to alleviate it; we would need further empirical studies to measure the response times of our applications under varying workloads and varying data models before making a decision on how to reengineer our application, a task that most people are not willing to undertake.
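The contention is easy to demonstrate. The sketch below is our own illustration under the assumed 32-byte lines and four banks; the condensed employee layout and the array size are hypothetical. It walks an array of records and computes which bank each employee_number falls in; every iteration maps to the same bank, which is exactly the pattern described above.

#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE 32
#define NUM_BANKS        4

/* Condensed stand-in for the 32-word employee structure above:
   employee_number still sits at word offset 3 within the record. */
struct employee {
    uint32_t name[3];
    uint32_t employee_number;
    uint32_t rest[28];       /* the remaining 28 words of the record */
};

int main(void)
{
    struct employee staff[8];          /* a small, illustrative array */

    for (int i = 0; i < 8; i++) {
        uintptr_t offset = (uintptr_t)&staff[i].employee_number
                         - (uintptr_t)staff;
        uintptr_t line = offset / CACHE_LINE_SIZE;
        /* Each record spans 4 cache lines, so the stride between
           successive employee_numbers is 4 lines: 0, 4, 8, ...
           With 4 banks, line % 4 is always 0 -- one bank serves
           every fetch while the other three sit idle. */
        printf("employee %d: cache line %lu, bank %lu\n",
               i, (unsigned long)line, (unsigned long)(line % NUM_BANKS));
    }
    return 0;
}

One data-model change that would spread these accesses across the banks is to pull the frequently scanned fields out into their own densely packed array (a structure-of-arrays layout), but as noted above, such reengineering should be justified by measurement first.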


