|  We have discussed at length the performance of SRAM in relation to DRAM. Main memory is a collection of DRAM cells. If we were to draw a simple diagram of the relationship between processor, cache, and memory, it would look something like Figure A-9.

Figure A-9. Simple processor model with cache.
With both instructions and data stored in main memory, we will come to a point where pipelined instructions need to access memory. Were we to have a single pool of memory as in Figure A-9, we would quickly run into memory contention, where data and instruction fetches compete for the same data path. The result would be that our pipeline stalls for longer than we would like. One solution is to split the pool of memory into multiple blocks of memory with independent paths. This is common practice because it is not overly expensive and it increases the overall memory bandwidth by providing multiple locations from which instructions and data can be fetched at the same time. The blocks of memory are known as memory banks, and the process of using the banks simultaneously is known as interleaving. The idea is that when the memory subsystem (the circuitry between the cache and memory) needs to transfer a cache line from main memory, it goes to a specific memory bank to retrieve it. Loading the very next cache line requires the memory subsystem to go to the very next memory bank; hence, we interleave between memory banks and reduce the overall access time by avoiding memory contention. Figure A-10 shows a memory subsystem with four memory banks (we have left off an L2 cache for simplicity).

Figure A-10. A cache with interleaved main memory.
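The placement of consecutive cache lines on consecutive banks can be sketched in C as follows. This is a minimal illustration only: it assumes cache lines are numbered from 0 and assigned to banks in simple round-robin order, and the function name bank_for_line and its parameters are hypothetical names chosen for this sketch. A real memory subsystem makes this selection in hardware.

```c
#include <assert.h>

/* Hypothetical helper: returns the bank that holds a given cache line
   when consecutive cache lines are spread round-robin over num_banks
   banks. Cache lines and banks are both numbered from 0 here. */
unsigned bank_for_line(unsigned long line_index, unsigned num_banks)
{
    assert(num_banks > 0); /* guard against division by zero */
    return (unsigned)(line_index % num_banks);
}
```

With the four banks of Figure A-10, cache lines 0, 1, 2, and 3 map to four different banks, and cache line 4 wraps around to the same bank as cache line 0.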
The reason we have gone to the level of detail of showing you memory banks and the cache line size is to reiterate the influence that programmers can have on overall system performance. Let's assume that we have data elements of varying size, all of which are a multiple of the word size, which is 32 bits. A single cache line can accommodate eight words. Our program uses a structure or array to store these data elements. Unfortunately, two things have conspired against us. First, the design of our structure/array is like this:
struct employee {
    name;                      (= 3 words)
    employee_number;           (= 1 word)
    age;                       (= 1 word)
    address;                   (= 6 words)
    marital_status;            (= 1 word)
    job_code;                  (= 2 words)
    job_title;                 (= 3 words)
    job_location;              (= 3 words)
    department;                (= 2 words)
    manager;                   (= 3 words)
    performance_rank;          (= 3 words)
    salary;                    (= 2 words)
    salary_bank_sortcode;      (= 2 words)
    salary_bank_account;       (= 2 words)
    social_security_number;    (= 2 words)
};
We will assume that storage of these data elements is sequential in memory and that we can pack words together one after another. In total, we have a structure that is 32 words in size. The second thing conspiring against us is that we have four memory banks. If we were to load an entire employee structure from memory, it would take up four cache lines, one from each bank, so every structure starts in the same bank. Suppose that we wanted to look up all employee numbers: the employee_number is stored in cache lines 1, 5, 9, and so on. Data access patterns like this highlight what is known as bank contention. Every access to a subsequent employee_number means that we are fetching the next data item from the same memory bank. In this way, we are not realizing the benefits that interleaving had promised us. We would need to undertake a detailed study to uncover whether this form of access pattern was common. If so, we would have to modify our data model to alleviate it; we would need further empirical studies to measure the response times of our applications under varying workloads and varying data models before deciding how to reengineer our application, a task that most people are not willing to undertake.  |
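The access pattern described above can be checked with a little arithmetic. The sketch below takes the figures stated in the text (eight words per cache line, four memory banks, a 32-word employee structure stored back to back) and assumes that employee_number is the fourth word of the structure and that cache lines are placed on banks in round-robin order. The constant and function names are hypothetical. The function counts how many cache-line fetches land on each bank when a run of one-word items is read in ascending address order.

```c
#include <assert.h>
#include <stddef.h>

#define WORDS_PER_LINE 8u /* from the text: eight words per cache line */
#define NUM_BANKS 4u      /* from the text: four memory banks */

/* Hypothetical model: read `count` one-word items, the first at word
   offset `first_word` and each following one `stride_words` later.
   A fetch is counted whenever an item falls in a different cache line
   from the previous item; fetches[b] receives the total for bank b. */
void fetches_per_bank(size_t first_word, size_t stride_words, size_t count,
                      unsigned fetches[NUM_BANKS])
{
    size_t last_line = (size_t)-1; /* sentinel: no line fetched yet */

    assert(fetches != NULL);
    for (unsigned b = 0; b < NUM_BANKS; b++)
        fetches[b] = 0;

    for (size_t i = 0; i < count; i++) {
        size_t word = first_word + i * stride_words;
        size_t line = word / WORDS_PER_LINE; /* cache line, from 0 */
        if (line != last_line) {
            fetches[line % NUM_BANKS]++;     /* round-robin bank choice */
            last_line = line;
        }
    }
}
```

Calling fetches_per_bank(3, 32, 64, fetches) models 64 employee_number lookups in the structure layout: all 64 fetches fall on the first bank. Calling fetches_per_bank(0, 1, 64, fetches) models the same 64 numbers packed one word apart in an array of their own: they occupy eight cache lines, two on each bank. Keeping the numbers in a separate array is only one possible change to the data model, shown here for comparison rather than as a recommendation.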