Several mechanisms have been put into place to squeeze optimal throughput from the processors. One, discussed in Chapter 10, "Branching," is Intel's branch hint, which lets code steer the prediction of logic flow through a branch counter to the static prediction logic. Another is a hint to the processor about cache behavior, which gives the processor insight into how a particular piece of code accesses memory. Here is a brief review of some terms that have already been discussed:
Temporal data: Memory that is accessed multiple times and therefore needs to be loaded into a cache for better throughput.
Non-temporal hint: A hint (an indicator) to the processor that memory requires only a single access (one shot). Typical cases are copying a block of memory, or performing a calculation whose result is not going to be needed for a while, so there is no need to write it into the cache. Because the memory access does not need to load the cache, the code can be faster.
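As a rough illustration of a non-temporal hint, the following C sketch copies a block using the SSE2 streaming-store intrinsic `_mm_stream_si128`. The helper name `copy_nontemporal` and its restrictions (16-byte aligned destination, length a multiple of 16) are assumptions made for this example, and it falls back to `memcpy` where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in _mm_sfence) */
#endif

/* Hypothetical helper: copy `bytes` from src to dst while hinting that the
 * destination will not be reused soon. Assumes dst is 16-byte aligned and
 * bytes is a multiple of 16. */
void copy_nontemporal(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst & 15u) == 0 && (bytes & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; ++i) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary load */
        _mm_stream_si128(d + i, v);          /* non-temporal store */
    }
    _mm_sfence();  /* order the streamed stores before returning */
#else
    memcpy(dst, src, bytes);  /* fallback: plain copy, no hint */
#endif
}
```

Whether this is actually faster than an ordinary copy depends on the processor and on the size of the block, so it is worth measuring before relying on it.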
For speed and efficiency, when memory is read or written, the cache line containing that data (whose length depends on manufacturer and version) is copied from system memory to high-speed cache memory. The processor then performs its read/write operations on the cache memory. When a cache line is invalidated, it is written back to system memory; this second stage is called a "write back." In a multiprocessor system, this occurs frequently because the processors do not share their internal caches.
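Because memory moves in whole cache lines, it can help to see which line an address belongs to and how many lines a buffer touches. The following is a minimal sketch; the function names are hypothetical, and the line size is assumed to be a power of two such as 32 or 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address down to the start of the cache line that contains it.
 * line_size is assumed to be a power of two (e.g. 32 or 64). */
uintptr_t cache_line_base(uintptr_t addr, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    return addr & ~((uintptr_t)line_size - 1);
}

/* Count how many cache lines the byte range [addr, addr + len) touches. */
size_t cache_lines_touched(uintptr_t addr, size_t len, size_t line_size)
{
    if (len == 0)
        return 0;
    uintptr_t first = cache_line_base(addr, line_size);
    uintptr_t last  = cache_line_base(addr + len - 1, line_size);
    return (size_t)((last - first) / line_size) + 1;
}
```

For example, with 64-byte lines a 16-byte access starting at address 0x1234 straddles two lines (0x1200 and 0x1240), so both must be brought into the cache.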
Different processors have different cache sizes for data and for code. These are dependent upon processor model, manufacturer, etc., as shown below:
CPU | L1 Cache (Data/Code) | L2 Cache |
---|---|---|
Celeron | 16KB / 16KB | 256KB |
Pentium 4 | 8KB / 12K µops | 512KB |
Athlon XP | 64KB / 64KB | 256KB |
Duron | 64KB / 64KB | 64KB |
Pentium M | 32KB / 32KB | 1024KB |
Xeon | | 512KB |
Depending on your code and level of optimization, the size of the cache may be important. This book ignores it, however, as that topic is better suited to a book specifically targeting heavy-duty optimization. What this book is interested in is the cache line size, as that fits the lightweight optimization that has been touched on from time to time. It should be noted that AMD uses a minimum size of 32 bytes.
The (code/data) cache line size determines how many instruction/data bytes can be preloaded.
Intel | Cache Line Size (bytes) |
---|---|
PIII | 32 |
Pentium M | 64 |
P4 | 64 |
Xeon | 64 |
AMD | Cache Line Size (bytes) |
---|---|
Athlon | 64 |
Opteron | 64 |
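The cache line size can be obtained by using the CPUID instruction with EAX set to 1. On return, bits 8 through 15 of EBX hold the line size (the one used by CLFLUSH) in units of 8 bytes, so the following calculation masks that field and scales it to give the actual cache line size in bytes.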
    mov   eax,1
    cpuid
    and   ebx,00000FF00h
    shr   ebx,8-3          ; ebx = size of cache line
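For readers working in C, the same calculation as the assembly above can be sketched as follows. This assumes a GCC or Clang toolchain, whose `<cpuid.h>` header provides `__get_cpuid`; the function names `line_size_from_ebx` and `query_line_size` are hypothetical, and on other compilers or non-x86 targets the query simply returns 0.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* GCC/Clang header providing __get_cpuid */
#define HAVE_CPUID_H 1
#endif

/* Decode EBX from CPUID leaf 1: bits 15..8 hold the line size
 * in units of 8 bytes. */
unsigned line_size_from_ebx(uint32_t ebx)
{
    unsigned units = (ebx >> 8) & 0xFFu;
    unsigned bytes = units * 8u;
    /* Same result as the assembly's "and ebx,0FF00h / shr ebx,8-3". */
    assert(bytes == ((ebx & 0xFF00u) >> (8 - 3)));
    return bytes;
}

/* Returns the line size in bytes, or 0 if CPUID cannot be queried here. */
unsigned query_line_size(void)
{
#ifdef HAVE_CPUID_H
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return line_size_from_ebx(ebx);
#endif
    return 0;
}
```

Keeping the decode step separate from the query makes the bit arithmetic easy to check on its own: an EBX field value of 8 corresponds to a 64-byte line.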
Mnemonic | P | PII | K6 | 3D! | 3Mx+ | SSE | SSE2 | A64 | SSE3 | E64T |
---|---|---|---|---|---|---|---|---|---|---|
PREFETCH | | | | | | | | | | |
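From C, a prefetch is usually requested through a compiler facility rather than by writing the mnemonic directly. The sketch below uses `__builtin_prefetch`, a GCC/Clang builtin; which prefetch instruction it emits, if any, depends on the compiler and target. The function name `sum_with_prefetch` and the lookahead distance `PREFETCH_AHEAD` are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_AHEAD 16   /* assumed lookahead in elements; tune per CPU */

/* Sum an array while asking the processor to start loading data that
 * will be needed a few iterations from now. */
long sum_with_prefetch(const int *data, size_t count)
{
    assert(data != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_AHEAD < count) {
            /* rw = 0 (read), locality = 0 (no temporal locality) */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        }
#endif
        total += data[i];
    }
    return total;
}
```

A simple linear scan like this is often handled well by hardware prefetching on its own, so an explicit prefetch tends to pay off more with irregular access patterns; treat the loop as a demonstration of the mechanism rather than a guaranteed speedup.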