Cache Manipulation

Several mechanisms have been put into place to squeeze optimal throughput from the processor. One method of cache manipulation, discussed in Chapter 10, "Branching," is Intel's branch hint, which tells the processor the expected logic flow through a branch when it runs counter to the static prediction logic. Another mechanism is a hint about cache behavior, which gives the processor insight into how a particular piece of code utilizes memory. Here is a brief review of some terms that have already been discussed:

  • Temporal data: Memory that will be accessed multiple times and therefore needs to be loaded into a cache for better throughput.

  • Non-temporal hint: An indicator to the processor that memory only requires a single access (one shot), such as copying a block of memory or storing the result of a calculation that will not be needed again for a while, so there is no point in writing it into the cache. Since the memory access does not have to read and load a cache line, the code can be faster (see the sketch following this list).
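
For example, here is a minimal sketch (assuming SSE2 support and hypothetical labels: esi = source, edi = destination, dwordCount = number of dwords) of a block copy whose destination is written with a non-temporal store so the copied data does not displace anything already in the cache:

 mov   ecx,dwordCount       ; hypothetical count of dwords to copy
CopyLoop:
 mov   eax,[esi]            ; normal (cached) read of the source
 movnti [edi],eax           ; non-temporal write bypasses the cache
 add   esi,4
 add   edi,4
 dec   ecx
 jnz   CopyLoop
 sfence                     ; make the non-temporal stores globally visible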

For speed and efficiency, when memory is accessed for read or write, a cache line containing that data (whose length is dependent upon manufacturer and model) is copied from system memory into high-speed cache memory, and the processor performs its read/write operations on that cache memory. When a modified cache line is evicted or invalidated, the line must be copied back to system memory; this second stage is called a "write back." In a multiprocessor system this occurs frequently, because the processors do not share their internal caches.
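
As a minimal sketch of forcing a write back explicitly (assuming SSE2-class support for the clflush and mfence instructions, and a hypothetical pointer variable bufPtr):

 mov   eax,bufPtr           ; hypothetical pointer to a buffer
 mov   dword ptr [eax],1234 ; the store marks the cache line dirty
 clflush byte ptr [eax]     ; write the dirty line back to system memory and invalidate it
 mfence                     ; ensure the flush completes before later memory accesses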

Cache Sizes

Different processors have different cache sizes for data and for code. These are dependent upon processor model, manufacturer, etc., as shown below:

CPU          L1 Cache (Data/Code)    L2 Cache
Celeron      16 KB / 16 KB           256 KB
Pentium 4    8 KB / 12K µops         512 KB
Athlon XP    64 KB / 64 KB           256 KB
Duron        64 KB / 64 KB           64 KB
Pentium M    32 KB / 32 KB           1024 KB
Xeon                                 512 KB

Depending on your code and level of optimization, the size of the cache may be of importance. For the purposes of this book, however, it is being ignored, as that topic is more suitable for a book specifically targeting heavy-duty optimization. This book is instead interested in the cache line size, as that falls under the lightweight optimization that has been touched on from time to time. It should be noted that AMD uses a minimum cache line size of 32 bytes.

Cache Line Sizes

The (code/data) cache line size determines how many instruction/data bytes can be preloaded.

Intel        Cache Line Size (bytes)
PIII         32
Pentium M    64
P4           64
Xeon         64

AMD          Cache Line Size (bytes)
Athlon       64
Opteron      64

The cache line size can be obtained by using the CPUID instruction with EAX set to 1. Bits 15:8 of EBX report the line size as a count of quadwords (8-byte units), so the value is masked and then shifted right by 8 and left by 3 (a net shift right of 5) to convert it into bytes. The following calculation will give you the actual cache line size.

 mov   eax,1
 cpuid
 and   ebx,00000FF00h       ; isolate bits 15:8 (line size in quadwords)
 shr   ebx,8-3              ; ebx = size of cache line in bytes
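
Note that CPUID clobbers EBX, and the field above is only defined when the processor reports the CLFLUSH feature (bit 19 of EDX from the same CPUID call). Here is a minimal sketch that wraps the calculation in a procedure (the name GetCacheLineSize is hypothetical), preserves EBX, and returns the line size in EAX, or zero when it is not reported:

GetCacheLineSize PROC
 push  ebx                  ; cpuid clobbers ebx
 mov   eax,1
 cpuid
 xor   eax,eax              ; default return value: 0 (not reported)
 test  edx,00080000h        ; CLFLUSH feature bit (bit 19) set?
 jz    Done
 and   ebx,00000FF00h       ; bits 15:8 = line size in quadwords
 shr   ebx,8-3              ; convert quadwords to bytes
 mov   eax,ebx
Done:
 pop   ebx
 ret
GetCacheLineSize ENDP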

PREFETCHx Prefetch Data into Caches

Mnemonic    P    PII    K6    3D!    3Mx+    SSE    SSE2    A64    SSE3    E64T
PREFETCH
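
As a brief illustration of how a prefetch hint is typically used (a sketch assuming SSE support, a 64-byte cache line, and hypothetical labels arrayPtr and dwordCount), the following loop sums an array of dwords while requesting data a few cache lines ahead with a non-temporal hint so the streamed data does not displace other cached data:

 mov   esi,arrayPtr         ; hypothetical pointer to the array
 mov   ecx,dwordCount       ; hypothetical count of dwords
 xor   eax,eax              ; running sum
SumLoop:
 prefetchnta [esi+256]      ; request data four cache lines ahead
 add   eax,[esi]
 add   esi,4
 dec   ecx
 jnz   SumLoop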


