Memory Systems

Another useful development check is to verify that the memory manager used by your compiled code really is returning memory that is properly aligned for the single instruction, multiple data (SIMD) instruction set extensions you target. You can test this by running a simple algorithm such as the following one.

RamTest Memory Alignment Test

Listing 2-2: ...\chap02\ramTest\Bench.cpp
    // Supporting declarations assumed for this excerpt
    #include <iostream>
    #include <new>
    #include <cstdint>
    using namespace std;

    typedef unsigned char byte;

    #define RAM_TBL_SIZE       4096

    int main(int argc, char *argv[])
    {
      unsigned int n, nCnt, bSet, bTst, bMsk, a, b;
      void *pTbl[RAM_TBL_SIZE];

              // Allocate a series of incr. size blocks

      for (n=0; n<RAM_TBL_SIZE; n++)
      {
      //pTbl[n]=(byte *) malloc(n+1);
        pTbl[n]= new (nothrow) byte[n+1];  // nothrow so NULL test is valid
        if (NULL==pTbl[n])
        {
          cout << "low memory.   (continuing)..." << endl;
          break;
        }
      }
      nCnt = n;                // # of entries allocated

              //   Test memory for alignments

      bSet = 16;               // Preset to 128-bit (16 bytes)
      bTst = bSet - 1;
      bMsk = ~bTst;

      for (n=0; n<nCnt; n++)
      {
          // Only the low address bits matter for the alignment test
          a = (unsigned int)(uintptr_t) pTbl[n];
          do                  // round up to 'bSet' bits
          {
            b = (a + bTst) & bMsk;
            if (a==b)
            {
               break;             // okay
            }
                                  // Unaligned...
            bSet >>= 1;           // reduce by a bit
            bTst = bSet - 1;
            bMsk = ~0 ^ bTst;
          } while(1);
      }

            // Release all of memory to clean up

      for (n=0; n<nCnt; n++)
      {
        byte *pRaw;
        pRaw = (byte *) pTbl[n];
        pTbl[n] = NULL;

        delete [] pRaw;   //free(pTbl[n]);
      }

      cout << "Ram Alignment is set to " << bSet;
      cout << " bytes (" << (bSet<<3) << " bits).\n";
      cout << flush;
      return 0;
    }

Note that the test loops up to 4,096 times, slowly increasing the size of the allocation, in case the memory manager issues properly aligned memory for a short period of time before allocating blocks that are skewed. You might also get a low-memory message, but that is okay; the test only allocates about 8 MB in total. If everything is fine, the reported alignment will match the entry for your processor in the following table.
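The per-pointer check inside that loop can also be factored into a small helper. The following is only a sketch: the names pointerAlignment and isAligned are hypothetical (they are not part of the book's sources), and it assumes a compiler that provides std::uintptr_t so that the address is not truncated on 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: largest power-of-two alignment (in bytes) that the
// address satisfies, capped at maxAlign (which must be a power of two).
std::size_t pointerAlignment(const void *p, std::size_t maxAlign = 16)
{
    assert(maxAlign != 0 && (maxAlign & (maxAlign - 1)) == 0);

    // uintptr_t holds a full pointer value on both 32- and 64-bit targets.
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);

    std::size_t align = maxAlign;
    while (align > 1 && (addr & (align - 1)) != 0)
    {
        align >>= 1;   // drop to the next smaller power of two
    }
    return align;
}

// Hypothetical helper: true when p is aligned to 'align' bytes
// (align must be a power of two).
bool isAligned(const void *p, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}
```

Taking the smallest pointerAlignment() result over every block in the table gives the same figure that the test program prints.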

Table 2-1: SIMD instruction set with data width in bits and bytes

SIMD Instruction Set (Data Width)    Bits      Bytes
---------------------------------    ------    -----
AMD 3D Now!                          64        8
AMD 3D Now! Extensions               64        8
AMD 3D Now! MMX Extensions           64        8
AMD 3D Now! Professional             64/128    8/16
MMX                                  64        8
SSE                                  128       16
SSE2                                 128       16
SSE3                                 128       16

If there is a mismatch, then you have an alignment problem. This can be rectified by using memory allocation code similar to that in Listing 2-4, a function designed to wrap the standard call to malloc() or new[]. Do not forget to add the assertion as a matter of good programming practice.

Memory Header

The following header is hidden at the true base of the memory allocated by our function. Basically, the memory is slightly overallocated to make room for it, and the malloc request is in essence routed through to the core allocation function.

Listing 2-3: \chap02\ram\ram.cpp
    typedef struct RamHeadType
    {
      uint32  nReqSize;   // Requested size
      uint32  extra[3];   // Padding to help align to 16 byte
    } RamHead;

Allocate Memory (Malloc Wrapper)

Listing 2-4: \chap02\ram\ram.cpp
    void * ramAlloc(uint nReqSize)
    {
      byte *pMem;
      RamHead *pHead;
      uint nSize;

      ASSERT_ZERO(nReqSize);

            // Force to 16-byte block + room for header

      nSize = ALIGN16(nReqSize) + sizeof(RamHead);

    //pMem = (byte*)malloc(nSize);
      pMem = new byte[ nSize ];
      pHead = (RamHead *)pMem;

      if (NULL==pMem)
      {            //   Allocation error
      }
      else
      {            // Save Req Size (+ header size, so always >= 16)
          pHead->nReqSize = nReqSize + sizeof(RamHead);
          pHead->extra[0] = 1;
          pHead->extra[1] = 2;
          pHead->extra[2] = 3;

          // Align by adj header +4 to +16 bytes

          pMem = (byte *) ALIGN16(((uint)pMem) + sizeof(uint32));
      }

      return (void*)pMem;
    }

This function works by rounding the amount of memory requested up to the next 16-byte boundary, which helps keep memory in 16-byte blocks. An additional 16 bytes are allocated for the header. This is useful for two reasons:

  • The memory passed to the calling function can be forced to the proper alignment.

  • A side benefit of storing the requested size is that size adjustments similar to a realloc() can be issued and the calling function does not have to know what the current size is when releasing that memory back to the pool.
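The size calculation described above can be sketched as follows. The ALIGN16 macro itself is not shown in this section, so align16() here is only an assumed equivalent (round up to a multiple of 16), and kHeadSize and paddedSize() are hypothetical names introduced for illustration.

```cpp
#include <cstddef>

// Assumed equivalent of the ALIGN16 macro: round n up to a multiple of 16.
constexpr std::size_t align16(std::size_t n)
{
    return (n + 15) & ~static_cast<std::size_t>(15);
}

// Assumed header size: four 32-bit words, matching RamHead.
constexpr std::size_t kHeadSize = 16;

// Total number of bytes requested from the underlying allocator for a
// caller request of nReqSize bytes.
constexpr std::size_t paddedSize(std::size_t nReqSize)
{
    return align16(nReqSize) + kHeadSize;
}

// A 100-byte request is rounded to 112 bytes, plus 16 for the header.
static_assert(paddedSize(100) == 128, "112-byte block + 16-byte header");
```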

Hidden at the beginning of the allocated memory is a header in which the first 32-bit word stores the requested size (plus the size of the header itself, so the stored value is never less than 16) and the other three words are set to the values {1, 2, 3}. The pointer is then advanced to the next 16-byte boundary and passed to the calling function.
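To see why the values {1, 2, 3} are useful, it helps to work through the arithmetic for each possible base alignment. The sketch below assumes the base address is at least 4-byte aligned and that the adjustment is "round (base + 4) up to a multiple of 16", as in Listing 2-4; userOffset() and wordBelow() are hypothetical names.

```cpp
#include <cstdint>

// Offset in bytes from the true base to the pointer handed to the caller,
// assuming the adjustment is ALIGN16(base + 4).
constexpr std::uintptr_t userOffset(std::uintptr_t base)
{
    return (((base + 4) + 15) & ~static_cast<std::uintptr_t>(15)) - base;
}

// Index of the 32-bit header word that sits just below the returned pointer:
// 0 = size word, 1..3 = extra[0..2], which hold the values 1, 2, 3.
constexpr unsigned wordBelow(std::uintptr_t base)
{
    return static_cast<unsigned>(userOffset(base) / 4 - 1);
}

//  base mod 16 | offset | word below the returned pointer
//  ------------+--------+--------------------------------
//       0      |   16   | extra[2] == 3
//       4      |   12   | extra[1] == 2
//       8      |    8   | extra[0] == 1
//      12      |    4   | size word (always >= 16)
static_assert(userOffset(0x1000) == 16 && wordBelow(0x1000) == 3, "");
static_assert(userOffset(0x1004) == 12 && wordBelow(0x1004) == 2, "");
static_assert(userOffset(0x1008) == 8  && wordBelow(0x1008) == 1, "");
static_assert(userOffset(0x100C) == 4  && wordBelow(0x100C) == 0, "");
```

In every case the word just below the returned pointer is either a small value giving the number of words back to the base, or the size word itself, which marks the base.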

This may seem wasteful, but the base address of the memory being allocated by new or malloc is unknown. With current malloc libraries it tends to be 4- or 8-byte aligned, so there is a need to allocate for the worst case.

When releasing memory back to the system, the returned pointer needs to be adjusted back to the true base address; otherwise a memory exception will occur. The following function wraps the release function free().

Release Memory (Free Wrapper)

Listing 2-5: ...\chap02\ram\ram.cpp
    void ramFree(const void * const pRaw)
    {
      uint32 *pMem;
      byte   *pbMem;

      ASSERT_PTR4(pRaw);
      ASSERT_PTR(*pRaw);

      pMem = (uint32 *)pRaw;
      if (*(--pMem)< sizeof(RamHead))
      {
        pMem -= *pMem;
      }
        // pMem original (unadjusted) pointer
      pbMem = (byte *)pMem;
    // free(pbMem);
      delete [] pbMem;
    }

The release works by first decrementing the word pointer by one 4-byte word. If that location contains a value between one and three, the pointer is decremented by that many words so that it points at the size information and can be cast to a RamHead pointer. This is the true memory base position and the pointer that gets passed to the system function free().
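The whole round trip can be sketched in a portable form. This is not the book's ram.cpp: it is a minimal illustration that assumes malloc() returns memory that is at least 4-byte aligned, uses std::uintptr_t instead of a 32-bit cast so it also works on 64-bit targets, and uses the hypothetical names sketchAlloc, sketchBase, and sketchFree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

struct RamHead
{
    std::uint32_t nReqSize;   // requested size + sizeof(RamHead), so >= 16
    std::uint32_t extra[3];   // {1, 2, 3}: distance in words back to nReqSize
};

// Allocate nReqSize bytes and return a 16-byte aligned pointer.
void *sketchAlloc(std::uint32_t nReqSize)
{
    assert(nReqSize != 0);

    // Round the request up to a 16-byte block and add room for the header.
    const std::size_t nSize = ((nReqSize + 15u) & ~15u) + sizeof(RamHead);
    unsigned char *pBase = static_cast<unsigned char *>(std::malloc(nSize));
    if (pBase == nullptr)
    {
        return nullptr;
    }

    RamHead *pHead = reinterpret_cast<RamHead *>(pBase);
    pHead->nReqSize = nReqSize + static_cast<std::uint32_t>(sizeof(RamHead));
    pHead->extra[0] = 1;
    pHead->extra[1] = 2;
    pHead->extra[2] = 3;

    // Skip at least the size word, then round up to a 16-byte boundary.
    std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(pBase) + sizeof(std::uint32_t);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<void *>(addr);
}

// Recover the true base address from a pointer returned by sketchAlloc().
void *sketchBase(void *pUser)
{
    std::uint32_t *pWord = static_cast<std::uint32_t *>(pUser);
    --pWord;                          // word just below the user pointer
    if (*pWord < sizeof(RamHead))     // 1, 2 or 3: not yet at the size word
    {
        pWord -= *pWord;              // step back that many words
    }
    return pWord;
}

// Release memory obtained from sketchAlloc().
void sketchFree(void *pUser)
{
    assert(pUser != nullptr);
    std::free(sketchBase(pUser));
}
```

Because the stored size always includes the 16-byte header, it can never be mistaken for one of the small step values.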

For C++ fans, the new and delete operators can be overloaded to route through this insulating memory module. I also recommend one final item: The memory allocation and release functions should require a pointer to the caller's pointer to be passed. This allows the release function to nullify the pointer, and in future enhancements each pointer could be considered the "owner" of the memory and thus be adjusted by any garbage collection algorithm instituted for heap compaction in a flat memory environment.
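As a rough illustration of the overloading idea, a class can supply its own operator new and operator delete that forward to an aligned allocator. In the sketch below, ramAllocStub() and ramFreeStub() are hypothetical stand-ins for the chapter's functions; to keep the example short they use a simpler scheme than RamHead and store the raw malloc() pointer just below the aligned block. Vec4 is likewise a made-up type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in allocator: returns a 16-byte aligned block and remembers the raw
// malloc() pointer in the pointer-sized slot immediately below it.
void *ramAllocStub(std::size_t n)
{
    void *raw = std::malloc(n + 16 + sizeof(void *));
    if (raw == nullptr)
    {
        return nullptr;
    }
    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    a = (a + 15) & ~static_cast<std::uintptr_t>(15);
    reinterpret_cast<void **>(a)[-1] = raw;
    return reinterpret_cast<void *>(a);
}

// Stand-in release: fetch the raw pointer back and free it.
void ramFreeStub(void *p)
{
    if (p != nullptr)
    {
        std::free(static_cast<void **>(p)[-1]);
    }
}

// A type whose single-object heap allocations go through the stand-ins.
struct Vec4
{
    float x, y, z, w;

    static void *operator new(std::size_t n)
    {
        void *p = ramAllocStub(n);
        if (p == nullptr)
        {
            throw std::bad_alloc();
        }
        return p;
    }

    static void operator delete(void *p) noexcept
    {
        ramFreeStub(p);
    }
};
```

With this in place, `new Vec4()` yields a 16-byte aligned object and `delete` returns the original block, without the calling code changing at all.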

Allocate Memory

The caller passes the address of its pointer as ppMem, and the function sets that pointer to the newly allocated memory.

Listing 2-6: ...\chap02\ram\ram.cpp
    bool ramGet(byte ** const ppMem, uint nReqSize)
    {
      ASSERT_PTR4(ppMem);

      *ppMem = (byte *) ramAlloc(nReqSize);
      return (NULL!=*ppMem) ? true : false;
    }

Allocate (Cleared) Memory

Listing 2-7: ...\chap02\ram\ram.cpp
    bool ramGetClr(byte **const ppMem, uint nReqSize)
    {
      bool ret;

      ASSERT_PTR4(ppMem);

      ret = false;
      *ppMem = (byte *)ramAlloc(nReqSize);
      if (NULL!=*ppMem)
      {
        ramZero(*ppMem, nReqSize);
        ret = true;
      }

      return ret;
    }

Free Memory (Pointer Is Set to NULL)

Listing 2-8: ...\chap02\ram\ram.cpp
    void ramRelease(byte ** const ppMem)
    {
      ASSERT_PTR4(ppMem);

      ramFree(*ppMem);
      *ppMem = NULL;
    }
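The benefit of passing the address of the pointer shows up at the call site: after the release, the owner's pointer cannot be used by accident. The sketch below uses hypothetical stand-ins, getStub() and releaseStub(), built directly on malloc() and free() so that it is self-contained; they only mimic the calling convention of ramGet() and ramRelease(), not their alignment behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

typedef unsigned char byte;

// Stand-in for ramGet(): sets the caller's pointer, reports success.
bool getStub(byte **const ppMem, std::size_t nReqSize)
{
    assert(ppMem != nullptr);
    *ppMem = static_cast<byte *>(std::malloc(nReqSize));
    return *ppMem != nullptr;
}

// Stand-in for ramRelease(): frees the block and nullifies the owner.
void releaseStub(byte **const ppMem)
{
    assert(ppMem != nullptr);
    std::free(*ppMem);      // free(NULL) is a harmless no-op
    *ppMem = nullptr;       // the owner can no longer dangle
}

// Typical call pattern: the local pointer is the "owner" of the block.
bool useBuffer()
{
    byte *pBuf = nullptr;
    if (!getStub(&pBuf, 64))
    {
        return false;
    }
    pBuf[0] = 0x7F;         // ... work with the buffer ...
    releaseStub(&pBuf);
    return pBuf == nullptr; // true: the pointer was nullified
}
```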
 


32/64-Bit 80x86 Assembly Language Architecture
ISBN: 1598220020
Year: 2003
Pages: 191
