Memory Systems | 32/64-Bit 80x86 Assembly Language Architecture

Another useful development check is to verify that the memory manager behind your compiled code is in fact allocating memory that is properly aligned for the single instruction, multiple data (SIMD) instruction sets you intend to use. You can test this by running a simple program such as the following one.

RamTest Memory Alignment Test

Listing 2-2: ...\chap02\ramTest\Bench.cpp

 #define RAM_TBL_SIZE       4096

 int main(int argc, char *argv[])
 {
   unsigned int n, nCnt;
   uintptr_t bSet, bTst, bMsk, a, b;   // pointer-sized (<cstdint>) so 64-bit addresses are not truncated
   void *pTbl[RAM_TBL_SIZE];

       // Allocate a series of incr. size blocks

   for (n=0; n<RAM_TBL_SIZE; n++)
   {
   //pTbl[n]=(byte *) malloc(n+1);
     pTbl[n]= new byte[n+1];
     if (NULL==pTbl[n])
     {
       cout << "low memory.   (continuing)..." << endl;
       break;
     }
   }
   nCnt = n;                // # of entries allocated

       // Test memory for alignments

   bSet = 16;               // Preset to 128-bit (16-byte) alignment
   bTst = bSet - 1;
   bMsk = ~bTst;

   for (n=0; n<nCnt; n++)
   {
     a = (uintptr_t) pTbl[n];
     do                     // round up to 'bSet' bytes
     {
       b = (a + bTst) & bMsk;
       if (a==b)
       {
         break;             // okay
       }
                            // Unaligned...
       bSet >>= 1;          // halve the alignment (drop one address bit)
       bTst = bSet - 1;
       bMsk = ~bTst;
     } while(1);
   }

       // Release all of memory to clean up

   for (n=0; n<nCnt; n++)
   {
     byte *pRaw;
     pRaw = (byte *) pTbl[n];   // void* needs an explicit cast in C++
     pTbl[n] = NULL;

     delete [] pRaw;
   //free(pRaw);
   }

   cout << "Ram Alignment is set to " << bSet;
   cout << " bytes (" << (bSet<<3) << " bits).\n";
   cout << flush;
   return 0;
 }

Note that the loop runs up to 4096 times, slowly increasing the size of each allocation, in case the memory manager issues properly aligned memory for a short period of time before allocating blocks that are skewed. You may see a low-memory message on a constrained system, but that is okay; the test allocates only about 8 MB in total and continues with whatever it obtained. If everything is fine, the reported alignment will match the entry for your processor in the following table.

Table 2-1: SIMD instruction set with data width in bits and bytes
SIMD Instruction Set (Data Width)	Bits	Bytes
AMD 3D Now!	64	8
AMD 3D Now! Extensions	64	8
AMD 3D Now! MMX Extensions	64	8
AMD 3D Now! Professional	64/128	8/16
MMX	64	8
SSE	128	16
SSE2	128	16
SSE3	128	16

If there is a mismatch, then you have an alignment problem. This can be rectified by using memory allocation code similar to that in Listing 2-4, which is designed to wrap the standard call to malloc() or new[]. Do not forget to add the assertion; it is good programming practice.
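The halving loop in Listing 2-2 can also be pulled out into a small helper, which makes it easier to reuse in other tests. The following is only a sketch: the names alignmentOfAddress and worstAlignment are hypothetical and do not appear in the book's source, and the 16-byte cap is an assumption that matches the 128-bit SSE case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Largest power of two (capped at maxAlign) that evenly divides 'addr'.
// Hypothetical helper; maxAlign must itself be a power of two.
inline std::size_t alignmentOfAddress(std::uintptr_t addr,
                                      std::size_t maxAlign = 16)
{
    assert(maxAlign != 0 && (maxAlign & (maxAlign - 1)) == 0);
    std::size_t align = maxAlign;
    while (align > 1 && (addr & (align - 1)) != 0)
    {
        align >>= 1;            // halve until the low address bits are clear
    }
    return align;
}

// Smallest alignment found across a table of addresses, which is the
// figure the RamTest program reports.
inline std::size_t worstAlignment(const std::uintptr_t *addrs,
                                  std::size_t count,
                                  std::size_t maxAlign = 16)
{
    std::size_t worst = maxAlign;
    for (std::size_t i = 0; i < count; ++i)
    {
        std::size_t a = alignmentOfAddress(addrs[i], maxAlign);
        if (a < worst)
        {
            worst = a;
        }
    }
    return worst;
}
```

A pointer can be passed by converting it with reinterpret_cast<std::uintptr_t>(p). A result below the width listed for your instruction set in Table 2-1 indicates that the allocator needs a wrapper.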

Memory Header

The following header is hidden at the true base of the memory allocated by our function. Basically, memory is slightly overallocated to make room for it. In essence, each request is routed through our wrapper to the core allocation function (malloc() or new[]).

Listing 2-3: \chap02\ram\ram.cpp

 typedef struct RamHeadType
 {
   uint32  nReqSize;   // Requested size
   uint32  extra[3];   // Padding to help align to 16 byte
 } RamHead;

Allocate Memory (Malloc Wrapper)

Listing 2-4: \chap02\ram\ram.cpp

 void * ramAlloc(uint nReqSize)
 {
   byte *pMem;
   RamHead *pHead;
   uint nSize;

   ASSERT_ZERO(nReqSize);

       // Force to 16-byte block + room for header

   nSize = ALIGN16(nReqSize) + sizeof(RamHead);

 //pMem = (byte*)malloc(nSize);
   pMem = new byte[ nSize ];
   pHead = (RamHead *)pMem;

   if (NULL==pMem)
   {          // Allocation error
   }
   else
   {          // Save Req Size (plus header size, so it is never < sizeof(RamHead))
     pHead->nReqSize = nReqSize + sizeof(RamHead);
     pHead->extra[0] = 1;
     pHead->extra[1] = 2;
     pHead->extra[2] = 3;
         // Align by adj header +4 to +16 bytes
         // (uintptr_t keeps the full address on 64-bit targets)
     pMem = (byte *) ALIGN16(((uintptr_t)pMem) + sizeof(uint32));
   }

   return (void*)pMem;
 }

The function works by rounding the amount of memory requested up to the next multiple of 16 bytes. This keeps allocations in 16-byte blocks. An additional 16 bytes are allocated for the header. This is useful for two reasons:

The memory passed to the calling function can be forced to the proper alignment.
A side benefit of storing the requested size is that size adjustments similar to a realloc() can be issued and the calling function does not have to know what the current size is when releasing that memory back to the pool.

Hidden at the beginning of the allocated memory is a header in which the requested size (plus the size of the header itself) is stored in the first 32-bit word and the other three words are set to the values {1, 2, 3}. The pointer is then advanced to a 16-byte alignment and passed to the calling function.
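To see how the pointer adjustment plays out, the arithmetic can be worked through for each possible base address. This is a sketch under assumptions: align16 below is a guess at what the ALIGN16 macro does (round up to the next multiple of 16), the function names are hypothetical, and the base is assumed to be at least 4-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// sizeof(RamHead): four 32-bit words
const std::uintptr_t kHeaderSize = 16;

// Assumed behavior of ALIGN16: round up to the next multiple of 16.
inline std::uintptr_t align16(std::uintptr_t x)
{
    return (x + 15) & ~static_cast<std::uintptr_t>(15);
}

// Total bytes requested from the underlying allocator.
inline std::uintptr_t totalSize(std::uintptr_t nReqSize)
{
    return align16(nReqSize) + kHeaderSize;
}

// Distance in bytes from the true base to the pointer handed to the caller.
inline std::uintptr_t userOffset(std::uintptr_t base)
{
    assert((base & 3) == 0);    // scheme relies on a 4-byte-aligned base
    return align16(base + 4) - base;
}

// Index of the header word sitting immediately before the user pointer:
// 0 is the size word, 1..3 are extra[0..2], which hold the values 1..3.
inline std::uintptr_t wordBeforeUser(std::uintptr_t base)
{
    return userOffset(base) / 4 - 1;
}
```

For bases that are 0, 4, 8, and 12 bytes past a 16-byte boundary, the offsets come out as 16, 12, 8, and 4 bytes. In the first three cases the word just before the user pointer holds 3, 2, or 1, which is exactly the number of words back to the base; in the last case it is the size word itself.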

When releasing memory back to the system, the returned pointer needs to be adjusted back to the true base address; otherwise a memory exception will occur. The following function wraps the release function free().

The extra header space may seem wasteful, but the base address of the memory being allocated by new or malloc is unknown. With current malloc libraries it tends to be 4- or 8-byte aligned, so there is a need to allocate for the worst case.

Release Memory (Free Wrapper)

Listing 2-5: ...\chap02\ram\ram.cpp

 void ramFree(const void * const pRaw)
 {
   uint32 *pMem;
   byte   *pbMem;

   ASSERT_PTR4(pRaw);
   ASSERT_PTR(pRaw);

   pMem = (uint32 *)pRaw;
   if (*(--pMem) < sizeof(RamHead))   // marker value 1..3?
   {
     pMem -= *pMem;                   // step back that many words
   }

       // pMem original (unadjusted) pointer
   pbMem = (byte *)pMem;

 //free(pbMem);
   delete [] pbMem;
 }

The release works by first decrementing the word pointer by one 4-byte word. If that location contains a value between one and three, the pointer is decremented by that many words so that it then points at the size information, as it would when cast to a RamHead pointer. Otherwise the word is the size field itself, whose stored value is never less than sizeof(RamHead), and the pointer is left where it is. Either way, this is the true memory base position and the pointer that gets returned to the system function free().
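The recovery step can be checked against real memory by placing the header at deliberately skewed positions inside a 16-byte-aligned pool. The sketch below is illustrative only: placeHeader, recoverBase, and roundTripAllSkews are hypothetical names, the rounding expression is an assumed equivalent of ALIGN16, and nothing is actually allocated or freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Write the four-word header at 'base' and return the 16-byte-aligned
// pointer that would be handed to the caller.
inline std::uint32_t *placeHeader(std::uint32_t *base, std::uint32_t nReqSize)
{
    assert(nReqSize != 0);
    base[0] = nReqSize + 16;    // always >= 16, never mistaken for a marker
    base[1] = 1;
    base[2] = 2;
    base[3] = 3;
    std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(base) + sizeof(std::uint32_t);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    return reinterpret_cast<std::uint32_t *>(addr);
}

// Walk back from the user pointer to the true base.
inline std::uint32_t *recoverBase(std::uint32_t *user)
{
    std::uint32_t *p = user - 1;    // word just before the user block
    if (*p < 16)                    // a marker of 1, 2, or 3?
    {
        p -= *p;                    // step back that many words
    }
    return p;
}

// Round trip for bases skewed by 0, 4, 8, and 12 bytes.
inline bool roundTripAllSkews()
{
    alignas(16) std::uint32_t pool[12] = {};
    for (std::size_t skew = 0; skew < 4; ++skew)
    {
        std::uint32_t *base = pool + skew;
        std::uint32_t *user = placeHeader(base, 1);
        if (reinterpret_cast<std::uintptr_t>(user) % 16 != 0)
        {
            return false;           // caller would see unaligned memory
        }
        if (recoverBase(user) != base)
        {
            return false;           // free() would get the wrong pointer
        }
    }
    return true;
}
```

Because the size word always stores at least 17 in this scheme, the single comparison against 16 is enough to tell it apart from the three marker words.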

For C++ fans, the new and delete operators can be overloaded to route through this insulating memory module. I also recommend one final item: The memory allocation and release functions should require a pointer to be passed. This will allow the release function to nullify the pointer, and in future enhancements each pointer could be considered the "owner" of the memory and thus adjusted by any garbage collection algorithm instituted for heap compaction in a flat memory environment.
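As a rough illustration of the overloading idea, a class can supply its own operator new and operator delete so that every heap instance is 16-byte aligned. This sketch does not use the book's ramAlloc() and ramFree(); it substitutes a simpler, commonly used stand-in that stores the true base pointer just before the user block. The names sketchAlloc16, sketchFree16, and Vec4 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in for ramAlloc(): over-allocate, align up to 16 bytes, and
// remember the true base in the pointer-sized slot before the user block.
inline void *sketchAlloc16(std::size_t n)
{
    assert(n != 0);
    void *base = std::malloc(n + 16 + sizeof(void *));
    if (base == NULL)
    {
        return NULL;
    }
    std::uintptr_t addr =
        reinterpret_cast<std::uintptr_t>(base) + sizeof(void *);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);
    void **user = reinterpret_cast<void **>(addr);
    user[-1] = base;
    return user;
}

// Stand-in for ramFree(): fetch the stored base and release it.
inline void sketchFree16(void *p)
{
    if (p != NULL)
    {
        std::free(static_cast<void **>(p)[-1]);
    }
}

// Example type whose heap instances are routed through the wrappers.
struct Vec4
{
    float x, y, z, w;

    static void *operator new(std::size_t n)
    {
        void *p = sketchAlloc16(n);
        if (p == NULL)
        {
            throw std::bad_alloc();
        }
        return p;
    }

    static void operator delete(void *p)
    {
        sketchFree16(p);
    }
};
```

With this in place, a plain `new Vec4()` returns 16-byte-aligned storage and `delete` releases it through the matching wrapper. Only the single-object forms are shown; the array forms would need the same treatment.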

Allocate Memory

The address of the caller's pointer is passed as ppMem, and that pointer is set to the newly allocated memory.

Listing 2-6: ...\chap02\ram\ram.cpp

 bool ramGet(byte ** const ppMem, uint nReqSize)
 {
   ASSERT_PTR4(ppMem);

   *ppMem = (byte *) ramAlloc(nReqSize);
   return (NULL!=*ppMem) ? true : false;
 }

Allocate (Cleared) Memory

Listing 2-7: ...\chap02\ram\ram.cpp

 bool ramGetClr(byte **const ppMem, uint nReqSize)
 {
   bool ret;

   ASSERT_PTR4(ppMem);

   ret = false;
   *ppMem = (byte *)ramAlloc(nReqSize);
   if (NULL!=*ppMem)
   {
     ramZero(*ppMem, nReqSize);
     ret = true;
   }

   return ret;
 }

Free Memory (Pointer Is Set to NULL)

Listing 2-8: ...\chap02\ram\ram.cpp

 void ramRelease(byte ** const ppMem)
 {
   ASSERT_PTR4(ppMem);

   ramFree(*ppMem);
   *ppMem = NULL;
 }
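The benefit of passing the pointer's address is easiest to see from the caller's side. The sketch below mirrors the shape of ramGet() and ramRelease() but is built directly on malloc() and free() so that it stands alone; sketchGet, sketchRelease, demoOwnerPointer, and the byte_t typedef are hypothetical stand-ins, and no alignment handling is included.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

typedef unsigned char byte_t;   // hypothetical stand-in for the 'byte' type

// Allocate and store the result through the caller's pointer.
inline bool sketchGet(byte_t **ppMem, std::size_t nReqSize)
{
    assert(ppMem != NULL);
    *ppMem = static_cast<byte_t *>(std::malloc(nReqSize));
    return *ppMem != NULL;
}

// Release the memory and nullify the caller's pointer.
inline void sketchRelease(byte_t **ppMem)
{
    assert(ppMem != NULL);
    std::free(*ppMem);          // free(NULL) is defined to do nothing
    *ppMem = NULL;
}

// Caller's view: after the release the pointer can no longer dangle.
inline bool demoOwnerPointer()
{
    byte_t *pBuf = NULL;
    if (!sketchGet(&pBuf, 64))
    {
        return false;
    }
    pBuf[0] = 0xFF;             // use the memory

    sketchRelease(&pBuf);
    bool cleared = (pBuf == NULL);

    sketchRelease(&pBuf);       // second release is harmless in this sketch
    return cleared && (pBuf == NULL);
}
```

Note that the second release is only safe here because free() accepts NULL. A wrapper that asserts on a non-NULL pointer would flag the second call in a debug build instead.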