Another useful development check is to test memory allocation in any compiled code, to verify that the memory manager really is returning memory that is properly aligned for the single instruction, multiple data (SIMD) instruction sets you are targeting. You can test this by running a simple program such as the following one.

    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <new>
    using namespace std;

    typedef unsigned char byte;

    #define RAM_TBL_SIZE 4096

    int main(int argc, char *argv[])
    {
        unsigned int n, nCnt, bSet;
        uintptr_t a, b, bTst, bMsk;    // pointer-sized, safe on 64-bit builds
        void *pTbl[RAM_TBL_SIZE];

        // Allocate a series of incrementally larger blocks
        for (n = 0; n < RAM_TBL_SIZE; n++)
        {
            //pTbl[n] = (byte *) malloc(n+1);
            pTbl[n] = new (nothrow) byte[n+1];    // nothrow: NULL on failure
            if (NULL == pTbl[n])
            {
                cout << "low memory. (continuing)..." << endl;
                break;
            }
        }
        nCnt = n;    // # of entries allocated

        // Test memory for alignment
        bSet = 16;    // Preset to 128-bit (16-byte) alignment
        bTst = bSet - 1;
        bMsk = ~bTst;

        for (n = 0; n < nCnt; n++)
        {
            a = (uintptr_t) pTbl[n];

            do    // round up to 'bSet' bytes
            {
                b = (a + bTst) & bMsk;
                if (a == b)
                {
                    break;    // okay
                }

                // Unaligned...
                bSet >>= 1;    // halve the alignment
                bTst = bSet - 1;
                bMsk = ~bTst;
            } while (1);
        }

        // Release all of memory to clean up
        for (n = 0; n < nCnt; n++)
        {
            byte *pRaw;

            pRaw = (byte *) pTbl[n];
            pTbl[n] = NULL;
            delete [] pRaw;
            //free(pTbl[n]);
        }

        cout << "Ram Alignment is set to " << bSet;
        cout << " bytes (" << (bSet<<3) << " bits).\n";
        cout << flush;
        return 0;
    }

Please note that it loops up to 4096 times, slowly increasing the size of the allocation just in case the memory manager issues properly aligned memory for a short period of time before allocating any that might be skewed. You may also get a low-memory message on a constrained system, but that is okay: the loop simply stops and tests the blocks allocated so far, and a full run only allocates about 8 MB (the block sizes 1 through 4096 sum to roughly that much). If everything is fine, the reported alignment will match the entry for your processor's instruction set in the following table.
SIMD Instruction Set (Data Width) | Bits | Bytes |
---|---|---|
AMD 3D Now! | 64 | 8 |
AMD 3D Now! Extensions | 64 | 8 |
AMD 3D Now! MMX Extensions | 64 | 8 |
AMD 3D Now! Professional | 64/128 | 8/16 |
MMX | 64 | 8 |
SSE | 128 | 16 |
SSE2 | 128 | 16 |
SSE3 | 128 | 16 |
If there is a mismatch, then you have an alignment problem. This can be rectified by using memory allocation code similar to that in Listing 2-4, which is designed to wrap the standard call to malloc() or new[]. As a matter of good programming practice, do not forget to add the assertion.
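An alignment assertion of the kind mentioned above can be as simple as checking that the low four bits of an address are zero. The following is a minimal sketch only; the names `isAligned` and `ASSERT_ALIGN16` are hypothetical and are not taken from the listings in this chapter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when 'p' sits on an 'align'-byte boundary.
// 'align' is assumed to be a power of two.
inline bool isAligned(const void *p, std::size_t align)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Hypothetical assertion macro built on the helper above.
#define ASSERT_ALIGN16(p) assert(isAligned((p), 16))
```

Placing `ASSERT_ALIGN16(ptr)` at the top of any routine that feeds `ptr` to an aligned SIMD load is one way to catch a skewed pointer in a debug build before it turns into an exception.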
The following header is hidden at the true base of the memory allocated by our function, so each block is slightly overallocated to make room for it. In essence, the call to malloc is routed through to the core allocation function.

    typedef struct RamHeadType
    {
        uint32 nReqSize;    // Requested size
        uint32 extra[3];    // Padding to help align to 16 bytes
    } RamHead;


    void * ramAlloc(uint nReqSize)
    {
        byte *pMem;
        RamHead *pHead;
        uint nSize;

        ASSERT_ZERO(nReqSize);

        // Force to 16-byte block + room for header
        nSize = ALIGN16(nReqSize) + sizeof(RamHead);

        //pMem = (byte*)malloc(nSize);
        // nothrow (from <new>) so that a failure returns NULL
        pMem = new (std::nothrow) byte[ nSize ];
        pHead = (RamHead *)pMem;

        if (NULL==pMem)
        {
            // Allocation error
        }
        else
        {
            // Save Req Size
            pHead->nReqSize = nReqSize + sizeof(RamHead);
            pHead->extra[0] = 1;
            pHead->extra[1] = 2;
            pHead->extra[2] = 3;

            // Align by adj header +4 to +16 bytes
            // (uintptr_t, from <cstdint>, keeps the cast safe on 64-bit builds)
            pMem = (byte *) ALIGN16(((uintptr_t)pMem) + sizeof(uint32));
        }

        return (void*)pMem;
    }

The function works by rounding the amount of memory requested up to the next 16-byte boundary, which helps keep memory in 16-byte blocks. An additional 16 bytes are allocated for the header. This is useful for two reasons:
- The memory passed to the calling function can be forced to the proper alignment.
- A side benefit of storing the requested size is that size adjustments similar to a realloc() can be issued, and the calling function does not have to know what the current size is when releasing that memory back to the pool.
Hidden at the beginning of the allocated memory is a header in which the requested size (plus the size of the header itself) is stored in the first 32-bit word and the other three words are set to the values {1, 2, 3}. The pointer is then advanced to a 16-byte alignment and passed to the calling function.
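To see why three filler words are enough, it helps to work through the four possible 4-byte-aligned base addresses. The sketch below only does the arithmetic on integer addresses. `alignUp16`, `adjustOffset`, and `wordBefore` are hypothetical names, and the sketch assumes the base returned by the system allocator is at least 4-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

// Round an address up to the next 16-byte boundary (hypothetical helper).
inline std::uintptr_t alignUp16(std::uintptr_t a)
{
    return (a + 15) & ~static_cast<std::uintptr_t>(15);
}

// Bytes between the true base and the pointer handed to the caller,
// following the "skip one 32-bit word, then round up to 16" rule.
inline unsigned adjustOffset(std::uintptr_t base)
{
    assert((base & 3) == 0);    // assumption: base is 4-byte aligned
    return static_cast<unsigned>(alignUp16(base + 4) - base);
}

// Index of the 32-bit header word just below the returned pointer:
// 0 is the size word, 1..3 are the filler words holding {1, 2, 3}.
inline unsigned wordBefore(std::uintptr_t base)
{
    return adjustOffset(base) / 4 - 1;
}
```

For a base address ending in 0x0, 0x4, 0x8, or 0xC, the offsets come out as 16, 12, 8, and 4 bytes. The word just below the returned pointer is therefore filler word 3, 2, or 1, or the size word itself, and in the first three cases the filler value equals the number of words between it and the true base.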
Overallocating by a full 16 bytes may seem wasteful, but the base address of the memory being allocated by new or malloc is unknown. With current malloc libraries it tends to be 4- or 8-byte aligned, so there is a need to allocate for the worst case.
When releasing memory back to the system, the returned pointer needs to be unadjusted back to the true base address; otherwise a memory exception will occur. The following function wraps the release function free().

    void ramFree(const void * const pRaw)
    {
        uint32 *pMem;
        byte *pbMem;

        ASSERT_PTR4(pRaw);

        pMem = (uint32 *)pRaw;
        if (*(--pMem) < sizeof(RamHead))
        {
            pMem -= *pMem;
        }

        // pMem original (unadjusted) pointer
        pbMem = (byte *)pMem;
        // free(pbMem);
        delete [] pbMem;
    }

The release works by first stepping the word pointer back by one 4-byte word. If that location contains a value from one to three, the pointer is stepped back by that many additional words so that it points at the size information when cast to a RamHead pointer; a larger value means the pointer is already sitting on the size word. Either way, this is the true memory base position and the pointer that gets passed to the system function free().
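The release logic can be exercised without touching the heap by laying the header into an ordinary buffer. The following is a sketch under stated assumptions, not a copy of the listing: `writeHeader` and `recoverBase` are hypothetical names, the header is four 32-bit words, and the stored size is assumed to be at least 16 so that it can never be mistaken for a filler value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Lay the 16-byte header {size, 1, 2, 3} at 'base' and return the first
// 16-byte-aligned address that is at least 4 bytes above it.
inline unsigned char *writeHeader(unsigned char *base, std::uint32_t storedSize)
{
    assert(storedSize >= 16);    // must not look like a filler word
    const std::uint32_t head[4] = { storedSize, 1, 2, 3 };
    std::memcpy(base, head, sizeof(head));

    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(base) + 4;
    a = (a + 15) & ~static_cast<std::uintptr_t>(15);
    return base + (a - reinterpret_cast<std::uintptr_t>(base));
}

// Walk back from the adjusted pointer to the true base.
inline unsigned char *recoverBase(unsigned char *p)
{
    std::uint32_t w;
    p -= 4;                      // step back one 32-bit word
    std::memcpy(&w, p, sizeof(w));
    if (w < 16)                  // filler word: 1, 2, or 3
    {
        p -= 4 * w;              // step back that many more words
    }
    return p;                    // now points at the size word
}
```

Calling `writeHeader` on a buffer at each of the four possible 4-byte offsets and then `recoverBase` on the result should return the original base every time, which is the property the release function depends on.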
For C++ fans, the new and delete operators can be overloaded to route through this insulating memory module. I also recommend one final item: the memory allocation and release functions should require a pointer to the caller's pointer to be passed. This allows the release function to nullify the pointer, and in future enhancements each pointer could be considered the "owner" of the memory and thus be adjusted by any garbage collection algorithm instituted for heap compaction in a flat memory environment.
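As an illustration of the overloading idea, a class can declare its own `operator new` and `operator delete` so that every instance is routed through an aligning allocator. To stay self-contained, the sketch below uses a stand-in allocator (`alignedGet` and `alignedPut`, hypothetical names) that stores the true base pointer just below the aligned block. That is a different and simpler bookkeeping scheme than the header described above. `Vec4` is likewise a hypothetical type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Stand-in allocator: over-allocate, align up to 16 bytes, and stash the
// true base pointer in the slot just below the aligned address.
inline void *alignedGet(std::size_t size)
{
    void *base = std::malloc(size + 16 + sizeof(void *));
    if (base == NULL) return NULL;

    std::uintptr_t a = reinterpret_cast<std::uintptr_t>(base) + sizeof(void *);
    a = (a + 15) & ~static_cast<std::uintptr_t>(15);
    assert((a & 15) == 0);

    void **p = reinterpret_cast<void **>(a);
    p[-1] = base;                // remember the true base for release
    return p;
}

inline void alignedPut(void *p)
{
    if (p != NULL) std::free(static_cast<void **>(p)[-1]);
}

struct Vec4
{
    float x, y, z, w;

    // Class-scoped overloads: only Vec4 instances take this path.
    static void *operator new(std::size_t size)
    {
        void *p = alignedGet(size);
        if (p == NULL) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void *p) { alignedPut(p); }
};
```

With these overloads in place, `new Vec4` yields a 16-byte-aligned object and `delete` returns the true base to the system. Array forms (`new Vec4[n]`) would need matching `operator new[]` and `operator delete[]` overloads, which are omitted here.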
In the following functions, the address of the caller's pointer is passed as ppMem, and the pointer itself is set through it.

    bool ramGet(byte ** const ppMem, uint nReqSize)
    {
        ASSERT_PTR4(ppMem);

        *ppMem = (byte *) ramAlloc(nReqSize);
        return (NULL!=*ppMem) ? true : false;
    }

    bool ramGetClr(byte ** const ppMem, uint nReqSize)
    {
        bool ret;

        ASSERT_PTR4(ppMem);

        ret = false;
        *ppMem = (byte *)ramAlloc(nReqSize);
        if (NULL!=*ppMem)
        {
            ramZero(*ppMem, nReqSize);
            ret = true;
        }
        return ret;
    }

    void ramRelease(byte ** const ppMem)
    {
        ASSERT_PTR4(ppMem);

        ramFree(*ppMem);
        *ppMem = NULL;
    }

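From the caller's side, the pattern looks like the sketch below. To keep it self-contained it uses stand-ins (`memGet`, `memRelease`, and `demoOwner`, all hypothetical names) that call malloc and free directly rather than the aligning functions above; only the pointer-to-pointer calling convention is being illustrated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

typedef unsigned char byte;

// Stand-in for ramGet(): sets the caller's pointer, reports success.
inline bool memGet(byte **ppMem, std::size_t nReqSize)
{
    assert(ppMem != NULL);
    *ppMem = static_cast<byte *>(std::malloc(nReqSize));
    return *ppMem != NULL;
}

// Stand-in for ramRelease(): frees the block and nullifies the owner.
inline void memRelease(byte **ppMem)
{
    assert(ppMem != NULL);
    std::free(*ppMem);    // free(NULL) is a harmless no-op
    *ppMem = NULL;        // the owner can no longer dangle
}

// Caller side: the owning pointer is passed by address both ways.
inline bool demoOwner()
{
    byte *pBuf = NULL;
    if (!memGet(&pBuf, 256)) return false;
    pBuf[0] = 0x7F;
    memRelease(&pBuf);
    return pBuf == NULL;  // released and nullified
}
```

Because the release function clears the caller's pointer, a stale pointer shows up as NULL instead of silently referring to freed memory.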