Compiler Intrinsics

The more recent Visual C++ and Intel compilers support a way of programming at the assembly-language level from within C, referred to as intrinsics. The functionality of individual SIMD instructions is wrapped in C functions and data types, and the compiler expands each call inline into the corresponding instructions rather than emitting a function call. Let us examine the following example:

    void test(float *c, float a, float b)
    {
        *c = a + b;
    }

Intrinsics can get SIMD code up and running quickly. The following code evaluates the same expression using intrinsics together with the __m128 data type, which corresponds to a 128-bit XMM register, and SSE single-precision floating-point instructions. It looks more complicated, but that is only because I chose a simple scalar expression to resolve.

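    #include <xmmintrin.h>

    /* The scalar (_ss) forms are used because a, b and *c are single
       floats: _mm_load_ps/_mm_store_ps move 16 bytes and require
       16-byte-aligned addresses, so they would read and write past them. */
    void test(float *c, float a, float b)
    {
        __m128 ta, tb;

        ta = _mm_load_ss(&a);     /* low lane = a, upper three lanes = 0 */
        tb = _mm_load_ss(&b);     /* low lane = b, upper three lanes = 0 */
        ta = _mm_add_ss(ta, tb);  /* low lane = a + b (ADDSS) */
        _mm_store_ss(c, ta);      /* write only the low float to *c */
    }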
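The same scalar addition can also be written without handing addresses to load and store intrinsics. In this minimal sketch (the function name add_ss is hypothetical, and it assumes an x86/x86-64 compiler that provides <xmmintrin.h>), _mm_set_ss places a value in the low lane of an __m128 and _mm_cvtss_f32 extracts the low lane back out as a float:

```c
#include <xmmintrin.h>

/* Hypothetical helper: returns a + b computed in the low lane of an XMM register. */
static float add_ss(float a, float b)
{
    __m128 ta = _mm_set_ss(a);          /* lanes: {a, 0, 0, 0} */
    __m128 tb = _mm_set_ss(b);          /* lanes: {b, 0, 0, 0} */
    __m128 sum = _mm_add_ss(ta, tb);    /* lane 0 = a + b (ADDSS) */
    return _mm_cvtss_f32(sum);          /* extract lane 0 as a float */
}
```

Because the values stay in __m128 variables, the compiler is free to keep them in registers instead of routing them through memory.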

Underneath, however, the unoptimized (debug) assembly code the compiler generates for the packed (_mm_load_ps, _mm_add_ps, _mm_store_ps) form of this function breaks down to something similar to the following:

    push     ebx
    mov      ebx,esp
    sub      esp,8
    and      esp,0FFFFFFF0h    ; 16-byte align stack
    add      esp,4
    push     ebp
    mov      ebp,dword ptr [ebx+4]
    mov      dword ptr [esp+4],ebp
    mov      ebp,esp
    sub      esp,98h
    push     esi
    push     edi

    ; __m128 ta, tb

    ; ta = _mm_load_ps(&a);
    lea      eax,[ebx+0Ch]
    movaps   xmm0,xmmword ptr [eax]
    movaps   xmmword ptr [ebp-30h],xmm0
    movaps   xmm0,xmmword ptr [ebp-30h]
    movaps   xmmword ptr [ebp-10h],xmm0

    ; tb = _mm_load_ps(&b);
    lea      eax,[ebx+10h]
    movaps   xmm0,xmmword ptr [eax]
    movaps   xmmword ptr [ebp-40h],xmm0
    movaps   xmm0,xmmword ptr [ebp-40h]
    movaps   xmmword ptr [ebp-20h],xmm0

    ; ta = _mm_add_ps(ta, tb);
    movaps   xmm0,xmmword ptr [ebp-20h]
    movaps   xmm1,xmmword ptr [ebp-10h]
    addps    xmm1,xmm0
    movaps   xmmword ptr [ebp-50h],xmm1
    movaps   xmm0,xmmword ptr [ebp-50h]
    movaps   xmmword ptr [ebp-10h],xmm0

    ; _mm_store_ps(c, ta);
    movaps   xmm0,xmmword ptr [ebp-10h]
    mov      eax,dword ptr [ebx+8]
    movaps   xmmword ptr [eax],xmm0

    pop      edi
    pop      esi
    mov      esp,ebp
    pop      ebp
    mov      esp,ebx
    pop      ebx
    ret
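The packed (_ps) intrinsics earn their keep when the data really is four floats laid out side by side. As a hedged sketch (the name add4 is hypothetical; it assumes an x86/x86-64 compiler that provides <xmmintrin.h>), the following adds four pairs of floats with a single ADDPS. Since _mm_load_ps and _mm_store_ps correspond to MOVAPS, every pointer must be 16-byte aligned, which the assert checks in debug builds:

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: c[i] = a[i] + b[i] for i = 0..3 using one ADDPS.
   Precondition: a, b and c each point to 4 floats on a 16-byte boundary. */
static void add4(float *c, const float *a, const float *b)
{
    assert((uintptr_t)a % 16 == 0 &&
           (uintptr_t)b % 16 == 0 &&
           (uintptr_t)c % 16 == 0);

    __m128 ta = _mm_load_ps(a);          /* four floats from a (aligned load) */
    __m128 tb = _mm_load_ps(b);          /* four floats from b (aligned load) */
    _mm_store_ps(c, _mm_add_ps(ta, tb)); /* four sums written to c */
}
```

A caller would declare its arrays with the C11 _Alignas(16) specifier, for example _Alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};. When alignment cannot be guaranteed, the unaligned forms _mm_loadu_ps and _mm_storeu_ps (MOVUPS) can be substituted.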