Sample CPU Detection Code | 32/64-Bit 80x86 Assembly Language Architecture

The CPUID instruction exposes many features, but most of them are not needed for what we are doing here. I have documented some of what this instruction does (more than I normally need), but if you are truly interested in it, I strongly recommend downloading the manufacturers' technical manuals.

This instruction has been enhanced since I wrote Vector Game Math Processors, as newer instructions have been added to the processor. It has been used throughout the book, but let us examine it a bit more closely.

Most programs written these days target a Protected Mode environment, so we only need to deal with, at a minimum, the first processor capable of truly running in Protected Mode: the 386. (The 80286 does not count!) This CPU detection algorithm detects the model, manufacturer, and capabilities, and sets flags accordingly. As we really only deal with 32-bit modes in this book, we do not bother detecting an 8086, 80186, or 80286. We do, however, detect a 386 or above. Our algorithm uses the following CPU IDs.

 ;       CPU Detect - definition IDs

 CPU_386          = 3       ; 80386
 CPU_486          = 4       ; 80486
 CPU_PENTIUM      = 5       ; P5 (Pentium)
 CPU_PENTIUM_PRO  = 6       ; Pentium Pro
 CPU_PII          = 6       ; PII

Prior to the Pentium processor, a computer system could optionally have a floating-point chip containing an FPU. With CPUs, no functionality is lost as one upgrades to a more advanced processor; they are all downward compatible. This is not the case with the FPU: some functionality was lost, so when writing floating-point instructions you should know which FPU you are coding for. Some external FP chips did not exactly match the processor but were compatible.

 ; Legacy CPUs and compatible FPU coprocessors
 ;               CPU_086        NONE, FPU_087
 ;               CPU_186        NONE, FPU_087
 ;               CPU_286        NONE, FPU_287
 ;               CPU_386        NONE, FPU_287, FPU_387
 ;               CPU_486        NONE, FPU_387, FPU_487

 ;        FPU Detect - definition IDs

 FPU_NONE        =   0          ; No FPU chip
 FPU_087         =   1          ; 8087
 FPU_287         =   2          ; 80287
 FPU_387         =   3          ; 80387
 FPU_487         =   CPU_486
 FPU_PENTIUM     =   CPU_PENTIUM
 FPU_PII         =   CPU_PII

The various manufacturers originally implemented the same functionality as Intel but have since begun adding their own extensions. As a result, the feature sets of different processors only partially overlap (unions and intersections can be drawn), so we use individual flags to indicate each CPU capability.

x86 CPU Detect Bit Flags

 typedef enum
 {
    CPUBITS_FPU          = 0x0001, // FPU flag
    CPUBITS_MMX          = 0x0002, // MMX flag
    CPUBITS_3DNOW        = 0x0004, // 3DNow! flag
    CPUBITS_FXSR         = 0x0008, // Fast FP Store
    CPUBITS_SSE          = 0x0010, // SSE
    CPUBITS_SSE2         = 0x0020, // SSE (Ext 2)
    CPUBITS_3DNOW_MMX    = 0x0040, // 3DNow! (MMX Ext)
    CPUBITS_3DNOW_EXT    = 0x0080, // 3DNow! (Ext)
    CPUBITS_3DNOW_SSE    = 0x0100, // 3DNow! Professional
    CPUBITS_HTT          = 0x0200, // Hyperthreading Tech
    CPUBITS_SSE3         = 0x0400, // Prescott NI
    CPUBITS_EM64T        = 0x0800, // EM64T supported
    CPUBITS_AMD64        = 0x1000, // AMD Long Mode
 } CPUBITS;
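Because each capability occupies its own bit, a feature set is simply the bitwise OR of its flags, and a capability test is a bitwise AND. The following is a minimal sketch of that idea; it repeats a few flag values from the listing above, and the helper names CpuHasAll and CpuHasAny are hypothetical (they are not part of the book's library).

```c
#include <assert.h>
#include <stdbool.h>

/* A subset of the CPUBITS values, copied from the listing above. */
enum {
    CPUBITS_FPU   = 0x0001,
    CPUBITS_MMX   = 0x0002,
    CPUBITS_3DNOW = 0x0004,
    CPUBITS_SSE   = 0x0010,
    CPUBITS_SSE2  = 0x0020
};

/* Hypothetical helper: true only if every bit in 'mask' is set in 'nBits'. */
static bool CpuHasAll(unsigned nBits, unsigned mask)
{
    assert(mask != 0);              /* an empty mask is a caller error */
    return (nBits & mask) == mask;
}

/* Hypothetical helper: true if at least one bit in 'mask' is set in 'nBits'. */
static bool CpuHasAny(unsigned nBits, unsigned mask)
{
    assert(mask != 0);
    return (nBits & mask) != 0;
}
```

For example, a routine that needs both MMX and SSE could be guarded with CpuHasAll(nBits, CPUBITS_MMX | CPUBITS_SSE), while CpuHasAny(nBits, CPUBITS_3DNOW | CPUBITS_SSE) answers whether any SIMD floating-point path is available at all.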

Each manufacturer has its own unique optimization methods, so we also record the vendor name.

x86 CPU Detect Vendors

Listing 16-1: \inc???\CpuAsm.h

 typedef enum
 {
    CPUVEN_UNKNOWN     = 0, // Unknown
    CPUVEN_INTEL       = 1, // Intel
    CPUVEN_AMD         = 2, // AMD
    CPUVEN_CYRIX       = 3, // Cyrix
    CPUVEN_CENTAUR     = 4, // IDT Centaur (WinChip)
    CPUVEN_NATIONAL    = 5, // National Semiconductor
    CPUVEN_UMC         = 6, // UMC
    CPUVEN_NEXGEN      = 7, // NexGen
    CPUVEN_RISE        = 8, // Rise
    CPUVEN_TRANSMETA   = 9  // Transmeta
 } CPUVEN;

We use the following data structure to reference the extracted CPU information.

Cpu Detect Information

 typedef struct CpuInfoType
 {
     uint   nCpuId;    // CPU type identifier
     uint   nFpuId;    // Floating-point unit identifier
     uint   nBits;     // Feature bits
     uint   nMfg;      // Manufacturer
     byte   nProcCnt;  // # of logical processors
     byte   pad[3];
 } CpuInfo;

 CpuInfo struct 4
         nCpuId     dd     0  ; CPU type identifier
         nFpuId     dd     0  ; Floating-point unit identifier
         nBits      dd     0  ; Feature bits
         nMfg       dd     0  ; Manufacturer
         nProcCnt   db     0  ; # of logical processors
         pad        db     0,0,0
 CpuInfo ends
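The C structure and the assembler structure describe the same block of memory, so their field offsets must agree. One way to guard against the two drifting apart is a handful of compile-time checks on the C side. This is only a sketch: the typedefs for uint and byte are my assumptions about how the book's headers define them, and static_assert requires a C11 (or later) compiler.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

typedef unsigned int  uint;   /* assumption: 32 bits, matching the assembler's dd */
typedef unsigned char byte;   /* assumption: 8 bits, matching the assembler's db  */

typedef struct CpuInfoType {
    uint nCpuId;      /* CPU type identifier            */
    uint nFpuId;      /* Floating-point unit identifier */
    uint nBits;       /* Feature bits                   */
    uint nMfg;        /* Manufacturer                   */
    byte nProcCnt;    /* # of logical processors        */
    byte pad[3];      /* keeps the size a multiple of 4 */
} CpuInfo;

/* Each check mirrors one field of the assembler definition. */
static_assert(sizeof(uint) == 4,                "uint must be a dword");
static_assert(offsetof(CpuInfo, nCpuId)   == 0,  "nCpuId offset");
static_assert(offsetof(CpuInfo, nFpuId)   == 4,  "nFpuId offset");
static_assert(offsetof(CpuInfo, nBits)    == 8,  "nBits offset");
static_assert(offsetof(CpuInfo, nMfg)     == 12, "nMfg offset");
static_assert(offsetof(CpuInfo, nProcCnt) == 16, "nProcCnt offset");
static_assert(sizeof(CpuInfo) == 20,            "four dwords + four bytes");
```

If someone later adds a field to one definition and forgets the other, the build fails instead of the assembly code silently writing to the wrong offset.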

This book's CPU detection uses the following data structure for finding matching vendor information. Each microprocessor that supports the CPUID instruction has encoded a 12-byte text string identifying the manufacturer.

 ;       Vendor Data Structure

 VENDOR STRUCT 4
        vname  BYTE    '------------'
        Id     DWORD   CPUVEN_UNKNOWN
 VENDOR ENDS

 VENDOR { "AMD ISBETTER", CPUVEN_AMD }        ; AMD Proto
 VENDOR { "AuthenticAMD", CPUVEN_AMD }        ; AMD
 VENDOR { "CyrixInstead", CPUVEN_CYRIX }      ; Cyrix & IBM
 VENDOR { "GenuineIntel", CPUVEN_INTEL }      ; Intel
 VENDOR { "CentaurHauls", CPUVEN_CENTAUR }    ; Centaur
 VENDOR { "UMC UMC UMC ", CPUVEN_UMC }        ; UMC (retired)
 VENDOR { "NexGenDriver", CPUVEN_NEXGEN }     ; NexGen (retired)
 VENDOR { "RiseRiseRise", CPUVEN_RISE }       ; Rise
 VENDOR { "GenuineTMx86", CPUVEN_TRANSMETA }  ; Transmeta
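To make the table lookup concrete, here is a small C sketch of the same idea. CPUID function 0 returns the 12-byte vendor text in the registers EBX, EDX, and ECX (in that order, low byte first). The sketch takes those three register values as plain integers so it runs anywhere; the names PutReg, VendorString, and VendorLookup, and the two-entry table, are hypothetical and stand in for the assembly table above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { CPUVEN_UNKNOWN = 0, CPUVEN_INTEL = 1, CPUVEN_AMD = 2 };

typedef struct {
    const char *vname;   /* 12-character vendor string */
    int         id;      /* matching CPUVEN_ value     */
} VendorEntry;

/* Abbreviated copy of the vendor table (illustration only). */
static const VendorEntry kVendors[] = {
    { "AuthenticAMD", CPUVEN_AMD   },
    { "GenuineIntel", CPUVEN_INTEL },
};

/* Store one 32-bit register as four characters, low byte first. */
static void PutReg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Rebuild the vendor text from the register values of CPUID function 0. */
static void VendorString(char out[13], uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    assert(out != NULL);
    PutReg(out + 0, ebx);
    PutReg(out + 4, edx);
    PutReg(out + 8, ecx);
    out[12] = '\0';
}

/* Linear search of the table; unknown strings map to CPUVEN_UNKNOWN. */
static int VendorLookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof kVendors / sizeof kVendors[0]; i++)
        if (strncmp(name, kVendors[i].vname, 12) == 0)
            return kVendors[i].id;
    return CPUVEN_UNKNOWN;
}
```

On an Intel processor the three registers hold 0x756E6547, 0x49656E69, and 0x6C65746E, which VendorString turns back into "GenuineIntel". A linear search is perfectly adequate here, since the table is tiny and the lookup happens once at startup.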

Listing 16-2: \RootApp.cpp

 #include "CpuAsm.h"                // CPU module

     CpuInfo cinfo;
     char szBuf[ CPU_SZBUF_MAX ];

     CpuDetect(&cinfo);            // Detect CPU

     cout << "\nCPU Detection Code Snippet\n\n";

         // Fills in buffer 'szBuf' with CPU information!
     cout << CpuInfoStr(szBuf, &cinfo) << endl;

     CpuSetup(&cinfo);             // Now set up function pointers

This is an example of what gets filled into the ASCII buffer with a call to the function CpuInfoStr().

 "CpuId:15 'INTEL' FPU MMX FXSR SSE SSE2 SSE3 HTT"

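For readers curious how such a string might be assembled from the feature bits, here is a minimal sketch. It is not the book's CpuInfoStr() and does not reproduce its full output (the CpuId and vendor fields are omitted); the function name FormatCpuBits and the name table are hypothetical, with the bit values taken from the CPUBITS listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned    bit;    /* one CPUBITS_ value        */
    const char *name;   /* text printed when it is set */
} BitName;

/* Listed in the order the names should appear in the output. */
static const BitName kBitNames[] = {
    { 0x0001, "FPU"  }, { 0x0002, "MMX"  }, { 0x0008, "FXSR" },
    { 0x0010, "SSE"  }, { 0x0020, "SSE2" }, { 0x0400, "SSE3" },
    { 0x0200, "HTT"  },
};

/* Hypothetical formatter: writes the names of all set bits into 'buf',
   separated by single spaces, and returns 'buf'. */
static char *FormatCpuBits(char *buf, size_t size, unsigned nBits)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof kBitNames / sizeof kBitNames[0]; i++) {
        if ((nBits & kBitNames[i].bit) == 0)
            continue;                       /* feature not present */

        int n = snprintf(buf + used, size - used, "%s%s",
                         used ? " " : "", kBitNames[i].name);
        if (n < 0 || (size_t)n >= size - used)
            break;                          /* out of room: stop here */
        used += (size_t)n;
    }
    return buf;
}
```

Called with the bits FPU, MMX, and SSE set, this fills the buffer with "FPU MMX SSE". Driving the output from a table keeps the formatter in step with the flag list: supporting a new feature bit means adding one table row.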
That takes care of the initial detection code. Now comes the fun part: function mapping. Every function you write should have a slower default version written in a high-level language such as C. This is really very simple. First there are the private definitions:

 void FmdSetup(const CpuInfo * const pcinfo);

 void vmp_FMulGeneric(float * const pfD, float fA, float fB);
 void vmp_FMulAsm3DNow(float * const pfD, float fA, float fB);
 void vmp_FMulAsmSSE(float * const pfD, float fA, float fB);

 void vmp_FDivGeneric(float * const pfD, float fA, float fB);
 void vmp_FDivAsm3DNow(float * const pfD, float fA, float fB);
 void vmp_FDivAsmSSE(float * const pfD, float fA, float fB);

 void vmp_FDivFastAsm3DNow(float * const pfD, float fA, float fB);
 void vmp_FDivFastAsmSSE(float * const pfD, float fA, float fB);

Then there are the public application definitions:

 // Multiplication
 typedef void (*vmp_FMulProc)(float * const pfD, float fA, float fB);
 extern vmp_FMulProc vmp_FMul;

 // Division
 typedef void (*vmp_FDivProc)(float * const pfD, float fA, float fB);
 extern vmp_FDivProc vmp_FDiv;
 extern vmp_FDivProc vmp_FDivFast;

There are generic as well as processor-specific functions, such as:

 // Multiplication
 void vmp_FMulGeneric(float * const pfD, float fA, float fB)
 {
     ASSERT_PTR4(pfD);

     *pfD = fA * fB;
 }

The initialization code assigns the appropriate processor-based function to the public function pointer:

 void CpuSetup(const CpuInfo * const pcinfo)
 {
     ASSERT_PTR4(pcinfo);

     if (CPUBITS_SSE & pcinfo->nBits)
     {
         vmp_FMul     = vmp_FMulAsmSSE;
         vmp_FDiv     = vmp_FDivAsmSSE;
         vmp_FDivFast = vmp_FDivFastAsmSSE;     // ***FAST***
     }
     else if (CPUBITS_3DNOW & pcinfo->nBits)
     {
         vmp_FMul     = vmp_FMulAsm3DNow;
         vmp_FDiv     = vmp_FDivAsm3DNow;
         vmp_FDivFast = vmp_FDivFastAsm3DNow;   // ***FAST***
     }
     else
     {
         vmp_FMul     = vmp_FMulGeneric;
         vmp_FDiv     = vmp_FDivGeneric;
         vmp_FDivFast = vmp_FDivGeneric;
     }
 }

You will probably need to play with the mapping until you get used to it. You could use case statements, function table lookups, or other methods, but due to the similarity of processor types, I find that conditional branching with Boolean logic works best.
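For comparison, here is what the function table alternative mentioned above might look like. This is only a sketch under stated assumptions: the table layout, the names FMulChoice and SelectFMul, and the stub functions are hypothetical, and the stubs are written in plain C so the example runs anywhere (in the real library those entries would be the assembly versions).

```c
#include <assert.h>
#include <stddef.h>

enum { CPUBITS_3DNOW = 0x0004, CPUBITS_SSE = 0x0010 };

typedef void (*vmp_FMulProc)(float * const pfD, float fA, float fB);

/* Plain C stand-ins for the three implementations. */
static void FMulGeneric(float * const pfD, float fA, float fB)   { *pfD = fA * fB; }
static void FMulSSEStub(float * const pfD, float fA, float fB)   { *pfD = fA * fB; }
static void FMul3DNowStub(float * const pfD, float fA, float fB) { *pfD = fA * fB; }

typedef struct {
    unsigned     required;   /* feature bits this version needs */
    vmp_FMulProc proc;       /* function to map if they are set */
} FMulChoice;

/* Highest priority first; 'required == 0' matches any processor. */
static const FMulChoice kFMulTable[] = {
    { CPUBITS_SSE,   FMulSSEStub   },
    { CPUBITS_3DNOW, FMul3DNowStub },
    { 0,             FMulGeneric   },
};

/* Return the first table entry whose required bits are all present. */
static vmp_FMulProc SelectFMul(unsigned nBits)
{
    assert(kFMulTable[sizeof kFMulTable / sizeof kFMulTable[0] - 1].required == 0);

    for (size_t i = 0; i < sizeof kFMulTable / sizeof kFMulTable[0]; i++)
        if ((nBits & kFMulTable[i].required) == kFMulTable[i].required)
            return kFMulTable[i].proc;

    return FMulGeneric;      /* not reached while the fallback row exists */
}
```

The priority is encoded purely by row order, so a processor reporting both SSE and 3DNow! gets the SSE version here, and changing the ranking means reordering rows rather than rewriting branches. The trade-off is one table per function pointer, which is why the if/else form stays more compact when many pointers share the same conditions.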

What is supplied should be thought of as a starting point. It should be included with most applications, even those that do not use any custom assembly code, as it will compile a breakdown of the computer that ran the application. With custom assembly code, it is the building block for writing cross-processor code. There is one more bit of "diagnostic" information that you can use: the processor speed. It can give you an idea of why your application is not running well. (Sometimes processors do not run at their marked speed, either through misconfiguration or overheating.) This is discussed in Chapter 18, "System."

The listed information can be obtained by using the included function CpuDetect(); however, from your point of view, who manufactured the CPU is not nearly as important as the CPUBITS flags listed above! Each bit that is set indicates the existence of the associated functionality. Your program merely checks the bit and selects the matching set of code. If the processor sets the CPUBITS_3DNOW bit, the program vectors to the 3DNow!-based algorithm. If the CPUBITS_SSE bit is set, it vectors to that set of code. Keep in mind that when I first started writing this book the two did not exist on the same CPU, but while I was writing it, AMD came out with 3DNow! Professional. This is a union of the two superset families (excluding SSE3), for which there is also a CPU bit definition. However, that can easily change in the future. My recommendation is to rank the code paths from highest to lowest performance in the initialization logic of your program, based upon your application's criteria.