The CPUID instruction exposes a lot of features, but most of them are not needed for what we are doing here. I have documented some of what this instruction does (a lot more than what I normally need), but if you are truly interested in this instruction, I strongly recommend that you download the manufacturer's technical manuals.
Most programs written these days target a Protected Mode environment, so at a minimum we only need to deal with the first processor truly capable of running in Protected Mode: the 386. (The 80286 does not count!) This CPU detection algorithm detects the model, manufacturer, and capabilities, and sets flags accordingly. As we really only deal with 32-bit modes in this book, we do not bother detecting an 8086, 80186, or 80286. We do, however, detect a 386 or above. Our algorithm uses the following CPU IDs.
This instruction has been enhanced since I wrote Vector Game Math Processors as newer instructions have been added to the processor. It has been used throughout the book, but let us examine it a bit closer.
; CPU Detect - definition IDs
CPU_386         = 3    ; 80386
CPU_486         = 4    ; 80486
CPU_PENTIUM     = 5    ; P5 (Pentium)
CPU_PENTIUM_PRO = 6    ; Pentium Pro
CPU_PII         = 6    ; PII
Prior to the Pentium processor, a computer system would optionally have a floating-point chip, which contained an FPU. In the case of CPUs, no functionality is lost as one upgrades to a more advanced processor; they are all downward compatible. This is not the case with the FPU: some functionality was lost, so if you are writing floating-point instructions, you should know which FPU you are coding for. Some external FP chips did not exactly match the processor but were compatible.
; Legacy CPUs and compatible FPU coprocessors
;   CPU_086   NONE, FPU_087
;   CPU_186   NONE, FPU_087
;   CPU_286   NONE, FPU_287
;   CPU_386   NONE, FPU_287, FPU_387
;   CPU_486   NONE, FPU_387, FPU_487

; FPU Detect - definition IDs
FPU_NONE    = 0    ; No FPU chip
FPU_087     = 1    ; 8087
FPU_287     = 2    ; 80287
FPU_387     = 3    ; 80387
FPU_487     = CPU_486
FPU_PENTIUM = CPU_PENTIUM
FPU_PII     = CPU_PII
The various manufacturers originally implemented the same functionality as Intel but have more recently begun adding their own extensions. Because of this, the feature sets only partly overlap (unions and intersections can be drawn), so we use individual flags to indicate CPU capability.
typedef enum {
    CPUBITS_FPU       = 0x0001,  // FPU flag
    CPUBITS_MMX       = 0x0002,  // MMX flag
    CPUBITS_3DNOW     = 0x0004,  // 3DNow! flag
    CPUBITS_FXSR      = 0x0008,  // Fast FP Store
    CPUBITS_SSE       = 0x0010,  // SSE
    CPUBITS_SSE2      = 0x0020,  // SSE (Ext 2)
    CPUBITS_3DNOW_MMX = 0x0040,  // 3DNow! (MMX Ext)
    CPUBITS_3DNOW_EXT = 0x0080,  // 3DNow! (Ext)
    CPUBITS_3DNOW_SSE = 0x0100,  // 3DNow! Professional
    CPUBITS_HTT       = 0x0200,  // Hyperthreading Tech
    CPUBITS_SSE3      = 0x0400,  // Prescott NI
    CPUBITS_EM64T     = 0x0800,  // EM64T supported
    CPUBITS_AMD64     = 0x1000,  // AMD Long Mode
} CPUBITS;
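Because each capability is an independent flag, the "union" and "intersection" questions reduce to simple mask tests. The following is a minimal sketch rather than code from the book's library: the helper names CpuHasAll and CpuHasAny are hypothetical, and only a subset of the CPUBITS values is repeated so that the fragment compiles on its own.

```c
#include <assert.h>

/* Subset of the CPUBITS flags, repeated so this sketch stands alone. */
enum {
    CPUBITS_FPU   = 0x0001,
    CPUBITS_MMX   = 0x0002,
    CPUBITS_3DNOW = 0x0004,
    CPUBITS_SSE   = 0x0010,
    CPUBITS_SSE2  = 0x0020
};

/* Hypothetical helper: nonzero only if EVERY flag in mask is set. */
static int CpuHasAll(unsigned nBits, unsigned mask)
{
    assert(mask != 0);              /* an empty mask is likely a caller bug */
    return (nBits & mask) == mask;
}

/* Hypothetical helper: nonzero if AT LEAST ONE flag in mask is set. */
static int CpuHasAny(unsigned nBits, unsigned mask)
{
    assert(mask != 0);
    return (nBits & mask) != 0;
}
```

A routine that needs both MMX and SSE would test CpuHasAll(nBits, CPUBITS_MMX | CPUBITS_SSE), while a routine with either a 3DNow! or an SSE version available would test CpuHasAny() with the two flags combined.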
Each manufacturer has its own unique optimization methods, and so we get a vendor name.
typedef enum {
    CPUVEN_UNKNOWN   = 0,  // Unknown
    CPUVEN_INTEL     = 1,  // Intel
    CPUVEN_AMD       = 2,  // AMD
    CPUVEN_CYRIX     = 3,  // Cyrix
    CPUVEN_CENTAUR   = 4,  // IDT Centaur (WinChip)
    CPUVEN_NATIONAL  = 5,  // National Semiconductor
    CPUVEN_UMC       = 6,  // UMC
    CPUVEN_NEXGEN    = 7,  // NexGen
    CPUVEN_RISE      = 8,  // Rise
    CPUVEN_TRANSMETA = 9   // Transmeta
} CPUVEN;
We use the following data structure to reference the extracted CPU information.
typedef struct CpuInfoType {
    uint nCpuId;     // CPU type identifier
    uint nFpuId;     // floating-point Unit ID
    uint nBits;      // Feature bits
    uint nMfg;       // Manufacturer
    byte nProcCnt;   // # of logical processors
    byte pad[3];
} CpuInfo;

CpuInfo struct 4
    nCpuId   dd 0      ; CPU type identifier
    nFpuId   dd 0      ; Floating-point unit identifier
    nBits    dd 0      ; Feature bits
    nMfg     dd 0      ; Manufacturer
    nProcCnt db 0      ; # of logical processors
    pad      db 0,0,0
CpuInfo ends
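The C and assembly declarations describe the same 20 bytes, so a compile-time check on the C side can catch the two drifting apart. The sketch below assumes that the book's uint and byte types are 32-bit and 8-bit unsigned integers (written here as uint32_t and uint8_t) and that a C11 compiler is available for static_assert; neither assumption is stated in the text.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: uint is 32 bits wide and byte is 8 bits wide. */
typedef struct CpuInfoType {
    uint32_t nCpuId;    /* offset  0: CPU type identifier      */
    uint32_t nFpuId;    /* offset  4: floating-point unit ID   */
    uint32_t nBits;     /* offset  8: feature bits             */
    uint32_t nMfg;      /* offset 12: manufacturer             */
    uint8_t  nProcCnt;  /* offset 16: # of logical processors  */
    uint8_t  pad[3];    /* offsets 17..19: explicit padding    */
} CpuInfo;

/* These fail at compile time if the C layout stops matching the
   dd/dd/dd/dd/db/db*3 layout of the assembly structure. */
static_assert(sizeof(CpuInfo) == 20, "CpuInfo must be 20 bytes");
static_assert(offsetof(CpuInfo, nBits) == 8, "nBits must be at offset 8");
static_assert(offsetof(CpuInfo, nProcCnt) == 16, "nProcCnt must be at offset 16");
```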
This book's CPU detection uses the following data structure for finding matching vendor information. Each microprocessor that supports the CPUID instruction has encoded a 12-byte text string identifying the manufacturer.
; Vendor Data Structure
VENDOR STRUCT 4
    vname BYTE  '------------'
    Id    DWORD CPUVEN_UNKNOWN
VENDOR ENDS

VENDOR { "AMD ISBETTER", CPUVEN_AMD }        ; AMD Proto
VENDOR { "AuthenticAMD", CPUVEN_AMD }        ; AMD
VENDOR { "CyrixInstead", CPUVEN_CYRIX }      ; Cyrix & IBM
VENDOR { "GenuineIntel", CPUVEN_INTEL }      ; Intel
VENDOR { "CentaurHauls", CPUVEN_CENTAUR }    ; Centaur
VENDOR { "UMC UMC UMC ", CPUVEN_UMC }        ; UMC (retired)
VENDOR { "NexGenDriver", CPUVEN_NEXGEN }     ; NexGen (retired)
VENDOR { "RiseRiseRise", CPUVEN_RISE }       ; Rise
VENDOR { "GenuineTMx86", CPUVEN_TRANSMETA }  ; Transmeta
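To illustrate how such a table might be searched, here is a minimal C sketch. CPUID function 0 returns the 12-byte vendor string in the EBX, EDX, and ECX registers, in that order, four characters per register with the first character in the low byte. The function names VendorString and VendorLookup, the C mirror of the VENDOR structure, and the shortened table are all hypothetical; the register values are passed in as plain integers so that the fragment does not depend on any compiler-specific CPUID intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of CPUVEN, repeated so this sketch stands alone. */
typedef enum { CPUVEN_UNKNOWN = 0, CPUVEN_INTEL = 1, CPUVEN_AMD = 2 } CPUVEN;

/* Hypothetical C mirror of the VENDOR structure (12 chars + terminator). */
typedef struct { char vname[13]; CPUVEN id; } Vendor;

static const Vendor s_vendors[] = {
    { "AuthenticAMD", CPUVEN_AMD   },
    { "GenuineIntel", CPUVEN_INTEL },
};

/* Unpack one 32-bit register into four characters, low byte first. */
static void UnpackReg(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFF);
}

/* Assemble the vendor string from the registers returned by CPUID
   function 0. Note the register order: EBX, then EDX, then ECX. */
static void VendorString(char out[13], uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    assert(out != NULL);
    UnpackReg(out + 0, ebx);
    UnpackReg(out + 4, edx);
    UnpackReg(out + 8, ecx);
    out[12] = '\0';
}

/* Linear search of the table; unknown strings map to CPUVEN_UNKNOWN. */
static CPUVEN VendorLookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof s_vendors / sizeof s_vendors[0]; i++)
        if (strncmp(name, s_vendors[i].vname, 12) == 0)
            return s_vendors[i].id;
    return CPUVEN_UNKNOWN;
}
```

With EBX = 0x756E6547, EDX = 0x49656E69, and ECX = 0x6C65746E, VendorString() yields "GenuineIntel", which VendorLookup() maps to CPUVEN_INTEL.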
#include "CpuAsm.h"    // CPU module

CpuInfo cinfo;
char szBuf[ CPU_SZBUF_MAX ];

CpuDetect(&cinfo);     // Detect CPU

cout << "\nCPU Detection Code Snippet\n\n";

// Fills in buffer 'szBuf' with CPU information!
cout << CpuInfoStr(szBuf, &cinfo) << endl;

CpuSetup(&cinfo);      // Now set up function pointers
This is an example of what gets filled into the ASCII buffer with a call to the function CpuInfoStr().
"CpuId:15 'INTEL' FPU MMX FXSR SSE SSE2 SSE3 HTT"
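The source of CpuInfoStr() is not reproduced here, but a string of this shape can be built by walking a table of flag names. The following is only a sketch under stated assumptions: the function CpuInfoStrSketch, its parameter list, and the BitName table are hypothetical, and the format and display order are inferred from the single sample line above rather than taken from the library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Subset of CPUBITS, repeated so this sketch stands alone. */
enum {
    CPUBITS_FPU  = 0x0001, CPUBITS_MMX  = 0x0002, CPUBITS_FXSR = 0x0008,
    CPUBITS_SSE  = 0x0010, CPUBITS_SSE2 = 0x0020, CPUBITS_HTT  = 0x0200,
    CPUBITS_SSE3 = 0x0400
};

typedef struct { unsigned bit; const char *name; } BitName;

/* Listed in display order (assumed from the sample), not in bit order. */
static const BitName s_bitNames[] = {
    { CPUBITS_FPU,  "FPU"  }, { CPUBITS_MMX,  "MMX"  },
    { CPUBITS_FXSR, "FXSR" }, { CPUBITS_SSE,  "SSE"  },
    { CPUBITS_SSE2, "SSE2" }, { CPUBITS_SSE3, "SSE3" },
    { CPUBITS_HTT,  "HTT"  },
};

/* Hypothetical stand-in for CpuInfoStr(): writes the CPU ID, the vendor
   name, and one word per set feature flag into buf, then returns buf.
   Output is silently truncated if buf is too small. */
static char *CpuInfoStrSketch(char *buf, size_t size, unsigned cpuId,
                              const char *vendor, unsigned nBits)
{
    assert(buf != NULL && size > 0 && vendor != NULL);
    int n = snprintf(buf, size, "CpuId:%u '%s'", cpuId, vendor);
    size_t count = sizeof s_bitNames / sizeof s_bitNames[0];
    for (size_t i = 0; i < count && n >= 0 && (size_t)n < size; i++) {
        if (nBits & s_bitNames[i].bit) {
            int m = snprintf(buf + n, size - (size_t)n, " %s",
                             s_bitNames[i].name);
            if (m < 0)
                break;      /* encoding error: stop appending */
            n += m;
        }
    }
    return buf;
}
```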
That took care of the initial detection code. Now comes the fun part: function mapping. Every function you write should have a slower default version written in a high-level language such as C. This is really very simple. First there are the private definitions:
void FmdSetup(const CpuInfo * const pcinfo);

void vmp_FMulGeneric(float * const pfD, float fA, float fB);
void vmp_FMulAsm3DNow(float * const pfD, float fA, float fB);
void vmp_FMulAsmSSE(float * const pfD, float fA, float fB);

void vmp_FDivGeneric(float * const pfD, float fA, float fB);
void vmp_FDivAsm3DNow(float * const pfD, float fA, float fB);
void vmp_FDivAsmSSE(float * const pfD, float fA, float fB);
void vmp_FDivFastAsm3DNow(float * const pfD, float fA, float fB);
void vmp_FDivFastAsmSSE(float * const pfD, float fA, float fB);
Then there are the public application definitions:
// Multiplication
typedef void (*vmp_FMulProc)(float * const pfD, float fA, float fB);
extern vmp_FMulProc vmp_FMul;

// Division
typedef void (*vmp_FDivProc)(float * const pfD, float fA, float fB);
extern vmp_FDivProc vmp_FDiv;
extern vmp_FDivProc vmp_FDivFast;
There are the generic as well as processor-based functions such as:
// Multiplication
void vmp_FMulGeneric(float * const pfD, float fA, float fB)
{
    ASSERT_PTR4(pfD);

    *pfD = fA * fB;
}
The initialization code assigns the appropriate processor-based function to the public function pointer:
void CpuSetup(const CpuInfo * const pcinfo)
{
    ASSERT_PTR4(pcinfo);

    if (CPUBITS_SSE & pcinfo->nBits)
    {
        vmp_FMul = vmp_FMulAsmSSE;
        vmp_FDiv = vmp_FDivAsmSSE;
        vmp_FDivFast = vmp_FDivFastAsmSSE;    // ***FAST***
    }
    else if (CPUBITS_3DNOW & pcinfo->nBits)
    {
        vmp_FMul = vmp_FMulAsm3DNow;
        vmp_FDiv = vmp_FDivAsm3DNow;
        vmp_FDivFast = vmp_FDivFastAsm3DNow;  // ***FAST***
    }
    else
    {
        vmp_FMul = vmp_FMulGeneric;
        vmp_FDiv = vmp_FDivGeneric;
        vmp_FDivFast = vmp_FDivGeneric;
    }
}
You will probably need to play with the mapping until you get used to it. You could use case statements, function table lookups, or other methods, but due to similarity of processor types I find the conditional branching with Boolean logic seems to work best.
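For comparison, the function table alternative mentioned above might look something like the following sketch. The table is ordered from most to least preferred, and the first entry whose required flags are all present wins. The names FMulEntry, s_fmulTable, and SelectFMul are hypothetical, and because the real SSE and 3DNow! routines are written in assembly, plain C stand-ins take their place here so that the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of CPUBITS, repeated so this sketch stands alone. */
enum { CPUBITS_3DNOW = 0x0004, CPUBITS_SSE = 0x0010 };

typedef void (*vmp_FMulProc)(float * const pfD, float fA, float fB);

static void vmp_FMulGeneric(float * const pfD, float fA, float fB)
{
    assert(pfD != NULL);
    *pfD = fA * fB;
}

/* Stand-ins only: the real versions are assembly routines. */
static void FMulSseStandIn(float * const pfD, float fA, float fB)
{
    vmp_FMulGeneric(pfD, fA, fB);
}

static void FMul3DNowStandIn(float * const pfD, float fA, float fB)
{
    vmp_FMulGeneric(pfD, fA, fB);
}

/* One row per implementation: the flags it requires and its entry point. */
typedef struct { unsigned required; vmp_FMulProc proc; } FMulEntry;

/* Ordered by priority; the last row requires nothing and always matches. */
static const FMulEntry s_fmulTable[] = {
    { CPUBITS_SSE,   FMulSseStandIn   },
    { CPUBITS_3DNOW, FMul3DNowStandIn },
    { 0,             vmp_FMulGeneric  },
};

/* Return the first implementation whose required flags are all set. */
static vmp_FMulProc SelectFMul(unsigned nBits)
{
    for (size_t i = 0; i < sizeof s_fmulTable / sizeof s_fmulTable[0]; i++)
        if ((nBits & s_fmulTable[i].required) == s_fmulTable[i].required)
            return s_fmulTable[i].proc;
    return vmp_FMulGeneric;   /* not reached while the catch-all row exists */
}
```

Changing the priority then means reordering rows rather than rewriting branches, at the cost of one table per public function pointer.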
What is supplied should be thought of as a starting point. It should be included with most applications, even those that do not use any custom assembly code, as it will compile a breakdown of the computer that ran the application. With custom assembly code, it is the building block for writing cross-processor code. There is one more bit of "diagnostic" information that you can use: the processor speed. It can give you an idea of why your application is not running well. (Sometimes processors do not run at their marked speed, either through misconfiguration or overheating.) This is discussed in Chapter 18, "System."
The listed information can be obtained by using the included function CpuDetect(); however, from your point of view, who manufactured the CPU is not nearly as important as the CPUBITS flags listed above! Each of those bits being set indicates the existence of the associated functionality. Your program would merely check the bit and select the corresponding set of code. If the processor sets the CPUBITS_3DNOW bit, then it would need to vector to the 3DNow!-based algorithm. If the CPUBITS_SSE bit is set, then it would vector to that set of code. Keep in mind that when I first started writing this book neither existed on the same CPU, but while I was writing it, AMD came out with 3DNow! Professional. This is a union of the two superset families (excluding SSE3), for which there is also a CPU bit definition. However, that can easily change in the future. My recommendation would be to rank the code paths from highest to lowest performance in the initialization logic of your program, based upon your application's criteria.