Managing Secrets in Memory

When maintaining secret data in memory, you should follow some simple guidelines:

Acquire the secret data.
Use the secret data.
Discard the secret data.
Scrub the memory.

The time between acquiring the secret data and scrubbing the memory that holds it should be as short as possible, to reduce the chance that the secret data is written to the paging file. Admittedly, the threat of someone accessing the secret data in the paging file is slim. However, if the data is highly sensitive, such as long-lived signing keys and administrator passwords, you should take care that it is not leaked through seemingly innocuous means. In addition, if the application fails with an access violation, the ensuing crash dump file might contain the secret information.

Once you've used the secret in your code, overwrite the buffer with bogus data (or simply zeros) by using memset or ZeroMemory, which is a simple macro around memset:

#define ZeroMemory RtlZeroMemory
#define RtlZeroMemory(Destination,Length) \
    memset((Destination),0,(Length))

There's a little trick you should know for cleaning out dynamic buffers if you lose track of, or do not store, the buffer size in your code. (To many people, not keeping track of a dynamic buffer size is bad form, but that's another discussion!) If you allocate dynamic memory by using malloc, you can use the _msize function to determine the size of the data block. If you use the Windows heap functions, such as HeapCreate and HeapAlloc, you can determine the block size later by calling the HeapSize function. Once you know the dynamic buffer size, you can safely zero it out. The following code snippet shows how to do this:

void *p = malloc(N);
...
size_t cb = _msize(p);
memset(p,0,cb);
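If neither _msize nor HeapSize is available on your platform, one way around the problem is to keep the size next to the pointer from the start. The following is a minimal sketch of that idea, not part of the original sample: SecretBuf, SecretBufAlloc, and SecretBufFree are hypothetical names introduced for illustration. The scrub writes through a volatile pointer rather than calling memset, for the reasons covered under the compiler optimization caveat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper type: a heap buffer that remembers its own size. */
typedef struct SecretBuf {
    unsigned char *p;   /* start of the allocation */
    size_t cb;          /* size of the allocation, in bytes */
} SecretBuf;

/* Allocate cb bytes; returns 1 on success, 0 on failure. */
int SecretBufAlloc(SecretBuf *buf, size_t cb) {
    assert(buf != NULL);
    buf->p = (unsigned char *)malloc(cb);
    buf->cb = (buf->p != NULL) ? cb : 0;
    return buf->p != NULL;
}

/* Zero the whole allocation, then release it. */
void SecretBufFree(SecretBuf *buf) {
    assert(buf != NULL);
    if (buf->p != NULL) {
        volatile unsigned char *vp = buf->p;
        size_t i;
        for (i = 0; i < buf->cb; i++)
            vp[i] = 0;
        free(buf->p);
    }
    buf->p = NULL;
    buf->cb = 0;
}
```

Because the size travels with the pointer, the cleanup code never has to ask the allocator how large the block is.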

A Compiler Optimization Caveat

Today's C and C++ compilers have incredible optimization capabilities. They can determine how best to use machine registers (register coloring), move code that manipulates or generates invariant data out of loops (code hoisting), and much more. One of the more interesting optimizations is dead code removal. When the compiler analyzes the code, it can determine whether some code is used based in part on whether the code is called by other code or whether the data the code operates on is used. Look at the following fictitious code. Can you spot the security flaw?

void DatabaseConnect(char *szDB) {
    char szPwd[64];
    if (GetPasswordFromUser(szPwd,sizeof(szPwd))) {
        if (ConnectToDatabase(szDB, szPwd)) {
            // Cool, we're connected
            // Now do database stuff
        }
    }

    ZeroMemory(szPwd,sizeof(szPwd));
}

Here's the answer: there is no bug; this C code is fine! It's the code generated by the compiler that exhibits the security flaw. If you look at the assembly language output, you'll notice that the call to ZeroMemory has been removed by the compiler! The compiler removed the call to ZeroMemory because it realized the szPwd variable was no longer used by the DatabaseConnect function. Why spend CPU cycles scrubbing the memory of something that's no longer used? Below is the slightly cleaned up assembly language output of the previous code created by Microsoft Visual C++ .NET. It contains the C source code, as well as the Intel x86 instructions. The C source code lines start with a semicolon (;) followed by the line number (starting at 30, in this case) and the C source. Below the C source lines are the assembly language instructions.

; 30   : void DatabaseConnect(char *szDB) {

    sub   esp, 68                              ; 00000044H
    mov   eax, DWORD PTR ___security_cookie
    xor   eax, DWORD PTR __$ReturnAddr$[esp+64]

; 31   :     char szPwd[64];
; 32   :     if (GetPasswordFromUser(szPwd,sizeof(szPwd))) {

    push  64                                   ; 00000040H
    mov   DWORD PTR __$ArrayPad$[esp+72], eax
    lea   eax, DWORD PTR _szPwd$[esp+72]
    push  eax
    call  GetPasswordFromUser
    add   esp, 8
    test  al, al
    je    SHORT $L1344

; 33   :         if (ConnectToDatabase(szDB, szPwd)) {

    mov   edx, DWORD PTR _szDB$[esp+64]
    lea   ecx, DWORD PTR _szPwd$[esp+68]
    push  ecx
    push  edx
    call  ConnectToDatabase
    add   esp, 8
$L1344:

; 34   :             //Cool, we're connected
; 35   :             //Now do database stuff
; 36   :         }
; 37   :     }
; 38   :
; 39   :     ZeroMemory(szPwd,sizeof(szPwd));
; 40   : }

    mov   ecx, DWORD PTR __$ArrayPad$[esp+68]
    xor   ecx, DWORD PTR __$ReturnAddr$[esp+64]
    add   esp, 68                              ; 00000044H
    jmp   @__security_check_cookie@4
DatabaseConnect ENDP

The assembly language code after line 30 is added by the compiler because of the -GS compiler stack-based cookie option. (Refer to Chapter 5, "Public Enemy #1: The Buffer Overrun," for more information about this option.) However, take a look at the code after lines 34 to 40. This code checks that the cookie created by the code after line 30 is valid. But where is the code to zero out the buffer? It's not there! Normally, you would see a call to _memset. (Remember: ZeroMemory is a macro that calls memset.)

Compiler Optimization 101

Compiler optimizations come in many forms, and the most obvious is removing unnecessary code. For instance, a code block that is unreachable because the condition of an if statement always evaluates to false is easy to optimize away. Similarly, an optimizer removes code that manipulates local variables with no noticeable effect. For instance, a function in which the last thing done to a local variable is a write will have the same noticeable effect as if there were no write. This is because, at the end of the function, the local variable goes out of scope and is no longer accessible. A compiler eliminates these writes by constructing a data structure called a control flow graph that represents all paths of execution in the program. By running backward over this graph, the optimizer can see whether the last action on a local variable (more on this in a moment) is always a write, and if it is, it can eliminate that code. This optimization is called dead store elimination. The optimized program has exactly the same observable behavior as the non-optimized program, which is an application of the as-if rule that appears in many language specifications.
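To make the idea concrete, here is a small, hypothetical function (SumBytes and its variables are invented for illustration) in which the final write to a local variable is a dead store. No later code in the function reads the variable, so an optimizer is free to drop that write without changing the observable behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums n bytes using a local scratch variable. */
unsigned SumBytes(const unsigned char *data, size_t n) {
    unsigned scratch = 0;   /* local accumulator */
    unsigned result;
    size_t i;

    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++)
        scratch += data[i];

    result = scratch;

    /* Dead store: scratch is never read again and goes out of scope
       when the function returns, so an optimizer may drop this write. */
    scratch = 0;

    return result;
}
```

The caller sees the same return value whether or not the last assignment is emitted, which is exactly why the compiler considers it safe to remove.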

Note that if the variable is not local, the compiler cannot always conclusively determine the lifetime of the variable. The control flow graph alone cannot determine whether the non-local variable is used later, so dead store elimination cannot occur without more data. This information is difficult to obtain, so the optimization may occur only in limited cases. Currently, Visual C++ will not optimize in this case at all, but it may do so in the future.

The problem is that the compiler should not remove this code, because we always want the memory scrubbed of the secret data. But because the compiler determined that szPwd was no longer used by the function, it removed the code. I've seen this behavior in Microsoft Visual C++ version 6 and version 7 and in the GNU C Compiler (GCC) version 3.x. No doubt other compilers have this issue also. During the Windows Security Push (see Chapter 2, "The Proactive Security Development Process," for more information), we created an inline version of ZeroMemory named SecureZeroMemory that is not removed by the compiler and that is available in winbase.h. The code for this inline function is as follows:

#ifndef FORCEINLINE
#if (_MSC_VER >= 1200)
#define FORCEINLINE __forceinline
#else
#define FORCEINLINE __inline
#endif
#endif
...
FORCEINLINE PVOID SecureZeroMemory(
    void *ptr, size_t cnt) {
    volatile char *vptr = (volatile char *)ptr;
    while (cnt) {
        *vptr = 0;
        vptr++;
        cnt--;
    }
    return ptr;
}

Feel free to use this code in your application if you do not have the updated Windows header files. Please be aware that this code is slow, relative to ZeroMemory or memset, and should be used only for small blocks of sensitive data. Do not use it as a general memory-wiping function, unless you want to invite the wrath of your performance people!
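If you are building with a toolchain that has no Windows headers at all, the same idea can be written with standard C types only. The following is a sketch under that assumption; ScrubMemory is a hypothetical name, not a Windows API, and, like the inline function above, it relies on the compiler honoring writes made through a volatile pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical portable counterpart of the inline function above:
   writes zeros through a volatile pointer, one byte at a time. */
void *ScrubMemory(void *ptr, size_t cnt) {
    volatile unsigned char *vptr = (volatile unsigned char *)ptr;

    assert(ptr != NULL || cnt == 0);
    while (cnt) {
        *vptr = 0;
        vptr++;
        cnt--;
    }
    return ptr;
}
```

A typical call would be ScrubMemory(szPwd, sizeof(szPwd)) just before the buffer goes out of scope. The same performance caution applies: the byte-at-a-time loop is meant for small blocks of sensitive data.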

You can use other techniques to prevent the optimizer from removing the calls to memset. You can add a line of code after the scrubbing function to read the sensitive data in memory, but be wary of the optimizer again. You can fool the optimizer by casting the pointer to a volatile pointer; because the data behind a volatile pointer can be manipulated outside the scope of the application, the compiler does not optimize away accesses made through it. Changing the code to include the following line after the call to ZeroMemory will keep the optimizer at bay:

*(volatile char*)szPwd = *(volatile char *)szPwd;

The problem with the previous two techniques is that they rely on the fact that volatile pointers are not optimized well by C/C++ compilers; this works only today. Optimizer developers are always looking for ways to squeeze that last ounce of size and speed from your code, and who knows, three years from now there might be a way to optimize volatile pointer code safely.

Another way to solve the issue that does not require compiler tricks is to turn off optimizations for the code that scrubs the data. You can do this by wrapping the function(s) in question with the #pragma optimize construct:

#pragma optimize("",off)
// Memory-scrubbing function(s) here.
#pragma optimize("",on)

This will turn off optimizations for the entire function. Global optimizations, -Og (implied by the -Ox, -O1 and -O2 compile-time flags), are what Visual C++ uses to remove dead stores. But remember, global optimizations are a very good thing, so keep the code affected by the #pragma constructs to a minimum.
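As a sketch of what the wrapped code might look like, the following places a single, small scrubbing function between the two pragmas. ScrubNoOpt is a hypothetical name, and the _MSC_VER guard is an addition for illustration: compilers other than Visual C++ may not recognize this pragma, in which case the function is compiled normally and gets no such protection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _MSC_VER
#pragma optimize("", off)   /* Visual C++ only: disable optimizations */
#endif

/* Hypothetical scrubbing function; keep the unoptimized region this small. */
void ScrubNoOpt(void *p, size_t cb) {
    assert(p != NULL || cb == 0);
    if (cb != 0)
        memset(p, 0, cb);
}

#ifdef _MSC_VER
#pragma optimize("", on)    /* restore the command-line optimization settings */
#endif
```

Keeping only the scrubbing routine inside the pragma pair leaves the rest of the translation unit fully optimized.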

Encrypting Secret Data in Memory

If you must use long-lived secret data in memory, you should consider encrypting the memory while it is not being used. Once again, this helps mitigate the threat of the data being paged out. You can use any of the CryptoAPI samples shown previously to perform this task. While this works, you'll have to manage keys.

In Windows .NET Server 2003, we added two new APIs along the same lines as DPAPI but for protecting in-memory data. The function calls are CryptProtectMemory and CryptUnprotectMemory. The base key used to protect the data is re-created each time the computer is booted, and other key material is used depending on flags passed to the functions. Your application need never see an encryption key when using these functions. The following code sample shows how to use the functions.

#include <windows.h>
#include <wincrypt.h>

#define SECRET_LEN 15 //includes null

HRESULT hr = S_OK;
LPWSTR pSensitiveText = NULL;
DWORD cbSensitiveText = 0;
DWORD cbPlainText = SECRET_LEN * sizeof(WCHAR);
DWORD dwMod = 0;

//Memory to encrypt must be a multiple
//of CRYPTPROTECTMEMORY_BLOCK_SIZE.
dwMod = cbPlainText % CRYPTPROTECTMEMORY_BLOCK_SIZE;
if (dwMod != 0)
    cbSensitiveText = cbPlainText +
        (CRYPTPROTECTMEMORY_BLOCK_SIZE - dwMod);
else
    cbSensitiveText = cbPlainText;

pSensitiveText = (LPWSTR)LocalAlloc(LPTR, cbSensitiveText);
if (NULL == pSensitiveText)
    return E_OUTOFMEMORY;

//Place sensitive string to encrypt in pSensitiveText.
//Then encrypt in place
if (!CryptProtectMemory(pSensitiveText,
                        cbSensitiveText,
                        CRYPTPROTECTMEMORY_SAME_PROCESS)) {
    //Capture the error first; the cleanup calls
    //below could overwrite the last-error value.
    hr = HRESULT_FROM_WIN32(GetLastError());

    //on failure clean out the data
    SecureZeroMemory(pSensitiveText, cbSensitiveText);
    LocalFree(pSensitiveText);
    pSensitiveText = NULL;
    return hr;
}

//Call CryptUnprotectMemory to decrypt and use the memory.
...

//Now clean up
SecureZeroMemory(pSensitiveText, cbSensitiveText);
LocalFree(pSensitiveText);
pSensitiveText = NULL;

return hr;

You can learn more about these new functions in the Platform SDK.
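The buffer-size calculation in the sample is easy to get wrong, so it can help to isolate it. The following sketch does so in portable C; RoundUpToBlock and BLOCK_SIZE are hypothetical names, and the value 16 is an assumption standing in for CRYPTPROTECTMEMORY_BLOCK_SIZE, which real code should take from the Windows headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for CRYPTPROTECTMEMORY_BLOCK_SIZE; use the real
   constant from the Windows headers in production code. */
#define BLOCK_SIZE 16u

/* Round cb up to the next multiple of BLOCK_SIZE (hypothetical helper). */
size_t RoundUpToBlock(size_t cb) {
    size_t mod = cb % BLOCK_SIZE;
    size_t padded = (mod != 0) ? cb + (BLOCK_SIZE - mod) : cb;
    assert(padded >= cb);   /* catches wraparound for very large sizes */
    return padded;
}
```

With SECRET_LEN set to 15 and 2-byte WCHAR characters, cbPlainText is 30 bytes, which this helper rounds up to 32 under the assumed block size.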