Preventing Canonicalization Mistakes

Now that I've paraded numerous issues and you've read the bad news, let's look at solutions for canonicalization mistakes. The solutions include avoiding making decisions based on names, restricting what is allowed in a name, and attempting to canonicalize the name. Let's look at each in detail.

Don't Make Decisions Based on Names

The simplest and by far the most effective way of avoiding canonicalization bugs is to avoid making decisions based on the filename. Let the file system and operating system do the work for you, and use ACLs or other operating system-based authorization technologies. Of course, it's not quite as simple as that! Some security semantics cannot currently be represented in the file system. For example, IIS supports scripting. In other words, a script file, such as an ASP page containing Visual Basic Scripting Edition (VBScript) or Microsoft JScript, is read and processed by a script engine, and the results of the script are sent to the user. This is not the same as read access or execute access; it's somewhere in the middle. IIS, not the operating system, has to determine how to process the file. All it takes is a mistake in IIS's canonicalization, such as that in the ::$DATA exploit, and IIS sends the script file source code to the user rather than processing the file correctly.

As mentioned, you can limit access to resources based on the user's IP address. However, these security semantics currently cannot be represented as an ACL, and applications supporting restrictions based on IP address, Domain Name System (DNS) name, or subnet must use their own access-control code.

IMPORTANT
Refrain from making security decisions based on the name of a file. The wrong choice might have dire security consequences.

Use a Regular Expression to Restrict What's Allowed in a Name

I covered this in detail in Chapter 10, but it's worth repeating. If you must make name-based security decisions, restrict what you consider a valid name and deny all other formats. For example, you might require that all filenames be absolute paths containing a restricted pool of characters. Or you might decide that the following must be true for a file to be determined as valid:

The file must reside on drive c: or d:.
The path is a series of backslashes and alphanumeric characters.
The filename follows the path; the filename is also alphanumeric, is not longer than 32 characters, is followed by a dot, and ends with the txt, jpg, or gif extension.

The easiest way to do this is to use regular expressions. Learning to define and use good regular expressions is critical to the security of your application. A regular expression is a series of characters that define a pattern which is then compared with target data, such as a string, to see whether the target includes any matches of the pattern. For example, the following regular expression will represent the example absolute path just described:

^[cd]:(?:\\\w+)+\\\w{1,32}\.(txt|jpg|gif)$

Refer to Chapter 10 for details about what this expression means.

This expression is strict. The following are valid:

c:\mydir\myotherdir\myfile.txt
d:\mydir\myotherdir\someotherdir\picture.jpg

The following are invalid:

e:\mydir\myotherdir\myfile.txt (invalid drive letter)
c:\fred.txt (must have a directory before the filename)
c:\mydir\myotherdir\..\mydir\myfile.txt (can't have anything but A-Za-z0-9 and an underscore in a directory name)
c:\mydir\myotherdir\fdisk.exe (invalid file extension)
c:\mydir\myothe~1\myfile.txt (the tilde [~] is invalid)
c:\mydir\myfile.txt::$DATA (the colon [:] is invalid other than after the drive letter; $ is also invalid)
c:\mydir\myfile.txt. (the trailing dot is invalid)
\\myserver\myshare\myfile.txt (no drive letter)
\\?\c:\mydir\myfile.txt (no drive letter)

As you can see, using this simple expression can drastically reduce the possibility of using a noncanonical name. However, it does not detect whether a filename represents a device; we'll look at that shortly.
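To make the check concrete, here is a minimal sketch in portable C++ using std::regex rather than the ATL classes used later in this chapter. The helper name is_valid_name is hypothetical, and the pattern is the one described above with the three extensions written as an alternation. Treat it as an illustration of the allow-only-what-matches approach, not as a drop-in validator.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the chapter's sample code.
// Accepts only names that match the strict absolute-path pattern:
// drive c: or d:, one or more \directory parts, then a 1-32 character
// filename ending in .txt, .jpg, or .gif.
bool is_valid_name(const std::string& name) {
    static const std::regex pattern(
        R"(^[cd]:(?:\\\w+)+\\\w{1,32}\.(txt|jpg|gif)$)");
    // regex_match succeeds only if the entire string matches.
    return std::regex_match(name, pattern);
}
```

A caller would pass the untrusted name and refuse the request whenever the function returns false. Note that the match is case sensitive as written, so C:\ would be rejected unless you add the std::regex::icase flag.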

IMPORTANT
Regular expressions teach an important lesson. A regular expression determines what is valid, and everything else is therefore invalid. Determining whether or not an expression is valid is the correct way to parse any kind of input. You should never look for and block invalid data and then allow everything else through; you will likely miss a rare edge case. This is incredibly important. I repeat: look for that which is provably valid, and disallow everything else.
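The difference between the two approaches is easy to see in a small sketch. Both function names below are hypothetical, and the deny-list rule (block anything containing two dots) is only an example of the kind of check that looks reasonable but misses cases.

```cpp
#include <regex>
#include <string>

// Deny-list style (hypothetical, for contrast only): block one known-bad
// pattern and let everything else through.
bool deny_list_allows(const std::string& name) {
    return name.find("..") == std::string::npos;
}

// Allow-list style: accept only a provably valid shape, here 1-32 word
// characters, a dot, and one of three known extensions.
bool allow_list_allows(const std::string& name) {
    static const std::regex pattern(R"(^\w{1,32}\.(txt|jpg|gif)$)");
    return std::regex_match(name, pattern);
}
```

Given myfile.txt::$DATA, the deny-list check lets the name through because it never anticipated stream syntax, while the allow-list check rejects it without needing to know why it is dangerous.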

Stopping 8.3 Filename Generation

You should also consider preventing the file system from generating short filenames. This is not a programmatic option; it's an administrative setting. You can stop Windows from creating 8.3 filenames by adding the following setting to the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem registry key:

NtfsDisable8dot3NameCreation : REG_DWORD : 1

This option does not remove previously generated 8.3 filenames.

Don't Trust the PATH: Use Full Path Names

Never depend on the PATH environment variable to find files. You should be explicit about where your files reside. For all you know, an attacker might have changed the PATH to read c:\myhacktools;%systemroot% and so on! When was the last time you checked the PATH on your systems? The lesson here is to use full path names to your data and executable files, rather than relying on an untrusted variable to determine which files to access.
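As a rough illustration, the following portable C++17 sketch builds a full path from a directory the application trusts instead of letting a search path decide. The function name full_path_for and the example directory are assumptions made for this sketch; in a real application the trusted directory would come from protected configuration.

```cpp
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: combine a trusted, absolute directory with a bare
// filename. Returns no value if either part looks wrong, so the caller
// never falls back to a PATH search.
std::optional<fs::path> full_path_for(const fs::path& trusted_dir,
                                      const std::string& file_name) {
    if (!trusted_dir.is_absolute())
        return std::nullopt;              // refuse relative directories
    const fs::path name(file_name);
    if (name.empty() || name != name.filename())
        return std::nullopt;              // refuse names with directory parts
    if (name == "." || name == "..")
        return std::nullopt;              // refuse directory references
    return trusted_dir / name;
}
```

The caller then opens or launches exactly the path that is returned, and treats an empty result as an error rather than retrying with a bare filename.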

More Info
A new registry setting in Windows XP allows you to search some of the folders specified in the PATH environment variable before searching the current directory. Normally, the current directory is searched first, which can make it easy for attackers to place Trojan horses on the computer. The registry setting is HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode. You need to add this registry value yourself. The value is a DWORD type and is 0 by default. If the value is set to 1, the current directory is searched after system32.

Restricting what is valid in a filename and rejecting all else is reasonably safe, as long as you use a good regular expression. However, if you want more flexibility, you might need to attempt to canonicalize the filename for yourself, and that's the next topic.

Attempt to Canonicalize the Name

Canonicalizing a filename is not as hard as it seems; you just need to be aware of some Win32 functions to help you. The goal of canonicalization is to get as close as possible to the file system's representation of the file in your code and then to make decisions based on the result. In my opinion, you should get as close as possible to the canonical representation and reject the name if it still does not look valid. For example, the CleanCanon application I've written performs robust canonicalization functions as described in the following steps:

1. It takes an untrusted filename request from a user, for example, mysecretfile.txt.
2. It determines whether the filename is well formed. For example, mysecretfile.txt is valid; mysecr~1.txt, mysecretfile.txt::$DATA, and mysecretfile.txt. (trailing dot) are all invalid.
3. It determines whether the combined length of the filename and the directory exceeds MAX_PATH. If so, the request is rejected. This helps mitigate denial of service attacks and buffer overruns.
4. It prepends an application-configurable directory to the filename, for example, c:\myfiles, to yield c:\myfiles\mysecretfile.txt. It also adds \\?\ to the start of the filename; this instructs the operating system to handle the filename literally and not perform any extra canonicalization steps.
5. It determines the correct directory structure, allowing for two dots (..), by calling GetFullPathName.
6. It evaluates the long filename of the file in case the user uses the short filename version. For example, mysecr~1.txt becomes mysecretfile.txt, achieved by calling GetLongPathName. This is technically moot because of the filename validation in step 2. However, it's a defense-in-depth measure!
7. It determines whether the filename represents a file or a device. This is something a regular expression cannot achieve. If the GetFileType function determines the file to be of type FILE_TYPE_DISK, it's a real file and not a device of some kind.

NOTE
Earlier I mentioned that device name issues exist in Linux and UNIX also. C or C++ programs running on these operating systems can determine whether a file is a file or a device by calling the stat function and checking the st_mode member of the stat structure it fills in. If (st_mode & S_IFMT) equals S_IFREG (octal 0100000), or equivalently if S_ISREG(st_mode) is true, the file is a regular file and not a device. Because stat follows symbolic links, call lstat instead if you also need to rule out links.

Let's look at this Win32 C++ code, written using Visual C++ .NET, that performs these steps:

/* CleanCanon.cpp */
#include "stdafx.h"
#include "atlrx.h"
#include "strsafe.h"
#include <new>

enum errCanon {
    ERR_CANON_NO_ERROR = 0,
    ERR_CANON_INVALID_FILENAME,
    ERR_CANON_INVALID_PATH,
    ERR_CANON_NOT_A_FILE,
    ERR_CANON_NO_FILE,
    ERR_CANON_NO_PATH,
    ERR_CANON_TOO_BIG,
    ERR_CANON_NO_MEM};

errCanon GetCanonicalFileName(LPCTSTR szFilename,
                              LPCTSTR szDir,
                              LPTSTR *pszNewFilename) {

    //STEP 1
    //Must provide a path and must be smaller than MAX_PATH
    if (szDir == NULL)
        return ERR_CANON_NO_PATH;

    size_t cchDirLen = 0;
    if (StringCchLength(szDir,MAX_PATH,&cchDirLen) != S_OK ||
        cchDirLen > MAX_PATH)
        return ERR_CANON_TOO_BIG;

    *pszNewFilename = NULL;
    LPTSTR szTempFullDir = NULL;
    //Start with INVALID_HANDLE_VALUE so cleanup can tell
    //whether CreateFile ever succeeded
    HANDLE hFile = INVALID_HANDLE_VALUE;
    errCanon err = ERR_CANON_NO_ERROR;

    try {
        //STEP 2
        //Check filename is valid (alphanum '.' 1-4 alphanums)
        //Check path is valid (alphanum and '\' only)
        //Case insensitive
        CAtlRegExp<> reFilename, reDirname;
        CAtlREMatchContext<> mc;
        reFilename.Parse(_T("^\\a+\\.\\a\\a?\\a?\\a?$"),FALSE);
        if (!reFilename.Match(szFilename,&mc))
            throw ERR_CANON_INVALID_FILENAME;

        reDirname.Parse(_T("^\\c:\\\\[a-z0-9\\\\]+$"),FALSE);
        if (!reDirname.Match(szDir,&mc))
            throw ERR_CANON_INVALID_PATH;

        size_t cFilename = lstrlen(szFilename);
        size_t cDir = lstrlen(szDir);

        //Temp new buffer size, allow for added '\'
        size_t cNewFilename = cFilename + cDir + 1;

        //STEP 3
        //Make sure filesize is small enough
        if (cNewFilename > MAX_PATH)
            throw ERR_CANON_TOO_BIG;

        //Allocate memory for the new filename
        //Accommodate for prefix \\?\ and for trailing '\0'
        LPCTSTR szPrefix = _T("\\\\?\\");
        size_t cchPrefix = lstrlen(szPrefix);
        size_t cchTempFullDir = cNewFilename + 1 + cchPrefix;
        szTempFullDir = new TCHAR[cchTempFullDir];
        if (szTempFullDir == NULL)
            throw ERR_CANON_NO_MEM;

        //STEP 4
        //Join the dir and filename together.
        //Prepending \\?\ forces the OS to treat many characters
        //literally by not performing extra interpretation/canon steps
        if (StringCchPrintf(szTempFullDir,
                            cchTempFullDir,
                            _T("%s%s\\%s"),
                            szPrefix,
                            szDir,
                            szFilename) != S_OK)
            throw ERR_CANON_INVALID_FILENAME;

        // STEP 5
        // Get the full path,
        // Accommodates for .. and trailing '.' and spaces
        TCHAR szFullPathName [MAX_PATH + 1];
        LPTSTR szFilenamePortion = NULL;
        DWORD dwFullPathLen =
            GetFullPathName(szTempFullDir,
                            MAX_PATH,
                            szFullPathName,
                            &szFilenamePortion);
        if (dwFullPathLen > MAX_PATH)
            throw ERR_CANON_NO_MEM;

        // STEP 6
        // Get the long filename
        if (GetLongPathName(szFullPathName,
                            szFullPathName,
                            MAX_PATH) == 0) {
            errCanon errName = ERR_CANON_TOO_BIG;
            switch (GetLastError()) {
                case ERROR_FILE_NOT_FOUND :
                    errName = ERR_CANON_NO_FILE;
                    break;
                case ERROR_NOT_READY :
                case ERROR_PATH_NOT_FOUND :
                    errName = ERR_CANON_NO_PATH;
                    break;
                default :
                    break;
            }
            throw errName;
        }

        // STEP 7
        // Is this a file or a device?
        hFile = CreateFile(szFullPathName,
                           0,0,NULL,
                           OPEN_EXISTING,
                           SECURITY_SQOS_PRESENT |
                               SECURITY_IDENTIFICATION,
                           NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            throw ERR_CANON_NO_FILE;

        if (GetFileType(hFile) != FILE_TYPE_DISK)
            throw ERR_CANON_NOT_A_FILE;

        //Looks good!
        //Caller must delete [] pszNewFilename
        const size_t cchNewFilename = lstrlen(szFullPathName)+1;
        *pszNewFilename = new TCHAR[cchNewFilename];
        if (*pszNewFilename != NULL)
            StringCchCopy(*pszNewFilename,cchNewFilename,szFullPathName);
        else
            err = ERR_CANON_NO_MEM;

    } catch(errCanon e) {
        err = e;
    } catch (const std::bad_alloc &) {
        err = ERR_CANON_NO_MEM;
    }

    delete [] szTempFullDir;

    //Only close a handle that CreateFile actually returned
    if (hFile != INVALID_HANDLE_VALUE)
        CloseHandle(hFile);

    return err;
}

The complete code listing is available in the companion content, in the folder Secureco2\Chapter11\CleanCanon. CreateFile has a side effect when it's determining whether the file is a drive-based file. The function will fail if the file does not exist, saving your application from performing the check.
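If you are working outside Win32, the same canonicalize-then-decide shape can be sketched with the C++17 std::filesystem library. This is an analogue under my own assumptions, not a port of CleanCanon: the function name resolve_under is hypothetical, lexically_normal stands in for GetFullPathName, and the sketch does not resolve short names or symbolic links, nor does it check for devices.

```cpp
#include <algorithm>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: join an untrusted name to a trusted base directory,
// collapse "." and ".." lexically, and accept the result only if it still
// lies strictly beneath the base directory.
std::optional<fs::path> resolve_under(const fs::path& base,
                                      const std::string& untrusted_name) {
    fs::path root = base.lexically_normal();
    if (!root.has_filename())             // drop a trailing separator
        root = root.parent_path();
    const fs::path candidate = (root / untrusted_name).lexically_normal();

    // Compare component by component so that /srv/files2 is not mistaken
    // for a child of /srv/files.
    const auto ends = std::mismatch(root.begin(), root.end(),
                                    candidate.begin(), candidate.end());
    if (ends.first != root.end())
        return std::nullopt;              // escaped the base directory
    if (ends.second == candidate.end() || candidate.filename().empty())
        return std::nullopt;              // names the base directory itself
    return candidate;
}
```

The decision is made on the normalized path, never on the raw input, which is the point of the steps described earlier.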

Calling CreateFile Safely

You may have noticed that the dwFlagsAndAttributes argument is nonzero in the CreateFile call in the previous code. There's a good reason for this. This code does nothing more than verify that a filename is valid, and is not a device or an interprocess communication mechanism, such as a mailslot or a named pipe. That's it. If it were a named pipe, the process owning the pipe could impersonate the process identity of the code making the request. However, in the interests of security, I don't want any code I don't trust impersonating me. So setting this flag prevents the code at the other end from impersonating you.

Note that there is a small issue with setting this flag, although it doesn't affect this code, because the code is not attempting to manipulate the file. The problem is that SECURITY_SQOS_PRESENT, which is part of the SECURITY_SQOS_PRESENT | SECURITY_IDENTIFICATION value passed here, has the same numeric value as FILE_FLAG_OPEN_NO_RECALL, which indicates the file is not to be pulled from remote storage if the file exists. This flag is intended for use by the Hierarchical Storage Management system or remote storage systems.
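The overlap is easy to confirm with a little bit arithmetic. The sketch below does not include the Windows headers; the three constants are local stand-ins whose values reflect my reading of winbase.h in the Windows SDK, so verify them against the SDK you build with.

```cpp
#include <cstdint>

// Local stand-ins for Windows SDK constants (values are an assumption
// taken from winbase.h; check your own SDK).
constexpr std::uint32_t kSecuritySqosPresent    = 0x00100000; // SECURITY_SQOS_PRESENT
constexpr std::uint32_t kSecurityIdentification = 0x00010000; // SECURITY_IDENTIFICATION
constexpr std::uint32_t kFileFlagOpenNoRecall   = 0x00100000; // FILE_FLAG_OPEN_NO_RECALL

// The dwFlagsAndAttributes value passed to CreateFile in the listing.
constexpr std::uint32_t kFlagsPassed =
    kSecuritySqosPresent | kSecurityIdentification;

// True if the given flags would also be read as "do not recall the file
// from remote storage".
constexpr bool implies_no_recall(std::uint32_t flags) {
    return (flags & kFileFlagOpenNoRecall) != 0;
}

static_assert(implies_no_recall(kFlagsPassed),
              "SQOS flags share a bit with FILE_FLAG_OPEN_NO_RECALL");
```

Because the two flags occupy the same bit, CreateFile cannot tell them apart when opening a disk file, which is why the side effect is harmless only when the code does not go on to read the file's contents.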

Now let's move our focus to fixing Web-based canonical representation issues.