File-Based Canonicalization Issues | Hunting Security Bugs

Can you think of how many ways there are to represent a path to a single file? Look at the following examples:

C:\WINDOWS\system32\calc.exe
C:/WINDOWS/system32/calc.exe
\\?\C:\WINDOWS\system32\calc.exe
file://c:\WINDOWS\system32\calc.exe
%windir%\system32\calc.exe
\\127.0.0.1\C$\WINDOWS\system32\calc.exe
C:\WINDOWS\..\WINDOWS\.\system32\calc.exe
C:\WINDOWS\.\system32\calc.exe

Although this might seem like a lot of different ways to access the same file, more variations could be created. The following sections discuss examples like these, but the preceding list helps illustrate how easy it can be to make a bad security decision based only on the name of a resource.

Directory Traversal

Directory traversal occurs when an attacker references the parent or root folder as part of the filename or path and is able to coerce the target application into processing something that would otherwise be off limits to the attacker. This type of vulnerability is extremely common because many system calls programmers use automatically resolve relative file paths. For example, if a Web application must block requests to /secret, it might parse incoming requests to see whether the root folder name is equivalent to secret. Perhaps the developer understood and accounted for different encoding techniques and case-sensitivity issues in the application's request-processing code, but can you think of a way that directory traversal techniques might fool the parser? If the request is to http://www.example.com/somedir/../secret, the parser would compare somedir to secret, so this request would not be blocked. The canonical form of the path, however, is /secret, which results in a canonicalization bug.
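The /somedir/../secret request can be reproduced in a few lines. The following Python sketch is only an illustration under stated assumptions: the function names and the BLOCKED_ROOT constant are hypothetical, and a real Web server applies its own normalization rules. It contrasts a parser that inspects the first path segment as written with one that decodes and resolves the path before comparing.

```python
import posixpath
from urllib.parse import unquote, urlsplit

BLOCKED_ROOT = "secret"  # hypothetical name of the folder that must be blocked


def naive_is_blocked(url: str) -> bool:
    """Flawed check: looks only at the first path segment as it was written."""
    first_segment = urlsplit(url).path.lstrip("/").split("/", 1)[0]
    return first_segment.lower() == BLOCKED_ROOT


def canonical_is_blocked(url: str) -> bool:
    """Decode the path, resolve '.' and '..', and only then compare."""
    path = unquote(urlsplit(url).path)
    # Force exactly one leading slash so normpath treats the path as rooted.
    canonical = posixpath.normpath("/" + path.lstrip("/"))
    first_segment = canonical.lstrip("/").split("/", 1)[0]
    return first_segment.lower() == BLOCKED_ROOT


if __name__ == "__main__":
    url = "http://www.example.com/somedir/../secret"
    print(naive_is_blocked(url), canonical_is_blocked(url))
```

The naive check lets the traversal request through, whereas the canonical check blocks it because the comparison happens after the path has been resolved.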

Table 12-1 shows some common symbols that can be used to help traverse directories. By including the symbols listed in the table, the following are considered equivalent:

C:\WINDOWS
\.\WINDOWS\..\.\WINDOWS\.\system32\drivers\..\..\system32\.\..

Table 12-1: Common Symbols Used in Directory Traversal
Symbol	Description
... (dot dot dot)	Obscure method of traversing up two directory levels or of referring to the same directory.
.. (dot dot)	Traverses up one directory level.
. (dot)	Refers to the same directory.
Leading forward slash (/) or backslash (\)	Refers to the root directory.

Tip	If the Microsoft Windows operating system is installed on the C drive in a folder named Windows, you can experiment with this by opening a command prompt and using DIR with the directory traversal values to see which directories are returned.
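If no Windows machine is at hand, the same equivalence can be checked with Python's ntpath module, which applies Windows path rules on any platform. This is a rough sketch rather than what the operating system does internally: the canonical helper is hypothetical, it assumes that a rooted path without a drive letter refers to drive C, and it does not handle the obscure ... symbol.

```python
import ntpath


def canonical(path: str, current_drive: str = "C:") -> str:
    """Resolve '.' and '..', unify slashes, and fold case (Windows rules).

    Assumption: a path with no drive letter is relative to current_drive.
    """
    drive, _ = ntpath.splitdrive(path)
    if not drive:
        path = current_drive + path
    return ntpath.normcase(ntpath.normpath(path))


if __name__ == "__main__":
    for candidate in (
        r"C:\WINDOWS",
        r"\.\WINDOWS\..\.\WINDOWS\.\system32\drivers\..\..\system32\.\..",
        r"C:/WINDOWS/system32/..",
    ):
        print(canonical(candidate))
```

All three inputs collapse to the same string, which is the form an application should compare against when it makes a decision based on a path.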

Defeating Filename Extension Checks

Even if an application is doing something that seems straightforward, such as blocking files based on the extension, it can be difficult to determine all of the valid possibilities that an attacker could use to bypass your check. Take the following C# sample code, for example:

    // Filename is specified from user input, so block
    // if it is an .exe file.
    if (filename.EndsWith(".exe") == true)
    {
        allowUpload = false; // Block upload.
    }
    else
    {
        allowUpload = true; // Allow upload.
    }

This example seems pretty simple: most filenames have an extension, so checking the end of the filename for the extension you want to block seems like a good check. But the preceding code overlooks some problems. The first is that it is always better to use a white-list (or allow-list) approach when making security decisions. In this case, the code should check only for the file types it allows, and then block everything else. That issue aside, there are canonicalization issues that would allow an attacker to bypass the extension check, thus causing allowUpload to equal true. For instance, if the user specified a file with any of the following extensions, and if the filename is not converted to canonical form first, the preceding code would fail to block an .exe file from being uploaded to the server:

.exe. (trailing dot)
.EXE (different casing)
.exe%20 (hexadecimal representation of a trailing space)

Issues with trailing characters are discussed in more detail later in this chapter. The point here is to show how easy it is to bypass even filename extension checks.
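To make the bypass concrete, here is a small Python sketch that places a rendering of the deny-list check next to an allow-list check that canonicalizes first. The function names and the ALLOWED_EXTENSIONS set are assumptions made for illustration, and the canonicalization shown (URL-decoding, then dropping trailing dots and spaces, then folding case) covers only the three bypasses listed above.

```python
import ntpath
from urllib.parse import unquote

ALLOWED_EXTENSIONS = {".txt", ".png", ".jpg"}  # hypothetical allow-list


def naive_allows(filename: str) -> bool:
    """Deny-list check as in the sample: blocks only a literal '.exe' suffix."""
    return not filename.endswith(".exe")


def canonical_extension(filename: str) -> str:
    """Return the extension after a minimal canonicalization pass."""
    name = unquote(filename)   # "%20" becomes a real space
    name = name.rstrip(". ")   # trailing dots and spaces are dropped
    return ntpath.splitext(name)[1].lower()


def allows(filename: str) -> bool:
    """Allow-list check performed on the canonical extension."""
    return canonical_extension(filename) in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ("evil.exe.", "evil.EXE", "evil.exe%20", "notes.txt"):
        print(name, naive_allows(name), allows(name))
```

Each of the three evasive names passes the deny-list check but is rejected by the allow-list check, because none of them canonicalizes to an allowed extension.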

Understanding Filename Extension Precedence

At the command prompt, when you type the name of an executable, are you sure you know what application will run? Obviously, you can run applications that are not located in the current working folder. The Windows operating system launches executable files in an order based on extension precedence. If executables with the same filename but different extensions exist in the same directory, the Windows operating system will launch them in the following order:

.com files
.exe files
.bat files

Because most applications do not use files with a .com extension anymore, attackers can create Trojan applications that use the same filename as a legitimate executable and place the bogus file in the same directory as the actual executable. For instance, if an application is installed in C:\Application and contains an executable called Program.exe, attackers might get their code to run by placing a malicious Program.com, which uses the uncommon .com extension, in the same directory. To prevent this, your application should use the full path, including the extension, when referring to files. Also, permissions should be set properly on the Application folder so this attack is not possible. Permissions are discussed further in Chapter 13, "Finding Weak Permissions."
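The precedence rule is easy to model. The Python sketch below is a simplified simulation, not the actual command interpreter logic: resolve_command is a hypothetical helper that takes a command name without an extension plus a directory listing, and picks a file using only the three-extension order given above.

```python
from typing import Iterable, Optional

PRECEDENCE = (".com", ".exe", ".bat")  # order described in the text


def resolve_command(command: str, directory_listing: Iterable[str]) -> Optional[str]:
    """Return the file that would be chosen for an extensionless command name."""
    present = {name.lower(): name for name in directory_listing}
    for extension in PRECEDENCE:
        match = present.get((command + extension).lower())
        if match is not None:
            return match
    return None


if __name__ == "__main__":
    before = ["Program.exe", "readme.txt"]
    after = before + ["Program.com"]  # file planted by an attacker
    print(resolve_command("Program", before))
    print(resolve_command("Program", after))
```

Once Program.com is planted, the extensionless name Program no longer resolves to the legitimate executable, which is why the full name and path should be used.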

Using Trailing Characters

Some examples of trailing characters that could cause canonicalization problems were mentioned previously. In certain environments, an illegal trailing character might be removed automatically from the filename by the system before the file is actually accessed. This behavior has caused many applications to parse filenames improperly when such characters as a dot (.) or forward slash (/) are appended to the filename. Remember, as a tester you are trying to fool the parser by providing values that slip past data validation checks, yet that are still considered the same in canonical form. Here are more ways you might be able to bypass an extension check by appending or embedding characters:

.exe. (trailing dot)
.exe (trailing space after extension)
. exe (space after dot)
.exe x (trailing other character)
.exe%08 (trailing nonprintable character, such as a BACKSPACE)
.e&xe (embedded ampersand)
.txt%00 (trailing null character)
.txt%0d%0a.exe (embedded carriage return/line feed, or CRLF)
.txt\n.exe (embedded newline character, useful in spoofing attempts because .exe is moved to the next line)

Because an application might even strip out characters from the middle of the filename, you might be able to use that behavior to bypass any checks. For instance, we tested an application that removed ampersands (&) from the filenames. It also blocked users from uploading certain file types. Unfortunately, it checked the extension prior to sanitizing the filename. Can you see the problem? We attempted to upload a file called evilfile.e&xe, and the parser did not block the filename based on the extension. Then the parser removed the illegal character from the filename, allowing evilfile.exe to be uploaded.
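The ordering problem in that application can be sketched as follows. The function names are hypothetical and the sanitizer strips only ampersands, as in the case described; the point is simply that the extension check must run on the name that will actually be stored.

```python
from typing import Optional

BLOCKED_EXTENSIONS = (".exe",)


def sanitize(filename: str) -> str:
    """Remove characters the application considers illegal (only '&' here)."""
    return filename.replace("&", "")


def is_blocked(filename: str) -> bool:
    return filename.lower().endswith(BLOCKED_EXTENSIONS)


def upload_check_then_sanitize(filename: str) -> Optional[str]:
    """Flawed order: validates the raw name, then changes it."""
    if is_blocked(filename):
        return None
    return sanitize(filename)


def upload_sanitize_then_check(filename: str) -> Optional[str]:
    """Safer order: validates the name in the form that gets stored."""
    cleaned = sanitize(filename)
    if is_blocked(cleaned):
        return None
    return cleaned


if __name__ == "__main__":
    print(upload_check_then_sanitize("evilfile.e&xe"))
    print(upload_sanitize_then_check("evilfile.e&xe"))
```

With the flawed order the stored name is evilfile.exe; with the checks reversed the upload is rejected.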

NTFS Data Streams

The NTFS file system supports multiple data streams for a file, meaning even though the file's content is in the main data stream, you can create a new stream associated with the file that can be accessed as well. To create or access the additional stream, append a colon and the name of the stream to the filename. Let's look at an example. You can create a file called test.txt on the command line with the following syntax:

 echo hello > test.txt

Now you can add additional content to a new data stream called newstream:

 echo world > test.txt:newstream

To view the contents of the streams, use the more command. Notice the following output shows how you can access the different streams:

 D:\examples>more < test.txt
 hello
 D:\examples>more < test.txt::$DATA
 hello
 D:\examples>more < test.txt:newstream
 world
 D:\examples>more < test.txt:newstream:$DATA
 world

Probably the best-known NTFS data stream vulnerability was the one in Microsoft Internet Information Server (IIS) 4.0 that revealed the source of an Active Server Pages (ASP) file when a user browsed to the file and appended ::$DATA to the filename. Essentially, IIS did not render the contents through the ASP engine because it did not recognize the extension as the correct type; thus, it simply showed the contents of the file to the user.

More Info

For more information about this bug, see http://www.microsoft.com/technet/security/bulletin/MS98-003.asp (without the ::$DATA, of course).

Depending on how the application determines the file extension, specifying alternate data streams, such as ::$DATA, might bypass any checks. And because the data stream can also contain information, there might be a way to get an application to process that data as well. For instance, imagine that an application parses a file upon being uploaded to the server and removes all malicious input. However, there could also be data stored in the alternate data stream that would not be parsed.
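The effect of stream syntax on an extension check can be seen with a short Python sketch. The helper names are hypothetical, and the code works on a bare filename rather than a full path (where a drive-letter colon would need separate handling).

```python
import ntpath
from typing import Tuple


def split_stream(filename: str) -> Tuple[str, str]:
    """Split 'name:stream:$TYPE' into (name, stream part); bare filenames only."""
    base, _, stream = filename.partition(":")
    return base, stream


def naive_extension(filename: str) -> str:
    """Takes everything after the last dot, stream suffix included."""
    return ntpath.splitext(filename)[1].lower()


def effective_extension(filename: str) -> str:
    """Extension of the file that the file system will actually open."""
    base, _ = split_stream(filename)
    return ntpath.splitext(base)[1].lower()


if __name__ == "__main__":
    name = "default.asp::$DATA"
    print(naive_extension(name), effective_extension(name))
```

A check written against the naive extension does not see .asp at all, even though the file being opened is default.asp. Where streams are not needed, rejecting any user-supplied filename that contains a colon is a simpler policy.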

When Filename Extensions Do Not Matter

Depending on the file type, including an extension might not matter on certain operating systems. For instance, you might think that on all systems Microsoft Office PowerPoint files use the .ppt extension, and that when you click a .ppt file, the file will attempt to open in PowerPoint. Some systems open files in the correct application regardless of the filename extension because the application uses the GetClassFile API to determine how to handle the file. For example, if you create a PowerPoint file called Example.ppt and rename it to Example.ext, the file will still open in PowerPoint when you double-click it. To understand why this works, refer to http://msdn.microsoft.com/library/en-us/com/html/dc3cb263-7b9a-45f9-8eab-3a88aa9392db.asp. The example illustrates how relying only on the extension in a filename could lead to problems.
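A tester can approximate content-based detection without calling GetClassFile at all. The Python sketch below is a stand-in technique, not the API's algorithm: it only checks whether a file begins with the eight-byte signature of an OLE compound file, the container format used by legacy .ppt files. The function name is hypothetical.

```python
OLE_COMPOUND_FILE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole_compound_file(path: str) -> bool:
    """Return True if the file starts with the OLE compound file signature."""
    with open(path, "rb") as handle:
        header = handle.read(len(OLE_COMPOUND_FILE_MAGIC))
    return header == OLE_COMPOUND_FILE_MAGIC


if __name__ == "__main__":
    import sys

    for candidate in sys.argv[1:]:
        print(candidate, looks_like_ole_compound_file(candidate))
```

Running the script against a renamed Example.ext shows that the file's content still identifies its type, whatever the extension claims.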

Other Common Mistakes That Lead to Canonicalization Issues

Because there are so many different ways a file or path could be represented, security decisions based on names will likely lead to canonicalization issues. Following are additional issues that programmers commonly overlook when they attempt to block certain files and paths from being accessed.

Using Short Filenames vs. Long Filenames

In the early days of MS-DOS and the FAT file system, filenames were restricted to using a maximum of eight characters with a three-character extension, known as 8.3 filenames. Now, file systems such as FAT32 and NTFS allow for long filenames, too. To maintain backward compatibility, the NTFS and FAT32 file systems automatically generate the 8.3 representation of a filename as well as the long filename. For example, if you have a file on your FAT32 or NTFS Windows system called LongFilename.txt, LONGFI~1.TXT will also get generated.

As you might imagine, an application that makes decisions based on a specified path and filename might become vulnerable if the equivalent short filename can be specified. So if the developer checks LongFilename.txt and LONGFI~1.TXT, that should be good enough when making a security decision, right? Not exactly! The general format of the 8.3 naming convention is to include the first six characters of the long filename followed by a tilde (~), an incremental number, and then the first three characters of the extension. As such, a developer might use a regular expression that checks for the six characters, tilde, and then a digit to attempt to plug the hole. But this method has flaws, too. If files already exist in the same directory with filenames that start with the same first six characters of the long filename, the naming convention for the autogenerated 8.3 name changes. Look at Figure 12-1. In this example, 10 files named LongFilename0.txt to LongFilename9.txt were created before LongFilename.txt was created. As you can see, the 8.3 filename for LongFilename.txt is LOF12D~1.TXT.

Figure 12-1: Using dir /x to display the short filename form of a file with a long filename

Although developers can continue to refine the checks, they will more than likely miss several other cases. Instead, the canonical form and system APIs should be used when you are making decisions based on the name. Also, do not forget to combine this knowledge with what you learned earlier in this chapter about bypassing filename extension checks. For instance, if an e-mail application is supposed to block .exe attachments, what do you think happens when you create a file called runme.exempt and e-mail it as an attachment to a recipient? The filename extension check might be bypassed, but the short filename version for the attachment file is runme~1.exe.
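The basic 8.3 pattern described above can be sketched in Python to generate test inputs. This is a deliberately simplified model: basic_short_name is a hypothetical helper that always emits the tilde form, covers only the first-six-characters case, and does not reproduce the alternate scheme that yields names such as LOF12D~1.TXT. Use dir /x to see the name the file system really assigned.

```python
def basic_short_name(long_name: str, index: int = 1) -> str:
    """Approximate the common 8.3 alias: six characters, '~N', three-character extension."""
    stem, dot, extension = long_name.rpartition(".")
    if not dot:  # no extension at all
        stem, extension = long_name, ""
    stem = stem.replace(" ", "").replace(".", "")
    short = f"{stem[:6]}~{index}"
    if extension:
        short += "." + extension[:3]
    return short.upper()


if __name__ == "__main__":
    print(basic_short_name("LongFilename.txt"))
    print(basic_short_name("runme.exempt"))
```

The second call shows why an attachment filter that sees only runme.exempt can miss a file that is also reachable under an .exe name.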

Note	The Windows operating system is not the only system that supports both short and long filenames. Also, other file systems might have their own algorithm for determining the short version of a long filename.

Exploiting Casing Issues

The file system of some operating systems, such as UNIX, is case-sensitive, meaning myfile, MyFile, and MYFILE are considered different filenames and can be located in the same directory. The Windows NTFS file system, however, is not case-sensitive, but does preserve the case of filenames. If your application restricts certain filenames, consider using alternate casing to attempt to bypass any of the casing checks.

Note	Systems that use the Portable Operating System Interface for UNIX (POSIX) on Windows perform case-sensitive name comparison on files.

Another common casing problem involves internationalization issues concerning certain characters. The following C# code lowercases a folder name in an attempt to compare it to a blocked name in a case-insensitive manner:

    if (folder.ToLower() == "private")
    {
        throw new UnauthorizedAccessException();
    }
    else
    {
        // Allow access
    }

Sometimes functions such as ToUpper or ToLower are used to compare strings. In many cases, the preceding code might not cause a problem. However, because the system locale is used to make the conversion, the code might not work as expected in other locales, for example, Turkish. In English, there are 26 unique letters in the alphabet that can be uppercase or lowercase, such as i or I. Turkish, on the other hand, has four i's: ı, I, i, and İ. As such, if your application prevents access to a folder called private by using code like what's shown in the preceding example, it won't work properly on a system set to the Turkish locale because calling ToLower on a folder called PRIVATE would result in prıvate. Notice that the ı is not dotted. The comparison would fail because in the Turkish locale, i and İ are one uppercase-lowercase pair and ı and I are another. Attackers can take advantage of this functionality if they are able to specify the encoding or if they target users of the Turkish locale.
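The failure is easy to reproduce outside .NET. Python's str.lower is not locale-sensitive, so the sketch below simulates the Turkish rule with a small translation table; turkish_lower and both check functions are hypothetical helpers written for this illustration.

```python
# Turkish casing rule: I lowercases to dotless ı (U+0131), İ (U+0130) to i.
TURKISH_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkish_lower(text: str) -> str:
    """Simulate a locale-sensitive ToLower under the Turkish locale."""
    return text.translate(TURKISH_LOWER).lower()


def blocked_locale_sensitive(folder: str) -> bool:
    """Mirrors the flawed check when the process runs in a Turkish locale."""
    return turkish_lower(folder) == "private"


def blocked_locale_independent(folder: str) -> bool:
    """Compares with a casing rule that does not depend on the locale."""
    return folder.lower() == "private"


if __name__ == "__main__":
    print(turkish_lower("PRIVATE"))
    print(blocked_locale_sensitive("PRIVATE"), blocked_locale_independent("PRIVATE"))
```

Under the simulated Turkish rule the folder PRIVATE is not blocked. In C#, one way to avoid the problem is an ordinal comparison such as string.Equals(folder, "private", StringComparison.OrdinalIgnoreCase), which ignores the current locale.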

Specifying DOS Device Names

Another MS-DOS feature that also made it into the Windows operating system for backward compatibility is the use of device names. These are reserved words that refer to devices, such as COM1 for the first communications port. Examples of several device names include AUX, COM0, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, CON, LPT0, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, LPT9, NUL, PRN, and CLOCK$.

Applications that allow a user to specify filenames without ensuring the filename is not equivalent to a DOS device name can experience denial of service (DoS) attacks. For example, imagine if your application creates a file based on a name that is provided by the user. If the user specifies a name such as COM1 for the file, the application would try talking with the communication port instead of the file.

On January 8, 2005, Dennis Rand discovered a DoS in Novell eDirectory 8.7.3 that occurred if a DOS device name was specified when requesting a URL, such as http://www.example.com:8008/COM1. If an attacker made such a request to a vulnerable server, the service would stop until it was restarted. Although the problem has been resolved, it shows that such vulnerabilities still exist. You can read more about this vulnerability at http://cirt.dk/advisories/cirt-33-advisory.pdf.

Note	Device name vulnerabilities are not limited to the Windows operating system; other operating systems such as UNIX also have device names that could lead to similar attacks: /dev/ttyp0, /dev/hda1, and /dev/cd0 are just a few.

If your application takes a filename as input, here are some variations you might want to test (replace COM1 with various device names):

COM1:other
Other.COM1
COM1.ext
http://www.example.com/COM1
c:\somefolder\COM1\file.txt
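A defensive check for these names might look like the following Python sketch. It is an assumption-laden illustration rather than a complete rule set: the helper names are hypothetical, the device list is the one given above, and each path component is reduced to the text before its first colon and first dot before it is compared.

```python
import ntpath

DOS_DEVICE_NAMES = (
    {"AUX", "CON", "NUL", "PRN", "CLOCK$"}
    | {f"COM{n}" for n in range(10)}
    | {f"LPT{n}" for n in range(10)}
)


def component_is_device(component: str) -> bool:
    """True if one path component resolves to a reserved device name."""
    base = component.split(":", 1)[0]  # "COM1:other" -> "COM1"
    base = base.split(".", 1)[0]       # "COM1.ext"   -> "COM1"
    return base.rstrip(" ").upper() in DOS_DEVICE_NAMES


def path_names_device(path: str) -> bool:
    """True if any component of a Windows-style path is a device name."""
    _, rest = ntpath.splitdrive(path.replace("/", "\\"))
    return any(component_is_device(part) for part in rest.split("\\") if part)


if __name__ == "__main__":
    for name in ("COM1:other", "Other.COM1", "COM1.ext", r"c:\somefolder\COM1\file.txt"):
        print(name, path_names_device(name))
```

In this sketch Other.COM1 is not flagged, because the reserved word appears only as the extension; it remains a useful test input precisely because a target parser may treat it differently. For a URL, extract the path portion first and pass that to the check.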

Tip

Some filenames need to be created with the CreateFile or CreateFileW API because they cannot be created using the Windows shell. There are also many illegal characters for file and folder names that you can use those APIs to create. Refer to http://msdn.microsoft.com/library/en-us/fileio/fs/naming_a_file.asp for more information about naming a file.

Accessing UNC Shares

A Universal Naming Convention (UNC) share allows access to file and printer resources on both local and remote machines. UNC shares are treated as part of the file system, and users can access them by using a mapped drive letter or the UNC path. For example, if you have a file share named public on a machine called FranksMachine, you could create a mapped drive on a system that runs the Windows operating system by using the command net use x: \\FranksMachine\public. This would allow you to access the files on the newly mapped X drive, or you could use the UNC path notation \\FranksMachine\public to access the share.

Some applications might do simple checking to make sure that a file path specifies a UNC share, whereas other applications might allow only drive letters to be used. Either of these methods could be bypassed. Let's say a backup application wants the backup file specified to be on a local machine. The developer might check that the first character of the path is an alphabetic character followed by a colon. Do you see a problem with this assumption? If a UNC share is mapped to a drive letter, the check will allow the file to be saved to a network share, which isn't what the developer intended.

On the other hand, an application feature might allow saving only to a network share by using the UNC path method (to prevent a user from saving data to the local machine). If the machine on which the application is running has no UNC shares, the application shouldn't allow the file to be saved. If the only check the developer includes is to look for two backslashes (\\) at the beginning of the path, the check ensures only that the path begins with two backslashes. It does not ensure that the path refers to a remote machine, so it does not protect the application from saving data to the local machine. Here are a few ways a malicious user could supply a path that meets the requirement of starting with two backslashes, but that still allows access to the local machine:

\\FranksMachine\C$
\\127.0.0.1\C$
\\MACHINE_IP_ADDRESS\C$
\\?\UNC\127.0.0.1\C$
\\?\c:\
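The gap between "starts with two backslashes" and "points at another machine" can be illustrated with a short Python sketch. Everything here is a hypothetical helper built on string inspection alone: the LOCAL_HOSTS set is an assumed starting point that a real check would extend with the machine's own names and addresses, and a mapped drive letter cannot be recognized without asking the operating system.

```python
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}  # assumed loopback names


def naive_is_remote(path: str) -> bool:
    """Flawed check: any path that begins with two backslashes passes."""
    return path.startswith("\\\\")


def refers_to_local_machine(path: str, extra_local_hosts=()) -> bool:
    """Best-effort test of whether a path ends up on the local machine."""
    p = path.replace("/", "\\")
    upper = p.upper()
    if upper.startswith("\\\\?\\UNC\\"):
        p = "\\\\" + p[8:]            # \\?\UNC\host\share -> \\host\share
    elif upper.startswith(("\\\\?\\", "\\\\.\\")):
        return True                   # \\?\C:\ style path to a local drive or device
    if not p.startswith("\\\\"):
        return True                   # drive-letter or relative path; mapped drives not detected
    host = p[2:].split("\\", 1)[0].lower()
    local = LOCAL_HOSTS | {name.lower() for name in extra_local_hosts}
    return host in local


if __name__ == "__main__":
    for candidate in ("\\\\127.0.0.1\\C$", "\\\\?\\UNC\\127.0.0.1\\C$", "\\\\?\\c:\\"):
        print(candidate, naive_is_remote(candidate), refers_to_local_machine(candidate))
```

Every path in the loop satisfies the two-backslash check and still resolves to the local machine, which is why resolving the target through system APIs is preferable to inspecting the prefix.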

C$ represents a default hidden share for the C drive, also known as an administrative share. Also, ADMIN$ maps to the Windows directory. The Windows operating system automatically creates these default shares for all the local hard disk volumes and requires the user to be an administrator to access the share. Even if the administrative shares are deleted, they are re-created after you stop and start the Server service or restart the computer. To find all the shares on a machine, type net share in a command prompt. You can prevent the Windows operating system from creating these shares automatically by using the AutoShareServer and AutoShareWks registry key settings (http://support.microsoft.com/default.aspx?scid=kb;en-us;816524).

Note	The \\?\ format enables you to extend the Unicode versions of many file-manipulation functions to 32,000 total Unicode characters in the path plus filename by turning off the path parsing in many of the file-handling APIs. No path element can be greater than 256, however.

Understanding Search Paths

When an application is provided a filename to open, but the path is not specified, you are at the mercy of the operating system to determine which file is opened. Also, if your application links to dynamic-link libraries (DLLs) without specifying the full path to the file, your application might load a malicious file rather than the one you were expecting. Put a different way, the application is open to Trojan DLLs that allow arbitrary code execution. This flaw has caused problems with several applications.

Generally, the search path for loading a DLL is the following:

The set of preinstalled DLLs, such as KERNEL32.DLL and USER32.DLL
The current directory of the executable
The current working directory
The Windows system directory
The Windows directory
The directories specified by the PATH environment variable
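The consequence of that ordering can be modeled without touching the real loader. The Python sketch below is a simulation only: find_dll is a hypothetical helper, the directory contents are supplied as a dictionary rather than read from disk, and the set of preinstalled DLLs is left out.

```python
from typing import Dict, List, Optional, Set


def find_dll(dll_name: str, search_order: List[str],
             contents: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first match for dll_name, walking directories in order."""
    wanted = dll_name.lower()
    for directory in search_order:
        for name in contents.get(directory, set()):
            if name.lower() == wanted:
                return directory + "\\" + name
    return None


if __name__ == "__main__":
    order = [
        r"C:\Application",      # directory of the executable
        r"D:\Downloads",        # current working directory
        r"C:\WINDOWS\system32",
        r"C:\WINDOWS",
    ]
    files = {r"C:\WINDOWS\system32": {"helper.dll"}}
    print(find_dll("helper.dll", order, files))
    files[r"D:\Downloads"] = {"helper.dll"}  # planted by an attacker
    print(find_dll("helper.dll", order, files))
```

After the file is planted in the working directory, the same unqualified name resolves to the attacker's copy, because that directory is searched before the system directories.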

Note

Microsoft Windows XP Service Pack 1 (SP1) and later and Windows Server 2003 change the search path so that the system directories are searched first, and then the current directory is searched. However, systems that are upgraded to Microsoft Windows XP SP1 or Windows Server 2003 still default to using the previous search algorithm when loading a DLL. It is best to make sure that you specify the full path of the file you wish to use, rather than letting the operating system decide which file to open.

To find out how your application is loading certain files, you can try the following checks:

Perform a code review to look for places where files are opened. Pay attention to how the name can be manipulated by the attacker. Look for APIs such as LoadLibrary, LoadLibraryEx, CreateProcess, CreateProcessAsUser, CreateProcessWithLogon, WinExec, ShellExecute, SearchPath, CreateFile, CreateFileW, CreateFileA, and the like to make sure the full path is specified and is quoted. Also, callers of APIs that allow command-line arguments to be specified separately, such as CreateProcess, ideally should do so instead of passing the arguments as part of the process path.
Use Microsoft Application Verifier to make sure CreateProcess and similar APIs are called properly. Refer to http://www.microsoft.com/technet/prodtechnol/windows/appcompatibility/appverifier.mspx for more information on Application Verifier.
Attach to a running process in the debugger and see where modules and files are actually loaded from.