Before you jump into selected protocols, this section explains some general procedures that are useful when auditing a client or server product. The steps offer brief guidelines for auditing a protocol you're unfamiliar with. If you're already familiar with the protocol, you might be able to skip some early steps.

Note: At the time of this writing, there has been a big trend toward examining software that deals with file formats processed by client (and, less often, server) software. The steps outlined in this section can also be applied to examining programs that deal with file formats, as both processes use similar procedures.

Collect Documentation

So how do you audit software that's parsing data in a format you know nothing about? You read the protocol specification, of course! If the protocol is widely used, often there's an RFC or other formal specification detailing its inner workings and what an implementation should adhere to (often available at www.ietf.org/rfc.html). Although specifications can be tedious to read, they're useful to have on hand to help you understand protocol details. Books or Web sites that describe protocols in a more approachable format are usually available, too, so start with an Internet search. Even if you're familiar with a protocol, having these resources available will help refresh your memory, and you might discover recent new features or find that some features perform differently than you expected. For proprietary protocols, official documentation might not be available. However, searching the Internet is worth the time, as invariably other people with similar goals have invested time in documenting or reverse-engineering portions of these protocols. When reading code that implements a protocol, there are two arguments for acquiring additional documentation:
Identify Elements of Unknown Protocols

Sometimes you encounter a proprietary protocol with no documentation, which means you have to reverse-engineer it. This skill can take some time to master, so don't be discouraged if you find it cumbersome and difficult the first few times. There are two ways to identify how a protocol works: You can observe the traffic, or you can reverse-engineer the applications that handle the traffic. Both methods have their strengths and weaknesses. Reverse-engineering the applications gives you a more thorough understanding, but doing so might be impractical in some situations. The following sections present some ideas to help get you on the right track.

Using Packet Sniffers

Packet-sniffing utilities are invaluable tools for identifying fields in unknown protocols. One of the first steps to understanding a protocol is to watch what data is exchanged between two hosts participating in a communication. Many free sniffing tools are available, such as tcpdump (available from www.tcpdump.org/) and Wireshark (previously Ethereal, available from www.wireshark.org/). Of course, the protocol must be unencrypted for these tools to be useful. However, even encrypted protocols usually begin with some sort of initial negotiation, giving you insight into how the protocol works and whether the cryptographic channel is established securely. One of the most obvious characteristics you'll notice is whether the protocol is binary or text based. With a text-based protocol, you can usually get the hang of how it works because the messages aren't obscured. Binary protocols are more challenging to comprehend by examining packet dumps. Here are some tips for understanding the fields. When reading this section and trying to analyze a protocol, keep in mind the types of fields that usually appear in protocols: connection IDs, length fields, version fields, opcode or result fields, and so on.
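As a reference point when labeling bytes in a capture, a typical binary message header might look something like the following sketch. The layout and field names are invented for illustration (no real protocol is implied), but these field types recur constantly:

```c
#include <stdint.h>

/* Hypothetical header layout, invented for illustration; real
 * protocols differ, but these are the field types you'll usually
 * be trying to identify in a packet dump. */
struct msg_header {
    uint8_t  version;  /* version field: usually constant across dumps    */
    uint8_t  opcode;   /* opcode/result field: varies by message type     */
    uint16_t flags;    /* bit field of status/option flags                */
    uint32_t conn_id;  /* connection ID: random-looking, fixed per session */
    uint32_t length;   /* length field: grows and shrinks with the payload */
};
```

When diffing captures, bytes that stay constant suggest version or signature fields, bytes that look random but persist for a session suggest connection IDs, and bytes that track the payload size suggest length fields.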
Most undocumented protocols aren't much different from the multitude of open protocols, and you're likely to find similarities in how proprietary and open protocols work. This chapter focuses on simple one-layer protocols for the sake of clarity. You can apply the same principles to complex multilayer protocols, but analyzing them takes more work and more practice.

Initiate the Connection Several Times

Start at the beginning with connection initiation. Usually, it's easier to start there and branch out. Establishing new connections between the same test hosts multiple times and noting what values change can be useful. Pay special attention to the top of the message, where there's presumably a header of some sort. Note the offsets of data that changes. It's your job to pinpoint why those values changed. Asking yourself some simple questions, such as the following, might help identify the cause of those changes:
Answer these questions and keep detailed notes for each field that changes. Then try to come up with additional questions that might help you determine the purpose of certain fields. Pay attention to how many bytes change in a particular area. For example, if it's two bytes, it's probably a word field; four bytes of change could mean an integer field; and so forth. Because many protocols are composed of messages that have a similar header format and a varying body, you should write down all the findings you have made and see where else they might apply in the protocol. This method can also help you identify unknown fields. For example, say you have figured out a header format such as the following:

struct header {
    unsigned short id;        /* seems random */
    unsigned short unknown1;
    unsigned long length;     /* packet len including header */
};

You might have deduced that unknown1 is always the value 0x01 during initiation, but in later message exchanges, it changes to 0x03, 0x04, and so forth. You might then infer that unknown1 is a message type or opcode.

Replay Messages

When you examine packet dumps, replaying certain messages with small changes to see how the other side responds can prove helpful. This method can give you insight into what certain fields in the packet represent, how error messages are conveyed, and what certain error codes mean. It's especially useful when the same protocol errors happen later when you replay other messages, which is a good way to test previous deductions and see whether you were right.

Reverse-Engineering the Application

Reverse-engineering is both a science and an art, and it's a subject that could easily take an entire book to cover. Reverse-engineering isn't covered in depth in this chapter; instead, it's mentioned as a technique that can be used on clients and servers to gain an in-depth understanding of how a protocol works. The following sections introduce the first steps to take to understand a protocol.
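The message-replay technique described earlier can be sketched as a small helper that produces single-byte variants of a captured message. This is an illustrative sketch only (the function name and interface are invented); the network send/receive plumbing is omitted so the mutation logic stands alone:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of a captured message with the byte
 * at 'offset' replaced by 'value', or NULL on error. Replaying such
 * variants and watching the peer's responses helps map out field
 * meanings and error codes. Caller frees the result. */
unsigned char *mutate_message(const unsigned char *msg, size_t len,
                              size_t offset, unsigned char value)
{
    unsigned char *copy;

    if (offset >= len)          /* refuse out-of-range mutations */
        return NULL;

    copy = malloc(len);
    if (copy == NULL)
        return NULL;

    memcpy(copy, msg, len);
    copy[offset] = value;
    return copy;
}
```

In practice, you would loop offset over the suspected header region, send each variant with write() or send(), and log which offsets provoke errors, disconnects, or changed responses.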
Use Symbols

If you can get access to binary code with symbols, by all means, use it! Function names and variable names can provide invaluable information as to what a protocol does. Using these symbols can help isolate the code you need to concentrate on because functions dealing with messages are aptly named. Some programs you audit might have additional files containing symbols and debugging information (such as PDB, Program Debug Database, files for Windows executables). These files are a big help if you can get your hands on them. For instance, you might be auditing for a company that refuses to give you its source code but might be open to disclosing debugging builds or PDB files.

Note: Microsoft makes PDB symbol packages available at http://msdl.microsoft.com/, and these timesavers are invaluable tools for gaining insight into Microsoft programs. If getting source code isn't an option, it's recommended that you negotiate with whoever you're doing code auditing for to get debug symbols.

Examine Strings in the Binary

Sometimes binaries don't contain symbols, but they do contain strings indicating function names, especially when debugging information has been compiled into the production binary. It's not uncommon to see code constructs such as the following:

#define DEBUG1(x) if(debug) printf(x)

int parse_message(int sock)
{
    DEBUG1("Entering parse_message\n");

    ... process message ...
}

Although debugging is turned off for the production release, the strings appear in the binary, so you can see the function names in the debugging messages. Strings also come in useful when you're looking for certain strings that appear in the protocol or errors that appear in the protocol or logs. For example, you send a message that disconnects but leaves a log message such as "[fatal]: malformed packet received from 192.168.1.1: invalid packet length."
This string tells you that the length field (wherever it appears in the packet) is invalid, and you also have a string to search for. By searching through the binary for "invalid packet length" or similar, you might be able to locate the function that's processing the packet length and, therefore, discover where in the binary to start auditing.

Examine Special Values

In addition to helpful strings in the executable, you might find unique binary values in the protocol that can be used to locate code for processing certain messages. These values are commonly found when you're dealing with file formats because file formats contain "signature" values to identify the file type at the beginning of the file (and possibly in other parts of the file). Although unique signatures are a less common practice in protocols sent over the network (as they're often unnecessary), there might be tag values or something similar in the protocol with values unlikely to appear naturally in a binary. "Appearing naturally" means that if you search the binary for that value (using an IDA text search on the disassembly), it's unlikely to occur in unrelated parts of the program. For example, the value 0x0C occurs often in a binary, usually as an offset into a structure, so its frequent occurrence makes it a poor unique value to search for. A more unusual value, such as 0x8053, is a better search choice: it's unlikely that structures have members at this offset (the structures would have to be large, and because the value is odd, aligned pointer, integer, and word accesses won't produce it as an offset either).

Debug

Debugging messages were mentioned in the section on examining strings, and you saw an example of debugging messages appearing in the compiled code. This means you can turn on debugging and automatically receive all debugging output. Usually, vendors have a command-line option to turn on debugging, but they might remove it for the production release.
However, if you cross-reference a debugging string such as "Entering parse_message," you see a memory reference to where the debug variable resides in memory. So you can just change it to a nonzero value at runtime and receive all the debugging messages you need.

Find Communication Primitives

When all else fails, you can revert to finding entry points you know about; protocol software has to send and receive data at some point. For protocols that operate over TCP, entry points might include read(), recv(), recvmsg(), and WSArecv(). UDP protocols might also use recvfrom() and WSArecvfrom(). Locating where these functions are used points you to where data is read in from the network. Sometimes this method is an easy route to identifying where data is being processed. Unfortunately, it might take some tracing back through several functions, as many applications write wrappers around communication primitives and use them indirectly (for example, by having the communication primitives in the form of class methods). Still, in these cases, you can break on one of the aforementioned functions at runtime and let it return a few times to see where processing is taking place.

Use Library Tracing

Another technique that can aid in figuring out what a program is doing is using system tools to trace the application's library calls or system resource accesses. These tools include truss for Solaris, ltrace for Linux, ktrace for BSD, and Filemon/Regmon for Windows (www.sysinternals.com/). This technique is best used in combination with the other techniques described.

Match Data Types with the Protocol

After you're more familiar with a protocol, you start to get a sense of where things could go wrong. Don't worry if this doesn't happen right away; the more experience you get, the more you develop a feel for potential problem areas.
One way to identify potential problem areas is to analyze the structure of untrusted data processed by a server or client application, and then match elements of those structures with vulnerability classes covered in this book, as explained in the following sections.

Binary Protocols

Binary protocols express protocol messages in a structural format that's not readable by humans. Text data can be included in parts of the protocol, but you also find elements in nontext formats, such as integers or Booleans. Domain Name System (DNS) is one example of a binary protocol; it uses bit fields to represent status information, two-byte integer fields to represent lengths and other data (such as IDs), and counted text fields to represent domain labels. Binary protocols transmit data in a form that's immediately recognizable by the languages that implement servers and clients. Therefore, they are more susceptible to boundary condition vulnerabilities when dealing with those data types. Specifically, when dealing with integers, a lot of the typing issues discussed in Chapter 6, "C Language Issues," are relevant. For this reason, the following sections summarize integer-related vulnerabilities that commonly occur in binary protocols.

Integer Overflows and 32-Bit Length Values

Integer overflows often occur when 32-bit length variables are used in protocols to dynamically allocate space for user-supplied data. This vulnerability usually results in heap corruption, allowing a remote attacker to crash the application performing the parsing or, in many cases, exploit the bug to run arbitrary code.
This code shows a basic example of an integer overflow when reading a text string:

char *read_string(int sock)
{
    char *string;
    size_t length;

    if(read(sock, (void *)&length, sizeof(length)) != sizeof(length))
        return NULL;

    length = ntohl(length);

    string = (char *)calloc(length+1, sizeof(char));
    if(string == NULL)
        return NULL;

    if(read_bytes(sock, string, length) < 0){
        free(string);
        return NULL;
    }

    string[length] = '\0';
    return string;
}

In the fictitious protocol the code is parsing, a 32-bit length is supplied, indicating the length of the string followed by the string data. Because the length value isn't checked, a value of the highest representable integer (0xFFFFFFFF) triggers an integer overflow when 1 is added to it in the call to calloc().

Integer Underflows and 32-Bit Length Values

Integer underflows typically occur when related variables aren't adequately checked against each other to enforce a relationship, as shown in this example:

struct _pkthdr {
    unsigned int operation;
    unsigned int id;
    unsigned int size;
};

struct _tlv {
    unsigned short type, length;
    char value[0];
};

int read_packet(int sock)
{
    struct _pkthdr header;
    struct _tlv tlv;
    char *data;
    size_t length;

    if(read_header(sock, &header) < 0)
        return 1;

    data = (char *)calloc(header.size, sizeof(char));
    if(data == NULL)
        return 1;

    if(read_data(sock, data, header.size) < 0){
        free(data);
        return 1;
    }

    for(length = header.size; length > sizeof(struct _tlv); ){
        if(read_tlv(sock, &tlv) < 0)
            goto fail;

        ... process tlv ...

        length -= tlv.length;
    }

    return 0;

fail:
    free(data);
    return 1;
}

In this fictitious protocol, a packet consists of a header followed by a series of type, length, and value (TLV) structures. There's no check between the size in the packet header and the size in the TLV being processed. In fact, the TLV length field can be bigger than the length in the packet header.
Sending this packet would cause the length variable to underflow and the loop of TLV processing to continue indefinitely, processing arbitrary process memory until it hits the end of the segment and crashes. Integer underflows can also occur when length values are required to hold a minimum length, but the parsing code never verifies this requirement. For example, a binary protocol has a header containing an integer specifying the packet size. The packet size is supposed to be at least the size of the header plus any remaining data. Here's an example:

#define MAX_PACKET_SIZE 512
#define PACKET_HDR_SIZE 12

struct pkthdr {
    unsigned short type, operation;
    unsigned long id;
    unsigned long length;
};

int read_header(int sock, struct pkthdr *hdr)
{
    hdr->type = read_short(sock);
    hdr->operation = read_short(sock);
    hdr->id = read_long(sock);
    hdr->length = read_long(sock);

    return 0;
}

int read_packet(int sock)
{
    struct pkthdr header;
    char data[MAX_PACKET_SIZE];

    if(read_header(sock, &header) < 0)
        return 1;

    if(header.length > MAX_PACKET_SIZE)
        return 1;

    if(read_bytes(sock, data, header.length - PACKET_HDR_SIZE) < 0)
        return 1;

    ... process data ...
}

This code assumes that header.length is at least PACKET_HDR_SIZE (12) bytes, but this is never verified. Therefore, the read_bytes() size parameter can be underflowed if header.length is less than 12, resulting in a stack overflow.

Small Data Types

The issues with length specifiers smaller than 32 bits (8- or 16-bit lengths) are a bit different from issues with large 32-bit sizes. First, sign-extension issues are more relevant because programs often natively use 32-bit variables, even when dealing with smaller data types. These sign-extension issues can result in memory corruption or possibly denial-of-service conditions. Listing 16-1 shows a simple example of DNS server code.

Listing 16-1. Name Validation Denial of Service
The assembly code in Listing 16-1 contains a sign-extension problem. It roughly translates to this C code:

int Name_ValidateCountName(char *name)
{
    char *ptr, *end;
    unsigned int length = *(unsigned char *)name;

    for(ptr = name + 2, end = ptr + length; ptr < end; ){
        int string_length = *ptr++;

        if(!string_length)
            break;

        ptr += string_length;
    }

    ...
}

This code loops through a series of counted strings until it reaches the end of the data region. Because the pointer is pointing to a signed character type, the count byte is sign-extended when it's stored as an integer. Therefore, you can jump backward to data appearing earlier in the buffer and create a situation that causes an infinite loop. You could also jump to data in random memory contents situated before the beginning of the buffer, with undefined results.

Note: In fact, the length parameter at the beginning of the function isn't validated against anything. So based on this code, you should be able to indicate that the size of the record being processed is larger than it really is; therefore, you can process memory contents past the end of the buffer.

Text-Based Protocols

Text-based protocols tend to have different classes of vulnerabilities than binary protocols. Most vulnerabilities in binary protocol implementations result from type conversions and arithmetic boundary conditions. Text-based protocols, on the other hand, tend to contain vulnerabilities related more to text processing: standard buffer overflows, pointer arithmetic errors, off-by-one errors, and so forth.

Note: One exception is text-based protocols specifying lengths in text that are converted to integers, such as the Content-Length HTTP header discussed in "Posting Data" later in this chapter.

Buffer Overflows

Because text-based protocols primarily manipulate strings, they are more vulnerable to simpler types of buffer overflows than to type conversion errors.
Text-based protocol vulnerabilities include buffer overflows resulting from unsafe use of string functions (discussed in Chapter 9, "Strings and Metacharacters"), as shown in this simple example:

int smtp_respond(int fd, int code, char *fmt, ...)
{
    char buf[1024];
    va_list ap;

    sprintf(buf, "%d ", code);

    va_start(ap, fmt);
    vsprintf(buf+strlen(buf), fmt, ap);
    va_end(ap);

    return write(fd, buf, strlen(buf));
}

int smtp_docommand(int fd)
{
    char *host, *line;
    int command;
    char commandline[1024];

    if(read_line(fd, commandline, sizeof(commandline)-1) < 0)
        return -1;

    if((command = getcommand(commandline, &line)) < 0)
        return -1;

    switch(command) {
        case EHLO:
        case HELO:
            host = line;
            smtp_respond(fd, SMTP_SUCCESS,
                         "hello %s, nice to meet you\n", host);
            break;

        ...
    }
}

The smtp_respond() function causes problems when users supply long strings as arguments, which they can do in smtp_docommand(). Simple buffer overflows like this one are more likely to occur in applications that haven't been audited thoroughly, as programmers are usually more aware of the dangers of using strcpy() and similar functions. These simple bugs still pop up from time to time, however. Pointer arithmetic errors are more common than these simple bugs because they are generally more subtle. It's fairly easy to make a mistake when dealing with pointers, especially off-by-one errors (discussed in more detail in Chapter 7). These mistakes are especially likely when there are multiple elements in a single line of text (as in most text-based protocols).

Text-Formatting Issues

Using text strings opens the door to specially crafted strings that might cause the program to behave in an unexpected way. With text strings, you need to pay attention to string-formatting issues (discussed in Chapter 8, "Program Building Blocks") and resource accesses (discussed in more detail in "Access to System Resources").
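To illustrate the fix for bugs like the smtp_respond() overflow shown earlier, here's a bounded rewrite using snprintf() and vsnprintf(). This is a sketch rather than the original code: the write() call is replaced by returning the formatted length so the truncation behavior is easy to observe, and the function name is invented.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats "<code> <message>" into buf, truncating instead of
 * overflowing; returns the length of the string placed in buf.
 * buf is always NUL-terminated when bufsz > 0. */
size_t smtp_format(char *buf, size_t bufsz, int code, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (bufsz == 0)
        return 0;

    /* snprintf never writes past bufsz and reports the length it
     * would have needed, so truncation is detectable */
    n = snprintf(buf, bufsz, "%d ", code);
    if (n < 0 || (size_t)n >= bufsz)
        return strlen(buf);

    /* vsnprintf bounds the variable-argument part the same way */
    va_start(ap, fmt);
    vsnprintf(buf + n, bufsz - n, fmt, ap);
    va_end(ap);

    return strlen(buf);
}
```

Unlike sprintf()/vsprintf(), the bounded variants guarantee the result never exceeds the destination buffer, no matter how long the attacker-supplied argument is.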
However, you need to keep your eye out for other problems in text data decoding implementations, such as faulty hexadecimal or UTF-8 decoding routines. Text elements might also introduce the potential for format string vulnerabilities in the code.

Note: Format string vulnerabilities can occur in applications that deal with binary or text-based formats. However, they're more likely to be exploitable in applications dealing with text-based protocols because those applications are more likely to accept a format string from an untrusted source.

Data Verification

In many protocols, the modification (or forgery) of exchanged data can represent a security threat. When analyzing a protocol, you must identify the potential risks if false data is accepted as valid and whether the protocol has taken steps to prevent modifications or forgeries. To determine whether data needs to be secured, ask these simple questions:
If the answer to the first question is yes, is encryption necessary? This chapter doesn't cover the details of validating the strength of a cryptographic implementation, but you can refer to the discussion of confidentiality in Chapter 2 on enforcing this requirement in design. If the answer to the second question is yes, verification of data might be required. Again, if cryptographic hashing is already used, you need to verify whether it's being applied in a secure fashion, as explained in Chapter 2. Forging data successfully usually requires that the protocol operate over UDP rather than TCP because TCP is generally considered adequate protection against forged messages. However, modification is an issue for protocols that operate over both UDP and TCP. If you're auditing a well-known and widely used protocol, you need not worry excessively about answering the questions on authentication and sensitivity of information; standards groups have already performed a lot of public validation. However, any implementation could have a broken authentication mechanism or insecure use of a cryptographic protocol. For example, DNS message forging using the DNS ID field is covered in "DNS Spoofing" later in this chapter. This issue is the result of a weakness in the DNS protocol; however, whether a DNS client or server is vulnerable depends on certain implementation decisions affecting how random the DNS ID field is.

Access to System Resources

A number of protocols allow users access to system resources, explicitly or implicitly. With explicit access, users request resources from the system and are granted or denied access depending on their credentials; the protocol is usually designed as a way for users to have remote access to some system resources. HTTP is an example of just such a protocol; it gives clients access to files on the system and other resources through the use of Web applications or scripts.
Another example is the Registry service available on versions of Microsoft Windows over RPC. Implicit access is more of an implementation issue; the protocol might not be designed to explicitly share certain resources, but the implementation provisions access to support the protocol's functionality. For example, you might audit a protocol that uses data values from a client request to build a Registry key that's queried or even written to. This access isn't mentioned in the protocol specification and happens transparently to users. Implicit access is often much less protected than explicit access because a protocol usually outlines a security model for handling explicit resource access. Additionally, explicit resource accesses are part of the protocol's intended purpose, so people tend to focus more on security measures for them. Of course, they might be unaware of implicit accesses that happen when certain requests are made. When you audit an application protocol, you should note any instances in which clients can access resources, implicitly and explicitly, on the system, including reading resources, modifying existing resources, and creating new ones. Any application accesses quite a lot of resources, and it's up to you to determine which resource accesses are important in terms of security. For example, an application might open a configuration file in a static location before it even starts listening for network traffic. This resource access probably isn't important because clients can't influence any part of the pathname to the file or any part of the file data. (However, the data in the file is important in other parts of the audit because it defines behavioral characteristics for the application to adhere to.) After you note all accesses that are interesting from a security perspective, you need to determine any potential dangers of handling these resources. To start, ask the following questions: