Before you jump into selected protocols, this section explains some general procedures that are useful when auditing a client or server product. The steps offer brief guidelines for auditing a protocol you're unfamiliar with. If you're already familiar with the protocol, you might be able to skip some early steps.

Note: At the time of this writing, there has been a big trend toward examining software that deals with file formats processed by client (and, less often, server) software. The steps outlined in this section can also be applied to examining programs that deal with file formats, as both processes use similar procedures.

Collect Documentation

So how do you audit software that's parsing data in a format you know nothing about? You read the protocol specification, of course! If the protocol is widely used, often there's an RFC or other formal specification detailing its inner workings and what an implementation should adhere to (often available at www.ietf.org/rfc.html). Although specifications can be tedious to read, they're useful to have on hand to help you understand protocol details. Books or Web sites that describe protocols in a more approachable format are usually available, too, so start with an Internet search. Even if you're familiar with a protocol, having these resources available will help refresh your memory, and you might discover recent new features or find that some features perform differently than you expected. For proprietary protocols, official documentation might not be available. However, searching the Internet is worth the time, as invariably other people with similar goals have invested time in documenting or reverse-engineering portions of these protocols. When reading code that implements a protocol, there are two arguments for acquiring additional documentation:
Identify Elements of Unknown Protocols

Sometimes you encounter a proprietary protocol with no documentation, which means you have to reverse-engineer it. This skill can take some time to master, so don't be discouraged if you find it cumbersome and difficult the first few times. There are two ways to identify how a protocol works: You can observe the traffic, or you can reverse-engineer the applications that handle the traffic. Both methods have their strengths and weaknesses. Reverse-engineering the applications gives you a more thorough understanding, but doing so might be impractical in some situations. The following sections present some ideas to help get you on the right track.

Using Packet Sniffers

Packet-sniffing utilities are invaluable tools for identifying fields in unknown protocols. One of the first steps to understanding a protocol is to watch what data is exchanged between two hosts participating in a communication. Many free sniffing tools are available, such as tcpdump (available from www.tcpdump.org/) and Wireshark (previously Ethereal, available from www.wireshark.org/). Of course, the protocol must be unencrypted for these tools to be useful. However, even encrypted protocols usually begin with some sort of initial negotiation, giving you insight into how the protocol works and whether the cryptographic channel is established securely. One of the most obvious characteristics you'll notice is whether the protocol is binary or text based. With a text-based protocol, you can usually get the hang of how it works because the messages aren't obscured. Binary protocols are more challenging to comprehend by examining packet dumps. Here are some tips for understanding the fields. When reading this section and trying to analyze a protocol, keep in mind the types of fields that usually appear in protocols: connection IDs, length fields, version fields, opcode or result fields, and so on.
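As a reference point when labeling bytes in a capture, a typical binary message header might look something like the following sketch. The layout and field names are invented for illustration (no real protocol is implied), but these field types recur constantly:

```c
#include <stdint.h>

/* Hypothetical header layout, invented for illustration; real
 * protocols differ, but these are the field types you'll usually
 * be trying to identify in a packet dump. */
struct msg_header {
    uint8_t  version;  /* version field: usually constant across dumps    */
    uint8_t  opcode;   /* opcode/result field: varies by message type     */
    uint16_t flags;    /* bit field of status/option flags                */
    uint32_t conn_id;  /* connection ID: random-looking, fixed per session */
    uint32_t length;   /* length field: grows and shrinks with the payload */
};
```

When diffing captures, bytes that stay constant suggest version or signature fields, bytes that look random but persist for a session suggest connection IDs, and bytes that track the payload size suggest length fields.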
Most undocumented protocols aren't much different from the multitude of open protocols, and you're likely to find similarities in how proprietary and open protocols work. This chapter focuses on simple one-layer protocols for the sake of clarity. You can apply the same principles to complex multilayer protocols, but analyzing them takes more work and more practice.

Initiate the Connection Several Times

Start at the beginning with connection initiation. Usually, it's easier to start there and branch out. Establishing new connections between the same test hosts multiple times and noting what values change can be useful. Pay special attention to the top of the message, where there's presumably a header of some sort. Note the offsets of data that changes. It's your job to pinpoint why those values changed. Asking yourself some simple questions, such as the following, might help identify the cause of those changes:
Answer these questions and keep detailed notes for each field that changes. Then try to come up with additional questions that might help you determine the purpose of certain fields. Pay attention to how many bytes change in a particular area. For example, if it's two bytes, it's probably a word field; four bytes of change could mean an integer field; and so forth. Because many protocols are composed of messages that have a similar header format and a varying body, you should write down all the findings you have made and see where else they might apply in the protocol. This method can also help you identify unknown fields. For example, say you have figured out a header format such as the following:

struct header {
    unsigned short id;        /* seems random */
    unsigned short unknown1;
    unsigned long length;     /* packet len including header */
};

You might have deduced that unknown1 is always the value 0x01 during initiation, but in later message exchanges, it changes to 0x03, 0x04, and so forth. You might then infer that unknown1 is a message type or opcode.

Replay Messages

When you examine packet dumps, replaying certain messages with small changes to see how the other side responds can prove helpful. This method can give you insight into what certain fields in the packet represent, how error messages are conveyed, and what certain error codes mean. It's especially useful when the same protocol errors happen later when you replay other messages, which is a good way to test previous deductions and see whether you were right.

Reverse-Engineering the Application

Reverse-engineering is both a science and an art, and it's a subject that could easily take an entire book to cover. Reverse-engineering isn't covered in depth in this chapter; instead, it's mentioned as a technique that can be used on clients and servers to gain an in-depth understanding of how a protocol works. The following sections introduce the first steps to take to understand a protocol.
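The message-replay technique described earlier can be sketched as a small helper that produces single-byte variants of a captured message. This is an illustrative sketch only (the function name and interface are invented); the network send/receive plumbing is omitted so the mutation logic stands alone:

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of a captured message with the byte
 * at 'offset' replaced by 'value', or NULL on error. Replaying such
 * variants and watching the peer's responses helps map out field
 * meanings and error codes. Caller frees the result. */
unsigned char *mutate_message(const unsigned char *msg, size_t len,
                              size_t offset, unsigned char value)
{
    unsigned char *copy;

    if (offset >= len)          /* refuse out-of-range mutations */
        return NULL;

    copy = malloc(len);
    if (copy == NULL)
        return NULL;

    memcpy(copy, msg, len);
    copy[offset] = value;
    return copy;
}
```

In practice, you would loop offset over the suspected header region, send each variant with write() or send(), and log which offsets provoke errors, disconnects, or changed responses.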
Use Symbols

If you can get access to binary code with symbols, by all means, use it! Function names and variable names can provide invaluable information as to what a protocol does. Using these symbols can help isolate the code you need to concentrate on because functions dealing with messages are aptly named. Some programs you audit might have additional files containing symbols and debugging information (such as PDB, Program Debug Database, files for Windows executables). These files are a big help if you can get your hands on them. For instance, you might be auditing for a company that refuses to give you its source code but might be open to disclosing debugging builds or PDB files.

Note: Microsoft makes PDB symbol packages available at http://msdl.microsoft.com/, and these timesavers are invaluable tools for gaining insight into Microsoft programs. If getting source code isn't an option, it's recommended that you negotiate with whoever you're doing code auditing for to get debug symbols.

Examine Strings in the Binary

Sometimes binaries don't contain symbols, but they do contain strings indicating function names, especially when debugging information has been compiled into the production binary. It's not uncommon to see code constructs such as the following:

#define DEBUG1(x) if(debug) printf(x)

int parse_message(int sock)
{
    DEBUG1("Entering parse_message\n");

    ... process message ...
}

Although debugging is turned off for the production release, the strings appear in the binary, so you can see the function names in the debugging messages. Strings also come in useful when you're looking for certain strings that appear in the protocol or errors that appear in the protocol or logs. For example, you send a message that disconnects but leaves a log message such as "[fatal]: malformed packet received from 192.168.1.1: invalid packet length."
This string tells you that the length field (wherever it appears in the packet) is invalid, and you also have a string to search for. By searching through the binary for "invalid packet length" or similar, you might be able to locate the function that's processing the packet length and, therefore, discover where in the binary to start auditing.

Examine Special Values

In addition to helpful strings in the executable, you might find unique binary values in the protocol that can be used to locate code for processing certain messages. These values are commonly found when you're dealing with file formats because file formats contain "signature" values to identify the file type at the beginning of the file (and possibly in other parts of the file). Although unique signatures are a less common practice in protocols sent over the network (as they're often unnecessary), there might be tag values or something similar in the protocol with values unlikely to appear naturally in a binary. "Appearing naturally" means that if you search the binary for that value (using an IDA text search on the disassembly), it's unlikely to occur in unrelated parts of the program. For example, the value 0x0C occurs often in a binary, usually as an offset into a structure, so its frequent occurrence makes it a poor unique value to search for. A more unusual value, such as 0x8053, is a better search choice: it's unlikely that structures have members at this offset (the structures would have to be large, and because the value is odd, aligned pointer, integer, and word accesses won't produce it as an offset either).

Debug

Debugging messages were mentioned in the section on examining strings, and you saw an example of debugging messages appearing in the compiled code. This means you can turn on debugging and automatically receive all debugging output. Usually, vendors have a command-line option to turn on debugging, but they might remove it for the production release.
However, if you cross-reference a debugging string such as "Entering parse_message," you see a memory reference to where the debug variable resides in memory. So you can just change it to a nonzero value at runtime and receive all the debugging messages you need.

Find Communication Primitives

When all else fails, you can revert to finding entry points you know about; protocol software has to send and receive data at some point. For protocols that operate over TCP, entry points might include read(), recv(), recvmsg(), and WSArecv(). UDP protocols might also use recvfrom() and WSArecvfrom(). Locating where these functions are used points you to where data is read in from the network. Sometimes this method is an easy route to identifying where data is being processed. Unfortunately, it might take some tracing back through several functions, as many applications write wrappers around communication primitives and use them indirectly (for example, by having the communication primitives in the form of class methods). Still, in these cases, you can break on one of the aforementioned functions at runtime and let it return a few times to see where processing is taking place.

Use Library Tracing

Another technique that can aid in figuring out what a program is doing is using system tools to trace the application's library calls or system resource accesses. These tools include truss for Solaris, ltrace for Linux, ktrace for BSD, and Filemon/Regmon for Windows (www.sysinternals.com/). This technique is best used in combination with the other techniques described.

Match Data Types with the Protocol

After you're more familiar with a protocol, you start to get a sense of where things could go wrong. Don't worry if this doesn't happen right away; the more experience you get, the more you develop a feel for potential problem areas.
One way to identify potential problem areas is to analyze the structure of untrusted data processed by a server or client application, and then match elements of those structures with vulnerability classes covered in this book, as explained in the following sections.

Binary Protocols

Binary protocols express protocol messages in a structural format that's not readable by humans. Text data can be included in parts of the protocol, but you also find elements in nontext formats, such as integers or Booleans. Domain Name System (DNS) is one example of a binary protocol; it uses bit fields to represent status information, two-byte integer fields to represent lengths and other data (such as IDs), and counted text fields to represent domain labels. Binary protocols transmit data in a form that's immediately recognizable by the languages that implement servers and clients. Therefore, they are more susceptible to boundary condition vulnerabilities when dealing with those data types. Specifically, when dealing with integers, a lot of the typing issues discussed in Chapter 6, "C Language Issues," are relevant. For this reason, the following sections summarize integer-related vulnerabilities that commonly occur in binary protocols.

Integer Overflows and 32-Bit Length Values

Integer overflows often occur when 32-bit length variables are used in protocols to dynamically allocate space for user-supplied data. This vulnerability usually results in heap corruption, allowing a remote attacker to crash the application performing the parsing or, in many cases, exploit the bug to run arbitrary code.
This code shows a basic example of an integer overflow when reading a text string:

char *read_string(int sock)
{
    char *string;
    size_t length;

    if(read(sock, (void *)&length, sizeof(length)) != sizeof(length))
        return NULL;

    length = ntohl(length);

    string = (char *)calloc(length+1, sizeof(char));
    if(string == NULL)
        return NULL;

    if(read_bytes(sock, string, length) < 0){
        free(string);
        return NULL;
    }

    string[length] = '\0';
    return string;
}

In the fictitious protocol the code is parsing, a 32-bit length is supplied, indicating the length of the string followed by the string data. Because the length value isn't checked, a value of the highest representable integer (0xFFFFFFFF) triggers an integer overflow when 1 is added to it in the call to calloc().

Integer Underflows and 32-Bit Length Values

Integer underflows typically occur when related variables aren't adequately checked against each other to enforce a relationship, as shown in this example:

struct _pkthdr {
    unsigned int operation;
    unsigned int id;
    unsigned int size;
};

struct _tlv {
    unsigned short type, length;
    char value[0];
};

int read_packet(int sock)
{
    struct _pkthdr header;
    struct _tlv tlv;
    char *data;
    size_t length;

    if(read_header(sock, &header) < 0)
        return 1;

    data = (char *)calloc(header.size, sizeof(char));
    if(data == NULL)
        return 1;

    if(read_data(sock, data, header.size) < 0){
        free(data);
        return 1;
    }

    for(length = header.size; length > sizeof(struct _tlv); ){
        if(read_tlv(sock, &tlv) < 0)
            goto fail;

        ... process tlv ...

        length -= tlv.length;
    }

    return 0;

fail:
    free(data);
    return 1;
}

In this fictitious protocol, a packet consists of a header followed by a series of type, length, and value (TLV) structures. There's no check between the size in the packet header and the size in the TLV being processed. In fact, the TLV length field can be bigger than the length in the packet header.
Sending this packet would cause the length variable to underflow and the loop of TLV processing to continue indefinitely, processing arbitrary process memory until it hits the end of the segment and crashes. Integer underflows can also occur when length values are required to hold a minimum length, but the parsing code never verifies this requirement. For example, a binary protocol has a header containing an integer specifying the packet size. The packet size is supposed to be at least the size of the header plus any remaining data. Here's an example:

#define MAX_PACKET_SIZE 512
#define PACKET_HDR_SIZE 12

struct pkthdr {
    unsigned short type, operation;
    unsigned long id;
    unsigned long length;
};

int read_header(int sock, struct pkthdr *hdr)
{
    hdr->type = read_short(sock);
    hdr->operation = read_short(sock);
    hdr->id = read_long(sock);
    hdr->length = read_long(sock);

    return 0;
}

int read_packet(int sock)
{
    struct pkthdr header;
    char data[MAX_PACKET_SIZE];

    if(read_header(sock, &header) < 0)
        return 1;

    if(header.length > MAX_PACKET_SIZE)
        return 1;

    if(read_bytes(sock, data, header.length - PACKET_HDR_SIZE) < 0)
        return 1;

    ... process data ...
}

This code assumes that header.length is at least PACKET_HDR_SIZE (12) bytes, but this is never verified. Therefore, the read_bytes() size parameter can be underflowed if header.length is less than 12, resulting in a stack overflow.

Small Data Types

The issues with length specifiers smaller than 32 bits (8- or 16-bit lengths) are a bit different from issues with large 32-bit sizes. First, sign-extension issues are more relevant because programs often natively use 32-bit variables, even when dealing with smaller data types. These sign-extension issues can result in memory corruption or possibly denial-of-service conditions. Listing 16-1 shows a simple example of DNS server code.

Listing 16-1. Name Validation Denial of Service
The assembly code in Listing 16-1 contains a sign-extension problem. It roughly translates to this C code:

int Name_ValidateCountName(char *name)
{
    char *ptr, *end;
    unsigned int length = *(unsigned char *)name;

    for(ptr = name + 2, end = ptr + length; ptr < end; ){
        int string_length = *ptr++;

        if(!string_length)
            break;

        ptr += string_length;
    }

    ...
}

This code loops through a series of counted strings until it reaches the end of the data region. Because the pointer is pointing to a signed character type, the count byte is sign-extended when it's stored as an integer. Therefore, you can jump backward to data appearing earlier in the buffer and create a situation that causes an infinite loop. You could also jump to data in random memory contents situated before the beginning of the buffer, with undefined results.

Note: In fact, the length parameter at the beginning of the function isn't validated against anything. So based on this code, you should be able to indicate that the size of the record being processed is larger than it really is; therefore, you can process memory contents past the end of the buffer.

Text-Based Protocols

Text-based protocols tend to have different classes of vulnerabilities than binary protocols. Most vulnerabilities in binary protocol implementations result from type conversions and arithmetic boundary conditions. Text-based protocols, on the other hand, tend to contain vulnerabilities related more to text processing: standard buffer overflows, pointer arithmetic errors, off-by-one errors, and so forth.

Note: One exception is text-based protocols specifying lengths in text that are converted to integers, such as the Content-Length HTTP header discussed in "Posting Data" later in this chapter.

Buffer Overflows

Because text-based protocols primarily manipulate strings, they are more vulnerable to simpler types of buffer overflows than to type conversion errors.
Text-based protocol vulnerabilities include buffer overflows resulting from unsafe use of string functions (discussed in Chapter 9, "Strings and Metacharacters"), as shown in this simple example:

int smtp_respond(int fd, int code, char *fmt, ...)
{
    char buf[1024];
    va_list ap;

    sprintf(buf, "%d ", code);

    va_start(ap, fmt);
    vsprintf(buf+strlen(buf), fmt, ap);
    va_end(ap);

    return write(fd, buf, strlen(buf));
}

int smtp_docommand(int fd)
{
    char *host, *line;
    int command;
    char commandline[1024];

    if(read_line(fd, commandline, sizeof(commandline)-1) < 0)
        return -1;

    if((command = getcommand(commandline, &line)) < 0)
        return -1;

    switch(command) {
        case EHLO:
        case HELO:
            host = line;
            smtp_respond(fd, SMTP_SUCCESS,
                         "hello %s, nice to meet you\n", host);
            break;

        ...
    }
}

The smtp_respond() function causes problems when users supply long strings as arguments, which they can do in smtp_docommand(). Simple buffer overflows like this one are more likely to occur in applications that haven't been audited thoroughly, as programmers are usually more aware of the dangers of using strcpy() and similar functions. These simple bugs still pop up from time to time, however. Pointer arithmetic errors are more common than these simple bugs because they are generally more subtle. It's fairly easy to make a mistake when dealing with pointers, especially off-by-one errors (discussed in more detail in Chapter 7). These mistakes are especially likely when there are multiple elements in a single line of text (as in most text-based protocols).

Text-Formatting Issues

Using text strings opens the door to specially crafted strings that might cause the program to behave in an unexpected way. With text strings, you need to pay attention to string-formatting issues (discussed in Chapter 8, "Program Building Blocks") and resource accesses (discussed in more detail in "Access to System Resources").
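To illustrate the fix for bugs like the smtp_respond() overflow shown earlier, here's a bounded rewrite using snprintf() and vsnprintf(). This is a sketch rather than the original code: the write() call is replaced by returning the formatted length so the truncation behavior is easy to observe, and the function name is invented.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Formats "<code> <message>" into buf, truncating instead of
 * overflowing; returns the length of the string placed in buf.
 * buf is always NUL-terminated when bufsz > 0. */
size_t smtp_format(char *buf, size_t bufsz, int code, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (bufsz == 0)
        return 0;

    /* snprintf never writes past bufsz and reports the length it
     * would have needed, so truncation is detectable */
    n = snprintf(buf, bufsz, "%d ", code);
    if (n < 0 || (size_t)n >= bufsz)
        return strlen(buf);

    /* vsnprintf bounds the variable-argument part the same way */
    va_start(ap, fmt);
    vsnprintf(buf + n, bufsz - n, fmt, ap);
    va_end(ap);

    return strlen(buf);
}
```

Unlike sprintf()/vsprintf(), the bounded variants guarantee the result never exceeds the destination buffer, no matter how long the attacker-supplied argument is.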
However, you need to keep your eye out for other problems in text data decoding implementations, such as faulty hexadecimal or UTF-8 decoding routines. Text elements might also introduce the potential for format string vulnerabilities in the code.

Note: Format string vulnerabilities can occur in applications that deal with binary or text-based formats. However, they're more likely to be exploitable in applications dealing with text-based protocols because those applications are more likely to accept a format string from an untrusted source.

Data Verification

In many protocols, the modification (or forgery) of exchanged data can represent a security threat. When analyzing a protocol, you must identify the potential risks if false data is accepted as valid and whether the protocol has taken steps to prevent modifications or forgeries. To determine whether data needs to be secured, ask these simple questions:
If the answer to the first question is yes, is encryption necessary? This chapter doesn't cover the details of validating the strength of a cryptographic implementation, but you can refer to the discussion of confidentiality in Chapter 2 on enforcing this requirement in design. If the answer to the second question is yes, verification of data might be required. Again, if cryptographic hashing is already used, you need to verify whether it's being applied in a secure fashion, as explained in Chapter 2. Forging data successfully usually requires that the protocol operate over UDP rather than TCP because TCP is generally considered adequate protection against forged messages. However, modification is an issue for protocols that operate over both UDP and TCP. If you're auditing a well-known and widely used protocol, you need not worry excessively about answering the questions on authentication and sensitivity of information; standards groups have already performed a lot of public validation. However, any implementation could have a broken authentication mechanism or insecure use of a cryptographic protocol. For example, DNS message forging using the DNS ID field is covered in "DNS Spoofing" later in this chapter. This issue is the result of a weakness in the DNS protocol; however, whether a DNS client or server is vulnerable depends on certain implementation decisions affecting how random the DNS ID field is.

Access to System Resources

A number of protocols allow users access to system resources, explicitly or implicitly. With explicit access, users request resources from the system and are granted or denied access depending on their credentials; the protocol is usually designed as a way for users to have remote access to some system resources. HTTP is an example of just such a protocol; it gives clients access to files on the system and other resources through the use of Web applications or scripts.
Another example is the Registry service available on versions of Microsoft Windows over RPC. Implicit access is more of an implementation issue; the protocol might not be designed to explicitly share certain resources, but the implementation provisions access to support the protocol's functionality. For example, you might audit a protocol that uses data values from a client request to build a Registry key that's queried or even written to. This access isn't mentioned in the protocol specification and happens transparently to users. Implicit access is often much less protected than explicit access because a protocol usually outlines a security model for handling explicit resource access. Additionally, explicit resource accesses are part of the protocol's intended purpose, so people tend to focus more on security measures for them. Of course, they might be unaware of implicit accesses that happen when certain requests are made. When you audit an application protocol, you should note any instances in which clients can access resources, implicitly and explicitly, on the system, including reading resources, modifying existing resources, and creating new ones. Any application accesses quite a lot of resources, and it's up to you to determine which resource accesses are important in terms of security. For example, an application might open a configuration file in a static location before it even starts listening for network traffic. This resource access probably isn't important because clients can't influence any part of the pathname to the file or any part of the file data. (However, the data in the file is important in other parts of the audit because it defines behavioral characteristics for the application to adhere to.) After you note all accesses that are interesting from a security perspective, you need to determine any potential dangers of handling these resources. To start, ask the following questions: