Hypertext Transfer Protocol | The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities

Hypertext Transfer Protocol (HTTP) is used to serve dynamic and static content from servers to clients (typically Web browsers). It's a text-based protocol, so many of the vulnerabilities in C/C++ HTTP implementations result from string manipulation errors, such as buffer overflows or incorrect pointer arithmetic.

Note

The popularity of HTTP has caused its design to influence a number of other protocols, such as RTSP (Real Time Streaming Protocol) and SIP (Session Initiation Protocol). These similarities in design generally lead to similar problem areas in the implementation, so you can leverage your knowledge of one in reviewing the other.

HTTP is discussed in more depth when covering Web applications in Chapter 17, "Web Applications," but this section gives you a quick overview. HTTP requests are composed of a series of headers, each terminated by an end-of-line marker (CRLF, or carriage return and linefeed), with an empty line marking the end of the headers. The first line, the request line, is mandatory and indicates the method the client wants to perform, the resource the client wants to access, and the HTTP version. Here's an example:

GET /cgi-bin/resource.cgi?name=bob HTTP/1.0

The method describes what the client wants to do with the requested resource. Typically, only GET, HEAD, and POST are used for everyday Web browsing. Chapter 17 lists several additional request methods.
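To make the request-line format concrete, here is a minimal sketch of a bounded request-line parser. It assumes the trailing CRLF has already been stripped and that the three fields are separated by exactly one space. The struct, the function names, and the MAX_METHOD/MAX_URI/MAX_VERSION limits are hypothetical choices for this illustration, not part of any particular server.

```c
#include <string.h>

/* Hypothetical limits chosen for this sketch. */
#define MAX_METHOD  16
#define MAX_URI     2048
#define MAX_VERSION 16

struct request_line {
    char method[MAX_METHOD];
    char uri[MAX_URI];
    char version[MAX_VERSION];
};

/* Copy the token at *p (up to the next space or the end of the string)
 * into dst and advance *p past it. Fails if the token is empty or would
 * not fit in dst together with its NUL terminator. */
static int next_token(const char **p, char *dst, size_t dstsz)
{
    size_t len = strcspn(*p, " ");

    if (len == 0 || len >= dstsz)
        return -1;
    memcpy(dst, *p, len);
    dst[len] = '\0';
    *p += len;
    return 0;
}

/* Parse "METHOD SP URI SP VERSION" with the CRLF already removed.
 * Returns 0 on success, -1 if the line is malformed or a field is too long. */
int parse_request_line(const char *line, struct request_line *rl)
{
    const char *p = line;

    if (next_token(&p, rl->method, sizeof(rl->method)) < 0 || *p++ != ' ')
        return -1;
    if (next_token(&p, rl->uri, sizeof(rl->uri)) < 0 || *p++ != ' ')
        return -1;
    if (next_token(&p, rl->version, sizeof(rl->version)) < 0 || *p != '\0')
        return -1;
    return 0;
}
```

Rejecting an oversized field outright, rather than silently truncating it, keeps later code from operating on a URI that differs from the one the client actually sent.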

Header Parsing

One of the most basic units of HTTP communication is the HTTP header, which is simply a name and value pair in the following format:

name: value

Headers can generally have any name and value. The HTTP server handling the request simply ignores a header it doesn't recognize; that is, the unknown header is stored with the rest of the headers and passed to any invoked component, but no special processing occurs. The code for parsing headers is fairly simple, so it's unlikely to contain vulnerabilities. However, a special type of header, known as a folded header, is more complex and could lead to processing vulnerabilities.
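Splitting a header into its name and value is the simple part, but it still benefits from explicit checks. The following sketch (split_header() is a hypothetical name) splits a line in place at the first colon and trims surrounding spaces and tabs from the value. It rejects whitespace between the name and the colon, which RFC 7230 requires servers to reject; the rest of the interface is an illustrative assumption.

```c
#include <string.h>

/* Split a header line of the form "name: value" in place.
 * On success, *name and *value point into line; leading and trailing
 * spaces/tabs are removed from the value. Returns -1 if there is no
 * colon, the name is empty, or the name contains whitespace. */
int split_header(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');
    char *end;

    if (colon == NULL || colon == line)
        return -1;
    if (strcspn(line, " \t") < (size_t)(colon - line))
        return -1;                      /* whitespace inside or after the name */

    *colon = '\0';
    *name = line;

    *value = colon + 1;
    while (**value == ' ' || **value == '\t')
        (*value)++;

    end = *value + strlen(*value);
    while (end > *value && (end[-1] == ' ' || end[-1] == '\t'))
        *--end = '\0';

    return 0;
}
```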

Headers are usually one line long, but the HTTP specification allows multiline headers, which have a normal first line followed by indented lines, as shown:

name: value data
     more value data
     even more value data

HTTP servers that support header folding might make assumptions about the maximum size of a header and copy too much data when they encounter folded headers, as shown in this example:

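int read_header(int sock, char **buffer)
{
    static char scratch[HTTP_MAX_HEADER];
    char *line;
    unsigned int size = HTTP_MAX_HEADER, read_bytes = 0;
    int rc;
    char c;

    for(line = scratch;;){
        /* read at most HTTP_MAX_HEADER bytes of the next line */
        if((rc = read_line(sock, line+read_bytes,
                          HTTP_MAX_HEADER)) < 0)
            return -1;

        read_bytes += rc;

        if(peek_char(sock, &c) < 0)
            return -1;

        /* next line doesn't start with whitespace: header is complete */
        if(c != '\t' && c != ' '){
            *buffer = line;
            return 0;
        }

        /* folded header: grow the buffer and keep reading */
        size += HTTP_MAX_HEADER;

        if(line == scratch){
            line = (char *)malloc(size);
            if(line != NULL)
                memcpy(line, scratch, read_bytes);
        } else
            line = (char *)realloc(line, size);

        if(line == NULL)
            return -1;
    }
}

struct list *read_headers(int sock)
{
    char *buffer;
    struct list *headers;

    LIST_INIT(headers);

    for(;;){
        if(read_header(sock, &buffer) < 0){
            LIST_DESTROY(headers);
            return NULL;
        }

        /* assumes read_line() strips the CRLF, so an empty
           line marks the end of the headers */
        if(buffer[0] == '\0')
            break;

        /* hypothetical macro, like LIST_INIT; assumed to store a copy */
        LIST_ADD(headers, buffer);
    }

    return headers;
}

int log_user_agent(char *useragent)
{
    char buf[HTTP_MAX_HEADER*2];

    sprintf(buf, "agent: %s\n", useragent);
    log_string(buf);
    return 0;
}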

The log_user_agent() function has an obvious overflow, but normally, it couldn't be triggered because the read_header() function reads at most HTTP_MAX_HEADER bytes per line, and the buffer in log_user_agent() is twice as big as that. Developers sometimes use less safe data manipulation when they think supplying malicious input isn't possible. In this case, however, that assumption is incorrect because arbitrarily large headers can be supplied by using header folding.
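One way to avoid the assumption that broke log_user_agent() is to enforce a limit on the total length of a header, not just on each physical line. The sketch below appends one continuation line to an already-stored header and fails once the combined header would exceed a fixed buffer. The function name and interface are hypothetical; replacing the fold's leading whitespace with a single space mirrors what RFC 7230 allows a server to do with a folded header (the alternative it permits is rejecting the request).

```c
#include <stddef.h>
#include <string.h>

/* Append one folded continuation line to a header already stored in buf.
 * *used is the current length of the string in buf (not counting the NUL)
 * and must be less than bufsz. Leading spaces and tabs of the continuation
 * are replaced by a single space. Returns 0 on success, or -1 if the
 * combined header would not fit in bufsz bytes; buf is then unchanged and
 * the caller should reject the request. */
int append_folded_line(char *buf, size_t bufsz, size_t *used, const char *cont)
{
    size_t len, remaining;

    if (*used >= bufsz)
        return -1;

    while (*cont == ' ' || *cont == '\t')
        cont++;
    len = strlen(cont);

    /* Room needed: one space, the continuation text, and the NUL. */
    remaining = bufsz - *used;
    if (remaining < 2 || len > remaining - 2)
        return -1;

    buf[*used] = ' ';
    memcpy(buf + *used + 1, cont, len + 1);   /* copies the NUL as well */
    *used += 1 + len;
    return 0;
}
```

With the total capped, downstream code such as log_user_agent() can rely on a known maximum header length, although using snprintf() there would still be the more robust choice.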

Accessing Resources

Exposing resources to clients (especially unauthenticated ones) can be dangerous, but the whole point of an HTTP server is to serve content to clients. So the code that maps client requests to resources must be written carefully. There are hundreds of examples of HTTP servers disclosing arbitrary files on the file system; here's a simple example of this type of bug:

char *webroot = "/var/www";

int open_resource(char *url)
{
    char buf[MAXPATH];

    snprintf(buf, sizeof(buf), "%s/%s", webroot, url);
    return open(buf, O_RDONLY);
}

This code is intended to open a client-requested file from the /var/www directory, but the client can simply request a file beginning with ../../ and access any file on the system. This is possible because the code never checks the requested path for ".." components. HTTP servers are also particularly vulnerable to encoding-related traversal bugs. You saw an example in Chapter 8, but here's another simple example:
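As a rough illustration of the missing check, the following sketch flags any '/'-separated path segment that is exactly "..". It is a hypothetical helper, not a complete defense: it ignores backslash separators, which some platforms also honor, and it must run on the fully decoded path. A common complementary approach on POSIX systems is to resolve the final path with realpath() and confirm that the result still lies inside the Web root directory.

```c
#include <string.h>

/* Return 1 if any '/'-separated segment of path is exactly "..",
 * otherwise 0. Names that merely contain two dots, such as
 * "file..txt", are allowed. The path must already be fully decoded. */
int has_dotdot_segment(const char *path)
{
    const char *seg = path;

    for (;;) {
        const char *slash = strchr(seg, '/');
        size_t len = slash ? (size_t)(slash - seg) : strlen(seg);

        if (len == 2 && seg[0] == '.' && seg[1] == '.')
            return 1;
        if (slash == NULL)
            return 0;
        seg = slash + 1;
    }
}
```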

char *webroot = "/var/www";

void hex_decode(char *path)
{
    char *srcptr, *destptr;

    for(srcptr = destptr = path; *srcptr; srcptr++){
        if(*srcptr != '%' || (!srcptr[1] || !srcptr[2])){
            *destptr++ = *srcptr;
            continue;
        }

        *destptr++ = convert_bytes(&srcptr[1]);
        srcptr += 2;
    }

    *destptr = '\0';
    return;
}

int open_resource(char *url)
{
    char buf[MAXPATH];

    if(strstr(url, ".."))
        return -1; // user trying to do directory traversal

    hex_decode(url);

    snprintf(buf, sizeof(buf), "%s/%s", webroot, url);
    return open(buf, O_RDONLY);
}

Obviously, this code is dangerous because it does hexadecimal decoding after it checks the URL for directory traversal. So a URL beginning with %2E%2E/%2E%2E passes the strstr() check and is only then decoded to ../.., allowing users to perform a directory traversal even though the developers intended to deny these requests.
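The fix is to decode first and validate the decoded result, decoding exactly once so that a double-encoded sequence such as %252E ends up as the literal text %2E rather than a dot. The sketch below is self-contained and illustrative; decode_and_check() and its helpers are hypothetical names. It also rejects malformed escapes and encoded NUL bytes (%00), which could otherwise truncate the path.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes from in into out, exactly once. Rejects truncated or
 * non-hex escapes, %00, and output that does not fit in outsz bytes. */
static int percent_decode(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        int c = (unsigned char)*in;

        if (c == '%') {
            int hi = hexval((unsigned char)in[1]);
            int lo;

            if (hi < 0)             /* also catches '%' at the end */
                return -1;
            lo = hexval((unsigned char)in[2]);
            if (lo < 0)
                return -1;
            c = hi * 16 + lo;
            if (c == 0)             /* %00 would truncate the path */
                return -1;
            in += 3;
        } else {
            in++;
        }
        if (o + 1 >= outsz)         /* keep room for the NUL */
            return -1;
        out[o++] = (char)c;
    }
    out[o] = '\0';
    return 0;
}

/* 1 if any '/'-separated segment of path is exactly "..". */
static int has_dotdot_segment(const char *path)
{
    for (;;) {
        const char *slash = strchr(path, '/');
        size_t len = slash ? (size_t)(slash - path) : strlen(path);

        if (len == 2 && path[0] == '.' && path[1] == '.')
            return 1;
        if (slash == NULL)
            return 0;
        path = slash + 1;
    }
}

/* Decode first, then run the traversal check on the decoded path.
 * Returns 0 and leaves the decoded path in out if it is acceptable. */
int decode_and_check(const char *url, char *out, size_t outsz)
{
    if (percent_decode(url, out, outsz) < 0)
        return -1;
    if (has_dotdot_segment(out))
        return -1;
    return 0;
}
```

With this ordering, %2E%2E/%2E%2E is turned into ../.. before the traversal check runs, so the request is rejected.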

Some HTTP servers implement additional features or keywords that the server processes implicitly to perform a different task with the requested document. Should you encounter a server that does this, familiarize yourself with the code dealing with those special features or keywords. Developers often fail to account for the security implications of these features because they operate outside the core specification, so implementation mistakes or oversights that lead to vulnerabilities are possible.

Utility Functions

Most HTTP servers include a lot of utility functions that have interesting security implications. In particular, there are functions for URL handling: dealing with URL components such as ports, protocols, and paths; stripping extraneous paths; performing hexadecimal decoding; protecting against double dots; and so forth. Quite a large codebase can be required just for dealing with untrusted data, so checking for buffer overflows and similar problems is certainly worthwhile. In addition, logging utility functions can be interesting, as most HTTP servers log paths and methods, which could create an opportunity to perform format string attacks. Here's an example of some vulnerable code:
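As a small example of the kind of URL utility worth scrutinizing, here is a sketch of a port parser. It assumes the port has already been isolated as a string, and parse_port() is a hypothetical name. strtol() on its own skips leading whitespace and accepts a sign, so the sketch insists on a leading digit, checks for trailing characters and ERANGE, and enforces the 1 to 65535 range before narrowing to unsigned short.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal TCP port number taken from a URL or Host header.
 * Returns 0 and stores the port, or -1 if the string is not a plain
 * decimal number in the range 1 to 65535. */
int parse_port(const char *s, unsigned short *port)
{
    char *end;
    long v;

    /* strtol() would skip leading whitespace and accept a sign, so
       require the string to start with a digit. */
    if (!isdigit((unsigned char)s[0]))
        return -1;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v < 1 || v > 65535)
        return -1;

    *port = (unsigned short)v;
    return 0;
}
```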

int log(char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    vfprintf(logfd, fmt, ap);
    va_end(ap);
    return 0;
}

int log_access(char *path, char *remote_address)
{
    char buf[1024];

    snprintf(buf, sizeof(buf), "[ %s ]: %s accessed by %s\n",
             g_sname, path, remote_address);

    return log(buf);
}

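This type of code isn't uncommon (at least it wasn't when format string vulnerabilities were first brought to public attention). Here, log_access() embeds the client-supplied path in buf and then passes buf to log() as its format string, so any format specifiers in the path, such as %x or %n, are interpreted by vfprintf(). When strings pass through multiple layers of functions that take variable arguments, code can easily become susceptible to format string attacks, and logging utility functions are one of the most common areas for this code to appear.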
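The remedy is to make sure untrusted data only ever travels as an argument, never as the format string. In the sketch below, log_access() passes a constant format to a variable-argument logging function. To keep the example self-contained, it formats into an in-memory buffer (last_log) instead of a log file and takes the server name as a parameter instead of the g_sname global; log_msg() and last_log are hypothetical names. GCC and Clang can also flag non-constant format strings with -Wformat-security.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stands in for the log file so the sketch is self-contained. */
static char last_log[1024];

/* Logging helper: fmt must always be a constant supplied by the program.
 * Output longer than last_log is truncated by vsnprintf(). */
static int log_msg(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(last_log, sizeof(last_log), fmt, ap);
    va_end(ap);
    return n < 0 ? -1 : 0;
}

/* Untrusted strings (path, remote_address) are passed only as arguments,
 * so any '%' characters they contain are copied literally. */
int log_access(const char *server_name, const char *path,
               const char *remote_address)
{
    return log_msg("[ %s ]: %s accessed by %s\n",
                   server_name, path, remote_address);
}
```

A request for a path such as /%n%n%x is then logged as literal text instead of being interpreted by the formatting function.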

Posting Data

Another potential danger area in HTTP is the handling of input supplied via the POST method. Data can be supplied in a POST request in two ways: a simple counted data post and chunked encoding.

In a simple counted data post, a block of data is supplied to the HTTP server in a message. The size of this data is specified by using the Content-Length header. A request might look like this:

POST /app HTTP/1.1
Host: 127.0.0.1
Content-Length: 10

1234567890

In this request, the block of data is supplied after the request headers. How this length value is interpreted, however, could create a serious vulnerability for an HTTP server. Specifically, you must consider that large values might result in integer overflows or sign issues (covered in Chapter 6, "C Language Issues"). Here's an example of a simple integer overflow:

char *read_post_data(int sock)
{
    char *content_length, *data;
    size_t clen;

    content_length = get_header("Content-Length");

    if(!content_length)
        return NULL;

    clen = atoi(content_length);

    data = (char *)malloc(clen + 1);

    if(!data)
        return NULL;

    tcp_read_data(sock, data, clen);

    data[clen] = '\0';

    return data;
}

The Content-Length value is converted from a string to an integer and then used to allocate a block of data. Because the conversion is unchecked, a client can supply a value, such as -1, that converts to the maximum value a size_t can represent. When 1 is added to it in the argument to malloc(), an integer overflow occurs, the result wraps around to 0, and only a tiny allocation takes place. The following call to tcp_read_data() then allows data read from the network to overwrite parts of the process heap. Also, note that the line in the code that NUL-terminates the user-supplied buffer writes a NUL byte out of bounds: on a 32-bit system, clen is 0xFFFFFFFF, so data[clen] is equivalent to data[-1], one byte before the beginning of the buffer.
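A safer approach is to parse Content-Length strictly and bound it before doing any arithmetic on it. The following sketch accepts only decimal digits, detects overflow while accumulating, and rejects values above a caller-supplied limit. The function name and the idea of a configured maximum are assumptions for illustration, and whitespace around the header value is assumed to have been trimmed already.

```c
#include <stddef.h>

/* Parse a Content-Length value strictly: decimal digits only, no sign,
 * and no value above max_len. Returns 0 and stores the length in *out,
 * or -1 if the value is empty, malformed, or too large. */
int parse_content_length(const char *s, size_t max_len, size_t *out)
{
    size_t v = 0;

    if (*s == '\0')
        return -1;
    for (; *s != '\0'; s++) {
        size_t d;

        if (*s < '0' || *s > '9')
            return -1;
        d = (size_t)(*s - '0');
        /* Refuse before computing v * 10 + d if it would exceed max_len. */
        if (d > max_len || v > (max_len - d) / 10)
            return -1;
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

With clen capped at a modest maximum, a later malloc(clen + 1) cannot wrap around.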

The second issue in dealing with Content-Length header interpretation involves handling signed Content-Length values. If the length value is interpreted as a negative number, size calculation errors likely occur, with memory corruption being the end result. Consider the following code (originally from AOLServer):

typedef struct Request {
    ... other members ...
    char *next;       /* Next read offset. */
    char *content;    /* Start of content. */
    int  length;      /* Length of content. */
    int  avail;       /* Bytes avail in buffer. */
    int  leadblanks;  /* # of leading blank lines read */
    ... other members ...
} Request;

static int
SockRead(Sock *sockPtr)
{
    Ns_Sock *sock = (Ns_Sock *) sockPtr;
    struct iovec buf;
    Request *reqPtr;
    Tcl_DString *bufPtr;
    char *s, *e, save;
    int  cnt, len, nread, n;
    ...
    s = Ns_SetIGet(reqPtr->headers, "content-length");
    if (s != NULL) {
        reqPtr->length = atoi(s);
    ...
       if (reqPtr->length < 0
          && reqPtr->length >
          sockPtr->drvPtr->servPtr->limits.maxpost) {
          return SOCK_ERROR;
       }
    ...
    if (reqPtr->coff > 0 && reqPtr->length <= reqPtr->avail) {
        reqPtr->content = bufPtr->string + reqPtr->coff;
        reqPtr->next = reqPtr->content;
        reqPtr->avail = reqPtr->length;

        /*
         * Ensure that there are no "bonus" crlf chars left
         * visible in the buffer beyond the specified
         * content-length. This happens from some browsers
         * on POST requests.
         */
        if (reqPtr->length > 0) {
            reqPtr->content[reqPtr->length] = '\0';
        }
        return (reqPtr->request ? SOCK_READY : SOCK_ERROR);
    }

This code is quite strange. After retrieving a Content-Length specified by users, it explicitly checks for values less than 0. If Content-Length is less than 0 and greater than maxpost (also a signed integer, which is initialized to a default value of 256KB), an error is signaled. Because maxpost is positive, no value can satisfy both conditions at once: a negative Content-Length triggers the first condition but not the second, so the error is never signaled for negative values. (Most likely, the developers meant to use || in the if statement rather than &&.) As a result, a negative length passes the reqPtr->length <= reqPtr->avail test, and reqPtr->avail (meant to indicate how much data is available in reqPtr->content) is set to a negative integer of the attacker's choosing, which is then used at various points throughout the program.
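The difference between the two operators is easy to demonstrate. The sketch below contrasts the condition as it appears in the excerpt with the presumably intended one; the function names are hypothetical, and 256KB is used for maxpost to match the default mentioned above.

```c
/* The condition as written in the AOLServer excerpt. With maxpost >= 0,
 * no length can be both negative and greater than maxpost, so this
 * never rejects anything. */
int rejects_as_written(int length, int maxpost)
{
    return length < 0 && length > maxpost;
}

/* The presumably intended condition: reject negative lengths and
 * lengths above the configured limit. */
int rejects_as_intended(int length, int maxpost)
{
    return length < 0 || length > maxpost;
}
```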

Data can also be posted to HTTP servers via chunked encoding. With this method, input is supplied as a series of chunks, which the server combines to reconstruct the original data once all chunks have been received. Instead of specifying a content size with the Content-Length header, the client sends the Transfer-Encoding header with the value "chunked," and each chunk is preceded by its own size. The header looks like this:

Transfer-Encoding: chunked

A chunk is composed of a size (expressed in hexadecimal), a newline (carriage return/line feed [CRLF] combination), the chunk data (which is the length specified by the size), and finally a trailing newline (CRLF combination). Here's an example:

8
AAAAAAAA
10
AAAAAAAABBBBBBBB
0

The example shows two data chunks of lengths 8 and 16. (Remember, the size is in hexadecimal, so "10" is used rather than the decimal "16.") A 0-length chunk indicates that no more chunks follow and the data transfer is complete. As you might have guessed, letting remote attackers specify arbitrary sizes has been a major problem in the past; specified sizes must be carefully sanitized to avoid integer overflows or sign-comparison vulnerabilities. These vulnerabilities are much like the errors that can happen when processing a Content-Length value that hasn't been validated adequately, although processing chunk-encoded data poses additional dangers. In the Content-Length integer overflows, some arithmetic on the single length value, such as adding 1 for a NUL terminator or rounding in an allocation wrapper, was necessary for a vulnerability to exist; otherwise, no integer wrap would occur. With chunked encoding, however, data in one chunk is added to the previous chunk data already received. By supplying multiple chunks, attackers might be able to trigger an integer overflow even if no allocation wrappers or rounding are used, as shown in this example:
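Careful sanitization starts with parsing the chunk-size line itself. The following sketch reads the hexadecimal size, stops at a ';' (where chunk extensions may begin; this sketch simply ignores them), and rejects a size field that is empty, contains a non-hex character, or exceeds a caller-supplied per-chunk limit. parse_chunk_size() and the limit are hypothetical.

```c
#include <stddef.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse the size field of a chunk-size line (CRLF already stripped).
 * Anything from ';' onward is treated as a chunk extension and ignored.
 * Returns 0 and stores the size, or -1 if the field is empty, contains a
 * non-hex character, or exceeds max_chunk. */
int parse_chunk_size(const char *line, size_t max_chunk, size_t *out)
{
    const char *p;
    size_t v = 0;

    for (p = line; *p != '\0' && *p != ';'; p++) {
        int d = hexval((unsigned char)*p);

        if (d < 0)
            return -1;
        /* Refuse before computing v * 16 + d if it would exceed max_chunk. */
        if ((size_t)d > max_chunk || v > (max_chunk - (size_t)d) / 16)
            return -1;
        v = v * 16 + (size_t)d;
    }
    if (p == line)
        return -1;          /* no digits at all */
    *out = v;
    return 0;
}
```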

char *read_chunks(int sock, size_t *length)
{
    size_t total = 0;
    char *data = NULL;

    *length = 0;

    for(;;){
        char chunkline[MAX_LINE];
        int n;
        size_t chunksize;

        n = read_line(sock, chunkline, sizeof(chunkline)-1);

        if(n < 0){
            if(data)
                free(data);
            return NULL;
        }

        chunkline[n] = '\0';

        /* chunk sizes are hexadecimal */
        chunksize = strtoul(chunkline, NULL, 16);

        if(chunksize == 0)        /* no more chunks */
            break;

        if(data == NULL)
            data = (char *)malloc(chunksize);
        else
            /* chunksize + total is never checked for wraparound */
            data = (char *)realloc(data, chunksize + total);

        if(data == NULL)
            return NULL;

        read_bytes(sock, data + total, chunksize);
        total += chunksize;

        read_crlf(sock);
    }

    *length = total;
    return data;
}

As you can see, the read_chunks() function reads chunks in a loop until a 0-length chunk is received, keeping the cumulative data size in the total variable. The problem is the call to realloc(). When a new chunk is received, the buffer is resized to make room for the new chunk data. If the sum of the bytes already received and the size of the new chunk is larger than the maximum value a size_t can represent, the addition wraps around, realloc() is asked for a buffer that is too small, and the following read_bytes() call overflows the heap. A request to trigger this vulnerability would look something like this:

POST /url HTTP/1.1
Host: hi.com
Transfer-Encoding: chunked

8
xxxxxxxx
FFFFFFF9
xxxxxx... (however many bytes you want to overflow by)

The request is composed of two chunks: a chunk of length 8 bytes and a chunk of length 0xFFFFFFF9 bytes. With a 32-bit size_t, the addition of these two values wraps around to 1, so the call to realloc() shrinks the 8-byte buffer or leaves it untouched, yet read_bytes() then reads up to 0xFFFFFFF9 bytes into it, starting at offset 8.
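Bounding each chunk is not enough on its own; the running total must also be checked before it is used to size the buffer. This hypothetical helper performs the addition only after confirming that the result stays within an overall body limit, which also rules out wraparound even when the limit is SIZE_MAX. In read_chunks(), a check like this would go before the call to realloc().

```c
#include <stddef.h>
#include <stdint.h>

/* Add a new chunk's size to the running total without wrapping.
 * Returns 0 and stores total + chunk in *new_total, or -1 if the sum
 * would exceed max_body. Because the check is done by subtraction,
 * it is safe even when max_body is SIZE_MAX. */
int add_chunk_total(size_t total, size_t chunk, size_t max_body,
                    size_t *new_total)
{
    if (total > max_body || chunk > max_body - total)
        return -1;
    *new_total = total + chunk;
    return 0;
}
```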

Note

FFFFFFF9 bytes, rather than FFFFFFF8, are used in this example because with FFFFFFF8, the result of the addition would be 0, and many implementations of realloc() act identically to free() when 0 is supplied as the size parameter. When this happens, realloc() returns NULL. Even though you could free data unexpectedly by supplying a 0 size to realloc(), read_chunks() would just return NULL, and the vulnerability wouldn't be triggered successfully.