HTTP is the network protocol that all Web transactions use under the hood. The next section summarizes the high points, but interested readers should check out RFC 2616 (www.ietf.org) or find a good Web inspection proxy tool and start studying traffic.

Overview

HTTP is a straightforward request and response protocol, in which every request the client sends to the server is reciprocated with a single response. These requests are performed over TCP connections. In contemporary versions of HTTP, a single TCP connection is typically reused for multiple requests to the same server, but historically, each Web request caused the creation of an entirely new TCP connection. Here's an example of a simple HTTP request:
HTTP requests are composed of a header and an optional body. A blank line (a bare carriage return/line feed, or CRLF) separates the header and the body. The preceding request doesn't have a body, so the blank line is simply the end of the request.

The first line of an HTTP request is composed of a method, a URI path, and an HTTP protocol version. The method tells the server what type of request it is. The preceding request has a GET method, which tells the server to retrieve (get) the requested resource. The URI path tells the server which resource the client is requesting. The preceding request asks for the resource located at /testing/test.html on the server. The protocol version specifies the version of HTTP the client is using. In the preceding request, the client is using version HTTP/1.1.

The rest of the lines in the request header share the same general format: a field name followed by a colon, and then a field value. The preceding request includes the following request header fields:
Now look at the response to this query:

    HTTP/1.1 404 Not Found
    Date: Fri, 20 Aug 2006 01:58:14 GMT
    Server: Apache/1.3.28 (Unix) PHP/4.3.0
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1

    d3
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML><HEAD>
    <TITLE>404 Not Found</TITLE>
    </HEAD><BODY>
    <H1>Not Found</H1>
    The requested URL /testing/test.html was not found on this server.<P>
    </BODY></HTML>

    0

HTTP responses are similar to HTTP requests. The response has a header and a body, and the response header is set up so that the first line has a special format. The rest of the response header lines share the field name, colon, and field value format.

The first line of the HTTP response header is composed of the HTTP protocol version, the response code, and the response reason phrase. The protocol version is the same as in the request: HTTP/1.1. The response code is a numeric status code that tells the client the result of the request. In the preceding response, it's 404, which is probably familiar to you. If it isn't, the response reason phrase gives a short text description of the status code, which is "Not Found" in this response. The rest of the response header lines provide information to the client:
The response body in the example is encoded with the chunked encoding method, which is made up of a series of chunks. Each chunk has a line specifying its length in hexadecimal and the corresponding data. In the preceding response, d3 specifies 211 bytes of data in the first chunk. The 0 at the end indicates the end of the chunked data. You can see that in the response, which is plain HTML, the server gives an error message to go along with the error code 404.

Versions

Three versions of HTTP are currently in use: 0.9, 1.0, and 1.1. An HTTP version 0.9 request looks like this:

    GET /

This request retrieves the root document. It's about as straightforward as it can get and can be used for quick manual testing. A minimal HTTP version 1.0 request looks like this:

    GET / HTTP/1.0

This request is similar to the request shown in the previous section. Note that a blank line (a second CRLF) signifies the end of the HTTP request header and, therefore, the end of the HTTP request. If you're entering requests by hand, HTTP/1.0 is easiest to use because it's simpler than HTTP/1.1. Here's a minimal HTTP/1.1 request:

    GET / HTTP/1.1
    Host: test.com

This request is nearly identical to the minimal HTTP/1.0 request, except it requires the client to provide a Host header in the request.

Headers

HTTP headers provide descriptive information (metadata) about the HTTP connection. They are used in negotiating an HTTP connection and establishing the connection's properties after successful negotiation. HTTP supports a variety of headers that fall into one of four basic categories:
The remainder of this chapter refers to a number of HTTP headers, so Table 17-1 lists them for easy reference.
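As a recap of the message layout covered so far (a start line, header fields in "name: value" form, a blank line, and an optional body that may use chunked encoding), here is a minimal Python sketch. The function names split_message and decode_chunked are hypothetical, chosen only for illustration, and the code assumes a complete, well-formed message held in memory; it is not a robust parser.

```python
def split_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body).

    Assumes the header block ends at the first blank line (CRLF CRLF).
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header line is a field name, a colon, and a field value.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body into the original data.

    Each chunk is a hexadecimal length line followed by that many bytes
    and a CRLF; a chunk of length 0 marks the end.
    """
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # Ignore any chunk extension after ';' (an assumption of this sketch).
        size = int(body[pos:eol].split(b";")[0].decode("ascii"), 16)
        if size == 0:
            break
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)
```

A caller would typically check whether headers.get("transfer-encoding") is "chunked" before calling decode_chunked on the body.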
Methods

HTTP supports many methods, especially considering vendor extensions to the protocol. The three most important are GET, HEAD, and POST. GET is the most common method used by a client to retrieve a resource. HEAD is identical to GET, except it tells the server not to return the actual document contents. In other words, it tells the server to return only the response headers. POST is used to submit a block of data to a specified resource on the server. The difference between GET and POST is related to how developers use HTML forms and parameters (covered in "Parameters and Forms" later in this chapter). The following sections describe some less common methods.

DELETE and PUT

The DELETE and PUT methods allow files to be removed from and added to a Web server. Historically, these two methods have seen little use in real sites; further, they have been associated with a number of vulnerabilities and are usually disabled. The notable exception is using these methods as a component of complete WebDAV support.

TEXTSEARCH and SPACEJUMP

The TEXTSEARCH and SPACEJUMP requests aren't methods, nor were they ever officially added to the HTTP specification. However, they were proposed methods, and the functionality they describe is supported in modern Web servers. To briefly see how they work, start by looking at the TEXTSEARCH request:

    GET /customers?John+Doe HTTP/1.0

This request uses the ? character to terminate the path and contains a URL-encoded search string. This string causes the server to run a file at the supplied location and pass the decoded search string as a command line. Anyone familiar with common path traversal attacks should recognize this request type immediately. It's the form of request commonly used to pass parameters to an executable file via the query string, which makes it useful in exploiting a path traversal vulnerability. In all truth, this use might be the only remaining one for this request type.
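The following Python sketch illustrates how a server might turn a TEXTSEARCH-style query into command-line arguments. It assumes the word-splitting convention from the CGI specification (words separated by +, each word URL-decoded, and no unencoded = in the query); the function name indexed_query_args is hypothetical, and real servers differ in the details.

```python
from typing import List, Optional
from urllib.parse import unquote


def indexed_query_args(query: str) -> Optional[List[str]]:
    """Return the command-line words for an indexed query.

    Returns None when the query contains an unencoded '=', in which case
    it is treated as ordinary name=value parameters instead.
    """
    if "=" in query:
        return None
    # Split on '+' first, then decode each word, so an encoded '+'
    # (%2B) inside a word is not mistaken for a separator.
    return [unquote(word) for word in query.split("+")]
```

For the request shown above, indexed_query_args("John+Doe") yields the two words John and Doe, which is why this request form is attractive when an attacker can reach an executable through a path traversal flaw.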
The following SPACEJUMP request represents another legacy request type:

    GET /map/1.1+2.7 HTTP/1.0

This request is designed for handling server-side image maps. It provides the coordinates of a clicked point in an object. As server-side image mapping has disappeared, so has the SPACEJUMP request. It's interesting to note, however, that this request type has also been associated with a number of vulnerabilities. The classic handler for this request (on both Apache and IIS servers) is the htimage program, which has been the source of a number of high-risk vulnerabilities, ranging from data disclosure to stack buffer overflows.

OPTIONS and TRACE

The OPTIONS and TRACE methods provide information about a server. The OPTIONS request simply lists all methods the server accepts. This information is not particularly sensitive, although it does give a potential attacker details about the system. Further, this method is useful only for servers that support extended functionality, such as WebDAV.

The HTTP TRACE method is quite simple, although its implications are interesting. This method simply echoes the received request back to the client in the response body, ostensibly for testing purposes. Of course, the capability to have a Web site present arbitrary content can present some interesting possibilities for vulnerabilities, discussed in "Cross-Site Scripting" later in this chapter.

CONNECT

The HTTP CONNECT method provides a way for proxies to establish Secure Sockets Layer (SSL) connections with other servers. It's a reasonable method for use in proxies but is usually dangerous on application servers.

WebDAV Methods

Web Distributed Authoring and Versioning (WebDAV) is a set of methods and associated protocols for managing files over HTTP connections. It makes use of the standard GET, PUT, and DELETE methods for basic file access. WebDAV adds a number of methods for other file-management tasks, described in Table 17-2.
Fortunately, most Web applications do not (and certainly should not) expose WebDAV functionality directly. However, you should keep a few points in mind when you encounter WebDAV systems. First, WebDAV uses HTTP as a transport protocol and uses the same basic security mechanisms of SSL and HTTP authentication, so the coverage of these standards also applies to WebDAV. Second, the specification for WebDAV access control is only in draft form and not widely implemented at the time of this writing, so access control capabilities can vary widely between products.

Parameters and Forms

A Web client transmits parameters (user-supplied input and variables) to a Web application through HTTP in three main ways, explained in the following sections.

Embedded Path Information

A URI path can contain embedded parameters as part of the path components. This embedded path information can be handled by server-based filtering such as path rewriting rules, which remap the received path and place the information into request variables. Path information may also be handled through the PATH_INFO environment variable common to most Web application platforms. The PATH_INFO variable contains additional components appended to a URI resource path. For example, say you have a dynamic Web application at /webapp, and a user submitted the following request:

    GET /webapp/blah/blah/blah HTTP/1.1
    Host: test.com

The Web server calls the program or request handler corresponding to /webapp and indicates that extra information was passed through the appropriate mechanism. If the program gets information through CGI variables, the CGI program would see something like this:

    PATH_INFO=/blah/blah/blah
    SCRIPT_NAME=/webapp

If the program is a Java servlet and calls request.getServletPath(), it receives /webapp. However, if the program calls request.getRequestURI(), it receives /webapp/blah/blah/blah.
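The split between the handler's own path and the extra path information can be sketched in a few lines of Python. The function name split_request_path is hypothetical, and the sketch assumes the handler's mount point (/webapp here) is already known; real servers resolve it from their own configuration.

```python
from typing import Tuple


def split_request_path(request_uri: str, script_name: str) -> Tuple[str, str]:
    """Return (SCRIPT_NAME, PATH_INFO) for a handler mounted at script_name."""
    path = request_uri.split("?", 1)[0]  # drop any query string
    if path == script_name:
        return script_name, ""
    if path.startswith(script_name + "/"):
        # Everything after the mount point is the extra path information.
        return script_name, path[len(script_name):]
    raise ValueError(f"{path!r} is not handled by {script_name!r}")


script_name, path_info = split_request_path("/webapp/blah/blah/blah", "/webapp")
print(script_name)  # the handler's own path
print(path_info)    # the extra components appended by the client
```

Code that makes decisions based on the full request path sees every component the client appended, whereas code that uses only the script name sees just the handler's path.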
Auditing Tip

If you see code performing actions or checks based on the request URI, make sure the developer is handling the path information correctly. Many servlet programmers use request.getRequestURI() when they intend to use request.getServletPath(), which can definitely have security consequences. Be sure to look for checks done on file extensions, as supplying unexpected path information can circumvent these checks as well.

GET and Query Strings

The second mechanism for transmitting parameters to a Web application is the query string. It's the component of a request URI that follows the question mark character (?). For example, if the http://test.com/webapp?arg1=hi&arg2=jimbo URI is entered into a browser, the browser connects to the test.com server and submits a request similar to the following:

    GET /webapp?arg1=hi&arg2=jimbo HTTP/1.1
    Host: test.com

This is the query string in the preceding request:

    arg1=hi&arg2=jimbo

Most dynamic Web technologies parse this query string into two separate variables: arg1 with a value of hi and arg2 with a value of jimbo. The & character is used to separate the arguments, and the = character separates the argument name from the argument value.

The other possible form for a query string is the one mentioned for the TEXTSEARCH request. If the query string doesn't contain an = character, the Web server assumes the query is an indexed query, and the arguments represent command-line arguments, separated by the + character. For example, the following request runs the CGI program mycgi.pl with the arguments hi and jimbo:

    GET /mycgi.pl?hi+jimbo HTTP/1.1
    Host: test.com

HTML Forms

Before you look at the third common way of transmitting parameters, take a look at HTML forms. Forms are an HTML construct that enables application designers to construct Web pages that request user input and then relay it back to the server. A basic HTML form has an action, a method, and variables. The action is a URI that corresponds to the resource handling the filled-out form.
The method is GET or POST, and it determines which method the client uses to transmit the filled-out form. The variables are the actual content of the form, and designers can use a few basic types of variables. Here's a brief example of a form:

    <form method="GET" action="http://test.com/transfer.php">
    Source Account: <select name="source">
    <option selected value="42424242">42424242</option>
    <option value="82345678">82345678</option>
    </select><br>
    Destination Account: <select name="dest">
    <option selected value="12345678">12345678</option>
    <option value="82345678">82345678</option>
    </select><br>
    Amount: <input type="text" name="value"><br>
    <input type="submit" value="Transfer Money"><br>
    </form>

Figure 17-1 shows what this simple form would look like rendered in a client's browser. This form uses the GET method, and the results are submitted to the transfer.php page. There are drop-down list boxes for the source account and destination account and a simple text input field for the transfer amount. The last input is the submit button, which allows users to initiate the transmission of the form contents.

Figure 17-1. Simple form
When users submit this form, their browsers connect to test.com and issue a request similar to the following:

    GET /transfer.php?source=42424242&dest=12345678&value=123 HTTP/1.1
    Host: test.com

In this request, you can see that the variables in the form have been turned into a query string. The source, dest, and value parameters are transmitted to the server and submitted via the GET method.

POST and Content Body

The third mechanism for transmitting parameters to a Web application is the POST method. In this method, the user's data is transferred by using the body of the HTTP request instead of embedding the data in the URI as the GET method does. Assume you changed the preceding form to use a POST method instead of a GET method by changing this line:

    <form method="GET" action="http://test.com/transfer.php">

To this:

    <form method="POST" action="http://test.com/transfer.php">

When users submit this form, a request from the Web browser similar to the following is issued:

    POST /transfer.php HTTP/1.0
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 39

    source=42424242&dest=12345678&value=123

You can see that the parameters are encoded in a similar fashion to the GET request, but they are now in the request's content body.

Parameter Encoding

Parameters are encoded by using guidelines outlined in RFC 2396, which defines the URI general syntax. This encoding is necessary whether they are sent via the GET method in a query string or the POST method in the content body. Nearly all nonalphanumeric characters are encoded, as are the bytes of most Unicode and multibyte characters. This encoding is described in Chapter 8, "Strings and Metacharacters," but we will briefly recap it here.

The URL encoding scheme is % hex hex, with a percent character starting the escape sequence, followed by a hexadecimal representation of the required byte value. For example, the character = has the value 61 in the ASCII character set, which is 0x3d in hexadecimal.
Therefore, an equal sign can be encoded by using the sequence %3d. So you can set the testvar variable to the string jim=42 with the following encoded string:

    testvar=jim%3d42

GET Versus POST

Although you've learned the technical details of GET and POST, you haven't seen the difference between them in a real-world sense. Here are the essential tradeoffs:
Auditing Tip

Generally, you should encourage developers to use POST-style requests for their applications because of the security concerns outlined previously. One issue to watch for is the transmission of a session token via a query string, as that creates a risk for the Web application's clients. The risk isn't necessarily a showstopper, but it's unnecessary and quite easy for a developer or Web designer to avoid.
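To make the comparison in this section concrete, the following Python sketch builds the GET and POST forms of the same transfer.php submission. The helper names build_get and build_post are hypothetical, and the sketch relies on the standard library's urllib.parse.urlencode for form encoding; it illustrates where the parameters travel and is not a model of any particular browser.

```python
from urllib.parse import parse_qs, urlencode


def build_get(path: str, host: str, params: dict) -> str:
    """Place the encoded parameters in the query string of the request URI."""
    query = urlencode(params)
    return f"GET {path}?{query} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def build_post(path: str, host: str, params: dict) -> str:
    """Place the same encoded parameters in the request body instead."""
    body = urlencode(params)
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        # Content-Length is the exact byte length of the body.
        f"Content-Length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )


params = {"source": "42424242", "dest": "12345678", "value": "123"}
print(build_get("/transfer.php", "test.com", params))
print(build_post("/transfer.php", "test.com", params))

# Percent-encoding: the '=' inside the value is escaped so it cannot be
# confused with the name/value separator. Hex digits are case-insensitive.
print(urlencode({"testvar": "jim=42"}))
print(parse_qs("testvar=jim%3d42"))
```

With GET, the parameters become part of the request URI, which is the part of a request that tends to be recorded in logs and browser history; with POST, they appear only in the body. This is one reason a session token in a query string deserves attention during review.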