HTTP


HTTP is the network protocol that all Web transactions use under the hood. The next section summarizes the high points, but interested readers should check out RFC 2616 (www.ietf.org) or find a good Web inspection proxy tool and start studying traffic.

Overview

HTTP is a straightforward request and response protocol, in which every request the client sends to the server is reciprocated with a single response. These requests are performed over TCP connections. In contemporary versions of HTTP, a single TCP connection is typically reused for multiple requests to the same server, but historically, each Web request caused the creation of an entirely new TCP connection. Here's an example of a simple HTTP request:

GET /testing/test.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-gsarcade-launch, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Host: test.testing.com:1234
Connection: Keep-Alive


HTTP requests are composed of a header and an optional body. A blank line, consisting of just a carriage return/line feed (CRLF), separates the header and the body. The preceding request doesn't have a body, so the blank line is simply the end of the request.

The first line of an HTTP request is composed of a method, a URI path, and an HTTP protocol version. The method tells the server what type of request it is. The preceding request has a GET method, which tells the server to retrieve (get) the requested resource. The URI path tells the server which resource the client is requesting. The preceding request asks for the resource located at /testing/test.html on the server. The protocol version specifies the version of HTTP the client is using. In the preceding request, the client is using version HTTP/1.1.

The rest of the lines in the request header share the same general format: a field name followed by a colon, and then a field value. The preceding request includes the following request header fields:

  • Accept: This header field tells the server which kinds of media (such as an image or application) are acceptable for the response and their order of preference.

  • Accept-Language: This header field tells the server which languages the client accepts and prefers, which in the preceding request is U.S. English.

  • Accept-Encoding: This header field tells the server which schemes it can use to encode the response body, such as gzip or deflate compression.

  • User-Agent: This header field tells the server what software versions the client is using for its Web browser and operating system. You can see that the preceding request was made from Internet Explorer 6.0 (MSIE 6.0) on a Windows XP machine (Windows NT 5.1) with versions 1.0 and 1.1 of the .NET runtime installed (.NET CLR 1.0.3705; .NET CLR 1.1.4322).

  • Host: This header field tells the Web server which host the request is for, which is useful if multiple Web sites are hosted on the same machine (called virtual hosts). You can see that the request was for the machine named test.testing.com, and the client is talking to the server on port 1234.

  • Connection: This header field gives the server options that are specific to the connection. In the preceding request, the client's Keep-Alive value tells the server not to close the connection after it answers the request. This way, the client can reuse the TCP connection to issue another request.
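The request layout described above (a request line, header fields, a blank line, and an optional body) can be sketched in a few lines of Python. This is a minimal illustration rather than a complete parser: parse_request is a hypothetical helper, and it ignores details such as repeated header fields and folded header lines.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into request line parts, headers, and body."""
    # The header ends at the first blank line (CRLF followed by CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: method, URI path, protocol version.
    method, path, version = lines[0].split(" ", 2)

    # Remaining lines: field name, colon, field value.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers, body


raw_request = (
    b"GET /testing/test.html HTTP/1.1\r\n"
    b"Accept-Language: en-us\r\n"
    b"Host: test.testing.com:1234\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)
method, path, version, headers, body = parse_request(raw_request)
print(method, path, version)
print(headers["host"])
```

Field names are lowercased before being stored because HTTP header field names are case-insensitive.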

Now look at the response to this query:

HTTP/1.1 404 Not Found
Date: Fri, 20 Aug 2006 01:58:14 GMT
Server: Apache/1.3.28 (Unix) PHP/4.3.0
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

d3
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /testing/test.html was not found on this server.<P>
</BODY></HTML>

0


HTTP responses are similar to HTTP requests. The response has a header and a body, and the first line of the response header has a special format. The rest of the response header lines share the field name, colon, and field value format.

The first line of the HTTP response header is composed of the HTTP protocol version, the response code, and the response reason phrase. The protocol version is the same as in the request: HTTP/1.1. The response code is a numeric status code that tells the client the result of the request. In the preceding response, it's 404, which is probably familiar to you. If it isn't, the response reason phrase gives a short text description of the status code, which is "Not Found" in this response.

The rest of the response header lines provide information to the client:

  • Date: This field tells the client when the server generated the response.

  • Server: This field gives the client information about the Web server software. You can see that the Web server is running Apache 1.3.28 on some kind of UNIX machine.

  • Keep-Alive and Connection: These fields give the client information about the connection and how long it will be held open.

  • Transfer-Encoding: This field tells the client the mechanism the server uses to transmit the body of the response. This server elected to use the chunked method of encoding.

  • Content-Type: This field tells the client the media type and character set of the response, which is a plain HTML document.

The response body in the example is encoded with the chunked encoding method, which is made up of a series of chunks. Each chunk has a line specifying its length in hexadecimal and the corresponding data. In the preceding response, d3 specifies 211 bytes of data in the first chunk. The 0 at the end indicates the end of the chunked data. You can see that in the response, which is plain HTML, the server gives an error message to go along with the error code 404.
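To make the chunk layout concrete, here is a small Python sketch that reassembles a chunked body. The decode_chunked function and the sample data are illustrative assumptions; the sketch ignores trailer fields and does no error handling.

```python
def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body: hex size line, chunk data, ..., final 0."""
    body = b""
    pos = 0
    while True:
        # Each chunk begins with a line giving its length in hexadecimal.
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.decode("ascii").strip(), 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the chunked data
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return body


# Hypothetical two-chunk body used only for illustration.
sample = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(sample))

# The length line from the response shown earlier: d3 hexadecimal is 211.
print(int("d3", 16))
```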

Versions

Three versions of HTTP are currently in use: 0.9, 1.0, and 1.1. An HTTP version 0.9 request looks like this:

GET /


This request retrieves the root document. It's about as straightforward as it can get and can be used for quick manual testing. A minimal HTTP version 1.0 request looks like this:

GET / HTTP/1.0


This request is similar to the request shown in the previous section. Note that a blank line (a second CRLF) signifies the end of the HTTP request header and, therefore, the end of the HTTP request. If you're entering requests by hand, HTTP/1.0 is easiest to use because it's simpler than HTTP/1.1. Here's a minimal HTTP/1.1 request:

GET / HTTP/1.1
Host: test.com


This request is nearly identical to the minimal HTTP/1.0 request, except it requires the client to provide a Host header in the request.
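The differences among the three minimal requests can be summarized in code. The following Python sketch uses a hypothetical helper named minimal_get; it only builds the bytes a client would send and does not open a connection.

```python
def minimal_get(path: str, version: str, host: str = "") -> bytes:
    """Build a minimal GET request for HTTP 0.9, 1.0, or 1.1."""
    if version == "0.9":
        # HTTP/0.9: method and path only, no version and no header fields.
        return f"GET {path}\r\n".encode("ascii")

    lines = [f"GET {path} HTTP/{version}"]
    if version == "1.1":
        # HTTP/1.1 requires the client to supply a Host header.
        if not host:
            raise ValueError("HTTP/1.1 requests need a Host header")
        lines.append(f"Host: {host}")

    # A blank line (a second CRLF) ends the request header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(minimal_get("/", "0.9"))
print(minimal_get("/", "1.0"))
print(minimal_get("/", "1.1", host="test.com"))
```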

Headers

HTTP headers provide descriptive information (metadata) about the HTTP connection. They are used in negotiating an HTTP connection and establishing the connection's properties after successful negotiation. HTTP supports a variety of headers that fall into one of four basic categories:

  • Request: Headers in the initial request

  • Response: Headers in the server response

  • General: Headers that can be in a request or response

  • Entity: Headers that apply to a specific entity in the request or response

The remainder of this chapter refers to a number of HTTP headers, so Table 17-1 lists them for easy reference.

Table 17-1. Request and Response Header Fields

Header                      Type      Description
Accept                      Request   Lists media (MIME) types the client will accept
Accept-Charset              Request   Lists character encodings the client will accept
Accept-Encoding             Request   Lists content encodings the client will accept, such as compression mechanisms
Accept-Language             Request   Lists languages the client will accept
Accept-Ranges               Response  Server indicates it supports range requests
Age                         Response  Freshness of the requested URI
Allow                       Entity    Lists HTTP methods allowed for the requested URI
Allowed                     Response  Deprecated: lists allowed request methods
Authorization               Request   Presents credentials for HTTP authentication
Cache-Control               General   Specifies caching requirements for the requested URI
Charge-To                   Request   Deprecated: billing information
Connection                  General   Allows the client to specify connection options
Content-Encoding            Entity    Identifies additional encoding of the entity body, such as compression
Content-Transfer-Encoding   Response  Deprecated: MIME transfer encoding
Content-Language            Entity    Identifies the language of the entity body
Content-Length              Entity    Identifies the length (in bytes) of the entity body
Content-Location            Entity    Supplies the correct location for the entity if known and not available at the requested URI
Content-MD5                 Entity    Supplies an MD5 digest of the entity body
Content-Range               Entity    Lists the byte range of a partial entity body
Content-Type                Entity    Specifies the media (MIME) type of the entity
Cost                        Response  Deprecated: cost of requested URI
Date                        General   Date and time of the message
Derived-From                Response  Deprecated: previous version of requested URI
ETag                        Response  Entity tag used for caching purposes
Expect                      Request   Lists server behaviors required by the client
Expires                     Entity    Date and time after which the entity is considered stale
From                        Request   E-mail address of the requester
Host                        Request   Host name and port number of the requested URI
If-Match                    Request   Used to make request conditional based on entity tags
If-Modified-Since           Request   Used to make request conditional based on HTTP date
If-None-Match               Request   Used to make request conditional based on entity tags
If-Range                    Request   Used to make a range request conditional based on entity tags
If-Unmodified-Since         Request   Used to make request conditional based on HTTP date
Last-Modified               Entity    Identifies the time the entity was last modified
Location                    Response  Supplies an alternate location for the requested URI
Max-Forwards                Request   Mechanism for limiting the number of gateways in a TRACE or OPTIONS request
Message-Id                  Response  Deprecated: globally unique message identifier
Pragma                      General   Used for implementation-specific headers
Proxy-Authenticate          Response  Identifies that a proxy requires authentication
Proxy-Authorization         Request   Presents credentials for HTTP proxy authentication
Public                      Response  Deprecated: lists publicly accessible methods
Range                       Request   Identifies a specific range of bytes needed from the requested URI
Referer                     Request   Client-provided URI responsible for initiating the request
Retry-After                 Response  Indicates how long a service is expected to be unavailable
Server                      Response  Server identification string
TE                          Request   Lists transfer encodings accepted by the client for a chunked transfer
Trailer                     General   Indicates header fields present in the trailer of a chunked message
Transfer-Encoding           General   Identifies the encoding applied to the message
Upgrade                     General   Identifies additional protocols supported by the client
URI                         Response  Deprecated: superseded by Location header field
User-Agent                  Request   Contains general information about the client
Vary                        Response  Provided by the server to determine cache freshness
Version                     Response  Deprecated: version of requested URI
Via                         General   Used by gateways and proxies to identify intermediate hosts
Warning                     General   Provides additional message status information
WWW-Authenticate            Response  Initiates the HTTP authentication challenge required by a server
WWW-Title                   Response  Deprecated: document title
WWW-Link                    Response  Deprecated: external document reference


Methods

HTTP supports many methods, especially considering vendor extensions to the protocol. The three most important are GET, HEAD, and POST. GET is the most common method used by a client to retrieve a resource. HEAD is identical to GET, except it tells the server not to return the actual document contents. In other words, it tells the server to return only the response headers. POST is used to submit a block of data to a specified resource on the server. The difference between GET and POST is related to how developers use HTML forms and parameters (covered in "Parameters and Forms" later in this chapter). The following sections describe some less common methods.

DELETE and PUT

The DELETE and PUT methods allow files to be removed from and added to a Web server. Historically, these two methods have seen little use in real sites; further, they have been associated with a number of vulnerabilities and are usually disabled. The notable exception is using these methods as a component of complete WebDAV support.

TEXTSEARCH and SPACEJUMP

The TEXTSEARCH and SPACEJUMP requests aren't methods, nor were they ever officially added to the HTTP specification. However, they were proposed methods, and the functionality they describe is supported in modern Web servers. To briefly see how they work, start by looking at the TEXTSEARCH request:

GET /customers?John+Doe HTTP/1.0


This request uses the ? character to terminate the resource path and contains a URL-encoded search string. This string causes the server to run a file at the supplied location and pass the decoded search string as a command line. Anyone familiar with common path traversal attacks should recognize this request type immediately. It's the form of request commonly used to pass parameters to an executable file via the query string, which makes it useful in exploiting a path traversal vulnerability. In all truth, this use might be the only remaining one for this request type.

The following SPACEJUMP request represents another legacy request type:

GET /map/1.1+2.7 HTTP/1.0


This request is designed for handling server-side image maps. It provides the coordinates of a clicked point in an object. As server-side image mapping has disappeared, so has the SPACEJUMP request. It's interesting to note, however, that this request type has also been associated with a number of vulnerabilities. The classic handler for this request (on both Apache and IIS servers) is the htimage program, which has been the source of a number of high-risk vulnerabilities, ranging from data disclosure to stack buffer overflows.

OPTIONS and TRACE

The OPTIONS and TRACE methods provide information about a server. The OPTIONS request simply lists all methods the server accepts. This information is not particularly sensitive, although it does give a potential attacker details about the system. Further, this method is useful only for servers that support extended functionality, such as WebDAV.

The HTTP TRACE method is quite simple, although its implications are interesting. This method simply echoes the received request, header fields included, back to the client in the response body, ostensibly for testing purposes. Of course, the capability to have a Web site present arbitrary content can present some interesting possibilities for vulnerabilities, discussed in "Cross-Site Scripting" later in this chapter.

CONNECT

The HTTP CONNECT method provides a way for proxies to establish Secure Sockets Layer (SSL) connections with other servers. It's a reasonable method for use in proxies but is usually dangerous on application servers.

WebDAV Methods

Web Distributed Authoring and Versioning (WebDAV) is a set of methods and associated protocols for managing files over HTTP connections. It makes use of the standard GET, PUT, and DELETE methods for basic file access. WebDAV adds a number of methods for other file-management tasks, described in Table 17-2.

Table 17-2. WebDAV Methods

Method      Description
COPY        Copies a resource from one URI to another
MOVE        Moves a resource from one URI to another
LOCK        Locks a resource for shared or exclusive use
UNLOCK      Removes a lock from a resource
PROPFIND    Retrieves properties from a resource
PROPPATCH   Modifies multiple properties atomically
MKCOL       Creates a directory (collection)
SEARCH      Initiates a server-side search


Fortunately, most Web applications do not (and certainly should not) expose WebDAV functionality directly. However, you should keep a few points in mind when you encounter WebDAV systems. First, WebDAV uses HTTP as a transport protocol and uses the same basic security mechanisms of SSL and HTTP authentication, so the coverage of these standards also applies to WebDAV. Second, the specification for WebDAV access control is only in draft form and not widely implemented at the time of this writing, so access control capabilities can vary widely between products.

Parameters and Forms

A Web client transmits parameters (user-supplied input and variables) to a Web application through HTTP in three main ways, explained in the following sections.

Embedded Path Information

A URI path can contain embedded parameters as part of the path components. This embedded path information can be handled by server-based filtering such as path rewriting rules, which remap the received path and place the information into request variables. Path information may also be handled through the PATH_INFO environment variable common to most Web application platforms. The PATH_INFO variable contains additional components appended to a URI resource path. For example, say you have a dynamic Web application at /webapp, and a user submitted the following request:

GET /webapp/blah/blah/blah HTTP/1.1
Host: test.com


The Web server calls the program or request handler corresponding to /webapp and indicates that extra information was passed through the appropriate mechanism. If the program gets information through CGI variables, the CGI program would see something like this:

PATH_INFO=/blah/blah/blah
SCRIPT_NAME=/webapp


If the program is a Java servlet and calls request.getServletPath(), it receives /webapp. However, if the program calls request.getRequestURI(), it receives /webapp/blah/blah/blah.
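The split between the handler's own path and the extra path information can be sketched as follows. This Python fragment only illustrates the idea; split_path_info is a hypothetical helper, and real servers apply their own mapping rules when deciding which prefix corresponds to a handler.

```python
def split_path_info(request_path: str, mount_point: str):
    """Return (SCRIPT_NAME, PATH_INFO) for a handler mapped at mount_point."""
    if request_path == mount_point:
        return mount_point, ""
    if request_path.startswith(mount_point + "/"):
        # Everything after the handler's own path is extra path information.
        return mount_point, request_path[len(mount_point):]
    raise ValueError("request path is not under the mount point")


script_name, path_info = split_path_info("/webapp/blah/blah/blah", "/webapp")
print("SCRIPT_NAME=" + script_name)
print("PATH_INFO=" + path_info)
```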

Auditing Tip

If you see code performing actions or checks based on the request URI, make sure the developer is handling the path information correctly. Many servlet programmers use request.getRequestURI() when they intend to use request.getServletPath(), which can definitely have security consequences. Be sure to look for checks done on file extensions, as supplying unexpected path information can circumvent these checks as well.
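As a rough illustration of this tip, the Python sketch below models a file extension check that is applied to the wrong value. Everything in it is hypothetical: the function names, the .jsp rule, and the /admin/report.jsp path are invented for the example, and the two strings stand in for what a servlet would obtain from request.getRequestURI() and request.getServletPath().

```python
def blocked_by_uri_check(request_uri: str) -> bool:
    # Flawed: inspects the full request URI, including extra path information.
    return request_uri.endswith(".jsp")


def blocked_by_path_check(servlet_path: str) -> bool:
    # Inspects only the path of the resource that actually handles the request.
    return servlet_path.endswith(".jsp")


# A request for /admin/report.jsp with extra path information appended.
request_uri = "/admin/report.jsp/extra"
servlet_path = "/admin/report.jsp"

print(blocked_by_uri_check(request_uri))    # the appended path hides the extension
print(blocked_by_path_check(servlet_path))
```

The same resource handles the request in both cases, but only the second check sees the extension it is looking for.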


GET and Query Strings

The second mechanism for transmitting parameters to a Web application is the query string. It's the component of a request URI that follows the question mark character (?). For example, if the http://test.com/webapp?arg1=hi&arg2=jimbo URI is entered into a browser, the browser connects to the test.com server and submits a request similar to the following:

GET /webapp?arg1=hi&arg2=jimbo HTTP/1.1
Host: test.com


This is the query string in the preceding request:

arg1=hi&arg2=jimbo


Most dynamic Web technologies parse this query string into two separate variables: arg1 with a value of hi and arg2 with a value of jimbo. The & character is used to separate the arguments, and the = character separates the argument name from the argument value.
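Python's standard urllib.parse module performs this kind of parsing, so it makes a convenient way to see the result. The sketch below is only an illustration of the splitting rules; individual Web platforms differ in how they treat repeated or malformed parameters.

```python
from urllib.parse import urlsplit, parse_qs

request_target = "/webapp?arg1=hi&arg2=jimbo"

# The query string is everything after the question mark.
query = urlsplit(request_target).query
print(query)

# parse_qs splits on & and =, mapping each name to a list of its values.
params = parse_qs(query)
print(params)
```

Each name maps to a list because the same parameter name can legally appear more than once in a query string.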

The other possible form for a query string is the one mentioned for the TEXTSEARCH request. If the query string doesn't contain an = character, the Web server assumes the query is an indexed query, and the arguments represent command-line arguments. For example, the following code runs the CGI program mycgi.pl with the arguments hi and jimbo:

GET /mycgi.pl?hi&jimbo HTTP/1.1
Host: test.com


HTML Forms

Before you look at the third common way of transmitting parameters, take a look at HTML forms. Forms are an HTML construct that enables application designers to construct Web pages that request user input and then relay it back to the server. A basic HTML form has an action, a method, and variables. The action is a URI that corresponds to the resource handling the filled-out form. The method is GET or POST, and it determines which method the client uses to transmit the filled-out form. The variables are the actual content of the form, and designers can use a few basic types of variables. Here's a brief example of a form:

<form method="GET" action="http://test.com/transfer.php">
Source Account: <select name="source">
<option selected value="42424242">42424242</option>
<option value="82345678">82345678</option>
</select><br>
Destination Account: <select name="dest">
<option selected value="12345678">12345678</option>
<option value="82345678">82345678</option>
</select><br>
Amount: <input type="text" name="value"><br>
<input type="Submit" value="Transfer Money"><br>
</form>


Figure 17-1 shows what this simple form would look like rendered in a client's browser. This form uses the GET method, and the results are submitted to the transfer.php page. There are drop-down list boxes for the source account and destination account and a simple text input field for the transfer amount. The last input is the submit button, which allows users to initiate the transmission of the form contents.

Figure 17-1. Simple form


When users submit this form, their browsers connect to test.com and issue a request similar to the following:

GET /transfer.php?source=42424242&dest=12345678&value=123 HTTP/1.1
Host: test.com


In this request, you can see that the variables in the form have been turned into a query string. The source, dest, and value parameters are transmitted to the server and submitted via the GET method.
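The browser's side of this exchange can be approximated with Python's urllib.parse.urlencode. The dictionary below simply stands in for the values a user might select in the form; it is an assumption made for the example.

```python
from urllib.parse import urlencode

# Values a user might choose in the form (illustrative only).
form_fields = {"source": "42424242", "dest": "12345678", "value": "123"}

# With method="GET", the fields are appended to the action URI as a query string.
request_line = "GET /transfer.php?" + urlencode(form_fields) + " HTTP/1.1"
print(request_line)
```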

POST and Content Body

The third mechanism for transmitting parameters to a Web application is the POST method. In this method, the user's data is transferred by using the body of the HTTP request instead of embedding the data in the URI as the GET method does. Assume you changed the preceding form to use a POST method instead of a GET method by changing this line:

<form method="GET" action="http://test.com/transfer.php">


To this:

<form method="POST" action="http://test.com/transfer.php">


When users submit this form, a request from the Web browser similar to the following is issued:

POST /transfer.php HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 39

source=42424242&dest=12345678&value=123


You can see that the parameters are encoded in a similar fashion to the GET request, but they are now in the request's content body.
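A short Python sketch shows how such a request could be assembled, with the Content-Length value computed from the encoded body rather than written by hand. The build_post helper is hypothetical and produces only the request bytes; it does not send anything.

```python
from urllib.parse import urlencode


def build_post(path: str, fields: dict) -> bytes:
    """Build an HTTP/1.0 POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # byte length of the body only
        "\r\n"  # blank line separating header and body
    ).encode("ascii")
    return head + body


request = build_post(
    "/transfer.php",
    {"source": "42424242", "dest": "12345678", "value": "123"},
)
print(request.decode("ascii"))
```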

Parameter Encoding

Parameters are encoded by using guidelines outlined in RFC 2396, which defines the general URI syntax. This encoding is necessary whether they are sent via the GET method in a query string or the POST method in the content body. Most nonalphanumeric ASCII characters are encoded, as are the bytes of non-ASCII data such as Unicode and other multibyte characters. This encoding is described in Chapter 8, "Strings and Metacharacters," but we will briefly recap it here.

The URL encoding scheme is % hex hex, with a percent character starting the escape sequence, followed by a hexadecimal representation of the required byte value. For example, the character = has the value 61 in the ASCII character set, which is 0x3d in hexadecimal. Therefore, an equal sign can be encoded by using the sequence %3d. So you can set the testvar variable to the string jim=42 with the following encoded string:

testvar=jim%3d42
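The same encoding can be reproduced with Python's urllib.parse module, shown here purely as an illustration. Note that quote emits uppercase hexadecimal digits (%3D), while unquote accepts either case.

```python
from urllib.parse import quote, unquote

# Encode the value so that its = is not mistaken for the name/value separator.
encoded = "testvar=" + quote("jim=42", safe="")
print(encoded)

# Decoding works regardless of the case of the hexadecimal digits.
decoded = unquote("jim%3d42")
print(decoded)

# The escape sequence is a percent sign plus the byte value in hexadecimal.
escape = "%{:02x}".format(ord("="))
print(escape)
```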


GET Versus POST

Although you've learned the technical details of GET and POST, you haven't seen the difference between them in a real-world sense. Here are the essential tradeoffs:

  • GET requests have more limitations than POST requests. The Web server typically limits the query string to a certain number of characters. This limitation is usually between 1024 and 8192 characters and is tied to the maximum size request header line the Web server accepts. POST requests can effectively be any length, although the Web server might limit them to a reasonable threshold (or crash because of numeric overflow vulnerabilities).

  • GET requests are easier to create, as you can specify them via hyperlinks without having to create an HTML form. POST requests, on the other hand, require creating an HTML form or scripted events, which might have display characteristics that Web designers want to avoid.

  • GET requests are less secure because they are likely to be logged in Web proxy logs, browser histories, and Web server logs. Usually, security-sensitive information shouldn't be transmitted in GET requests because of this logging.

  • GET requests also expose application logic to end users by placing variables in the Web browser's address bar, which just tempts users to manipulate them.

  • The Referer request header tells the server the URI of the page the client just came from. So if the query string used to generate a page contains sensitive variables, and users click a link on that page that takes them to another server, those sensitive variables are transferred to the third-party server in the Referer header.
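The Referer exposure described in the last point is easy to demonstrate. In the Python sketch below, the Referer value and the sessionid parameter are hypothetical; the code shows what a third-party server could recover from the header it receives.

```python
from urllib.parse import urlsplit, parse_qs


def params_visible_in_referer(referer: str) -> dict:
    """Return the query parameters a receiving server can read from Referer."""
    return parse_qs(urlsplit(referer).query)


# Hypothetical Referer header value sent along with a request to another site.
referer = "http://test.com/webapp?sessionid=abc123&arg1=hi"
leaked = params_visible_in_referer(referer)
print(leaked)
```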

Auditing Tip

Generally, you should encourage developers to use POST-style requests for their applications because of the security concerns outlined previously. One issue to watch for is the transmission of a session token via a query string, as that creates a risk for the Web application's clients. The risk isn't necessarily a showstopper, but it's unnecessary and quite easy for a developer or Web designer to avoid.





The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities
ISBN: 0321444426
