Transmission Control Protocol

Transmission Control Protocol (TCP) is a transport-layer protocol that sits on top of IP. It's a mechanism for ensuring reliable and accurate delivery of data from one host to the other, based on the concept of connections. A connection is a bidirectional communication channel between an application on one host and an application on another host. Connections are established and closed by exchanging special TCP packets.

The endpoints see the TCP data traversing the connection as streams: ordered sequences of contiguous 8-bit bytes of data. The TCP stack is responsible for breaking this data up into packet-sized pieces, known as segments. It's also responsible for making sure the data is transferred successfully. The data sent by a TCP endpoint is acknowledged when it's received. If a TCP endpoint doesn't receive an acknowledgement for a chunk of data, it retransmits that data after a certain time interval.

TCP endpoints keep a sliding window of expected data, so they temporarily store segments that aren't the immediate next piece of data but closely follow the expected segment. This window allows TCP to handle out-of-order data segments and handle lost or corrupted segments more efficiently. TCP also uses checksums to ensure data integrity.

Auditing TCP code can be a daunting task, as the internals of TCP are quite complex. This section starts with the basic structure of TCP packet headers and the general design of the protocol, and then gives you a few examples that should illustrate where things can go wrong. The TCP header structure is shown in Figure 14-10.

Figure 14-10. TCP header

The following list describes the fields in more detail:

Source port (16 bits) This field indicates the TCP source port. It is used in conjunction with the destination port, source IP address, and destination IP address to uniquely identify a connection.
Destination port (16 bits) This field is the port the packet is destined for. It is used in conjunction with the source port, source IP address, and destination IP address to uniquely identify a connection.
Sequence number (32 bits) This field identifies where in the stream the data in this packet belongs: it holds the sequence number of the first byte of data in the segment. The sequence number is randomly seeded during connection establishment, and then incremented by the amount of data sent in each packet.
Acknowledgement number (32 bits) This field contains the sequence number the endpoint expects to receive from its peer. It's the sequence number of the last byte of data received from the remote host plus one. It indicates to the remote peer which data has been received successfully so that data lost en route is noticed and retransmitted.
Data offset (4 bits) This field indicates the size of the TCP header, measured in 32-bit words. Like IP, a TCP header can contain a series of options after the basic header, and so a similar header size field exists within the TCP header to account for these options. Its value is 5 (20 bytes) if there are no options specified.
Reserved (4 bits) This field is not used.
Flags (8 bits) Several flags can be set in TCP connections to indicate information about the TCP packet: whether it's high priority, whether to ignore certain fields in the TCP header, and whether the sender wants to change the connection state.
Window (16 bits) This field indicates the size of the window, which is an indicator of how many bytes the host accepts from its peer. It's resized dynamically as the buffer fills up and empties and is used for flow control. This size is specific to the connection that the TCP packet is associated with.
Checksum (16 bits) This field is a checksum of the TCP header and all data contained in the TCP segment. Several other fields are combined to calculate the checksum, including the source and destination IP addresses from the IP header.
Urgent pointer (16 bits) This field is used to indicate the location of urgent data, if any (discussed in "URG Pointer Processing").
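The following sketch makes the layout concrete by pulling the fixed 20-byte header apart from a raw buffer. It assumes the buffer starts at the TCP header and that multibyte fields are in network (big-endian) byte order. The struct tcp_fields type and the tcp_parse_fixed() function are hypothetical names, not part of any real stack. The sketch reads only the fixed part of the header and deliberately doesn't trust the data offset yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-header fields, converted to host byte order. */
struct tcp_fields {
    uint16_t src_port;
    uint16_t dst_port;
    uint32_t seq;
    uint32_t ack;
    uint8_t  data_offset;  /* header length in 32-bit words (high nibble of byte 12) */
    uint8_t  flags;        /* byte 13: CWR ECE URG ACK PSH RST SYN FIN */
    uint16_t window;
    uint16_t checksum;
    uint16_t urg_ptr;
};

/* Read big-endian (network order) 16- and 32-bit values. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, or -1 if the buffer can't hold the fixed header. */
int tcp_parse_fixed(const uint8_t *buf, size_t len, struct tcp_fields *f)
{
    if (len < 20)
        return -1;
    f->src_port    = rd16(buf + 0);
    f->dst_port    = rd16(buf + 2);
    f->seq         = rd32(buf + 4);
    f->ack         = rd32(buf + 8);
    f->data_offset = (uint8_t)(buf[12] >> 4);  /* low nibble is reserved */
    f->flags       = buf[13];
    f->window      = rd16(buf + 14);
    f->checksum    = rd16(buf + 16);
    f->urg_ptr     = rd16(buf + 18);
    return 0;
}
```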

Interested readers should familiarize themselves with TCP by reading RFC 793, as well as Stevens's discussion of TCP in TCP/IP Illustrated, Volume 1 (Addison-Wesley, 1994).

Basic TCP Header Validation

Naturally, every field in the TCP header has properties that have some relevance in terms of security. To start, a few basic attributes of the TCP packet, explained in the following sections, should be verified before the packet is processed further. Failure to do so adequately can lead to serious security consequences, with problems ranging from memory corruption to security policy violation.

Is the TCP Data Offset Field Too Large?

The TCP header contains a field indicating its length, which is known as the data offset field. As with IP header validation, this field has an invariant relationship with the packet size:

TCP header length <= data available
20 <= TCP header length <= 60

The TCP processing code must ensure that there's enough data in the packet to hold the header. Failure to do so could result in processing uninitialized memory and potentially even integer-related vulnerabilities, when calculations such as this are performed:

data_size = packet_size - tcp_header_size;

If the tcp_header_size variable hasn't been validated sufficiently, underflowing the data_size variable might be possible. This typically results in out-of-bounds memory accesses or even memory corruption later during processing, most likely when validating the checksum or dealing with TCP options.
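The following sketch shows why the unchecked subtraction is dangerous when the sizes are unsigned, as they usually are in packet-processing code. It assumes size_t operands. tcp_data_size_unchecked() and tcp_data_size_checked() are hypothetical helpers. A header length of 60 on a 40-byte packet doesn't produce -20. It wraps to a value near SIZE_MAX, which later code will happily use as a length.

```c
#include <stddef.h>

/* Unsafe: mirrors data_size = packet_size - tcp_header_size with no check.
   With unsigned operands the result wraps around instead of going negative. */
size_t tcp_data_size_unchecked(size_t packet_size, size_t tcp_header_size)
{
    return packet_size - tcp_header_size;
}

/* Safer: reject the packet when the header claims more bytes than exist.
   Returns 0 and stores the payload size, or -1 for a malformed packet. */
int tcp_data_size_checked(size_t packet_size, size_t tcp_header_size,
                          size_t *data_size)
{
    if (tcp_header_size > packet_size)
        return -1;
    *data_size = packet_size - tcp_header_size;
    return 0;
}
```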

Is the TCP Header Length Too Small?

The minimum size of a TCP header is 20 bytes, making certain values for the TCP data offset field too small. As with IP headers, if code analyzing TCP packets fails to ensure that the header length is at least 5 (again, it's multiplied by four to get the header's actual size in bytes), length calculations can result in integer underflows.
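Both invariants can be enforced in one place before any other header processing happens. The following sketch assumes tcp points at the first byte of the TCP header and avail is the number of bytes actually captured from that point on. tcp_header_length() is a hypothetical name. The upper bound of 60 needs no explicit test, because a 4-bit field can't exceed 15 words.

```c
#include <stddef.h>
#include <stdint.h>

/* Validate the data offset (high nibble of byte 12) against both
   invariants. Returns the header length in bytes, or -1 if invalid. */
int tcp_header_length(const uint8_t *tcp, size_t avail)
{
    if (avail < 20)                  /* can't even read the fixed header */
        return -1;
    int hlen = (tcp[12] >> 4) * 4;   /* data offset counts 32-bit words */
    if (hlen < 20)                   /* offset below 5: too small */
        return -1;
    if ((size_t)hlen > avail)        /* claims more bytes than we have */
        return -1;
    return hlen;                     /* at most 15 * 4 = 60 by construction */
}
```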

Is the TCP Checksum Correct?

The TCP stack must verify the checksum in the TCP header to ensure that the packet is valid. This check is particularly important for software that monitors network traffic. If an application is trying to determine how TCP packets are processed on an end host, it must be sure to validate the checksum. If it fails to do so, it can easily be desynchronized in its processing and become hopelessly confused. An attacker can send segments with deliberately bad checksums that the end host silently discards but the monitor accepts. This is a classic technique for evading intrusion detection systems (IDSs).
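For reference, here is a sketch of the standard Internet checksum. It covers the IPv4 pseudo-header (source address, destination address, a zero byte, protocol number 6, and the TCP length), followed by the TCP header and data. It assumes IPv4 addresses in host byte order and a segment buffer whose length has already been validated. All helper names are hypothetical. A receiver doesn't need to zero the checksum field first: summing an intact segment with its stored checksum in place yields zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Add bytes to a 32-bit accumulator as big-endian 16-bit words; an odd
   trailing byte is padded with zero on the right. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* Fold carries back in (ones'-complement addition) and complement. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum over the IPv4 pseudo-header plus the TCP segment. */
uint16_t tcp_checksum_v4(uint32_t src, uint32_t dst,
                         const uint8_t *seg, size_t seg_len)
{
    uint8_t pseudo[12] = {
        (uint8_t)(src >> 24), (uint8_t)(src >> 16), (uint8_t)(src >> 8), (uint8_t)src,
        (uint8_t)(dst >> 24), (uint8_t)(dst >> 16), (uint8_t)(dst >> 8), (uint8_t)dst,
        0, 6,                                      /* zero byte, protocol = TCP */
        (uint8_t)(seg_len >> 8), (uint8_t)seg_len  /* TCP length: header + data */
    };
    uint32_t sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, seg, seg_len);
    return csum_fold(sum);
}

/* An intact segment, summed with its stored checksum, folds to zero. */
int tcp_checksum_ok(uint32_t src, uint32_t dst,
                    const uint8_t *seg, size_t seg_len)
{
    return tcp_checksum_v4(src, dst, seg, seg_len) == 0;
}
```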

TCP Options Processing

TCP packets can contain a variable number of options after the basic header, just like IP packets. However, IP options are rarely used in practice, whereas TCP options are used extensively. TCP options are structured similarly to IP options. Apart from the single-byte end-of-option-list and no-operation options, they are composed of an option type byte, a length byte, and a variable-length data field. The structure is as follows:

struct tcp_option {
    unsigned char option;
    unsigned char optlen;
    char data[0];
};

When auditing code that processes TCP options, you can look for the same types of problems you did for IP options. The following sections briefly recap the potential issues from the discussion of IP options processing:

Is the Option Length Field Sign Extended?

Sign extension of the option length byte can be dangerous and lead to memory corruption or infinite loops. For example, two Polish researchers, Adam Osuchowski and Tomasz Dubinski, discovered a sign-extension vulnerability in TCP option processing. It was present in the iptables TCP option matching rule of the Netfilter implementation in the Linux 2.6 kernel (documented at www.netfilter.org/security/2004-06-30-2.6-tcpoption.html). The following is an excerpt of that code:

char opt[60 - sizeof(struct tcphdr)];
...
for (i = 0; i < optlen; ) {
    if (opt[i] == option) return !invert;
    if (opt[i] < 2) i++;
    else i += opt[i+1]?:1;
}

An integer promotion occurs when adding the option length (which is of type char) to the integer i. The option length is sign-extended, and a negative length decrements i rather than incrementing it in each iteration of the loop. A specially crafted packet can, therefore, cause this loop to continue executing indefinitely (incrementing i by a certain amount of bytes and then decrementing it by the same amount of bytes).
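The arithmetic behind this bug is easy to see in isolation. The sketch below assumes a platform where a length byte of 0xf0 read as signed char becomes -16, as plain char does on x86 Linux. The two functions are hypothetical. They mimic only the i += opt[i+1]?:1 step of the loop, written in portable C rather than with the GNU ?: extension. Reading the length through an unsigned char, as the second function does, is the usual fix.

```c
/* The loop's step with the length byte typed as signed char: a byte of
   0xf0 is promoted to int as -16, so i moves backward. */
int next_index_signed(int i, signed char optlen)
{
    return i + (optlen ? optlen : 1);
}

/* The same step with the length read as unsigned char: 0xf0 is 240, so
   i only ever moves forward and the loop must terminate. */
int next_index_unsigned(int i, unsigned char optlen)
{
    return i + (optlen ? optlen : 1);
}
```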

Are Enough Bytes Left for the Current Option?

As with IP options, certain TCP options are fixed length, and certain options are variable length. One potential attack is specifying a fixed-length option near the end of the option space so that the TCP/IP stack erroneously reads kernel memory past the end of the packet contents.

Is the Option Length Too Large or Too Small?

The option length has an invariant relationship with the size of the TCP header and the total size of the packet. The TCP stack must ensure that the option length, when added to the offset into the header where the option appears, isn't larger than the total size of the TCP header (and, of course, the total size of the packet).
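The three checks in this section can be folded into a single options walker. The following is a sketch, not code from any particular stack. It assumes opts points at the first byte after the 20-byte fixed header, and that optlen is the already-validated header length minus 20. tcp_find_option() and tcp_get_mss() are hypothetical names. The rules it enforces:

- End-of-list (kind 0) and no-operation (kind 1) are the only single-byte options.
- Every other option needs a length byte of at least 2 that fits in the remaining space.
- A fixed-length option such as maximum segment size (kind 2, length 4) must carry exactly its defined length.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL 0   /* end of option list */
#define TCPOPT_NOP 1   /* no-operation (padding) */
#define TCPOPT_MSS 2   /* maximum segment size, always 4 bytes long */

/* Returns the offset of the first option of the given kind, -1 if it is
   absent, or -2 if the option area is malformed. Lengths are read as
   unsigned bytes, so they can never be negative. */
int tcp_find_option(const uint8_t *opts, size_t optlen, uint8_t kind)
{
    size_t i = 0;
    while (i < optlen) {
        uint8_t k = opts[i];
        if (k == TCPOPT_EOL)
            return -1;
        if (k == TCPOPT_NOP) {
            i++;
            continue;
        }
        if (optlen - i < 2)        /* no room left for the length byte */
            return -2;
        uint8_t len = opts[i + 1];
        if (len < 2)               /* length covers the kind and length bytes */
            return -2;
        if (len > optlen - i)      /* option runs past the end of the header */
            return -2;
        if (k == kind)
            return (int)i;
        i += len;                  /* always advances by at least 2 */
    }
    return -1;
}

/* Reading a fixed-length option: insist on its defined length instead of
   assuming it, so no byte past the validated option is touched. */
int tcp_get_mss(const uint8_t *opts, size_t optlen, uint16_t *mss)
{
    int off = tcp_find_option(opts, optlen, TCPOPT_MSS);
    if (off < 0 || opts[off + 1] != 4)
        return -1;
    *mss = (uint16_t)((opts[off + 2] << 8) | opts[off + 3]);
    return 0;
}
```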

TCP Connections

Before two hosts can communicate over TCP, they must establish a connection. TCP connections are uniquely defined by source IP address, destination IP address, TCP source port, and TCP destination port.

For example, a connection from a Web browser on your desktop to Slashdot's Web server would have a source IP of something like 24.1.20.30 and a high, ephemeral source port such as 46023. It would have a destination IP address of 66.35.250.151 and a destination port of 80, the well-known port for HTTP. There can be only one TCP connection with those ports and IP addresses at any one time. If you connected to the same Web server with another browser simultaneously, the second connection would be distinguished from the first by having a different source port.
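A connection lookup therefore has to compare all four values. The following sketch shows a hypothetical key type, with values kept in host byte order for readability. Real stacks typically hash the 4-tuple into a table, which this sketch omits. The reverse helper reflects the fact that the server sees the same connection with source and destination swapped.

```c
#include <stdint.h>

/* The four values that identify a TCP connection. */
struct tcp_conn_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Two segments belong to the same connection only if all four match. */
int tcp_conn_key_equal(const struct tcp_conn_key *a,
                       const struct tcp_conn_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* The same connection as seen from the other endpoint. */
struct tcp_conn_key tcp_conn_key_reverse(struct tcp_conn_key k)
{
    struct tcp_conn_key r = { k.dst_ip, k.src_ip, k.dst_port, k.src_port };
    return r;
}
```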

States

Each endpoint maintains several pieces of information about each connection it's tracking, which it stores in a data structure known as the transmission control block (TCB). One of the most important pieces of information is the overall connection state. A TCP connection has 11 possible states:

LISTEN When a process running on an end host wants to receive incoming TCP connections, it creates a new connection and binds it to a particular port. While the server waits for incoming TCP connections, that connection is in the LISTEN state.
SYN_SENT A client enters this state when it has sent an initial SYN packet to a server requesting a connection.
SYN_RCVD A server enters this state when it has received an initial SYN packet from a client wanting to connect.
ESTABLISHED Clients and servers both enter this state after the initial TCP handshake has been completed and remain in this state until the connection is torn down.
FIN_WAIT_1 A host enters this state if it's in an ESTABLISHED state and closes its side of the connection by sending a FIN packet.
FIN_WAIT_2 A host enters this state if it's in FIN_WAIT_1 and receives an ACK of its FIN from the participating host, but not a FIN packet.
CLOSING A host enters this state if it's in FIN_WAIT_1 and receives a FIN packet from the participating host.
TIME_WAIT A host enters this state if it receives a FIN packet from the participating host while in FIN_WAIT_2 state, or an ACK packet while in CLOSING state.
CLOSE_WAIT A host enters this state if it's in ESTABLISHED state and receives a FIN packet from the participating host.
LAST_ACK A host enters this state when, while in CLOSE_WAIT state, it sends a FIN packet to the participating host.
CLOSED A host enters this state if it's in LAST_ACK state and receives an ACK, or after a timeout occurs while it is in TIME_WAIT state (the timeout period is twice the maximum segment lifetime, or MSL, of a TCP packet). This state is a theoretical one; when a host enters CLOSED state, an implementation cleans up the connection and removes it from the active connection structures it maintains.

These states are explained in more detail in RFC 793 (www.ietf.org/rfc/rfc0793.txt?number=793).

State transitions generally occur when TCP packets are received that have certain flags set or when the local application dealing with the connection forces a change (such as closing the connection). If the application layer initiates a state change, the TCP/IP stack typically notifies the other endpoint of the state change.

Flags

RFC 793 defines six TCP flags that are used to convey information from one host to the other:

SYN The synchronize flag is used exclusively for connection establishment. Each side of a connection must set this flag in the first packet it sends.
ACK The acknowledge flag indicates that this packet acknowledges data (or a SYN or FIN) received from the other host participating in the connection. If this flag is set, the acknowledgement number in the TCP header is significant and needs to be verified or processed.
RST The reset flag indicates some sort of unrecoverable problem has occurred in a connection, and the connection should be abandoned.
URG The urgent flag indicates urgent data to be processed (discussed in more detail in "URG Pointer Processing" later in this chapter).
FIN The FIN flag indicates that the issuer wants to close the connection.
PSH The push flag asks the receiving stack to deliver the data in this packet, along with any data already buffered, to the application as quickly as possible rather than waiting for more to arrive. This flag is largely ignored in modern implementations.

Of the six flags, three are used to cause state changes (SYN, RST, and FIN) and appear only when establishing or tearing down a connection. (RST can occur at any time, but the result is an immediate termination of the connection.)
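On the wire, the flags occupy byte 13 of the TCP header. The following sketch assumes the flag byte has already been extracted. It shows the bit values and how a stack might test for the state-changing flags. The constant and function names are hypothetical. The two high bits are the ECN flags (ECE and CWR) added later by RFC 3168, which account for the 8-bit flags field in the header description. The SYN-with-FIN-or-RST check is one possible sanity policy, not something RFC 793 requires.

```c
#include <stdint.h>

/* Bit values of the flags in byte 13 of the TCP header. */
enum {
    TCPFL_FIN = 0x01, TCPFL_SYN = 0x02, TCPFL_RST = 0x04, TCPFL_PSH = 0x08,
    TCPFL_ACK = 0x10, TCPFL_URG = 0x20,
    TCPFL_ECE = 0x40, TCPFL_CWR = 0x80   /* ECN flags (RFC 3168) */
};

/* Nonzero if the segment carries a flag that drives state changes. */
int tcp_flags_change_state(uint8_t flags)
{
    return (flags & (TCPFL_SYN | TCPFL_RST | TCPFL_FIN)) != 0;
}

/* Nonzero if the acknowledgement number must be checked. */
int tcp_ack_significant(uint8_t flags)
{
    return (flags & TCPFL_ACK) != 0;
}

/* Hypothetical policy: SYN combined with FIN or RST requests
   contradictory state changes and is worth flagging. */
int tcp_flags_contradictory(uint8_t flags)
{
    return (flags & TCPFL_SYN) && (flags & (TCPFL_FIN | TCPFL_RST));
}
```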

Establishing a Connection

Establishing a connection is a three-part process, commonly referred to as the three-way handshake. An integral part of the three-way handshake is exchanging initial sequence numbers, covered in "TCP Spoofing" later in this chapter. For now, just focus on the state transitions. Table 14-4 describes the process of setting up a connection and summarizes the states the connection goes through.

Table 14-4. Connection Establishment
Action	Client State	Server State
The server listens on a port for a new connection.	N/A	`LISTEN`
The client sends a SYN packet to the server's open port.	`SYN_SENT`	`LISTEN`
The server receives the packet and enters the `SYN_RCVD` state.	`SYN_SENT`	`SYN_RCVD`
The server transmits a SYN-ACK packet, acknowledging the client's SYN and providing a SYN of its own.	`SYN_SENT`	`SYN_RCVD`
The client receives the SYN-ACK and transmits an ACK packet, acknowledging the server's SYN.	`ESTABLISHED`	`SYN_RCVD`
The server receives the ACK packet, and the connection is fully established.	`ESTABLISHED`	`ESTABLISHED`

Closing a Connection

Connections are bidirectional, and either direction of traffic can be shut down independently. Normally, connections are shut down by the exchange of FIN packets. Table 14-5 describes the process.

Table 14-5. Connection Close
Action	Client State	Server State
The client sends a FIN-ACK packet, indicating it wants to close its half of the connection. The client enters the `FIN_WAIT_1` state.	`FIN_WAIT_1`	`ESTABLISHED`
The server receives the packet and acknowledges it.	`FIN_WAIT_1`	`CLOSE_WAIT`
The client receives the acknowledgement of its FIN.	`FIN_WAIT_2`	`CLOSE_WAIT`
The server now elects to close its side of the connection and sends a FIN packet.	`FIN_WAIT_2`	`LAST_ACK`
The client receives the server's FIN and acknowledges it.	`TIME_WAIT`	`LAST_ACK`
The server receives the acknowledgement.	`TIME_WAIT`	`CLOSED`
The client tears down the TCB after waiting enough time for the server to receive the acknowledgement.	`CLOSED`	N/A
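The two tables trace a single path through the state machine. The following sketch encodes those transitions as a transition function, plus the simultaneous-close path through CLOSING from the state list. It is a simplification under stated assumptions. The event names are hypothetical labels for "a segment with these flags arrived" or "the application asked for this." It ignores RSTs, sequence number checks, simultaneous open, and every timer except TIME_WAIT expiry.

```c
/* The 11 states from the list above. */
enum tcp_state {
    TCPS_CLOSED, TCPS_LISTEN, TCPS_SYN_SENT, TCPS_SYN_RCVD,
    TCPS_ESTABLISHED, TCPS_FIN_WAIT_1, TCPS_FIN_WAIT_2, TCPS_CLOSING,
    TCPS_TIME_WAIT, TCPS_CLOSE_WAIT, TCPS_LAST_ACK
};

/* Hypothetical event labels: application requests and received segments. */
enum tcp_event {
    EV_APP_LISTEN,   /* passive open */
    EV_APP_CONNECT,  /* active open: send SYN */
    EV_APP_CLOSE,    /* application closes its side: send FIN */
    EV_RCV_SYN,      /* SYN received (reply with SYN-ACK) */
    EV_RCV_SYN_ACK,  /* SYN-ACK received (reply with ACK) */
    EV_RCV_ACK,      /* ACK of our SYN or FIN received */
    EV_RCV_FIN,      /* FIN received (reply with ACK) */
    EV_TIMEOUT       /* 2*MSL timer expired in TIME_WAIT */
};

enum tcp_state tcp_next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case TCPS_CLOSED:
        if (e == EV_APP_LISTEN)  return TCPS_LISTEN;
        if (e == EV_APP_CONNECT) return TCPS_SYN_SENT;
        break;
    case TCPS_LISTEN:
        if (e == EV_RCV_SYN)     return TCPS_SYN_RCVD;
        break;
    case TCPS_SYN_SENT:
        if (e == EV_RCV_SYN_ACK) return TCPS_ESTABLISHED;
        break;
    case TCPS_SYN_RCVD:
        if (e == EV_RCV_ACK)     return TCPS_ESTABLISHED;
        break;
    case TCPS_ESTABLISHED:
        if (e == EV_APP_CLOSE)   return TCPS_FIN_WAIT_1;
        if (e == EV_RCV_FIN)     return TCPS_CLOSE_WAIT;
        break;
    case TCPS_FIN_WAIT_1:
        if (e == EV_RCV_ACK)     return TCPS_FIN_WAIT_2;
        if (e == EV_RCV_FIN)     return TCPS_CLOSING;
        break;
    case TCPS_FIN_WAIT_2:
        if (e == EV_RCV_FIN)     return TCPS_TIME_WAIT;
        break;
    case TCPS_CLOSING:
        if (e == EV_RCV_ACK)     return TCPS_TIME_WAIT;
        break;
    case TCPS_TIME_WAIT:
        if (e == EV_TIMEOUT)     return TCPS_CLOSED;
        break;
    case TCPS_CLOSE_WAIT:
        if (e == EV_APP_CLOSE)   return TCPS_LAST_ACK;
        break;
    case TCPS_LAST_ACK:
        if (e == EV_RCV_ACK)     return TCPS_CLOSED;
        break;
    }
    return s;   /* events not modelled here leave the state unchanged */
}
```

Replaying Tables 14-4 and 14-5 against this function, one event per row, reproduces the client and server states shown there.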

Note that connection termination isn't always this straightforward. If one host sends a packet with the FIN flag set, it's indicating a termination of the sending channel of the established TCP stream, but the host's receiving channel remains open. After receiving a FIN, a host can still send more data across the connection before sending a FIN packet of its own.

Resetting a Connection

Resetting a connection occurs when some sort of unrecoverable error has occurred during the course of connection establishment or data exchange. Resetting the connection simply involves a host sending a packet with the RST flag set. RSTs are used mainly in these situations:

Someone sends a SYN to establish a connection with a server, but the server port isn't open (that is, no server is listening on the specified port).
A TCP packet arrives at a host without the SYN flag set, and no valid connection can be found to deliver this packet to.

TCP Streams

TCP is a stream-oriented protocol, meaning that data is treated as an uninterrupted stream (as opposed to a record-based protocol, such as UDP). Streams are tracked internally by using sequence numbers, with each sequence number corresponding to one byte of data. The TCP header has two sequence number fields: sequence number and acknowledgement number. The sequence number indicates where in the data stream the data in the packet belongs. The acknowledgement number indicates how much of the remote stream has been received successfully and accounted for. This field is updated every time the host sees new data from the remote host. If some data is lost during transmission, the acknowledgement number isn't updated. Eventually, the peer notices it hasn't received an acknowledgement on the data it sent and retransmits the missing data.

Each TCP endpoint maintains a sliding window, which determines which sequence numbers it allows from its peer. This window mechanism allows data to be saved when it's delivered out of order or if certain segments are corrupted or dropped. It also determines how much data the host accepts before having a chance to pass the data up to the application layer. For example, say a host is expecting the next sequence number to be 0x10000. If the host has a window of 0x1000, it accepts segments between 0x10000 and 0x11000. "Future" data is saved and used as soon as holes are filled when the missing data is received.
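Because sequence numbers wrap around at 2^32, the window test has to use modular arithmetic rather than plain comparisons. For example, a window starting at 0xfffff000 legitimately contains sequence number 0x10. The sketch below checks a single sequence number against a window that starts at rcv_nxt and spans rcv_wnd bytes, assuming rcv_wnd is well below 2^31. The function name is hypothetical. The sketch doesn't implement the full RFC 793 acceptability test, which also considers the segment's length.

```c
#include <stdint.h>

/* Is seq inside [rcv_nxt, rcv_nxt + rcv_wnd) in 32-bit sequence space?
   The unsigned subtraction wraps, giving the forward distance from
   rcv_nxt to seq even across the 0xffffffff -> 0 boundary. */
int seq_in_window(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    uint32_t offset = seq - rcv_nxt;
    return offset < rcv_wnd;
}
```

With the values from the example, seq_in_window(0x10fff, 0x10000, 0x1000) accepts the segment, while 0x11000 and 0xffff fall outside the window.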

Both sequence numbers are seeded randomly at the beginning of a new connection and then exchanged in the three-way handshake. The starting sequence number is called the initial sequence number (ISN). Here's a brief example of a three-way handshake and a simple data exchange. First, the client picks a random initial sequence number and sends it to the server. Figure 14-11 shows that the client has picked 0xabcd.

Figure 14-11. Transmit 1

The server also picks a random initial sequence number, 0x4567, which it sets in the SYN-ACK packet. The SYN-ACK packet acknowledges the ISN sent by the client by setting 0xabce in the acknowledgment number field. If you recall, that field indicates the sequence number of the next expected byte of data. SYN and SYN-ACK packets consume one sequence number, so the next data you're expecting to receive should begin at sequence number 0xabce (see Figure 14-12).

Figure 14-12. Receive 1

The client completes the handshake by acknowledging the server's ISN. Note that the sequence number has been incremented by one to 0xabce because the SYN packet consumed the sequence number 0xabcd. Likewise, the client in this connection indicates that the next sequence number it expects to receive from the server is 0x4568 because 0x4567 was used by the SYN-ACK packet (see Figure 14-13).

Figure 14-13. Transmit 2

Now the client wants to send two bytes of data, the characters HI. The sequence number is the same, as the client hasn't sent any data yet. The acknowledgement number is also the same because no data has been received yet (see Figure 14-14).

Figure 14-14. Transmit 3

The server wants to acknowledge receipt of the data and transmit two bytes of data: the characters OK. So the sequence number for the server is 0x4568, as you expect, and the acknowledgement number is now set to 0xabd0. This number is used because sequence number 0xabce is the character H and sequence number 0xabcf is the character I (see Figure 14-15).

Figure 14-15. Receive 2

The client doesn't have any new data to send, but it wants to acknowledge receipt of the OK data (see Figure 14-16).

Figure 14-16. Transmit 4
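The arithmetic in this exchange follows one rule. The acknowledgement number is the first sequence number the peer's segment didn't cover, where each data byte consumes one sequence number, and so do the SYN and the FIN. The sketch below encodes that rule. struct tcp_seg_info and ack_for() are hypothetical names, and the test replays the values from the figures.

```c
#include <stdint.h>

/* What a segment carries and consumes in sequence space. */
struct tcp_seg_info {
    uint32_t seq;       /* sequence number in the header */
    uint32_t data_len;  /* bytes of payload */
    int syn;            /* SYN set? consumes one sequence number */
    int fin;            /* FIN set? consumes one sequence number */
};

/* Acknowledgement number a peer returns after receiving s in order:
   one past the last sequence number s occupied, modulo 2^32. */
uint32_t ack_for(const struct tcp_seg_info *s)
{
    return s->seq + s->data_len + (s->syn ? 1u : 0u) + (s->fin ? 1u : 0u);
}
```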

TCP Spoofing

Sending TCP packets with arbitrary source addresses and content is fairly straightforward; it typically takes only a few lines of C code with a library such as libdnet or libnet. There are a few reasons attackers would want to send these types of TCP packets:

Attackers might want to fabricate a new connection purporting to be from one host to another. Plenty of software has access control policies based on the source IP address. The canonical example is something like rsh, which can be configured to honor trust relationships between hosts based on the source IP address.
If attackers know about a connection that's underway, they might want to insert data into that connection. For example, they could insert malicious shell commands into a victim's TELNET session after the victim has logged in. Another attack is modifying a file as a user downloads it to insert Trojan code.
Attackers might want to terminate an ongoing connection, which can be useful in attacking distributed systems and performing various denial-of-service attacks.

TCP's main line of defense against these attacks is verifying sequence numbers of incoming packets. The following sections examine these attacks in more detail and how sequence numbers come into play in each scenario.

Connection Fabrication

Say you want to spoof an entire TCP connection from one host to another. You know there's a trust relationship between two servers running the remote shell service. If you can spoof an rsh connection from one server to the other, you can issue commands and take over the target machine. First, you would spoof a SYN packet from server A to server B. You can pick a sequence number out of thin air as your initial sequence number (see Figure 14-17).

Figure 14-17. Transmit 1

Server B is going to respond to server A with a SYN-ACK containing a randomly chosen initial sequence number represented by BBBB in Figure 14-18.

Figure 14-18. Receive 1

To complete the three-way handshake and initialize the connection, you need to spoof a third acknowledgement packet (see Figure 14-19).

Figure 14-19. Transmit 2

The first major obstacle is that you need to see the SYN-ACK packet going from server B to server A to observe the sequence number server B chose. Without that sequence number, you can't acknowledge the SYN-ACK packet and complete the three-way handshake.

Naturally, if you're on the same network so that you can sniff server B's packets, you won't have any problems learning the correct sequence number. If you aren't on the same network, and you can't hack the routing infrastructure to see the packet, you need to guess! This method is called blind connection spoofing (described in the next section).

The second obstacle to this attack is that the SYN-ACK packet can potentially reach server A, and server A isn't expecting it. Server A likely generates a RST in response to the SYN-ACK, which messes up your spoofed connection. There are a few ways to work around this problem, so consider it a nonissue for the purposes of this discussion.

Blind Connection Spoofing

If attackers can't see the SYN-ACK packet the victim server generates, they have to guess the initial sequence number the victim server chose. Historically, guessing was quite simple, as many operating systems used simple incremental algorithms to choose their ISNs.

A common practice was to keep a global ISN variable and increment it by a fixed value with every new connection. To exploit this practice, attackers could connect to the victim server and observe its choice of ISN. With some simple math, they could calculate the next ISN to be used, perform the spoofing attack, and know the correct acknowledgement number to spoof.
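The following sketch shows why a fixed increment is fatal. The generator is hypothetical, with an arbitrary starting value and step. It models a global counter bumped once per connection. Two back-to-back probe connections reveal the step, and the next ISN follows directly. In practice, other connections to the host can advance the counter between probes, so an attacker might need to try a small range of candidates, but the search space stays tiny.

```c
#include <stdint.h>

/* Hypothetical legacy generator: a global counter advanced by a fixed
   step for every new connection. Start value and step are arbitrary. */
#define ISN_STEP 64000u
static uint32_t g_isn = 0x1234abcdu;

uint32_t legacy_next_isn(void)
{
    g_isn += ISN_STEP;   /* wraps modulo 2^32 */
    return g_isn;
}

/* Attacker side: infer the step from two consecutive observed ISNs and
   predict the ISN the host assigns to the next connection. */
uint32_t predict_next_isn(uint32_t first, uint32_t second)
{
    uint32_t step = second - first;   /* modular difference */
    return second + step;
}
```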

Most operating systems moved to randomly generated ISNs to mitigate the threat of blind TCP spoofing. The security of much of TCP depends on the unpredictability of the ISN, so it's important that an implementation's ISN generation code really does produce unpredictable sequence numbers. Straightforward linear congruential pseudo-random number generators (PRNGs) don't cut it, as an attacker can sample several ISNs to reconstruct the internal state of the random number algorithm.

Back in 2000, Pascal Bouchareine of the Hacker Emergency Response Team (HERT) published an advisory about FreeBSD's ISN generation, which used the kernel random() function, a linear congruential PRNG. After sampling four ISNs, an attacker can reconstruct the PRNG's internal state and generate the same sequence numbers as the target host.
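To see why a linear congruential generator is a poor ISN source, consider the sketch below. It does not reproduce FreeBSD's random(). It uses an illustrative LCG with constants borrowed from the sample rand() implementation in the C standard, and it returns its full 31-bit state as the ISN. In that worst case, a single observed ISN is enough to predict every later one. Generators that reveal only part of their state need more samples to reconstruct, but the principle is the same.

```c
#include <stdint.h>

/* Illustrative LCG (not FreeBSD's): x' = (1103515245 * x + 12345) mod 2^31.
   Its whole internal state is the last value it returned. */
static uint32_t lcg_state = 0x2545f491u & 0x7fffffffu;

static uint32_t lcg_step(uint32_t x)
{
    return (1103515245u * x + 12345u) & 0x7fffffffu;   /* mod 2^31 */
}

uint32_t lcg_isn(void)
{
    lcg_state = lcg_step(lcg_state);
    return lcg_state;
}

/* Because output == state, the attacker just runs the same recurrence. */
uint32_t lcg_predict(uint32_t observed)
{
    return lcg_step(observed);
}
```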

An Attack on Randomness

There have been a couple of interesting discoveries related to the randomness of TCP sequence-numbering algorithms. Of particular note is a research paper made available by Michal Zalewski at www.bindview.com/Support/RAZOR/Papers/2001/tcpseq.cfm, which discusses the relative strengths of random number algorithms some contemporary operating systems use. Although the versions tested are somewhat dated, the paper gives you a good idea of how operating systems measure up against each other. (Additionally, even though some versions aren't so current, a lot of the ISN algorithms probably haven't changed a great deal.) The paper goes on to discuss PRNG strengths in other network components (such as DNS IDs and session cookies).

ISN Vulnerability

Stealth and S. Krahmer, members of a hacker group named TESO, discovered a subtle blind spoofing bug in the 2.2 branch of the Linux kernel. The following code was used to generate a random ISN:

__u32 secure_tcp_sequence_number(__u32 saddr, __u32 daddr,
                 __u16 sport, __u16 dport)
{
    static __u32    rekey_time = 0;
    static __u32    count = 0;
    static __u32    secret[12];
    struct timeval     tv;
    __u32        seq;

    /*
     * Pick a random secret every REKEY_INTERVAL seconds.
     */
    do_gettimeofday(&tv);    /* We need the usecs below... */

    if (!rekey_time || (tv.tv_sec - rekey_time)
        > REKEY_INTERVAL) {
        rekey_time = tv.tv_sec;
        /* First three words are overwritten below. */
        get_random_bytes(&secret+3, sizeof(secret)-12);
        count = (tv.tv_sec/REKEY_INTERVAL) << HASH_BITS;
    }

    secret[0]=saddr;
    secret[1]=daddr;
    secret[2]=(sport << 16) + dport;

    seq = (halfMD4Transform(secret+8, secret) &
           ((1<<HASH_BITS)-1)) + count;

    seq += tv.tv_usec + tv.tv_sec*1000000;
    return seq;
}

In the call to get_random_bytes(), the intent is to write random data over the last nine elements (36 bytes) of the secret array. However, the code actually writes the data at the wrong place in memory. Because secret is declared static, it starts out zero-filled, so the majority of the secret key is left always containing the value zero! This happens because the expression &secret is a pointer to an array with 12 elements. From the discussion on pointer arithmetic in Chapter 6, remember that an integer added to a pointer type is multiplied by the size of the base data type, so &secret+3 is the address 36 elements past the start of secret. The author intended to use &secret[3], which correctly addresses the element at index 3 (the fourth element) of the secret array.
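The difference between the two expressions comes down to the type that pointer arithmetic scales by. The sketch below computes the step sizes from sizeof, so it never forms the out-of-bounds pointer, which would itself be undefined behavior. It also shows the corrected refill. memcpy() stands in for the kernel's get_random_bytes(). secret_t, refill_secret(), and the other names are hypothetical. Note that refill_secret() uses sizeof(secret_t): the array parameter decays to a pointer there, so sizeof(secret) would give the pointer size instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t secret_t[12];   /* same shape as the kernel's secret[12] */

/* &secret has type uint32_t (*)[12], so "+ 3" advances by three whole
   arrays: 3 * 48 = 144 bytes, or 36 elements. */
size_t step_bytes_array_ptr(void)
{
    return 3 * sizeof(secret_t);
}

/* &secret[3] (equivalently secret + 3) has type uint32_t *, so it
   advances by three elements: 3 * 4 = 12 bytes. */
size_t step_bytes_elem_ptr(void)
{
    return 3 * sizeof(uint32_t);
}

/* Corrected refill: random bytes go into elements 3..11 (36 bytes),
   leaving elements 0..2 for the addresses and ports. */
void refill_secret(secret_t secret, const uint8_t rnd[36])
{
    memcpy(&secret[3], rnd, sizeof(secret_t) - 3 * sizeof(uint32_t));
}
```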

The impact of this oversight was that the sequence numbers were very close to each other if the source IP address was the only variable, allowing the TESO researchers to craft an ISN-guessing attack.

Auditing Tip

Examine the TCP sequence number algorithm to see how unpredictable it is. Make sure some sort of cryptographic random number generator is used. Try to determine whether any part of the key space can be guessed deductively, which limits the range of possible correct sequence numbers. Random numbers based on system state (such as system time) might not be secure, as this information could be procured from a remote source in a number of ways.

Connection Tampering

If attackers want to spoof TCP packets to manipulate existing connections, they need to provide a sequence number that's within the currently accepted window. If attackers are located on the network and can sniff packets belonging to the connection they are trying to manipulate, finding this number is obviously quite simple. From this position, attackers can easily inject data or tear down a connection. In more subtle attacks, they could hijack and resynchronize an existing TCP connection.

However, if attackers can't see the packets belonging to the target connection, finding the sequence number is again more difficult. They need to guess a sequence number that's within the currently accepted window to have their spoofed TCP packets honored.

Blind Reset Attacks

In certain situations, attackers might want to remotely terminate a connection between two hosts on outside networks. Certain protocols and applications can fall into behavior that's not secure or could be exploited if their TCP connections are torn out from under them. For example, there have been attacks against Internet Relay Chat (IRC) based on temporarily severing links between distributed servers to steal privileges to chat channels. Kids' games aside, a researcher named Paul Watson published an attack with a bit more gravity. The main point of his presentation was that maliciously resetting Border Gateway Protocol (BGP) TCP connections can lead to considerable disruption of routing between ISPs (archives of the presentation are available at www.packetstormsecurity.org/papers/protocols/SlippingInTheWindow_v1.0.doc).

Attackers attempting to spoof a RST packet have a few things working in their favor. First, the RST packet just needs to be in the current window to be honored, which reduces the search for sequence numbers. Second, the RST packet is processed immediately if it's anywhere within the window, which removes any potential issues with stream reassembly or having to wait for a sequence number to be reached.

Attackers need to know the source IP, source port, destination IP, destination port, and a sequence number within the window, and that's about it. If the connection used a window size of 16KB, an attacker needs to send about 262,144 packets (2^32 / 2^14) to cover the entire sequence number space. Paul Watson was able to terminate connections by brute-forcing the sequence number at T1 speeds in roughly 10 seconds.
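The packet count is the size of the sequence number space divided by the window size, rounded up. Sending one spoofed packet per window-sized slot guarantees that one lands inside the window. The sketch below computes that worst case, assuming the 4-tuple is already known. The function name is hypothetical. On average, an attacker succeeds after about half as many packets.

```c
#include <stdint.h>

/* Worst-case number of spoofed packets needed to hit a receive window of
   the given size: ceil(2^32 / window). A window of 0 isn't modelled and
   returns 0. */
uint64_t blind_rst_packets(uint32_t window)
{
    const uint64_t space = (uint64_t)1 << 32;   /* all sequence numbers */
    if (window == 0)
        return 0;
    return (space + window - 1) / window;       /* ceiling division */
}
```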

It's worth noting that many old operating systems, especially older UNIX systems, don't even check that the sequence number in the RST packet is within the window, making reset attacks extremely easy. In addition, the reset-inducing packet can be a SYN instead of a RST, as a SYN in the window causes an existing connection to be reset.

Blind Data Injection Attacks

A blind data injection attack is a slight superset of the blind reset attack. The attacker needs to provide an acknowledgement number as well as a sequence number. However, the verification of acknowledgement numbers is lax enough that only two guesses are usually needed for each sequence number trial.

The full details of this attack and the blind reset attacks are outlined in the excellent draft IETF document Improving TCP's Robustness to Blind In-Window Attacks by R. Stewart and M. Dalal (www.ietf.org/internet-drafts/draft-ietf-tcpm-tcpsecure-05.txt).

TCP Segment Fragmentation Spoofing

Michal Zalewski pointed out an interesting potential blind spoofing attack in a post to the full-disclosure mailing list (archived at archives.neohapsis.com/archives/fulldisclosure/2003-q4/3488.html). If attackers know that a TCP segment is fragmented as it traverses from one endpoint to another, they can spoof an IP fragment for the data section of the packet. This spoofing allows them to inject data into the TCP connection without having to guess a valid sequence number. Attackers need to come up with a mechanism to fix the TCP checksum, but that should prove well within the realm of possibility.

TCP Processing

So far, you've examined a few security issues in TCP code. The following sections describe some interesting corner cases and nuances in TCP processing to give you ideas where to look for potential vulnerabilities.

TCP State Processing

TCP stacks implement a complicated state machine that's highly malleable by outside actors. Studying this code can reveal subtle behaviors that might be useful to attackers. For example, operating systems have different reactions to unusual combinations of TCP flags. These reactions can lead to security-critical behaviors, which you examine in Chapter 15's discussion of firewalls and SYN-FIN packets. You can also find many corner cases in TCP processing. For example, some operating systems allow data in the initial SYN packet, and some allow data segments without the ACK flag set. The following section has an example of a vulnerability that shows the kind of creativity you should apply to your inspection of TCP code.

Linux Blind Spoofing Vulnerability

Noted researcher Anthony Osborne discovered a subtle and fascinating bug in the Linux TCP stack related to connection state tracking (documented at www.ciac.org/ciac/bulletins/j-035.shtml). There were actually three vulnerabilities that he was able to weave into an attack for blindly spoofing TCP traffic from an arbitrary source. To follow this vulnerability, take a look at a simplified version of the tcp_rcv() function in the Linux kernel.

int tcp_rcv()
{
    ...
    if(sk->state!=TCP_ESTABLISHED)
    {
        if(sk->state==TCP_LISTEN)
        {
            seq = secure_tcp_sequence_number(saddr, daddr,
                             skb->h.th->dest,
                             skb->h.th->source);
            tcp_conn_request(sk, skb, daddr, saddr, opt,
                dev, seq);
            return 0;
        }

        ... /* various other processing */
    }

    /*
     *    We are now in normal data flow (see the step list
     *    in the RFC) Note most of these are inline now.
     *    I'll inline the lot when I have time to test it
     *    hard and look at what gcc outputs
     */

    if (!tcp_sequence(sk, skb->seq, skb->end_seq-th->syn))
        die(); /* bad tcp sequence number */

    if(th->rst)
        return tcp_reset(sk,skb);

    if(th->ack && !tcp_ack(sk,th,skb->ack_seq,len))
        die(); /* bad tcp acknowledgement number */

    /* Process the encapsulated data */

    if(tcp_data(skb,sk, saddr, len))
        kfree_skb(skb, FREE_READ);
}

Suppose the incoming packet is associated with a socket that isn't in the TCP_ESTABLISHED state. In that case, tcp_rcv() performs a variety of processing related to connection initiation and teardown. What's important to note is that after this processing, the code can fall through to the normal data-processing code in certain situations. This is usually innocuous, as control packets such as SYN and RST don't normally contain data.

Looking at the preceding code, you can see that any data in the initial SYN packet isn't processed: the server is in the TCP_LISTEN state, so the function returns early. However, after the SYN is received and the server is in the SYN_RCVD state, the code falls through and processes data in incoming packets. So data in packets sent after the initial SYN, but before the three-way handshake is completed, is actually queued for delivery to the userland application.

The attack Osborne conceived was to spoof packets from a trusted peer and provide data before completion of the three-way handshake. Attackers would first send a normal SYN packet, spoofed from a trusted peer (see Figure 14-20).

Figure 14-20. Transmit 1

Upon receipt of the SYN packet, the server enters the SYN_RCVD state and sends the SYN-ACK packet to the purported source of the SYN. Attackers can't see this packet, but as long as they act quickly enough, their attack isn't hindered.

At this point, they know which sequence numbers are valid in the window for data destined for the victim host, but they don't know what the acknowledgement sequence number should be because they didn't see the SYN-ACK packet. However, look closely at the previous code from tcp_rcv(). The second nuance that Osborne leveraged is that if the ACK flag isn't set in the TCP packet, the Linux TCP stack doesn't check the acknowledgement sequence number for validity before queuing the data! So attackers simply send some data in a packet with a valid sequence number but with no TCP flags set (see Figure 14-21).

Figure 14-21. Transmit 2

Now attackers have data queued in the victim machine's kernel, ready to be delivered to the userland rlogind process as soon as the three-way handshake is completed. Normally, this handshake can't be completed without knowing or guessing the correct acknowledgement number, but Osborne discovered a third vulnerability that lets attackers deliver the death blow. Usually, the userland process doesn't return from the call to accept() unless the handshake is completed. The following code shows the logic for this in tcp.c:

static struct sk_buff *tcp_find_established(struct sock *s)
{
    struct sk_buff *p=skb_peek(&s->receive_queue);
    if(p==NULL)
        return NULL;
    do
    {
        if(p->sk->state == TCP_ESTABLISHED ||
            p->sk->state >= TCP_FIN_WAIT1)
            return p;
        p=p->next;
    }
    while(p!=(struct sk_buff *)&s->receive_queue);
    return NULL;
}

Note that the kernel treats states greater than or equal to TCP_FIN_WAIT1 as being equivalent to ESTABLISHED. The following code handles packets with the FIN bit set:

static int tcp_fin(struct sk_buff *skb, struct sock *sk, struct tcphdr *th)
{
    ...
    switch(sk->state)
    {
        case TCP_SYN_RECV:
        case TCP_SYN_SENT:
        case TCP_ESTABLISHED:
            /*
             * move to CLOSE_WAIT, tcp_data() already handled
             * sending the ack.
             */
            tcp_set_state(sk,TCP_CLOSE_WAIT);
            if (th->rst)
                sk->shutdown = SHUTDOWN_MASK;
            break;

TCP_CLOSE_WAIT is greater than TCP_FIN_WAIT1 in the kernel's state enumeration. So if attackers simply send a FIN packet, the connection moves to the CLOSE_WAIT state, and the userland application's call to accept() returns successfully. The application then has data available to read on its socket: the data the attackers spoofed! In summary, the attack involves the three packets shown in Figure 14-22.
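To see how the three flaws combine, here is a deliberately tiny user-space model of the server socket. It is not kernel code, and it rests on several assumptions:
- The structure, function names, and flag macros are hypothetical.
- The sequence check is reduced to an exact match on the next expected byte.
- The state values follow the ordering of Linux's TCP state enumeration (TCP_ESTABLISHED = 1 through TCP_CLOSING), which places TCP_CLOSE_WAIT after TCP_FIN_WAIT1.

```c
#include <stdint.h>

/* Assumed to match the ordering used by the kernel discussed above. */
enum tcp_state {
    TCP_ESTABLISHED = 1, TCP_SYN_SENT, TCP_SYN_RECV, TCP_FIN_WAIT1,
    TCP_FIN_WAIT2, TCP_TIME_WAIT, TCP_CLOSE, TCP_CLOSE_WAIT,
    TCP_LAST_ACK, TCP_LISTEN, TCP_CLOSING
};

#define F_FIN 0x01
#define F_SYN 0x02
#define F_ACK 0x10

/* Hypothetical, heavily reduced model of one server-side socket. */
struct model_sock {
    enum tcp_state state;
    uint32_t rcv_nxt;   /* next expected sequence number */
    uint32_t snd_nxt;   /* server's send sequence; attacker can't see it */
    uint32_t queued;    /* bytes of data queued for the application */
};

/* Mirrors the flawed flow: a SYN in LISTEN moves to SYN_RECV; later
 * segments get a sequence check, but the ACK number is checked only if
 * the ACK flag is set, and a FIN in SYN_RECV jumps to CLOSE_WAIT. */
static void model_rcv(struct model_sock *sk, unsigned flags, uint32_t seq,
                      uint32_t ack, uint32_t len)
{
    if (sk->state == TCP_LISTEN) {
        if (flags & F_SYN) {
            sk->state = TCP_SYN_RECV;
            sk->rcv_nxt = seq + 1;
        }
        return;                        /* data in the SYN is ignored */
    }
    if (seq != sk->rcv_nxt)
        return;                        /* stand-in for tcp_sequence() */
    if ((flags & F_ACK) && ack != sk->snd_nxt)
        return;                        /* ACK checked only if flag set */
    sk->queued += len;                 /* stand-in for tcp_data() */
    sk->rcv_nxt += len;
    if (flags & F_FIN) {
        sk->rcv_nxt += 1;
        if (sk->state == TCP_SYN_RECV || sk->state == TCP_ESTABLISHED)
            sk->state = TCP_CLOSE_WAIT;
    }
}

/* The test from tcp_find_established() that lets accept() return. */
static int accept_would_return(const struct model_sock *sk)
{
    return sk->state == TCP_ESTABLISHED || sk->state >= TCP_FIN_WAIT1;
}
```

Suppose the model receives three segments: a SYN at sequence 1000, a flagless 20-byte segment at 1001, and a FIN at 1021. The socket ends up with 20 bytes queued and in TCP_CLOSE_WAIT, so accept_would_return() is true, even though snd_nxt was never used.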

Figure 14-22. Blind spoofing attack

Sequence Number Representation

Sequence numbers are 32-bit unsigned integers with values between 0 and 2^32-1. They wrap around from 0xffffffff to 0, and special care must be taken to make this wrapping work flawlessly. For example, say you have a TCP window starting at 0xfffffff0 with a size of 0x1000. Data with sequence numbers from 0xfffffff0 through 0xffffffff is within the window, and so is data with sequence numbers from 0x0 through 0xfef. This flexibility is provided by the following macros:

#define    SEQ_LT(a,b)     ((int)((a)-(b)) < 0)
#define    SEQ_LEQ(a,b)    ((int)((a)-(b)) <= 0)
#define    SEQ_GT(a,b)     ((int)((a)-(b)) > 0)
#define    SEQ_GEQ(a,b)    ((int)((a)-(b)) >= 0)
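Here is a short sketch of a window-membership test built on these macros, using the 0xfffffff0/0x1000 window from the example. The macros are restated with fixed-width types. Converting an out-of-range uint32_t to int32_t is implementation-defined in C, but mainstream compilers wrap it modulo 2^32, which is the behavior the macros depend on. A naive version shows what goes wrong without them. The function names are hypothetical.

```c
#include <stdint.h>

#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_GEQ(a, b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/* Is seq inside [start, start + size)? Correct across the 2^32 wrap. */
static int seq_in_window(uint32_t seq, uint32_t start, uint32_t size)
{
    return SEQ_GEQ(seq, start) && SEQ_LT(seq, start + size);
}

/* Plain unsigned comparison: breaks once start + size wraps. */
static int seq_in_window_naive(uint32_t seq, uint32_t start, uint32_t size)
{
    return seq >= start && seq < start + size;
}
```

With the wrapped window, start + size has already wrapped to 0xff0. As a result, the naive version rejects every sequence number, including 0xfffffff8 and 0x0.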

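It's worth taking a moment to study how these macros work around corner cases. Rather than comparing raw values, they subtract one sequence number from the other modulo 2^32 and interpret the result as a signed 32-bit integer. The sign of the difference then tells you which number comes first, provided the two are less than 2^31 apart. In general, if you see code operate on sequence numbers without using a similar type of macro, you should be suspicious. The next section describes an example.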

Snort Reassembly Vulnerability

Bruce Leidl, Juan Pablo Martinez Kuhn, and Alejandro David Weil from CORE Security Technologies published a remotely exploitable heap overflow in Snort's TCP stream reassembly that resulted from improper handling of sequence numbers (www.coresecurity.com/common/showdoc.php?idxseccion=10&idx=313). To understand this code, you need a little background on relevant structures used by Snort to represent TCP connections and incoming TCP packets. The incoming TCP segment is represented in a StreamPacketData structure, which has the following prototype:

typedef struct _StreamPacketData
{
    ubi_trNode Node;
    u_int8_t *pkt;
    u_int8_t *payload;
    SnortPktHeader pkth;
    u_int32_t seq_num;
    u_int16_t payload_size;
    u_int16_t pkt_size;
    u_int32_t cksum;
    u_int8_t chuck;    /* mark the spd for
                          chucking if it's
                        * been reassembled
                        */
} StreamPacketData;

The fields relevant for this attack are the sequence number, stored in the seq_num member, and the size of the segment, stored in payload_size. The Snort stream reassembly preprocessor has another structure to represent state information about a current stream:

typedef struct _Stream
{
    ... members cut out for brevity ...
    u_int32_t current_seq; /* current sequence number */
    u_int32_t base_seq;    /* base seq num for this
                              packet set */
    u_int32_t last_ack;    /* last segment ack'd */
    u_int16_t win_size;    /* window size */
    u_int32_t next_seq;    /* next sequence we expect
                              to see -- used on reassemble */
    ... more members here ...
} Stream;

The Stream structure has (among other things) a base_seq member to indicate the starting sequence number of the part of the TCP stream that is being analyzed, and a last_ack member to indicate the last acknowledgement number that the peer was seen to respond with.

Now, for the vulnerability. The following code is used to copy data from a TCP packet that has been acknowledged by the peer. All variables are of the unsigned int type, with the exception of offset, which is an int. Incoming packets are represented by a StreamPacketData structure (pointed to by spd) and are associated with a Stream structure (pointed to by s). The packet contents are copied into a 64KB reassembly buffer only if certain conditions are true. Before this code is executed, the reassembly buffer is guaranteed to be at least as big as the block of data that needs to be analyzed, which is defined to be the size (s->last_ack - s->base_seq).

The following code has checks in place to make sure the incoming packet is within the reassembly window, meaning the sequence number must be between s->base_seq and s->last_ack:

/* don't reassemble if we're before the start sequence
 * number or after the last ack'd byte
 */
if(spd->seq_num < s->base_seq || spd->seq_num > s->last_ack) {
    DEBUG_WRAP(DebugMessage(DEBUG_STREAM,
             "not reassembling because"
             " we're (%u) before isn(%u) "
             " or after last_ack(%u)\n",
           spd->seq_num, s->base_seq, s->last_ack););
    return;
}

Next, a check is again performed to ensure the sequence number is at or past base_seq. The code also makes sure the sequence number is greater than or equal to the next expected sequence number in the stream. One final check verifies that the sequence number plus the payload size is less than or equal to the last acknowledged sequence number.

/* if it's in bounds... */
if(spd->seq_num >= s->base_seq &&
    spd->seq_num >= s->next_seq &&
   (spd->seq_num+spd->payload_size) <= s->last_ack)
{

If all these checks pass, the data portion of the packet being inspected is added to the reassembly buffer for later analysis:

offset = spd->seq_num - s->base_seq;
s->next_seq = spd->seq_num + spd->payload_size;
memcpy(buf+offset, spd->payload, spd->payload_size);

The vulnerability in this code results from the authors storing sequence numbers in unsigned ints and comparing them with ordinary unsigned arithmetic, which silently wraps at 2^32. The attack CORE outlined in its advisory consisted of a sequence of packets that caused the code to run with the following values:

s->base_seq       = 0xffff0023
s->next_seq       = 0xffff0024
s->last_ack       = 0xffffffff
spd->seq_num      = 0xffffffff
spd->payload_size = 0xf00

If you trace the code with these values, you can see that the following check is compromised:

    (spd->seq_num+spd->payload_size) <= s->last_ack)

The seq_num member holds the value 0xffffffff, and spd->payload_size holds the value 0xf00. Adding the two wraps around to 0xeff, which is considerably lower than last_ack's value of 0xffffffff, so the check passes. The code then computes offset as 0xffffffff - 0xffff0023 = 0xffdc. memcpy() writes 0xf00 bytes starting at that offset, ending at 0x10edc, which is well past the end of the 64KB (0x10000-byte) reassembly buffer. An attacker can use this heap overflow to remotely exploit the process.
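The wraparound can be reproduced outside Snort. The sketch below copies the bounds checks into standalone functions and runs them on the advisory's values. The struct and function names are hypothetical; the field names mirror Snort's. It also adds one possible fix, which is an illustration rather than the patch Snort actually shipped. The fix keeps the original's plain unsigned ordering, so it still assumes base_seq <= last_ack numerically.

```c
#include <stdint.h>

/* Just the fields the checks touch; names mirror the Snort structures. */
struct snort_vals {
    uint32_t base_seq, next_seq, last_ack, seq_num;
    uint16_t payload_size;
};

/* The checks as written in Snort: seq_num + payload_size can wrap. */
static int check_vulnerable(const struct snort_vals *v)
{
    if (v->seq_num < v->base_seq || v->seq_num > v->last_ack)
        return 0;
    return v->seq_num >= v->base_seq &&
           v->seq_num >= v->next_seq &&
           (uint32_t)(v->seq_num + v->payload_size) <= v->last_ack;
}

/* Possible fix: once seq_num is known to be <= last_ack, compare the
 * payload size with the room left; that subtraction cannot wrap. */
static int check_fixed(const struct snort_vals *v)
{
    if (v->seq_num < v->base_seq || v->seq_num > v->last_ack)
        return 0;
    return v->seq_num >= v->next_seq &&
           v->payload_size <= v->last_ack - v->seq_num;
}
```

With the advisory's values, check_vulnerable() accepts the packet and check_fixed() rejects it. Both accept an ordinary in-bounds packet such as a 0x100-byte segment at 0x1800 in a 0x1000 to 0x2000 range.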

Sequence Number Boundary Condition

A nuance of sequence number signed comparisons is worth pointing out. Assume you use the following macro to compare two sequence numbers:

    #define    SEQ_LT(a,b)    ((int)((a)-(b)) < 0)

A macro like this behaves in interesting ways near integer boundary conditions, such as with the sequence numbers 0 and 0x7fffffff. In this case, SEQ_LT(0, 0x7fffffff) evaluates (0-0x7fffffff), which is 0x80000001. Interpreted as a signed int, this is less than 0, so the result you find is that the sequence number 0 is less than 0x7fffffff.

Now compare the sequence numbers 0 and 0x80000000. SEQ_LT(0,0x80000000) evaluates to (0-0x80000000), or 0x80000000. This is less than 0, so the result you find is that sequence number 0 is less than 0x80000000.

Now compare 0 and 0x80000001. SEQ_LT(0,0x80000001) evaluates to (0-0x80000001), or 0x7fffffff. This is greater than 0, so you find that the sequence number 0 is greater than the sequence number 0x80000001.
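These three comparisons can be checked directly. This sketch restates SEQ_LT with fixed-width types and wraps it in a hypothetical seq_lt() function. The same caveat applies as for any such macro: the signed conversion is implementation-defined in C, and mainstream compilers wrap it modulo 2^32.

```c
#include <stdint.h>

#define SEQ_LT(a, b)  ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

/* Wraparound-aware "a comes before b". */
static int seq_lt(uint32_t a, uint32_t b)
{
    return SEQ_LT(a, b);
}
```

Note that seq_lt(0, 0x80000000) and seq_lt(0x80000000, 0) both return 1. At exactly 2^31 apart, each number compares as less than the other, which is the ambiguity the boundary creates.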

Basically, if two sequence numbers are 2GB away from each other, they lie on the boundary that tells the arithmetic which sequence number comes first in the stream. Keep this boundary in mind when auditing code that handles sequence numbers, as it may create the opportunity for TCP streams to be incorrectly evaluated.

Window Scale Option

The window scale TCP option allows a peer to specify a shift value to apply to the window size, which allows for very large TCP windows. The 16-bit window field holds at most 0xFFFF, and the maximum window scale value is 14. Together these give a possible window size of 0x3FFFC000 (0xFFFF << 14), or roughly 1GB.

As mentioned, the sequence number comparison boundary is located at the 2GB point of inflection. The maximum window scale value of 14 is carefully chosen to prevent windows from growing large enough that it's possible to cross the boundary when doing normal processing of data within the window. The bottom line is that if you encounter an implementation that honors a window scale of 15 or higher, chances are quite good the reassembly code can be exploited in the TCP stack.
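The limit of 14 comes from the window-scale RFC (RFC 1323, updated by RFC 7323). The sender's and receiver's windows can be out of phase by up to one window, so twice the maximum window must stay below 2^31. Here is a sketch of that arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

/* Largest advertised window for a given shift: the 16-bit window
 * field (at most 0xFFFF) shifted left by the scale value. */
static uint64_t max_window(unsigned shift)
{
    return (uint64_t)0xFFFF << shift;
}

/* Two windows' worth of sequence space must stay under the 2^31
 * boundary at which SEQ_LT-style comparisons become ambiguous. */
static int scale_is_safe(unsigned shift)
{
    return 2 * max_window(shift) < ((uint64_t)1 << 31);
}
```

A shift of 14 gives 2 * 0x3FFFC000 = 0x7FFF8000, just under 2^31. A shift of 15 doubles that past the boundary.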

URG Pointer Processing

TCP provides a mechanism to send some out-of-band (OOB) data at any point during a data exchange. ("Out of band" means ancillary data that isn't part of the regular data stream.) The idea is that an application can use this mechanism to signal some kind of exception with accompanying data the peer can receive and handle immediately without having to dig through the data stream and generally interrupt the traffic flow. RFC 793 (www.ietf.org/rfc/rfc0793.txt?number=793) is quoted here:

The objective of the TCP urgent mechanism is to allow the sending user to stimulate the receiving user to accept some urgent data and to permit the receiving TCP to indicate to the receiving user when all the currently known urgent data has been received by the user.

The TCP header has a 16-bit urgent pointer, which is ignored unless the URG flag is set. When the flag is set, the urgent pointer is interpreted as a 16-bit offset from the sequence number in the TCP packet into the data stream where the urgent data stops. When auditing urgent pointer processing code, you should consider the potential mistakes covered in the following sections.

Handling Pointers into Other Packets

The urgent pointer points to an offset in the stream starting from the sequence number indicated in the packet header. It's perfectly legal for the urgent pointer to point to an offset that's not delivered in the packet where the URG flag is set. That is, the urgent pointer offset might hold the value 1,000, but the packet is only 500 bytes long. Code dealing with this situation can encounter two potential problem areas:

Neglecting to check that the pointer is within the bounds of the current packet This behavior can cause a lot of trouble because the code reads out-of-bounds memory and attempts to deliver it to the application using this TCP connection. Worse still, after extracting urgent data from the stream, if the code copies over urgent data with trailing stream data (effectively removing urgent data from the buffer), integer underflow conditions and memory corruption are a likely result.
Recognizing that the pointer is pointing beyond the end of the packet and trying to handle it This behavior is correct but is easy to get wrong. Urgent pointers that point into future packets are tricky for two reasons. Subsequent packets might overlap the region of the stream where the urgent data lies. They might also have the URG flag set themselves, creating a series of urgent bytes in close proximity to each other.

Handling 0-Offset Urgent Pointers

The urgent pointer points to the first byte in the stream following the urgent data, so at least one byte must exist in the stream before the urgent pointer; otherwise, there would be no urgent data. Therefore, an urgent pointer of 0 is invalid. When reviewing code that deals with urgent pointers, take the time to check whether an urgent pointer of 0 is correctly flagged as an error. Many implementations fail to adequately validate this pointer. As a result, they might extract and save the byte located just before the start of the segment's data as the urgent byte, or corrupt memory when trying to remove the urgent data from the stream.
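Here is a sketch of the validation both of the preceding sections call for. It uses the convention described here: the urgent pointer marks the first byte after the urgent data. The enum and function are hypothetical. A real stack would also record the absolute stream position (sequence number plus pointer) of a beyond-segment pointer, so that it can pick up the urgent byte when that data arrives.

```c
#include <stdint.h>

enum urg_result { URG_INVALID, URG_IN_SEGMENT, URG_BEYOND_SEGMENT };

/* Classify an urgent pointer. 'urp' is the 16-bit header field and
 * 'payload_len' is this segment's data length. On URG_IN_SEGMENT,
 * *urgent_index receives the offset of the last urgent byte. */
static enum urg_result classify_urg(uint16_t urp, uint32_t payload_len,
                                    uint32_t *urgent_index)
{
    if (urp == 0)
        return URG_INVALID;            /* no byte precedes the pointer */
    if (urp > payload_len)
        return URG_BEYOND_SEGMENT;     /* urgent byte is in later data */
    *urgent_index = (uint32_t)urp - 1; /* inside this payload */
    return URG_IN_SEGMENT;
}
```

Rejecting urp == 0 before computing urp - 1 is what prevents the off-by-one read before the start of the payload.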

Simultaneous Open

There is a lesser-known way of initiating a TCP connection. In a simultaneous open, both peers send a SYN packet at the same time with mirrored source and destination ports. Then they both send a SYN-ACK packet, and the connection is established.

From the perspective of an endpoint, assume you send a SYN from port 12345 to port 4242. Instead of receiving a SYN-ACK packet, you receive a SYN packet from port 4242 to port 12345. Internally, you transition from state SYN_SENT to SYN_RCVD and send a SYN-ACK packet. The peer sends a SYN-ACK packet to you acknowledging your SYN, at which point you can consider the connection to be established. Keep this initiation process in mind when auditing TCP code, as it's likely to be overlooked or omitted.
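The transitions for both handshakes can be reduced to a hypothetical active-open state machine. The sketch has no LISTEN state and no teardown, and it classifies segments only as SYN, SYN-ACK, or bare ACK. Under those simplifications it looks roughly like this:

```c
/* Hypothetical, reduced connection states and segment kinds. */
enum conn_state { CLOSED, SYN_SENT, SYN_RCVD, ESTABLISHED };
enum seg_kind { SEG_SYN, SEG_SYN_ACK, SEG_ACK };

/* Active-opener transitions: the normal handshake and simultaneous open. */
static enum conn_state on_segment(enum conn_state s, enum seg_kind k)
{
    switch (s) {
    case SYN_SENT:
        if (k == SEG_SYN_ACK)
            return ESTABLISHED;   /* normal three-way handshake */
        if (k == SEG_SYN)
            return SYN_RCVD;      /* simultaneous open: reply with SYN-ACK */
        break;
    case SYN_RCVD:
        if (k == SEG_SYN_ACK || k == SEG_ACK)
            return ESTABLISHED;   /* peer acknowledged our SYN */
        break;
    default:
        break;
    }
    return s;                     /* anything else: no transition */
}
```

The SYN_SENT plus bare-SYN branch is the path a real implementation is most likely to leave missing or under-tested.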