Tcpdump could well be called the first intrusion detection system—certainly when it was initially released in 1991 there were few, if any, other systems that could display network traffic. Even today, tcpdump is used as an adjunct to current intrusion detection products, providing a standard format for capturing and storing network traffic. As all IDS systems support tcpdump format files, it is useful when employing multiple IDS technologies. A file captured with tcpdump can be run through multiple IDS systems to contrast and compare the varying capabilities of the products.
In this chapter, we will explore some of the capabilities of tcpdump with a specific focus on intrusion detection uses. In addition, other useful programs in the tcpdump family will be briefly surveyed.
Tcpdump was originally developed by the Network Research Group at Lawrence Berkeley National Laboratory (LBNL); it is currently maintained at, and available from, www.tcpdump.org.
Tcpdump is capable of capturing, displaying, and storing all forms of network traffic (not just TCP traffic, despite the name) in a variety of output formats. Input packets can be captured from the network or read from a disk file. Tcpdump capture files are portable across architectures because all data is stored in network byte order.
The man page for tcpdump on a given system is the authoritative guide to its usage, but the main options are fairly standard. The syntax for the tcpdump command is as follows:
tcpdump [ -adeflnNOpqStvx ] [ -c count ] [ -F file ][ -i interface ] [ -r file ] [ -s snaplen ] [ -T type ] [ -w file ] [ expression ]
The most commonly used options are described in Table 5-1.
Option | Description |
---|---|
-c | Capture count packets, then exit. |
-e | Print the link-level header; usually the link-level (such as Ethernet) data is not printed. |
-i | The name of the network interface to capture data from. |
-n | Don't convert IP addresses or port numbers to names. |
-O | Do not attempt to optimize the generated code; this is sometimes useful when specifying complex expressions. |
-p | Don't put the interface in promiscuous mode. |
-r | Read packets from the tcpdump capture file (created with the -w option, or IDS tcpdump format output). |
-s | Capture snaplen bytes of data from each packet (the default is 68 bytes). |
-S | Print TCP sequence numbers as captured 32-bit values, rather than relative to the beginning of the connection. |
-t | Don't print any timestamp (not generally useful). |
-tt | Print timestamp as standard Unix timestamp (number of seconds since the Unix beginning of time: Jan. 1, 1970), rather than human formatted. |
-v | Produce more verbose output. |
-vv | Produce even more verbose output. |
-w | Write packets to file, in raw format. |
-x | Print the packet in hexadecimal. |
Network Byte Order
When designing the TCP/IP protocol suite, attention was paid to standardizing the representation of multiple-byte quantities. Some processors (those from Sun and Motorola, IBM-370s, and PDP-10s, for instance) store the most significant bytes (that is, the bytes that hold the largest part of the value) first. These are known as big-endian processors. Other processors (the x86 family, Vaxes, and Alphas, among others) store the most significant byte last. These are known as little-endian processors. A few older processors use a strange ordering called middle-endian, as well. Some modern processors can change their endian type programmatically.
Clearly, it was necessary to decide on a standard Internet format if systems with differing endian standards could hope to communicate. Possibly because early development took place on big-endian systems, the big-endian format was accepted as the standard network byte order. Systems that use other orders must convert the network data to their internal format on input, and convert it back to network byte order on outputting it to the network. Tcpdump, the TCP/IP stack in the operating system, and all network programs routinely perform these tasks so that a particular system's endian style does not affect their functionality.
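To make the difference concrete, here is a minimal Python sketch (not part of tcpdump) that packs an arbitrary 16-bit port number, 3252, in both byte orders using the standard struct module. The variable names are illustrative only.

```python
import struct

port = 3252  # an arbitrary 16-bit value (0x0cb4)

big = struct.pack(">H", port)     # big-endian, i.e. network byte order
little = struct.pack("<H", port)  # little-endian, as on the x86 family

print(big.hex())     # most significant byte first
print(little.hex())  # most significant byte last

# Reading bytes off the wire requires stating the byte order explicitly;
# "!" is struct's shorthand for network byte order.
(decoded,) = struct.unpack("!H", big)
print(decoded)
```

A little-endian host that skipped the conversion would misread the same two bytes as a different port number, which is exactly the problem a standard network byte order avoids.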
A few of these options require special attention or caveats:
Tcpdump has two basic output formats: either a raw file, which consists of the packet contents along with accompanying information (such as timestamps), or various forms of human-readable output. In this section, we will explore the various human-readable formats that are of interest to the IDS analyst. To compare the formats, we will examine the effects of the various output formats on a sample packet, which is a mid-connection packet from a telnet connection.
By default, tcpdump will print one line per packet consisting of important packet data, including timestamp, protocol, source and destination hosts and ports, flags, options, and (for TCP packets) sequence numbers. We begin by using the following command to display our sample packet:
tcpdump -r demo.trace
This command produces the following sample packet:
03:00:44.919519 test.demo.com.3252 > help.demo.com.telnet: P 177:215(38) ack 331 win 62839 (DF)
In this example, we can see a timestamp shortly past 3 A.M., followed by the source hostname (test.demo.com), and the source port (3252). The greater-than symbol (>) indicates the direction of the packet. The destination host is indicated as help.demo.com on the telnet port. This packet has the PUSH flag (P) set, which asks the receiving system to deliver the data to the application promptly. The flag is followed by the sequence number at the beginning of the packet (177) and that of the packet end (215), followed by the difference, which will typically be the count of data bytes in this packet (38). In this default mode, the sequence numbers are shown relative to the start of the connection, as we shall discuss later. We can also see that the packet is acknowledging (ack) relative sequence number 331 from a previous packet sent by help.demo.com. The transmission window (win) is 62839, and the Do Not Fragment flag (DF) is also set.
If we use the -tt flag described earlier, we will see the following output:
1063360844.919519 test.demo.com.3252 > help.demo.com.telnet: P 177:215(38) ack 331 win 62839 (DF)
Note that the timestamp is now expressed in Unix format, which is seconds since Jan. 1, 1970, 0:00 GMT. We can convert this to human-readable format by passing the numeric value to date -r:
$ date -r 1063360844.919519
Fri Sep 12 03:00:44 PDT 2003
The Unix timestamp is valuable because direct arithmetic can be performed between timestamps. For instance, by taking the difference between the timestamp of the initial SYN packet that started a connection and the timestamp of the FIN or RST that ended it, the connection’s total time in seconds can be directly computed.
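As a small illustration of this arithmetic, the following Python sketch subtracts two -tt timestamps. The function name is hypothetical, and the two sample values are the first and last timestamps of the connection trace shown later in this chapter. Decimal is used rather than float so that the microsecond digits survive the subtraction exactly.

```python
from decimal import Decimal

def connection_seconds(first_ts: str, last_ts: str) -> Decimal:
    """Elapsed time between two tcpdump -tt timestamps, in seconds."""
    return Decimal(last_ts) - Decimal(first_ts)

syn_ts = "1068422738.826406"  # timestamp of the initial SYN
fin_ts = "1068422752.665419"  # timestamp of the final FIN

print(connection_seconds(syn_ts, fin_ts))
```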
We will now try the -n flag:
03:00:44.919519 192.168.3.3.3252 > 192.168.3.99.23: P 177:215(38) ack 331 win 62839 (DF)
The change from the original packet display is clear—neither the hostnames, nor the ports, are resolved to names. Some implementations of tcpdump require the use of -nn to achieve this result.
Using the -S flag gives us the following output:
03:00:44.919519 test.demo.com.3252 > help.demo.com.telnet: P 2053563889:2053563927(38) ack 3671890340 win 62839 (DF)
The display here gives the raw TCP sequence numbers. Previously, we saw sequence numbers relative to the beginning of the connection. We know from the previous displays that this packet begins at relative sequence number 177, and since the corresponding raw sequence number is 2053563889, we can by simple arithmetic conclude that test.demo.com started with an initial sequence number of 2053563712.
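The subtraction can be written out as a short Python sketch. The function name is illustrative, and the modulo is an added safeguard for the case where the sequence number has wrapped past 2^32 during the connection.

```python
def initial_sequence_number(raw_seq: int, relative_seq: int) -> int:
    """Recover a host's initial sequence number from one packet's
    raw (-S) and relative sequence numbers."""
    return (raw_seq - relative_seq) % 2**32

# Values taken from the sample packet's two displays.
print(initial_sequence_number(2053563889, 177))

# The same arithmetic applies to the acknowledged side (help.demo.com).
print(initial_sequence_number(3671890340, 331))
```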
Combining the three options (tcpdump -n -tt -S -r demo.trace) gives no surprises:
1063360844.919519 192.168.3.3.3252 > 192.168.3.99.23: P 2053563889:2053563927(38) ack 3671890340 win 62839 (DF)
We can also increase the level of verbosity by using -v:
1063360844.919519 192.168.3.3.3252 > 192.168.3.99.23: P 2053563889:2053563927(38) ack 3671890340 win 62839 (DF) (ttl 126, id 36440)
Here, we additionally see the IP TTL (126) and the IP id field (36440).
In addition to the preceding display formats, tcpdump also provides the option to dump the packet in hexadecimal format by using the -x flag. In this format, the earlier flags still take effect, and the data is displayed as previously, along with a hex dump of the packet. For example, we can run this command:
tcpdump -x -n -tt -S -v -r demo.trace
The output of that command is as follows:
1063360844.919519 192.168.3.3.3252 > 192.168.3.99.23: P 2053563889:2053563927(38) ack 3671890340 win 62839 (DF) (ttl 126, id 36440)
4500 004e 8e58 4000 7e06 59d3 c0a8 0303
c0a8 0363 0cb4 0017 7a66 e5f1 dadc 99a4
5018 f577 2561 0000 594d 5347 000b 0000
0012 008a 0000 0000 db7f 11d4 30c0 8063
6861 726c 6573 686f 7761 7264 c080
What we see in the hex dump is the exact contents of the packet, starting with the IP header, which in this case is followed by the TCP header and the TCP data. As the previous data is in decimal format, and this display is in hex, a good programmer’s calculator can be valuable as an aid in conversion.
Several things can, however, be easily gleaned from inspecting the hex dump and referring to the IP and TCP header formats that were discussed in Chapter 2. Each hex digit represents 4 bits, and the first 4 bits of the IP header represent the IP version, so we can easily see that this packet is using IP version 4. The next hex digit is the number of 32-bit words in the IP header, in this case 5. (This is by far the most common value, as it represents an IP packet with no options.) Each set of 4 hex digits represents 16 bits, so after counting 10 (5 × 2) groups of 4, we reach the beginning of the TCP header, which starts with 0cb4, which is the source port (3252 in decimal).
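In place of a programmer's calculator, the same decoding can be sketched in a few lines of Python. The sketch below is an illustration, not a tcpdump feature: it copies only the IP and TCP header bytes from the hex dump above (the payload is omitted) and assumes the standard IPv4 and TCP header layouts. All variable names are made up for the example.

```python
import struct

# IP and TCP headers copied from the hex dump (payload omitted).
HEX_DUMP = """
4500 004e 8e58 4000 7e06 59d3 c0a8 0303
c0a8 0363 0cb4 0017 7a66 e5f1 dadc 99a4
5018 f577 2561 0000
"""
packet = bytes.fromhex("".join(HEX_DUMP.split()))

version = packet[0] >> 4                # first hex digit
ip_header_len = (packet[0] & 0x0F) * 4  # second hex digit: 32-bit words -> bytes
total_len, ip_id = struct.unpack("!HH", packet[2:6])
ttl, protocol = packet[8], packet[9]
src_ip = ".".join(str(b) for b in packet[12:16])
dst_ip = ".".join(str(b) for b in packet[16:20])

# The TCP header begins right after the IP header.
tcp = packet[ip_header_len:]
src_port, dst_port, seq, ack = struct.unpack("!HHII", tcp[:12])
(window,) = struct.unpack("!H", tcp[14:16])

print(version, ip_header_len, total_len, ttl, ip_id)
print(src_ip, src_port, ">", dst_ip, dst_port)
print(seq, ack, window)
```

The decoded values agree with the decimal fields on the summary line of the tcpdump output.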
It is important to read the manual for your version of tcpdump, as options and output formats vary. Our discussion has been based upon the original LBNL version of tcpdump, but there have been some additional changes made by the folks at www.tcpdump.org that are hugely useful. In particular, the -X option displays the packet contents in ASCII alongside the hex dump, for direct examination of cleartext protocols. See the man pages for your version of tcpdump for its particular details.
The final argument to the tcpdump command is a Boolean expression against which packets are matched. The construction of the expression may seem somewhat obscure, but with practice it can be used to extract packets matching extremely precise characteristics. In fact, we will see how tcpdump expressions can be used in such diverse applications as detecting services running on nonstandard ports, determining whether users are using peer-to-peer applications, and keeping TCP connection records.
The Berkeley Packet Filter (BPF) language used by tcpdump and most IDS products performs packet matching by using an expression that matches bytes within the packet. The expression can include bytes and the normal arithmetic and logical operators, generally matching those found in the C language. A packet that matches the expression will be processed by the BPF application, whereas those that fail the pattern match are silently discarded.
Although tcpdump is capable of dissecting raw packets, we will examine the types of packets we’ve discussed earlier—namely IP and the higher-level protocols (TCP, UDP, and ICMP). Each byte in a packet can be addressed as an offset (starting at 0) from the beginning of the protocol header. The format of this basic form of addressing is proto[offset], where proto is the protocol in question (IP, TCP, UDP, or ICMP), and offset is the byte count from the beginning of the protocol header. Multibyte fields can be addressed as proto[offset:size], where size is 1, 2, or 4, indicating the number of bytes in the field. The IP proto is usually used to filter on specific fields in the IP header, while the TCP, UDP, and ICMP protos are used to filter on the protocol header and packet contents.
Numeric values in the BPF expression can be addressed in the familiar C syntax of 0xabcd for hexadecimal (base 16) values, 01234 for octal (base 8) values, and the familiar decimal notation for decimal values. The following operators can be used to string these values together:
Here are a few examples:
Rather than expecting casual users to remember that (for instance) the TCP destination port is addressed as tcp[2:2], users can use the shorthand tcp dst port 22 format instead of the more cumbersome tcp[2:2]=22 format. It is rather unfortunate that ranges are not expressible in this shorthand format, though. There is, for instance, no convenient way to express TCP ports from 20 to 30 other than the nonintuitive tcp[2:2]>=20 and tcp[2:2]<=30.
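The proto[offset:size] addressing described above can be mimicked in Python to check what a filter would see. This is a sketch for experimentation only: the helper name field is hypothetical, and the sample bytes are the TCP header of this chapter's sample packet.

```python
def field(header: bytes, offset: int, size: int = 1) -> int:
    """Emulate BPF proto[offset:size]: read a 1-, 2-, or 4-byte
    big-endian value at the given offset from the protocol header."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4")
    if offset < 0 or offset + size > len(header):
        raise IndexError("field lies outside the header")
    return int.from_bytes(header[offset:offset + size], "big")

tcp_header = bytes.fromhex("0cb400177a66e5f1dadc99a45018f57725610000")

print(field(tcp_header, 0, 2))  # like tcp[0:2], the source port
print(field(tcp_header, 2, 2))  # like tcp[2:2], the destination port
print(field(tcp_header, 13))    # like tcp[13], the flags byte

# The port-range test tcp[2:2]>=20 and tcp[2:2]<=30:
print(20 <= field(tcp_header, 2, 2) <= 30)
```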
Expressions can be quite complex. To check that the first data byte of a TCP packet is a binary 1, we could use this expression:
tcp[(tcp[12]>>4)*4] = 1
Let’s break down this rather complex expression from the inside out. First, we are taking byte 12 of the TCP header, which contains the header length (measured in 32-bit words) in the top four bits, and a reserved field in the lower four bits. We then shift this to the right by four bits, which will move the header length to the lower four bits, and fill the upper four bits with 0’s. As this is a count of 32-bit (or 4-byte) words, we then multiply by four to compute the header length in bytes. This is the offset of the first data byte, whose value is then checked against 1.
Note |
If you’ve forgotten the details of the TCP header, see Chapter 2, particularly Figure 2-5. |
Astute readers will note that we shifted four bits to the right, then two bits to the left in the multiplication by four, so if we assume that the reserved field is indeed filled with 0’s, the expression could be simplified to this:
tcp[tcp[12]>>2] = 1
By performing an and operation on tcp[12] with a binary 11110000 (hexadecimal 0xf0, or decimal 240), the reserved field is masked out to zeros, so the final expression looks like this:
tcp[(tcp[12] & 0xf0)>>2] = 1
In IDS applications, it is probably safest to perform this extra step to avoid possible evasion attempts.
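The effect of the mask is easy to see with a quick Python sketch. The helper names are illustrative, and the two synthetic segments below are made up for the example: both carry a 20-byte TCP header followed by a data byte of 1, but the second has nonzero reserved bits, as an evasion attempt might.

```python
def data_offset_unmasked(segment: bytes) -> int:
    # Mirrors tcp[12]>>2, which trusts the reserved bits to be zero.
    return segment[12] >> 2

def data_offset_masked(segment: bytes) -> int:
    # Mirrors (tcp[12] & 0xf0)>>2, which ignores the reserved bits.
    return (segment[12] & 0xF0) >> 2

def make_segment(byte12: int, payload: bytes) -> bytes:
    header = bytearray(20)  # minimal TCP header, other fields left zero
    header[12] = byte12
    return bytes(header) + payload

clean = make_segment(0x50, b"\x01abc")    # header length 5 words, reserved 0
evasive = make_segment(0x5F, b"\x01abc")  # same length, reserved bits set

for seg in (clean, evasive):
    print(data_offset_unmasked(seg), data_offset_masked(seg))
```

With the reserved bits set, the unmasked form computes an offset that points past the first data byte, so the filter would miss the packet; the masked form finds offset 20 in both cases.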
The somewhat tortured syntax we’ve just seen is often expressed more naturally with the use of shorthand expressions. When compiled, these shorthand expressions reduce to the comparisons of packet bytes we’ve looked at, but they are much more readable for human consumption. We’ll briefly examine a few of the more useful shorthand expressions.
To specify hosts or networks, the qualifiers host or net are used, while to specify a port, the port keyword is used. These keywords are followed by a name or number that specifies what the keyword refers to. Here are some examples: host foobar, net 192.168/16, and port telnet.
To specify the direction of the transfer (in other words, to match the source or destination), the keywords src, dst, src or dst, and src and dst are used. For example: src host bar, dst net 172.16.0.0/16, src or dst port ftp. If the directional keyword is omitted, src or dst (meaning either source or destination) is assumed.
The length keyword matches the length of the packet (excluding the link-level header), which includes the IP header length, the protocol-specific header, and the protocol data. For IP packets, this is equivalent to the much more cumbersome expression ip[2:2].
Here are a few examples using this more natural syntax:
Tcpdump is often used as an adjunct to IDS systems to capture traffic for later analysis. Utilizing tcpdump in this manner is often termed bulk capture to denote the process of capturing all (or a significant subset) of the traffic that the IDS sees for possible forensic analysis. In large sites, the size of these capture files is likely to be many megabytes or even gigabytes.
Why is this capture important? Several benefits accrue to sites that perform this additional data capturing: First, an IDS system must, of necessity, be lightweight enough to keep up with the traffic stream. Often, it is useful to perform more CPU-intensive analysis on the traffic offline. For instance, the IDS system Bro (available from www.icir.org/vern/bro.html) can detect encrypted stepping stones, where an incoming connection is used as a stepping stone to other systems. This detection mechanism is not lightweight enough to be performed in real time, and thus is performed offline using a bulk-capture file, which captures the same traffic that the IDS sees.
The use of a bulk-capture mechanism also allows experimentation with varying IDS rule sets and comparison between various IDS products in a controlled manner. As most IDS products can read input from a tcpdump file, their features can be compared by feeding each offering the same bulk-capture file, and comparing results. Also, the effects of enabling various IDS options or detection features can be examined.
Probably the most compelling argument for utilizing a bulk-capture feature is for later forensic examination of traffic. An IDS alert can often be a later indication of a prior compromise of a system, and in cases like this, having prior tcpdump data can provide valuable clues as to the nature of the compromise.
Creating a bulk-capture facility is relatively simple, as the hardware and software requirements are generally modest. The system needs hardware fast enough to support the capture and storage of the anticipated traffic. Disk storage requirements will be determined by the traffic load and the length of time that the traffic will be maintained for archival and forensic purposes. Typical modern PC hardware generally has no difficulty with these requirements. Software requirements are, if anything, even more modest, requiring a modern operating system and the software described in this chapter.
To implement a simple bulk-capture system on a Unix system requires little more than the appropriate script, invoked periodically via cron:
#!/bin/sh
# bulk_capture.sh - to be invoked by cron periodically to perform bulk
# capture and rollover into periodic files
FILEPREFIX=/home/capturedir/trace
PIDFILE=/home/capturedir/pid

# Determine the next free capture filename (trace.1, trace.2, ...)
i=1
while true
do
    CAPTUREFILE=${FILEPREFIX}.$i
    [ -f "$CAPTUREFILE" ] || break
    i=`expr $i + 1`
done

# Get the process ID of the previous capture, if there was one
PID=0
[ -f "$PIDFILE" ] && PID=`cat "$PIDFILE"`

# Start the next capture
# Note that we don't kill the previous capture until after this
# one has the opportunity to start, so that there is never a time
# without coverage
tcpdump -s 1500 -i eth0 -w "$CAPTUREFILE" myfilter_expression &

# Save the new process ID away
echo $! >"$PIDFILE"

# Wait briefly to allow new tcpdump to start cleanly,
# then kill the previous one
sleep 1
[ "$PID" -eq 0 ] || kill "$PID"
exit
The site-specific portions of the script, such as the capture directory, the interface name, and the filter expression (shown as myfilter_expression), are to be customized by users for their installations. Note especially the use of the -s 1500 option to ensure that the full Ethernet packet is captured by tcpdump. This script can periodically be invoked via cron to roll over to a new file, which will be numbered in sequence, starting at trace.1. An additional script (not included here) that may be required is one that monitors disk space usage and purges old capture files when disk utilization reaches a predetermined value.
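The selection logic for such a purge script might look like the following Python sketch. This is an assumption about how one could approach it, not the script the text refers to: the function name, the tuple layout, and the byte budget are all hypothetical. A real script would gather sizes and modification times with os.stat and delete the chosen files with os.remove.

```python
def files_to_purge(captures, budget_bytes):
    """Pick the oldest capture files to delete until the total size
    fits within budget_bytes.

    captures: list of (name, size_bytes, mtime) tuples.
    Returns the names to delete, oldest first.
    """
    total = sum(size for _name, size, _mtime in captures)
    doomed = []
    for name, size, _mtime in sorted(captures, key=lambda c: c[2]):
        if total <= budget_bytes:
            break
        doomed.append(name)
        total -= size
    return doomed

captures = [
    ("trace.3", 500, 300),  # newest
    ("trace.1", 400, 100),  # oldest
    ("trace.2", 300, 200),
]
print(files_to_purge(captures, 600))
```

Deleting by age rather than by size keeps the most recent traffic, which is usually what a forensic investigation needs first.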
In Chapter 2, we examined the details of TCP communications. Each side of the communication establishes an initial sequence number, which is a randomly selected 32-bit number, and it is incremented with each byte transferred, and once more for each SYN and FIN sent. This means that if we know the sequence number of the packet that established the connection and the sequence number of the packet that ended the connection, we can easily calculate the actual number of bytes transferred. For each side of the connection, we simply subtract the initial sequence number from the final sequence number, and then deduct 1 for each SYN and FIN sent (a RST does not consume a sequence number).
With that in mind, we will start with a tcpdump filter that captures TCP packets with SYN, FIN, or RST set. tcp[13] contains the packet flags, and by masking off everything except the lower three bits, which contain the flags we’re interested in, and comparing the result with zero, we can extract just the packets of interest:
tcp[13] & 7 != 0
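The same test can be tried out in Python on a few flag bytes. The sketch assumes the standard TCP flag bit values (FIN is 0x01, SYN is 0x02, RST is 0x04), and the function name is illustrative.

```python
FIN, SYN, RST = 0x01, 0x02, 0x04
PSH, ACK = 0x08, 0x10

def opens_or_closes(flags: int) -> bool:
    """Equivalent of the filter tcp[13] & 7 != 0 for one flags byte."""
    return (flags & (FIN | SYN | RST)) != 0

samples = {
    "SYN": SYN,
    "SYN/ACK": SYN | ACK,
    "PSH/ACK": PSH | ACK,  # mid-connection data, like the sample packet
    "ACK": ACK,
    "FIN/ACK": FIN | ACK,
    "RST/ACK": RST | ACK,
}
for name, flags in samples.items():
    print(name, opens_or_closes(flags))
```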
When we use the -S flag with tcpdump, the raw sequence numbers are displayed, as mentioned earlier. When this flag is omitted, tcpdump does the arithmetic for us and computes relative sequence numbers, as shown in the following example:
1068422738.826406 HostA.2398 > HostB.23: S 1950934552:1950934552(0) win 16384 (DF)
1068422738.908640 HostB.23 > HostA.2398: S 1373962008:1373962008(0) ack 1950934553 win 61440
1068422752.662658 HostB.23 > HostA.2398: F 4538:4738(200) ack 94 win 61440 (DF)
1068422752.665419 HostA.2398 > HostB.23: F 94:94(0) ack 4739 win 16319 (DF)
What we see above is an initial SYN from HostA with a raw sequence number of 1950934552 (a relative sequence number of 0), followed by the SYN/ACK from HostB with a raw sequence number of 1373962008, again with a relative sequence number of 0. At the end of the connection, we see HostB acknowledging HostA’s relative sequence number of 94, and HostA acknowledging HostB’s relative sequence number of 4739. HostB’s acknowledgment was sent before HostA’s FIN, so it covers only HostA’s SYN; subtracting one shows that HostA sent 93 data bytes. HostA’s acknowledgment covers both HostB’s SYN and its FIN; subtracting two shows that HostB sent 4737 data bytes.
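The byte counts can be checked with a short Python sketch. It works from the relative acknowledgment numbers in the trace above and deducts one sequence number for the SYN, plus one more for the FIN only when the acknowledgment was sent after that FIN. The function and parameter names are illustrative.

```python
def data_bytes(final_ack: int, fin_covered: bool) -> int:
    """Data bytes sent by a host, given the highest relative sequence
    number its peer acknowledged.

    One sequence number is always deducted for the SYN; another is
    deducted only if the acknowledgment also covers the host's FIN.
    """
    return final_ack - 1 - (1 if fin_covered else 0)

# HostB's "ack 94" precedes HostA's FIN, so it covers only HostA's SYN.
host_a_sent = data_bytes(94, fin_covered=False)

# HostA's "ack 4739" follows HostB's FIN, so it covers SYN and FIN.
host_b_sent = data_bytes(4739, fin_covered=True)

print(host_a_sent, host_b_sent)
```

The result for HostA agrees with its own FIN packet, which occupies relative sequence number 94 and therefore follows data bytes 1 through 93.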
In the upcoming section, “TCPDump as Intrusion Detection,” we will see how to construct a TCP connection log based on this information.
There are a few minor limitations to this procedure, which are unimportant in most cases. First, the initial sequence number is randomly picked as a 32-bit number (offering over 4 billion possible values), and in a given connection, it can roll past the maximum value and begin counting up again from zero. By way of analogy, a 1,000 mile automotive trip, with the odometer starting at 99,500 will cause the odometer to roll past 99,999 and reach 500 by the end of the trip. In this case, we would recognize that the figure 500 really means 100,500, and that the top digit had not been recorded. This is easily dealt with by inserting an implied digit at the head of the number.
To continue with this analogy, though, unless we know how many times the odometer has rolled over, we cannot know for sure the mileage of the vehicle. An odometer reading of 10,000 could indicate a fairly new vehicle with 10,000 miles, or a vehicle with 210,000 miles that is past time for retirement. Similarly, because the sequence number range has a count of about 4 billion, any transfer in excess of this amount cannot be computed accurately by simply comparing the starting and ending sequence numbers—we also need to know how many times the “odometer” has rolled over.
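Modular arithmetic handles the single-rollover case automatically, as this Python sketch shows. The function name and the sample sequence numbers are made up for illustration; the rollovers argument stands for the extra information that, as noted above, cannot be recovered from the two sequence numbers alone.

```python
SEQ_SPACE = 2**32  # number of distinct 32-bit sequence numbers

def sequence_distance(start_seq: int, end_seq: int, rollovers: int = 0) -> int:
    """Sequence numbers consumed between start_seq and end_seq.

    The modulo supplies the "implied digit" when the counter has wrapped
    once; additional complete wraps must be supplied by the caller.
    """
    return (end_seq - start_seq) % SEQ_SPACE + rollovers * SEQ_SPACE

# The odometer example: 99,500 to 500 on a five-digit odometer.
print((500 - 99_500) % 100_000)

# A connection that starts near the top of the sequence space and wraps.
print(sequence_distance(4_294_967_000, 704))
```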
Despite these minor objections, the captured data is generally extremely useful, as data transfers over 4 gigabytes in length are fairly uncommon, except perhaps in the supercomputer arena.
Can tcpdump be used as an intrusion-detection tool? Although tcpdump is strictly a packet-capture and archiving utility, the BPF language is capable of enough pattern matching to allow us to match some simple attacks. The early intrusion-detection tool Shadow used tcpdump to capture network traffic and pass it to scripts to process for anomalies. In this section, we will look at simple intrusion-detection capabilities along those lines.
The SQL Slammer worm hit the entire Internet with a vengeance on Saturday, January 25, 2003, at 05:30 UTC, and within 30 minutes had infected an estimated 75,000 vulnerable systems, making it the fastest propagating worm to date. This worm exploited the Microsoft SQL Server service on UDP port 1434. It was the first successful worm to propagate via UDP.
Key to the rapid propagation of this worm was the fact that it was entirely self-contained in one 376-byte UDP packet. The payload of this packet exploited the vulnerability, replicated itself, and generated random IP addresses that the newly exploited system would, in turn, send attack packets to. As the exploit is UDP-based, no handshake was necessary to complete the exploit, unlike TCP, which requires the three-way handshake to establish a connection. Compromised systems literally sent traffic as fast as their Internet connection could handle, thus clogging the entire Internet.
A tcpdump filter could help us be good neighbors on the Internet by notifying us immediately if any of our systems become infected, so that we could immediately take corrective action. A filter that matches the traffic that a worm on our network would send out can be constructed in several ways, depending on the precision in detection that is desired. We could look at the packet contents and verify that the packet contains the actual Slammer byte codes, but in practice, any packet directed at UDP port 1434 with a data content of 376 bytes is highly likely to be the Slammer worm. Thus, we can construct a tcpdump filter like this:
udp[4:2] = 384 and dst port 1434 and src net mynet
This filter matches UDP packets with a payload size of 376 bytes (as explained in Chapter 2, the UDP header is 8 bytes long, so we add 8 to our desired 376 bytes, resulting in a packet 384 bytes long), a destination port of 1434, and a source IP within our home network.
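To see what the first two terms of the filter test, here is a Python sketch that applies the same checks to the bytes of a UDP datagram. It is an illustration only: the function name is hypothetical, the source-network test is omitted because it lives in the IP header, and the sample datagram is synthetic (a header plus 376 zero bytes, not worm code).

```python
import struct

UDP_HEADER_LEN = 8
SLAMMER_PAYLOAD_LEN = 376

def matches_slammer_shape(udp_datagram: bytes) -> bool:
    """Mirror udp[4:2] = 384 and dst port 1434 on a raw UDP datagram."""
    if len(udp_datagram) < UDP_HEADER_LEN:
        return False
    _src, dst_port, length, _checksum = struct.unpack(
        "!HHHH", udp_datagram[:UDP_HEADER_LEN]
    )
    return dst_port == 1434 and length == UDP_HEADER_LEN + SLAMMER_PAYLOAD_LEN

def make_datagram(dst_port: int, payload_len: int) -> bytes:
    header = struct.pack("!HHHH", 4000, dst_port, UDP_HEADER_LEN + payload_len, 0)
    return header + bytes(payload_len)

print(matches_slammer_shape(make_datagram(1434, 376)))
print(matches_slammer_shape(make_datagram(1434, 100)))
print(matches_slammer_shape(make_datagram(53, 376)))
```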
By outputting matching packets through a simple filter that triggers on the first instance of a new source IP and that contacts our trusty sysadmin, we have created a detection mechanism for Unix that will alert us if any of our systems become infected with SQL Slammer:
#!/bin/sh
# This program monitors for SQL slammer traffic and alerts the admin
# if any local sources are detected (only once per IP)
tcpdump -n -tt -i eth0 'udp[4:2] = 384 and dst port 1434 and src net mynet' |
awk '
BEGIN { squote=sprintf("%c",39) }
{
    # $2 is the source address and port (a.b.c.d.port); keep the four octets
    split($2,x,".");
    ip=x[1] "." x[2] "." x[3] "." x[4];
    # addr[ip]++ is zero (false) only the first time an address is seen
    if (! addr[ip]++ ) {
        temp="mail -s " squote " Slammer Source " ip squote
        temp=temp " " squote "admin@mynet.com" squote
        temp=temp " </dev/null >/dev/null 2>&1 &"
        system(temp)
    }
}'
Note that we do not need to use the -s option with tcpdump to capture more than the header in this case, since all of the data we’re interested in is in the IP and UDP headers. Also note that we are not asking tcpdump to convert the timestamps or resolve hostnames, as this will simply slow down processing. The awk program extracts the source IP address, and sends an e-mail with the subject “Slammer Source IP” to admin@mynet.com, but only on the first appearance of a given IP address, since a Slammer-infected system will easily send out many thousands of packets.
Tcpdump can even be used to create connection records for TCP connections, with starting and ending timestamps, source and destination IP addresses and ports, and the number of bytes inbound and outbound. Such records can be enormously valuable for forensic use.
Although we will not go into detail here, we could begin by taking the tcpdump expression we used earlier to match TCP packets containing the SYN, FIN, or RST flags (tcp[13] & 7 != 0). The text output could then be run through a filter (not shown here) that matches packets with matching source and destination addresses and ports and outputs a connection log.
tcpdump -n -tt -S -r trace.1 'tcp[13] & 7 != 0' | awk -f connection_log.awk
We have omitted the script that this tcpdump command feeds because this is an extremely inefficient method of producing connection logs—we are converting the data to ASCII format for processing, rather than directly examining the raw packets programmatically. For a robust, open source connection-logging package, consider using tcptrace, available at www.tcptrace.org.
Another interesting example is in the detection of Secure Shell (SSH) back doors. SSH is an encrypted replacement for older interactive network applications, such as telnet. SSH has a default port of 22, but hackers will often create an SSH encrypted connection on a nonstandard port so that they can enter the hacked system without detection. Tcpdump can be used to detect these hackers by detecting the distinctive SSH signature on a nonstandard port. Consider the following tcpdump expression:
tcp[((tcp[12] & 0xf0)>>2):4] = 0x5353482D
This filter matches any TCP packet whose first four data bytes are 0x5353482D. By consulting an ASCII chart, we can see that 0x5353482D matches the string “SSH-”. SSH uses this preamble during its negotiation of options when the session is being established. This is not a foolproof detection mechanism, since the four characters could be part of a normal data transfer (for example, in a paper describing the SSH protocol). However, in practice, the false positive rate has been quite low. Packets that match this expression can be run through a short filter, similar to the preceding examples, to alert the IDS analyst.
Note |
In these days of peer-to-peer file-sharing abuse, the signature for the gnutella file-sharing program might be a target for filtering. Gnutella traffic has as its identifier the string “GNUTELLA.” |
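Rather than consulting an ASCII chart, the hexadecimal constants can be generated. The Python sketch below builds the comparison terms for a signature at the start of the TCP data, splitting longer strings into 4-byte pieces because a BPF field is at most 4 bytes wide. The function name is hypothetical, and the generated expression for “GNUTELLA” is an assumption that should be checked against your version of tcpdump before use.

```python
DATA_START = "((tcp[12] & 0xf0)>>2)"  # offset of the first TCP data byte

def signature_terms(signature: bytes) -> list:
    """Build BPF comparisons that match `signature` at the start of
    the TCP data, one term per chunk of up to 4 bytes."""
    terms = []
    for i in range(0, len(signature), 4):
        chunk = signature[i:i + 4]
        if len(chunk) not in (1, 2, 4):
            raise ValueError("trim the signature so each chunk is 1, 2, or 4 bytes")
        offset = DATA_START if i == 0 else f"({DATA_START}+{i})"
        terms.append(f"tcp[{offset}:{len(chunk)}] = 0x{chunk.hex().upper()}")
    return terms

print(" and ".join(signature_terms(b"SSH-")))
print(" and ".join(signature_terms(b"GNUTELLA")))
```

For “SSH-” this reproduces the expression shown earlier; for “GNUTELLA” it yields two 4-byte comparisons joined with and.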
From such humble beginnings as these, the first IDSs were built. Although they are now far advanced from their origins, we can see the roots of many of the current crop of IDS products in these simple examples. All IDSs can examine tcpdump-format traffic and store packet output in the same format. The tcpdump engine (as manifested in the libpcap library), with the addition of packet reassembly, connection-state tracking, and advanced heuristics for pattern matching, forms the basis of most modern IDSs.
TCPDump files are useful for forensic examination of traffic in the event of a system compromise. In many cases, due to the sheer bulk of data that may be captured in these files, auxiliary programs can be useful to prune, or otherwise process these files. In this section, we introduce three such programs: tcpslice—which can create smaller, bite-sized portions of tcpdump files, tcpflow—which can provide output of the data portion of TCP sessions recorded in tcpdump files, and tcpjoin—which provides pasting of several tcpdump files into one.
At busy sites, tcpdump capture files can easily grow to many hundreds of megabytes, or even gigabytes. Extracting data of interest from these files can easily become quite time consuming. Since IDS systems will invariably give a timestamp for an event, tcpslice (another production of LBNL’s Network Research Group, available for download at ftp://ee.lbl.gov/) comes to the rescue by providing slices (hence the name) of tcpdump data by starting and ending timestamp.
Frequently, the item that triggers an IDS alert will be preceded by other information of interest to the IDS analyst. For instance, TCP port 1524 is registered for use by the ingreslock service, but in practice it is almost never used. Thus, some IDS systems will flag successful inbound connections to that port as suspicious. Quite often, this will be evidence of a successful intrusion into the system. However, more than likely, it wasn’t the ingreslock service itself that was compromised, but a vulnerable service running on another port, and the ingreslock port had simply been hijacked as a back-door method of access by the intruder.
In this scenario, it is critical to determine the actual service that was exploited so that appropriate action can be taken, not only on the compromised system, but on other potentially vulnerable systems in the organization. It is well-known that once a system has been hacked by exploiting a particular vulnerability, other systems in the institution may also be attacked in the same manner—the oversight that led to the first intrusion could indicate that additional systems are vulnerable in the same way.
Tcpslice lets us select a section of a capture file, perhaps from 10 minutes before the event to 10 minutes after, to reduce the amount of data that the IDS analyst needs to examine. Tcpslice takes a raw tcpdump file (created with the -w option) and the starting and ending times of the data to extract, and it creates another raw tcpdump file as output. The times can be specified in a variety of formats, including raw Unix format, and the more natural, human-readable timestamps.
Tcpslice is much faster than might be assumed, because it uses an intelligent algorithm to determine the slices to cut out of the trace file. It computes an assumed offset into the trace file by examining the first and last packet of the trace file, then refines the guess extremely rapidly. For instance, suppose the trace file contains data from 12 noon to 3 P.M., a period of three hours. If tcpslice were asked to retrieve data starting at 1 P.M., it would read the first packet and determine its timestamp (about 12 noon), and the last packet (3 P.M.), then estimate that the 1 P.M. data would be a third of the way into the file, and read the packet at this offset. If, due to variations in traffic patterns, this packet doesn’t match the timestamp, a revised estimate using the reduced range is used to quickly zero in on the starting point of interest. At that point, a sequential search and copy to the output file occurs until the ending timestamp is passed or the end of the trace file is reached. In practice, only a handful of probes into the file are generally necessary.
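The search strategy described above resembles an interpolation search. The following Python sketch shows the idea on a sorted list of packet timestamps held in memory; this is a simplifying assumption, since tcpslice itself probes offsets within the capture file, and the function name find_start_index is ours rather than anything taken from the tcpslice source.

```python
from typing import Sequence, Tuple


def find_start_index(timestamps: Sequence[float], target: float) -> Tuple[int, int]:
    """Return (index of the first timestamp >= target, number of probes).

    Hypothetical sketch: timestamps must be sorted in ascending order.
    """
    probes = 0
    if not timestamps or timestamps[-1] < target:
        return len(timestamps), probes      # nothing at or after the target
    if timestamps[0] >= target:
        return 0, probes                    # the whole trace qualifies
    lo, hi = 0, len(timestamps) - 1
    # Invariant: timestamps[lo] < target <= timestamps[hi]
    while hi - lo > 1:
        span = timestamps[hi] - timestamps[lo]
        fraction = (target - timestamps[lo]) / span
        guess = lo + int(fraction * (hi - lo))
        guess = min(max(guess, lo + 1), hi - 1)   # stay strictly inside the range
        probes += 1
        if timestamps[guess] < target:
            lo = guess                      # target lies later in the trace
        else:
            hi = guess                      # target lies at or before the guess
    return hi, probes


# A trace from 12 noon to 3 P.M. with one packet per second (seconds since midnight).
trace = [12 * 3600 + i for i in range(3 * 3600 + 1)]
index, probes = find_start_index(trace, 13 * 3600)
print(index)   # 3600: one third of the way into the trace
```

Once the starting index is known, the remaining work is the sequential copy described above, which continues until the ending timestamp is passed.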
Despite tcpdump’s extensive capabilities, it is perhaps most useful for analyzing the types of network-level attacks we examined in Chapter 3. The logical streams that make up the presentation and application layer attacks we looked at in Chapter 4 are crowded out by the network and transport layer data that tcpdump also displays. In these cases, we want to ignore such information and actually see the data bytes that were transferred via the virtual circuit.
Tcpflow is designed to do just that for TCP. Unfortunately, no similar open source package exists for UDP or ICMP traffic. Despite this limitation, tcpflow (available at www.circlemud.org/~jelson/software/tcpflow/) is an important tool for analyzing traffic, as it reconstructs the data streams of the TCP virtual circuit and stores each stream (or flow, in tcpflow parlance) in a separate file for IDS analysis. Tcpflow understands the TCP protocol and reconstructs the transmitted data, regardless of retransmissions and out-of-order delivery. However, tcpflow does not currently handle fragmented packets, a shortcoming that no doubt will be overcome in time. Additionally, as was noted in Chapter 3, there are ambiguities in the TCP specification that tcpflow does not attempt to disambiguate. Instead, it simply uses a default policy.
Tcpflow stores each data stream (flow) of the TCP conversation in a separate file, named with the source and destination IP addresses and ports. Each side of the conversation is stored in a separate file. For example, the telnet connection that we examined earlier (in the “Tcpdump Output Format” section of the chapter) would create two files named 192.168.003.003.03252-192.168.003.099.00023 and 192.168.003.099.00023-192.168.003.003.03252. The first file would contain the traffic sent from the telnet client on the connecting host to the telnet service on the receiving host. The second file would contain the traffic sourced from the telnet server back to the telnet client. The data in the files is not interleaved, as we would expect in a typical interactive session, but rather, all the data for each side of the connection is placed sequentially in its file. In many cases, the sequence of events is fairly obvious, given a knowledge of the protocols involved, but in case of any questions, the packet timestamps in the tcpdump file can resolve any timing issues.
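The naming convention can be reproduced with a few lines of Python, which is handy when a script needs to locate the flow files for a connection reported by an IDS. This is a sketch inferred from the two example filenames above (each octet zero-padded to three digits, each port to five); the helper name flow_filename is hypothetical and is not part of tcpflow.

```python
def flow_filename(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """Build a tcpflow-style filename for one direction of a connection.

    Hypothetical helper; the format is inferred from the examples in the text.
    """
    def pad_ip(ip: str) -> str:
        # Zero-pad every octet to three digits: 192.168.3.3 -> 192.168.003.003
        return ".".join(f"{int(octet):03d}" for octet in ip.split("."))

    return f"{pad_ip(src_ip)}.{src_port:05d}-{pad_ip(dst_ip)}.{dst_port:05d}"


# The telnet connection from the text: client 192.168.3.3:3252, server 192.168.3.99:23.
client_to_server = flow_filename("192.168.3.3", 3252, "192.168.3.99", 23)
server_to_client = flow_filename("192.168.3.99", 23, "192.168.3.3", 3252)
print(client_to_server)   # 192.168.003.003.03252-192.168.003.099.00023
print(server_to_client)   # 192.168.003.099.00023-192.168.003.003.03252
```

Swapping the source and destination arguments yields the file for the opposite direction of the same conversation.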
Tcpflow uses some of the same options as tcpdump. The man page gives the syntax as follows:
tcpflow [-chpsv] [-bmax_bytes] [-ddebug_level] [-fmax_fds] [-iiface] [-rfile] [expression]
The most useful options are the following:
We’ve now seen how tcpdump can be used to create a bulk data capture, how tcpslice can create slices from this bulk trace that contain data of interest, and how tcpflow can extract the data streams from the captured data. But suppose that the TCP connection we are interested in started in one bulk capture file but continued into another bulk capture file. Tcpflow cannot deal with input from two different files—to deal with this situation, we need a program that can paste multiple tcpdump files together.
Enter tcpjoin (www.algonet.se/~nitzer/tcpjoin/). Tcpjoin accepts two tcpdump files, merges them into a single tcpdump file ordered by packet timestamp, and writes the result to standard output, where it can be redirected to a file. (Tcpjoin also accepts a -w flag to specify an output file.)
The steps to create a tcpflow of a connection that spans several bulk trace files are demonstrated by this example:
# Extract the data we're interested in from the trace files into temporary files.
tcpdump -s 1500 -r trace.1 -w tempfile.1 'host foo and host bar'
tcpdump -s 1500 -r trace.2 -w tempfile.2 'host foo and host bar'
# Paste the two files together
tcpjoin tempfile.1 tempfile.2 >mergefile.1
# Now run tcpflow on the merged file (-r reads from a file rather than an interface)
tcpflow -r mergefile.1
In this example, tcpslice could have been used to paste the files together, as the timestamps of the second file are after those of the first file. However, tcpjoin also merges files with interleaved timestamps. This can happen at high-traffic sites that devote two interfaces to capturing traffic, one inbound and the other outbound. To generate one complete tcpdump format file from such captures, the functionality of tcpjoin is necessary.
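Merging captures with interleaved timestamps is a classic merge of sorted sequences. The Python sketch below illustrates the idea with packets modeled as (timestamp, description) tuples; that representation and the function name merge_by_timestamp are our own assumptions for illustration and say nothing about how tcpjoin is actually implemented.

```python
import heapq
from operator import itemgetter
from typing import Iterable, List, Tuple

Packet = Tuple[float, str]   # (timestamp, description): a stand-in for a real packet


def merge_by_timestamp(*captures: Iterable[Packet]) -> List[Packet]:
    """Merge captures that are each already sorted by timestamp.

    Hypothetical sketch of the merge step only, not tcpjoin's own code.
    """
    return list(heapq.merge(*captures, key=itemgetter(0)))


# One interface captures inbound traffic, the other outbound.
inbound = [(1.0, "in-a"), (3.0, "in-b"), (5.0, "in-c")]
outbound = [(2.0, "out-a"), (4.0, "out-b")]

for timestamp, description in merge_by_timestamp(inbound, outbound):
    print(timestamp, description)
```

Because each input is already in time order, the merge needs only a single pass over the inputs, and packets with equal timestamps keep the order of the files as given.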
There is also another program, tcpdmerge, available on the Internet at various places (although its home page appears to have disappeared), which has similar features and functionality.
A working knowledge of the tcpdump family of tools is essential for the IDS analyst. These tools and others (see www.tcpdump.org/related.html for tcpdump friends and relatives) provide important analysis features that are used every day to foil wily hackers.
We’ve now explored the basics of networking, the types of exploits to expect, and some of the ancillary tools used for IDS applications. We’re now ready to embark on our exploration of specific IDS offerings.