As a consequence of the Internet’s explosion of popularity, most networking protocols that competed with IP, the Internet Protocol, have been relegated to niche status or have been made to work with IP (such as NBT, which is NetBIOS running over TCP/IP). The IP family of protocols has been designed to provide a range of services in a layered approach, from low-level networking functions that touch the hardware, through data routing, reliability, and scaling capabilities, to application-level transparency.
As this book is focused on intrusion detection, we will, in later chapters, examine the security implications of both the lower-level communication protocols and the applications that depend on them. For now, it is important to take note of the trust relationships between the various components. Unfortunately, IPv4, the version of the Internet Protocol in use today, was designed with scant attention to security. Many of its mechanisms implicitly trust the information they receive from other hosts, leaving open the possibility of subversion by malicious parties. Depending on the protocol involved, misleading information can be supplied to trusting hosts, which could allow for intrusion into those hosts.
Conceptually, the various functions that network hardware and software must perform can be understood as a series of layers of functionality, with each layer built on and depending on the proper functioning of the layers below it. Each additional layer brings greater functionality and a higher level of abstraction. This layered approach also gives applications a great deal of independence, because they do not need to be concerned with implementation details.
The Open Systems Interconnection (OSI) reference model describes such a framework for understanding these layers. It was originally developed to guide the implementation of the OSI network suite (of which some implementations have been developed), but due to the overwhelming success of TCP/IP, its main use currently is as an educational tool.
The OSI reference model is a conceptual model that provides a framework for specifying and identifying the various network functions. There are seven layers within the OSI model that serve to differentiate the various hardware and software functions that a network provides. Each layer depends on the proper functioning of the layer immediately below it to provide its raw functionality, which is enhanced and then passed to the next higher layer. Status messages may be communicated up or down the various layers, although each layer only communicates with its immediate neighbors. As each layer is solely dependent on the layer below it for lower-level services, higher layers are shielded from system, hardware, and software implementation details. This leads to independence from specific systems and interoperability with many vendors’ offerings.
The OSI model is very useful for developing and understanding a “big picture” view of network processes because it provides this independence. However, the model does not claim to exactly match any specific network technology. Each layer must encapsulate the data it receives into a standard format for the adjacent layer, thus incurring an overhead, and in the name of efficiency, the lines between two or more layers may be blurred in a real implementation. Some layers may not have specific counterparts in an actual network implementation. However, as a tool for understanding the considerations involved in networking, the OSI model is unparalleled.
The OSI model’s seven layers, from the physical hardware level up to the actual network application that users interact with, are as follows: the physical layer, the data-link layer, the network layer, the transport layer, the session layer, the presentation layer, and the application layer. It is a rare individual indeed who has expertise in all of the layers.
The physical layer consists of the physical wiring that is used to connect the different systems on a network. To ensure interoperability between various vendor implementations, strict standards must be employed to ensure compatibility. These standards describe not only the electrical characteristics of the network cabling, but also the physical jacks, connectors, taps, and so on, all of which must be physically compatible with each other. At this lowest level, a failure is catastrophic for network communications. An Ethernet adapter, for example, conforms to the physical and electrical standards of the physical layer.
The data-link layer consists of the transmission standards that are used to move data over the physical layer. These typically consist of the bit-level framing specifications of the transmission standard: how the bits are grouped, addressed, and checked for errors, rather than the actual voltages and signals, which belong to the physical layer. If this layer detects a problem with the physical layer (usually identified by a partial or total failure to propagate the signal), it must attempt to retransmit the information or notify the network layer. On an Ethernet network, this layer packages the data into Ethernet frames, which are then delivered to their destination via the physical layer.
The network layer is responsible for the addressing, packaging, and delivery of data. It will format the data as appropriate for the data-link layer, which in turn delivers it via the physical layer. Typically, the network layer does not provide reliability mechanisms, such as error checking, but leaves this task to the transport layer. The partitioning of these two functions has proven to be useful, as some forms of network traffic don’t have the same need for reliability as others. For our purposes, IP operates at this layer.
The transport layer provides a mechanism for reliably transporting the data from its source to its destination. Built on top of the network layer, it provides the reliability that many network services require by using such strategies as checksumming packets and requesting retransmission if errors are detected.
Some network services may not avail themselves of all this functionality in the interest of efficiency. For instance, streaming audio can often get by at a somewhat degraded level with an occasional frame being dropped. If retransmission were expected of this traffic, a noticeable delay might be experienced. On the other hand, e-mail transmissions do not require the same timeliness of delivery, and can tolerate moderate delays in the interest of reliability. The reliability characteristics of TCP fall into this layer.
The session layer is responsible for establishing communication sessions between various higher-level communicating programs, processes, or users. This layer creates a “virtual circuit” that communicating processes on network-enabled systems employ to transfer information. On a network with many systems, the data is multiplexed on the wire, but this layer creates the illusion of a dedicated circuit between the endpoints.
The presentation layer provides a consistent interface to application programs that are using network services, and it is often termed an API (application programming interface). All programs using a particular API can be assured of a consistent programming interface. The session layer is thus not burdened with the responsibility of interpreting or formatting the data, but can simply act to manage the session.
One commonly used network-based API that resides at this layer is the X protocol, originally developed by MIT, which provides a consistent interface to application programs that use its services to manage graphical interfaces for Unix hosts.
The application layer represents the high-level, abstracted network protocols that are directly used by application programs. Protocols such as HTTP, SMTP, FTP, and POP operate at this layer. The application layer is not concerned with what the application program itself does with the data—it simply provides the data to the application for processing and delivers generated traffic to the lower levels. The processing, because it does not interact with the network, is not in the scope of the OSI reference model.
Although the OSI model is useful for understanding and describing the network functions that apply during communication, the IP suite of protocols does not conform to the model described by OSI. It was developed independently of OSI, and the IP designers used the simpler conceptual model shown in Figure 2-1. Generally, TCP/IP networking is split into four layers: the link layer, the internet layer, the transport layer, and the application layer.
Figure 2-1: The OSI reference model compared with TCP/IP
IP (the Internet Protocol) provides a basic framework for the transport of traffic from source to destination on the Internet. (See RFC 1180 for a TCP/IP tutorial; www.faqs.org/rfcs/rfc1180.html.) By design, it functions as an encapsulation (wrapper) and transport mechanism for this traffic. There is a header checksum to validate that the packet header, but not necessarily the data, has not been corrupted in transit. However, IP provides no facilities for retransmission or error correction.
IP is responsible for the routing and delivery of packets on the Internet. If a packet is lost in transmission, as, for instance, during a period of congested activity, IP will not, of itself, trigger a retransmission. Instead, it is up to the higher-level protocol to detect that the packet has not been delivered and to take corrective action. IP will also be the transport mechanism for that corrective action, so it is possible that the first corrective action taken may, in turn, not be delivered. The higher-level protocols built on top of IP are expected to implement the necessary error-correction measures for that eventuality. TCP, for instance, has mechanisms to trigger multiple retransmissions. Only after repeated failures to communicate does TCP decide that the transport layer (IP) is fatally broken and, in this case, it provides a notification to the application requesting the communication.
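To make the header checksum concrete, here is a minimal sketch, in Python, of the ones’-complement algorithm IP uses (RFC 1071), applied to a hand-built 20-byte header; the addresses are illustrative placeholders. Note the verification property: a header summed together with its own checksum folds to zero, which is exactly the test a receiver performs.

    import socket
    import struct

    def ip_checksum(header: bytes) -> int:
        """Ones'-complement sum of 16-bit words, folded and inverted (RFC 1071)."""
        if len(header) % 2:
            header += b"\x00"                     # pad odd-length input
        total = 0
        for i in range(0, len(header), 2):
            total += (header[i] << 8) | header[i + 1]
        while total >> 16:                        # fold carries back into the sum
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    # A hand-built 20-byte header with the checksum field zeroed.
    header = struct.pack("!BBHHHBBH4s4s",
                         (4 << 4) | 5,            # version 4, header length 5 words
                         0, 20, 0x1C46, 0,        # TOS, total length, ID, flags/offset
                         64, 6, 0,                # TTL, protocol (6 = TCP), checksum 0
                         socket.inet_aton("192.0.2.1"),
                         socket.inet_aton("192.0.2.2"))
    checksum = ip_checksum(header)
    # A receiver sums the header it received, checksum included; a clean header folds to 0.
    assert ip_checksum(header[:10] + struct.pack("!H", checksum) + header[12:]) == 0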
In the layered approach we have been discussing, each layer is embedded (or encapsulated) in a wrapper from the next lower-level protocol. On an Ethernet network, for instance, the actual wire protocol consists of Ethernet frames, which are addressed using 48-bit hardware addresses for the source and destination Ethernet adapters, and which include a cyclic redundancy check (CRC) code to ensure reliability. Embedded in the data portion of such a frame is an IP packet, which encodes its source and destination using 32-bit IP addresses and also contains a checksum for reliability. Embedded as the data portion of the IP packet is a protocol-specific packet. In the case of TCP or UDP, this will contain source and destination ports, along with yet another checksum. The data portion of this packet will consist of the application or presentation layer data. Thus, each successive layer’s message is treated as data by the layer below it, which provides an extensible framework (see Figure 2-2).
Figure 2-2: Encapsulation and data flow in an IP network
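The nesting just described can be sketched in a few lines of Python. The headers below are deliberately skeletal (real TCP and IP headers carry many more fields, and the Ethernet CRC is appended by the adapter); the point is only that each layer’s message rides in the data portion of the layer below it.

    import struct

    app_data = b"GET / HTTP/1.0\r\n\r\n"          # application-layer message

    # Transport layer: a skeletal TCP header (ports only; a real header also
    # carries sequence numbers, flags, and a checksum).
    segment = struct.pack("!HH", 49152, 80) + app_data

    # Network layer: a skeletal IP header (addresses only, for illustration).
    packet = struct.pack("!4s4s",
                         bytes([192, 0, 2, 1]),
                         bytes([192, 0, 2, 2])) + segment

    # Data-link layer: destination MAC, source MAC, EtherType 0x0800 (IPv4).
    # On the wire, the adapter appends the CRC after the data.
    frame = struct.pack("!6s6sH", b"\xaa" * 6, b"\xbb" * 6, 0x0800) + packet

    # Each layer's message is simply the data portion of the layer below it.
    assert app_data in segment and segment in packet and packet in frame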
Although having multiple checksums may seem redundant, the design actually allows each layer to detect faults emanating from the layer before it, and to take appropriate action. If the implementation employed only one check code, the traffic would need to travel completely from source to destination before any transmission faults could be detected. Using multiple checksums allows for detection of faults on the local Ethernet using the Ethernet CRC, faults in clean packet transmission between connected networks via the IP checksum, and faults in delivery to the final destination by the protocol-specific checksum. Corrective action can thus be taken at the point where the fault occurred. The redundancy contributes to the reliability and efficiency of the Internet, as well as assisting in failure detection.
The standard IP header is defined in RFC 791 and is shown in Figure 2-3. It consists of a minimum of 20 bytes and ranges up to a maximum of 60 bytes. Embedded in the data portion of the IP packet is the protocol-specific packet (such as a TCP or UDP packet), as discussed earlier.
Figure 2-3: Internet datagram header
The header fields include the version and header length, the type of service, the total length, the identification, the flags and fragment offset, the time-to-live, the protocol, the header checksum, and the source and destination addresses.
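A minimal sketch of decoding those fields from the fixed 20-byte portion of the header (options, when present, follow these bytes):

    import socket
    import struct

    def parse_ip_header(raw: bytes) -> dict:
        """Decode the fixed 20-byte portion of an IPv4 header (RFC 791)."""
        (ver_ihl, tos, total_len, ident, flags_frag,
         ttl, proto, csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
        return {
            "version":   ver_ihl >> 4,
            "ihl":       (ver_ihl & 0x0F) * 4,       # header length in bytes (20-60)
            "tos":       tos,
            "total_len": total_len,
            "id":        ident,
            "flags":     flags_frag >> 13,           # reserved, DF, MF bits
            "frag_off":  (flags_frag & 0x1FFF) * 8,  # offset stored in 8-byte units
            "ttl":       ttl,
            "protocol":  proto,                      # 1 = ICMP, 6 = TCP, 17 = UDP
            "checksum":  csum,
            "src":       socket.inet_ntoa(src),
            "dst":       socket.inet_ntoa(dst),
        }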
It was understood by the developers of the IP protocol suite that a sending host might have little or no idea of the characteristics of the physical network through which traffic may be routed, and thus could not adjust packet sizes to fit the requirements of that network. Also, as traffic routing is an adaptive, dynamic process, a packet size appropriate for a known network may not be appropriate for an alternative routing pathway. If the preferred pathway goes down for any reason, routing protocols will attempt to develop alternative pathways, and they may not have the same Maximum Transmission Unit (MTU—the maximum packet size that the media will support).
These considerations drove the development of a packet fragmentation and reassembly process. The decision to fragment a packet is made by a router when the MTU of the next hop is smaller than the packet size. The packet can be flagged to disallow fragmentation, in which case an ICMP error message (“fragmentation needed, but Do Not Fragment bit set”) is sent back to the originating host. Otherwise, the original packet will be split into two or more packets containing the fragments and the regenerated IP header with changes made to the appropriate fields and a recalculated checksum.
Fragmentation could take place several times, as one router may split a packet to match its MTU, then pass the fragments on to another router that may have an even smaller MTU, thus necessitating another fragmentation of the previously fragmented packet. Thus, the receiving host is the most reasonable place to reassemble the packet, although IDSs often perform reassembly as well, in order to examine packet contents.
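The arithmetic of fragmentation can be sketched as follows. Because the fragment offset is stored in 8-byte units, each non-final fragment must carry a multiple of 8 data bytes; this sketch splits a payload accordingly (a real router would also copy and adjust the IP header for each fragment):

    def fragment(payload: bytes, header_len: int, mtu: int):
        """Split an IP payload to fit a smaller next-hop MTU.
        Returns (offset_in_8_byte_units, more_fragments, data) tuples."""
        max_data = (mtu - header_len) // 8 * 8    # round down to an 8-byte multiple
        frags = []
        offset = 0
        while offset < len(payload):
            chunk = payload[offset:offset + max_data]
            more = (offset + len(chunk)) < len(payload)   # More Fragments bit
            frags.append((offset // 8, more, chunk))
            offset += len(chunk)
        return frags

    # A 4000-byte payload behind a 20-byte header on a 1500-byte MTU splits
    # into fragments of 1480, 1480, and 1040 data bytes (offsets 0, 185, 370).
    assert [(o, m, len(d)) for o, m, d in fragment(b"x" * 4000, 20, 1500)] == \
           [(0, True, 1480), (185, True, 1480), (370, False, 1040)]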
Three fields in the IP header are used to support the fragmentation of packets: the identification field, which ties all the fragments of an original packet together; the flags field, whose Do Not Fragment and More Fragments bits control and mark fragmentation; and the fragment offset field, which records where each fragment’s data belongs in the original packet, measured in 8-byte units.
For an unfragmented packet, the fragmentation offset will be set to 0, and the More Fragments bit will be cleared, indicating that the packet is complete.
The receiving system will collect the fragments, identified by the IP ID field, until the last fragment is received, which is signified by a 0 in the More Fragments bit. If there are no holes in the completed packet, the packet is ready for further processing. If a fragment was not received, there will be a hole in the buffer, and the system will have to wait until it is received. If a timeout occurs before IP has received every fragment, the received buffer will be discarded, and depending on the upper-level protocol, the entire packet may be retransmitted, with the possibility of fragmentation occurring again.
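A minimal sketch of the hole-detection logic a receiving host (or an IDS) applies to the fragments sharing one IP ID; real implementations must also handle duplicate and overlapping fragments, which this sketch ignores:

    def reassemble(fragments):
        """fragments: list of (offset_bytes, more_fragments, data) for one IP ID.
        Returns the complete payload, or None while holes remain."""
        if not fragments:
            return None
        fragments = sorted(fragments)
        total = None
        expected = 0
        for offset, more, data in fragments:
            if offset != expected:       # a hole: an earlier fragment is missing
                return None
            expected = offset + len(data)
            if not more:                 # a clear MF bit marks the final fragment
                total = expected
        if total is None or expected != total:
            return None                  # final fragment not yet received
        return b"".join(data for _, _, data in fragments)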
RFC 1122 recommends that the reassembly time be between 60 and 120 seconds, and that an ICMP time-exceeded error be sent to the source host if the timer expires and the first fragment of the datagram has been received. This ICMP message contains the first 64 bits of fragment 0 (or less if the fragment is less than 8 bytes long).
Fragmentation solves the problem of varying frame sizes along the path between communicating hosts, but at a performance penalty, and, as we shall see in Chapter 3, security problems can result from fragments. Fragments have been used to evade firewalls and intrusion-detection systems. Wouldn’t it be nice for systems to be able to determine the maximum size packets that could be transmitted on a link, and thus avoid the overhead of using fragmentation? There is, in fact, a process that determines the MTU between two hosts so that (unless the route changes) packets can be sent without resorting to fragmentation. It’s called Path MTU discovery.
As was mentioned in the previous section, it would be desirable for systems to determine the largest packet size they could use in communications to avoid the overhead of fragmentation. According to RFC 791, all devices talking to IP must support a minimum MTU of 68 bytes, so fragmentation can be avoided by transmitting IP packets of 68 bytes, which allows for an IP header of up to 60 bytes and a fragment size of 8 bytes. Unfortunately, most useful packets will not fit into 68 bytes, so they may need to be fragmented anyway.
However, it is possible, by the use of some trickery, for a host to determine the maximum MTU that a link between two systems will support. The host sends packets as large as its own interface allows, with the Do Not Fragment bit set. Any router along the path that cannot forward such a packet without fragmenting it drops the packet and returns an ICMP “fragmentation needed” error, which, per RFC 1191, includes the MTU of the next hop. The host lowers its packet size accordingly and repeats the process until no further errors are returned.
Of course, this mechanism is not completely reliable. ICMP traffic is dependent on the best-effort delivery resources of IP. Although packet corruption is minimized by the use of a checksum, the message could be sent but dropped somewhere in the network, and the sending host would assume that the Path MTU is larger than it really is. Also, since routing is dynamic, the path could change, and the MTU might increase or decrease as a result. Some sites, for example, may have an emergency low-performance link to the Internet, for use when the main link fails. If this emergency link uses a different type of medium than the normal link, it likely will also have a different MTU. If the MTU is lower, traffic using the link will likely be fragmented, exacerbating the performance problems. Thus, it is important that ICMP traffic not be discarded at a network perimeter to allow these sorts of network issues to be signaled to hosts or internal routers for action.
More information on the process of Path MTU discovery can be found in RFC 1191 (www.faqs.org/rfcs/rfc1191.html).
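On Linux, the kernel performs this discovery automatically and caches the result per route, and a program can read the current estimate. The following sketch assumes Linux (the option values come from <linux/in.h> and are spelled out because not every Python build exposes them as socket-module constants):

    import socket

    # Linux-specific socket option values from <linux/in.h>.
    IP_MTU_DISCOVER = 10
    IP_PMTUDISC_DO = 2      # set DF on all packets; never fragment locally
    IP_MTU = 14             # read the cached path MTU for a connected socket

    def cached_path_mtu(host: str, port: int = 33434) -> int:
        """Return the kernel's current path-MTU estimate toward host.
        The estimate is refined as ICMP 'fragmentation needed' messages
        arrive in response to too-large DF packets on this route."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
            s.connect((host, port))      # selects a route; sends no traffic yet
            return s.getsockopt(socket.IPPROTO_IP, IP_MTU)
        finally:
            s.close()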
TCP, the Transmission Control Protocol, can rightly lay claim to being the crown jewel of the IP protocol suite—it is by far the most widely used protocol, as well as the one that is responsible for carrying the majority of the Internet’s useful content. TCP adds reliable, connection-oriented transport to the best-effort delivery capabilities of IP.
TCP, being embedded within an IP packet, paradoxically must employ the unreliable delivery mechanisms of IP to ensure reliability. Of course, perfect reliability is not possible in the real world, due to hardware, routing, and software failures, but TCP nevertheless achieves a high degree of reliability by employing three distinct, cooperating processes: checksums over the header and data, so that corrupted segments can be detected and discarded; sequence numbers and acknowledgments, so that the receiver can reorder arriving segments and the sender can learn which segments have been delivered; and retransmission timers, so that segments that go unacknowledged are sent again.
Retransmissions may occur multiple times with varying timeouts before TCP decides that the IP layer is hopelessly broken—at that point, it will signal an error to the application.
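The retransmission strategy can be sketched as a stop-and-wait loop with exponential backoff. This is a simplification: real TCP tracks sequence numbers, keeps several segments in flight, and adapts its timers to measured round-trip times. The socket, message, and limits here are placeholders.

    import socket

    def send_with_retransmit(sock, data, addr, max_tries=5, timeout=1.0):
        """Send data and wait for an acknowledgment, retransmitting with
        exponentially growing timeouts until the peer answers or we give up."""
        sock.settimeout(timeout)
        for attempt in range(max_tries):
            sock.sendto(data, addr)
            try:
                ack, _ = sock.recvfrom(1500)
                return ack                    # delivery acknowledged
            except socket.timeout:
                # No acknowledgment in time: back off and try again.
                sock.settimeout(timeout * 2 ** (attempt + 1))
        # Repeated failures: declare the transport broken and tell the caller,
        # just as TCP signals an error to the application at this point.
        raise ConnectionError("peer unreachable after repeated retransmissions")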
Each TCP connection is uniquely identified by four distinct items (a four-tuple): the IP addresses of the two communicating systems and the TCP ports used by each system. This does not mean that two systems cannot communicate using the same service on more than one connection—multiuser systems support multiple telnet connections from the same client. In such a case, the IP addresses will be the same, and the telnet service will be found on the standard TCP port 23. However, the client system will use a different port for each connection, thus keeping the traffic for each connection distinct.
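The four-tuple is easy to illustrate: two telnet sessions from one client to one server differ only in the client’s ephemeral port (the addresses below are placeholders).

    from collections import namedtuple

    Connection = namedtuple("Connection", "src_ip src_port dst_ip dst_port")

    # Two simultaneous telnet sessions between the same pair of hosts:
    # everything matches except the client's ephemeral port, so the
    # connections remain distinct.
    a = Connection("192.0.2.10", 51001, "192.0.2.20", 23)
    b = Connection("192.0.2.10", 51002, "192.0.2.20", 23)
    assert a != b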
Under normal conditions, a listening process must be running on the receiving host to accept and respond to TCP connection requests. If a TCP packet is received that is destined for a port with no listeners, a TCP RST packet will be sent back to the source by the receiving host.
TCP uses 16-bit port numbers, which means that there are 65,536 possible ports. These ports are normally divided into two distinct ranges: ports 0 through 1,023 represent the well-known services and (on Unix systems) can be bound only by the root account. Ports 1,024 through 65,535 are termed ephemeral ports, which user programs can use to provide services or as client ports for establishing connections.
To reliably maintain a connection between two systems requires a well-defined process for session establishment, maintenance, and teardown. This process is summarized below and illustrated in Figure 2-4. The client sends a SYN segment carrying its initial sequence number; the server responds with a SYN/ACK segment that acknowledges the client’s sequence number and supplies its own; and the client completes the three-way handshake with an ACK. Data then flows in both directions, with each segment acknowledged by the peer. To tear the connection down, each side sends a FIN segment, which the other side acknowledges, after which the connection is closed.
Figure 2-4: Creating and tearing down a TCP connection
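For illustration, the handshake can be performed by hand with the Scapy packet-crafting library (this sketch assumes Scapy is installed and the script runs with raw-socket privileges; the address 192.0.2.20 and the ports are placeholders). Normally the operating system kernel performs these steps on the application’s behalf.

    from scapy.all import IP, TCP, sr1, send

    target = IP(dst="192.0.2.20")

    # Step 1: send SYN with our initial sequence number.
    syn = target / TCP(sport=51001, dport=23, flags="S", seq=1000)
    synack = sr1(syn, timeout=2)          # step 2: wait for the SYN/ACK

    if synack is not None:
        flags = int(synack[TCP].flags)
        if (flags & 0x12) == 0x12:        # SYN (0x02) and ACK (0x10) both set
            # Step 3: acknowledge the server's sequence number.
            ack = target / TCP(sport=51001, dport=23, flags="A",
                               seq=synack.ack, ack=synack[TCP].seq + 1)
            send(ack)                     # connection established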
The standard TCP header is defined in RFC 793 and illustrated in Figure 2-5. It consists of a minimum of 20 bytes and a maximum of 60 bytes. The application-specific information, which is delivered to the application program, is located in the data portion of the TCP packet.
Figure 2-5: TCP header
The TCP header consists of the source and destination ports, the sequence and acknowledgment numbers, the data offset (header length), a set of control flags (URG, ACK, PSH, RST, SYN, and FIN), the window size, the checksum, the urgent pointer, and any options.
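A sketch of decoding those fields from the fixed 20-byte portion of the header (options, when present, follow; the ECN flag bits defined after RFC 793 are ignored here):

    import struct

    def parse_tcp_header(raw: bytes) -> dict:
        """Decode the fixed 20-byte portion of a TCP header (RFC 793)."""
        (sport, dport, seq, ack, off_flags,
         window, csum, urg) = struct.unpack("!HHLLHHHH", raw[:20])
        return {
            "src_port":    sport,
            "dst_port":    dport,
            "seq":         seq,
            "ack":         ack,
            "data_offset": (off_flags >> 12) * 4,  # header length in bytes
            "flags":       off_flags & 0x3F,       # URG ACK PSH RST SYN FIN
            "window":      window,
            "checksum":    csum,
            "urgent_ptr":  urg,
        }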
Note: The pseudo-header used to compute UDP and TCP checksums includes the source and destination IP addresses, the protocol number, and the segment length, in addition to the protocol-specific header and data. At the packet’s final destination, the checksum is recomputed using the source and destination addresses obtained from the header of the IP packet that transported the protocol-specific packet. If the checksums agree, then we can have a high degree of confidence that the packet reached the intended destination host, as well as the correct protocol-specific port.
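A sketch of that computation follows. The pseudo-header is prepended only for the purpose of the sum and is never transmitted; the same routine serves TCP (protocol 6) and UDP (protocol 17), with segment covering the protocol header plus data.

    import socket
    import struct

    def transport_checksum(src_ip: str, dst_ip: str, proto: int,
                           segment: bytes) -> int:
        """TCP/UDP checksum over the pseudo-header plus the whole segment."""
        pseudo = struct.pack("!4s4sBBH",
                             socket.inet_aton(src_ip),
                             socket.inet_aton(dst_ip),
                             0, proto, len(segment))
        data = pseudo + segment
        if len(data) % 2:
            data += b"\x00"                        # pad odd-length input
        total = 0
        for i in range(0, len(data), 2):
            total += (data[i] << 8) | data[i + 1]
        while total >> 16:                         # fold carries back in
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF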
UDP, the User Datagram Protocol, is often used by applications that prefer to avoid the overhead of establishing a TCP connection (such as DNS, NFS, TFTP), or those that can tolerate occasional errors in the interest of efficiency (such as streaming audio or video). UDP is given the Internet protocol number of 17 and is defined in RFC 768 (www.faqs.org/rfcs/rfc768.html).
The UDP model is much simpler than that of TCP. There is no session-level error checking or retransmission built into the protocol. The packets do, however, contain a checksum. A receiving host can verify this checksum to ensure that the packet has not been corrupted during transit. The sending host, however, does not receive a protocol-level acknowledgment that the packet was delivered. If any such reliability is needed, it is left to be implemented at the application level.
An example of an application-level, reliable protocol built on UDP is the Trivial File Transfer Protocol, or TFTP (see RFC 1350, www.faqs.org/rfcs/rfc1350.html), which consists of server and client implementations. These two processes exchange crafted UDP packets that contain handshaking information along with the data being transferred. The application programs must handle this handshaking themselves, as well as extracting the data. Contrast this with TCP data transfers, where the application receives only the data bytes and needn’t concern itself with the details of the data transfer.
UDP uses 16-bit port numbers, as does TCP, but UDP and TCP ports are distinctly different. As with TCP, there generally needs to be a listening process at the receiving host to accept and respond to the request. Under normal circumstances, the arrival of a UDP packet destined for a port with no listeners will cause the receiving host to respond with an ICMP “port unreachable” message.
The UDP header (shown in Figure 2-6) consists of four 16-bit fields, totaling 8 bytes in length, that follow immediately after the IP header: the source port, the destination port, the length of the datagram (header plus data), and the checksum.
Figure 2-6: UDP header
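Decoding the UDP header is correspondingly simple; a minimal sketch:

    import struct

    def parse_udp_header(raw: bytes) -> dict:
        """Decode the 8-byte UDP header (RFC 768)."""
        sport, dport, length, csum = struct.unpack("!HHHH", raw[:8])
        return {
            "src_port": sport,
            "dst_port": dport,
            "length":   length,   # header plus data, in bytes
            "checksum": csum,     # 0 means the sender did not compute one
        }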
ICMP, the Internet Control Message Protocol, is the signaling mechanism used in IP networks to communicate error conditions and other control information about network conditions. As IP is a best-effort delivery protocol, failure to deliver a packet to its destination is not considered a network-level error. Protocols or applications need additional mechanisms to ensure reliability. For instance, as was mentioned earlier, TCP employs an acknowledgment protocol, along with retransmission of lost packets after a suitable time.
ICMP is given the protocol identifier of 1 in the standard IP packet, and it is documented in RFC 792 (www.faqs.org/rfcs/rfc792.html). All conforming implementations of IP must include ICMP, as it is integral to signaling error conditions on the network.
ICMP messages are of interest both to end-hosts and intermediate routers, although some messages are generally only sent by routers. It is never permissible for an ICMP error message to be generated as the result of receiving an ICMP error message—this avoids the infinite recursion of ICMP message generation (see RFC 1122, www.faqs.org/rfcs/rfc1122.html). It is also forbidden to send an ICMP message as the result of a datagram that references multiple hosts, such as a broadcast or multicast message, or upon receipt of a noninitial fragment (see the “IP Fragmentation” section earlier in the chapter). These restrictions are designed to prevent broadcast storms.
Broadly, there are two classes of ICMP messages: ICMP error messages and ICMP query/response messages. Each ICMP error message will be constructed from the Internet header, and (at least) the first eight bytes of the IP packet payload (which generally is data from the header of the lower-level protocol, such as TCP or UDP).
The ICMP packet format is illustrated in Figure 2-7. Note that it includes a checksum for reliability.
Figure 2-7: ICMP packet format
These are the fields in the ICMP packet: the type, which identifies the general class of the message; the code, which further qualifies the type; the checksum, computed over the entire ICMP message; and a message-specific remainder (for echo messages, an identifier and a sequence number followed by optional data).
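A sketch of building one of the query messages, the echo request that underlies the familiar ping utility (type 8, code 0); the checksum is computed over the entire ICMP message:

    import struct

    def build_icmp_echo(ident: int, seq: int, payload: bytes) -> bytes:
        """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
        def csum(data: bytes) -> int:
            if len(data) % 2:
                data += b"\x00"
            total = sum((data[i] << 8) | data[i + 1]
                        for i in range(0, len(data), 2))
            while total >> 16:
                total = (total & 0xFFFF) + (total >> 16)
            return ~total & 0xFFFF
        header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum zeroed
        checksum = csum(header + payload)
        return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload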
IP addresses of 32 bits (or the upcoming 128-bit IPv6 standard) are logical addresses only. This means that the network adapter itself has no preconceived notion of what its IP address is, but rather is assigned an address by a software mechanism. This is an important distinction, since it would be difficult to replace a failed network adapter, or to change the IP address of an existing adapter, unless the hardware address were decoupled from the logical address.
Ethernet hardware addresses (called MAC addresses), by way of example, are 48 bits in length and are theoretically unique throughout the world. Vendors are assigned generous blocks of addresses out of this space so that each individual network adapter will have a unique address. To send a packet to another Ethernet card requires that the sender know the MAC address of the target system. At the IP level, though, the only information that the sender has is the IP address, which is not, as was mentioned, tied to a specific Ethernet card. Thus, a mechanism is needed to provide a mapping from the IP address to the hardware address of the card. ARP provides such a mechanism.
Note: Although MAC addresses are supposed to be unique, in practice, some off-brand vendors have been known to use address space not allocated to them, or to randomly address cards. In the unlikely event that two network cards on the same subnet are utilizing the same MAC address, neither system will likely be able to successfully communicate with others.
When a system wishes to send a packet to a system whose hardware address is unknown, it will send out a network broadcast message that, in effect, asks, “Who, on this network, has this IP address?” The system that is using that IP address will respond with a message saying, “I have this address, and here’s my hardware address.” In order to increase the efficiency of the network, each system keeps a table of recently used hardware and IP addresses in memory, called the ARP cache. Typically, these recent addresses will expire after 20 minutes, in which case another ARP request will be made upon the next access to the system.
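The cache logic can be sketched in a few lines (the 20-minute lifetime follows the typical figure given above; real implementations vary):

    import time

    class ArpCache:
        """A simplified ARP cache: maps an IP address to (MAC, time learned)
        and expires entries after 20 minutes."""
        TTL = 20 * 60   # seconds

        def __init__(self):
            self._entries = {}

        def learn(self, ip: str, mac: str):
            self._entries[ip] = (mac, time.monotonic())

        def lookup(self, ip: str):
            entry = self._entries.get(ip)
            if entry is None:
                return None                 # caller must broadcast an ARP query
            mac, learned = entry
            if time.monotonic() - learned > self.TTL:
                del self._entries[ip]       # stale: force a fresh ARP request
                return None
            return mac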
Diskless workstations suffer from the opposite problem—they know their hardware address but do not know their IP address. The Reverse Address Resolution Protocol (RARP) is used to broadcast the request, and a RARP server will send a message back indicating the host’s IP address. However, RARP has been mostly superseded by more advanced protocols, such as the Bootstrap Protocol (BOOTP) and the Dynamic Host Configuration Protocol (DHCP).
ARP was designed with the flexibility to handle different media with varying hardware address lengths, so length fields are included to allow differing field sizes. Figure 2-8 illustrates the ARP packet format.
Figure 2-8: ARP packet format
The ARP packet contains the following fields: the hardware type, the protocol type, the hardware and protocol address lengths, the operation code (request or reply), and the sender’s and target’s hardware and protocol addresses.
When an ARP or RARP packet is sent requesting information, the unknown fields are set to 0 by the sender and are filled in by the responding host.
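A sketch of building an Ethernet/IPv4 ARP request, with the unknown target hardware address zero-filled as described; the MAC and IP values a caller would supply are placeholders:

    import socket
    import struct

    def build_arp_request(sender_mac: bytes, sender_ip: str,
                          target_ip: str) -> bytes:
        """Build a 28-byte ARP request for Ethernet/IPv4 (RFC 826)."""
        return struct.pack("!HHBBH6s4s6s4s",
            1,                          # hardware type: Ethernet
            0x0800,                     # protocol type: IPv4
            6,                          # hardware address length
            4,                          # protocol address length
            1,                          # opcode: 1 = request, 2 = reply
            sender_mac,
            socket.inet_aton(sender_ip),
            b"\x00" * 6,                # target MAC: unknown, zero-filled
            socket.inet_aton(target_ip))

    # Example: "who has 192.0.2.20?" from a placeholder adapter address.
    pkt = build_arp_request(b"\xaa\xbb\xcc\x00\x01\x02", "192.0.2.10", "192.0.2.20")
    assert len(pkt) == 28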
Except when two communicating systems are part of the same subnet, they will not have any direct way to communicate with each other. Rather, they must forward their traffic through a router, which will forward the traffic on their behalf. The router, in turn, if not on the same subnet as the ultimate destination, will forward the traffic to another router, until the ultimate destination is reached. Each hop a packet takes results in the packet being subtly changed before it is forwarded: the time-to-live (TTL) field is decremented, the header checksum is recomputed to reflect that change, and the frame is re-addressed at the data-link layer with the hardware addresses of the current and next hops.
Eventually, through this forwarding process, a packet will reach a router that has knowledge of the ultimate destination, and that will forward the packet to its destination. Of course, it is also possible that the destination system doesn’t exist, in which case the packet will be dropped and an ICMP “destination unreachable” error packet will be sent back to the originator.
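The per-hop header changes can be sketched directly (operating on a raw IPv4 header held in a bytearray; the link-layer re-addressing is omitted):

    import struct

    def forward_hop(packet: bytearray) -> bool:
        """Apply the per-hop IP header changes a router makes: decrement the
        TTL and recompute the header checksum. Returns False if the TTL has
        expired, in which case the router drops the packet and sends an ICMP
        time-exceeded message back to the source."""
        if packet[8] <= 1:                 # byte 8 of the IP header is the TTL
            return False
        packet[8] -= 1
        packet[10:12] = b"\x00\x00"        # zero the old checksum
        ihl = (packet[0] & 0x0F) * 4       # header length in bytes
        total = sum((packet[i] << 8) | packet[i + 1] for i in range(0, ihl, 2))
        while total >> 16:
            total = (total & 0xFFFF) + (total >> 16)
        packet[10:12] = struct.pack("!H", ~total & 0xFFFF)
        return True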
Let’s examine, in a little more detail, what happens to a packet as it traverses from host A to host B through a few routers. Suppose an application program on host A wants to initiate a TCP communication to an application on host B. We will assume that host A already knows the IP address of host B. (If this is not the case, the sequence of events described in the section on DNS applies.) Host A consults its routing table, determines that host B is not on the local subnet, and selects its default router as the next hop; it uses ARP to obtain the router’s hardware address and transmits an Ethernet frame containing the IP packet. Each router along the way decrements the TTL, recomputes the header checksum, consults its own routing table, and re-addresses the frame for the next hop. Finally, the last router uses ARP to find host B’s hardware address and delivers the frame, and host B’s IP layer passes the enclosed TCP segment up the stack, where the connection handshake described earlier proceeds.
From this simple (!) example of transferring one packet between two hosts, it can be seen that Internet communication is a complex process that requires all cooperating hosts to adhere to defined standards. As Chapter 3 will show, though, there is enough wiggle-room in some of these areas for malicious activities to take place.
DNS, the Domain Name System, is a worldwide distributed database whose most important function is to translate from the human-readable system names that we are all familiar with (such as www.osborne.com), into the simpler (but more rigid) 32-bit IP addresses. One benefit this database provides is that it is easy for web sites to migrate to different hosting companies, because only the DNS records need to be changed to reflect the changed IP address—all references to the host name will follow. As we shall see in later chapters, however, if the DNS records are modified or forged, an attacker can redirect traffic to an entirely different host than the expected legitimate host.
DNS was designed in 1984 to solve an escalating problem with the host-name-to-IP-address mapping system. Previously, all hosts needed to maintain a table (a file called hosts) with periodic updates. As the Internet grew, this host table became unwieldy and unmaintainable. DNS solves this problem by delegating the name service information to the owners of the domain, who maintain a table only of their own systems or subdomains.
DNS uses the concept of domain name space, which can be represented as an inverted tree, as shown in Figure 2-9. Each node on the tree represents a domain, and everything below a node is part of that domain, until the final leaf node is reached, which represents an individually named system. For example, in Figure 2-9 the system gene is part of the bar domain, which in turn is part of the .com domain, which, along with the other “top-level” domains, is a subdomain of the Internet root domain.
Figure 2-9: Domain name space
To resolve a name to its IP address, a host will examine its host table (which still exists in most systems in a legacy form), and if the name is not found, the host will forward the request to its name server. (The order of this search is configurable on some systems.) The name server, if it has recent knowledge of the name in question (all DNS records time out to ensure that the data isn’t stale), will immediately respond with the IP address. If the name server is unfamiliar with the domain name, it will ask a server higher up the tree, which in turn will continue going up the tree until an answer is received. Under some circumstances, name resolution can take many seconds, thus appearing to the user as if the system has frozen. The recursive nature of the queries is one reason for these potential delays. Also, since DNS typically communicates over UDP (on port 53), it is possible that the packets could be lost, so multiple attempts are often made to resolve a name.
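Application programs rarely speak the DNS wire protocol themselves; they call a resolver that performs the search just described. A minimal sketch using the system resolver via Python’s socket module:

    import socket

    def resolve(name: str):
        """Resolve a host name to its IPv4 addresses using the system's
        resolver (the host table first, then the configured DNS servers,
        per the lookup order described above)."""
        try:
            infos = socket.getaddrinfo(name, None, socket.AF_INET,
                                       socket.SOCK_STREAM)
        except socket.gaierror:
            return []                   # the name could not be resolved
        return sorted({info[4][0] for info in infos})

    # Example, using the host named in the text:
    # print(resolve("www.osborne.com"))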
We’ve explored the protocols and processes that drive the modern Internet: IP, the transport mechanism that delivers the traffic; TCP, for the establishment of virtual circuits for two-way communication; UDP, for lighter-weight transport of data without the overhead of creating a connection; and ICMP, for transporting error and status conditions between hosts. We’ve also seen some of the infrastructure mechanisms, and examined how all of these mechanisms tie together to provide reliable delivery of data throughout the Internet. We’ve also discussed the complexity of the network infrastructure. It is a monument to the designers of the Internet that it functions so well. Complex systems, however, are often subject to damage or abuse by malicious parties.
In Chapters 3, 4, and 5, we will build on the foundations presented here, and delve into low-level network abuses, as well as specific application protocol abuses. We will also examine common programming errors that allow for attacks targeting network-aware programs. Get ready! The fun’s just beginning…