Section 6.1. About the Internet | Computer Security Basics

6.1. About the Internet

The Internet may have been invented by former Vice President Al Gore, but it has since taken on a life of its own.^[*] When many people think of the Internet, the first thing that comes to mind is often the World Wide Web; but the Web is only part of the story, if a highly visible one. This works to the advantage of the attacker, who crafts exploits based on less familiar parts of the Internet in order to shut down the parts more readily seen.

^[*] A flurry of press clippings to the contrary, Gore appears never to have made the claim that he invented the Internet, except perhaps to point out that he promoted technical progress during his terms in the House of Representatives and as the Vice President. The story is a ruse, and a well-executed one. A bogus RFC, RFC 3000, was even circulated, in which Gore purportedly detailed his involvement. It appeared before the RFC counter got that high. The real RFC 3000 is far less entertaining.

In truth, the Internet is composed of many different connection schemes called protocols, all of which transmit over a common system of packetized communication called Transmission Control Protocol/Internet Protocol (TCP/IP). Among these are the following:

File Transfer Protocol (FTP): The File Transfer Protocol allows rapid, reliable transfer of data files between repositories, called FTP servers, and between computers with FTP client software installed, called FTP clients.
Hypertext Transfer Protocol (HTTP): The Hypertext Transfer Protocol allows users to access pages of text that are marked up using a special format called the Hypertext Markup Language (HTML). HTML tags are inserted into a web document to indicate the desired font, color, and position of text, and to link to different web sites, files, or pages. This allows an author to create the displayed page one time and to have it display more or less the same on any platform.
Simple Mail Transfer Protocol (SMTP): The SMTP service provides a standardized method of electronic mail transmission.
Domain Name System (DNS): The Domain Name System resolves the easy-to-read names familiar to Internet users, such as www.oreilly.com, to the Internet Protocol addresses that actually guide information around the network, such as 172.16.32.15.
Dynamic Host Configuration Protocol (DHCP): When requested, DHCP automatically provides an Internet Protocol (IP) address, such as 172.16.32.15, to a computer on a local area network. An IP address is required to communicate with other network devices that exist beyond the immediate proximity of the computer requesting the address.

Each of these is very useful to the Internet as we know it. But each is subject to attacks that can cause no end of problems. This chapter will describe several of these protocols, what they do, and how they can be subverted, and will provide some insight into what can be done about it.

6.1.1. History of Data and Voice Communications

When you call a friend on the phone, you imagine that the telephone company forms a direct connection between you and your friend. You don't care if the connection is always there, or if the phone company just whips it up to order, provided it is there when you need it. This method of connection is called circuit-switched. Circuit switching solved a ton of problems in the early days of the phone network because it saved the telcos from having to run a wire from each subscriber to every other subscriber. Instead, a single wire could go from a subscriber to a central office (CO), from which the call could be switched down a trunk to the CO serving the recipient, and from there to the desired recipient.

When you use a modem to connect to a school, an office, or a dial-up Internet provider, you are using circuit switching. Provided you are willing to wait out the time required to initiate the call, and are willing to take the risk that sometimes there will be more callers on the network than there are circuits (as happens when you hear the rapid busy signal), you pay for the connection only when you use it.

The opposite of circuit switching is to use a dedicated circuit. In a dedicated circuit, the wire you use is yours alone. You use it anytime you want, with no competition, and you pay for it whether you use it or not.

Dedicated and switched circuits were the basis of all wired and wireless communications from the beginning of electronic communications (about 1876) until a time just after World War II, when network supervisors decided to do something about a lingering problem with both systems. The problem was one of continuous underutilization, and the solution was a system of forced sharing between many users, called multiplexing. In a voice conversation, there are pauses (at least for the majority of people). During these intervals of silence, the two callers need not be connected. The line can do something else while two people are saying nothing to each other, as long as the parties are hooked up again before either of them has the next thing to say.

Over the years, multiplexing has taken many forms. Sometimes the contents of the line were shifted in frequency, so that one conversation theoretically could not be heard by the other. This system was called frequency division multiplexing (FDM). Other systems seized the line and rotated it between a number of conversations, returning it to you before you noticed it was gone. This was called time division multiplexing (TDM). A few systems capitalized on the nature of the phone systems themselves to create extra, phantom circuits by borrowing one of the two lines of your circuit and one of the two lines of a neighbor's, and allowing other users to conduct conversations over them.
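The rotation at the heart of time division multiplexing can be sketched in a few lines of Python. This is only an illustration, assuming each conversation is a list of samples and that an idle slot is marked with an underscore; the function names are hypothetical and do not correspond to any real telephony system.

```python
from itertools import zip_longest

def tdm_multiplex(channels, idle="_"):
    """Interleave samples from several channels onto one line.

    Each pass over the channels is one frame; every channel gets
    exactly one time slot per frame, filled with `idle` if it has
    nothing to send.
    """
    frames = zip_longest(*channels, fillvalue=idle)
    return [sample for frame in frames for sample in frame]

def tdm_demultiplex(line, channel_count):
    """Recover each channel by taking every channel_count-th slot."""
    return [line[i::channel_count] for i in range(channel_count)]

# Three hypothetical conversations sharing one line.
calls = [list("HELLO"), list("BYE"), list("TEST")]
line = tdm_multiplex(calls)
recovered = tdm_demultiplex(line, len(calls))
```

Because every channel owns a fixed slot in each frame, the receiver needs no labels on the samples; position alone says which conversation a sample belongs to.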

All of these systems increased the capability and capacity of systems, but they did little to overcome another key drawback: how to handle retransmission of missed or scrambled message segments, or missed or garbled speech? If a message was mangled in transmission, it was necessary to determine that fact, then to notify the sender, and await retransmission. All of this took time and resources. And while it was all getting sorted out and resent, current messages often had to wait.

6.1.2. Packets, Addresses, and Ports

To increase the reliability of communications, a new paradigm developed called packet switching. In a packet-switched network, messages are chopped up into chunks of limited length, called packets. A packet-switched network gives each packet an individual address label and then shoots it out onto the network, trusting that each packet will eventually make it to its destination, although the packets of a single message may travel by different routes. The simplest way to send data in this fashion is the User Datagram Protocol (UDP), which delivers each packet on a best-effort basis and makes no promise about delivery or order. Other mechanisms number the packets to ensure that all of them are accounted for and assembled in the proper order at the other end of the link, recognize which packets have been corrupted or lost, and arrange retransmission of replacements. This is the job of the Transmission Control Protocol (TCP). TCP and UDP together work with the IP addressing system to facilitate all the services that the Internet provides to users. Additional protocols and services are also at work behind the scenes to keep the network running smoothly.
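The idea of chopping a message into labeled packets and putting them back together can be sketched as follows. This is a minimal illustration, not the real header layout of any Internet protocol: the tuple format (sequence number, total count, payload) and the function names are assumptions made for the example.

```python
import random

def packetize(message: bytes, size: int = 8):
    """Split a message into (sequence_number, total, payload) packets."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]

def reassemble(packets):
    """Return (message, missing); message is None if any packet is missing."""
    if not packets:
        return b"", []
    total = packets[0][1]
    received = {seq: payload for seq, _, payload in packets}
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(total)), []

message = b"Packets may travel by different routes."
packets = packetize(message)
random.shuffle(packets)              # simulate out-of-order arrival
rebuilt, missing = reassemble(packets)

# Simulate losing packet 2 somewhere along the way.
partial = [p for p in packets if p[0] != 2]
incomplete, lost = reassemble(partial)
```

The sequence numbers do double duty: they restore the original order no matter how the packets arrive, and any gap in the numbering tells the receiver exactly which packets to ask for again.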

So how do packets help make networks and the Internet reliable? First, a packet travels over the circuit quickly. If it goes missing, its replacement can be retransmitted without taking a long time. When dealing with larger messages, the entire message has to traverse the circuit before it can be checked for errors in transmission. If an error is detected, the entire message must be transmitted again instead of just the errant packet or packets. Second, because it is understood that packets may take one of several possible routes to their destination, there is a possibility that packets may actually spend part of their journey traveling in parallel, rather than waiting constantly for the packets in front of them to move along. This parallel movement may increase overall transmission speed.
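The saving from retransmitting only the errant packet can be made concrete with a short sketch. It assumes each packet carries a CRC-32 checksum computed with Python's standard zlib module; the dictionary layout and function names are hypothetical, chosen for the illustration rather than taken from any real protocol.

```python
import zlib

def make_packets(message: bytes, size: int = 8):
    """Attach a sequence number and a CRC-32 checksum to each chunk."""
    packets = []
    for seq, start in enumerate(range(0, len(message), size)):
        data = message[start:start + size]
        packets.append({"seq": seq, "data": data, "crc": zlib.crc32(data)})
    return packets

def damaged(packets):
    """Sequence numbers whose payload no longer matches its checksum."""
    return [p["seq"] for p in packets if zlib.crc32(p["data"]) != p["crc"]]

message = b"If an error is detected, resend only the errant packet."
sent = make_packets(message)

# Copy the packets and flip one bit in packet 3 to simulate line noise.
received = [dict(p) for p in sent]
corrupted = bytearray(received[3]["data"])
corrupted[0] ^= 0x01
received[3]["data"] = bytes(corrupted)

to_resend = damaged(received)
resent_bytes = sum(len(sent[seq]["data"]) for seq in to_resend)
```

Here only the single damaged packet has to cross the circuit a second time, whereas an unpacketized message would have to be sent again in full.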

Packet transmission revolutionized communications, first remaking data transmission networks, then reforming the methods used for voice traffic, and finally, video. Packets have been with us since the early ARPANET, the precursor to today's Internet. They are likely to be with us through the foreseeable future. In fact, the successor to packet communications, asynchronous transfer mode (ATM), the system on which most of today's heavy communications backbones are built, refines packets further. Instead of using packets of roughly even length, ATM uses cells that are precisely 53 bytes long (a 5-byte header followed by 48 bytes of payload).

There is much more that can be said about digital communications and packets, but we'll go into only two more concepts: addresses and ports.

An Internet Protocol address describes a location on the network. Technically speaking, a network address, or an IP address in the case of the Internet, is a logical address. Users don't often need to concern themselves with this logical network address because the network has services that correlate such addresses with more friendly names, such as those that end in .com or .org. Beneath this layer of friendly conversion, however, logical addresses hold sway. The use of logical addresses allows the network to route packets to the correct part of the network, without concerning itself with where in the world that location is. In fact, it is somewhat difficult to know by inspection where a packet actually goes based on its IP address alone, a concept that lends the Internet its "no-borders" characteristic.

IP addresses generally take the form of four numbers, separated by periods, in which each number is between 0 and 255. For instance, the IP address 192.168.32.12 leads to a destination on your own network that your administrator has designated. Between the action of the conversion layer and the protocols that work with the IP address, the packets that compose the messages get delivered.
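The dotted form is simply a readable way of writing one 32-bit number, with each of the four parts occupying one byte. The sketch below uses Python's standard ipaddress module to show this; the helper name is_valid_ipv4 is an assumption made for the example.

```python
import ipaddress

addr = ipaddress.IPv4Address("192.168.32.12")

# Split the dotted form into its four one-byte parts.
octets = [int(part) for part in str(addr).split(".")]

# The same address as a single 32-bit integer, built two ways.
as_int = int(addr)
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

# Addresses in 192.168.0.0/16 are reserved for private networks.
private = addr.is_private

def is_valid_ipv4(text: str) -> bool:
    """Return True if text parses as a dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False
```

A part larger than 255 cannot fit in one byte, which is why is_valid_ipv4("192.168.32.256") reports the address as invalid.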

Just as there is a conversion layer between the common name and the IP address, there is a conversion system that connects the IP address to a specific network device. This system is called the Address Resolution Protocol (ARP), and the address of each device is called the physical address, or in network speak, the media access control (MAC) address. Think of the MAC address as you would the vehicle identification number (VIN) that uniquely identifies every automobile that is manufactured, and the IP or logical address as the license plate, or the identification that is required to operate the vehicle on public roadways.
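Conceptually, the result of address resolution is a lookup table from logical to physical addresses. The sketch below models such a table as a plain dictionary; the MAC addresses are made-up values and the function names are hypothetical, so treat it as a picture of the idea rather than of any operating system's actual ARP implementation.

```python
# Hypothetical cache: IP address (license plate) -> MAC address (VIN).
arp_cache = {
    "192.168.32.12": "00:1a:2b:3c:4d:5e",
    "192.168.32.1": "00:1a:2b:3c:4d:01",
}

def resolve(ip, cache):
    """Return the cached MAC address for ip, or None if unknown.

    A real host that finds no entry would ask the local network
    who owns the address and wait for a reply.
    """
    return cache.get(ip)

def learn(ip, mac, cache):
    """Record a reply in the cache, normalizing the MAC to lowercase."""
    cache[ip] = mac.lower()

known = resolve("192.168.32.12", arp_cache)
unknown = resolve("192.168.32.99", arp_cache)
learn("192.168.32.99", "00:1A:2B:3C:4D:99", arp_cache)
learned = resolve("192.168.32.99", arp_cache)
```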

There is one more component to addresses, and in some ways it is the most important of all, at least from a security perspective. It is important that packets are identified by function, that is, that their addresses contain some clue as to what they are intended to do. This allows them to be switched to the correct location by inspection (like reading the license plate) without having to open them up and examine their contents. For this reason, it is not uncommon for packets to have the same IP address, and to use the port number to state the packet function, so that the packet will be routed properly once it makes it to the device. This is similar to the various destinations used for things that come into a house. It is traditional, for instance, for mail to be delivered to the mail box or slot, for the newspaper to be placed or thrown onto the porch, and for water and gas to come into the house through the appropriate pipes. (To awaken and find natural gas coming out of the kitchen faucet, water pouring out of the oven, the mail in the flower bed, and the paper through the front room window is generally a sign that it will be a bad day.) In the physical world, getting these delivery functions right is the job of the respective couriers and plumbers. In the network world, it is the job of the port number.

Each IP address comes with 65,536 port numbers (0 through 65535), which can be thought of as cubbyholes, or individual message slots in a hotel lobby. Different types of network traffic use different ports. This is how the network keeps things straight, or knows to deliver mail in one place and newspapers in another.
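The cubbyhole picture can be sketched in Python. A port number is a 16-bit value, and the table below lists the conventional server ports for the protocols named earlier in this section; the function names and the three sample packets are assumptions made for the illustration.

```python
PORT_COUNT = 2 ** 16   # a port number is 16 bits wide: 0 through 65535

# Conventional server ports for the protocols discussed in this section.
WELL_KNOWN_PORTS = {
    21: "FTP",
    25: "SMTP",
    53: "DNS",
    67: "DHCP",
    80: "HTTP",
}

def service_for(port: int) -> str:
    """Name the service conventionally found on a port."""
    if not 0 <= port < PORT_COUNT:
        raise ValueError(f"port out of range: {port}")
    return WELL_KNOWN_PORTS.get(port, "not listed in this sketch")

def deliver(packets):
    """Sort arriving payloads into per-port slots, like hotel cubbyholes."""
    slots = {}
    for dst_ip, dst_port, payload in packets:
        slots.setdefault(dst_port, []).append(payload)
    return slots

# Three hypothetical packets for the same IP address, different functions.
arriving = [
    ("192.168.32.12", 80, b"GET / HTTP/1.0"),
    ("192.168.32.12", 25, b"HELO example"),
    ("192.168.32.12", 80, b"GET /index.html HTTP/1.0"),
]
slots = deliver(arriving)
```

All three packets share one IP address, yet the port number alone is enough to hand web requests to one program and mail to another.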

Why do you need to understand this multitier system of addressing? Simple: most network attacks in some way involve falsely manipulating or replacing the IP address, MAC address, or port. In fact, one of the most important tools used today for network safety, the firewall, is based almost entirely on recognizing suspicious or invalid combinations of addresses and ports.
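A toy version of the address-and-port check a firewall performs might look like the following. It is a minimal sketch under stated assumptions: the rule format, the decide function, and the sample networks are all hypothetical, and real firewalls consider much more than a source address and a destination port.

```python
import ipaddress

# Hypothetical rule list: (action, source network, destination port).
# A port of None matches any port. The first matching rule wins.
RULES = [
    ("allow", ipaddress.ip_network("192.168.32.0/24"), 80),
    ("allow", ipaddress.ip_network("192.168.32.0/24"), 25),
    ("deny", ipaddress.ip_network("0.0.0.0/0"), None),
]

def decide(src_ip: str, dst_port: int, rules=RULES) -> str:
    """Return 'allow' or 'deny' for a packet's source address and port."""
    src = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if src in network and (port is None or port == dst_port):
            return action
    return "deny"   # nothing matched: refuse by default

web_from_inside = decide("192.168.32.12", 80)
telnet_from_inside = decide("192.168.32.12", 23)
web_from_outside = decide("10.0.0.5", 80)
```

With these rules, web and mail traffic from the local network is allowed, while any other combination of address and port is refused.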