Section 9.3. Internetworking | Linux for Programmers and Users


9.3. Internetworking

For a collection of LANs and WANs to be able to route information among themselves, they must agree upon a networkwide addressing and routing scheme. This large-scale interconnection of different networks is known as internetworking. Any group of two or more networks connected together across administrative boundaries may properly be called "an internet." However, the largest and best-known such network has become known as "the Internet."


Universities, large corporations, government offices, and military sites all have computers that are part of the Internet, which are generally linked together by high-speed data links. The largest of these computer systems are joined together to form what is known as the backbone of the Internet. Other smaller establishments link their LANs to the backbone via gateways.


9.3.1. Packet Switching

Today's digital computer networks are packet-switched networks. When one node on the network sends a message to another node, the message is split up into small packets, each of which can be routed independently (switched) through the network.

These packets contain special information that allows them to be recombined at the destination. They also contain information for routing purposes, including the address of the source and destination nodes. The combined set of protocols is called the Transmission Control Protocol and Internet Protocol (TCP/IP) protocol suite. Linux interprocess communication (IPC) uses TCP/IP to allow Linux processes on different machines to talk to each other.
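The splitting and recombining described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how TCP/IP is actually implemented: the dictionary layout, the field names, and the 8-byte payload size are assumptions made for the example, and the two addresses are simply borrowed from the UT Dallas hosts listed later in this section.

```python
import random

def split_into_packets(message: bytes, source: str, destination: str, payload_size: int = 8):
    """Split a message into small packets that each carry routing and reassembly data."""
    packets = []
    for sequence, offset in enumerate(range(0, len(message), payload_size)):
        packets.append({
            "source": source,            # address of the sending node
            "destination": destination,  # address of the receiving node
            "sequence": sequence,        # position of this piece in the message
            "payload": message[offset:offset + payload_size],
        })
    return packets

def reassemble(packets):
    """Recombine packets at the destination, whatever order they arrived in."""
    ordered = sorted(packets, key=lambda packet: packet["sequence"])
    return b"".join(packet["payload"] for packet in ordered)

message = b"Packets may take different routes."
packets = split_into_packets(message, "129.110.42.1", "129.110.43.128")
random.shuffle(packets)  # simulate independent routing changing the arrival order
print(reassemble(packets) == message)
```

The sequence number stands in for the "special information" that lets the destination put the pieces back in order.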

9.3.2. IP Addresses

Hosts on the Internet, as well as many private internets, also use TCP/IP to send data. While it is most popularly implemented on Ethernet networks, TCP/IP can also be used on other types of media. This makes it useful for connecting different types of networks, because not all computers are connected by Ethernet. For example, some LANs may use the IBM Token Ring system. The IP addressing system therefore uses a hardware-independent labeling scheme; the bridges, routers, and gateways transmit messages based purely on their destination IP address. The IP address is mapped to a physical hardware address only when the message reaches the destination host's LAN. Thus the computer sending the message does not need to understand hardware-specific information of the computer where the message is to be sent.

The IP addressing mechanism works the same whether or not you actually connect your computers to the Internet. When it sets up a LAN that is to be part of the Internet, an organization must get a unique address range assigned to its computers, a process we will see later.

9.3.2.1. IP (IPv4) Addresses

The most common version of the Internet Protocol (IP) in use today is still version 4. In it, an IP address is a 32-bit value that is written as 4 dot-separated numbers, each number representing 8 of the 32 bits of the address. Because each part represents an 8-bit value, the maximum value it may have is 255. Here's how this form of the IP address looks:

192.127.63.141
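To see how the four dot-separated numbers make up a single 32-bit value, here is a small Python sketch. It packs the parts by hand and checks the result against the standard library's ipaddress module; the variable names are just for illustration.

```python
import ipaddress

address = ipaddress.IPv4Address("192.127.63.141")

# Pack the four 8-bit parts into one 32-bit integer, most significant part first.
octets = [int(part) for part in str(address).split(".")]
value = 0
for octet in octets:
    value = (value << 8) | octet

print(value)                  # the address as a single 32-bit number
print(f"{value:#010x}")       # the same value in hexadecimal
print(value == int(address))  # the library agrees with the manual packing
```

Each pair of hexadecimal digits in the second line of output corresponds to one of the four decimal parts.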

However, due to the explosive growth of the Internet, the seemingly endless supply of 32-bit addresses is quickly being used up. The day will come when this version of IP can no longer supply enough Internet addresses to satisfy the demand. Even when that day comes, local networks may continue to use IPv4 internally, as do the examples in this chapter after the next section.

9.3.2.2. IPv6 Addresses

In the early 1990s, it became clear that a new generation of IP that allowed for many more addresses would be necessary. Work began to define IPng (IP next generation), and a formal proposal for version 6 of the Internet Protocol was released in 1995.

IPv6 specifies 128-bit addresses. Although the two protocols use addresses of different lengths, both protocols can be used on the same network. This is necessary because the Internet is far too large to coordinate a "cut-over" to a new protocol at a single moment in time. A smooth transition to a new addressing scheme requires the ability to evolve to it gradually rather than to require that we all wake up one day using the new protocol.


IP packets (of both versions) specify a version in the first 4 bits of the packet. Therefore, a computer that "speaks" IPv6 can still recognize and handle an IPv4 packet (if it is configured for both protocols). This allows the two protocols to coexist on the same network. The older machines can be upgraded to IPv6 as implementations become available or as the system administrators have the opportunity to upgrade, without requiring it all to happen simultaneously.
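Reading the version field takes only a bit shift. The following Python sketch assumes the raw bytes of a packet are already in hand; the function name and the two sample byte strings are made up for the example (0x45 is a typical first byte of an IPv4 header, 0x60 of an IPv6 header).

```python
def ip_version(packet: bytes) -> int:
    """Return the IP version stored in the top 4 bits of the first byte."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0] >> 4

# Only the first byte matters here; the rest of each header is omitted.
ipv4_header_start = bytes([0x45, 0x00])
ipv6_header_start = bytes([0x60, 0x00])

print(ip_version(ipv4_header_start))
print(ip_version(ipv6_header_start))
```

A host configured for both protocols could use a check like this to decide which handler receives the packet.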

IPv6 addresses are expressed in hexadecimal format (rather than the decimal values used in IPv4), requiring two hex digits for each 8 bits, and are delimited every 16 bits by a colon (rather than a period). An IPv6 address looks like this:

C07F:3F8D:F11B:5810:014D:2208:BFFD:1B3D

In practice, many IP addresses have 8-bit or even 16-bit portions that are zero, and IPv6 also allows for dropping leading zero values as well as eliminating contiguous 16-bit values of zero. So you can actually wind up with much shorter addresses!
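Python's standard ipaddress module applies these shortening rules, so it is a convenient way to see them in action. The first address below is the one shown above; the second is an assumption chosen for illustration (it lies in the 2001:db8 range reserved for documentation) because it contains a long run of zeros.

```python
import ipaddress

# The address from the text: only one leading zero can be dropped.
full = ipaddress.IPv6Address("C07F:3F8D:F11B:5810:014D:2208:BFFD:1B3D")
print(full.compressed)

# An illustrative address with many zero groups: the run collapses to "::".
sparse = ipaddress.IPv6Address("2001:0DB8:0000:0000:0000:0000:0000:0001")
print(sparse.compressed)
print(sparse.exploded)  # back to the full eight-group form
```

Note that the module prints hexadecimal digits in lowercase; the two spellings denote the same address.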

In addition to the addressing changes, IPv6 also provides improvements in routing and automatic configuration. While IPv6 is not currently in wide use, vendors are implementing and testing the new protocol. Over the next few years, IPv6 will be deployed across the Internet. If all goes well, people will not even notice. For more information on IPv6, visit the web site at:

http://www.ipv6.org

9.3.3. Naming

These numeric addresses are not very convenient for humans to use to access remote computers. Humans are much more used to naming things (people, pets, and cars). So we have taken to naming our computers as well.

When a hostname is assigned to a particular computer, a correlation can be established between its name and its numeric IP address. This way, a user can type the computer's name to reference it, and the software can translate this name to an IP address automatically.

The mapping of IP addresses to local host names is kept by the LAN's system administrator in a file called "/etc/hosts". To show you what this looks like, here's a small section of the file from UT Dallas:

129.110.41.1     manmax03
129.110.42.1     csservr2
129.110.43.2     ncube01
129.110.43.128   vanguard
129.110.43.129   jupiter
129.110.66.8     neocortex
129.110.102.10   corvette
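The name-to-address translation can be sketched as a simple table lookup. The Python below parses text in the same layout as the file above (an address followed by one or more names); the function name is hypothetical, and real resolver software does considerably more than this.

```python
HOSTS_TEXT = """\
129.110.41.1     manmax03
129.110.42.1     csservr2
129.110.43.2     ncube01
129.110.43.128   vanguard
129.110.43.129   jupiter
129.110.66.8     neocortex
129.110.102.10   corvette
"""

def parse_hosts(text):
    """Build a host-name-to-IP-address table from hosts-file text."""
    name_to_address = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address listed for a name.
            name_to_address.setdefault(name, address)
    return name_to_address

hosts = parse_hosts(HOSTS_TEXT)
print(hosts["vanguard"])
```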


9.3.4. Routing

The Internet Protocol performs two kinds of routing: static and dynamic. Static routing information is kept in the file "/etc/route" and is of the form: "You may get to the destination DEST via the gateway GATE with X hops." When a router has to forward a message, it can use the information in this file to determine the best route. Dynamic routing information is shared between hosts via the "/etc/routed" or "/etc/gated" daemons.^[1] These programs constantly update their local routing tables based on information gleaned from network traffic, and periodically share their information with other neighboring daemons.

^[1] A daemon is a fancy term for a constantly running background process that is normally started when the system is booted.
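The static routing rule quoted above ("destination DEST via gateway GATE with X hops") can be illustrated with a small lookup that prefers the route with the fewest hops. This is a sketch only: the tuple layout is an assumption rather than the actual file format, and the host names are reused from the hosts file purely as placeholders; their roles as gateways are hypothetical.

```python
# Hypothetical static routes: (destination, gateway, hops).
ROUTES = [
    ("ncube01", "csservr2", 3),
    ("ncube01", "vanguard", 1),
    ("corvette", "csservr2", 2),
]

def best_route(destination, routes):
    """Return the (destination, gateway, hops) entry with the fewest hops, or None."""
    candidates = [route for route in routes if route[0] == destination]
    if not candidates:
        return None  # no static route is known for this destination
    return min(candidates, key=lambda route: route[2])

print(best_route("ncube01", ROUTES))
print(best_route("jupiter", ROUTES))
```

Dynamic routing would amount to daemons updating a table like ROUTES while the system runs.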

9.3.5. Security

It has long been known that the only way to keep any computer secure is to put it in a locked room and not connect it to a network. For most applications, however, this is not practical. The network is not only one of the most useful additions to computing, but also one of the most dangerous. The network provides a path for data to enter and leave the system, but makes no judgment about the use of the data. Therefore, it is up to the users or managers of the system to make sure "not just anyone" can gain access to the system or the data being transferred to or from the system.

9.3.5.1. User Authentication

Authentication of a user is the process of establishing that the user is who he or she claims to be. The most common user authentication mechanism is logging in with a username and password. When you access a remote machine across a network, you generally must re-authenticate in order to gain access. Alternatively, a group of systems may allow access from any other system in the group, on the assumption that you already authenticated yourself to (logged in on) the first system.

Several of the Linux networking utilities that I discuss later in this chapter allow a user with accounts on several machines to execute a command on one of these machines from another. For example, I have an account on both the "csservr2" and "vanguard" machines at UT Dallas. To execute the date command on the vanguard machine from the csservr2 machine I can use the rsh utility (discussed later in this chapter) as follows:

$ rsh vanguard date    ...execute date on vanguard.

The interesting thing about rsh and a few other utilities is that they are able to obtain a shell on the remote host without requiring a password. They can do this because of a Linux facility called machine equivalence. If you create a file called ".rhosts" in your home directory that contains a list of host names, then any user with the same username as your own may log into your account from these hosts without supplying a password. Both my "csservr2" and "vanguard" home directories contain a file ".rhosts" that includes the following lines:

csservr2.utdallas.edu
vanguard.utdallas.edu


In the ".rhosts" file we must use the "official" hostname, which includes the Internet domain name (discussed later in this chapter).

This allows me to execute remote commands from either computer without any hassle. Linux also allows a system administrator to list globally equivalent machines in the file "/etc/hosts.equiv". Global equivalence means that any user on the listed machines can log into the local host without a password. For example, if the "vanguard" "/etc/hosts.equiv" file contained the lines:

csservr2.utdallas.edu
vanguard.utdallas.edu

then any user on "csservr2" could log into "vanguard" or execute a remote command on it without a password. Global equivalence should be used with great care (if ever).

9.3.5.2. Data Encryption

Even when a user can provide authentication information to a remote system, another problem remains: a third party may eavesdrop on the network connection and capture the username and password the user provides to log in on the remote system. Most login information, and certainly command and input data, is sent unencrypted in packets across the network. This is often referred to as sending the data "in the clear" or as "clear text."

Depending on the type and extent of the network in question, this may or may not be an issue. But consider the wireless network in your local coffee shop, where your e-mail client connects to your ISP's e-mail server and passes your username and password in clear text to check your mail. Anyone on that network could conceivably copy that data and log in as you.

For this reason, many of the common network data-transfer commands now also come in a secure version that encrypts all data being sent to or received from a remote host. This is accomplished through the use of the Open Secure Shell (OpenSSH). OpenSSH relies on the cryptographic library from OpenSSL, an open-source implementation of the Secure Sockets Layer (SSL) protocol. SSL was originally developed by Netscape Communications to provide secure web connections, so that sensitive information like credit card numbers could be sent across the Internet without fear of copying. Now the same kind of encryption that started the electronic commerce revolution can be used to keep your own data on a remote host safe from prying eyes. For more information on OpenSSH and OpenSSL, visit their web sites:

http://www.openssh.org
http://www.openssl.org

9.3.6. Ports and Common Services

When one network host talks to another, it does so via a set of numbered ports. Every host supports some standard ports for common uses and allows application programs to create other ports for transient communication. The file "/etc/services" contains a list of the standard ports. Here's a snippet from the UT Dallas file:

echo        7/tcp
discard     9/tcp             sink null
systat      11/tcp            users
daytime     13/tcp
ftp-data    20/tcp
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp            mail
time        37/tcp            timeserver
rlp         39/udp            resource
whois       43/tcp
finger      79/tcp
sunrpc      111/tcp
exec        512/tcp
login       513/tcp
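Each entry in this file gives a service name, a port number and protocol separated by a slash, and optionally one or more aliases. As a rough sketch of how a program might turn such text into a lookup table, the Python below parses a few of the entries; the function name is hypothetical, and real programs would normally use a library call rather than parse the file themselves.

```python
SERVICES_TEXT = """\
echo        7/tcp
discard     9/tcp             sink null
ftp         21/tcp
telnet      23/tcp
smtp        25/tcp            mail
rlp         39/udp            resource
"""

def parse_services(text):
    """Map (service name, protocol) to a port number, including any aliases."""
    ports = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, port_and_protocol, *aliases = line.split()
        port, protocol = port_and_protocol.split("/")
        for service in (name, *aliases):
            ports[(service, protocol)] = int(port)
    return ports

services = parse_services(SERVICES_TEXT)
print(services[("telnet", "tcp")])
print(services[("mail", "tcp")])  # an alias for smtp
```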

The description of the telnet utility later in this chapter provides some examples where I connected to some of these standard ports.

9.3.7. Network Programming

The Linux interprocess communication (IPC) facility allows you to communicate with other programs at a known IP address and port. This facility is described in Chapter 12, "Systems Programming."
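As a small preview, here is a sketch of two endpoints talking over a known IP address and port. It uses Python's socket module rather than the C interface covered in Chapter 12, and it makes several assumptions for the sake of a self-contained example: both ends run in one program, the address is the loopback address 127.0.0.1, and the operating system picks a free port.

```python
import socket
import threading

def serve_once(server: socket.socket) -> None:
    """Accept one connection and send back whatever it receives, uppercased."""
    connection, _ = server.accept()
    with connection:
        data = connection.recv(1024)
        connection.sendall(data.upper())

# The server binds to an address and port and waits for a client.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 asks the system for any free port
server.listen(1)
host, port = server.getsockname()

worker = threading.Thread(target=serve_once, args=(server,))
worker.start()

# The client connects to that same address and port.
with socket.create_connection((host, port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)
```

On a real network the client would run on a different machine and use the server's IP address and an agreed-upon port.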