Introduction to Internet Names | IP Addressing Fundamentals

There was a time when names were somewhat arbitrary (if not outright capricious) and less-than-universal in their use. In the very beginning of the Internet, there were only hundreds of computers, and each one could be accessed using a mnemonic that was locally created by network administrators. This led to the inevitable: one destination computer that was known by a myriad of names that varied by location. Although there was merit in having such mnemonics, the administrators agreed that consistency of nomenclature would be even better. What was originally a convenience for a small cadre of network and host administrators evolved into a system that became the single most enabling mechanism that opened up the benefits of the Internet to nontechnical users.

This original system was simply a loose agreement among network/host administrators on a set of mnemonic names that correlated to numeric IP addresses. This agreement formalized an existing practice. These administrators had already realized that their user communities either couldn't or wouldn't use numeric addresses to access specific hosts. Thus, virtually all of them had deployed a scripted function that allowed their users to use mnemonic names for host access. Their scripts checked those names against a small but ever-growing list of known host addresses and provided an automatic translation between names and numbers.

Because each of these translation tables was maintained locally, there wasn't much consistency from network to network with respect to host naming conventions. The names in a local table didn't need to match the names chosen by the remote hosts' own administrators; they could be selected somewhat arbitrarily. For example, a name could be the local users' nickname for a host instead of the host's official name. Needless to say, this had the potential to create great confusion. More importantly, given the highly decentralized nature of maintaining such local lists, the probability of disseminating information about moves, adds, deletes, or other changes quickly and evenly throughout the ARPANET community was slim to none. Thus, you could reasonably expect connectivity problems caused by this attempt at simplifying host access for users. Life is full of delicious irony!

hosts.txt

Fortunately, these sage administrators communicated with each other and often published documents requesting comments (ARPANET's RFCs) from other administrators. Together, they realized that it was critical to standardize on a format for naming hosts as well as on a set of names for known hosts. With future growth in mind, they further agreed to have this list managed centrally so that it would remain as up-to-date and evenly distributed as possible. This mechanism became known as the hosts.txt file. The Stanford Research Institute maintained the hosts.txt file via its Network Information Center (NIC) and distributed it to known hosts using the File Transfer Protocol (FTP). RFCs 952 and 953, published in October 1985, spelled out the details of this update mechanism and procedure.

NOTE

A vestige of the original hosts.txt file, and of the mind-set that led to its creation, remains evident even today. Most operating systems support the creation of a file that correlates IP addresses with mnemonics that are of local significance only. For example, UNIX systems contain an /etc/hosts file to support this function.
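The translation those early scripts performed, and that a local hosts file still supports, amounts to a lookup keyed by name. The Python sketch below parses text in the usual hosts-file layout (an address followed by a canonical name and optional aliases, with `#` starting a comment) and resolves names against it. The helper names `parse_hosts` and `resolve`, the sample entries, and the rule that the first entry for a name wins are illustrative assumptions, not a description of any particular operating system's resolver.

```python
from typing import Dict, Optional


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every canonical name and alias in hosts-style text to its address.

    Hypothetical helper: names are lowercased because host names are
    case-insensitive, and the first entry seen for a name is kept.
    """
    table: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)  # first entry for a name wins
    return table


def resolve(table: Dict[str, str], name: str) -> Optional[str]:
    """Translate a mnemonic name into its numeric address, or None if unknown."""
    return table.get(name.lower())


# Hypothetical local table using documentation-only addresses (RFC 5737).
SAMPLE_HOSTS = """
# address      canonical name     aliases
192.0.2.10     sri-nic.example    nic
192.0.2.20     klingon
"""

table = parse_hosts(SAMPLE_HOSTS)
print(resolve(table, "NIC"))     # 192.0.2.10
print(resolve(table, "vulcan"))  # None
```

Because a table like this is purely local, nothing stops another site's copy from mapping `klingon` to a different address, which is exactly the consistency problem this section describes.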

The idea was simple enough: maintain a list of all known hosts, as well as certain other data that would be useful in deciphering the list. This list would be nothing more than a flat text file that was pushed to all administrators in the network on a regular basis. Updating local lists was up to each administrator. Failure to do so meant that their users did not have access to the most current information about known hosts.

Although it might sound foolish to try to track all known hosts in a single flat file, remember that the practice dates back to RFC 606, published in December 1973. At that time the ARPANET was still running its original Network Control Program (NCP); IP, and with it the 32-bit IPv4 address scheme, had yet to be devised. Host addresses were only 8 bits long, so mathematically there couldn't be more than 2^8, or 256, hosts.

Problems with Flatness

The hosts.txt file approach worked well for a couple of years. However, several inherent problems threatened the usefulness of this mechanism as the Internet continued to grow:

Collision of host names: Locally defined host names create the possibility of a single character string being used to identify two or more different end systems. This is known as name collision, and it can result in an inconsistent translation of names to numeric addresses.
A limited number of mnemonically useful names: A finite number of useful and meaningful character strings can be used to name hosts. Imagine, for example, that only one host in the world could have the name klingon. Absent a hierarchical naming system, the first Trekkie who gave his or her computer that name would prevent anyone else in the world from using it.
Timeliness and uniformity of implementing updates: The "official list" couldn't be updated in real time, so a batch-oriented approach was required. In other words, changes, deletes, and additions to the hosts.txt file would be batched and then sent out periodically. Depending on how frequently the updated list was sent out, a host could be online but still inaccessible. An additional time lag could be experienced if network or host administrators did not promptly process the newly received hosts.txt file.
The lack of a name dispute resolution mechanism and authority: Some mechanism is needed to reconcile cases in which two or more people select the same name for their box. In the days of the hosts.txt file, there was no way to resolve such disputes aside from embracing a first-come, first-served philosophy. But even that approach wasn't perfect, because updates were done in a batched manner.
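To make name collision concrete, the sketch below merges several sites' flat name tables and reports every name that maps to more than one address, then shows a first-come, first-served registry that simply refuses later claims. The site names, the `find_collisions` and `register` helpers, and the addresses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def find_collisions(site_tables: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return each name that different sites translate to different addresses."""
    addresses_by_name: Dict[str, Set[str]] = defaultdict(set)
    for table in site_tables.values():
        for name, address in table.items():
            addresses_by_name[name].add(address)
    return {name: addrs for name, addrs in addresses_by_name.items() if len(addrs) > 1}


def register(registry: Dict[str, str], name: str, address: str) -> bool:
    """Flat, first-come, first-served registration; a conflicting later claim is refused."""
    if name in registry:
        return registry[name] == address  # accepted only if it repeats the original claim
    registry[name] = address
    return True


# Hypothetical local tables from two sites, using documentation-only addresses.
sites = {
    "site-a": {"klingon": "192.0.2.20", "nic": "192.0.2.10"},
    "site-b": {"klingon": "198.51.100.7", "nic": "192.0.2.10"},
}
print({name: sorted(addrs) for name, addrs in find_collisions(sites).items()})
# {'klingon': ['192.0.2.20', '198.51.100.7']}

registry: Dict[str, str] = {}
print(register(registry, "klingon", "192.0.2.20"))    # True
print(register(registry, "klingon", "198.51.100.7"))  # False
```

The `register` check only works if every claim passes through a single registry in real time; with batched hosts.txt updates, two claimants could each reasonably believe they had been first.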

Generally speaking, these problems were rooted in the flatness of the namespace. Flatness in this particular case means the absence of a hierarchical structure, and that by itself limited the number of meaningful names that could be assigned to hosts. Each name could be used only once throughout the world, or at least throughout the parts of the world that interconnected via the Internet! Although the NIC was responsible for tracking hosts on the Internet, it had no authority to resolve disputes over names. Name collisions thus became a chronic problem: in a flat environment, every host name had to be unique.

You could cope with this lack of hierarchy by assigning pseudo-random strings of letters in place of meaningful names. That would help ensure uniqueness for a very long time, but it would utterly defeat the purpose of having mnemonic names that users can understand and remember more easily than pseudo-random strings of numbers.
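Hierarchy removes this pressure because a name only has to be unique among its siblings. In the sketch below, a name is a path of dot-separated labels read from right to left, so the klingon name from the list above can exist under more than one parent. The tree structure, the helper names `add_name` and `lookup_name`, and the example domains are illustrative assumptions; this is a toy model of a hierarchical namespace, not an implementation of any real naming service.

```python
from typing import Dict, List, Optional


class NameNode:
    """One label in a hypothetical hierarchical namespace tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "NameNode"] = {}
        self.address: Optional[str] = None


def _labels(name: str) -> List[str]:
    # Most general label first: "klingon.example.com" -> ["com", "example", "klingon"]
    return name.lower().rstrip(".").split(".")[::-1]


def add_name(root: NameNode, name: str, address: str) -> bool:
    """Bind a fully qualified name; only the complete path must be unique."""
    node = root
    for label in _labels(name):
        node = node.children.setdefault(label, NameNode())
    if node.address is not None and node.address != address:
        return False  # this exact fully qualified name is already taken
    node.address = address
    return True


def lookup_name(root: NameNode, name: str) -> Optional[str]:
    """Walk the tree label by label; None if any label is missing."""
    node = root
    for label in _labels(name):
        child = node.children.get(label)
        if child is None:
            return None
        node = child
    return node.address


root = NameNode()
print(add_name(root, "klingon.example.com", "192.0.2.20"))    # True
print(add_name(root, "klingon.example.org", "198.51.100.7"))  # True: same label, different parent
print(add_name(root, "klingon.example.com", "203.0.113.5"))   # False: name already bound
print(lookup_name(root, "klingon.example.org"))               # 198.51.100.7
```

In a flat table the second call would have been a collision; here it succeeds because `klingon` sits under a different parent label.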

Aside from the limited number of names that could be both useful and usable, the notion of sending out a flat file periodically to update distributed tables meant that there would be a gap between when a host came online and when distributed users would know it existed.
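The size of that gap can be estimated under a simple, hypothetical release schedule: the central list goes out on days that are multiples of a fixed period, a host that comes online on a release day makes that release, and each remote administrator installs a release a fixed number of days after receiving it. The function name, parameters, and numbers below are illustrative assumptions; the historical release cadence isn't given here.

```python
def visibility_delay(online_day: int, release_period: int, admin_lag: int) -> int:
    """Days from a host coming online until remote users can resolve its name.

    Hypothetical model: releases occur on multiples of release_period, and
    remote administrators install each release admin_lag days after it arrives.
    """
    next_release = -(-online_day // release_period) * release_period  # round up to the next release day
    return (next_release - online_day) + admin_lag


# Weekly releases and a three-day administrative lag (illustrative numbers only).
print(visibility_delay(7, 7, 3))  # 3: online on a release day
print(visibility_delay(8, 7, 3))  # 9: just missed a release
print(max(visibility_delay(day, 7, 3) for day in range(1, 15)))  # 9
```

Under this model the worst case is a host that just misses a release: it waits (release_period - 1) + admin_lag days before distant users can reach it by name, even though it is already online.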

The largest problem with the hosts.txt approach was its inability to scale upward. As more hosts on more networks came online, the challenge of keeping them up-to-date in the file grew immeasurably. Not only were there more hosts to keep track of, but each of them also had to receive the hosts.txt file. After a while, just updating the file of known hosts became almost a full-time job.
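A back-of-the-envelope calculation shows why distribution cost grew faster than the host count. Assume, as a simplification, that the file holds one entry of roughly constant size s per host, and that each of the N hosts receives the full file once per update cycle:

```latex
\begin{aligned}
\text{size of hosts.txt} &\approx N s \\
\text{data transferred per update cycle} &\approx N \cdot N s = N^{2} s
\end{aligned}
```

Under these assumptions, doubling the number of hosts quadruples the data pushed out per cycle, while the manual work of processing moves, adds, and deletes grows along with N.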

NOTE

The person whose job it was to maintain the hosts file was informally known as the hostmaster. This function exists today even though the hosts.txt file has long since slipped into obsolescence. Today's hostmasters assign IP addresses to endpoints and manage IP address space, among other duties.

By September 1981, with a mere 400+ hosts on the ARPANET, it had become painfully obvious that there had to be a better solution. A series of RFCs began emanating from the technical constituents of ARPANET calling for a hierarchical approach to Internet names. Although each examined the same problem from a different perspective, the prevailing theme remained the need for a hierarchical namespace. Over time, these disparate RFCs coalesced into a distributed, hierarchical system that became known as the Domain Name System (DNS).