9.8. Evolution of the Internet

As local networks began to grow larger and to be connected together, an evolution began. First, some companies connected their own LANs together via private connections. Others transferred data across a network implemented on the public telephone network. Ultimately, the network research being funded by the U.S. Government brought it all together.

It may sound hard to believe, but what we know today as "the internet" was almost inevitable. Although it began in a computer lab, and at the time, most thought only high-powered computer scientists would ever use it, the way we stored and used information almost dictated that we find a better way to move information from one place to another.

Now when I watch television and see web page addresses at the end of commercials for mainstream products, I know the Internet has truly reached common usage. Not only do high-tech companies maintain web pages, even cereal companies have web sites. One may argue about the usefulness of some of these sites, but the fact that they exist tells us a great deal about how society has embraced the new technology.

It makes you wonder how we got here and where we might go with it all.

9.8.1. In The Beginning: The 1960s

In the 1960s, man was about to reach the moon, society was going through upheavals on several fronts, and technology was changing more rapidly than ever before. The Advanced Research Projects Agency (ARPA) of the Department of Defense (DoD) was attempting to develop a computer network to connect government computers (and some government contractors' computers) together. As with so many advances in our society, some of the motivation (and funding) came from a government that hoped to leverage an advance for military and/or defensive capability. High-speed data communication might help win a war at some point. Our Interstate Highway system (network) has its roots in much the same type of motivation.

In the 1960s, mainframe computers still dominated computing, and would for some time. Removable disk packs, small cartridge tapes, and compact disc technology were still in the future. Moving data from one of these mainframe computers to another usually required writing the data on a bulky tape device or some large disk device, physically carrying that medium to the other mainframe computer, and loading the data onto that computer. Although this was done, it was extremely inconvenient.

9.8.1.1. A Network Connection

Though computer networking was still in its infancy, local networks did exist, and were the inspiration for what would ultimately become the Internet. During 1968 and 1969, ARPA experimented with connections between a few government computers. The basic architecture was a 50-Kbps dedicated telephone circuit connected to a machine at each site called an Interface Message Processor (IMP). Conceptually, this is not unlike your personal Internet connection today, if you consider that your modem does the job of the IMP (of course, the IMP was a much more complex device). At each site, the IMP then connected to the computer or computers that needed to access the network.


9.8.1.2. The ARPANET

The ARPANET was born in September of 1969 when the first four IMPs were installed at the University of Southern California, Stanford Research Institute, the University of California at Santa Barbara, and the University of Utah. All of these sites had significant numbers of ARPA contractors. The success of the initial experiments between these four sites generated a great deal of interest on the part of ARPA as well as the academic community. Computing would never be the same.

9.8.2. Standardizing the Internet: The 1970s

The problem with the first connections to the ARPANET was that each IMP was, to some degree, custom designed for its site, depending on the operating systems and network configurations of the computers there. Much time and effort had been expended to get this network up to four sites. Hundreds of sites would require hundreds of times as much custom work if it were done in the same fashion.

It became clear that if all the computers connected to the network in the same way and used the same software protocols, they could all connect to each other more efficiently and with much less effort at each site. But at this time, different computer vendors supplied their own operating systems with their hardware, and there was very little in the way of standards to help them interact or cooperate. What was required was a set of standards that could be implemented in software on different systems to allow sharing data in a form that different computers could understand.

Although the genesis of standard networking protocols began in the 1970s, it would be 1983 before all members of the ARPANET used them exclusively.

9.8.2.1. The Internet Protocol family

In the early 1970s, researchers began to design the Internet Protocol. The word "internet" was used since it was more generic (at the time) than ARPANET, which referred to a specific network. The word "internet" referred to the generic internetworking of computers to allow them to communicate.

The Internet Protocol is the fundamental software mechanism that moves data from one place to another across a network. Data to be sent is divided into packets, the basic unit of data on a digital computer network. IP does not guarantee that any single packet will arrive at the other end or in what order the packets will arrive, but it does guarantee that if a packet arrives, it will arrive unchanged from the original. This may not seem very useful at first, but stay with me for a moment.
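
To make the idea of a packet a bit more concrete, here is a rough sketch (not from the original text) of the fields an IPv4 packet header carries. Real Linux code would use struct iphdr from <netinet/ip.h>; the layout below glosses over bit ordering and options and is meant only as a picture of what travels with each packet.

#include <stdint.h>

/*
 * A simplified sketch of the IPv4 packet header.  Treat this as a
 * picture of the fields, not as something to overlay on raw network
 * data (real code uses struct iphdr and worries about byte order).
 */
struct ipv4_header_sketch {
    uint8_t  version_and_ihl;     /* 4-bit version (4 for IPv4), 4-bit header length */
    uint8_t  type_of_service;     /* priority/handling hints */
    uint16_t total_length;        /* header plus data, in bytes */
    uint16_t identification;      /* used to reassemble fragmented packets */
    uint16_t flags_and_offset;    /* 3 flag bits, 13-bit fragment offset */
    uint8_t  time_to_live;        /* decremented at each hop; 0 means discard */
    uint8_t  protocol;            /* what rides on top: 6 = TCP, 17 = UDP */
    uint16_t header_checksum;     /* lets a receiver detect a damaged header */
    uint32_t source_address;      /* 32-bit IP address of the sender */
    uint32_t destination_address; /* 32-bit IP address of the receiver */
    /* variable-length options may follow, then the data itself */
};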

9.8.2.2. TCP/IP

Once you can transmit a packet to another computer and know that, if it arrives at all, it will be correct, other protocols can be added "on top of" the basic IP to provide other functionality. The Transmission Control Protocol (TCP) is the most often used protocol along with IP (used together they are referred to as TCP/IP). As the name might imply, TCP controls the actual transmission of the stream of data packets. TCP adds sequencing and acknowledgement information to each packet, and each end of a TCP "conversation" cooperates to make sure the original data stream is reconstructed in the same order as the original. When a single packet fails to arrive at the other end due to some failure in the network, the receiving TCP software figures this out because the packet's sequence number is missing. It can contact the sender and have it send the packet again. Alternatively, the sender, having likely not received an acknowledgement for the packet in question, will eventually retransmit the packet on its own, assuming it was not received. If it was received and only the acknowledgement was lost, the receiving TCP software, upon receiving a second copy, will drop it, since it has already received the first one. The receiver will still send the acknowledgement the sending TCP software was waiting for.



TCP is a connection-oriented protocol. An application program opens a TCP connection to another program on the other computer, and they send data back and forth to each other. When they have completed their work, they close down the connection. If one end (or a network break) closes the connection unexpectedly, this is considered an error by the other end.
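
As a present-day illustration of this open/converse/close pattern, here is a minimal sketch of a TCP client on a Linux system. The host snoopy.hp.com (borrowed from the DNS discussion later in this section) and port 7, the classic echo service, are only placeholders, and error handling is reduced to exiting.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP: a connection-oriented stream */

    /* Look up the server's address (placeholder host and service). */
    if (getaddrinfo("snoopy.hp.com", "7", &hints, &res) != 0)
        exit(1);

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        exit(1);                      /* open the TCP connection */

    const char msg[] = "hello\n";
    write(fd, msg, sizeof(msg) - 1);  /* send a stream of bytes... */

    char buf[128];
    ssize_t n = read(fd, buf, sizeof(buf));  /* ...and read the reply */
    if (n > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);                        /* close the connection when done */
    freeaddrinfo(res);
    return 0;
}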

9.8.2.3. UDP/IP

Another useful protocol that cooperates with IP is the User Datagram Protocol (sometimes semi-affectionately called the Unreliable Datagram Protocol). UDP provides a low-overhead method to deliver short messages over IP, but does not guarantee their arrival. On some occasions, an application needs to send status information to another application (such as a management agent sending status information to a network or systems management application), but the information is not of critical importance. If one message does not arrive, a newer one will be sent later anyway; it is not necessary for each and every instance of the data to reach the application. Of course, this assumes that any failure is due to some transient condition and that "next time" the transmission will work. If it failed all the time, that would imply a network problem.

In a case like this, the overhead required to open and maintain a TCP connection is more work than is really necessary. You just want to send a short status message. You don't really care if the other end gets it (since if they don't, they probably will get the next one) and you certainly don't want to wait around for it to be acknowledged. So an unreliable protocol fills the bill nicely.
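
A minimal sketch of what such a fire-and-forget status message might look like on a Linux system is shown below. The destination address 192.0.2.10, port 5000, and the status text are made-up placeholders; note that no connection is opened and nothing waits for an acknowledgement.

#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);   /* UDP socket */
    if (fd < 0)
        return 1;

    struct sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(5000);               /* placeholder port */
    inet_pton(AF_INET, "192.0.2.10", &dest.sin_addr);  /* placeholder host */

    const char status[] = "load=0.42 uptime=86400";    /* made-up report */
    /* One sendto() per status report; nothing waits for a reply. */
    sendto(fd, status, sizeof(status) - 1, 0,
           (struct sockaddr *)&dest, sizeof(dest));

    close(fd);
    return 0;
}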

9.8.2.4. Internet Addressing

When an organization is setting up a LAN to be part of the Internet, it requests a unique Internet IP address from the Network Information Center (NIC). The number that is allocated depends on the size of the organization:

  • A huge organization, such as a country or very large corporation, is allocated a Class A address: a number that is the first 8 bits of a 32-bit IP address. The organization is then free to use the remaining 24 bits for labeling its local hosts. The NIC rarely allocates these Class A addresses, as each one uses up a lot of the total 32-bit number space.

  • A medium-sized organization, such as a mid-size corporation, is allocated a Class B address: a number that is the first 16 bits of a 32-bit IP address. The organization can then use the remaining 16 bits to label its local hosts.

  • A small organization is allocated a Class C address, which is the first 24 bits of a 32-bit IP address.

For example, the University of Texas at Dallas is classified as a medium-sized organization, and its LAN was allocated the 16-bit number 33134. IP addresses are written as a series of four 8-bit numbers, with the most significant byte (8 bits) written first. All computers on the UT Dallas LAN therefore have an IP address of the form 129.110.XXX.YYY,[2] where XXX and YYY are numbers between 0 and 255.

[2] 129 * 256 + 110 = 33134
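
As a small, hedged illustration (not from the original text) of how these classes and the footnote's arithmetic fit together, the following fragment classifies a dotted-quad address by its leading bits and prints the network number; for an address of the form 129.110.XXX.YYY it prints the class B network 33134.

#include <stdio.h>
#include <arpa/inet.h>

int main(void)
{
    struct in_addr a;
    inet_pton(AF_INET, "129.110.22.33", &a);   /* a host on the UTD LAN (host part made up) */
    unsigned long ip = ntohl(a.s_addr);        /* the 32-bit address in host byte order */
    unsigned first_octet = (ip >> 24) & 0xff;

    if (first_octet < 128)                     /* leading bit 0 */
        printf("Class A, network %lu\n", ip >> 24);
    else if (first_octet < 192)                /* leading bits 10 */
        printf("Class B, network %lu\n", ip >> 16);   /* 129*256 + 110 = 33134 */
    else if (first_octet < 224)                /* leading bits 110 */
        printf("Class C, network %lu\n", ip >> 8);
    else
        printf("not a class A/B/C address\n");
    return 0;
}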

9.8.2.5. Internet Applications

Once a family of protocols existed that allowed easy transmission of data to a remote network host, the next step was to provide application programs that took advantage of these protocols. The first applications to be used with TCP/IP were two programs that were already in wide use.

The telnet program was (and still is) used to connect to another computer on the network in order to log in and use that computer from your local computer or terminal. This is quite useful for access to high-priced computing resources. Your organization might not have its own supercomputer, but you might have access to one at another site. Telnet lets you log in remotely without having to travel to the other site.

The ftp program was used to transfer files back and forth. While ftp is still available today, most people use web browsers or network file systems to move data files from one computer to another.

9.8.3. Re-Architecting and Renaming the Internet: The 1980s

As more universities and government agencies began using the ARPANET, word of its usefulness spread. Soon corporations were getting connected. At first, because of the funding involved, a corporation had to have some kind of government contract in order to qualify. Over time, this requirement was enforced less and less.

With this growth came headaches. The smaller a network is, and the fewer the nodes that are connected, the easier it is to administer. As the network grows, the complexity of managing the whole thing grows as well.

It became clear that the growth rate that the ARPANET was experiencing would soon outgrow the Defense Department's ability to manage the network.

New hosts were being added at a rate that required modifications to the network host table on a daily basis. This required each ARPANET site to download new host tables every day, if they wished to have up-to-date tables.

In addition, the number of available hostnames was dwindling, since each hostname had to be unique across the entire network.


9.8.3.1. Domain Name Service

Enter DNS, the Domain Name Service. DNS, together with BIND, the Berkeley Internet Name Domain software, established the hierarchy of domain naming of network hosts and the method for providing address information to anyone on the network who requests it.

In the new system, top-level domain names were established, under which each network site could establish a subdomain. The DoD would manage the top-level domains and delegate management of each subdomain to the entity or organization that registered the domain. The DNS/BIND software provided the method for any network site to do a lookup of network address information for a particular host.

Let's look at a real-world example of how a hostname is resolved to an address. One of the most popular top-level domains is com, so we'll use that in our example. The DoD maintained the server for the com domain. All subdomains registered in the com domain were known to this server. When another network host needed an address for a hostname under the com domain, it queried the com name server.

If you attempted to make a connection to snoopy.hp.com, your machine would not know the IP address, because there was no information in your local host table for snoopy.hp.com. Your machine would contact the domain name server for the com domain to ask it for the address. That server knows only the address for the hp.com name server; it does not need to know everything under that domain. But since hp.com is registered with it, the com name server can query the hp.com name server for the address.[3] Once a name server that has authority for the hp.com domain is contacted, it returns an address for snoopy.hp.com to the requestor (or a message that the host does not exist).

[3] Two options are available in the protocols. The first (iterative resolution) is that the requesting machine may be redirected to a "more knowledgeable" host and may then make follow-up requests until it obtains the information it needs. The second (recursive resolution) is that the original machine may make a single request, and each subsequent machine that doesn't have the address makes the follow-up request of the more knowledgeable host on behalf of the original machine. Which one is used is a configuration option in the domain resolution software; roughly the same number of requests are made either way, the difference being which machine makes them.
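
In a modern Linux program, all of this resolution machinery is hidden behind a single library call. The fragment below is a hedged sketch (not the book's code) that asks the resolver for the addresses of snoopy.hp.com and prints whatever comes back, much as the host command shown later in Figure 9-22 does.

#include <stdio.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints = { 0 }, *res, *p;
    hints.ai_family = AF_INET;                 /* IPv4 addresses only */
    hints.ai_socktype = SOCK_STREAM;           /* one entry per address */

    int err = getaddrinfo("snoopy.hp.com", NULL, &hints, &res);
    if (err != 0) {
        /* e.g. "Name or service not known" when the host does not exist */
        fprintf(stderr, "lookup failed: %s\n", gai_strerror(err));
        return 1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        char text[INET_ADDRSTRLEN];
        struct sockaddr_in *sin = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &sin->sin_addr, text, sizeof(text));
        printf("%s\n", text);                  /* one line per address found */
    }
    freeaddrinfo(res);
    return 0;
}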

Up to this point, every host name on the ARPANET was just a name, like utexas for the ARPANET host at the University of Texas. Under the new system, this machine would be renamed to be a member of the utexas.edu domain. However, this change could not be made everywhere overnight. So for a time, a default domain .arpa was established. By default, all hosts began to be known under this domain (i.e., utexas changed its name to utexas.arpa). Once a site had taken that single step it could more easily become a member of its "real" domain later, since the software implemented domain names.

Once the ARPANET community adopted this system, all kinds of problems were solved. Suddenly, a hostname had to be unique only within a subdomain. HP's having a machine called snoopy didn't mean someone at the University of Texas couldn't also use that name, since snoopy.hp.com and snoopy.utexas.edu were different names. Duplication of names had not been such a big problem when only mainframe computers were connected to the network, but we were quickly approaching the explosion of workstations, and it would have been a huge problem then. The other big advantage was that a single networkwide host table no longer had to be maintained and updated on a daily basis. Each site kept its own local host tables up-to-date, but would simply query the name server when an address for a host at another site was needed. By querying other name servers, you were guaranteed to receive the most up-to-date information.



The top-level domains most often encountered are listed in Figure 9-21.

Figure 9-21. Common top-level domain names.

Name    Category
biz     business
com     commercial
edu     educational
gov     governmental
info    unrestricted (i.e., anything)
mil     military
net     network service provider
org     nonprofit organization
XX      two-letter country code


For example, the University of Texas at Dallas LAN has been allocated the name "utdallas.edu". Once an organization has obtained its unique IP address and domain name, it may use the rest of the IP number to assign addresses to the other hosts on the LAN.

You can see what addresses your local DNS server returns for specific hostnames with the host command, available on most Linux systems (Figure 9-22).

Figure 9-22. Description of the host command.

Utility: host [ hostname | IPaddress ]

The host command contacts the local Name Service and requests the IP address for a given hostname. It can also do a reverse lookup, where by specifying an IP address you receive the hostname for that address.


host is most useful for obtaining addresses of machines in your own network. Machines at other sites around the Internet are often behind firewalls, so the address you get back may not be usable directly. However, host is good for finding out if domain names or web servers (machines that would need to be outside the firewall for the public to access) within domains are valid. You might see the following type of output from host:



$ host www.hp.com
www.hp.com is a nickname for www.hpgtm.speedera.net
www.hpgtm.speedera.net has address 192.151.52.187
www.hpgtm.speedera.net has address 192.6.234.8
$ host www.linux.org
www.linux.org has address 198.182.196.56
$ _


host displays the current IP address(es) for the hostname we requested. When a hostname doesn't exist or the DNS server can't (or won't) provide the address, we'd see something like this:

$ host xyzzy
Host not found.
$ _


9.8.3.2. DoD Lets Go

Like a parent whose child has grown up and needs its independence, the Department of Defense reached a point where its child, the ARPANET, needed to move out of the house and be on its own. The DoD originally started the network as a research project, a proof of concept. The network became valuable, so the DoD continued to run it and manage it. But as membership grew, the management of this network took more and more resources and provided the DoD fewer and fewer payoffs as non-DoD-related entities got connected. It was time for the Department of Defense to get out of the network management business.

In the late 1980s, the National Science Foundation (NSF) began to build NSFNET. NSFNET took a unique approach, in that it was constructed as a "backbone" network to which other regional networks would connect. NSFNET was originally intended to link supercomputer centers.

Using the same types of equipment and protocols as those making up the ARPANET, NSFNET provided an alternative medium with much freer and easier access than the government-run ARPANET. To most except the programmers and managers involved, the ARPANET appears to have mutated into the Internet of today. In reality, connections to NSFNET (and their regional networks) were created and ARPANET connections were severed, but because of the sharing of naming conventions and appearances, the change was much less obvious to the casual user.

The end result was a network that worked (from the user's point of view) the same as the ARPANET had, but that, as it grew, was made up of many more corporations and nongovernment agencies. More importantly, this new network was not funded by government money; it was surviving on private funding from those using it.

9.8.4. The Web: The 1990s

The 1990s saw the Internet come into popular use. Although it had grown consistently since its inception, it still belonged predominantly to computer users and programmers. Two things happened to spring the Internet on an unsuspecting public: the continued growth of personal computers in the home and one amazingly good idea.


9.8.4.1. The "killer app"

Again, timing played a role in the history of the Internet. The network itself was growing and being used by millions of people but was still not considered mainstream. The more sophisticated home users were getting connected to the Internet via a connection to their employer's network or a subscription with a company that provided access to the Internet. These companies came to be known as ISPs, Internet Service Providers. In the early 1990s, only a handful of these existed, as only a few people recognized there was a business in providing Internet access to anyone who wanted it.

Then came Mosaic. Mosaic was the first "browser" and was conceived by software designers at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. With Mosaic, a computer user could access information from other sites on the Internet without having to use the complicated and nonintuitive tools that were popular at the time (e.g., telnet, ftp).

Mosaic was (and browsers in general are) an application that displays a page of information both textually and graphically, as described by a page description language called HyperText Markup Language (HTML). The most revolutionary aspect of HTML was that of a hyperlink, a way to link information in one place in a document to other information in another part of the document (or more generically, in another document).

By designing a page with HTML, you could display information and include links to other parts of the page or to other pages at other sites that contained related information. This created a document which could be "navigated" to allow users to obtain the specific information in which they were interested rather than having to read or search the document in a sequential fashion, as was typical at the time.

Almost overnight, servers sprang up across the Internet to provide information that could be viewed by Mosaic. Now, rather than maintaining an anonymous FTP site, a person or organization could maintain publicly accessible information in a much more presentable format. In accessing anonymous FTP sites, the users usually had to know what they were trying to find, or at best, get the README file which would help them find what they wanted. With a server that provided HTML, the users could simply point-and-click and be taken to the page containing the information they sought.

Of course, not all this magic happened automatically. Each site that maintained any information for external users had to set up a server and format the information. But this was not significantly more work than providing the information via anonymous FTP. Early on, as people switched from providing information via FTP-based tools to using web-based tools, the two alternatives were comparable in terms of the amount of effort required to make data available. As sites have become more sophisticated, the work required has increased, but the payoff in presentation has also increased.



Some of the people involved in the early releases of Mosaic later formed Netscape Communications, Inc., where they applied the lessons they had learned from early browser development and produced Netscape, the next generation in browsers. Since then, led by Netscape and Microsoft Internet Explorer, browsers have evolved to be very sophisticated applications, introducing significant advances to both browsing and publishing every year.

9.8.4.2. The Web Versus the Internet

The word "web" means many different things to different people in different contexts and causes much confusion. Before Mosaic and other browsers, there was just the Internet. As we have already seen, this is simply a worldwide network of computers. This in itself can be diagramed as a web of network connections. But this is not what the word "web" means here.

When Mosaic, using HTML, provided the capability to jump around from one place to another on the Internet, yet another conceptual "web" emerged. Not only is my computer connected to several others, forming a web, but now my HTML document is also connected to several others (by hyperlinks), creating a virtual spider web of information. This is the "web" that gave rise to the terms "web pages" and "web browsing," and is commonly referred to as "The World Wide Web."

When someone talks about "the web" today, they may mean the Internet itself or they may mean the web of information available on the Internet. Although not originally intended this way, the terms "the web" and "the Internet" are often used interchangeably today. However, in proper usage, "the web" refers to the information that is available from the infrastructure of "the Internet."

9.8.4.3. Accessibility

A few ISPs had sprung up even as "the web" was coming into existence. Once the concept of "the web" gained visibility, it seemed that suddenly everyone wanted to get on the Internet. While electronic mail was always usable and remains one of the most talked-about services provided by access to the Internet, web browsing had the visibility and the public relations appeal to win over the general public.

All of a sudden, average people saw useful (or at least fun) things they could get from being connected to the Internet. It was no longer the sole domain of computer geeks. For better or worse, the Internet would change rapidly. More people, more information, and more demand caused great growth in usage and availability. Of course, with more people come more inexperienced people and more congestion. Popularity is always a double-edged sword.

Another factor boosting the general public's access to the Internet has been the geometric increase in modem speeds. While large companies have direct connections to the Internet, most individuals have dial-up connections over home phone lines requiring modems. When the top modem speed was 2400 bps (bits per second), which wasn't all that long ago, downloading a web page would have been intolerably slow. As modem speeds have increased and high-speed digital lines have become economical for home use, it has become much more reasonable to have more than just a terminal connection via a dial-up connection.



Most of these private connections can be had for between $10 and $60 per month, depending on speed and usage, which has also played a part in attracting the general public. A bill for Internet service that is comparable with a cable bill or phone bill is tolerable. The general public likely would not accept a bill that was an order of magnitude higher than other utility bills.

9.8.4.4. Changes in the Internet

As the public has played a larger and larger part in the evolution of the Internet, some of the original spirit has changed.

The Internet was originally developed "just to prove it could be done." The original spirit of the Internet, especially in its ARPANET days, was that information and software should be free to others with similar interests and objectives. Much of the original code that ran the Internet (the TCP/IP protocol suite and tools such as ftp and telnet) was given away by the original authors and modified by others who contributed their changes back to the original authors for "the greater good."

This was probably what allowed the Internet to grow and thrive in its youth. Today, however, business is conducted over the Internet, and much of the data is accessible for a fee. This is not to say everybody is out to do nothing but make money or that making money is bad. But it represents a significant change in the culture of the Internet.

The Internet needed its "free spirit" origins, but now that mainstream society is using the Internet, it is only natural that it would become more economically oriented. Advertising on web sites is common, and some web sites require each user to pay a subscription fee in order to be able to "login" to gain access to information. Commerce over the Internet (such as online ordering of goods and services, including online information) is expected to continue to grow long into the future.

9.8.4.5. Security

Entire books have been written about Internet security (e.g., [Cheswick, 1994]). In the future, as more commercial activity takes place across the Internet, the needs and concerns about the security of operations across the Internet will only increase.

In general, a single transfer of data is responsible for its own security. In other words, if you are making a purchase, the vendor will probably use secure protocols to acquire purchase information from you (like credit card information).

Four major risks confront an Internet web server or surfer: information copying, information modification, impersonation, and denial of service. Encryption services can prevent copying or modifying information. User education can help minimize impersonation.

The most feared (and ironically, the least often occurring) risk is the copying of information that travels across the network. The Internet is a public network, and therefore information that is sent "in the clear" (not encrypted) can, in theory, be copied by someone between the sender and the recipient. In reality, since information is divided into packets that may or may not travel the same route to their destination, it is often impractical to try to eavesdrop in order to obtain useful information.



Modification of information that is in transit poses the same problem as eavesdropping with the additional problem of making the modification. While not impossible, it is a very difficult task and usually not worth the effort.

Impersonation of a user, either through a login interface or an e-mail message, is probably the most common type of security breach. Users often do not safeguard their passwords. Once another person knows a user's username and password, he or she can log in and have all the same rights and privileges as the legitimate user. Unfortunately, it is easy to send an e-mail message with forged headers to make it appear the message came from another user. Close examination can usually reveal whether a header is authentic, but this can still lead to confusion, especially if an inexperienced user receives the message. One might also impersonate another network host by claiming to use the same network address. This is known as spoofing. Spoofing is not a trivial exercise, but an experienced network programmer or administrator can pull it off.

A denial-of-service attack occurs when an outside attacker sends a huge amount of information to a server to overload its capability to do its job. The server gets so bogged down that it either becomes unusable or it completely crashes.

9.8.4.6. Copyright

One of the biggest challenges in the development of information exchange on the Internet is that of copyright. In traditional print media, time is required to reproduce information, and proof of that reproduction will exist. In other words, if I reprint someone else's text without their permission, the copy I create will prove the action. On the Internet, information can be reproduced literally at the speed of light. In the amount of time it takes to copy a file, a copyright can be violated, leaving very little evidence of the action.

9.8.4.7. Censorship

In any environment where information can be distributed, there will be those who want to limit who can gain access to what information. If the information is mine and I want to limit your access to it, this is called my right to privacy. If the information is someone else's and I want to limit your access to it, this is called censorship.

This is not to say that censorship is bad. As with so much in our society, the idea alone is not the problem but rather the interpretation of the idea. Censorship on the Internet is, to put it mildly, a complex issue. Governments and organizations may try to limit certain kinds of access to certain kinds of materials (often with the best of intentions). The problem is that, since the Internet is a worldwide resource, local laws have very little jurisdiction over the whole of the Internet. How can a law in Nashville be applied to a web server in Sydney? Even if they decide the web server is doing something illegal, who will prosecute?

9.8.4.8. Misinformation

As much of a problem as copyrighted or offensive material may be, much more trouble is caused by information that is simply incorrect. Since there is no information authority that approves and validates information put on the net, anyone can publish anything. This is a great thing for free speech. But humans tend to believe information they see in print. I've heard innumerable stories about people acting on information they found on the web that turned out to be misleading or wrong. How much credence would you give to a rumor you were told by someone you didn't know? That's how much you should give to information you pick up off the web when you aren't sure of the source.


9.8.4.9. "Acceptable Use"

Many ISPs have an Acceptable Use policy you must adhere to in order to use their service. Over time, this may well solve many of the problems the Internet has had in its formative years. Most of these policies basically ask users to behave themselves and refrain from doing anything illegal or abusive to other users. This includes sending harassing e-mail, copying files that don't belong to you, and so on.

There is a perceived anonymity[4] of users of the Internet. If you send me an e-mail message that I disagree with, it may be difficult for me to walk over to you and yell at you personally. I might have to settle for YELLING AT YOU IN E-MAIL.[5] Because of this, people tend to behave in ways they would not in person. As the Internet and its users grow up, this problem should lessen.

[4] I say "perceived" here because it is actually possible to find most people if you're willing to do enough work. Even people who have filtered threatening e-mail through "anonymous e-mail" services have been found by law enforcement. ISPs will cooperate with the authorities when arrest warrants are involved!

[5] Text in all caps is typically interpreted as equivalent to speaking the words in a loud voice. This does not apply to those few users who still use computers or terminals that can only generate uppercase characters.



