History of the Web and Web Standards | UNIX: The Complete Reference, Second Edition (Complete Reference Series)

Before you start constructing web pages, you should know something about the history and background of HTML, especially about the HTML standards that have been promulgated so that the web pages that form the World Wide Web can be reliably accessed by most popular web browsers.

The seeds of the web go back to the work of Ted Nelson in the 1960s. Nelson coined the term “hypertext” for “non-sequential writing,” or text that is not constrained to be linear. Hypermedia is a term used for hypertext that is not constrained to be text. That is, it can include graphics, video, and sound, all of which are encompassed by the web today.

In the late 1980s, Bill Atkinson, a programmer working for Apple Computer, Inc., developed HyperCard for the Macintosh, which enabled users to construct a series of on-screen “filing cards” that contained textual and graphical information. Users could navigate these cards by pressing on-screen buttons, taking themselves on a tour of the information in the process. HyperCard and its imitators made documentation easier to navigate. However, these packages had the limitation that hypertext jumps could only be made to files on the same computer. Jumps to files stored on other computers on a local network, much less on the other side of the world, were out of the question. A system involving hypertext links on a global scale had not been conceived yet.

The Early Web

Several Internet services existed for information retrieval prior to the advent of the web, including FTP, WAIS, and Gopher. Each of these services had a distinct user interface. Although each interface was satisfactory by itself, the combination of several dissimilar interfaces created complexity for users. The problem was worse for any service that was used so infrequently that its operational details had to be relearned at each use.

In 1989, Tim Berners-Lee invented a prototype system based on hypertext that would eventually evolve into the web. At the time, he was working in a computing services section of CERN, the European Laboratory for Particle Physics in Geneva, Switzerland. The original idea was to enable particle physics researchers from remote sites around the world to organize and pool information. But Tim wanted to take the repository of information files a step further by employing the hypertext concept of allowing cross-reference links to be created in the text of the files. Scientific and mathematical documentation could be represented as a “web” of information held in electronic form on computers across the world.

To make global hypertext links feasible, Berners-Lee saw the need for a way of implementing hyperlinks that was simpler and more portable across platforms than the then-existing hypertext applications. He demonstrated a basic but attractive way of publishing text using client-server software that he developed himself, together with a simple protocol, HTTP, that he also developed for jumping to other documents via hypertext links. (For more information on HTTP, see Chapter 10.) The text-markup language that he used to create this demonstration “web” of documents was called HTML.

Berners-Lee’s HTML was based on SGML (Standard Generalized Markup Language), an international standard method for marking up text into structural units such as paragraphs, headings, and list items. SGML could be implemented on any machine. The idea was to make the language independent of the formatter (the browser or other viewing software) that displayed the formatted text on the screen. SGML does not include hypertext links; support for local as well as remote hypertext links was purely Berners-Lee’s invention, as was the now-familiar “www.name.name” convention for addressing machines on the web.

From the beginning (ca. 1991), Berners-Lee took the important step of openly discussing his ideas online across the Internet (mostly through e-mail lists). In 1992, researchers from the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign joined the HTTP-HTML discussion; in 1993, the NCSA would release Mosaic, the first web browser that included some of the basic web features we are familiar with.

In May 1994, when the World Wide Web had caught the imagination of academics but not businessmen, the first World Wide Web conference was held in Geneva, Switzerland. At this conference, a draft HTML 2 standard was first introduced and the importance of the fledgling web’s operating with a proper HTML specification was discussed. HTML+ was unveiled, and it was agreed that the work on HTML+ would be carried forward to the development of a proposed HTML 3 standard. Features of HTML+ included text flow around a figure with captions, resizable tables, image backgrounds, math symbols, and other features.

The draft HTML 2 standard was circulated through the Internet community for comment in 1994, and the ideas of early HTML enthusiasts and early web browser developers were incorporated into it. In July 1994, a Document Type Definition for HTML 2, a precise description of the language, was released.

In September 1994, the Internet Engineering Task Force (IETF), the international standards and development body of the Internet, set up an HTML working group. In November 1995, HTML 2.0 was officially published as an IETF standard.

Also in 1994, a former member of the Mosaic project at NCSA named Marc Andreessen and a tech entrepreneur named James Clark formed what would become the Netscape Communications Corporation. The Netscape Navigator web browser would soon become the first truly usable and most widely used web browser of the early web. Netscape began a trend in which the owner of the dominant web browser “extended” or ignored existing HTML standards, relying on its near monopoly over the end user’s experience of the web. With Microsoft’s Internet Explorer having long surpassed Netscape as the dominant web browser, the monopolistic tendency to flout HTML standards continues to be something of an obstacle to the widespread acceptance of web standards.

Out of concern that the fledgling web would fragment as different web server and browser vendors pushed their own proprietary web “standards,” the World Wide Web Consortium (W3C), headed by Tim Berners-Lee, was formed in late 1994 with the goal of promoting open standards by which the web could continue to grow and reach its full potential. From its inception, the W3C has sought to build industry consensus on web standards, which is often not an easy task.

In December 1995, the IETF HTML working group was dismantled, since it was having difficulties coming to consensus quickly enough to deal with the fast-evolving HTML standard. In February 1996, the World Wide Web Consortium formed the HTML Editorial Review Board (ERB). The ERB included representatives from IBM, Microsoft, Netscape, Novell, Softquad, and the W3C. The ERB’s aim was to collaborate and agree upon a common standard for HTML at a time when competing web browsers each implemented a different subset of the language. The ERB would later become the HTML Working Group.

The Dynamic Web

Early web publishers consisted mainly of academic and government institutions, and their web pages usually described their work and their organizations. It wasn’t long before businesses realized the opportunities offered by the web, and commercial sites began to appear. (The .com Internet domain had existed since 1985.) In the early 1990s, the majority of commercial web sites offered only contact and product information. However, by 1994 a few enterprises had started experimenting with the web as a new medium for commerce. The deployment of commerce on the web was enabled by several emerging web technologies such as secure transactions (introduced in the Netscape Navigator browser in 1994) and online database access through the Common Gateway Interface (CGI). CGI itself was developed around 1993, largely because the rapidly growing web required search engines that would take user input from web page forms, create online database queries based on that input, and then generate and display search result index pages. In 1995, Amazon.com and eBay.com, two of the biggest names in web commerce history, were launched. The dot-com boom of the late 1990s followed. The web had changed significantly and become mainstream.
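The CGI search flow described above (read form input, query a data source, generate a results page) can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular search engine was built: the in-memory `CATALOG`, the form field name `q`, and the function names `search` and `render_results` are all hypothetical, and a real CGI program would query an external database instead of a list.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

# Hypothetical stand-in for an online database.
CATALOG = [
    "UNIX system administration",
    "Introduction to HTML",
    "HTML forms and CGI",
]


def search(catalog, term):
    """Return the catalog entries that contain the term (case-insensitive)."""
    needle = term.lower()
    return [entry for entry in catalog if needle in entry.lower()]


def render_results(query_string, catalog=CATALOG):
    """Build a CGI response (header block plus HTML body) for a form submission."""
    fields = parse_qs(query_string)
    # "q" is an assumed form field name; take its first value if present.
    term = fields.get("q", [""])[0].strip()
    hits = search(catalog, term) if term else []
    # Escape everything that originated from user input or the data source.
    items = "".join(f"<li>{escape(hit)}</li>" for hit in hits)
    body = (
        "<html><head><title>Search results</title></head><body>"
        f"<h1>Results for {escape(term)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )
    # A CGI program writes headers, then a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # For a GET request, the web server passes the form data in QUERY_STRING.
    sys.stdout.write(render_results(os.environ.get("QUERY_STRING", "")))
```

When a web server runs such a script for a request like `search.cgi?q=html` (a hypothetical URL), whatever the script writes to standard output is returned to the browser as the page.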

The web that most users experience today bears little resemblance to the original distributed repository of simple, static HTML files and text from the early 1990s. Much of the web content that is browsed today is dynamically generated by programs responding to user inputs. The web pages that are browsed today, informational as well as commerce-oriented, can rightly be called web applications: they generate responses to user input, or they generate content that is customized according to users’ preferences after they have logged into their own personal accounts. Today, web applications are used to implement web-based e-mail clients, online retail sales, online auctions, wikis, discussion boards, web logs, multiplayer online role-playing games, and other functions. Commonly used technologies for creating dynamic web pages include JavaScript, CGI, PHP, Java, ASP, and ISAPI. Web pages may have JavaScript programs embedded in them that are executed by the web browser to generate page elements in response to certain events, such as the user clicking a button or moving the mouse cursor over the navigation menu of a page. JavaScript will be discussed later in this chapter. Web pages may also be entirely generated by CGI programs or contain embedded PHP code. Whereas JavaScript programs are run by the web browser, CGI and PHP programs are run by the web server to generate web pages. CGI and PHP will also be discussed later in this chapter.

HTML Standards

Whether written from scratch or generated by scripts, web pages still basically consist of HTML code, and the HTML code that is sent to web browsers for display must adhere to current HTML standards. Due to the efforts of the W3C and others over the years, the major web browsers in use today do expect the HTML documents that are sent to them to conform to a common set of standards. HTML that is not standards-compliant can produce some peculiar-looking web pages. The following are the important HTML standards that have been published by the W3C (see also http://www.w3.org/MarkUp/#recommendations):

HTML: The HTML standard has grown over the years; that is, the number of HTML markup tags, which web browsers interpret to generate the page elements we are familiar with, has increased. A significant version of the HTML standard was version 3.2, which introduced such elements as tables, applets, text flow around figures, and subscripts and superscripts. HTML 3.2 was introduced in January 1997 and was widely used to create web sites. However, the widespread use of HTML 3.2 presentational markup such as the <font> tag and the “color” attribute is seen as a negative development that went against the original intent of HTML, which was to focus on content rather than formatting. Developing large web sites in which font and color information had to be added to every single web page became a long and laborious process. In December 1997, HTML 4.0 was published. Version 4.0 contained accessibility features for people with disabilities and support for international languages, as well as style sheet support, extensions to forms, scripting, and more. Formatting elements such as the <font> tag and “color” attribute were declared “deprecated” in Version 4.0. Version 4.0 was published in three “flavors”: (1) “Strict,” in which the deprecated formatting elements are forbidden, (2) “Transitional,” in which the deprecated elements are allowed, and (3) “Frameset,” which allows everything in Transitional plus the frame-related elements. HTML 4.01, published in December 1999, is the current and final version of the HTML standard. It contains minor revisions to HTML 4.0.

XHTML: XHTML (Extensible Hypertext Markup Language) is the W3C’s successor to HTML. XHTML is a reformulation of HTML 4.01 using the Extensible Markup Language (XML). XML (a February 1998 W3C Recommendation) is a simplified subset of SGML, which was the basis of the HTML language. XML was created primarily to facilitate the sharing of data across different systems, especially systems connected across the Internet. XML is a standard for creating markup languages that describe the structure of data. It is not a fixed set of elements like HTML; rather, it enables authors to define their own descriptive tags. XML has already been used to create file formats for applications such as office suites (http://en.wikipedia.org/wiki/OpenDocument). So XHTML can be thought of as just one of several data formats that XML has been used to create. One potential benefit of the move to XHTML is that a document in another XML-based file format (say, a spreadsheet or a drawing) can easily be transformed into XHTML for display on the web. Another potential benefit is that a complex web page written in XHTML can be more easily simplified for display on less capable devices such as a personal digital assistant or cell phone. The familiar markup and formatting elements in HTML are preserved in XHTML, but the syntax is stricter; for example, HTML tags can be upper- or lowercase, but XHTML tags must be lowercase, since XML is case sensitive. XHTML 1.0 was published in January 2000 as a W3C Recommendation and later revised and republished in August 2002. It contains the same three “flavors” that were introduced in HTML 4.0. XHTML 1.1 was published in May 2001 as a W3C Recommendation. It is based on XHTML 1.0 “Strict” with minor changes.
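The stricter syntax of XHTML can be illustrated with Python’s standard XML parser. The sketch below is only a rough check under stated assumptions: it tests XML well-formedness and lowercase element names, not full validity against an XHTML DTD, and the helper names `is_well_formed` and `uppercase_tags` are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, which XHTML requires."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def uppercase_tags(markup):
    """List element names that are not all lowercase, as XHTML requires."""
    root = ET.fromstring(markup)
    return [el.tag for el in root.iter() if el.tag != el.tag.lower()]


# Typical forgiving HTML: mixed-case tags and an unclosed <BR>.
html_style = "<P>First line<BR>second line</p>"
# The same content written to XHTML rules: lowercase tags, every element closed.
xhtml_style = "<p>First line<br />second line</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
print(uppercase_tags("<P>text</P>"))
```

The first fragment is rejected because an XML parser treats `<P>` and `</p>` as different names and expects `<BR>` to be closed. The last call shows that a fragment can be well-formed XML and still break the XHTML rule that element names be lowercase.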