1.2 WWW

The WWW is a virtual network that is overlaid on the Internet. It comprises all client ^[10] and server systems that communicate with one another using the Hypertext Transfer Protocol (HTTP). HTTP, in turn, is a simple client/server application protocol that is layered on top of a reliable transport service, such as the one provided by the Transmission Control Protocol (TCP). The protocol defines how WWW resources ^[11] may be requested and transmitted across the Internet. In this book, we do not delve into the technical details of the HTTP specifications. Instead, we refer to the many books that address HTTP and its features. Among these books, I particularly recommend [18].

HTTP and the WWW were originally invented in the late 1980s by Tim Berners-Lee and his colleagues at the European Laboratory for Particle Physics (CERN ^[12]), located in Geneva, Switzerland. The WWW was envisioned as a way of publishing physics papers on the Internet without requiring that physicists go through the laborious process of downloading a file and printing it out. As such, HTTP and the WWW have been in use since 1989. Note, however, that the first version of HTTP, referred to as HTTP/0.9 (i.e., HTTP version 0.9), was only a simple protocol for raw data transfer across the Internet.

HTTP was (and still is) a simple request/response protocol. This means that a client sends an HTTP request message to a server, and that the server sends back a corresponding HTTP response message. There is no multiple-step handshake at the beginning, as there is with other TCP/IP application protocols, such as Telnet or FTP. In the case of HTTP/0.9, the browser simply established a TCP connection to the appropriate port of the origin server and sent a request message like GET /index.html to the origin server. The origin server, in turn, responded with the contents of the requested resource (the file /index.html in the example above). In HTTP/0.9, there were no request headers, no request methods other than GET, and the response had to be a file written in a special language, namely the hypertext markup language (HTML). All current servers are capable of understanding and handling HTTP/0.9 requests, but the protocol is so simple that it is not very useful anymore.
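The HTTP/0.9 exchange described above can be sketched in a few lines of Python. This is a minimal illustration, not a conforming implementation: the function names and the in-memory `documents` table are hypothetical, and the TCP connection is left out so that only the message format is shown.

```python
def build_http09_request(path: str) -> bytes:
    """Build an HTTP/0.9 request: the word GET, a path, and a line end.

    There is no version token and there are no headers.
    """
    return f"GET {path}\r\n".encode("ascii")


def handle_http09_request(raw_request: bytes, documents: dict) -> bytes:
    """Answer an HTTP/0.9 request with the raw HTML document.

    `documents` is a hypothetical in-memory stand-in for the server's
    files. HTTP/0.9 has no status line, so an unknown path is answered
    with an HTML page as well, and any method other than GET is ignored.
    """
    line = raw_request.decode("ascii").strip()
    method, _, path = line.partition(" ")
    if method != "GET" or not path:
        return b""
    return documents.get(path, b"<html><body>Not found</body></html>")


# Hypothetical document store for the example.
documents = {"/index.html": b"<html><body>Hello, WWW</body></html>"}
request = build_http09_request("/index.html")
response = handle_http09_request(request, documents)
```

In a real exchange, the request bytes would be written to a TCP connection, and the server would close the connection after sending the document to mark the end of the response.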

After the first implementations of HTTP/0.9, the protocol was enhanced with some new features, such as request headers and additional request methods, as well as a message format that conforms to the multipurpose Internet mail extensions (MIME) specification originally proposed for Internet-based electronic messaging. The resulting HTTP/1.0 (version 1.0) specification was officially released in 1996 in RFC 1945 [19].
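To make the difference to HTTP/0.9 concrete, the following sketch builds an HTTP/1.0 request with headers and parses a response whose header block follows the MIME-like message format. It assumes Python's standard `email.parser` module for the header block; the function names and the sample response are hypothetical and serve only as an illustration.

```python
from email.parser import HeaderParser


def build_http10_request(path: str, headers: dict) -> bytes:
    """Build an HTTP/1.0 GET request: request line, headers, empty line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_http10_response(raw: bytes):
    """Split a response into status line, MIME-style headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, _, header_block = head.decode("iso-8859-1").partition("\r\n")
    version, status_code, reason = status_line.split(" ", 2)
    # The header block uses the same "Name: value" syntax as e-mail headers.
    headers = HeaderParser().parsestr(header_block)
    return version, int(status_code), reason, headers, body


# Hypothetical response, written out by hand for the example.
sample_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, status_code, reason, headers, body = parse_http10_response(sample_response)
```

Because the response now carries a status line and a Content-Type header, the body no longer has to be HTML, which is what allows images and other resource types to be transferred.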

Compared to HTTP/0.9, HTTP/1.0 was a major step ahead. Nevertheless, HTTP/1.0 still did not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, and virtual hosting. In addition, the proliferation of incompletely implemented applications calling themselves "compliant to HTTP/1.0" required a protocol version change in order for two communicating applications to determine each other's capabilities. Consequently, an updated version of the HTTP specification was drafted in 1997. After a two-year trial period, the specification of HTTP/1.1 (version 1.1) was officially released in RFC 2616 [20] and submitted to the Internet standards track. The basic operation of HTTP/1.1 has remained the same as for HTTP/1.0 (and HTTP/0.9), and the protocol ensures that browsers and servers of different versions can correctly interoperate. More precisely, if the browser understands version 1.1, it uses HTTP/1.1 on the request line instead of HTTP/1.0. When the server sees this version number, it can make use of HTTP/1.1 features. If, however, an HTTP/1.1 server sees a lower version number, it adjusts its responses to use that protocol version instead. In addition to RFC 2616, there is an experimental RFC 2774 that describes an HTTP extension framework [21]. This framework is not addressed in this book.
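The version handling described in this paragraph can be sketched as follows. This is a simplified illustration of the behavior as stated in the text (the server falls back to the lower of the two versions), not a reproduction of the exact rules in the HTTP specifications; the function names are hypothetical.

```python
def parse_request_version(request_line: str) -> tuple:
    """Return the (major, minor) HTTP version announced on a request line.

    A request line without a version token, such as "GET /index.html",
    is treated as HTTP/0.9.
    """
    parts = request_line.strip().split()
    if len(parts) == 2:
        return (0, 9)
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        major, _, minor = parts[2][len("HTTP/"):].partition(".")
        return (int(major), int(minor))
    raise ValueError(f"malformed request line: {request_line!r}")


def select_response_version(request_line: str, server_version=(1, 1)) -> tuple:
    """Pick the version a server would answer with.

    Simplifying assumption: the server uses the lower of its own version
    and the version announced by the browser.
    """
    return min(parse_request_version(request_line), server_version)
```

For example, `select_response_version("GET /index.html HTTP/1.0")` yields `(1, 0)` for an HTTP/1.1 server, whereas a request line that carries HTTP/1.1 lets the same server use HTTP/1.1 features.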

Originally developed on NeXT computers, the WWW didn't really take off until a team of researchers at the National Center for Supercomputing Applications (NCSA) at the University of Illinois wrote Mosaic, a browser for the X Window System. In the early 1990s, this browser quickly became the standard against which all other browsers were compared. Marc Andreessen, who was the head of the original Mosaic development team, went on to cofound a start-up company called Mosaic Communications. The company first created a new browser called Mozilla. ^[13] Afterwards, the company was renamed Netscape Communications and the corresponding browser was renamed Netscape Navigator. After Microsoft released its own browser, called Internet Explorer, Netscape Communications and Microsoft entered into tough competition for market share. The competition ended in 1998 when America Online (AOL) bought Netscape Communications. Netscape Navigator is still available and in use today, but it has lost a lot of market share. Instead of Netscape Navigator, a new browser called Opera ^[14] is used and widely deployed on the Internet today. Opera has been developed in Norway to meet the requirements of clients with limited computing power. As such, it is the browser of choice for many users of personal digital assistants (PDAs) and handheld computer devices. As of this writing, it is difficult to tell whether Microsoft Internet Explorer will increase its market share or lose it to a competitor, such as Opera.

HTTP and Web technologies are omnipresent on the Internet, and an increasingly large number of Internet services have been redesigned and implemented so that they can also be accessed from a standard off-the-shelf browser (instead of only from a dedicated client software package). For example, most browsers implement the File Transfer Protocol (FTP) in addition to HTTP and can be used to download files accordingly. Consequently, these browsers may serve as replacement tools for formerly used FTP clients. Also, many e-mail users regularly access their message stores using Web browsers and HTTP instead of e-mail user agents and message store access protocols, such as POP3 or IMAP4. In fact, Web-based messaging has become very popular in the recent past (especially among roaming users), and many companies have installed and are operating corresponding Web front ends to their messaging infrastructures. In the case of Microsoft Exchange, for example, Outlook Web Access may provide this kind of functionality.

Against this background, the term Web services has been coined as a new buzzword in the industry, and many software vendors have launched initiatives to promote Web services based on the extensible markup language (XML). Examples include Microsoft's .NET initiative and the Sun Open Net Environment (Sun ONE). ^[15] In either case, the Web services description language (WSDL) is used to formally describe Web services in a structured and standardized way. Implementing a Web service means structuring data and operations inside an XML document that complies with the Simple Object Access Protocol (SOAP) specification. SOAP, in turn, is a simple and lightweight XML-based client/server protocol that defines a messaging framework for exchanging structured data and type information across the Web. It can be used in combination with any transport protocol or mechanism that is able to transport SOAP messages (also known as SOAP envelopes). Many programming or scripting languages can be used to implement a Web service and to construct, transmit, read, and process corresponding SOAP messages (e.g., Java and C++).

Once a Web service has been implemented, it must be published in a place where interested parties can find it. Information about how a client would connect to a Web service and interact with it must also be exposed somewhere accessible to these parties. This connection and interaction information is commonly referred to as binding information. Universal description, discovery, and integration (UDDI) registries are the primary means to publish, discover, and bind Web services. These registries contain the data structures and taxonomies used to describe Web services and Web service providers. A UDDI registry can be hosted either by private organizations or by third parties. More recently, IBM and Microsoft have announced the Web services inspection language (WSIL) specification to allow applications to browse Web servers for XML Web services. As such, WSIL promises to complement UDDI by making it easier to discover available services on Web sites not listed in the UDDI registries.

By the time this book hits the shelves of bookstores, many new terms and acronyms will have been created and put in place. None of these technologies is at the core of this book. Consequently, they are mentioned and put into perspective only where useful and appropriate. You may refer to many other books to learn about XML or Web services in general, and WSDL, SOAP, and UDDI in particular [22, 23]. You may also refer to the home page of the World Wide Web Consortium ^[16] (W3C) to get further information about the latest acronyms and buzzwords.
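To give an impression of what constructing and reading a SOAP message involves, the following sketch builds a SOAP 1.1 envelope with Python's standard `xml.etree.ElementTree` module and reads it back. The envelope namespace is the one defined for SOAP 1.1. The service namespace `urn:example:quotes`, the operation `GetQuote`, and the function names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace of an example Web service.
SERVICE_NS = "urn:example:quotes"


def build_soap_envelope(operation: str, params: dict) -> bytes:
    """Wrap an operation and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_soap_call(raw: bytes):
    """Extract the operation name and parameters from a SOAP envelope."""
    root = ET.fromstring(raw)
    body = root.find(f"{{{SOAP_ENV_NS}}}Body")
    call = body[0]  # first child of the Body element is the operation
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


message = build_soap_envelope("GetQuote", {"symbol": "IBM"})
operation, params = read_soap_call(message)
```

The resulting bytes could be carried by any transport that is able to deliver them, for example as the body of an HTTP POST request. A WSDL description would tell a client which operations and parameters a given service expects.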

^[10] In WWW parlance, HTTP clients are often called browsers. In this book, we are going to use the terms HTTP client, client, browser, and Web browser synonymously. Note, however, that most browsers provide client support for other application protocols in addition to HTTP, such as Telnet, FTP, and Gopher.

^[11] Examples of WWW resources include text and HTML files, GIF and JPEG image files, or any other file that stores digitally encoded data in some specific format.

^[12] The acronym is derived from the French name of the research laboratory.

^[13] Note that sometimes browsers are still called Mozilla.

^[14] http://www.opera.com

^[15] In its latest material, Sun Microsystems uses the term services on demand to go one step further and to collectively refer to local applications, client/server applications, Web applications, and Web services.

^[16] http://www.w3.org