The World Wide Web service is accessed using the Hypertext Transfer Protocol (HTTP); the version described here is HTTP 1.1. HTTP uses a Uniform Resource Locator (URL), which represents the location and access method for a resource available on the Internet. For example:
http://www.iseeyes.com
ftp://rtfm.mit.edu/pub/Index.README
The Uniform Resource Locator (URL) has six fields: protocol, domain name, port address, directory path, object name, and the specific spot in the object. The server on which the object is located is called the origin server.
The format of a URL is
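http://www.elearn.cdacindia.com:80/hypertext/www/Addressing/Addressing.html#spot
The six fields, numbered from left to right, are:
1. Protocol (such as HTTP, FTP): http
2. Domain name: www.elearn.cdacindia.com
3. Port address: 80
4. Directory path: /hypertext/www/Addressing/
5. Object name: Addressing.html
6. Specific spot (or link) within the object: #spot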
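As a rough illustration of the six fields, the sketch below splits a URL with Python's standard `urllib.parse` module. The function name `url_fields` and the `DEFAULT_PORTS` table are assumptions made for this example (the port values are the ones listed in this section), and the example URL is written with the port followed directly by the path (`:80/`).

```python
from posixpath import split as split_path
from urllib.parse import urlsplit

# Default ports for the protocols named in this section (illustrative table).
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "http": 80}


def url_fields(url):
    """Return the six URL fields described in the text as a dictionary."""
    parts = urlsplit(url)
    # Separate the directory path from the object name (the last path segment).
    directory, object_name = split_path(parts.path)
    return {
        "protocol": parts.scheme,
        "domain name": parts.hostname,
        # Fall back to the protocol's default port when none is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "directory path": directory,
        "object name": object_name,
        "spot": parts.fragment,
    }


if __name__ == "__main__":
    example = ("http://www.elearn.cdacindia.com:80"
               "/hypertext/www/Addressing/Addressing.html#spot")
    for name, value in url_fields(example).items():
        print(f"{name}: {value}")
```

When the URL omits the port, as in http://www.iseeyes.com, the sketch falls back to the default port for the protocol.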
Some of the most commonly used protocols, along with their port addresses, are
Protocol | Port address |
FTP | 21 |
HTTP | 80 |
Telnet | 23 |
Figure 23.2 shows the procedure for accessing the resource corresponding to a URL. The client invokes the browser (such as Internet Explorer or Netscape Communicator) and specifies the URL. The domain name in the URL is passed to a Domain Name System (DNS) server (Step 1), which returns the IP address of the server that has the resource corresponding to that URL (Step 2). The server that has the resource is known as the origin server. Using that IP address, the HTTP request is sent to the Web server (Step 3), and the Web server sends the response, in the form of an HTML document, to the client via the ISP server (Step 4).
Sometimes Web servers do not allow access to every user, for security reasons. In such cases, a proxy server is used between the client and the origin server. The proxy server acts as a client to the origin server and as a server to the actual client. When there is no restriction on accessing an origin server, the intermediate servers act as tunnels. A tunnel is an intermediate program that acts as a blind relay between two connections.
Figure 23.2: Web access.
When a user wants to access a resource by giving a URL, the IP address of the origin server is obtained using the Domain Name System (DNS). Then a TCP connection is established between the client and the origin server. The origin server sends the HTML document corresponding to that URL to the client.
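The sequence just described (name lookup, TCP connection, GET request, HTML response) can be sketched with Python's standard `socket` module. This is a minimal illustration, not how a real browser is built. The names `fetch`, `DemoOrigin`, and `start_demo_origin` are hypothetical, and the small local server stands in for an origin server so that the sketch runs without Internet access. For simplicity the lookup is restricted to IPv4.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class DemoOrigin(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an origin server; serves one fixed page."""

    def do_GET(self):
        body = b"<html><body>Hello from the origin server</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def start_demo_origin():
    """Start the stand-in server on a free local port and return it."""
    server = HTTPServer(("127.0.0.1", 0), DemoOrigin)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(url):
    """Resolve the host, open a TCP connection, send GET, return the raw reply."""
    parts = urlsplit(url)
    port = parts.port or 80
    # Steps 1 and 2: the name lookup yields the address of the origin server.
    family, socktype, proto, _, address = socket.getaddrinfo(
        parts.hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)[0]
    # Step 3: establish the TCP connection and send the HTTP request.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        sock.connect(address)
        request = (f"GET {parts.path or '/'} HTTP/1.1\r\n"
                   f"Host: {parts.netloc}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))
        # Step 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    origin = start_demo_origin()
    raw = fetch(f"http://127.0.0.1:{origin.server_address[1]}/index.html")
    print(raw.decode("iso-8859-1"))
    origin.shutdown()
```

Given a real host name instead of a numeric address, the same `getaddrinfo` call would perform the DNS lookup.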
HTTP is a transaction-oriented client/server protocol. The client is the Web browser, and the server is the Web server. The operation of HTTP is shown in Figure 23.3. The origin server is the server on which the resource is located. There are three possibilities for interaction between the client and the origin server:
Direct connection
Proxy
Relay (tunnel)
Figure 23.3: Hypertext transfer protocol (HTTP).
In direct connection, a TCP connection is established between the client (user agent) and the origin server. The user agent sends an HTTP request, and the origin server sends the response.
A proxy is used when the origin server does not permit direct access to users. The proxy acts as a server to the client and as a client to the origin server: the user agent sends the request to the proxy, and the proxy in turn sends the request to the origin server and forwards the response it receives to the client. A proxy can be used for different reasons. The origin server may, for security or administrative reasons, allow only certain servers to access its resources; in such cases, an authenticated connection has to be established between the proxy and the origin server. In addition, different servers sometimes run different versions of HTTP, and the proxy performs the conversions needed to handle the different versions.
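The dual role of a proxy can be sketched with Python's standard `http.server` and `http.client` modules. This is only an illustration under simplifying assumptions: it handles GET requests only, ignores query strings, and performs no authentication or version conversion. The class and function names (`OriginHandler`, `SketchProxy`, `start`, `fetch_via_proxy`) and the `Via` header value are hypothetical, and both servers run locally so that the example is self-contained.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class OriginHandler(BaseHTTPRequestHandler):
    """Stand-in origin server: returns a tiny HTML page for any path."""

    def do_GET(self):
        body = b"<html><body>origin page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


class SketchProxy(BaseHTTPRequestHandler):
    """Acts as a server to the client and as a client to the origin server."""

    def do_GET(self):
        # A client talking to a proxy sends the full URL in the request line.
        target = urlsplit(self.path)
        upstream = http.client.HTTPConnection(
            target.hostname, target.port or 80, timeout=5)
        upstream.request("GET", target.path or "/")  # client role
        reply = upstream.getresponse()
        body = reply.read()
        upstream.close()
        self.send_response(reply.status)  # server role
        self.send_header("Content-Type",
                         reply.getheader("Content-Type", "text/html"))
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Via", "1.1 sketch-proxy")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    """Run a server with the given handler on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(proxy_port, url):
    """Send GET <url> to the proxy and return (status, Via header, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    conn.request("GET", url)
    reply = conn.getresponse()
    result = (reply.status, reply.getheader("Via"), reply.read())
    conn.close()
    return result


if __name__ == "__main__":
    origin, proxy = start(OriginHandler), start(SketchProxy)
    url = f"http://127.0.0.1:{origin.server_address[1]}/index.html"
    print(fetch_via_proxy(proxy.server_address[1], url))
    proxy.shutdown()
    origin.shutdown()
```

The client never opens a connection to the origin server; it sees only the proxy, which adds a `Via` header to mark that the response passed through it.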
A tunnel performs no operation on HTTP requests and responses; it is simply the relay point between two TCP connections.
HTTP messages are of two types: requests from client to server and responses from server to client. In its simplest form, a request has the format GET <URL>, and the response is a block of data containing the information identified by the URL.
Examples of the two types of messages are:
Request messages (methods): | GET, POST, DELETE |
Response messages (status): | OK, Accepted, Use Proxy |
Each message contains a header and, optionally, a body.
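To make the header/body split concrete, the following sketch builds a request message and takes a response message apart using plain byte-string handling. The function names `build_request` and `parse_response` are invented for this illustration, and the parsing is deliberately simplified (for example, it does not handle repeated or folded header lines).

```python
def build_request(method, path, host, body=b""):
    """Assemble a request: request line, header lines, blank line, body."""
    head = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        head.append(f"Content-Length: {len(body)}")
    return "\r\n".join(head).encode("ascii") + b"\r\n\r\n" + body


def parse_response(raw):
    """Split a raw response into version, code, reason, headers, and body."""
    # The first blank line separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = lines[0].split(" ", 2)
    version, code = status[0], int(status[1])
    reason = status[2] if len(status) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "www.example.com"))
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
    print(parse_response(sample))
```

A GET request normally has an empty body, while a response to it carries the requested document in its body.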
HTTP defines status codes to inform the client of the outcome of a request. Some status codes are:
Code | Status | Meaning |
200 | OK | Request successful. |
204 | No content | No information to send back. |
301 | Moved permanently | Resource URL has changed permanently. |
302 | Moved temporarily | Resource URL has changed temporarily. |
305 | Use Proxy | Resource must be accessed through proxy. |
401 | Unauthorized | Access is denied; the request requires authentication. |
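Python's standard library exposes these statuses through `http.HTTPStatus`, which gives a quick way to look up the reason phrase for a numeric code. The sketch below uses the standard numeric values for the statuses named above (200, 204, 301, 302, 305, 401); the helper name `describe` is made up for the example. Note that the library reports code 302 with the reason phrase "Found", the name used in later revisions of HTTP 1.1.

```python
from http import HTTPStatus

# Standard numeric codes for the statuses listed in this section.
CODES = [200, 204, 301, 302, 305, 401]


def describe(code):
    """Return the numeric code followed by its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


if __name__ == "__main__":
    for code in CODES:
        print(describe(code))
```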
When a user wants to access a Web site, the URL given in the address field of the browser is sent to the ISP server, which obtains the IP address corresponding to the domain name in the URL. Then a TCP connection is established and a GET command is sent. The origin server sends back the resource (the HTML file). The browser interprets the HTML tags and displays the page to the user. If an error is encountered, an error message is displayed based on the status code received by the browser.
HTTP is a simple transaction-oriented client/server protocol. The client sends a GET request along with a URL, and the server sends a response containing the HTML document corresponding to that URL. If there is a problem, such as nonavailability of the content or of the server, an error message is sent instead.
The World Wide Web has become the most attractive service on the Internet because of HTTP. The information corresponding to a URL is obtained from the origin server in the form of Hypertext Markup Language (HTML), which is a simple text file with tags (called markup). These tags specify how the content has to be formatted and displayed (bold letters, underline, in table format, and so on). The HTML file also contains special tags called anchor tags. These anchor tags provide links to other HTML pages. When you click on the link, the HTML page corresponding to that link will be displayed. This new HTML page can reside on the same server or another server located in another part of the world. As a result, you can access information without the need to know where the information is physically located. That is the power of the Web.
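As a small illustration of how anchor tags link one page to another, the sketch below uses Python's standard `html.parser` module to collect the link targets from an HTML page and resolve them against the page's own URL. The class name `AnchorCollector`, the helper `extract_links`, and the example.com and example.org addresses are assumptions for this example only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collect the targets of anchor tags (<a href="...">) in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links are resolved against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html_text, base_url):
    """Return the absolute URLs that the page's anchor tags point to."""
    collector = AnchorCollector(base_url)
    collector.feed(html_text)
    return collector.links


if __name__ == "__main__":
    page = ('<html><body><b>Welcome</b> '
            '<a href="next.html">Next page</a> '
            '<a href="http://www.example.org/far.html">Another server</a>'
            '</body></html>')
    for link in extract_links(page, "http://www.example.com/docs/index.html"):
        print(link)
```

The first link resolves to a page on the same server and the second to a page on a different server, which is the location independence described above.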