Web Browsers | UNIX: The Complete Reference, Second Edition (Complete Reference Series)

To access the web, you will need a web browser. The browser is the program you use to display web pages and to navigate between web pages. Web browsers either are stand-alone products or are bundled with other Internet applications. We will discuss several of the many available web browsers for UNIX, but before we do so, we will briefly describe some of the fascinating history behind the development of web browsers.

Browser History

The first widely used graphical web browser, Mosaic, was developed by Marc Andreessen and Eric Bina at the National Center for Supercomputing Applications (NCSA) and was introduced in 1993. The first release of Mosaic was developed to run on the UNIX X Window System. Later Andreessen left NCSA and became one of the founders of the Mosaic Communications Corporation, which later changed its name to Netscape Communications Corporation. The web browser produced by the Netscape Communications Corporation, simply called Netscape, was the first commercial browser subsequent to the release of Mosaic. Later, Netscape introduced its Netscape Communicator, an Internet applications package which included an enhanced version of its Navigator web browser bundled with an e-mail client, a tool for reading netnews, a program for composing web pages, groupware software, and several other programs. Netscape Navigator and Communicator became extremely popular programs.

To counter the success of the Netscape web browser, Microsoft developed a competing web browser called the Internet Explorer, based on the original Mosaic browser. The Netscape Communications Corporation and Microsoft fought a three-year war, called the browser war, in which each introduced new features into its browser and matched the features the other introduced. Furthermore, Microsoft bundled Internet Explorer with its Windows operating system; it also produced a UNIX version of Internet Explorer. The Microsoft strategy, especially the inclusion of the browser at no extra charge with the operating system, made it impossible for the Netscape Communications Corporation to compete. In 1998, the Netscape Communications Corporation, realizing that Microsoft had succeeded in making the Internet Explorer the dominant web browser, decided to start the open-source Mozilla project, with the goal of developing the next generation of the Netscape Internet applications package.

America Online (AOL) purchased the Netscape Communications Corporation in 1998 and developed several releases of Netscape. In the following years, AOL lost interest in Netscape and in 2003, AOL disbanded the remnants of the Netscape Communications Corporation. Their final release of Netscape was made in 2004. Meanwhile, Internet Explorer was available until 2002 for two UNIX platforms, HP-UX and Solaris; after 2002 the Internet Explorer was not ported to UNIX platforms.

The Mozilla organization developed their Mozilla Applications Suite, including a browser, an e-mail client, a netnews client, a web page editor, and an IRC client. Release 1 of the Mozilla Applications Suite was introduced in 2002. In 2003, the Mozilla Foundation was created to continue the open-source development work of Internet applications based on Mozilla software. The Mozilla Foundation, using the Mozilla Applications Suite as a base, has developed its Firefox web browser and a variety of other Internet applications. The final release of the Mozilla Applications Suite has been released; it is being superseded by a new offering, called SeaMonkey, which is under development by the Mozilla community.

Using UNIX Browsers

There are many browsers available for UNIX platforms; some of these browsers are parts of larger Internet application suites. Noteworthy browsers for one or more UNIX variants include the Mozilla browser, which is part of the Mozilla Applications Suite; Firefox, a browser also developed under the auspices of the Mozilla Foundation; Epiphany, a web browser for the GNOME desktop; and Konqueror, a web browser for the KDE desktop. For a list of web browsers and links you can use to determine which platforms they run on, go to http://en.wikipedia.org/wiki/List_of_web_browsers.

We will briefly introduce some features of the Firefox browser. Other browsers share many features with Firefox, although the particulars on using and configuring each browser will vary. Using a browser is relatively straightforward. Most of the functions are apparent from the button or menu titles. All you really need to know is that hyperlinks are displayed in a distinctive style, either by text color, by underlining, or by a colored border around an image, and that you follow hyperlinks by clicking the distinctive text or image.

Besides following links, you may want to use a number of other functions which most browsers provide. You can bookmark (save) the link to a site, so that you don’t have to retype the entire URL the next time you want to access it. If you want to refer to the content on the page later, you can save it to a file, using the File | Save As menu, or you can print it by using the Print button. If you get lost, you can go back to your home page (the default one when you start your browser) by selecting the Home button. As you become more experienced, you can try to customize your environment to make browsing easier. Before you do this, though, you should understand the effects of the changes you make. You may find it helpful to read the online help information for your browser by selecting the Help button.

Your Initial Home Page

When you first invoke your browser, it will probably attempt to display the browser vendor’s home page, or it might just display a local file. If your machine has direct access to the Internet, you might be pleasantly surprised to see a home page appear after a few seconds. If you don’t have direct access to the Internet, because your organization either is not connected or is connected through an unfriendly firewall, the connection attempt will time out and you will be greeted with a message giving some indication of the problem.

You will probably want to configure your browser to display a home page of your choice rather than the vendor’s default page. You may be better off specifying a local file for your initial home page. That way, if the network beyond your machine is down, you won’t have an annoying timeout when you bring up the browser.

Using Firefox

With Firefox, the initial home page is indicated in an option menu, as shown in Figure 10–3. To begin, click Tools on the menu bar, and then click Options (note that this procedure is done by clicking Edit and then Preferences in some Firefox distributions).

Figure 10–3: Firefox initial home page setting

The Options window displays a group of five categories. You begin with the General category.

Firefox General Settings. Configure your initial home page here by entering your favorite URL in the text box in the Home Page area in the Location dialog box. Alternatively, you can select your current page, a page that you have previously bookmarked, or a blank page as your home page. In the General category, you can also choose whether Firefox should be your default browser and specify how Firefox should connect to the Internet. Check with your system administrator to see if your organization runs a “caching proxy server.” If so, enter the name of the machine and port number in the appropriate fields in the Manual Proxies form. Normally, the caching proxy is set by your ISP. However, corporations typically use a firewall proxy to keep people from going to unwanted sites. In addition to caching information, firewalls provide a strong degree of security by monitoring outgoing requests as well as incoming ones. Failure to set up adequate protection with a firewall can make your browser environment unsafe; that is, web users who are not on your network can access your information and possibly change things.

Firefox Privacy Settings. By selecting the Privacy category and then selecting the History tab, you can specify the length of time that pages you visit should be kept in the History file. By selecting each of the other tabs, you can tell Firefox to save information you enter in forms for later use, you can have Firefox remember login information for different web pages, you can tell Firefox when to remove downloaded files, you can manage cookies (which are pieces of information that web sites store on your computer), and you can specify the size of the cache used by Firefox.

Other Firefox Settings. You can select the other tabs (Content, Tabs, Downloads, and Advanced) to further configure your Firefox browser. For example, using the Content tab you can block pop-up windows and enable or disable Java and JavaScript. Using the Tabs tab, you can determine how links from other applications are opened, such as in a new window or on a new tab in the most recent window. Using the Downloads tab, you can specify the folder where files should be saved. Using the Advanced tab and selecting the Security tab that follows, you can specify the security protocol or protocols to use.

Helper Applications and Plug-Ins

Documents on the web come in many different media flavors, including text, images, audio, and movies, and each of those media flavors comes in many different formats. For example, a text page may be expressed in many different formats, including HTML, PostScript, LaTeX, Word for Windows, or unstructured text. Browsers differ in their capabilities to display various forms of each type of media. Depending on your browser version, certain file extensions may be handled by a helper application, which associates the file extension with a specific program (for instance, .doc files are opened by MS Word).

With most browsers, you can also integrate new content types using software programs called plug-ins. Although both helper applications and plug-ins enable a browser to expand the number of file types that it can handle, plug-ins are more closely integrated with the browser’s environment. Plug-ins can be loaded and unloaded from memory, whereas helper applications usually remain active even after you have left the web page you were viewing and even after you have closed your browser down.

Helper Applications

Browsers invoke external programs called “Helper Applications” or “Viewers” to deal with document formats that the browsers themselves do not understand. The format of a document is indicated by the last part of the name of the document, sometimes called the “extension” of the document name. For example, several common formats and the corresponding extensions are displayed in Table 10–8.

Table 10–8: Some Common Internet File Formats and Their File Extensions
Format	Extension
HTML	.html
PostScript	.ps
GIF	.gif
JPEG	.jpg, .jpeg
Wave (audio)	.wav
MP3	.mp3
Real Audio	.rpm, .ra
PDF (Adobe Acrobat Reader)	.pdf
AVI	.avi
MPEG	.mpeg

Table 10–9 shows some popular helper applications.

Table 10–9: Some Popular Helper Applications
Format	Viewer
Graphics	Xv
Audio	Showaudio
PostScript	gs
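
The two tables can be read together as a two-step lookup: the extension identifies the format, and the format identifies the viewer. The following Python sketch illustrates that idea only; it is not how any particular browser is implemented. The function name choose_viewer is hypothetical, and the decision about which formats count as "Graphics" or "Audio" is an assumption made for the example.

```python
import os

# Extension-to-format pairs taken from Table 10-8 (a subset).
FORMAT_BY_EXTENSION = {
    ".html": "HTML",
    ".ps": "PostScript",
    ".gif": "GIF",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".wav": "Wave (audio)",
    ".pdf": "PDF",
}

# Format-to-viewer pairs loosely based on Table 10-9. Which formats count as
# "Graphics" or "Audio" is an assumption made for this sketch.
VIEWER_BY_FORMAT = {
    "GIF": "xv",
    "JPEG": "xv",
    "Wave (audio)": "showaudio",
    "PostScript": "gs",
}


def choose_viewer(document_name):
    """Return the helper program for a document, or None if none is configured."""
    # The format is inferred from the last part of the document name.
    extension = os.path.splitext(document_name)[1].lower()
    document_format = FORMAT_BY_EXTENSION.get(extension)
    # Formats the browser displays itself (such as HTML) have no helper entry.
    return VIEWER_BY_FORMAT.get(document_format)
```

A name with no configured helper, such as index.html, yields None, which corresponds to the browser handling the document itself or asking the user what to do.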

Plug-ins

Another way to view different media types with Firefox, Mozilla, and many other browsers is to use a plug-in. Plug-ins can be used to seamlessly integrate content of different media types in web pages. When a file is accessed from a web page, the browser determines from the file type whether it has a plug-in for playing the file; if it does, the plug-in automatically starts the associated application. For example, if you access a file called starspangle.wav that is a wave file, this audio file will begin playing over your attached speakers. To find Firefox and Mozilla plug-ins, go to https://addons.mozilla.org/firefox/plugins/. Some of the plug-ins you will find there are the Acrobat Reader for viewing PDF files, a Flash Player for delivering Macromedia Flash content, the Java Runtime Environment that runs applications and applets written in Java, and the Real Player, which can play streaming audio and video.

Reading RSS Feeds. You can use Firefox to access Really Simple Syndication (RSS) feeds, such as blogs and news headlines, using the Live Bookmarks feature. Live Bookmarks automatically keep track of updates for you. To learn more about Live Bookmarks, or to find add-on programs for Firefox that can be used to read RSS feeds, go to https://addons.mozilla.org/firefox/extensions/.
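
To give a feel for what a feed reader works with, here is a minimal Python sketch that pulls headlines out of an RSS 2.0 document. It is not Firefox's implementation; the sample feed, its URLs, and the function name list_headlines are all made up for illustration, and a real reader would first download the feed from the site.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed for the fictitious Foobar Sales company.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Foobar Sales News</title>
    <item>
      <title>New brochure</title>
      <link>http://www.foobar.com/news/1.html</link>
    </item>
    <item>
      <title>Price list updated</title>
      <link>http://www.foobar.com/news/2.html</link>
    </item>
  </channel>
</rss>
"""


def list_headlines(feed_text):
    """Return a list of (title, link) pairs, one per item in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]
```

Calling list_headlines(SAMPLE_FEED) returns one pair per item, in the order the items appear in the feed.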

Web Documents

After you have learned the basic operations of your web browser, you may want to understand more about how information is organized and structured on the web. The most common unit of information on the web is the document. Most documents consist of text and images and are called pages. However, documents may come in a variety of other forms including audio and video and a wide selection of image files. Browsers may display documents directly, or they may invoke another program called a “helper application” or a “viewer.” All browsers can display text, and most can display some image formats. For sound or movies, however, they need to call a viewer. The binding between a document type and a viewer may be configured by the user, making it possible to reference document types unknown to the browser. This is especially helpful for newer types of audio and video applications.

Each document on the web has a unique address known as a Uniform Resource Locator (URL). A document’s URL indicates the Internet protocol needed to access the document (e.g., HTTP, FTP, and so on), the Internet address of the machine serving the document, the filename of the document on the machine relative to a server-specific root, and an optional port number for specialized server configurations.

Although most documents are static files, a document may be generated by executing a program at the server, making it possible to serve dynamic data such as weather, dates, and times, which may change from one reference to the next. These programs are often called CGI-BIN scripts. We will talk about them in Chapter 15, as well as some client-side tools used for presenting dynamic data.
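
As a rough illustration of a generated document, the Python sketch below builds the output that a very small CGI program might write to its standard output: a header line, a blank line, and then the document itself. The function name render_cgi_response and the page content are hypothetical; a real script would print the result and would be installed wherever the server expects its CGI programs.

```python
import datetime


def render_cgi_response(now):
    """Build the text a simple CGI program would write to standard output."""
    body = "<html><body><p>Server time: {}</p></body></html>\n".format(
        now.strftime("%Y-%m-%d %H:%M:%S")
    )
    # A CGI program writes its headers first, then a blank line, then the
    # document. The body changes on every request because it embeds the time.
    return "Content-Type: text/html\n\n" + body
```

For example, print(render_cgi_response(datetime.datetime.now()), end="") would emit a page whose content differs each time it is requested.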

Links

Perhaps the single most significant factor contributing to the phenomenal popularity and growth of the web is the hypertext link or hyperlink. Any document, anywhere on the web, can refer to any other document, anywhere on the web, with a hypertext link. The browser displays a link in a document with some form of highlighting such as a contrasting color or an underline, or in the case of links associated with images, with a distinctive border.

The user follows the link by moving the mouse over the highlighted text or image and clicking with the mouse. This instructs the browser to display the document indicated by the URL associated with the highlighted text. The new document in turn may include links to other documents, which contain links to yet other documents. The web is not hierarchical or tree-structured like a computer’s file system. In other words, after following a thread of links through several pages it is not necessary to make your way back up the first thread before another thread can be started. Instead, any document can link to any other document or documents in a web-like structure, which is the origin of the term “web” to describe the collection of all hypertext servers.

Addressing

To send traditional mail (in computer circles sometimes called smail for snail mail) to a person, it is necessary to know that person’s house number, street, and city or town (and perhaps postal code and country). To call a person on the telephone requires a phone number. A phone number and a number/street/city/town are both forms of an address, an identifier that is unique in a given context such as the phone or postal systems. On the web each document also has a unique address, known as a Uniform Resource Locator, or more commonly, a URL.

A URL is embedded in a document by the author when a hypertext link is created and is accessed by a browser when the link is followed. Browsers display the URL of the current page and usually also display the URL of a link when the cursor is moved over the hypertext reference. You will usually reference URLs by clicking links, not by typing them in explicitly. However, you may see URLs outside the context of a browser, for example, in a netnews article or e-mail. URLs are starting to appear in the nontechnical press as well. A quick review of a recent issue of Time magazine revealed URLs mentioned in two ads. (In these cases you will have to enter the URL into your browser to view the referenced document.)

A key strength of the web is the integration of access to many dissimilar resources from a common browser. The addresses of those resources are likewise integrated into the common syntax of the URL. We will describe the URL structure of several web protocols.

HTTP

Let’s take a look at an http URL. All of the examples here refer to a fictitious company, Foobar Sales, so don’t try to use them. The web is changing very rapidly and many URLs quickly become stale; that is, the documents they refer to may have been moved or deleted or perhaps the machine serving a document has been upgraded and has a new name. We prefer to use a contrived URL that will never work rather than a real one that may be stale by the time you read this. A typical URL looks like this:

http://www.foobar.com/marketing/brochures/overview.html

This URL tells the browser to use the HTTP protocol (hypertext transfer protocol; yes, the word protocol is redundant, but we won’t get into that) to contact a machine named www.foobar.com and to retrieve a document identified by marketing/brochures/overview.html. Often you will see a URL given without any document specified.

http://www.foobar.com

This URL tells the browser to contact machine www.foobar.com and fetch a default document. By default, this document is a file named index.html; however, the name of the default file can be configured at the server.
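
The way these two URLs are interpreted can be sketched in a few lines of Python using the standard urllib.parse module. The function name describe_http_url is hypothetical, and the choice to apply the default document to paths ending in a slash as well is an assumption; as noted above, the actual default is decided by the server's configuration, not by the browser.

```python
from urllib.parse import urlsplit

# A common default document name; the server may be configured otherwise.
DEFAULT_DOCUMENT = "index.html"


def describe_http_url(url):
    """Split a URL into protocol, machine, and document parts."""
    parts = urlsplit(url)
    document = parts.path.lstrip("/")
    # No document named (or a directory named): assume the default document.
    if document == "" or document.endswith("/"):
        document += DEFAULT_DOCUMENT
    return {
        "protocol": parts.scheme,
        "machine": parts.hostname,
        "document": document,
    }
```

Given http://www.foobar.com/marketing/brochures/overview.html, the sketch reports the document as marketing/brochures/overview.html; given http://www.foobar.com, it falls back to index.html.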

FTP

HTTP is the most common protocol used on the web, but it is not the only one. Many other protocols, including FTP and telnet, are supported by web browsers. Traditionally FTP was invoked from the UNIX System command line and was used by interactively entering a series of commands such as ls or dir to display directories, cd to move around the directory hierarchy, and get and put to transfer files. Because of the convenience of the point-and-click interface of web browsers, many people have completely abandoned the command-line interface and use only browsers for access to anonymous FTP servers.

This is an example of a URL for an anonymous FTP reference:

ftp://ftp.foobar.com

This instructs the browser to contact machine ftp.foobar.com using the FTP protocol. The browser logs in with the login name anonymous and supplies the user’s login name and machine name in the form of a mailing address as a password. Because the preceding reference does not indicate a specific resource, the home directory for anonymous transfers is displayed. This usually looks like this:

 bin etc incoming pub

All of these entries are directory names. All but pub are used for administrative purposes for anonymous FTP service. The pub directory contains the files offered for anonymous access by foobar, or additional directories that lead to them. Clicking “pub” will display the contents of that directory. Clicking any directory is equivalent to the cd (change directory) command followed by an ls (UNIX) or dir (DOS) to display the contents of the directory. After you have located the name of the desired file, click the filename to transfer it to your machine. Depending on the browser and file type, the file may be displayed directly by the browser. Otherwise, the browser may invoke a helper application or viewer to display the file, or you may be prompted to confirm that the file should be saved and to supply the filename or an alternate filename.

A URL can also supply a full description of a file resource as shown here:

ftp://ftp.foobar.com/pub/drivers/prod1.tar.Z

When selected (clicked), the file prod1.tar.Z will be transferred immediately without any intermediate directory display or further file selection from a directory list.

Browsers may also use the FTP protocol for nonanonymous FTP service, although this is much less common and is generally a bad idea. A URL of the form

ftp://bill:letmein@foobar.com/work/src/proj1/pl.c

causes the browser to log in to foobar.com using the name “bill” and supplying the password “letmein.” This is not a good idea because anyone reading your page can obtain your password (the rest should be obvious). Alternatively, you could omit the password as shown here:

ftp://bill@foobar.com/work/src/proj1/pl.c

The browser will prompt the user for a password, which must be correctly supplied before the server will return the document. This may be handy for quickly viewing one of your own files from a remote location but is of dubious value for general, public use.
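
The three cases described in this section (anonymous access, a name with a password, and a name alone) can be summarized in a short Python sketch. It only works out which login details a browser would use and makes no network connection. The function name ftp_login_details and the local_user and local_host parameters are hypothetical stand-ins for the user's own login name and machine name.

```python
from urllib.parse import urlsplit


def ftp_login_details(url, local_user="guest", local_host="localhost"):
    """Work out the login name, password, and path implied by an ftp URL."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL: " + url)
    if parts.username is None:
        # Anonymous FTP: log in as "anonymous" and supply a mail-style
        # address built from the user's login name and machine name.
        user = "anonymous"
        password = "{}@{}".format(local_user, local_host)
    else:
        user = parts.username
        # None here means the URL gave no password, so the browser
        # would have to prompt the user for one.
        password = parts.password
    return {
        "host": parts.hostname,
        "user": user,
        "password": password,
        "path": parts.path,
    }
```

A result whose password is None corresponds to the last URL form above, where the browser prompts before the server will return the document.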

Telnet

There are circumstances where the author of a page may wish to indicate a link to an interactive, character-based service. Entering the URL

 telnet://foobar.com

instructs the browser to invoke a telnet helper application (if one is available and the browser is configured to use it) and pass to it the machine name so that a telnet session is established with foobar.com. Although this accomplishes nothing more than the user would by invoking telnet directly, it does simplify the process by passing the machine name to telnet and by making telnet available with a single mouse click from within the browser.

The username and even a password may also be included as shown here:

 telnet://bill@foobar.com
 telnet://bill:letmein@foobar.com

Netnews

A link to a netnews group or article is specified in your browser using a URL of the form

 news:group_name
 news:article_number

Using NNTP (Network News Transfer Protocol), the browser contacts an NNTP server and obtains some or all of the articles in the newsgroup “group_name” or just one article indicated by the numeric “article_number.”

The NNTP server supporting netnews in your organization is identified to the browser using an option menu or environment variable.

Mailto

On the web most information flows out from the servers to the users. Although most servers maintain a log of requests that shows who accessed what pages, there is rarely any other feedback from users. The mailto URL, shown here, makes it easy for users to communicate back to the authors of web pages:

 mailto:bill@foobar.com

When this URL is selected, the browser will display a mail dialog box. The user types a message, which is sent to the mail address indicated by the URL. It is a thoughtful touch to include a mailto URL on your home page to make it convenient for your readers to send you comments. These can be a source of valuable feedback.
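
If you build your home page with a script, a mailto link can be generated like any other piece of HTML. The following Python sketch shows one way to do so; the function name mailto_anchor and the link text are hypothetical.

```python
from html import escape


def mailto_anchor(address, link_text):
    """Return an HTML link that opens a mail dialog addressed to `address`."""
    # Escape both values so that characters such as & or < cannot break the page.
    return '<a href="mailto:{}">{}</a>'.format(
        escape(address, quote=True), escape(link_text)
    )
```

Calling mailto_anchor("bill@foobar.com", "Send comments to Bill") produces a link that readers can click to send feedback.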

Personal URLs

Web pages are stored on the server in a directory hierarchy that is rooted in a directory indicated to the server in a configuration file. For security reasons this tree is usually writable only by the system administrator. On systems shared by multiple users where the users do not have administrator privileges, this makes it difficult for individual users to create and maintain their own web pages. Administrators quickly tire of requests to update files in the browser page database. The solution for this problem is the personal URL, shown here:

http://www.foobar.com/~wrw/my_home_page.html

This instructs the server to obtain a document named my_home_page.html from a directory associated with user wrw. This directory is usually named public_html and is located in the user’s home directory. We will have more to say about this later.
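
The mapping a server performs for a personal URL can be sketched as follows. This is an illustration in Python, not the code of any actual web server: the function name personal_url_to_file is hypothetical, and the assumption that home directories live under /home will not hold on every system.

```python
import posixpath


def personal_url_to_file(url_path, home_root="/home", public_dir="public_html"):
    """Map /~user/rest to home_root/user/public_dir/rest.

    Returns None when the path is not a personal URL or looks unsafe.
    """
    if not url_path.startswith("/~"):
        return None
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    rest = rest.lstrip("/")
    # Refuse ".." segments so a request cannot climb out of public_html.
    if ".." in rest.split("/"):
        return None
    return posixpath.join(home_root, user, public_dir, rest)
```

With these assumptions, the path /~wrw/my_home_page.html from the URL above maps to /home/wrw/public_html/my_home_page.html.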

An Abstract Look at URLs

In the abstract, a URL is defined this way:

 scheme:scheme-specific-data

where scheme is one of these:

 scheme:
     http
     https (secure http)
     ftp
     gopher
     mailto
     news
     telnet
     wais

(Scheme can also be one of several others that are not frequently encountered.) The scheme-specific-data is a description of a resource or action to perform that is specific to the named scheme such as http or ftp.

Although the syntax for the rest of the URL may vary depending on the particular scheme selected, URL schemes that involve the direct use of an IP-based protocol to a specified host on the Internet use a common syntax for the initial part of the scheme-specific-data:

 scheme-specific-data:
     //user:password@host:port
     //user:password@host:port/url-path

This initial part starts with a double slash (//) to indicate its presence and continues until the following slash (/), if any. Its elements are

user An optional user name. Some schemes (e.g., FTP) allow the specification of a user name.
password An optional password. If present, it follows the user name, separated from it by a colon.
host The fully qualified domain name of a network host, or its IP address as four sets of decimal digits separated by periods.
port The optional port number to connect to. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon.
url-path The rest of the URL consists of data specific to the scheme, and is known as the url-path. It supplies the details of how the specified resource can be accessed. The slash (/) between the host (or port) and the url-path is not part of the url-path.
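
The common syntax can be followed step by step in code. The Python sketch below splits a URL by hand in the order the elements are described above; it is meant to make the grammar concrete, not to replace a real parser (in practice, Python's urllib.parse.urlsplit does this job and handles many more cases). The function name split_common_syntax and the port number 2121 used in the example are hypothetical.

```python
def split_common_syntax(url):
    """Split scheme://user:password@host:port/url-path into its elements."""
    scheme, colon, rest = url.partition(":")
    if not colon or not rest.startswith("//"):
        raise ValueError("URL does not use the common // syntax: " + url)
    # The initial part runs from the double slash to the following slash, if any.
    # The separating slash itself is not part of the url-path.
    initial, _, url_path = rest[2:].partition("/")
    user = password = port = None
    if "@" in initial:
        userinfo, _, hostport = initial.rpartition("@")
        user, has_password, secret = userinfo.partition(":")
        if has_password:
            password = secret
    else:
        hostport = initial
    host, has_port, port_text = hostport.partition(":")
    if has_port and port_text:
        port = int(port_text)  # the port is given in decimal
    return {
        "scheme": scheme,
        "user": user,
        "password": password,
        "host": host,
        "port": port,
        "url_path": url_path,
    }
```

For example, split_common_syntax("ftp://bill:letmein@foobar.com:2121/work/src/proj1/pl.c") separates the user bill, the password letmein, the host foobar.com, the port 2121, and the url-path work/src/proj1/pl.c.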

A Few Formalities

We have not been particularly rigorous in this chapter regarding the term URL or the distinction between a URL path and ordinary files. You should at least be aware of some details in the formal definition of the structure of URLs.

We have used the term URL loosely to mean any identifier for resources on the web. You may encounter two other terms, URI and URN, which, together with URL, have more formal definitions. URI, for Uniform Resource Identifier, is the general term encompassing both URLs and URNs. URL, for Uniform Resource Locator, specifies the “address” of a resource, whereas URN, for Uniform Resource Name, specifies the “name” of a resource. The distinction between them relates to the notion of persistence. A URN has greater persistence than a URL; that is, the URN identifying a document will remain constant even though the physical location, as described by the URL, changes. Through an as-yet-unspecified mechanism, a URN is automatically mapped to a URL.

Because the original implementations of web software were developed on the UNIX system, it is not surprising that the “url-path” looks like a UNIX file path specification. It is particularly easy for UNIX users to fall into the trap of thinking of the “url-path” as a filename. The “url-path” is not always a filename, for three reasons. First, some web servers (see Chapter 16) have a mechanism for mapping an arbitrary “url-path” to an arbitrary file. This is useful when server administrators wish to present a document structure or hierarchy to the public that differs from the actual structure as stored on disk. It also makes it possible to maintain a fixed public structure while the internal structure changes (for any of the reasons things change on computer systems). Second, the machine running the web server may not even be running a UNIX variant and the file structure and naming syntax may be quite different from that of UNIX. The obvious example is the case of a web server running on a Windows operating system. At a minimum, the server must translate the forward slashes in the URL to the backslashes used by DOS and add a drive specification such as “C:” or “D:”. Finally, the link may be to an application, not a document page at all. An example is a web link that points to a CGI-BIN script (see Chapter 27).
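
The second point, translating a url-path for a Windows file system, can be illustrated with a short Python sketch. It is not taken from any real server; the function name url_path_to_windows_file and the document root C:\webdocs are hypothetical.

```python
from pathlib import PureWindowsPath


def url_path_to_windows_file(url_path, document_root="C:\\webdocs"):
    """Translate a url-path into a Windows filename under a document root."""
    # Split on the forward slashes used in URLs and drop empty segments.
    segments = [segment for segment in url_path.split("/") if segment]
    if ".." in segments:
        raise ValueError("url-path may not climb out of the document root")
    # PureWindowsPath joins the segments with backslashes and keeps the
    # drive specification from the document root.
    return str(PureWindowsPath(document_root, *segments))
```

Under these assumptions, the url-path marketing/brochures/overview.html becomes C:\webdocs\marketing\brochures\overview.html, which shows that the same URL can name quite different files depending on the server behind it.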