Static Website Interoperability

We'll begin our discussion of web server interoperability issues by examining those that come up with static websites. Static websites are those made up entirely of files that the web server delivers to the web browser without interpretation. This would include a site made up entirely of HTML pages, GIF images, and even Java applets, because the server treats them all much the same.

We'll then move on to the issues surrounding more complex dynamic websites. These would include sites using server-side programming languages such as PHP and ASP. The key difference is that some of the content on the website is generated on-the-fly when a web browser makes a request.

But first, we'll need web servers to work with! Let's take care of that chore, then move on to what we can do to make moving static web content between Windows and Linux a little easier.

Setting Up Our Web Servers: IIS and Apache

On the Windows side, we already have an IIS web server: exchange2003.ad.corp.com. We installed IIS on this server in Chapter 6 in order to check out Exchange 2003's web-based e-mail access. If you didn't do so then, you'll need to refer back to Chapter 6 now and configure IIS on exchange2003.ad.corp.com or another Windows server.

Note 

In a production environment, we'd likely never use an Exchange server as a general-purpose web server. That's because every new application introduces new security risks and our Exchange server is not a great choice of machine to expose to such risks. But for test lab purposes, there's no need to establish an entirely separate web server machine.

On the Linux side, though, we haven't had a need for a standard Apache web server until this point. We did select the Apache web server package when installing Fedora on linserv1.corp.com, but we never started it up or signaled the operating system to start it up on each reboot. Again, in a production environment it would be more secure to move the web server to a separate host. This is recommended because security holes are routinely found (and fixed) in popular third-party web applications that appear on many sites. But for test lab purposes, there's no harm in using a single server for many purposes. Let's take a moment now to take care of starting up Apache on linserv1.corp.com.

Follow these steps to start the Apache server and ensure that it starts every time the system is booted:

  1. Use the chkconfig command to make sure the Apache server process, httpd, runs at boot time:

     chkconfig httpd on 
  2. Use the service command to start up Apache now:

     service httpd start 

If either command produces an error message, you may not have installed Apache when you installed Fedora Linux on linserv1.corp.com. You can fix that quickly with the up2date command:

 up2date httpd 

Then repeat the two earlier steps.

Now let's verify that the Apache web server is actually working properly by creating a simple test HTML page. By default, Apache expects to find web pages in and beneath the directory /var/www/html. This behavior can be changed by editing the DocumentRoot setting in Apache's main configuration file, /etc/httpd/conf/httpd.conf.

Follow these steps to test the delivery of a simple web page by the Apache web server on linserv1.corp.com:

  1. Using your preferred text editor, create the file /var/www/html/test.html.

  2. Insert the following HTML code:

     <html>
       <head>
         <title>Test HTML Page</title>
       </head>
       <body>
         <h1>Test HTML Page</h1>
       </body>
     </html>
  3. Access the web page with a web browser at http://linserv1.corp.com/test.html

In your browser window, you should see the text "Test HTML Page" in a large font. If you receive an error message instead, make sure you successfully followed the steps to start up the Apache web server.

Now that we have working web server environments for both Windows and Linux, we can look at how each server can be convinced to deliver web applications normally compatible with only one or the other.

Static Website Interoperability: "Gotchas" When Moving Content between Servers

A static website is the simplest type of website. Static websites can contain HTML pages, images, audio, video, and more. The key observation is that all of the content comes from existing files, with no database lookups or other forms of dynamically generated content. Sometimes a static website may utilize third-party providers to provide some dynamic features. For instance, Google AdSense (www.google.com/adsense) dynamically inserts advertising content using JavaScript code embedded in web pages. But JavaScript is a browser-side technology; nothing "interesting" is happening on the server side here. Similarly, many companies offer access counters that can be embedded in a web page. But these counters are implemented as references to images that reside on the web server of the company providing the counter service. From the standpoint of the web server that hosts the site, there's nothing to do but deliver existing files directly to the web browser.

Moving an existing static website from Linux to Windows or vice versa is a piece of cake most of the time. There are just a few interoperability issues on the server side. Indeed, since both the Apache web server found on Linux systems and Microsoft Internet Information Services (IIS), most commonly found on Windows systems, have virtually no trouble delivering simple static files, one might think there would be no issues at all!

Can't one just copy the content from the IIS inetpub folder right on over to the html directory on the Apache server? Usually, yes. But there are three common problems to be dealt with: case-sensitive filenames, inconsistent file extensions, and differing directory index filenames. Be sure to consider these issues carefully before you move your files over.

Case Sensitivity

On Windows, filenames are not case sensitive. If you have an image file named logo.png and you copy another file named Logo.png into the same folder, the first file is replaced. Also, if a web page embeds that image with an HTML element like the following one, the image will be displayed because Windows always ignores case when considering filenames:

 <IMG src="LoGo.png"> 

On Linux, filenames are case sensitive. An image file named logo.png and an image file named Logo.png can coexist in the same directory at the same time. The preceding HTML element will display nothing but a "broken image" icon in the web browser unless the actual image filename is exactly LoGo.png. And since Linux file extensions are also case sensitive and, by convention, always lowercase, a file named logo.PNG will never be correctly recognized as an image file. (Well, almost never. You could reconfigure Apache to recognize uppercase variations of each and every file extension it recognizes. But that's not a very practical way to do things.)

Unfortunately, case discrepancies like this tend to accumulate over time. And when it comes time to move the content to a Linux server running Apache, some of your links, images, and so forth are mysteriously broken.

The correct solution to this problem is to fix your filenames and references to those filenames to be completely consistent with regard to case. You can do this by working methodically through your files, correcting case to be consistent. Doing so is far easier if you simply decide on a standard, such as all lowercase for all filenames. Since lowercase is much more common for files on Linux, with uppercase used only rarely for emphasis (as in filenames such as README), we suggest correcting all filenames to lowercase. Keep in mind that you must both rename the files and correct the case of every link to those files that appears in an HTML document. For those who know a little bit of HTML, that includes every <IMG src="filename.gif"> that embeds an image and every <A HREF="filename.html"> that links to a page.
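If you'd rather not hunt for case discrepancies entirely by hand, a short script can do some of the looking for you. What follows is a minimal sketch in Python, not a finished tool. The name case_mismatches is our own invention, and the sketch assumes every reference is a plain filename in the same directory as the page (no subdirectories, no full URLs).

```python
from html.parser import HTMLParser


class _RefCollector(HTMLParser):
    """Collects the values of src and href attributes from an HTML page."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so SRC and HREF match too.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)


def case_mismatches(html_text, filenames):
    """Return (reference, actual_filename) pairs that differ only in case.

    Assumption: references are bare filenames that live in the same
    directory as the page being checked.
    """
    collector = _RefCollector()
    collector.feed(html_text)

    exact = set(filenames)
    by_lower = {name.lower(): name for name in filenames}

    problems = []
    for ref in collector.refs:
        actual = by_lower.get(ref.lower())
        if actual is not None and ref not in exact:
            problems.append((ref, actual))
    return problems
```

Given the page text and a listing of the directory (from os.listdir, say), each pair returned names a reference that works on Windows but would break on Linux, along with the file it was probably meant to reach.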

OK, we all agree that's the right way. Now, what happens if you've been tasked to move a website to a Linux server in a hurry? Maybe you've had time to fix case issues, but you have a nagging suspicion that you missed a few. And you need that site to work properly on Linux now. So promise us you'll fix the problem the right way just as soon as you can and we'll let you in on a secret.

Apache is the most popular web server in the world for a reason. Well, for more than one reason. Apache is free, for one thing. But another major reason is its sheer flexibility. Apache can be extended to do new things by any programmer who wishes to do so by writing a new Apache module. And if you want to do something, chances are there's already an Apache module that allows you to do it. In many cases, the module you want is already part of Fedora's standard build of Apache, which means you won't have to compile anything from source code. That's the case for the mod_speling module (yes, that misspelling is intentional; those Apache developers have a cute sense of humor), which provides a quick and dirty solution to our case-sensitivity transition problem.

The mod_speling module is called into action every time a request for a file on the website is about to fail! Instead of rejecting the request, mod_speling checks for files with any number of case differences from the original. As an added bonus, mod_speling also tolerates one spelling error, such as an incorrect letter. This usually comes into play, of course, only if your web pages contain links or image references that were already broken on Windows. It does, however, come in handy when users manually mistype URLs on your website.
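To make that order of operations concrete, here is a rough sketch in Python of the matching logic just described. It is an illustration only, not the module's actual code: the function names suggest and one_edit_apart are hypothetical, and "one spelling error" is assumed here to mean one wrong, missing, or extra character.

```python
def one_edit_apart(a, b):
    """True if a and b differ by one wrong, missing, or extra character."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    # Different length: dropping one character of the longer must give the shorter.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def suggest(requested, existing):
    """Return candidate filenames for a request, trying case fixes first."""
    if requested in existing:
        return [requested]  # nothing to correct
    wanted = requested.lower()
    case_matches = [name for name in existing if name.lower() == wanted]
    if case_matches:
        return case_matches  # case-only differences take priority
    return [name for name in existing if one_edit_apart(name.lower(), wanted)]
```

With a directory holding logo.png, logo2.png, and index.html, a request for LoGo.png resolves to logo.png alone, because the case-only match is found before any spelling comparison is attempted.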

So how do we set it up? Like most generally useful modules, mod_speling is already loaded by Fedora's standard Apache configuration file, /etc/httpd/conf/httpd.conf, which we'll refer to as httpd.conf for brevity's sake. That means the feature is available to be turned on. But the switch hasn't actually been set yet. To do that, we must set the CheckSpelling option to on by adding one line to httpd.conf:

 CheckSpelling on 

And then we signal the web server to reload its configuration files using the service command:

 service httpd reload 
Note 

Yes, that's right: The module's name (mod_speling) contains a "cute" intentional spelling error, but the option that turns on the feature does not. Maybe it's not so cute after all.

You can also place this line within the <VirtualHost> ... </VirtualHost> container in httpd.conf for a single virtual website rather than applying it to all websites hosted by your web server.

Warning 

The spelling correction feature of mod_speling is convenient, but there are pitfalls to consider. Case errors are detected first, so you needn't worry that a file with a different spelling will be delivered instead of one with a simple capitalization error. However, you should be concerned about leaving this feature in place for the long term. You don't want your designers to get sloppy and stop fixing their typos. And while it is never adequate security to "hide" a file by "just not linking to it," mod_speling increases the chances of a user accessing the wrong file by accident when filenames are similar enough. This is only an issue if that "private" file is sitting right in your public web folder, of course, which you would never allow. Right?

But how did we find the right Apache module to use? And how did we learn the right way to enable the feature we want in the Apache server configuration file? The Apache web server community maintains a website (http://httpd.apache.org) that always provides access to the latest and greatest Apache web server version and documentation for the current version as well as some older releases. Browsing there, especially on the Apache Frequently Asked Questions page (http://httpd.apache.org/docs/misc/FAQ.html), will usually reveal the information you need.

OK, so we've learned how to deal with case-sensitivity issues when moving content from Windows to Linux. But how does case sensitivity affect us when moving from Linux to Windows? You're in luck here: it usually doesn't. Completely case-consistent links and filenames won't do any harm when you copy them to an operating system that is not case sensitive. Migrating your static content from Linux to Windows is simpler in this regard.

But there's one catch! Even though most designers would never create separate files named logo.png and LOGO.png, under Linux, well, they could. And those files would be unique. And a page that embedded both images with <IMG src="filename"> elements would display two different images.

If, by some chance, your web designers have chosen to do this, you'll need to rename those files so that their names are different even without regard to case. That is, logo.png and LOGO.png are not distinct on Windows, but logo.png and logo2.png are distinct.

Fortunately, this issue is rare. After all, designers who originally create their content on a case-insensitive Windows system would never develop the habit of expecting two filenames differing only in capitalization to refer to two distinct files in the same directory. And developers who work with Linux typically avoid putting files in the same directory if the names differ only in their case. If you do encounter this problem, though, take care to rename one of the conflicting files and change all links and references to it in HTML documents accordingly.
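Rare or not, it's worth checking before you copy anything to Windows, and the check is easy to automate. The following Python sketch groups names that are identical once case is ignored. The function names case_collisions and collisions_in_tree are our own, and the path in the usage note is simply the default Apache document root mentioned earlier.

```python
import os
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def collisions_in_tree(root):
    """Map each directory under root to its colliding names, if it has any."""
    found = {}
    for dirpath, dirnames, filenames in os.walk(root):
        groups = case_collisions(dirnames + filenames)
        if groups:
            found[dirpath] = groups
    return found
```

Calling collisions_in_tree("/var/www/html") on the Linux server before the move would list every directory that contains names Windows cannot tell apart. An empty result means there is nothing to rename.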

File Extensions and MIME Types

When it comes to file extensions, we often casually refer to, say, a GIF graphics file as a "dot-gif" or an HTML document as a "dot-html." Windows users, especially those who go back to the Windows 3.1 days, will sometimes refer to "dot-htm" as well. By popular convention, a file with a .gif extension is understood to be a Graphics Interchange Format (GIF) image file, and a file with a .htm or .html extension is understood to be an HTML document. But these file extensions are not universal standards. Indeed, at one point, Mac OS did not use file extensions at all, relying on alternative methods to identify file types. While in the past few years file extensions have become more universally accepted across platforms, there is much less consistency in the naming of some newer file types, notably those used for video. That means the web browser can't rely solely on the file extension to tell it what sort of data the file contains. On the Web, this problem is solved by using MIME types.

Multipurpose Internet Mail Extensions (MIME) types avoid the problems of associating files with applications based on the file extension. MIME definitions are standardized to refer to specific file types regardless of the extension.

For instance, the MIME type for a GIF image is image/gif, and the MIME type for an HTML document is text/html. It does not matter whether the file was named welcome.html or welcome.htm or even welcome.mytzylpk on the web server's hard drive, so long as the web server specifies the right MIME type when it sends the file to the web browser.

For the most part, this technique works very well. It doesn't matter what the file is called on the server, so long as the server tells the browser what MIME type the file really contains. This is a sensible way of doing things: The web server, which is under the web designer's control, presumably knows what the files contain. The web browser shouldn't have to guess!
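In practice, a web server usually decides which MIME type to announce by looking up the file's extension in a table, and it falls back on a default when the extension isn't listed. Here is a small Python sketch of that idea. The table, the default, and the function name content_type_for are all hypothetical and far shorter than any real server's list.

```python
# A tiny, made-up extension table. Real servers ship with hundreds of entries.
MIME_TYPES = {
    "gif": "image/gif",
    "htm": "text/html",
    "html": "text/html",
}

# Assumed fallback for extensions the table does not know about.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename, table=MIME_TYPES, default=DEFAULT_TYPE):
    """Pick the MIME type to send for filename, based on its extension."""
    _, dot, extension = filename.rpartition(".")
    if not dot:
        return default  # no extension at all
    return table.get(extension, default)
```

Both welcome.html and welcome.htm come back as text/html, while welcome.mytzylpk falls through to the default until someone adds a table entry for it.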

But what happens when we migrate files from a Windows web server to a Linux web server or from Linux to Windows? There's a possibility that a file extension recognized by one web server will be unknown to the other.

The good news is that most of the common file extensions, and quite a few of the less common ones, are recognized by both Apache and IIS "right out of the box." For instance, that includes both .htm and .html, even on Linux, where long filenames have always been available. But if you do encounter a file extension that your web server doesn't know about, the result is rarely desirable. Apache will deliver the file using a default MIME type. This is often set to text/plain, which appears in the browser as plain text, or application/octet-stream, which most web browsers will offer to save to disk and not attempt to interpret further. Some web servers are configured to refuse to deliver a file at all if no MIME type is defined for the file extension. If the expected behavior is, say, MPEG video playback, then none of these behaviors is an acceptable substitute. So how do we solve this problem?

For the Apache server as found on Fedora systems, the standard file extensions are defined by the file /etc/mime.types. This file is used both by Apache and by other applications. But we don't recommend that you modify this file because you would lose the ability to update it automatically via the Red Hat Update Agent (that blinking exclamation point on your server's desktop).
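Reading the file is another matter, and it's a quick way to find out whether an extension is already covered. The file is plain text: each line names a MIME type followed by zero or more extensions, and anything after a # is a comment. The Python sketch below parses text in that layout. The function name parse_mime_types is our own, and the sample in the usage note is invented rather than copied from a real system.

```python
def parse_mime_types(text):
    """Build an extension-to-MIME-type mapping from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mime_type, *extensions = line.split()
        for extension in extensions:
            mapping[extension] = mime_type
    return mapping
```

Feeding it the two lines "text/html html htm" and "image/gif gif" yields entries for html, htm, and gif, and a lookup for svg comes up empty. To check a real server, pass in the contents of /etc/mime.types. None of this changes the file, which is the point: additions belong elsewhere.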

Instead, edit your httpd.conf file and take advantage of the AddType option. For instance, to "teach" Apache to recognize the up-and-coming Scalable Vector Graphics (SVG) file format, an open alternative to Flash, and send the browser the correct MIME type when it sees a .svg file, we do the following:

  1. Add this line to httpd.conf:

     AddType image/svg+xml .svg 
  2. Signal the web server to reload httpd.conf:

     service httpd reload 
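Once the server has reloaded its configuration, it's worth confirming that the new type is really being sent. One way, sketched here in Python, is to ask the server for the file's headers and read back the Content-Type. The function name served_content_type and the file drawing.svg in the usage note are hypothetical. The opener parameter exists only so the function can be exercised without a live server.

```python
import urllib.request


def served_content_type(url, opener=urllib.request.urlopen):
    """Return the MIME type the server announces for url (empty if none)."""
    # A HEAD request fetches the headers without downloading the file body.
    request = urllib.request.Request(url, method="HEAD")
    with opener(request) as response:
        header = response.headers.get("Content-Type", "")
    # Strip any parameters such as "; charset=utf-8".
    return header.split(";", 1)[0].strip().lower()
```

If a file named drawing.svg has been placed in /var/www/html, then served_content_type("http://linserv1.corp.com/drawing.svg") should come back as image/svg+xml. Anything else suggests the AddType line was not picked up.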

That's how we do it on Linux. What about Windows? IIS keeps its master list of MIME type assignments in the IIS metabase, a database of configuration information similar to, but separate from, the Windows Registry (because one Registry just isn't enough). Adding additional MIME type assignments is straightforward via the IIS snap-in, introduced in Chapter 3. In this example, we'll add support for serving files in the SVG file format as shown in Figure 10.1.

Figure 10.1: Adding a new MIME type to file extension mapping in IIS

Following these steps will allow IIS to deliver .svg files with the right MIME type, image/svg+xml, for all websites hosted on WINDC1:

  1. Click "Start."

  2. Click "Administrative Tools."

  3. Click "Internet Information Services (IIS) Manager."

  4. Right-click "WINDC1 (local computer)" to display a pop-up menu.

  5. Choose "Properties" from this menu.

  6. In the "WINDC1 (local computer) Properties" dialog box, click "MIME Types."

  7. Click "New" to add a new type.

  8. In the "Extension" field, enter svg (note: no period).

  9. In the MIME type field, enter image/svg+xml (note: no period).

  10. Click "OK" to close each dialog box until you return to the IIS snap-in. Close the IIS snap-in by clicking the "X" in the upper-right corner.



Windows and Linux Integration. Hands-on Solutions for a Mixed Environment
ISBN: B003JFRFG0
EAN: N/A
Year: 2005
Pages: 71
