Section 1.4. Setting up an XML web site | ASP.Net 2.0 Cookbook (Cookbooks (OReilly))



1.4 Setting up an XML web site

The two main components of an XML web site, as we've just seen, are the source XML documents and the XSLT transformation stylesheet. Viewed from another angle, on their way from the author to the user, web pages have to go through three distinct environments:

the authoring environment is where web site authoring and maintenance take place;
the web server sits on the Internet, answering requests from clients (typically web browsers); and
the user environment is where a web browser displays the page.

Correspondingly, there are three possible ways to set up an XML-based web site, differing in which of the three environments above hosts the XSLT stylesheet and performs the XML-to-HTML transformation. We will now discuss these three setups in turn.

1.4.1 XML offline

The first setup (Figure 1.3) restricts XSLT processing to the authoring environment. In this scenario, neither the source documents nor the stylesheet is ever put on the server; the entire cycle of creating, editing, and transforming XML is performed by site authors and maintainers offline on their own systems. Only the end result of this (HTML pages and graphics) is uploaded to the web server.

Figure 1.3. XML offline: XSLT transformation is done in the authoring environment, while the web server and web client both deal with HTML pages.

graphics/01fig03.gif

The advantages of this approach are obvious:

Works with existing servers and clients. Neither server nor browser software needs to be changed in any way. The server will serve HTML and graphics as it always did, and users may never suspect that the web pages they are viewing were produced from XML. Thus, you can use your current hosting providers and target the current browser generation, and still use XML, without placing the burden of supporting it on anyone but yourself.
Server performance is optimal. A direct consequence of the previous point is that in this setup, there is no negative impact on server performance. Serving static files, without processing of any kind, is the easiest and fastest thing to do for any server; if this setup involves dynamic processing on the server (1.5.2.2), it is the same as it would be without XML.
Use any software for XML processing. Since you run your own transformation engine, you have complete control over its features. You are not limited by the XSLT processor, extensions, or auxiliary tools installed on a third-party system; you can install and/or program whatever you please. It's only in this setup that certain advanced (and sometimes time-consuming) tricks are possible, such as generating images (5.5.2), which is too slow for the two other setups. You can still benefit from some degree of portability, of course, since you may want to run the same XSLT transformation on different computers and different platforms within your organization.

Now let's turn these advantages upside down to see what problems they correspond to:

Not for dynamic sites. The offline setup works only for static sites. More precisely (1.5.2.2), you can combine offline XSLT transformation with a dynamic engine on a web server (such as an online store or a forum), but this combination is rather awkward and should be used only as a last resort. Therefore, if your site will contain significant dynamic components, you should instead consider one of the other two XML setups discussed below.
Offline updates may be slow. As with traditional static sites, content updates with an offline XML site are not particularly fast. Editing the source XML document, running offline transformation, and uploading the result to the server may take some time.
Limited use of XML. Since there's no XML either on the web server or in the user's browser, only the site author can take advantage of it. Therefore, the server cannot index its content in XML, search or process it, or convert it into a format different from HTML.

Summarizing, the offline setup is the best way to start experimenting with XML-based web site architecture and may be perfectly adequate for any static or mostly static web site, such as a small corporate site. Once you get an offline XML web site up and running, you can always migrate it to one of the more advanced setups.
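
The offline cycle described in this section (transform locally, then upload only the results) can be sketched as a small build script. This is a minimal illustration, not a prescribed tool: the `transform` function is a stand-in for whatever XSLT processor you run in your authoring environment, and the `page`/`title`/`para` vocabulary, the function names, and the directory layout are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def transform(xml_text: str) -> str:
    """Stand-in for an XSLT processor (hypothetical page/title/para vocabulary)."""
    page = ET.fromstring(xml_text)
    title = escape(page.findtext("title", default=""))
    paras = "".join(f"<p>{escape(p.text or '')}</p>" for p in page.iter("para"))
    return (
        f"<html><head><title>{title}</title></head>"
        f"<body><h1>{title}</h1>{paras}</body></html>"
    )


def build_site(src_dir: Path, out_dir: Path) -> list:
    """Transform every *.xml file under src_dir into an .html file under out_dir."""
    written = []
    for src in sorted(src_dir.rglob("*.xml")):
        # Mirror the source tree: src/about/team.xml -> out/about/team.html
        target = out_dir / src.relative_to(src_dir).with_suffix(".html")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(transform(src.read_text(encoding="utf-8")), encoding="utf-8")
        written.append(target)
    return written
```

Only the contents of the output directory would then be uploaded; the XML sources and the transformation code never leave the authoring environment.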

1.4.2 XML on the server

In the second setup (Figure 1.4), the web server stores the source XML documents for all web pages and transforms them into HTML on the fly. This means the web site author only has to deal with XML documents, while web clients get the pages in HTML as usual; the conversion from the former to the latter is done in the middle of the road.

Figure 1.4. XML on the server: XSLT transformation is done on the web server; a client receives an HTML page.

graphics/01fig04.gif

Once again, let's see first what is good about this approach:

Dynamic sites are OK. Unlike the offline setup, this one is well suited for dynamic web sites. This is because normally, XSLT transformation must come after dynamic processing and data retrieval (1.5.2.1). Since the dynamic engine is always on the server, the XSLT processor must also reside on the server in order to process the XML data produced by the dynamic engine. Using XML with a database-driven dynamic web site makes it possible to cleanly separate the site's content, style, and programming logic (7.2.1).
No special clients required. Typical web servers do not support XML out of the box, so you'll have to spend some time installing and configuring XML software on your web server (Chapter 7). However, web clients (browsers) do not require an upgrade, as what they receive from the server is the same good ol' HTML. This makes this setup only a bit less practical in today's circumstances than "XML offline."
Set up your server as necessary. If you control your web server, you can install any auxiliary software or extensions on it if it is necessary for XSLT processing, just as you would install it offline in your authoring environment. The only limitations of this setup are security and performance: Whatever you run on the server to process a page must be secure against attacks and must work fast, or the user experience will suffer. ^[11]

^[11] The security and performance requirements are related: A server that spends too many resources serving each page is more vulnerable to a denial-of-service attack.
More benefit from XML. Even though normally, an XML-enabled web server sends out web pages in HTML, the fact that it stores them internally in XML makes it possible to use the XML source for other purposes, such as indexing. XML-capable web browsers can also request XML versions of pages to transform them for presentation locally using their own stylesheets.
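
As one illustration of using the stored XML for something other than presentation, here is a minimal sketch of a word index built directly from the source documents. The function name and the shape of the `pages` mapping (page name to XML text) are assumptions made for this example; a real site would more likely index specific elements of its own vocabulary.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word found in element text to the pages containing it."""
    index = defaultdict(set)
    for name, xml_text in pages.items():
        root = ET.fromstring(xml_text)
        # itertext() yields character data only, so tag names are not indexed
        text = " ".join(root.itertext())
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return dict(index)
```

Because the index is built from the XML rather than from rendered HTML, navigation text and other boilerplate added by the stylesheet never pollute it.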

Naturally, there are some downsides to the "XML on the server" setup as well:

Server performance may suffer. The most critical issue is, of course, server performance. Running the XSLT processor for each requested page may have a significant impact on the server's response time. Caching transformed pages may reduce this effect but not eliminate it, because a dynamic page whose content is generated on the fly still has to be transformed every time it is served.
Server setup may be complicated. You'll have to install quite a lot of additional software on your web server to make it XML-capable. Apart from the performance considerations, this also has possible security implications: The more software is involved in processing a request, the greater the chance of vulnerabilities.
Separate testing of pages is necessary. Before you upload a modified XML document onto the server, you need to make sure it is not broken. On a live site, posting a page that does not transform or that results in malformed HTML is unacceptable. Pre-upload validation with a schema may catch most errors (especially with a rule-based schema language, 2.2.1.2), but only running the actual transformation and examining the resulting HTML pages can give you a guarantee. There are two possible approaches to handling this problem:
- You can run a test transformation in your authoring environment before uploading modified documents to the server. This amounts to combining the "XML offline" and "XML on the server" setups and thus undoes one of the advantages of the latter, namely the relative simplicity of the authoring environment.
- You can set up a separate protected staging area on your web server. Powered by the same XSLT environment as the public site, it will allow you to check your pages in an area that only you can see. Once you verify that everything is all right, you can re-upload the same document to the public web site.
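
Whichever of the two approaches you choose, the check itself amounts to running the transformation and inspecting its result. The sketch below shows one possible shape for such a check. It assumes that the transformation is available as a Python callable (`transform`, a stand-in for your XSLT processor) and that the output is XHTML-style markup that an XML parser can read; plain HTML output would need an HTML-aware checker instead.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def check_page(xml_text: str, transform: Callable[[str], str]) -> Optional[str]:
    """Return None if the page transforms cleanly, otherwise an error message."""
    try:
        ET.fromstring(xml_text)        # the source must be well-formed XML
    except ET.ParseError as exc:
        return f"source is not well-formed: {exc}"
    try:
        output = transform(xml_text)   # run the actual transformation
    except Exception as exc:
        return f"transformation failed: {exc}"
    try:
        ET.fromstring(output)          # assumes XHTML-style (well-formed) output
    except ET.ParseError as exc:
        return f"output is malformed: {exc}"
    return None
```

A well-formedness check of this kind catches only gross breakage; examining the rendered page by eye is still needed to confirm that it looks right.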

Overall, this setup seems to be the most viable in the midterm and the logical next step after you outgrow "XML offline." It is easy both on web authors (no XSLT software to install and learn) and, more importantly, on users (no special XML-capable browsers to access the site). The performance issue will likely become less pressing over time as faster XSLT processors and more powerful server hardware appear.
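
The caching of transformed pages mentioned among the downsides can be sketched as follows. This is one possible design under stated assumptions, not a feature of any particular server: the class name is hypothetical, `transform` stands in for a call into an XSLT processor, and pages are keyed by a hash of the stylesheet and source text.

```python
import hashlib
from typing import Callable, Dict


class TransformCache:
    """Cache transformed pages, keyed by a hash of stylesheet plus source."""

    def __init__(self, transform: Callable[[str, str], str]) -> None:
        self._transform = transform   # stand-in for an XSLT processor call
        self._pages: Dict[str, str] = {}
        self.misses = 0               # number of real transformations performed

    def render(self, xml_text: str, stylesheet_text: str) -> str:
        key = hashlib.sha256(
            (stylesheet_text + "\0" + xml_text).encode("utf-8")
        ).hexdigest()
        if key not in self._pages:
            self.misses += 1
            self._pages[key] = self._transform(xml_text, stylesheet_text)
        return self._pages[key]
```

Keying on content means that a page whose XML is generated afresh with different data on each request misses the cache every time, which is exactly the limitation noted above; static pages, by contrast, are transformed once per stylesheet version.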

1.4.3 XML in the browser

The last setup (Figure 1.5) moves XSLT processing further down the road: Now it is the software on the user's system that, upon receiving both the XML source of a page and its XSLT stylesheet, must do the transformation and display the result. After our discussion of the two other setups, the advantages of this method should be clear:

Power to the users. The most important benefit of this setup is users' complete control over XML processing. Given full access to the XML source of a page, you can instruct your browser to use either the stylesheet supplied by the site or any other stylesheet. Moreover, you are no longer limited to HTML; you can, for example, render web pages into PDF via XSL-FO (5.5.3.2).

Due to the lack of a standardized vocabulary for the markup of web pages, ^[12] this freedom may not be immediately useful for most people. Still, it is conceivable that the evolution of the Web toward "XML in the browser" will one day free us from the dependence on HTML.

^[12] It is hardly conceivable that any one vocabulary will ever be able to cover the entire breadth of information on the Net.
XML is accessible from the server. Just as in the previous setup, the XML source of the pages can not only be sent to the clients but indexed, searched, and linked to on the server. As more and more web sites adopt this setup, parts of the Web will crystallize into XML-only semantic clusters.

Figure 1.5. XML in the browser: XSLT transformation is done in a web client.

graphics/01fig05.gif

At the moment, however, the disadvantages of this approach outweigh its advantages:

XML support in the client is required. The largest obstacle to implementing this scheme is the need for XML and XSLT support in the browser. The latest versions of Mozilla and MSIE do offer XSLT support, but you definitely cannot rely on it being available to all visitors to your web site. The only workaround is creating a "best viewed with" type of web site that offers an XML version of its pages only to XML-capable browsers and serves regular HTML to all others, which makes the whole idea impractical.
Server-side transformation is still necessary. Another reason that this approach is less than practical is the difference between the original XML vocabulary used for source markup and the vocabulary that is optimal for rendering in the browser. If nothing else, client-side transformation requires a single source document, while the server may store the information pertaining to one page in more than one source document. This means that an XSLT transformation or some other XML processing engine must still be run on the server for each page sent to the client.
XSLT extensibility is limited. Client-side XSLT processing has serious limitations. For security reasons, no XSLT stylesheet can be allowed to read or write any files on the client system; it can only work as a simple XML-to-HTML converter with one input and one output. Also, it can only use standard XSLT without any extensions. These limitations make many of the advanced techniques of Chapter 5 impossible with this setup.
Viewing performance may suffer. Just as server-side XSLT processing slows down the server, the user's XML web browser will have to spend extra time transforming each page it views. This problem may become serious for large web pages viewed on less capable systems.
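
The "best viewed with" workaround from the first item above requires the server to decide, per request, which version of a page to send. The text does not say how a site should detect an XML-capable browser; the sketch below assumes, purely for illustration, that the decision is based on the HTTP `Accept` header, and the function name is hypothetical. In practice an `Accept` header does not prove that a browser can apply XSLT, so treat this as a starting point only.

```python
def choose_representation(accept_header: str) -> str:
    """Return "xml" if the client explicitly accepts XML, otherwise "html"."""
    for item in accept_header.split(","):
        media_type, _, params = item.strip().partition(";")
        if media_type.strip().lower() not in ("application/xml", "text/xml"):
            continue
        quality = 1.0                  # default weight when no q parameter is given
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0      # treat an unparsable weight as "not acceptable"
        if quality > 0:
            return "xml"
    return "html"                      # regular HTML for everyone else
```

Note that the HTML branch still needs a server-side transformation, so this workaround carries the costs of the "XML on the server" setup as well.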

The bottom line is that the "XML in the browser" setup, although compelling as the logical next step after the two other scenarios we discussed, is not quite practical at this time. It may become viable in the future as older browsers with limited or no XSLT support are phased out. However, for this setup to really take off, a common XML vocabulary for web pages must also be standardized, and an array of client-side tools must emerge that enable people to do something useful with XML web pages in this vocabulary (other than just viewing them).

And what about us? This book covers the first two setups, "XML offline" and "XML on the server," as the most practical for today's web sites. Of them, offline XSLT processing is our focus during most of Chapter 5, since it's the easiest to set up and experiment with. It will also allow us to explore some techniques that are difficult or impossible to implement on the server. In Chapter 7, we'll see how our offline setup can be adapted to run on an XML-capable web server.

