1.5 XML and dynamic sites
A dynamic web page differs from a static one in that some of the information it displays is not stored in files (be they XML documents or files in any other format), but is retrieved from a database and/or calculated on the fly. A dynamic page may also collect data from the user and store it in a database. Database interaction and calculations are done on the web server in response to a page request coming from a client (e.g., a web browser). These days, the majority of web sites contain at least a few dynamic pages.
This section's material will not be needed for most of the book where we deal with the basics of a static XML-based web site; we will return to the dynamic issues only in Chapter 7. So, you can now skip forward to Chapter 2 unless you are specifically interested in combining the XML/XSLT techniques with a dynamic web site engine.
1.5.1 The two sources
Two sources of information are combined by a dynamic web site engine to produce the final browser-viewable web pages. The first source is the dynamic content, usually stored in a database. It tends to be well granularized, that is, broken into final, unbreakable atoms: dynamic values such as a heading, date, or story text. These dynamic values may change upon each serving of a page to the user.
The second source is static (although it, too, can be stored in a database). It supplies the templates into which the dynamic values are inserted. These change only if the site's designer decides to change them, but remain the same throughout the normal update cycle of the site.
1.5.1.1 Dynamic values
Dynamic values, stored in a database or calculated on the fly, can be atomic (a single data object, e.g., a string or number) or composite (a mix of various objects, e.g., the text of a story that may contain paragraphs, emphasis, links, and other elements).
Self-typing atoms. Atomic values rarely exist outside the database or program that generated them, so they usually don't require XML for tagging. In other words, there's no need to associate a name and a data type with each atomic value, as these properties are intrinsic to them.
Mixed-content composites. Composite values with their mixed content (§1.2) are another matter. They are typically marked up using HTML or some other regular markup convention, such as Wiki, to be translated to HTML by the dynamic engine. This is not really a good idea; a piece of text with some HTML markup is not a complete HTML document and as such cannot be validated, which may lead to hard-to-track markup errors in the final served pages. Also, HTML markup makes these composites unsuitable for any other use except generating HTML pages.
A much better solution is defining an XML document type for each distinct type of composite dynamic value. By using XML markup, you can make your composite values complete, self-describing, and validated.
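To make the contrast concrete, here is a minimal sketch in Python. It is not real schema validation: it only checks well-formedness and then compares element names against a small whitelist. The `story`, `para`, `em`, and `link` vocabulary and the `check_composite` function are hypothetical names chosen for illustration; an actual site would validate against its own DTD or schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for a "story" composite value (an assumption,
# not something defined in the text).
ALLOWED = {"story", "para", "em", "link"}

def check_composite(xml_text):
    """Return a list of problems; an empty list means the value passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # An HTML-style fragment with unclosed tags stops here.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "story":
        problems.append(f"unexpected root element <{root.tag}>")
    for elem in root.iter():
        if elem.tag not in ALLOWED:
            problems.append(f"unknown element <{elem.tag}>")
    return problems

# A complete, self-describing composite value.
good = ('<story><para>Prices fell <em>sharply</em>; see '
        '<link href="/q3">the report</link>.</para></story>')

# A typical HTML fragment with a markup error (<b> is never closed).
bad_fragment = "<p>Prices fell <b>sharply</p>"

print(check_composite(good))
print(check_composite(bad_fragment))
```

The point of the sketch is that the XML composite can be checked as a document in its own right before it is stored, whereas the loose HTML fragment can only be caught once it has already broken a served page.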
1.5.1.2 Static templates
In the simplest case, a template is an HTML file with some of its content replaced by scripts that produce actual content (from a database or from calculations) when the page is about to be served (Figure 1.6). Such templates are usually quite messy (as are any big chunks of HTML) and hard to maintain; each one is really a mix-up of what would better be kept apart!
Figure 1.6. A simple dynamic web page: Note that while some of the atomic dynamic values are drawn from the database, others may be calculated by the page's embedded script.
Many sites rectify the situation a little by removing the code bits from the templates. In this approach, instead of being envelopes for actual code, templates are stored and used as data. The dynamic engine retrieves the templates and the corresponding dynamic values at the same time and merges them, using certain markers within the templates as clues for where to insert the dynamic data.
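The marker-based approach can be sketched in a few lines of Python. This is only an illustration, assuming `$name`-style markers as understood by the standard `string.Template` class; the `merge` function and the marker names are hypothetical, and real engines use their own marker syntax.

```python
from html import escape
from string import Template

# The template is plain data: HTML with markers, no embedded code.
template_text = '<h1>$heading</h1>\n<p class="date">$date</p>'

def merge(template_text, values):
    """Insert atomic dynamic values at the markers in a template."""
    # Escape each value so that stray "<" or "&" cannot break the HTML.
    safe = {name: escape(str(value)) for name, value in values.items()}
    # substitute() raises KeyError if a marker has no matching value.
    return Template(template_text).substitute(safe)

page = merge(template_text, {"heading": "Fish & Chips", "date": "2003-05-01"})
print(page)
```

Note that the engine code no longer lives inside the template, but the template itself is still unvalidated HTML.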
Other sites go even further by assembling templates on the fly from separate fragments, such as a menu template, a sidebar template, a footer template, etc. (Figure 1.7). The problem with this approach is the same: Template fragments are non-validatable, messy HTML that is very fragile. It is difficult to edit each fragment so that it still fits into its place and the final assemblage works as designed.
Figure 1.7. A more complex dynamic web page: The program code is removed from the template, and the template is constructed from separate fragments.
1.5.1.3 Do you need XML?
Some developers tend to think that since a dynamic site separates a static template (i.e., formatting) from what will be inserted into that template (i.e., content), it can deliver the same benefits as an XML-based web site architecture. To some extent, this is true. However, XML offers much more than the simple "template/content" model, and you can get extra benefits by combining a dynamic engine and an XML/XSLT engine in one site.
The problem with most dynamic sites is that the boundary between their static and dynamic components is drawn where it is convenient for the programmer, not necessarily where it is best from the semantic viewpoint. You can think of it this way: A programmer of a dynamic web site takes static, HTML-only pages drafted by the web designer, tracks down whatever HTML snippets will be changing on those pages, replaces these snippets with some PHP or ASP code pulling data from a database, debugs the whole thing, and voilà, a brand new dynamic web site is up and running.
This has little to do with proper semantic analysis (§2.3) as performed for XML. I'm not saying that programmers of dynamic web sites fail in their job; it's just that the job of an XML web developer is very different, but no less necessary.
It is important to understand that XML is not just another technology for implementing dynamic sites. Moreover, it turns out that XML is largely orthogonal to the template/content distinction in that it can be meaningfully applied to both static templates and dynamic values.
1.5.2 The two scenarios
Logically, the first step toward a dynamic web site with XML is separating the information on where to insert each value from the information on how to format the whole thing. Templates or template fragments must be stripped of any formatting details and converted into a lean, semantic XML vocabulary whose only goal is to define what pieces of content are present on the page and what structural role is assigned to each piece.
Starting from here, two different scenarios of combining the XML/XSLT engine and the dynamic engine are possible. We will now discuss these scenarios in turn.
1.5.2.1 Compile, then transform
This scenario, illustrated by Figure 1.8, attempts to do as much as possible in XML and only transforms the final result into HTML at the very last stage of the process. This is the most natural approach, although not always optimal.
Figure 1.8. Incorporating XML/XSLT into a dynamic web site: "Compile, then transform."
First, XML template fragments (stored either in a database or in static files) are assembled into a complete static template that includes both page-specific static content and site-wide metadata. Since the template is not going to change often, you can perform this aggregation step offline and upload the finished template to the server.
Next, dynamic values are inserted into the template. Of course, composite dynamic values are also marked up with a semantic XML vocabulary; for example, they can use the schema of the template fragments or a subset thereof. This second step is normally performed on the server. Finally, the fully assembled XML document is transformed into HTML.
Note that each of the triangular XML documents shown in Figure 1.8, as well as the final complete XML page source, may have its own schema and go through its own validation stage. Usually, however, it is more convenient to have a single schema that allows certain variation and thus accommodates all kinds of objects participating in the process. Similarly, the three distinct diamond-shaped processing blocks may be separate, but they can just as well be combined into one XSLT stylesheet that compiles the template, inserts dynamic values (possibly using extensions to access or calculate them), and finally produces the HTML page.
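The three processing steps can be sketched as follows. This is a simplified illustration in Python rather than XSLT: the third function merely stands in for a stylesheet, and the `page`, `heading`, `footer`, and `value` element names, as well as all function names, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

def compile_template(fragments):
    """Step 1: assemble XML fragments into one static page template."""
    page = ET.Element("page")
    for fragment in fragments:
        page.append(ET.fromstring(fragment))
    return page

def insert_values(page, values):
    """Step 2: fill every <value name="..."/> slot with dynamic data."""
    for slot in page.iter("value"):
        slot.text = str(values[slot.get("name")])
    return page

def to_html(page):
    """Step 3: map the semantic vocabulary to HTML (stand-in for XSLT)."""
    parts = ["<html><body>"]
    for block in page:
        tag = "h1" if block.tag == "heading" else "p"
        # itertext() joins the static text with the inserted values.
        text = "".join(block.itertext())
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    parts.append("</body></html>")
    return "".join(parts)

fragments = [
    '<heading><value name="title"/></heading>',
    '<footer>Updated <value name="date"/></footer>',
]
page = insert_values(compile_template(fragments),
                     {"title": "News", "date": "2003-05-01"})
html = to_html(page)
print(html)
```

All formatting decisions live in the last step only; the first two steps work purely with the semantic vocabulary, which is what makes it possible to validate the intermediate documents or to fold all three steps into one stylesheet.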
Before this is possible. There are a couple of obvious prerequisites for this scenario. First, since the entire process of database retrieval, processing, and assembling data is performed in XML, the dynamic engine itself must be XML-aware. You'll have to teach it how to store, search, and extract XML documents instead of the plain text or HTML fragments it may have dealt with before. Fortunately, native XML databases exist that make storing and reusing XML objects transparent.
Second, the dynamic engine is normally located on the web server so it can retrieve and serve data on the fly. And since the XML-to-HTML transformation comes after the dynamic engine, the XSLT processor must also be installed on the server and perform the transformation in real time, in response to each request. As a result, this scenario can only work with the "XML on the server" setup (§1.4.2, Chapter 7).
"Compile, then transform" scorecard. The benefits of this approach are:
The formatting layer is abstracted out from many different templates into one stylesheet. Thus, you can make automatic changes to the presentation of not only all pages produced from one template, but all pages produced from different templates. This may even include static pages (those that do not use any dynamic values), as it is only natural to store them using the same XML vocabulary and transform them using the same stylesheet as dynamic pages.
A direct result of the above is better modularization of the site's infrastructure. Your site's content staff, design staff, and programming staff can all work independently without much risk of obstructing each other's efforts.
The site's dynamic engine can be made more generic in its design and therefore usable in almost any dynamic web site application so long as the data is in XML. Recently, a number of such generic XML-based web site engines emerged implementing the "compile, then transform" scenario (notably Cocoon, §7.2). On the other hand, many if not all functions of a dynamic engine such as that in Figure 1.8 can be performed by XSLT stylesheets.
Besides being a natural fit for "XML on the server," this scenario is as close as you can get to "XML in the browser" (§1.4.3). Indeed, if it ever becomes practical to serve XML+XSLT from your web site instead of HTML, all you have to do is remove the last transformation stage; everything else remains untouched. Your investment into the XML infrastructure is thus truly long term, as it may outlast the current HTML-centric Web.
Sometimes, however, the requirement that everything is installed on the server may make the uncompromising "compile, then transform" scenario impractical. As we'll see in Chapter 7, installing new software on a web server may be a complex task, or the server may be completely out of your control (e.g., if you use outsourced hosting). Therefore, you might need to look for other, less drastic scenarios that will combine traditional dynamic engines, such as PHP or Perl scripting available from most hosting providers, with offline (§1.4.1) XML processing.
Compile and transform offline. Not all dynamic data needs to be produced the moment the page is requested. Quite often, a reasonably delayed update (e.g., once a day) is acceptable. This means that you can take the entire setup depicted in Figure 1.8 and implement it locally, without worrying about web server setup or performance.
Then, all you have to do is set up your system to run the aggregation and transformation periodically and upload the resulting HTML pages to the server, and you're done. You can also implement a "watch" script that runs the transformation in response to a change in your dynamic data, or simply run the transformation manually. The best way to do that will of course depend on the nature of your data and your web site.
This "offline but dynamic" setup may be a good first step toward implementing the completely dynamic server-side setup of Figure 1.8. It will let you test your source definition and the stylesheet before you start setting up an XML-enabled web server.
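As one possible shape for such a "watch" step, the Python sketch below regenerates a page only when the dynamic data has actually changed since the last run. Everything here is an assumption made for illustration: the `rebuild_if_changed` function, the `data.sha256` stamp file, and the `render` callback that stands in for the whole aggregation and transformation process. Uploading the result to the server is left out.

```python
import hashlib
import json
from pathlib import Path

def rebuild_if_changed(data, out_dir, render):
    """Regenerate index.html only when the dynamic data has changed.

    Returns True if the page was rebuilt, False if it was left alone.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = out_dir / "data.sha256"

    # Fingerprint the data; sort_keys makes the fingerprint stable.
    serialized = json.dumps(data, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(serialized).hexdigest()

    if stamp.exists() and stamp.read_text() == digest:
        return False  # nothing changed, keep the page already built

    (out_dir / "index.html").write_text(render(data), encoding="utf-8")
    stamp.write_text(digest)
    return True
```

A scheduler (for example, a daily job) could call this function with the current database snapshot; running it more often costs little, because unchanged data does not trigger a rebuild.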
1.5.2.2 Transform, then compile
The second scenario, illustrated by Figure 1.9, is a straightforward combination of a simple dynamic site (such as that in Figure 1.6) with offline XSLT transformation producing HTML templates with embedded code. You program your stylesheet to generate an HTML page complete with embedded scripts, upload the resulting template to the server, and let its scripts work exactly as they would in any other HTML page, retrieving dynamic data and inserting it into the page as it is served.
Figure 1.9. Another way to combine XML/XSLT with a dynamic engine: "Transform, then compile." Composite dynamic values are best avoided in this scenario.
Obviously, to get any advantages from XML's content/formatting separation, we must restrict the dynamic data to atomic (i.e., structureless) values only. This way, all formatting is stored in the stylesheet, and the main difference from the previous scenario is that instead of inserting dynamic values into XML, we insert opaque objects (scripts) into HTML.
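The offline step of this scenario can be sketched as follows. Python stands in for the XSLT stylesheet here, and PHP is assumed as the server-side scripting language; the `to_template` function, the `page`, `heading`, `note`, and `value` element names, and the `$user` variable are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

def to_template(page_xml):
    """Turn semantic XML into an HTML template with embedded scripts."""
    page = ET.fromstring(page_xml)
    parts = []
    for block in page:
        tag = "h1" if block.tag == "heading" else "p"
        body = escape(block.text or "")
        for child in block:
            if child.tag == "value":
                # Each atomic value becomes an opaque script that the
                # server's PHP engine will run when the page is served.
                name = child.get("name")
                body += "<?php echo htmlspecialchars($" + name + "); ?>"
            # Other child elements are ignored in this sketch; only the
            # static text that follows them is kept.
            body += escape(child.tail or "")
        parts.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(parts)

source = ('<page><heading>Welcome, <value name="user"/>!</heading>'
          '<note>Thanks for registering.</note></page>')
template = to_template(source)
print(template)
```

The generated file is an ordinary PHP-flavored HTML template, so it runs on any host that supports such scripts, with no XML software on the server.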
This approach works best for mostly static sites that only need a few simple dynamic bits here and there. For example, it is perfectly adequate for a registration form or a feedback collection page that only writes user data into a database but never extracts it back for display.
However, this scenario may work for complex sites too, if you take time to carefully disassemble your dynamic values into atoms (no mixed content allowed) and write robust scriptlets to access these atoms. In return, you will enjoy the first two benefits of the previous scenario (all formatting code is in one place; content, design, and programming jobs are largely independent).