1.3 Components of an XML web site
This section describes the major components of a working XML-based web site. You already know the role of the source documents and the XSLT transformation stylesheet (Figure 1.1); now let's look at how they fit together and what other data and software you'll need.
1.3.1 Source documents
In your own tongue. There is no single accepted standard XML vocabulary for the source documents of a web site. This is not surprising, given the extreme diversity of topics and structures of web sites.
In this book, therefore, we focus on creating your own custom source vocabulary suitable for your unique document types and web site requirements. Compared to reusing an existing vocabulary, this approach has many advantages: a much simpler structure, names that are more intuitive (at least for you), and complete freedom in developing the vocabulary further in any direction and at any time as needed. Chapter 2 lays the foundation, and Chapter 3 builds a robust and convenient source vocabulary on it.
Master and slaves. Source XML documents of a web site not only must abstract out the pure content of the pages; they also need to be structured differently from the target HTML pages. As we'll see in 18.104.22.168, information that is common to more than one page must be moved into a separate document that we will call the master document of the web site. In a simple web site, all other documents are page documents: leaves of the source tree that have a one-to-one correspondence with the web pages.
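As a rough illustration of this split, a master document might look like the sketch below. Every element and attribute name here (site, title, menu, page, src) is an assumption made for this example, not the vocabulary developed later in the book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical master document: holds information shared by several pages. -->
<site>
  <!-- A site-wide value that every page can reuse (assumed name). -->
  <title>Example Site</title>
  <!-- A single list of pages; each entry points to one page document. -->
  <menu>
    <page id="home" src="index.xml">Home</page>
    <page id="products" src="products.xml">Products</page>
    <page id="contact" src="contact.xml">Contact</page>
  </menu>
</site>
```

Each file named in a src attribute would then be a page document carrying only the content unique to its page, so that a change to the menu or the site title is made in one place.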
1.3.2 Validation
Validation is what sets SGML and XML apart from most other data and document formats. Having a complete and self-contained specification of what constitutes a valid document, and being able to automatically check documents against this specification before they go into processing, is becoming critical as information systems grow more complex and more distributed.
The choice of a schema language to use for validation is important, but may be difficult. SGML first standardized DTDs (document type definitions) for validation, but today the choice of schema languages implementing different approaches is much wider. A part of Chapter 2 analyzes the major approaches to validation and the features of existing schema languages.
Schematron rules. In that analysis, rule-based schema languages come out on top as the most convenient for a flexible validation layer that grows as you develop your markup vocabulary and adapts to the requirements of web site maintainers. Of the rule-based schema languages, Schematron is the most developed; it also has the big advantage of sharing its XPath component with XSLT. Basics of Schematron are given in Chapter 2, a simple schema for our web site documents is at the end of Chapter 3, and some more advanced Schematron techniques are covered in Chapter 5.
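To give a feel for the rule-based style, here is a minimal sketch of a Schematron schema. It assumes the ISO Schematron namespace and a hypothetical page vocabulary (page, title, block, link); the schema built in the book may differ in both respects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A minimal sketch; element names in the XPath expressions are assumptions. -->
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:title>Checks for a hypothetical page document</sch:title>
  <sch:pattern>
    <!-- Each rule selects context nodes with an XPath pattern... -->
    <sch:rule context="page">
      <!-- ...and each assert reports its message when the test is false. -->
      <sch:assert test="title">A page must have a title.</sch:assert>
      <sch:assert test="count(block) >= 1">A page must contain at least one block.</sch:assert>
    </sch:rule>
    <sch:rule context="link">
      <!-- A report is the opposite: its message appears when the test is true. -->
      <sch:report test="not(@href)">A link has no href attribute.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because every test is an ordinary XPath expression, new checks can be added one rule at a time as the vocabulary grows, without restating the whole document structure.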
1.3.3 Transformation stylesheet
The XSLT transformation stylesheet is the kernel of an XML web site, and mastering XSLT and XPath is therefore the key to the efficient use of XML. Chapter 4 gives an overview, focusing on the new features of XSLT 2.0 and XPath 2.0 as well as the existing XSLT extensions.
The stylesheet we'll be writing throughout Chapter 5 actually does much more than just translate XML into HTML. Here's a list of the stylesheet components that are discussed (some of them must be present in any stylesheet, while others are optional):
Variables, parameters, and functions are defined once and used many times throughout the stylesheet. In the sample setup described in the book, this code is separated into a shared XSLT library (5.1.1) that is imported by both the stylesheet and the Schematron schema, allowing the latter to use some of the values and algorithms of the stylesheet for a more meaningful validation.
Trunk templates and branch templates (4.5.1) are the two kinds of templates controlling the XML-to-HTML transformation; this is where the snippets of HTML code are stored and output in response to the source XML constructs. Trunk templates create the top-level constructs of a page, such as menu and layout; branch templates are responsible for processing textual data, links, and other elements at the lower levels of the source tree.
Extension Java classes optionally allow you to program any functions absent in XSLT and then call them from your stylesheet. We will use this facility for all sorts of useful tasks, such as accessing directories (22.214.171.124), querying graphic files (5.5.1), running external applications (126.96.36.199), and processing text (188.8.131.52). The Java classes given in the book will work with most Java-based XSLT processors (they were tested with Saxon), but you can use a similar approach for extending other processors as well.
Batch processing (5.6) is another optional component that enables the stylesheet to process not one but all of the site's source XML documents in one go. The good thing about this is that you don't have to prepare a separate list of the files to process: the stylesheet itself will retrieve this list from the site's master document. Thus, a single list of pages in the master can be used for different tasks, such as creating menus, resolving internal links, and batch processing. Programming batch processing in XSLT is only one of the ways to automate site updates; other approaches are possible (6.5) that are external with regard to the stylesheet.
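The components listed above can be sketched in one small XSLT 2.0 stylesheet. This is only an illustration under stated assumptions: the master file name (master.xml), its site/menu/page entries with a src attribute, the page, title, block, p, and link elements, and the build-all template name are all hypothetical, and the book's actual stylesheet is organized differently.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" indent="yes"/>

  <!-- Parameter and variable: defined once, used by several templates.
       The file name master.xml is an assumption. -->
  <xsl:param name="master-uri" select="'master.xml'"/>
  <xsl:variable name="master" select="document($master-uri)"/>

  <!-- Trunk template: builds the top-level page layout and the menu. -->
  <xsl:template match="/page">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <ul class="menu">
          <!-- The menu is generated from the page list in the master. -->
          <xsl:for-each select="$master/site/menu/page">
            <li>
              <a href="{replace(@src, '\.xml$', '.html')}">
                <xsl:value-of select="."/>
              </a>
            </li>
          </xsl:for-each>
        </ul>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="block"/>
      </body>
    </html>
  </xsl:template>

  <!-- Branch templates: handle text, links, and other low-level elements. -->
  <xsl:template match="block">
    <div class="block"><xsl:apply-templates/></div>
  </xsl:template>

  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="link">
    <a href="{@href}"><xsl:apply-templates/></a>
  </xsl:template>

  <!-- Batch processing: reuse the same page list to transform every
       page document and write one HTML file per page. -->
  <xsl:template name="build-all">
    <xsl:for-each select="$master/site/menu/page">
      <xsl:result-document href="{replace(@src, '\.xml$', '.html')}">
        <xsl:apply-templates select="document(@src)"/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Run against a single page document, the trunk template produces one HTML page. Started from the build-all named template instead (XSLT 2.0 processors such as Saxon let you choose an initial template), the same stylesheet walks the master's page list and writes all pages in one go.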
1.3.4 Static objects
Another component of a web site, static objects, includes everything that goes from the web designer directly to the server without being involved in XSLT processing in any way. Most static objects are images (logos, decorations, backgrounds, and so on) as well as other non-HTML objects such as Flash animations and Java applets.
However, the line between the static and nonstatic objects is not the same as between non-HTML and HTML. Some of the HTML pages whose content and design will never change (e.g., a 404 error page) can be static. On the other hand, graphic files (5.5.2) and other binary objects (5.5.3) can and should be generated by the stylesheet if their content is linked to that of the web pages or is otherwise changeable.
After the XML/XSLT core of the web site is ready, you must integrate it with other software. If you visualize the sequence of transformations that stretches all the way from the author to the user, then the XML-to-HTML transformation stage, occupying the very end of that sequence, is the most interesting for us. There are, however, other tools that come onto the scene before (or sometimes during) that stage.
Thus, to put your information into XML, you need at least a text editor, a specialized XML editor, or a converter from some other document format. Writing the stylesheet may be assisted by various kinds of XSLT and XPath software. Finally, separate classes of software may be used to control building the site offline and integrating it with a server-side dynamic engine. These tools are the subject of the last two chapters of the book.