Architecture

   

Webmasters typically edit their Web sites with an HTML editor. The major disadvantage of this approach is that it freezes the site. Indeed, to change the presentation, you must manually re-edit every page. It's possible to do, but it's a lot of work.

The XML solution is to separate authoring from publishing. The author of the pages writes the document in XML. While doing so, she ignores presentation. She instead adopts an XML vocabulary that focuses on the organization of the document: sections, titles, abstracts, and more.

Publishing the document then simply requires converting the document into HTML, WML, or another popular format. Fortunately, this can be automated because the original XML document is structure rich. The operative word here is automated.

For medium to large sites, it is more cost effective to automate publishing. Rewriting a couple of pages by hand is feasible; rewriting a hundred pages by hand is too expensive.

Figure 4.1 illustrates how we'll apply these principles in this chapter. The three main elements are as follows:

  • Documents in structure-rich XML

  • XSLT style sheets that implement the conversion to HTML, WML, and RSS (more on RSS in the next section)

  • A servlet that is responsible for applying the style sheets
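The three elements above can be sketched in a few lines of Java. This is only a minimal illustration, not the servlet developed in this chapter: it uses the standard javax.xml.transform API outside any servlet container, and the class name PublishSketch, the cut-down news document, and both style sheets (including the simplified WML without a DOCTYPE) are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PublishSketch {
    // Hypothetical, cut-down news document (one item only).
    public static final String NEWS =
        "<News><Item><Title>Jetty</Title>"
      + "<Abstract>Open Source Java Server.</Abstract></Item></News>";

    // Hypothetical style sheet that publishes the news as HTML.
    public static final String TO_HTML =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/News'><html><body>"
      + "<xsl:for-each select='Item'>"
      + "<h1><xsl:value-of select='Title'/></h1>"
      + "<p><xsl:value-of select='Abstract'/></p>"
      + "</xsl:for-each>"
      + "</body></html></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical style sheet that publishes the same news as simplified WML.
    public static final String TO_WML =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/News'><wml><card id='news' title='News'>"
      + "<xsl:for-each select='Item'>"
      + "<p><b><xsl:value-of select='Title'/></b></p>"
      + "</xsl:for-each>"
      + "</card></wml></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies one style sheet to one document and returns the result as text.
    public static String publish(String xml, String styleSheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // One source document, two published formats.
        System.out.println(publish(NEWS, TO_HTML));
        System.out.println(publish(NEWS, TO_WML));
    }
}
```

The author maintains a single XML document. Adding a new output format means adding a style sheet, not re-editing pages.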

Figure 4.1. XML separates authoring and publishing.


Extensible Stylesheet Language

To publish XML documents we will use XSL, the Extensible Stylesheet Language. More specifically, we will use XSLT (XSL Transformations).

XSLT is a scripting language optimized for conversion between XML documents. In that respect it differs from earlier style sheet languages, such as CSS (Cascading Style Sheets) or word processor style sheets.

CSS describes how each element should be presented onscreen: which font, which color, which size, and more.

XSLT transforms the XML document into another XML document. It goes much further than simple presentation instructions. In fact, XSLT can completely reorganize a document and, for example, add a table of contents or delete a section.
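As a rough illustration of such a reorganization, the following sketch generates a table of contents that does not exist in the source document. The style sheet, the class name TocSketch, and the abridged news document are assumptions made for this example. The code relies only on the standard javax.xml.transform API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TocSketch {
    // Hypothetical, abridged news document: two items, titles and abstracts only.
    public static final String NEWS =
        "<News>"
      + "<Item><Title>Jetty</Title>"
      + "<Abstract>Open Source Java Server.</Abstract></Item>"
      + "<Item><Title>Hypersonic SQL</Title>"
      + "<Abstract>Open Source SQL Database.</Abstract></Item>"
      + "</News>";

    // Hypothetical style sheet: writes a list of all titles first
    // (the table of contents), then the items themselves.
    public static final String WITH_TOC =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/News'>"
      + "<html><body>"
      + "<ul><xsl:for-each select='Item'>"
      + "<li><xsl:value-of select='Title'/></li>"
      + "</xsl:for-each></ul>"
      + "<xsl:apply-templates select='Item'/>"
      + "</body></html>"
      + "</xsl:template>"
      + "<xsl:template match='Item'>"
      + "<h2><xsl:value-of select='Title'/></h2>"
      + "<p><xsl:value-of select='Abstract'/></p>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the style sheet to the document and returns the HTML as text.
    public static String transform(String xml, String styleSheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(NEWS, WITH_TOC));
    }
}
```

Each title appears twice in the result, once in the list and once as a heading, although the author typed it only once.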

How does that help? The trick is to transform from a structure-rich XML document into a format that contains display instructions, such as HTML or WML.

A browser (or another viewer) can render the second document onscreen or on paper. What display format should you use? The following are some popular options:

  • HTML: Strictly speaking, HTML is not an XML vocabulary. This is not an XML-to-XML transformation. However, HTML is so popular, and so close to XML, that the W3C decided to support it.

  • XHTML: The XML version of HTML.

  • WML: The markup language for WAP devices.

  • Open eBook: The format for eBooks, based on HTML.

  • XSLFO: A new display language that is optimized for printed documents. At the time of writing, two XSLFO viewers exist: a browser (http://www.indelv.com) and a PDF converter (http://xml.apache.org).

The XSLT standard is available online at http://www.w3.org/TR/xslt.
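The difference between the HTML and XML targets shows up in the xsl:output element of a style sheet. The sketch below runs the same template twice and changes only the output method. The class name OutputMethodSketch and the one-paragraph style sheet are assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputMethodSketch {
    // Builds a hypothetical style sheet that always emits the same paragraph;
    // only the output method ("html" or "xml") varies.
    static String styleSheet(String method) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:output method='" + method + "' indent='no'"
             + " omit-xml-declaration='yes'/>"
             + "<xsl:template match='/'>"
             + "<p>first line<br/>second line</p>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the style sheet with the given output method on a dummy document.
    public static String render(String method) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(
                    new StringReader(styleSheet(method))));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<News/>")),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // The html method writes the empty element as <br>,
        // the xml method writes it as <br/>.
        System.out.println(render("html"));
        System.out.println(render("xml"));
    }
}
```

This is how XSLT accommodates HTML even though HTML is not an XML vocabulary: the style sheet still builds a tree of elements, and the processor serializes that tree according to HTML conventions.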

XML Vocabulary

As we saw in the previous chapters, XML does not define any vocabulary. It is up to developers to create vocabularies for their applications.

For this application, we have two realistic options. The first option is to use DocBook (http://www.docbook.org) or another standard SGML/XML vocabulary for documents. DocBook is particularly attractive because it is widely used and well supported.

However, DocBook is so rich that it is too complicated for such a simple project.

The second option, and the one we'll adopt in this chapter, is to create our own vocabulary: one that is simple and limited to only the tags we need.

Listing 4.1 illustrates the vocabulary we'll use in this chapter. As you can see, it is almost trivial: It's just a list of news items.

Listing 4.1 index.xml
<?xml version="1.0"?>
<News>
   <URL>http://localhost:8080/publish/index</URL>
   <Item>
      <Title>Applied XML Solutions</Title>
      <Author>Beno&#238;t Marchal</Author>
      <Abstract>A new intermediate/advanced book for XML
         developers.</Abstract>
      <Para>Learn advanced XML programming with Applied XML
         Solutions. This hands-on teaching book is filled with
         practical examples.</Para>
      <Para>Applied XML Solutions is a great complement to XML by
         Example.</Para>
   </Item>
   <Item>
      <Title>Jetty</Title>
      <Author>Greg Wilkins</Author>
      <Abstract>Open Source Java Server.</Abstract>
      <Para>Jetty is a powerful, open-source Java web server. It
         supports standard Java servlets making it the ideal
         development environment.</Para>
      <Para>Jetty is also highly-configurable which helps custom
         developments.</Para>
   </Item>
   <Item>
      <Title>Hypersonic SQL</Title>
      <Author>Thomas M&#252;ller</Author>
      <Abstract>Open Source SQL Database.</Abstract>
      <Para>Hypersonic SQL is an open source database that
         supports the JDBC API.</Para>
      <Para>Hypersonic SQL is efficient and can run in three
         modes: in-memory, standalone or client/server. This
         provides lots of flexibility when writing
         software.</Para>
   </Item>
</News>

The list starts with a URL that points to the server where the document resides. The W3C suggests using the xml:base attribute for this purpose, but it turns out that Xalan, the XSLT processor I use, has a problem with the xml namespace, so I use a URL element as a workaround:

 <URL>http://localhost:8080/publish/index</URL> 

Each item has a title, author, abstract, and list of paragraphs:

   <Item>
      <Title>Applied XML Solutions</Title>
      <Author>Beno&#238;t Marchal</Author>
      <Abstract>A new intermediate/advanced book for XML
         developers.</Abstract>
      <Para>Learn advanced XML programming with Applied XML
         Solutions. This hands-on teaching book is filled with
         practical examples.</Para>
      <Para>Applied XML Solutions is a great complement to XML by
         Example.</Para>
   </Item>
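Because the vocabulary is so small, software can read it with very little code. The following sketch, which is not part of the chapter's project, walks the items with the JDK's DOM parser and summarizes each one. The class name NewsReaderSketch and the abridged sample document are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NewsReaderSketch {
    // Hypothetical sample, abridged from Listing 4.1 (one item, shortened text).
    public static final String SAMPLE =
        "<News>"
      + "<URL>http://localhost:8080/publish/index</URL>"
      + "<Item>"
      + "<Title>Jetty</Title>"
      + "<Author>Greg Wilkins</Author>"
      + "<Abstract>Open Source Java Server.</Abstract>"
      + "<Para>Jetty is a powerful, open-source Java web server.</Para>"
      + "<Para>Jetty is also highly-configurable.</Para>"
      + "</Item>"
      + "</News>";

    // Returns the text of the first child element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Produces one summary line per Item: title, author, paragraph count.
    public static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("Item");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                int paragraphs = item.getElementsByTagName("Para").getLength();
                lines.add(text(item, "Title") + " by " + text(item, "Author")
                          + " (" + paragraphs + " paragraphs)");
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Cannot read news document", e);
        }
    }

    public static void main(String[] args) {
        for (String line : summarize(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```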

Figure 4.2 illustrates the structure.

How can you develop such a format? When should you use existing formats (such as DocBook) rather than develop your own? Unfortunately, there are no hard rules that you can follow to guarantee success.

As you develop your XML vocabulary, remember that a good vocabulary achieves a reasonable compromise between two opposite goals: On the one hand, it must mark up as much information as possible; on the other hand, it must be simple.

Figure 4.2. The document structure in XML.


It is important to mark up as much data as is realistically possible because the markup drives the transformation to HTML, WML, and other formats. If something has not been marked up, transforming it will be difficult (or outright impossible).

Yet, as you define the vocabulary, be realistic. If you provide too many tags and too many options, you will confuse authors. This is particularly true if authors don't use the format regularly.

A format that is too complex can be dangerous because it gives the false impression that we're creating quality documents, whereas, in fact, authors usually ignore most of the markup. I am sure you have already encountered a database with a complex table organization. In most cases, developers have misused it and retrieving useful information is difficult. The same could happen with a markup vocabulary that is too complex.

Tip

Consider using an XML editor, as introduced in the previous chapter, to guide authors.


   


Applied XML Solutions
ISBN: 0672320541
Year: 1999
Pages: 142
