Webmasters typically edit their Web sites with an HTML editor. The major disadvantage of this approach is that it freezes the site: to change the presentation, you must manually re-edit every page. It's possible to do, but it's a lot of work.

The XML solution is to separate authoring from publishing. The author of the pages writes the document in XML. While doing so, she ignores presentation. She instead adopts an XML vocabulary that focuses on the organization of the document: sections, titles, abstracts, and more. Publishing the document then simply requires converting it into HTML, WML, or another popular format. Fortunately, this can be automated because the original XML document is rich in structure.

The operative word here is automated. For medium to large sites, it is more cost effective to automate publishing. Rewriting a couple of pages by hand is feasible; for a hundred pages, however, it is too expensive. Figure 4.1 illustrates how we'll apply these principles in this chapter. The three main elements are as follows:
Figure 4.1. XML separates authoring and publishing.
XML Vocabulary

As we saw in the previous chapters, XML does not define any vocabulary. It is up to developers to create vocabularies for their applications. For this application, we have two realistic options.

The first option is to use DocBook (http://www.docbook.org) or another standard SGML/XML vocabulary for documents. DocBook is particularly attractive because it is widely used and well supported. However, DocBook is so rich that it is too complicated for such a simple project.

The second option, and the one we'll adopt in this chapter, is to create our own vocabulary: one that is simple and limited to only the tags we need. Listing 4.1 illustrates the vocabulary we'll use in this chapter. As you can see, it is almost trivial: It's just a list of news items.

Listing 4.1 index.xml

    <?xml version="1.0"?>
    <News>
      <URL>http://localhost:8080/publish/index</URL>
      <Item>
        <Title>Applied XML Solutions</Title>
        <Author>Benoît Marchal</Author>
        <Abstract>A new intermediate/advanced book for XML developers.</Abstract>
        <Para>Learn advanced XML programming with Applied XML Solutions. This hands-on teaching book is filled with practical examples.</Para>
        <Para>Applied XML Solutions is a great complement to XML by Example.</Para>
      </Item>
      <Item>
        <Title>Jetty</Title>
        <Author>Greg Wilkins</Author>
        <Abstract>Open Source Java Server.</Abstract>
        <Para>Jetty is a powerful, open-source Java web server. It supports standard Java servlets making it the ideal development environment.</Para>
        <Para>Jetty is also highly-configurable which helps custom developments.</Para>
      </Item>
      <Item>
        <Title>Hypersonic SQL</Title>
        <Author>Thomas Müller</Author>
        <Abstract>Open Source SQL Database.</Abstract>
        <Para>Hypersonic SQL is an open source database that supports the JDBC API.</Para>
        <Para>Hypersonic SQL is efficient and can run in three modes: in-memory, standalone or client/server. This provides lots of flexibility when writing software.</Para>
      </Item>
    </News>

The list starts with a URL that points to the server where the document resides. The W3C suggests using the xml:base attribute for this purpose, but it turns out that Xalan, the XSLT processor I use, has a problem with the xml namespace, so I use a URL element as a workaround:

    <URL>http://localhost:8080/publish/index</URL>

Each item has a title, author, abstract, and list of paragraphs:

    <Item>
      <Title>Applied XML Solutions</Title>
      <Author>Benoît Marchal</Author>
      <Abstract>A new intermediate/advanced book for XML developers.</Abstract>
      <Para>Learn advanced XML programming with Applied XML Solutions. This hands-on teaching book is filled with practical examples.</Para>
      <Para>Applied XML Solutions is a great complement to XML by Example.</Para>
    </Item>

Figure 4.2 illustrates the structure.

How can you develop such a format? When should you use existing formats (such as DocBook) rather than develop your own? Unfortunately, there are no hard rules that you can follow to guarantee success. As you develop your XML vocabulary, remember that a good vocabulary achieves a reasonable compromise between two opposite goals: On the one hand, it must mark up as much information as possible; on the other hand, it must be simple.

Figure 4.2. The document structure in XML.
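The structure of the news vocabulary can also be explored programmatically. The following is a minimal sketch in Python using the standard library's ElementTree; it is not the tooling used in this chapter (which relies on Xalan), and the names INDEX_XML and summarize are hypothetical, introduced only for illustration. It parses a shortened copy of Listing 4.1 (one item only) and collects the URL plus the title, author, abstract, and paragraphs of each item.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Listing 4.1 (a single item) so the sketch is self-contained.
INDEX_XML = """<?xml version="1.0"?>
<News>
  <URL>http://localhost:8080/publish/index</URL>
  <Item>
    <Title>Jetty</Title>
    <Author>Greg Wilkins</Author>
    <Abstract>Open Source Java Server.</Abstract>
    <Para>Jetty is a powerful, open-source Java web server. It supports standard Java servlets making it the ideal development environment.</Para>
    <Para>Jetty is also highly-configurable which helps custom developments.</Para>
  </Item>
</News>"""


def summarize(xml_text):
    """Return the base URL and one dictionary per <Item> element."""
    news = ET.fromstring(xml_text)
    items = []
    for item in news.findall("Item"):
        items.append({
            "title": item.findtext("Title"),
            "author": item.findtext("Author"),
            "abstract": item.findtext("Abstract"),
            # An item may hold any number of <Para> elements.
            "paragraphs": [para.text for para in item.findall("Para")],
        })
    return news.findtext("URL"), items


base_url, items = summarize(INDEX_XML)
for entry in items:
    count = len(entry["paragraphs"])
    print(f'{entry["title"]} by {entry["author"]}: {count} paragraph(s)')
```

Because every piece of information sits in its own element, the code never has to guess where the title ends or the abstract begins.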
It is important to mark up as much data as is realistically possible because the markup drives the transformation to HTML, WML, and other formats. If something has not been marked up, transforming it will be difficult (or outright impossible).

Yet, as you define the vocabulary, be realistic. If you provide too many tags and too many options, you will confuse authors. This is particularly true if authors don't use the format regularly. A format that is too complex can be dangerous because it gives the false impression that we're creating quality documents, whereas, in fact, authors usually ignore most of the markup.

I am sure you have already encountered a database with a complex table organization. In most cases, developers have misused it, and retrieving useful information is difficult. The same could happen with a markup vocabulary that is too complex.

Tip: Consider using an XML editor, as introduced in the previous chapter, to guide authors.
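To see why the markup drives the transformation, consider the following minimal Python sketch. It is an illustration under stated assumptions, not the approach this chapter takes: the function names to_html and to_text_index are hypothetical, the source is a shortened copy of one item from the news vocabulary, and the plain-text index merely stands in for a second target format. The index can list titles and abstracts only because the vocabulary marks the abstract separately from the body paragraphs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened copy of one news item so the sketch is self-contained.
SOURCE = """<News>
  <Item>
    <Title>Jetty</Title>
    <Author>Greg Wilkins</Author>
    <Abstract>Open Source Java Server.</Abstract>
    <Para>Jetty is also highly-configurable which helps custom developments.</Para>
  </Item>
</News>"""


def to_html(news):
    """Render each item as an HTML fragment: heading, byline, abstract, body."""
    parts = []
    for item in news.findall("Item"):
        parts.append("<h2>" + escape(item.findtext("Title", "")) + "</h2>")
        parts.append('<p class="author">' + escape(item.findtext("Author", "")) + "</p>")
        parts.append("<p><em>" + escape(item.findtext("Abstract", "")) + "</em></p>")
        for para in item.findall("Para"):
            parts.append("<p>" + escape(para.text or "") + "</p>")
    return "\n".join(parts)


def to_text_index(news):
    """Render a plain-text index that uses only the title and the abstract."""
    return "\n".join(
        f"{item.findtext('Title', '')}: {item.findtext('Abstract', '')}"
        for item in news.findall("Item")
    )


news = ET.fromstring(SOURCE)
html_page = to_html(news)    # full rendering of the item
text_index = to_text_index(news)  # short index built from the same source
print(html_page)
print(text_index)
```

Had the abstract been written as just another paragraph, to_text_index would have no reliable way to pick it out, which is the difficulty the text above warns about.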