Writing XML involves entering structured information that complies with a document type definition or schema. Even within Emacs, the XML support you receive varies. At the low end of the spectrum, there is plain vanilla Fundamental mode. It simply provides a screen where you type. Specialized modes like SGML mode provide support for entering tags, as we saw earlier in our discussion of HTML mode, a derivative of SGML mode. But neither of these approaches helps you parse or validate XML (SGML mode has a command for validating, but it is tricky to set up correctly).

More advanced Lisp packages, though currently not included in Emacs, are available to provide these functions. These add-on packages provide validation against DTDs or schemas, parsing capabilities, and, typically, an array of standard DTDs and schema definitions. In Emacs, these tools primarily work in conjunction with one of two major modes. psgml mode validates XML (and SGML) against DTDs. The newer nxml mode validates against RELAX NG schemas. We cover both of these options in this section. Before we go into detail on those modes, however, let's look briefly at what Emacs has built in with SGML mode.

8.4.1 Writing XML with SGML Mode

Emacs's own SGML mode provides support for entering tags. We covered much of this earlier under HTML mode, so we provide just one brief example here. Inserting, hiding, and showing tags are especially helpful features provided by SGML mode. Let's look at a chapter on enumerated types by Java in a Nutshell author David Flanagan. This chapter uses the DocBook DTD.
Note that Emacs displays XML on the mode line. XML mode in this context is a subset of SGML mode. Actually, despite this name, all the commands in this mode start with sgml, not xml. The menu of relevant commands is called SGML as well. Emacs doesn't pretend to have extensive XML support. We want to insert a paragraph before the first paragraph.
Note that Emacs is not following our indentation style. We can correct it by moving to the beginning of the line and pressing Tab. See Table 8-4 earlier in this chapter for details on SGML mode commands.

8.4.2 TEI Emacs: XML Authoring for Linux and Windows

The Text Encoding Initiative (TEI) wanted an XML authoring environment for Emacs, so it created the somewhat misleadingly named TEI Emacs.[9] Despite its name, TEI Emacs does not include Emacs itself. Rather, it creates an authoring environment for writing XML using nxml mode or psgml mode. It incorporates XSLT tools, along with most of the standard DTDs, such as the three forms of XHTML DTDs (strict, frameset, and transitional), DocBook DTDs, and more. Naturally, the TEI's own DTDs and schemas are also included.
The active development of this tool and its careful packaging led us to describe this tool despite the fact that it is limited to Linux and Windows at this writing.[10] You should have Emacs 21.3 already installed before you install this tool. Installing TEI Emacs is trivial. The Windows version has an installer, and Linux users follow simple instructions at http://www.tei-c.org/Software/tei-emacs/, the web site for downloading TEI Emacs.
8.4.3 Writing XHTML Using nxml Mode

James Clark, an XML pioneer, wrote nxml mode to provide Emacs support for his schema standard RELAX NG. For details on the standard, visit http://www.relaxng.org/ or pick up a copy of RELAX NG by Eric van der Vlist (O'Reilly). The important thing about nxml mode is that it validates text as you type instead of making validation and debugging separate steps. If you did not install TEI Emacs, you can download nxml mode and its schemas from http://thaiopensource.com/download/. If you decide to become an active nxml mode user, you may want to join a related Yahoo Group discussion list (see http://groups.yahoo.com/group/emacs-nxml-mode/).

In this section, we change our running HTML example to XHTML, first using a RELAX NG schema and nxml mode. Open dickens.html, then enter nxml mode.
nxml mode tells you what schema it is using in the minibuffer. It's smart enough to know that its XHTML schema is best for this purpose. The mode line tells us that this file is currently invalid. Emacs highlights errors with red underscores. Let's deal with these errors one at a time.
Editing XHTML with a schema requires a namespace definition in the <html> tag. nxml mode knows what we need. This is a good time to use nxml's completion feature to let it supply the details for us. C-Enter completes the current tag.
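To see what the namespace definition changes, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not part of nxml mode. The sample markup is a made-up stand-in for dickens.html, and the helper name in_xhtml_namespace is hypothetical. The namespace URI is the standard one for XHTML.

```python
import xml.etree.ElementTree as ET

# The standard XHTML namespace URI, declared on the <html> tag.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample markup; the real contents of dickens.html are not shown here.
doc = f'<html xmlns="{XHTML_NS}"><head><title>Sample</title></head><body/></html>'
root = ET.fromstring(doc)


def in_xhtml_namespace(element):
    """Return True if the element's name belongs to the XHTML namespace."""
    # ElementTree spells namespaced names as "{namespace-uri}local-name".
    return element.tag.startswith("{" + XHTML_NS + "}")


print(root.tag)
print(in_xhtml_namespace(root))
```

Without the xmlns attribute, the same element would simply be named html, and a schema written for the XHTML namespace would not recognize it.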
The mode line tells us that this file is still invalid. Moving to the underlined address tag gives us a fairly cryptic reason; it says, Element not allowed in this context. Let's move down to the closing body tag to see if that error provides any more insight into the problem.
This message provides a clue. Although HTML authors are not accustomed to adding closing tags to paragraphs, XHTML requires them. Let's insert a closing tag after our paragraph.
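The closing-tag requirement comes from XML itself, so any XML parser will show it. The following Python sketch, again using xml.etree.ElementTree, checks only well-formedness. It does not validate against a RELAX NG schema the way nxml mode does. The sample strings and the helper name is_well_formed are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# HTML habit: the paragraph is never closed before </body>.
html_style = "<body><p>Some paragraph text</body>"

# XHTML requirement: every element, including <p>, has a closing tag.
xhtml_style = "<body><p>Some paragraph text</p></body>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

A document can pass this check and still be invalid against the schema, which is why nxml mode's validation goes further.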
Note that just typing </ was adequate to insert a closing tag for the current element. We don't need to type C-Enter to invoke completion. That's because in nxml mode, slash is bound to nxml-electric-slash. It automatically completes the nearest open element, another shortcut for us. A similar command is C-c C-f (for nxml-finish-element). With C-c C-f, you don't have to type anything; it inserts the relevant closing tag for you.

Look at the mode line now. It says valid. Using nxml mode, it's not too tough to take an HTML file and change it to valid XHTML.

Validating text as you type it is a key feature of nxml mode. It's validating against a schema. To specify a different schema, type C-c C-s (for rng-set-schema-and-validate). The minibuffer prompts for the file where the schema resides. A number of schemas can be found online at http://www.relaxng.org/#schemas. You can also convert DTDs to schemas using tools listed on that page.

Your menus vary depending on whether you install nxml mode directly or whether you use TEI's version. TEI Emacs provides support for encoded characters using the UniChar menu. It also provides extensive XSLT support. TEI's NXML menu includes some TEI skeletons as well as nxml mode options. nxml mode installed from thaiopensource.com includes an XML menu with options for setting the schema and customizing the mode.

Table 8-7 lists some of the commands available in nxml mode.
8.4.4 Using psgml Mode

Lennart Staflin's psgml mode has been around for a while. It is more robust than Emacs's own SGML mode, but, like any add-on, you have to install it in order to use it. Either install TEI Emacs as described earlier or download psgml mode from http://www.lysator.liu.se/projects/about_psgml.html and follow the installation instructions there. TEI Emacs includes a functioning psgml mode, so if you've installed TEI Emacs, your setup work is done. psgml mode consists of two parts: sgml-mode for writing SGML and xml-mode for writing XML (and in our case XHTML).
The *SGML LOG* window displays messages about this session. (If it doesn't appear immediately, click on the first character in the file.) The log buffer complains that it could not find an external entity called html. This file has been changed to work with the XHTML RELAX NG schema. psgml mode expects it to conform to an XHTML DTD. To get started with the (minimal) work needed to undertake the transformation from a schema-based file to a DTD-based file, we ask psgml to normalize the buffer.
More needs to be done, however. The first statements in an XHTML file include an XML statement and a DOCTYPE entry that identifies the DTD this document should be validated against. One of the nice things about TEI Emacs is that it includes a variety of DTDs. (Users of standard psgml mode don't have this feature; sorry.[11])
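For reference, here is a hedged Python sketch of what those opening statements look like. It assumes the XHTML 1.0 Transitional DTD; the strict and frameset DTDs use different public and system identifiers. The body markup is a made-up placeholder. The parser used here, xml.etree.ElementTree, does not load the DTD, so this confirms only that the prolog is syntactically acceptable, not that the document is valid.

```python
import xml.etree.ElementTree as ET

# XML declaration: must be the very first thing in the file.
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'

# DOCTYPE entry naming the DTD (assumption: XHTML 1.0 Transitional).
DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)

# Placeholder content standing in for the real document.
BODY = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head>"
    "<body><p>Some paragraph text</p></body>"
    "</html>"
)

document = "\n".join([XML_DECLARATION, DOCTYPE, BODY])

# Parse from bytes so the encoding named in the declaration is honored.
root = ET.fromstring(document.encode("utf-8"))
print(document.splitlines()[0])
print(root.tag)
```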
That's all it takes to make this file a well-formed XHTML file. psgml mode allows for validation against the DTD. Let's validate it using C-c C-v to make sure it's okay.
Of course, typical documents are far more complex than this one. Options on the View menu provide selective hiding and showing of elements, including an option to hide all tags, allowing you to focus on the content of the file instead. psgml mode also offers numerous options. If you are running TEI Emacs, you'll find the File Options and User Options submenus on the XML/SGML menu. If you've installed psgml mode standalone, you'll find them on the SGML menu. Table 8-8 summarizes some of the psgml commands.