8.4. Writing XML


Writing XML involves entering structured information that complies with a document type definition (DTD) or schema. Even within Emacs, the XML support you receive varies. At the low end of the spectrum, there is plain vanilla Fundamental mode, which simply provides a screen where you type. Specialized modes like SGML mode provide support for entering tags, as we saw earlier in our discussion of HTML mode, a derivative of SGML mode. But neither of these approaches helps you parse or validate XML (SGML mode has a command for validating, but it is tricky to set up correctly).

More advanced Lisp packages, though currently not included in Emacs, are available to provide these functions. These add-on packages provide validation against DTDs or schemas, parsing capabilities, and, typically, an array of standard DTDs and schema definitions. In Emacs, these tools primarily work in conjunction with one of two major modes: psgml mode validates XML (and SGML) against DTDs, and the newer nxml mode validates against RELAX NG schemas. We cover both of these options in this section. Before we go into detail on those modes, however, let's look briefly at what Emacs has built in with SGML mode.

8.4.1 Writing XML with SGML Mode

Emacs's own SGML mode provides support for entering tags. We covered much of this earlier under HTML mode, so we provide just one brief example here. Inserting, hiding, and showing tags are especially helpful features provided by SGML mode.

Let's look at a chapter on enumerated types by Java in a Nutshell author David Flanagan. This chapter uses the DocBook DTD.

Initial state:

Editing a document that uses the DocBook DTD (Mac OS X).


Note that Emacs displays XML on the mode line. XML mode in this context is a subset of SGML mode. Despite the name, all the commands in this mode start with sgml, not xml, and the menu of relevant commands is called SGML as well. Emacs doesn't pretend to have extensive XML support.

We want to insert a paragraph before the first paragraph.

Add a blank line following the title and type: C-c C-t

Emacs inserts an open angle bracket and prompts for the tag name (Mac OS X).


Type: para Enter

Emacs inserts opening and closing paragraph tags (Mac OS X).


Note that Emacs is not following our indentation style. We can correct it by moving to the beginning of the line and pressing Tab. See Table 8-4 earlier in this chapter for details on SGML mode commands.
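
As a rough illustration of the result, the fragment below shows what the top of such a chapter might look like once the new tags have been inserted and reindented with Tab. The title and paragraph text are placeholders, not the contents of the actual chapter; only the chapter, title, and para element names come from DocBook, and the two-space indentation is an assumption.

```xml
<chapter>
  <title>Placeholder chapter title</title>
  <!-- Empty element inserted with C-c C-t para Enter, then reindented with Tab -->
  <para></para>
  <para>Placeholder text standing in for the original first paragraph.</para>
</chapter>
```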

8.4.2 TEI Emacs: XML Authoring for Linux and Windows

The Text Encoding Initiative (TEI) wanted an XML authoring environment for Emacs, so it created (the somewhat misleadingly named) TEI Emacs.[9] Despite its name, TEI Emacs does not include Emacs itself. Rather, it creates an authoring environment for writing XML using nxml mode or psgml mode. It incorporates XSLT tools, along with most of the standard DTDs, such as the three forms of XHTML DTDs (strict, frameset, and transitional), DocBook DTDs, and more. Naturally, the TEI's own DTDs and schemas are also included.

[9] We'd like to thank Emacs guru Eric Pement for pointing out TEI Emacs to Deb.

The active development of this tool and its careful packaging led us to describe this tool despite the fact that it is limited to Linux and Windows at this writing.[10] You should have Emacs 21.3 already installed before you install this tool. Installing TEI Emacs is trivial. The Windows version has an installer, and Linux users follow simple instructions at http://www.tei-c.org/Software/tei-emacs/, the web site for downloading TEI Emacs.

[10] We sincerely hope that this support will be extended to Mac OS X as well, providing developers and writers on that platform the benefits of this tool's capabilities. Meanwhile, Mac users may want to install nxml mode from http://thaiopensource.com/download/ and psgml mode from http://www.lysator.liu.se/projects/about_psgml.html.

8.4.3 Writing XHTML Using nxml Mode

James Clark, an XML pioneer, wrote nxml mode to provide Emacs support for his schema standard RELAX NG. For details on the standard, visit http://www.relaxng.org/ or pick up a copy of RELAX NG by Eric van der Vlist (O'Reilly). The important thing about nxml mode is that it validates text as you type instead of making validation and debugging separate steps.

If you did not install TEI Emacs, you can download nxml mode and its schemas from http://thaiopensource.com/download/. If you decide to become an active nxml mode user, you may want to join a related Yahoo Group discussion list (see http://groups.yahoo.com/group/emacs-nxml-mode/).

In this section, we change our running HTML example to XHTML, first using a RELAX NG schema and nxml mode. Open dickens.html, then enter nxml mode.

Type: C-x C-f dickens.html Enter M-x nxml-mode Enter

Editing dickens.html in nxml mode.


nxml mode tells you what schema it is using in the minibuffer. It's smart enough to know that its XHTML schema is best for this purpose.

The mode line tells us that this file is currently invalid. Emacs highlights errors with red underscores. Let's deal with these errors one at a time.

Move the cursor to the red underscore at the end of the html tag.

The minibuffer describes what's missing.


Editing XHTML with a schema requires a namespace definition in the <html> tag. nxml mode knows what we need. This is a good time to use nxml's completion feature to let it supply the details for us. C-Enter completes the current tag.

Type: Space xmlns=" C-Enter

Emacs inserts the rest of the namespace declaration.
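
For reference, the completed start tag should look roughly like the sketch below. The namespace URI is the standard XHTML namespace defined by the W3C; the comment standing in for the rest of the file is ours, not something nxml mode inserts.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <!-- head and body unchanged -->
</html>
```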


The mode line tells us that this file is still invalid. Moving to the underlined address tag gives us a fairly cryptic reason; it says, Element not allowed in this context. Let's move down to the closing body tag to see if that error provides any more insight into the problem.

Move to </body>.

The minibuffer says Missing end-tag "p".


This message provides a clue. Although HTML authors are not accustomed to adding closing tags to paragraphs, XHTML requires them. Let's insert a closing tag after our paragraph.

Move to the line following the Dickens paragraph and type: </

Emacs inserts a closing tag.


Note that just typing </ was adequate to insert a closing tag for the current element. We don't need to type C-Enter to invoke completion. That's because in nxml mode, slash is bound to nxml-electric-slash. It automatically completes the nearest open element, another shortcut for us.

A similar command is C-c C-f (for nxml-finish-element). With C-c C-f, you don't have to type anything; it inserts the relevant closing tag for you.

Look at the mode line now. It says valid. Using nxml mode, it's not too tough to take an HTML file and change it to valid XHTML.
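
To make the two fixes concrete, here is a minimal sketch of a file that should validate against an XHTML schema. It is not the actual dickens.html; the title, heading, paragraph, and address text are placeholders. What matters is the namespace declaration on the html element and the explicit end tag on the paragraph.

```xml
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Placeholder title</title>
  </head>
  <body>
    <h1>Placeholder heading</h1>
    <!-- The paragraph now has an explicit end tag, so the address
         element that follows is no longer nested inside an open p. -->
    <p>Placeholder paragraph text.</p>
    <address>Placeholder address</address>
  </body>
</html>
```

This also suggests why the earlier error on the address tag read Element not allowed in this context: without the closing tag, the address element appeared to sit inside the still-open paragraph.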

Validating text as you type it is a key feature of nxml mode. It's validating against a schema. To specify a different schema, type C-c C-s (for rng-set-schema-and-validate). The minibuffer prompts for the file where the schema resides. A number of schemas can be found online at http://www.relaxng.org/#schemas. You can also convert DTDs to schemas using tools listed on that page.

Your menus vary depending on whether you install nxml mode directly or whether you use TEI's version. TEI provides support for encoded characters using the UniChar menu. It also provides extensive XSLT support. TEI's NXML menu includes some TEI skeletons as well as nxml mode options. Nxml mode installed from thaiopensource.com includes an XML menu with options for setting the schema and customizing the mode. Table 8-7 lists some of the commands available in nxml mode.

Table 8-7. Nxml mode commands

Keystrokes | Command name | Action
C-Enter | nxml-complete | Complete the current tag.
/ | nxml-electric-slash | Add a closing tag for the last open element.
C-c C-n | rng-next-error | Move to the next error.
C-c C-l | rng-save-schema-location | Create (or update) a file called schemas.xml in your home directory. This file associates schemas with files.
C-c C-s | rng-set-schema-and-validate | Set the schema and validate against it.
C-c C-a | rng-auto-set-schema | Set the schema automatically according to the contents of the file.
C-c C-w | rng-what-schema | Show in the minibuffer the current schema associated with this file.
C-c C-v | rng-validate-mode | Toggle whether the mode line indicates that the file is valid or invalid.
C-c C-u | nxml-insert-named-char | Insert a named character; press Tab to see a list.
(none) | nxml-insert-xml-declaration | Insert an XML declaration at the beginning of the file.
C-c Tab | nxml-balanced-close-start-tag-inline | Insert the ending tag for the starting tag you are typing, putting the ending tag on the current line.
C-c C-b | nxml-balanced-close-start-tag-block | Insert the ending tag for the starting tag you are typing, putting the ending tag on a separate line.
C-c C-f | nxml-finish-element | Finish the current element.
M-h | nxml-mark-paragraph | Mark the current paragraph.
M-} | nxml-forward-paragraph | Move forward one paragraph.
M-{ | nxml-backward-paragraph | Move back one paragraph.
C-M-p | nxml-backward-element | Move back one element.
C-M-n | nxml-forward-element | Move forward one element.
C-M-d | nxml-down-element | Move down one element (if nested).
C-M-u | nxml-backward-up-element | Move up one element (if nested).
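
The schemas.xml file mentioned for rng-save-schema-location is itself a small XML document. The sketch below shows the general shape of such a locating-rules file as we understand it; the schema path xhtml.rnc is a hypothetical example, and the exact entry nxml mode writes may differ depending on the version and on how the schema was chosen.

```xml
<?xml version="1.0"?>
<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <!-- Associate one specific file with a schema (the schema path is hypothetical) -->
  <uri resource="dickens.html" uri="xhtml.rnc"/>
</locatingRules>
```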


8.4.4 Using psgml Mode

Lennart Staflin's psgml mode has been around for a while. It is more robust than Emacs's own SGML mode, but, like any add-on, you have to install it in order to use it. Either install TEI Emacs as described earlier or download psgml mode from http://www.lysator.liu.se/projects/about_psgml.html and follow the installation instructions there. TEI Emacs includes a functioning psgml mode, so if you've installed TEI Emacs, your setup work is done.

psgml mode consists of two parts: sgml-mode for writing SGML and xml-mode for writing XML (and in our case XHTML).

To start psgml mode to edit our XHTML file, type M-x xml-mode.

XML appears on the mode line and an *SGML LOG* window opens. If you are using TEI Emacs, XSLT appears on the mode line along with XML.


The *SGML LOG* window displays messages about this session. (If it doesn't appear immediately, click on the first character in the file.) The log buffer complains that it could not find an external entity called html. The reason is that we changed this file to work with the XHTML RELAX NG schema, whereas psgml mode expects it to conform to an XHTML DTD. Converting the file from a schema-based document to a DTD-based one takes only a little work, and we start by asking psgml to normalize the buffer.

Type: M-x sgml-normalize or select Normalize from the Modify menu

psgml mode eliminates the namespace declaration in the <html> tag.


More needs to be done, however. The first statements in an XHTML file include an XML statement and a DOCTYPE entry that identifies the DTD this document should be validated against. One of the nice things about TEI Emacs is that it includes a variety of DTDs. (Users of standard psgml mode don't have this feature; sorry.[11])

[11] A straightforward introduction to setting up a complete environment for psgml mode can be found at http://openacs.org/doc/openacs-5-0-0/psgml-mode.html.

At the beginning of the file, select Insert DTD from the DTD menu, then choose XHTML Transitional.

Emacs inserts the two required elements for us.
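
For readers without TEI Emacs, the two inserted items should resemble the sketch below: an XML declaration followed by a DOCTYPE declaration. The public and system identifiers shown are the standard W3C ones for XHTML 1.0 Transitional; the exact text TEI Emacs inserts (for example, an encoding attribute or a system identifier pointing at a local copy of the DTD) may differ.

```xml
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
  <!-- head and body unchanged -->
</html>
```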


That's all it takes to make this file a well-formed XHTML file. psgml mode allows for validation against the DTD. Let's validate it using C-c C-v to make sure it's okay.

Type: C-c C-v

psgml mode inserts the default validate command in the minibuffer; press Enter to run it.


Press Enter and type y to save the buffer when prompted.

The *compilation* buffer indicates (somewhat cryptically) that the document is valid.


Of course, typical documents are far more complex than this one. Options on the View menu provide selective hiding and showing of elements, including an option to hide all tags, allowing you to focus on the content of the file instead.

psgml mode also offers numerous options. If you are running TEI Emacs, you'll find the File Options and User Options submenus on the XML/SGML menu. If you've installed psgml mode standalone, you'll find them on the SGML menu. Table 8-8 summarizes some of the psgml commands.

Table 8-8. Bindings in psgml mode

Keystrokes | Command name | Action
C-M-Space | sgml-mark-element | Mark the current element.
M-Tab | sgml-complete | Complete the current tag.
C-M-t | sgml-transpose-element | Transpose two elements.
C-M-h | sgml-mark-current-element | Mark the current element.
C-M-k (Modify menu) | sgml-kill-element | Delete the current element (and any child elements).
C-M-u (Move menu) | sgml-backward-up-element | Move up to the parent element for this element.
C-M-d (Move menu) | sgml-down-element | Move down to the next child element.
C-M-b (Move menu) | sgml-backward-element | Move to the previous element.
C-M-f (Move menu) | sgml-forward-element | Move to the next element.
C-M-e (Move menu) | sgml-end-of-element | Move to the end of the current element.
C-M-a (Move menu) | sgml-beginning-of-element | Move to the beginning of the current element.
C-c C-w (SGML menu) | sgml-what-element | Similar to sgml-position but describes hierarchy in terms of tags versus content (for example, start-tag in title in head in html).
C-c C-v (SGML menu) | sgml-validate | Insert validation command in the minibuffer so you can modify it if necessary before pressing Enter to execute it.
C-c C-t (SGML menu) | sgml-list-valid-tags | List tags that are valid in the current context.
C-c C-q (Modify menu) | sgml-fill-element | Fill element according to the mode's indentation rules.
C-c C-o (Move menu) | sgml-next-trouble-spot | Find the next problem spot and display the problem in the minibuffer.
C-c C-n (Move menu) | sgml-up-element | Move to the parent element.
C-c Enter | sgml-split-element | Split current element.
C-c C-l (SGML menu) | sgml-show-or-clear-log | Display or delete the SGML LOG buffer (menu option name is misleading).
C-c C-k (Modify menu) | sgml-kill-markup | Delete current tag.
C-c / (Markup menu) | sgml-insert-end-tag | Insert closing tag for current tag.
C-c - (Modify menu) | sgml-untag-element | Delete the current tag pair.
C-c # (Modify menu) | sgml-make-character-reference | Change character under the cursor to the equivalent entity.
C-c C-f C-e (View menu) | sgml-fold-element | Hide the current element and its children if any.
C-c C-u C-e (View menu) | sgml-unfold-element | Show the current element and its children if any.
C-c C-f C-s (View menu) | sgml-fold-subelement | Hide subelements.
C-c C-f C-r (View menu) | sgml-fold-region | Hide the region.
C-c C-u C-a (View menu) | sgml-unfold-all | Show all hidden tags and text.




Learning GNU Emacs, Third Edition
ISBN: 0596006489
Year: 2003
Pages: 161
