4.4 RELAX NG


RELAX NG is a powerful schema validation language that builds on earlier work including RELAX and TREX. Like W3C Schema, it uses XML syntax and supports namespaces and data typing. It goes further by integrating attributes into content models, which greatly simplifies the structure of the schema. It offers superior handling of unordered content and supports context-sensitive content models.

In general, it just seems easier to write schemas in RELAX NG than in W3C Schema. The syntax is very clear, with elements like zeroOrMore for specifying optional repeating content. Declarations can contain other declarations, leading to a more natural representation of a document's structure.

Consider the simple schema in Example 4-7 which models a document type for logging work activity. It's easy to read this schema and understand the structure of a typical document.

Example 4-7. A simple RELAX NG schema
<element name="worklog"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:ann="http://relaxng.org/ns/compatibility/annotations/1.0">
  <ann:documentation>A document for logging work activity, broken down
         into days, and further into tasks.</ann:documentation>
  <zeroOrMore>
    <element name="day">
      <attribute name="date">
        <text/>
      </attribute>
      <zeroOrMore>
        <element name="task">
          <element name="description">
            <text/>
          </element>
          <element name="time-start">
            <text/>
          </element>
          <element name="time-end">
            <text/>
          </element>
        </element>
      </zeroOrMore>
    </element>
  </zeroOrMore>
</element>

The same thing would look like this as a DTD:

<!ELEMENT worklog (day*)>
<!ELEMENT day (task*)>
<!ELEMENT task (description, time-start, time-end)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT time-start (#PCDATA)>
<!ELEMENT time-end (#PCDATA)>
<!ATTLIST day date CDATA #REQUIRED>

Although the DTD is more compact, it relies on a special syntax that is decidedly not XML-ish. RELAX NG accomplishes the same thing with more readability.
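
To make the comparison concrete, here is a sketch of a document that either version of the schema should accept. The dates, descriptions, and times are invented for illustration; only the element and attribute names come from Example 4-7.

```xml
<worklog>
  <day date="2003-06-02">
    <task>
      <description>Review chapter edits</description>
      <time-start>09:00</time-start>
      <time-end>10:30</time-end>
    </task>
    <task>
      <description>Answer reader mail</description>
      <time-start>10:45</time-start>
      <time-end>11:15</time-end>
    </task>
  </day>
  <!-- a day with no tasks is allowed, because task sits inside zeroOrMore -->
  <day date="2003-06-03"/>
</worklog>
```

Leaving out the date attribute, or reordering the children of task, should cause validation to fail.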

RELAX NG also offers a compact syntax that looks somewhat like a DTD but offers all the features of RELAX NG. For a brief introduction, see http://www.xml.com/pub/a/2002/06/19/rng-compact.html. James Clark's Trang program, available at http://www.thaiopensource.com/relaxng/trang.html, makes it easy to convert between RELAX NG, RELAX NG Compact Syntax, and DTDs, as well as create W3C XML Schema from any of these formats.

The basic component of a RELAX NG schema is a pattern. A pattern denotes any construct that describes the order and types of structure and content. It can be an element declaration, an attribute declaration, character data, or any combination. Elements in the schema are used to group, order, and parameterize these patterns.

Note that any element or attribute in a namespace other than the RELAX NG namespace (http://relaxng.org/ns/structure/1.0) is simply ignored by the parser. That gives us a mechanism for putting in comments or annotations, which explains why I created the ann namespace in the previous example.

4.4.1 Elements

The element construct is used both to declare an element and to establish where the element can appear (when placed inside another element declaration). For example, the following schema declares three elements, report, title, and body, and specifies that the first element contains the other two in the exact order and number that they appear:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <element name="body">
    <text/>
  </element>
</element>

Whitespace between these elements is allowed, as it would be for a DTD. The text element, which is always empty, restricts the content of the inner elements to character content.

4.4.1.1 Repetition

To allow for repeating children, RELAX NG provides two modifier elements, zeroOrMore and oneOrMore. They function like DTD's star (*) and plus (+) operators, respectively. In this example, the body element has been modified to allow an arbitrary number of para elements:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <element name="body">
    <zeroOrMore>
      <element name="para">
        <text/>
      </element>
    </zeroOrMore>
  </element>
</element>
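
As a quick illustration, a document like the following should be accepted by this pattern. The text content is made up; the body may hold any number of para elements, including none at all.

```xml
<report>
  <title>Quarterly Summary</title>
  <body>
    <para>Sales rose in the first two months.</para>
    <para>They leveled off in the third.</para>
  </body>
</report>
```

Had the schema used oneOrMore instead, an empty body element would no longer be valid.
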
4.4.1.2 Choices

The question mark (?) operator in DTDs means that an element is optional (zero or one in number). In RELAX NG, you can achieve that effect with the optional modifier. For example, this schema allows you to insert an optional authorname element after the title:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <optional>
    <element name="authorname">
      <text/>
    </element>
  </optional>
  <element name="body">
    <text/>
  </element>
</element>

It is also useful to offer a choice of elements. Corresponding to DTD's vertical bar (|) operator is the modifier choice. Here, we require either an authorname or a source element after the title:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <choice>
    <element name="authorname">
      <text/>
    </element>
    <element name="source">
      <text/>
    </element>
  </choice>
  <element name="body">
    <text/>
  </element>
</element>

This declaration combines choice with zeroOrMore to create a container that can have mixed content (text plus elements, in any order):

<element name="paragraph"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="emphasis">
        <text/>
      </element>
    </choice>
  </zeroOrMore>
</element>
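
For instance, a paragraph like this one (the wording is invented) should match the mixed-content pattern, since text and emphasis elements may alternate freely:

```xml
<paragraph>Schemas can be <emphasis>surprisingly</emphasis> readable
once you get used to <emphasis>patterns</emphasis>.</paragraph>
```

An emphasis element nested inside another emphasis would be rejected, because the inner declaration allows only text.
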
4.4.1.3 Grouping

For a required sequence of children, you can use the group modifier, which functions much like parentheses in DTDs. For example, here the (now required) authorname is either plain text or a sequence of elements:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title">
    <text/>
  </element>
  <element name="authorname">
    <choice>
      <text/>
      <group>
        <element name="first"><text/></element>
        <element name="last"><text/></element>
      </group>
    </choice>
  </element>
  <element name="body">
    <text/>
  </element>
</element>

The group container is necessary because without it the first and last elements would be part of the choice and become mutually exclusive.
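
As a sketch of what this allows, either of the following authorname elements should be acceptable inside a report. The names are placeholders.

```xml
<!-- Form 1: the author name as plain text -->
<authorname>Pat Example</authorname>

<!-- Form 2: the author name as a first/last sequence, in that order -->
<authorname>
  <first>Pat</first>
  <last>Example</last>
</authorname>
```

Mixing the two forms, such as text followed by a last element, would not match either branch of the choice.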

DTDs provide no way to require a group of elements in which order is not significant but contents are required. RELAX NG provides a container called interleave, which does just that. It requires all the children to be present, but in any order. In the following example, title can come before authorname, or it can come after:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <interleave>
    <element name="title">
      <text/>
    </element>
    <element name="authorname">
      <text/>
    </element>
  </interleave>
  <element name="body">
    <text/>
  </element>
</element>
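
To illustrate, both of the following documents (shown together here, with invented content) should validate against this pattern:

```xml
<!-- Accepted: title first -->
<report>
  <title>Field Notes</title>
  <authorname>Pat Example</authorname>
  <body>Observations from the first week.</body>
</report>

<!-- Also accepted: authorname first -->
<report>
  <authorname>Pat Example</authorname>
  <title>Field Notes</title>
  <body>Observations from the first week.</body>
</report>
```

The body element is outside the interleave, so it must still come last in both cases.
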
4.4.1.4 Nonelement content descriptors

The text content descriptor is only one of several options for describing non-element content. Here's the full assortment:

Name     Content
-----    -------
empty    No content at all
text     Any string
value    A predetermined value
data     Text following a specific pattern (datatype)
list     A sequence of values

The empty marker precludes any content. With this declaration, the element bookmark is not allowed to appear in any form other than as an empty element:

<element name="bookmark">
  <empty/>
</element>

RELAX NG provides the value descriptor for matching a string of characters. For example, here is an enumeration of values for a size element:

<element name="size">
  <choice>
    <value>small</value>
    <value>medium</value>
    <value>large</value>
  </choice>
</element>

By default, value normalizes the string, removing extra space characters. The example element below would be accepted by the previous declaration:

 <size>  small    </size> 

If you want to turn off normalization and require exact string matching, you need to add a type="string" attribute. The following declaration would reject the above element's content because of its extra space:

<element name="size">
  <choice>
    <value type="string">small</value>
    <value type="string">medium</value>
    <value type="string">large</value>
  </choice>
</element>

The most interesting content descriptor is data . This is the vehicle for using datatypes in RELAX NG. Its type attribute contains the name of a type defined in a datatype library. (Don't worry about what that means yet, we'll get to it in a moment.) The content of the element declared here is set to be an integer value:

<element name="font-size">
  <data type="integer"/>
</element>

One downside to using data is that it can't be mixed with elements in content, unlike text .

The list descriptor contains a sequence of space-separated tokens. A token is a special type of string consisting only of nonspace characters. Token lists are a convenient way to represent sets of discrete data. Here, one is used to encode a set of numbers:

<element name="vector">
  <list>
    <oneOrMore>
      <data type="float"/>
    </oneOrMore>
  </list>
</element>

Here is an acceptable vector:

 <vector>44.034 19.0 -65.33333</vector> 

Note how the oneOrMore descriptor works just as well with text as it does with elements. It's yet another example of how succinct and flexible RELAX NG is.
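
A list does not have to be open-ended. As a sketch, the pattern below requires exactly two numbers. The element name point is hypothetical, and the float type assumes the W3C Schema datatype library described in the next section.

```xml
<element name="point"
      xmlns="http://relaxng.org/ns/structure/1.0"
      datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <list>
    <!-- exactly two space-separated tokens, each a float -->
    <data type="float"/>
    <data type="float"/>
  </list>
</element>
```

With this declaration, a point containing "3.5 -2" should match, while one containing a single number or three numbers should not.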

4.4.2 Data Typing

Although RELAX NG supports datatyping, the specification only includes two built-in types: string and token. To use other kinds of datatypes, you need to import them from another specification. You do this by setting a datatypeLibrary attribute like so:

<element name="font-size">
  <data type="integer"
        datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</element>

This will associate the datatype definitions from the W3C Schema specification with your schema. The datatypes you can use depend on the implementation of your RELAX NG validating parser.

It isn't so convenient to put the datatypeLibrary attribute in every data element. The good news is it can be inherited from any ancestor in the schema. Here, we declare it once in an element declaration, and all the data descriptors inside call from that library:

<element name="rectangle"
      xmlns="http://relaxng.org/ns/structure/1.0"
      datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="width">
    <data type="double"/>
  </element>
  <element name="height">
    <data type="double"/>
  </element>
</element>
4.4.2.1 String and token

Both string and token match arbitrary strings of legal XML character data. The difference is that token normalizes whitespace and string keeps whitespace as is. The token type is also the default for the value descriptor, which is why value ignores extra whitespace unless you specify type="string".

4.4.2.2 Parameters

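Some datatypes allow you to specify a parameter to further restrict the pattern. This is expressed with a param element as a child of the data element. The available parameters depend on the datatype library; the two built-in types accept none, so this example takes its string type from the W3C Schema library. The element below restricts its content to a string of no more than eight characters:

<element name="username"
      xmlns="http://relaxng.org/ns/structure/1.0"
      datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="string">
    <param name="maxLength">8</param>
  </data>
</element>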
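
Several parameters can be combined on one data element. As a sketch, the declaration below limits a hypothetical month element to the integers 1 through 12, assuming the W3C Schema datatype library and its minInclusive and maxInclusive parameters:

```xml
<element name="month"
      xmlns="http://relaxng.org/ns/structure/1.0"
      datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="integer">
    <!-- lower and upper bounds, both inclusive -->
    <param name="minInclusive">1</param>
    <param name="maxInclusive">12</param>
  </data>
</element>
```

A month containing 0, 13, or non-numeric text should be rejected.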

4.4.3 Attributes

Attributes are declared much the same way as elements. In this example, we add a date attribute to the report element:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="date">
    <text/>
  </attribute>
  <element name="title">
    <text/>
  </element>
  <element name="body">
    <text/>
  </element>
</element>

Unlike elements, the order of attributes is not significant. In this next example, the attributes for any alert element can appear in any order, even though the attribute declarations are not wrapped in an interleave element:

<element name="alert"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="priority">
    <choice>
      <value>emergency</value>
      <value>important</value>
      <value>warning</value>
      <value>notification</value>
    </choice>
  </attribute>
  <attribute name="icon">
    <choice>
      <value>bomb</value>
      <value>exclamation-mark</value>
      <value>frown-face</value>
    </choice>
  </attribute>
  <element name="body">
    <text/>
  </element>
</element>

Element and attribute declarations are interchangeable. Here, we use a choice element to provide two cases: one with an icon attribute and another with an icon element. The two are mutually exclusive:

<element name="alert"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <attribute name="icon">
      <choice>
        <value>bomb</value>
        <value>exclamation-mark</value>
        <value>frown-face</value>
      </choice>
    </attribute>
    <element name="icon">
      <choice>
        <value>bomb</value>
        <value>exclamation-mark</value>
        <value>frown-face</value>
      </choice>
    </element>
  </choice>
  <element name="body">
    <text/>
  </element>
</element>
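
As an illustration of what this pattern is meant to accept, here are the two alternative forms of an alert, with invented body text:

```xml
<!-- icon supplied as an attribute -->
<alert icon="bomb">
  <body>The build server is down.</body>
</alert>

<!-- icon supplied as a child element instead -->
<alert>
  <icon>bomb</icon>
  <body>The build server is down.</body>
</alert>
```

An alert that carries both the attribute and the element, or neither, should fail validation.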

Another difference with element declarations is that there is a shorthand form in which the lack of any content information defaults to text. So, for example, this declaration:

<element name="emphasis">
  <attribute name="style"/>
</element>

...is equivalent to this:

<element name="emphasis">
  <attribute name="style">
    <text/>
  </attribute>
</element>

This interchangeability between element and attribute declarations makes the schema language much simpler and more elegant.

4.4.4 Namespaces

RELAX NG is fully namespace aware. You can use namespace-qualified names in any name attribute, declaring the prefixes with xmlns attributes:

<element name="poem"
        xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:foo="http://www.mystuff.com/commentary">
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
  <zeroOrMore>
    <choice>
      <text/>
      <element name="foo:comment"><text/></element>
    </choice>
  </zeroOrMore>
</element>

Add the attribute ns to any element or attribute declaration to set an implicit namespace context. For example, this declaration:

<element name="vegetable" ns="http://www.broccoli.net"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <empty/>
</element>

would match either of these:

<food:vegetable xmlns:food="http://www.broccoli.net"/>
<vegetable xmlns="http://www.broccoli.net"/>

...but fail to match these:

<vegetable/>
<food:vegetable xmlns:food="http://www.uglifruit.org"/>

The namespace setting is inherited, allowing you to set it once at a high level. Here, the inner element declarations for title and body implicitly require the namespace http://howtowrite.info:

<element name="report" ns="http://howtowrite.info"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
  <element name="body"><text/></element>
</element>
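
As a sketch, both of these documents should match that declaration, since all three elements end up in the http://howtowrite.info namespace either way. The prefix hw and the text content are arbitrary choices for the example.

```xml
<!-- using a default namespace -->
<report xmlns="http://howtowrite.info">
  <title>Clear Prose</title>
  <body>Say what you mean.</body>
</report>

<!-- using a prefix bound to the same namespace -->
<hw:report xmlns:hw="http://howtowrite.info">
  <hw:title>Clear Prose</hw:title>
  <hw:body>Say what you mean.</hw:body>
</hw:report>
```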

4.4.5 Name Classes

A name class is any pattern that substitutes for a set of element or attribute types. We've already seen one, choice, which matches an enumerated set of elements and attributes. Even more permissive is the name class anyName, which allows any element or attribute type to have the described content model.

For example, this pattern matches any well-formed document:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="all-elements"/>
  </start>
  <define name="all-elements">
    <element>
      <anyName/>
      <!-- use in place of the "name" attribute -->
      <zeroOrMore>
        <choice>
          <ref name="all-elements"/>
          <text/>
          <attribute><anyName/></attribute>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>

The anyName appears inside the element instead of a name attribute. The zeroOrMore is required here because each name class element matches exactly one object.

The nsName class matches any element or attribute in a namespace specified by an ns attribute. For example:

<element
      xmlns="http://relaxng.org/ns/structure/1.0">
  <nsName ns="http://fakesite.org" />
  <empty/>
</element>

This will set any element in the namespace http://fakesite.org to be an empty element. If you leave out the ns attribute, nsName will inherit the namespace from the nearest ancestor that defines one. So this will also work:

<element ns="http://fakesite.org"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <nsName />
  <empty />
</element>

If you don't want to let everything through, trim down the set using except. Use it as a child to anyName or nsName to list classes of elements or attributes you don't want to allow. Here, only elements not in the current namespace are declared empty:

<element ns="http://fakesite.org"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <anyName>
    <except>
      <nsName />
    </except>
  </anyName>
  <empty />
</element>
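
An except can also single out individual names by using the name name class. In this sketch, any element is allowed to hold text unless it is called secret; that element name is hypothetical and chosen only for the example.

```xml
<element xmlns="http://relaxng.org/ns/structure/1.0">
  <anyName>
    <except>
      <!-- every element name is allowed except "secret" -->
      <name>secret</name>
    </except>
  </anyName>
  <text/>
</element>
```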

The only place you cannot use a name class is as the child of a define element. This is wrong:

<define name="too-ambiguous">
  <anyName/>
</define>

We'll discuss define elements in the next section.

As this book was going to press, James Clark announced the Namespace Routing Language (NRL), which provides enormous flexibility for describing how content in different namespaces should be validated and processed. See http://www.thaiopensource.com/relaxng/nrl.html for more information and an implementation.

4.4.6 Named Patterns

The patterns we have seen so far are monolithic. All the declarations are nested inside one big one. This is fine for simple documents, but as complexity builds, it can be hard to manage. Named patterns allow you to move declarations outside of the main pattern, breaking up the schema into discrete parts that are more easily handled. They also let you reuse patterns that recur in many places.

A schema that uses named patterns follows this layout:

<grammar>
  <start>
    main pattern
  </start>
  <define name="identifier">
    pattern
  </define>
  more pattern definitions
</grammar>

The outermost grammar element encloses both the main pattern and a set of named pattern definitions. It contains exactly one start element with the primary pattern, and any number of define elements, each defining a named pattern. Named patterns are imported into a pattern using a ref element. For example:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="report">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </start>
  <define name="head">
    <element name="title">
      <text/>
    </element>
    <element name="authorname">
      <text/>
    </element>
  </define>
  <define name="body">
    <zeroOrMore>
      <element name="paragraph">
        <text/>
      </element>
    </zeroOrMore>
  </define>
</grammar>

The start element must contain exactly one pattern. However, a define may contain any number of children, since its contents will be copied into another pattern.
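
Reuse is the other payoff of named patterns. As a sketch, the grammar below defines a mixed-content pattern once and references it from both title and paragraph. The names inline and emphasis are hypothetical and are not part of the report examples above.

```xml
<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="report">
      <element name="title">
        <ref name="inline"/>
      </element>
      <zeroOrMore>
        <element name="paragraph">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <!-- mixed content: text and emphasis elements in any order -->
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="emphasis">
          <text/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```

Changing the definition of inline then changes what both title and paragraph accept, with no need to edit either declaration.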

You can write a grammar to fit the style of DTDs, with one definition per element: [6]

[6] This is how DTDs can be mapped directly into RELAX NG schema. This kind of backward compatibility is important, since most people are still using DTDs. So this is a good way to upgrade to RELAX NG.

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="report">
      <ref name="title"/>
      <ref name="authorname"/>
      <zeroOrMore>
        <ref name="paragraph"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="title">
    <element name="title">
      <text/>
    </element>
  </define>
  <define name="authorname">
    <element name="authorname">
      <text/>
    </element>
  </define>
  <define name="paragraph">
    <element name="paragraph">
      <text/>
    </element>
  </define>
</grammar>
4.4.6.1 Recursive definitions

Recursive definitions are allowed, as long as the ref is enclosed inside an element. This pattern describes a section element that can contain subsections arbitrarily deep:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="report">
      <element name="title"><text/></element>
      <zeroOrMore>
        <ref name="paragraph"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="paragraph">
    <element name="paragraph">
      <text/>
    </element>
  </define>
  <define name="section">
    <element name="section">
      <zeroOrMore>
        <ref name="paragraph"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </define>
</grammar>
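
To see the recursion at work, consider this sketch of a conforming document, with invented content and sections nested three levels deep:

```xml
<report>
  <title>Field Notes</title>
  <paragraph>Introductory remarks.</paragraph>
  <section>
    <paragraph>First-level text.</paragraph>
    <section>
      <paragraph>Second-level text.</paragraph>
      <!-- a section may be empty: both of its children are zeroOrMore -->
      <section/>
    </section>
  </section>
</report>
```

Within each section, any paragraphs must come before any subsections, because the two zeroOrMore containers form a sequence.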

Failing to put the ref inside an element in a recursive definition would set up a logical infinite loop. So this is illegal:

<define name="foo">
  <choice>
    <ref name="bar"/>
    <ref name="foo"/>
  </choice>
</define>

The order of definitions for named patterns doesn't matter. As long as every referenced pattern has a definition within the same grammar, everything will be kosher.

4.4.6.2 Aggregate definitions

Multiple pattern definitions with the same name are illegal unless you use the combine attribute. This tells the processor to merge the definitions into one, grouped with either a choice or an interleave container. The value of this attribute describes how to combine the parts. For example:

<define name="block.class" combine="choice">
  <element name="title">
    <text/>
  </element>
</define>
<define name="block.class" combine="choice">
  <element name="para">
    <text/>
  </element>
</define>

...which is equivalent to this:

<define name="block.class"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <element name="title">
      <text/>
    </element>
    <element name="para">
      <text/>
    </element>
  </choice>
</define>
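
The interleave value works the same way. In this sketch, two definitions of a hypothetical head.content pattern are merged so that both elements are required, in either order. The single definition at the end is shown only for comparison and would not appear in the same grammar as the first two.

```xml
<define name="head.content" combine="interleave">
  <element name="title">
    <text/>
  </element>
</define>
<define name="head.content" combine="interleave">
  <element name="authorname">
    <text/>
  </element>
</define>

<!-- equivalent single definition, for comparison only -->
<define name="head.content">
  <interleave>
    <element name="title">
      <text/>
    </element>
    <element name="authorname">
      <text/>
    </element>
  </interleave>
</define>
```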

The usefulness of aggregate definitions becomes more clear when used with patterns in other files.

4.4.7 Modularity

Good housekeeping of schemas often requires putting pieces in different files. Not only will it make parts smaller and easier to manage, but it allows them to be shared between schemas.

4.4.7.1 External references

The pattern externalRef functions like ref and uses the attribute href to locate the file containing a grammar. externalRef references the whole grammar, not a named pattern inside it.

Suppose we have a file section.rng containing this pattern:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="section"/>
  </start>
  <define name="section">
    <element name="section">
      <zeroOrMore>
        <ref name="paragraph"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="paragraph">
    <text/>
  </define>
</grammar>

We can link it to a pattern in another file like this:

<element name="report"
      xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
  <oneOrMore>
    <externalRef href="section.rng"/>
  </oneOrMore>
</element>
4.4.7.2 Nested grammars

One consequence of external referencing is that grammars effectively contain other grammars. To prevent name clashes, each grammar has its own scope for named patterns. The named patterns in a parent are not automatically available to its child grammars. Instead, ref will only reference a definition from inside the current grammar.

To get around that limitation, you can use parentRef. It functions like ref but looks for definitions in the grammar one level up. For example, consider this case where two grammars reference each other. I am defining one element, para, as a paragraph that can include footnotes. The footnote element contains some number of paras. They are stored in files para.rng and footnote.rng, respectively, and shown in Example 4-8 and Example 4-9.

Example 4-8. para.rng
<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="para">
      <zeroOrMore>
        <choice>
          <ref name="para.content"/>
          <externalRef href="footnote.rng"/>
        </choice>
      </zeroOrMore>
    </element>
  </start>
  <define name="para.content">
    <text/>
  </define>
</grammar>
Example 4-9. footnote.rng
<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="footnote">
      <oneOrMore>
        <parentRef name="para.content"/>
      </oneOrMore>
    </element>
  </start>
</grammar>

The footnote pattern relies on its parent grammar to define the para.content pattern.

4.4.7.3 Merging grammars

You can merge grammars from external sources by using include as a child of grammar. Like externalRef, include uses an href attribute to source in the definitions. However, it actually incorporates them in the same context, unlike externalRef, which keeps scopes for named patterns separate.

One use for include is to augment an existing definition with more patterns. Suppose, for example, this pattern is located in block.rng:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="block.class"/>
  </start>
  <define name="block.class">
    <choice>
      <element name="title">
        <text/>
      </element>
      <element name="para">
        <text/>
      </element>
    </choice>
  </define>
</grammar>

I can add more items to this class by including it like so:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="block.rng">
    <start>
      <oneOrMore>
        <element name="section">
          <ref name="block.class"/>
        </element>
      </oneOrMore>
    </start>
  </include>
  <define name="block.class" combine="choice">
    <element name="poem">
      <text/>
    </element>
  </define>
</grammar>

The combine attribute is necessary to tell the processor how to incorporate the new definition with the previous one imported from block.rng. Note that among multiple definitions of the same name, one of them is allowed to leave out the combine attribute, as is the case in the file block.rng. The start element sits inside include so that it replaces the start pattern from block.rng rather than clashing with it, a technique covered in the next section.

4.4.7.4 Overriding imported definitions

You can override some definitions that you import by including new ones inside the include element. Say we have a file report.rng defined like this:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="report">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </start>
  <define name="head">
    <element name="title"><text/></element>
  </define>
  <define name="body">
    <element name="section">
      <oneOrMore>
        <element name="para"><text/></element>
      </oneOrMore>
    </element>
  </define>
</grammar>

We wish to import this grammar, but adjust it slightly. Instead of just a title, we want to allow a subtitle as well. Rather than rewrite the whole grammar, we can just redefine head:

<grammar
      xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="report.rng">
    <define name="head">
      <element name="title"><text/></element>
      <optional>
        <element name="subtitle"><text/></element>
      </optional>
    </define>
  </include>
</grammar>

This is a good way to customize a schema to suit your own particular taste.
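
As a sketch of the result, a document like this one (with invented content) should be valid under the customized grammar but not under the original report.rng, which has no place for a subtitle:

```xml
<report>
  <title>Annual Review</title>
  <subtitle>Fiscal Year 2003</subtitle>
  <section>
    <para>Revenue grew modestly.</para>
    <para>Costs held steady.</para>
  </section>
</report>
```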

4.4.8 CensusML Example

In case you are curious, let's go back to the CensusML example from Section 4.3 and try to do it as a RELAX NG schema. The result is Example 4-10.

Example 4-10. A RELAX NG schema for CensusML
<element name="census-record"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <attribute name="taker">
    <data type="integer">
      <param name="minInclusive">1</param>
      <param name="maxInclusive">9999</param>
    </data>
  </attribute>
  <element name="date">
    <data type="date"/>
  </element>
  <element name="address">
    <interleave>
      <element name="street"><text/></element>
      <element name="city"><text/></element>
      <element name="county"><text/></element>
      <element name="postalcode">
        <data type="string">
          <param name="pattern">[0-9][0-9][0-9][A-Z][A-Z][A-Z]</param>
        </data>
      </element>
    </interleave>
  </element>
  <oneOrMore>
    <element name="person">
      <interleave>
        <attribute name="employed">
          <choice>
            <value>fulltime</value>
            <value>parttime</value>
            <value>none</value>
          </choice>
        </attribute>
        <attribute name="pid">
          <data type="integer">
            <param name="minInclusive">1</param>
            <param name="maxInclusive">999999</param>
          </data>
        </attribute>
        <element name="age">
          <data type="integer">
            <param name="minInclusive">0</param>
            <param name="maxInclusive">200</param>
          </data>
        </element>
        <element name="gender">
          <choice>
            <value>male</value>
            <value>female</value>
          </choice>
        </element>
        <element name="name">
          <interleave>
            <element name="first"><text/></element>
            <element name="last"><text/></element>
            <optional>
              <choice>
                <element name="junior"><empty/></element>
                <element name="senior"><empty/></element>
              </choice>
            </optional>
          </interleave>
        </element>
      </interleave>
    </element>
  </oneOrMore>
</element>

This schema certainly looks a lot cleaner than the W3C Schema version. Enumerations and complex types are much more clear. The grouping structures are very easy to read. Personally, I think RELAX NG is just more intuitive all around.
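
For reference, here is a sketch of a document that the schema in Example 4-10 should accept. All of the values are invented; they were chosen only to satisfy the datatypes and parameters declared above.

```xml
<census-record taker="3498">
  <date>2003-05-12</date>
  <address>
    <!-- children may come in any order because of interleave -->
    <city>Springfield</city>
    <street>471 Long Street</street>
    <county>Greene</county>
    <postalcode>123ABC</postalcode>
  </address>
  <person employed="fulltime" pid="4017">
    <name>
      <first>Sam</first>
      <last>Ortega</last>
      <junior/>
    </name>
    <age>41</age>
    <gender>male</gender>
  </person>
</census-record>
```

Changing taker to 10000, or postalcode to "12ABCD", should cause validation to fail, since those values fall outside the declared range and pattern.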



Learning XML, Second Edition
ISBN: 0596004206
EAN: 2147483647
Year: 2003
Pages: 139
Authors: Erik T. Ray
