7.1. Using XML Documents

XML allows developers to create tag-based markup structures that are bound by a set of rules defined in a public specification. The actual content of any particular XML file is left undefined by the specification. Here's an example, orders.xml, that represents two orders made to a fictional online shopping site. Each order includes identifying information (an order number and a customer number), a shipping address, and one or more items. The shipping address encapsulates both the shipping method and the shipping destination, and each item includes an identifying number and the quantity ordered, as well as an optional handling instruction.

Most elements include an opening and closing tag, with the element attributes set in the opening tag. Some elements are "empty," such as the first example of the <item> tag. Empty elements terminate with /> instead of simply >, and don't need a separate closing tag. The significance of an empty tag is in either its attributes or its mere presence. The data is simple enough that the structure should be clear to the reader. Here's the actual XML:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE orders SYSTEM "orders.dtd">
    <orders>
      <order idnumber="3123" custno="121312">
        <shippingaddr method="Camel"><![CDATA[One Main St
                                          Boston, MA 02112]]></shippingaddr>
        <item idnumber="7231" quantity="13"/>
        <item idnumber="1296" quantity="2">
          <handling>Please embroider in a tasteful manner!</handling>
        </item>
      </order>
      <order idnumber="3124" custno="12">
        <shippingaddr method="FedEx"><![CDATA[285 York St.
                                        New Haven, CT 06510]]></shippingaddr>
        <item idnumber="12" quantity="8"/>
      </order>
    </orders>

At the simplest level, an XML document must merely be well-formed, meaning that the document adheres to all of the syntax rules defined by the XML specification. These rules define the form of the XML declaration on the first line and specify how tags may be formed and nested.[*] The requirements for a well-formed document don't name any particular tags (the XML declaration, with its version, is the only fixed construct) and specify structure in only the broadest possible terms. So for most applications, simply knowing that a document is well-formed is not particularly helpful. Of course, one can specify that only files of a particular format should be used as input for a particular program, but without a way to define what that format should be and whether documents conform to it, an extensible markup language doesn't make a great deal of sense.
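As a rough illustration of what "well-formed" means in practice, the following sketch uses the JAXP DOM parser that ships with the JDK to test whether a string parses at all. The class name WellFormedCheck and its isWellFormed method are hypothetical helpers invented for this example; no DTD or schema checking takes place.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical helper: reports whether a string parses as well-formed XML. */
final class WellFormedCheck {

    /** Returns true if the parser accepts the text. No validation is performed. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow problems instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Syntax violations surface as SAXException; any failure counts as "no" here.
            return false;
        }
    }

    public static void main(String[] args) {
        // The second document never closes its <item> tag.
        String good = "<orders><order idnumber=\"3123\"><item idnumber=\"7231\"/></order></orders>";
        String bad = "<orders><order idnumber=\"3123\"><item idnumber=\"7231\"></order></orders>";
        System.out.println("good: " + isWellFormed(good));
        System.out.println("bad:  " + isWellFormed(bad));
    }
}
```

Note that the parser accepts the first document even though it has no shipping address and no custno attribute. Well-formedness says nothing about which tags or attributes are present.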

[*] And quite a bit more, including entity escape sequences, namespaces, valid character sets, and so on. But we don't have room here for a full discussion, so again, we recommend O'Reilly's Learning XML.

However, there is a solution. The second line of orders.xml specifies that the file should conform to a DTD. The DTD goes a step beyond the well-formed requirement and specifies the allowable XML tags, their formats, and the allowable structure. The DTD for the orders.xml file requires that all <order> tags be nested within an <orders> tag, all orders have at least one item and a shipping address, all items include identifier and quantity attributes, and so forth. Here's the DTD orders.dtd:

    <?xml version="1.0" encoding="UTF-8"?>
    <!ELEMENT orders (order+)>
    <!ELEMENT order (shippingaddr, item+)>
    <!ELEMENT shippingaddr (#PCDATA)>
    <!ELEMENT item (handling?)>
    <!ELEMENT handling (#PCDATA)>
    <!ATTLIST order
         idnumber CDATA #REQUIRED
         custno CDATA #REQUIRED
    >
    <!ATTLIST shippingaddr
         method (FedEx | UPS | USPS | Camel) #REQUIRED
    >
    <!ATTLIST item
         idnumber CDATA #REQUIRED
         quantity CDATA #REQUIRED
    >
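One way to see the DTD at work is to turn on validation in the JAXP parser and collect the validity errors it reports. The sketch below is only an illustration: the class DtdValidationSketch and its validate method are hypothetical, and to stay self-contained it embeds the declarations as an internal DTD subset instead of loading orders.dtd from disk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: validates an orders document against the DTD declarations. */
final class DtdValidationSketch {

    // Assumption: the declarations from orders.dtd, inlined as an internal subset.
    static final String DOCTYPE =
          "<!DOCTYPE orders [\n"
        + "<!ELEMENT orders (order+)>\n"
        + "<!ELEMENT order (shippingaddr, item+)>\n"
        + "<!ELEMENT shippingaddr (#PCDATA)>\n"
        + "<!ELEMENT item (handling?)>\n"
        + "<!ELEMENT handling (#PCDATA)>\n"
        + "<!ATTLIST order idnumber CDATA #REQUIRED custno CDATA #REQUIRED>\n"
        + "<!ATTLIST shippingaddr method (FedEx | UPS | USPS | Camel) #REQUIRED>\n"
        + "<!ATTLIST item idnumber CDATA #REQUIRED quantity CDATA #REQUIRED>\n"
        + "]>\n";

    /** Returns the validity problems found in the given <orders> markup. */
    static List<String> validate(String ordersMarkup) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity violations arrive here; parsing continues afterward.
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + ordersMarkup)));
        } catch (Exception e) {
            problems.add("not well-formed: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String order = "<orders><order idnumber=\"3124\" custno=\"12\">"
                + "<shippingaddr method=\"Pigeon\">285 York St.</shippingaddr>"
                + "<item idnumber=\"12\" quantity=\"8\"/></order></orders>";
        // "Pigeon" is not one of the enumerated shipping methods.
        for (String problem : validate(order)) {
            System.out.println(problem);
        }
    }
}
```

In a real application the document would keep its SYSTEM reference to orders.dtd, and the parser would load the DTD from that location.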

The XML layer of an application generally consists of one or more DTDs and a set of documents. The DTDs are written ahead of time, by an individual developer, a working group, an application server vendor, or other provider. Some documents, particularly those related to configuration and profiling tasks (such as the J2EE deployment descriptors), are edited by hand and read by software.

The previous example is more transaction-oriented. Documents like orders.xml would likely be generated by a purchasing frontend (such as a web site) and transmitted over the network to a fulfillment system (such as a corporate order tracking database) via HTTP, JMS, or some other transport layer. The receiving software reads the document and processes it, often without any human intervention at all. Standardized DTDs mean that the two sides of the exchange can easily be provided by different vendors.

7.1.1. XML Schema

You may have noticed in orders.dtd that the DTD standard is missing a few things. For one, the specification of data types for elements and their attributes is extremely limited. Your application may require that the contents of the idnumber attribute be a valid integer, for example. The closest you can get to this with a DTD is to declare the attribute as an enumerated type. But this is useful only if you can enumerate all of the allowed integer values for idnumber, which isn't possible in most cases. So typically the only option you'd have is to declare the attribute as character data, as we've done in the example, and then validate the attribute values in your application code. Things are even more limited for data within elements, where DTDs allow only character data, no data (empty), or completely unspecified content. Again, in most cases, you would need to do additional data validation within your application.
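The kind of application-level checking described above might look something like the following sketch. Because the DTD declares idnumber as CDATA, a value such as "seven" passes DTD validation, so the code has to check it by hand. The class ItemAttributeCheck and its methods are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: application code doing the type checks a DTD cannot express. */
final class ItemAttributeCheck {

    /** Returns the idnumber values on <item> elements that are not positive integers. */
    static List<String> badItemIds(String xml) {
        List<String> bad = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                String id = ((Element) items.item(i)).getAttribute("idnumber");
                if (!isPositiveInteger(id)) {
                    bad.add(id);
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return bad;
    }

    /** True if the text parses as an int greater than zero. */
    static boolean isPositiveInteger(String text) {
        try {
            return Integer.parseInt(text.trim()) > 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item idnumber=\"7231\" quantity=\"13\"/>"
                + "<item idnumber=\"seven\" quantity=\"2\"/></order>";
        System.out.println("rejected idnumber values: " + badItemIds(xml));
    }
}
```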

DTDs are also limiting in terms of entity namespaces and granularity. You can have only a single DTD associated with a given XML document, and that DTD applies to the entire document.

The XML Schema standard provides a much richer mechanism for describing the structure and content of an XML file. A schema defines a set of data types that is then used to define element and attribute types. One or more schemas can then be referenced in a given XML document and applied to specific elements in the document. This provides a much more flexible model than DTDs. The data type descriptions are more powerful, the namespace facility allows you to integrate multiple schemas into a single document, and schema types can be applied at the individual element or attribute level in your XML documents.

In addition, XML schema files are regular XML files, so they can be processed using standard XML-handling tools. This makes it much easier to write software that can respond to incoming XML dynamically.
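Because a schema is itself an XML document, the same DOM parser used for ordinary documents can read it. The sketch below lists the named complex types a schema declares. SchemaAsXml and complexTypeNames are hypothetical names chosen for the example, and the schema text in main is a cut-down stand-in, not a complete schema for orders.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: treats a schema as plain XML and inspects it with DOM. */
final class SchemaAsXml {

    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Returns the names of the named xs:complexType declarations, in document order. */
    static List<String> complexTypeNames(String schemaXml) {
        List<String> names = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaXml)));
            NodeList types = doc.getElementsByTagNameNS(XS, "complexType");
            for (int i = 0; i < types.getLength(); i++) {
                Element type = (Element) types.item(i);
                if (type.hasAttribute("name")) { // skip anonymous types
                    names.add(type.getAttribute("name"));
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String schema = "<xs:schema xmlns:xs=\"" + XS + "\">"
                + "<xs:complexType name=\"itemType\"/>"
                + "<xs:complexType name=\"orderType\"/>"
                + "</xs:schema>";
        System.out.println(complexTypeNames(schema));
    }
}
```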

An XML schema describing the orders.xml file might look something like this:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!-- elementFormDefault="qualified" places the nested elements (order,
         shippingaddr, item, handling) in the target namespace, so that they
         match documents that declare jent:xml-orders as their default
         namespace -->
    <xs:schema xmlns="jent:xml-orders"
               xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="jent:xml-orders"
               elementFormDefault="qualified">
        <!-- An item contains an optional handling element, as well
             as idnumber and quantity attributes -->
        <xs:complexType name="itemType">
            <xs:sequence>
                <xs:element name="handling" type="xs:string"
                    minOccurs="0"/>
            </xs:sequence>
            <xs:attribute name="idnumber"
                          type="xs:positiveInteger" use="required"/>
            <xs:attribute name="quantity" type="xs:integer"
                          use="required"/>
        </xs:complexType>
        <!-- A shippingaddr contains a method attribute, which must be
             one of the enumerated values listed below, and contains
             text content.  -->
        <xs:complexType name="shippingaddrType" mixed="true">
            <xs:attribute name="method" use="required">
                <xs:simpleType>
                    <xs:restriction base="xs:NMTOKEN">
                        <xs:enumeration value="USPS"/>
                        <xs:enumeration value="Camel"/>
                        <xs:enumeration value="FedEx"/>
                        <xs:enumeration value="UPS"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:attribute>
        </xs:complexType>
        <!-- An order is a shippingaddr followed by one or more items, with
             idnumber and custno attributes that are positive integer values -->
        <xs:complexType name="orderType">
            <xs:sequence>
                <xs:element name="shippingaddr" type="shippingaddrType"/>
                <xs:element name="item" type="itemType"
                    maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="idnumber" type="xs:positiveInteger"
                use="required"/>
            <xs:attribute name="custno" type="xs:positiveInteger"
                use="required"/>
        </xs:complexType>
        <!-- An orders element contains a sequence of order elements -->
        <xs:complexType name="ordersType">
            <xs:sequence>
                <xs:element name="order" type="orderType"
                            maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
        <!-- Now define our root element, referencing the appropriate type
             from the set defined above -->
        <xs:element name="orders" type="ordersType"/>
    </xs:schema>

The schema representation is longer than the DTD representation, but it's also easier to read. The most important pieces are the <xs:element> and <xs:complexType> elements. Complex types in XML Schema are collections of element and attribute descriptions. These types are bound to XML elements using the type attribute on element declarations. In our example, the root element, orders, is defined at the end of the schema, and it uses the type ordersType, which is defined by the complexType just above it.[*] The type attribute on an element or attribute declaration specifies that the element content or attribute value should conform to a particular data type.

[*] This is a two-step process because we might want to use the same complex type definition for two or more different elements without having to copy and paste.

The complexType structure is the top of the food chain in terms of the data-typing facilities in XML Schema. In orders.xsd, we also constrain various elements and attributes using the xs:string, xs:integer, and xs:positiveInteger data types, which are standard data types defined in the XML Schema specification. XML Schema also allows you to define new data types based on existing data types, using the <xs:simpleType> and <xs:restriction> tags. These are used in the shippingaddrType to define the type for the method attribute, specified as an enumerated type based on the built-in NMTOKEN type. Other restrictions are available for other data types. The xs:string data type, in particular, supports restrictions for minimum and maximum length and for regular expression matching, giving you fine-grained control over data formatting.
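To make the string restrictions a little more concrete, here is a small sketch that defines a derived type with a regular expression pattern and validates documents against it using the javax.xml.validation API in the JDK. The zipType type, the zip element, and the PatternRestrictionSketch class are all hypothetical; none of them appear in orders.xsd.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

/** Hypothetical sketch: a string type restricted by a regular expression. */
final class PatternRestrictionSketch {

    // Assumption: an invented zipType that allows exactly five digits.
    static final String SCHEMA =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "  <xs:simpleType name=\"zipType\">"
        + "    <xs:restriction base=\"xs:string\">"
        + "      <xs:pattern value=\"[0-9]{5}\"/>"
        + "    </xs:restriction>"
        + "  </xs:simpleType>"
        + "  <xs:element name=\"zip\" type=\"zipType\"/>"
        + "</xs:schema>";

    /** Returns true if the document validates against the schema above. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as exceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<zip>02112</zip>"));
        System.out.println(isValid("<zip>0211A</zip>"));
    }
}
```

The same restriction element could carry xs:minLength or xs:maxLength facets in place of, or alongside, the pattern.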

As mentioned earlier, XML schemas are referenced within XML documents as a namespace that defines entities (elements and attributes) to be used in the document. Typically, this is done by specifying the namespace (and its corresponding schema) for the root element in your XML document. Adjusting our orders.xml document to refer to our XML schema would look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <orders xmlns="jent:xml-orders"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="jent:xml-orders
                                http://mycompany.com/orders.xsd">
      <order idnumber="3123" custno="121312">
        <shippingaddr method="Camel"><![CDATA[One Main St
                                          Boston, MA 02112]]></shippingaddr>
        <item idnumber="7231" quantity="13"/>
        <item idnumber="1296" quantity="2">
          <handling>Please embroider in a tasteful manner!</handling>
        </item>
      </order>
      <order idnumber="3124" custno="12">
        <shippingaddr method="FedEx"><![CDATA[285 York St.
                                        New Haven, CT 06510]]></shippingaddr>
        <item idnumber="12" quantity="8"/>
      </order>
    </orders>

Note the new attributes on the <orders> element. xmlns is a standard attribute supported by all XML elements and is used to specify a namespace to be used by the element and its children. If you want or need to use a prefix with the elements and attributes referenced from the schema, you can specify the prefix by appending it (with a colon separator) to the xmlns attribute name. We've done this in our orders.xml example where we import the XML Schema "instance" schema, a standard schema that defines some attributes that are useful when dealing with schemas in XML documents. We've imported the schema as a namespace using the xsi prefix. When we want to reference elements or attributes from this namespace, we use this prefix on their names. The schemaLocation attribute shows this in action: this attribute comes from the "instance" schema, so we refer to it as xsi:schemaLocation.
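The effect of these attributes can be observed from code with a namespace-aware DOM parse, roughly as sketched here. NamespaceSketch and describeRoot are hypothetical names for this example. The parser only reads the attributes; it makes no attempt to fetch the schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical sketch: reads namespace information from a document's root element. */
final class NamespaceSketch {

    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    /** Returns { namespace URI, local name, xsi:schemaLocation value } for the root. */
    static String[] describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new String[] {
                root.getNamespaceURI(),                   // from the xmlns attribute
                root.getLocalName(),                      // element name without any prefix
                root.getAttributeNS(XSI, "schemaLocation") // looked up by namespace, not prefix
            };
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<orders xmlns=\"jent:xml-orders\""
                + " xmlns:xsi=\"" + XSI + "\""
                + " xsi:schemaLocation=\"jent:xml-orders http://mycompany.com/orders.xsd\"/>";
        for (String part : describeRoot(xml)) {
            System.out.println(part);
        }
    }
}
```

Looking the attribute up by namespace URI means the code keeps working if a document author picks a prefix other than xsi.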

The schemaLocation attribute we use on the orders element also demonstrates one way to specify the location of schema definition files. In the xmlns attribute, we identified our orders schema using a namespace URI, jent:xml-orders. The schemaLocation attribute gives the XML parser a hint about how to resolve this URI to a physical schema file. In our case, we're telling the parser that the schema file is located at the URL http://mycompany.com/orders.xsd.
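A program can also hand the schema to the validator directly instead of relying on the schemaLocation hint. The sketch below does this with javax.xml.validation, using a cut-down schema for the jent:xml-orders namespace. The class OrdersSchemaValidation is hypothetical, and the inlined schema is an abbreviated assumption that omits shippingaddr and handling. It sets elementFormDefault="qualified" so that the nested order and item elements are expected in the target namespace, matching a document that declares that namespace as its default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: validates a namespaced orders document against a schema. */
final class OrdersSchemaValidation {

    // Assumption: an abbreviated schema, not the full orders.xsd.
    static final String SCHEMA =
          "<xs:schema xmlns=\"jent:xml-orders\""
        + "    xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
        + "    targetNamespace=\"jent:xml-orders\""
        + "    elementFormDefault=\"qualified\">"
        + "  <xs:complexType name=\"itemType\">"
        + "    <xs:attribute name=\"idnumber\" type=\"xs:positiveInteger\" use=\"required\"/>"
        + "    <xs:attribute name=\"quantity\" type=\"xs:integer\" use=\"required\"/>"
        + "  </xs:complexType>"
        + "  <xs:complexType name=\"orderType\">"
        + "    <xs:sequence>"
        + "      <xs:element name=\"item\" type=\"itemType\" maxOccurs=\"unbounded\"/>"
        + "    </xs:sequence>"
        + "    <xs:attribute name=\"idnumber\" type=\"xs:positiveInteger\" use=\"required\"/>"
        + "  </xs:complexType>"
        + "  <xs:complexType name=\"ordersType\">"
        + "    <xs:sequence>"
        + "      <xs:element name=\"order\" type=\"orderType\" maxOccurs=\"unbounded\"/>"
        + "    </xs:sequence>"
        + "  </xs:complexType>"
        + "  <xs:element name=\"orders\" type=\"ordersType\"/>"
        + "</xs:schema>";

    /** Returns the validation error messages for the document (empty if valid). */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)))
                    .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            problems.add("could not validate: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String xml = "<orders xmlns=\"jent:xml-orders\"><order idnumber=\"abc\">"
                + "<item idnumber=\"12\" quantity=\"8\"/></order></orders>";
        // "abc" is not an xs:positiveInteger, so at least one error is reported.
        for (String problem : validate(xml)) {
            System.out.println(problem);
        }
    }
}
```

Unlike the DTD, the schema rejects a non-numeric idnumber on its own, so no extra checking is needed in application code for that case.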

There's quite a lot more to XML Schema, but this should get you started. For a much more detailed introduction, we suggest XML Schema by Eric van der Vlist (O'Reilly).[*]

[*] There is also an excellent short tutorial on O'Reilly's XML.com at http://www.xml.com/pub/a/2000/11/29/schemas/part1.html.



Java Enterprise in a Nutshell
ISBN: 0596101422
Year: 2004
Pages: 269
