Editing XML Schemas in XMLSPY | The Official XMLSPY Handbook

You have had a chance to learn how to navigate through Schema Design view by reviewing a prebuilt schema. Next, you build the Purchase Order Schema of the previous section from scratch with the XMLSPY Schema Editor. The process is initially light on technical detail; you delve into the details of XML Schema syntax later in the chapter, in the section “XML Schema Syntax.”

A major use of XML Schema is for describing business objects. The business objects are typically data elements (price, quantity, date, and so on) as well as prose-oriented, document-like elements (product descriptions, customer remarks, and so on). It is common for an XML Schema to contain both data and document-like elements. In the next example, you build a sample Purchase Order Schema. This is a very common example, as it could potentially be used in conjunction with other XML technologies in various business-to-business (B2B) application scenarios.

Choose File → New, and select XML Schema (.xsd). Save the file to your local hard drive as myorder.xsd. In the file, you should see an empty Schema Overview table. Double-click the text ENTER_NAME_OF_ROOT_ELEMENT_HERE and type Order, because Order is the principal (root) element of the Purchase Order Schema. Then, double-click the Ann (annotation) column and type A purchase order schema. Your file should look like Figure 4-5.

Figure 4-5: Creating the Order element.

Any edits that you make in Schema Editing view are automatically reflected in the underlying source document. To verify this, switch to Text view periodically so that you can see the code that XMLSPY inserts as you edit in Schema Design view. Here is the code listing for your sample XML Schema so far:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
   <xsd:element name="Order">
      <xsd:annotation>
         <xsd:documentation>A purchase order schema</xsd:documentation>
      </xsd:annotation>
   </xsd:element>
</xsd:schema>
```

Using namespaces

In the partial XML Schema code listing shown in the preceding section, notice that the element names are all prefixed with the label xsd:, and the xsd:schema element declares an attribute called xmlns:xsd with the value http://www.w3.org/2001/XMLSchema. XML Schema makes extensive use of XML namespaces, a built-in mechanism for preventing naming collisions. A collision occurs when two symbols in an XML document (for example, names of elements, attributes, or entities) are defined using the same name, resulting in ambiguity or an error.

In today's collaborative world, an XML application is likely to process documents from different parties. A B2B exchange, for example, frequently processes purchase orders from many companies, each representing its data elements in a different way. If elements from different vocabularies are used within the same document, the XML processing application needs a way to differentiate between the various data definitions. Consider the differences between the two hypothetical Order elements shown in the following code:

```xml
<Order>
   <ShippingAddress>
      <Street1>234 Anystreet</Street1>
      ...
</Order>
```

The preceding code uses the Order element in a completely different way from the next code block, where Order describes orders from the company president:

```xml
<Order>
   <from>Company President</from>
   <to>IT Department</to>
   <instructions>Convert website to XML-driven application Immediately</instructions>
   ...
</Order>
```

Clearly, confusing the two Order elements would likely result in unpredictable program behavior. XML namespaces help prevent naming collisions and require only minimal work in both the XML Schemas and the XML instance document itself. If you assume that the two different Order elements are defined in two different XML Schemas, the first step is to assign a distinct name to each XML Schema. The name that you assign to an XML Schema in order to uniquely identify and differentiate it from other schemas is known as a target namespace. Assume that the boss’s orders are described in an XML Schema with a target namespace of http://www.company.com/management, whereas the Purchase Order XML Schema is defined in the namespace http://www.company.com/sales. Specifying a target namespace in XML Schema ensures that your custom XML vocabulary is not confused with any other XML vocabulary, provided that the target namespace you choose is indeed unique.
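To make the first step concrete, here is a minimal sketch of how the hypothetical sales schema might declare its target namespace with the targetNamespace attribute. The file itself and its empty Order declaration are illustrative assumptions; a management schema would look the same except for its targetNamespace value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sales schema: every global component declared here
     belongs to the namespace named in targetNamespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.company.com/sales"
            elementFormDefault="qualified">
   <!-- This Order lives in http://www.company.com/sales, so it cannot be
        confused with an Order declared in http://www.company.com/management -->
   <xsd:element name="Order"/>
</xsd:schema>
```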

Cross-Reference

In Chapter 5, I show you an example of how you can build an XML Schema that imports type definitions from an externally defined XML Schema, using XML namespaces.

The second step in using namespaces is associating the namespace with a prefix in the XML instance document. In the previous example with the two conflicting Order elements (the boss’s order and the purchase order), you could define two short prefixes: mgmt for the order coming from the company management and sales for the order coming from the sales department. Then, you could associate the prefixes with their respective target namespaces. The two different Order elements are then qualified (that is, differentiated from each other) by prepending their respective prefixes to the element name, as in mgmt:Order and sales:Order.

There is nothing special about the prefixes mgmt and sales. They are just a non-permanent, abbreviated form of the full XML Schema target namespace—they are required to be unique in that specific instance document, but do not need to be unique across space and time. Therefore, using namespaces requires a two-step process: first, creating a globally unique namespace name (a target namespace) for your own XML vocabulary, and second, associating the namespace name with a short prefix. Listing 4-1 provides an example XML instance document that uses Order elements defined in two different XML Schemas.

Listing 4-1: An Instance Document That Uses Multiple Namespaces

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mgmt:Order xmlns:mgmt="http://www.altova.com/management"
            xmlns:sales="http://www.altova.com/sales"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.altova.com/management management.xsd http://www.altova.com/sales sales.xsd" ... >
   <mgmt:to>XML programmer</mgmt:to>
   <mgmt:from>The Boss</mgmt:from>
   ...
   <sales:Order>
      <sales:ProductName>XMLSPY 5 Enterprise Ed.</sales:ProductName>
   </sales:Order>
</mgmt:Order>
```

Understanding Namespace Mappings and Prefixes

Although some parts of Listing 4-1 have been omitted for brevity, it uses two different Order elements that are presumably defined in different XML Schemas (not shown here). What you do know is that the two XML Schemas are uniquely identified by their respective target namespaces, http://www.altova.com/management and http://www.altova.com/sales. Figure 4-6 illustrates how you can associate the target namespaces of the two XML Schemas with the prefixes mgmt and sales.

Figure 4-6: Declaring namespaces in an XML document.

You associate a target namespace with a namespace prefix by using a special attribute, xmlns: (XML namespace). The characters immediately following xmlns: give the prefix name, and the attribute's value is the target namespace. You may declare namespaces anywhere within an XML document, but where you declare a namespace determines its scope: a declaration applies to the element on which it appears and to all of that element's descendants, unless a descendant redeclares the same prefix. Therefore, if you want to use an element belonging to a particular namespace, the namespace must be declared on either the current element or one of its ancestors. In the instance document example of Figure 4-6, I declared three namespaces on the mgmt:Order element: http://www.altova.com/sales (sales), http://www.altova.com/management (mgmt), and http://www.w3.org/2001/XMLSchema-instance (xsi). The XML Schema processor uses the xsi:schemaLocation attribute, from the xsi namespace, to locate the XML Schema file that corresponds to each namespace (in the form of a URI pointing to a file on the network or local file system).
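The following hypothetical fragment, a variation on Listing 4-1, sketches how scope works. Here the sales prefix is declared on the inner sales:Order element rather than on the root, so it is available only inside that element.

```xml
<mgmt:Order xmlns:mgmt="http://www.altova.com/management">
   <!-- mgmt is in scope here because it is declared on an ancestor -->
   <mgmt:to>XML programmer</mgmt:to>
   <!-- sales is declared on this element, so it is in scope only
        for this element and its descendants -->
   <sales:Order xmlns:sales="http://www.altova.com/sales">
      <sales:ProductName>XMLSPY 5 Enterprise Ed.</sales:ProductName>
   </sales:Order>
   <!-- A sales:Order placed here would be a namespace error,
        because the sales prefix is no longer in scope -->
</mgmt:Order>
```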

Two commonly used namespaces, http://www.w3.org/2001/XMLSchema and http://www.w3.org/2001/XMLSchema-instance, do not require you to specify the path to a corresponding physical XML Schema file. XML Schema–aware processors use these two namespaces so frequently that they already have their own internal copies of these well-known schemas and do not need to locate them. In this book, I use either xsd or xs as the namespace prefix for the http://www.w3.org/2001/XMLSchema namespace. It doesn't matter which prefix you choose, provided that you are consistent within a single file.
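As a quick illustration, the two minimal schemas below (shown together here, but assumed to be separate files) declare the same Note element. A processor treats them identically because both prefixes map to the same namespace name. Notice that the prefix also appears inside attribute values such as type="xs:string", so it must match the declaration in scope there too.

```xml
<!-- File 1: uses the xs prefix -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="Note" type="xs:string"/>
</xs:schema>

<!-- File 2: uses the xsd prefix; equivalent to file 1 because
     both prefixes are bound to the same namespace name -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:element name="Note" type="xsd:string"/>
</xsd:schema>
```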

Note

Although namespaces are relatively new to XML, they are certainly not a new technological concept; most modern programming languages use them extensively, including Java (packages) and C++ (namespaces).

Declaring a Target Namespace

Namespaces are guaranteed to prevent naming collisions only if the target namespace that you use to name your XML Schema is globally unique across all XML Schemas. If two different schemas shared the same target namespace, an XML Schema processor that encountered both could not tell their components apart, and conflicting definitions could cause validation to fail. Because the whole point of a namespace name is to differentiate your XML vocabulary from other potentially similar XML vocabularies, you should pick a name that is unique, descriptive, and permanent. Figure 4-7 shows a possible namespace name for a schema example from this book.

Figure 4-7: Declaring a target XML namespace name that is unique, descriptive, and permanent.

Because Internet domain names are, by definition, unique, I recommend beginning your namespace with a domain name that you own or that is registered to your company. Do not use a domain name that you do not own or that is not affiliated with your company; this helps ensure that your chosen namespace is indeed unique. Next, include information such as your department name because, within a large organization, people in different departments could be building different XML Schemas. Append some information about the project for which you are using the XML Schema, followed by some kind of internal version number, and separate all these parts with forward slashes. Note that although a namespace's structure resembles a URL, it does not necessarily correspond to any physical resource on the Web. Some XML Schema authors do place a file at that location, but only because people confuse a namespace name with a Web resource and expect to find something there. Rather, your namespace is simply a string that is unlikely to be used by someone else and that provides some descriptive information about the schema author and the purpose of the XML Schema. Although the W3C does not provide any official guidance on how to name a namespace, I highly recommend this commonsense approach to avoid naming collisions.

To declare a target namespace for the Purchase Order Schema, choose Schema Design → Schema Settings, click the Target Namespace button, and type http://www.company.com/examples/purchaseorder. If you leave the prefix column empty, you are making your target namespace the default namespace. As a result, any unqualified (that is, unprefixed) elements and attributes that you define in the current XML Schema will belong to the http://www.company.com/examples/purchaseorder namespace. In this case, XMLSPY declares the target namespace with xmlns="http://www.company.com/examples/purchaseorder"; because no prefix name follows xmlns, the declaration signifies the default namespace. By contrast, if you explicitly enter a prefix, such as myco, XMLSPY generates xmlns:myco="http://www.company.com/examples/purchaseorder" as the target namespace assignment (substitute a domain name that you own or are affiliated with for company.com). If you have explicitly specified a namespace prefix (that is, if the target namespace is not assigned to the default namespace), the Schema Settings panel offers several other options, including the default element and attribute form. If you choose qualified, XMLSPY properly prefixes all user-defined elements and attributes with the specified namespace prefix.
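The two choices described above lead to schema headers along the following lines. This is a sketch of standard XML Schema markup rather than a verbatim copy of what XMLSPY writes, and the schema body is elided with .... The comments point out one practical consequence: how references to your own components, such as the AddressType built later in this chapter, must be written.

```xml
<!-- Option 1: empty prefix, so the target namespace is also the default
     namespace; unprefixed references such as type="AddressType" resolve to it -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.company.com/examples/purchaseorder"
            targetNamespace="http://www.company.com/examples/purchaseorder"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
   ...
</xsd:schema>

<!-- Option 2: the target namespace is bound to the myco prefix; there is no
     default namespace, so references must be prefixed: type="myco:AddressType" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:myco="http://www.company.com/examples/purchaseorder"
            targetNamespace="http://www.company.com/examples/purchaseorder"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
   ...
</xsd:schema>
```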

Building up the Purchase Order XML Schema

Now that you know something about namespaces, I want you to return to building the Purchase Order Schema. Expand the Order element in Schema Design view so that it can be visually edited. The example purchase order should be a sequence of elements such as an address (billing and/or shipping), product type, and a note to convey any special instructions. Each element is graphically depicted as a node (a rectangular box) in a diagram. Because the Order element contains child elements, it is said to be a complex type.

Note

Elements are classified in two categories: simple types and complex types. Complex types refer to elements that have either attributes or child elements or both. An element that contains only body content is considered a simple type. All attribute values are simple types because attributes cannot contain nested attributes or elements.
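To see the distinction in schema source, consider this sketch. The Comment element and its lang attribute are hypothetical and not part of the Purchase Order Schema. They show that an attribute alone is enough to make an element a complex type, while the attribute itself is always declared with a simple type.

```xml
<!-- Simple type: text content only, no attributes and no child elements -->
<xsd:element name="City" type="xsd:string"/>

<!-- Complex type: this element has no children, but its attribute
     rules out a simple type -->
<xsd:element name="Comment">
   <xsd:complexType>
      <xsd:simpleContent>
         <xsd:extension base="xsd:string">
            <!-- The attribute's own type (xsd:language) is a simple type -->
            <xsd:attribute name="lang" type="xsd:language"/>
         </xsd:extension>
      </xsd:simpleContent>
   </xsd:complexType>
</xsd:element>
```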

To add child elements to the Order element, first select the Order element so that it is highlighted. Next, right-click anywhere on the screen and a pop-up menu appears. Choose Add Child → Sequence. A compositor appears immediately to the right of the Order element. A sequence compositor enables you to add other elements. Be mindful of the order, however, because by default, the order in which you define the elements in your XML Schema is the same order in which they must appear within an XML instance document in order to be valid. First, build an address component that can be used for both billing and shipping addresses. Right-click the sequence compositor and choose Add Child → Element. A new (empty) element node appears. Double-click the New Element node and type Address as the name of the element.

The new Address element is a complex element because it contains other child elements, such as the street name and apartment number (abbreviated as Street1 and Street2), as well as City, State, and Zip. Add a sequence compositor to the Address element, select the compositor, and then add five child elements to it. Name the elements Street1, Street2, City, State, and Zip. The partial diagram is shown in Figure 4-8. Make sure that yours looks similar, but don't worry if your diagram doesn't display the type information yet. You add that next.

Figure 4-8: Building the Purchase Order Schema in Schema Design view.
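If you switch to Text view at this point, the Order declaration should look roughly like the following fragment. It is a sketch of standard XML Schema markup rather than XMLSPY's exact output. Because no data types have been assigned yet, each leaf element defaults to anyType.

```xml
<xsd:element name="Order">
   <xsd:annotation>
      <xsd:documentation>A purchase order schema</xsd:documentation>
   </xsd:annotation>
   <xsd:complexType>
      <!-- Sequence compositor: children must appear in exactly this order -->
      <xsd:sequence>
         <xsd:element name="Address">
            <xsd:complexType>
               <xsd:sequence>
                  <!-- No type attribute yet, so each defaults to xsd:anyType -->
                  <xsd:element name="Street1"/>
                  <xsd:element name="Street2"/>
                  <xsd:element name="City"/>
                  <xsd:element name="State"/>
                  <xsd:element name="Zip"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```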

Building Content Models with the XML Schema Editor

When you are building XML Schemas, the three most commonly used menu options are Add Child, Insert, and Append. These create new elements or compositors (sequence, choice, all, and so on). All three operations are performed relative to the currently selected node in the XML Schema Editing view, which is highlighted (typically in blue). The operations are accessible through a right-click menu.

The main difference between the three menu options is that the Add Child option creates the desired element or compositor at a lower depth than the currently selected node (in other words, farther to the right of the schema diagram). Insert and Append create the desired element or compositor at the same depth as the currently selected node, making the element or compositor a sibling of the currently selected node. In the case of Insert, the new node is placed directly above the currently selected node. In the case of Append, the new element or compositor is placed as the bottom-most node at the current level. Not every option is valid in every context. For example, if you select the root element of a particular schema component as the current node, you cannot Insert or Append sibling elements or compositors because there can be only one root element per schema component. To make this clear, invalid menu selections are grayed out.

Take a moment to experiment with the XML Schema Editor to see when you can use Add Child, Insert, or Append to create elements or compositors. If you make a mistake, you can always select the element or compositor and press the Delete key or drag it to the correct location, or you can press Ctrl+Z (Undo).

So far, both the Order and Address elements have been complex types because they contain child elements. By contrast, the elements Street1, Street2, City, State, and Zip are simple types because they contain no attributes and no child elements, only text content. Elements that are simple types should be assigned a data type that restricts the value space (that is, places constraints on what kind of textual information is considered valid). If a type declaration is omitted, the default of anyType (an unconstrained value) is assumed. To assign a data type to an element, select the node (for example, the Street1 element) on the Schema Editing page and choose one of the many built-in XML Schema simple types. For Street1, choose the xs:string type in the Details Entry Helper window (see Figure 4-9).

Figure 4-9: Assigning one of the XML Schema built-in data types to the Street1 simple type.

Note

Figure 4-9 shows a partial listing of some of the many built-in XML Schema data types. In this screen shot, the XML Schema language constructs have all been prefixed with the xs prefix, which is the prefix associated with the http://www.w3.org/2001/XMLSchema namespace in this particular document. The xs: prefix has no meaning outside the context of this particular XML Schema, and xsd works just as well.

Assign a string data type to both Street2 and City. Addresses often require two lines, with the second line used for apartment numbers and so on. You can make the second line optional by selecting the Street2 element, right-clicking, and checking Optional. Alternatively, you can set minOcc (minimum occurrence, written as the minOccurs attribute in the schema source) to 0 in the Details Entry Helper window. Notice that new elements default to a minOcc of 1, which means that the element is required. The content of the State element is also set to a string, but you can further restrict the value space of the string by specifying an enumeration of string tokens containing valid two-character U.S. state abbreviations:

Select the State element on the Schema Editing page.
From the Facets window, choose the Enumerations tab.
Enter valid enumeration tokens by clicking the Add New Enumeration button and typing a token value, such as CA, MA, NY, TX, or FL.

The Facets → Enumerations window contains three buttons and is shown in Figure 4-10. From left to right, the buttons are for appending, inserting, and deleting tokens from the enumeration.

Figure 4-10: Inserting, appending, and deleting possible enumeration values for the State element.
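After you assign the string types, make Street2 optional, and enter the State enumeration, the Address content model should correspond roughly to the following sketch. Only the five state codes named above are listed; a real schema would enumerate every valid abbreviation.

```xml
<!-- Inside the Address element's xsd:complexType -->
<xsd:sequence>
   <xsd:element name="Street1" type="xsd:string"/>
   <!-- minOccurs="0" makes Street2 optional; the default of 1 makes it required -->
   <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
   <xsd:element name="City" type="xsd:string"/>
   <xsd:element name="State">
      <xsd:simpleType>
         <!-- An anonymous type derived from xsd:string by restriction -->
         <xsd:restriction base="xsd:string">
            <xsd:enumeration value="CA"/>
            <xsd:enumeration value="MA"/>
            <xsd:enumeration value="NY"/>
            <xsd:enumeration value="TX"/>
            <xsd:enumeration value="FL"/>
         </xsd:restriction>
      </xsd:simpleType>
   </xsd:element>
   <!-- Zip has no data type yet -->
   <xsd:element name="Zip"/>
</xsd:sequence>
```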

Finally, the Zip (U.S. postal code) element is set to be a string. However, you must further restrict the range of valid string values to exactly five characters, each a digit from 0 to 9, with all leading zeros preserved. This requires a regular expression, which is a string pattern-matching language discussed in Appendix C. Regular expressions are used for data validation, manipulation, conversion, and text extraction. The regular-expression syntax of XML Schema is similar to, though not identical to, that of the Perl programming language; for example, an XML Schema pattern must always match the entire value, so the ^ and $ anchors familiar from Perl are not needed. Select the Zip element and, in the Facets → Patterns Entry Helper window, insert a new pattern with the value [0-9]{5}. The range of digits from zero to nine is specified in square brackets, and the curly braces indicate that the pattern is repeated exactly five times. To help you remember this syntax, XMLSPY includes a built-in regular-expression builder in the Facets → Patterns Entry Helper window. Access it by adding a pattern facet and then clicking the drop-down arrow; some of the most common regular-expression patterns are displayed.
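In schema source, the pattern facet appears inside a restriction of xsd:string, as in this sketch of standard XML Schema markup. Keeping the base type a string (rather than a numeric type) is what lets a value such as 02115 keep its leading zero.

```xml
<xsd:element name="Zip">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <!-- Exactly five digits; the pattern must match the whole value,
              so "02115" is valid while "2115" and "021150" are not -->
         <xsd:pattern value="[0-9]{5}"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>
```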

You have just completed one address structure, currently nested within the Order element as an anonymous complex type. However, you need two identical address structures to represent the billing and shipping addresses. Rather than duplicate the effort of creating a similar address structure, you can convert it into a global schema construct so that it can be reused. Select the Address element, right-click, and choose Make Global → Complex Type from the menu. XMLSPY automatically converts the anonymous sequence of elements into a global complex type called AddressType. This is graphically depicted by an orange box around the sequence of address elements. Switch back to the Schema Overview page, and you can see that there are now two schema components listed: AddressType and Order. Change back to the Schema Editing page and rename the element from Address to ShippingAddress, as shown in Figure 4-11.

Figure 4-11: Modularizing the Purchase Order Schema.
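After Make Global → Complex Type, the address structure moves to the top level of the schema under a name, roughly as sketched below. This assumes, as in the Schema Settings step, that the target namespace is also the default namespace, so the unprefixed AddressType resolves to it. The State and Zip restrictions are elided in comments.

```xml
<!-- Global complex type: a direct child of xsd:schema, so it can be reused -->
<xsd:complexType name="AddressType">
   <xsd:sequence>
      <xsd:element name="Street1" type="xsd:string"/>
      <xsd:element name="Street2" type="xsd:string" minOccurs="0"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State">
         <!-- enumeration restriction as defined earlier (omitted here) -->
      </xsd:element>
      <xsd:element name="Zip">
         <!-- pattern restriction as defined earlier (omitted here) -->
      </xsd:element>
   </xsd:sequence>
</xsd:complexType>

<!-- Inside the Order sequence, the renamed element now names the global type -->
<xsd:element name="ShippingAddress" type="AddressType"/>
```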

Now that you have defined AddressType as a global complex type, you can add a BillingAddress element to the Order element by selecting the compositor to the right of the Order element, right-clicking (anywhere on the main editing screen), and choosing Add Child → Element. Type BillingAddress as the name of the element, and specify the type as AddressType from the Details Entry Helper window. The result should resemble the diagram shown in Figure 4-12.

Figure 4-12: Declaring BillingAddress and ShippingAddress to be elements of type AddressType.
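The Order declaration now reuses one type definition for two elements, roughly as in this sketch (the annotation is omitted for brevity, and the unprefixed AddressType again assumes that the target namespace is the default namespace):

```xml
<xsd:element name="Order">
   <xsd:complexType>
      <xsd:sequence>
         <!-- Both elements share the single global AddressType definition,
              so a change to AddressType affects both addresses -->
         <xsd:element name="ShippingAddress" type="AddressType"/>
         <xsd:element name="BillingAddress" type="AddressType"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```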

Next, create the Line-Items element, which is a sequence of one or more Product elements. Follow these steps to create the Line-Items element:

Select the compositor adjacent to the Order element, right-click, and choose Add Child → Element. Name the new element Line-Items.
Add a sequence compositor underneath Line-Items by selecting the Line-Items node, right-clicking, and choosing Add Child → Sequence.
Add an element to the newly added sequence compositor and choose Add Child → Element. Name the new element Product.
Click the Product element and add another sequence compositor.
Click the newly added sequence compositor (underneath Product). Then add the following elements and set their respective data type values: Description (xsd:string), Price (xsd:string), Quantity (xsd:positiveInteger), Ship-Date (xsd:date), Note (xsd:string).
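The steps above produce a nested, anonymous structure roughly like the following sketch, which is appended to the Order sequence after BillingAddress. It shows standard XML Schema markup, not XMLSPY's verbatim output.

```xml
<xsd:element name="Line-Items">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name="Product">
            <!-- Anonymous complex type: defined inline, not reusable elsewhere -->
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element name="Description" type="xsd:string"/>
                  <xsd:element name="Price" type="xsd:string"/>
                  <xsd:element name="Quantity" type="xsd:positiveInteger"/>
                  <xsd:element name="Ship-Date" type="xsd:date"/>
                  <xsd:element name="Note" type="xsd:string"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```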

You’ve completed the basic steps for creating a Line-Items element (an anonymous local element defined within the Order element). Follow these steps to place a few additional restrictions on the value spaces of the newly created elements to help ensure correct program behavior:

Restrict the Price element by specifying the following regular expression:
```
[0-9]{1,}\.[0-9]{2}
```
This represents a pattern of one or more digits, followed by a period, followed by exactly two digits. For example, 0.34, 4790.90, and 14.50 would all satisfy this regular expression.
Make Ship-Date and Note into optional elements by right-clicking and choosing Optional.
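In schema source, these two restrictions correspond roughly to the following declarations inside the Product sequence (a sketch; Description and Quantity are unchanged and not shown):

```xml
<xsd:element name="Price">
   <xsd:simpleType>
      <xsd:restriction base="xsd:string">
         <!-- One or more digits, a literal period, then exactly two digits -->
         <xsd:pattern value="[0-9]{1,}\.[0-9]{2}"/>
      </xsd:restriction>
   </xsd:simpleType>
</xsd:element>
<!-- Checking Optional sets minOccurs="0" -->
<xsd:element name="Ship-Date" type="xsd:date" minOccurs="0"/>
<xsd:element name="Note" type="xsd:string" minOccurs="0"/>
```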

As with a DTD, an XML Schema has an easy way for you to specify the cardinality of a parent-child relationship. In the case of the Line-Items element, follow these steps to enforce the rule that the element must contain one or more products:

Expand the Order component on the Schema Editing page.
Click the compositor that joins Line-Items with Product, right-click, and select Unbounded from the pop-up menu.
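The cardinality is expressed with the maxOccurs attribute. The sketch below places it on the Product element; depending on which node you select, XMLSPY may place it on the enclosing sequence instead, which has the same effect here because the sequence contains only Product.

```xml
<xsd:element name="Line-Items">
   <xsd:complexType>
      <xsd:sequence>
         <!-- minOccurs defaults to 1 and maxOccurs="unbounded" removes the upper
              limit, so Line-Items must contain one or more Product elements -->
         <xsd:element name="Product" maxOccurs="unbounded">
            <!-- Product content model as defined earlier (omitted here) -->
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```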

You can add attributes to the Product element by following these steps:

Add an id attribute to the Product element by selecting Product on the Schema Editing page and clicking the Add New Attribute button located in the bottom frame (beneath the editing region). A new row appears.
Type id as the attribute name, specify the type as xsd:integer, and specify the use as required.
Add a department attribute, specify the type as xsd:string, and specify the use as optional.
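The two attributes are declared after the content model, inside Product's complex type, roughly as follows (a sketch with the child elements elided):

```xml
<xsd:element name="Product" maxOccurs="unbounded">
   <xsd:complexType>
      <xsd:sequence>
         <!-- Description, Price, Quantity, Ship-Date, Note as before (omitted) -->
      </xsd:sequence>
      <!-- Attribute declarations follow the sequence inside xsd:complexType -->
      <xsd:attribute name="id" type="xsd:integer" use="required"/>
      <!-- use="optional" is the default, so it could also be left out -->
      <xsd:attribute name="department" type="xsd:string" use="optional"/>
   </xsd:complexType>
</xsd:element>
```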

Finally, you can modularize the XML Schema so that Product is defined by a global complex type and Note is a global element. Global types and global elements can subsequently be reused as building blocks for developing additional schema components. Although these two components are declared only once within the Order schema component, it is still a good idea to modularize your XML Schema to facilitate future development efforts. Follow these steps to perform the required modularization:

Right-click the anonymous Product element on the Schema Editing page and choose Make Global → Complex Type. This automatically creates a global complex type called ProductType. Now, instead of having a local anonymous definition for Product inside the Order element, declare the Product element to be of type ProductType: select the Product element and, in the Details window, specify ProductType as its type.
Right-click the Note element on the Schema Editing page and choose Make Global → Element.
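The result can be sketched as follows. The exact markup XMLSPY writes may differ; in particular, the sketch assumes that the local Note declaration becomes a reference (ref) to the new global element and that minOccurs="0" moves onto that reference, because a global element declaration cannot carry minOccurs. As before, unprefixed names assume that the target namespace is the default namespace.

```xml
<!-- Global complex type produced by Make Global > Complex Type -->
<xsd:complexType name="ProductType">
   <xsd:sequence>
      <xsd:element name="Description" type="xsd:string"/>
      <xsd:element name="Price">
         <!-- pattern restriction as defined earlier (omitted here) -->
      </xsd:element>
      <xsd:element name="Quantity" type="xsd:positiveInteger"/>
      <xsd:element name="Ship-Date" type="xsd:date" minOccurs="0"/>
      <!-- Reference to the global Note element; minOccurs keeps it optional -->
      <xsd:element ref="Note" minOccurs="0"/>
   </xsd:sequence>
   <xsd:attribute name="id" type="xsd:integer" use="required"/>
   <xsd:attribute name="department" type="xsd:string" use="optional"/>
</xsd:complexType>

<!-- Global element produced by Make Global > Element -->
<xsd:element name="Note" type="xsd:string"/>

<!-- Inside Line-Items, Product now simply names its type -->
<xsd:element name="Product" type="ProductType" maxOccurs="unbounded"/>
```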

Congratulations! You just built your first XML Schema with XMLSPY. You should now have something that looks like Figure 4-13. Make sure that you have four schema components on the Schema Overview page: Order, AddressType, ProductType, and Note.

Figure 4-13: The completed Purchase Order Schema.

In this example, you took a top-down approach to building the XML Schema. In other words, you created the Order element (the document element) and then created all other XML Schema components used within the Order element. You could just as easily have taken a bottom-up approach by independently creating the smaller schema components (AddressType, ProductType, and Note), creating the Order element, and then simply referencing or declaring the smaller schema components from within the Order element’s definition. Both top-down and bottom-up XML Schema design concepts have their roots in proven object-oriented design techniques. Both are widely practiced and accepted as good programming practice throughout the software industry. In practice, I find that XMLSPY’s capability to convert any section of a complicated XML Schema component into a separate global schema component (converting an anonymous element to either a global element or a global complex type) makes the XML Schema Editor ideal for a top-down development style. Ultimately, however, XML Schema design strategy is left up to the developer.

Configuring XML Schema Design view

When you expand an XML Schema component on the Schema Editing page, XMLSPY graphically displays a component’s name in the element’s box representation (also known as a node). The view can be configured to display as few or as many facets (details) about each schema component as you want. For relatively small schemas like the Purchase Order Schema, I recommend configuring XML Schema Design view to automatically display the data type of each component as well as other additional information. You do this by choosing Schema Design → View Config. In the configuration panel, click the Add New Entry button (near the top-left of the panel), and a new row appears. Click the drop-down arrow box and select Type. Add additional details to display as required and then click OK. A schema component diagram that shows an element’s name, type, derivedBy, and pattern is shown in Figure 4-14.

Figure 4-14: A schema diagram that shows additional component information such as type, derivedBy, and pattern.

The View Config panel also enables you to load or save different user-defined schema viewing configurations, as well as specify the distances and widths between components, the drawing direction (schemas can be viewed top to bottom instead of left to right), and several other Schema Design view settings. You can also zoom in or out of a schema diagram by choosing Schema Design → Zoom. Finally, you can generate graphic files of individual schema components, as I have done in this chapter, by viewing a selected schema component in Schema Design view and choosing Schema Design → Save Diagram. The output is a PNG (Portable Network Graphics) file, which you can use to help document and publish your XML Schema to others, including developers on your team or business partners wanting to integrate with your systems. The generated schema component diagrams contain a small Generated by XMLSPY message on the bottom-right side. If you would like a cleaner graphic, you can remove this message by choosing Tools → Options → File → Save File and unchecking Include XMLSPY Logo in Schema Diagram. If you embed XMLSPY-generated schema diagrams into your program's documentation, I guarantee that people reading the documentation will be impressed by the quality of the graphics.

Editing and validating XML documents using the XML Schema

XMLSPY can edit and validate XML documents using an XML Schema, just as you edited and validated XML documents in the previous chapter in conjunction with a DTD. All the editing support such as code completion, Entry Helper windows, error messages, and so on (which were discussed in the last chapter) also apply to XML Schemas.

Try creating a new XML document, assigning the Purchase Order XML Schema, and editing and validating the instance document. Choose File → New → XML Document and specify that the new XML document is to be based on an XML Schema. XMLSPY inspects the XML Schema file and attempts to determine which element should be the root (document) element of the new XML document. If your XML Schema contains multiple global elements and it is ambiguous which one is intended to be the document element, XMLSPY prompts you to pick a root element from a list of possible candidates. After this has been determined, XMLSPY automatically inserts all the mandatory child elements into the file. XMLSPY also inserts the schemaLocation attribute and declares the namespace that corresponds to the specified XML Schema. A sample valid XML instance document is shown in the following code:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Order xmlns="http://www.company.com/examples/purchaseorder"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.company.com/examples/purchaseorder C:\Program Files\Altova\XMLSPY\examples\xmlspyhandbook\ch4\order.xsd">
   <ShippingAddress>
      <Street1>123 First St.</Street1>
      <City>Boston</City>
      <State>MA</State>
      <Zip>02115</Zip>
   </ShippingAddress>
   <BillingAddress>
      <Street1>22 Green St</Street1>
      <City>Cambridge</City>
      <State>MA</State>
      <Zip>02139</Zip>
   </BillingAddress>
   <Line-Items>
      <Product>
         <Description>Kitchen Utensils</Description>
         <Price>4.99</Price>
         <Quantity>2</Quantity>
      </Product>
   </Line-Items>
   <Note>Please use expedited shipping</Note>
</Order>
```