What Is XML?

Way back in the mists of time (around 20 years ago), a language called the Standard Generalized Markup Language (SGML) was developed. SGML is a metalanguage, a language for defining markup languages, and although you might not be familiar with SGML itself, you have certainly heard of one of the languages based on it: Hypertext Markup Language (HTML).

HTML is the language of the Web. HTML uses markup tags to encode information so that the information can be parsed (usually by a browser) and displayed. A typical HTML document might contain code similar to this:

 <P align="center">Welcome to the <STRONG>Northwind</STRONG> Web site</P>

This HTML code contains two markup tags, <P> and <STRONG>. The tags surround the data to which they apply, and a closing tag such as </STRONG> or </P> indicates the end of the data the tag applies to. An opening tag, its closing tag, and everything between them are together referred to as an element. The <P> tag contains an align attribute to provide additional formatting information for the data in the P element. Because the tags in HTML are used to format data for presentation, an HTML parser needs to understand what kind of formatting the tags represent. An HTML parser would interpret the preceding example as the following instructions:

  1. Start a new paragraph, and align the text to the center of the page (<P align="center">).
  2. Display the text Welcome to the.
  3. Display the text Northwind emphatically (<STRONG>Northwind</STRONG>).
  4. Display the text Web site.
  5. End the paragraph (</P>).

To process HTML code like the preceding example, an HTML parser must not only be able to find the HTML tags in a document, it must also know what those tags mean. When a browser renders an HTML document for display, it needs to be able to apply the various formats indicated by the markup tags.

Like HTML, Extensible Markup Language (XML) is also based on SGML, and it too uses markup tags to encode data. The main difference between HTML and XML is that while HTML tags provide formatting instructions that should be applied to the data, XML tags describe the structure of the data itself. For example, an XML order document might look like this:

 <Order OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
     <Item>
         <ProductID>1</ProductID>
         <Quantity>2</Quantity>
     </Item>
     <Item>
         <ProductID>4</ProductID>
         <Quantity>1</Quantity>
     </Item>
 </Order>

The XML order document contains no formatting information; it contains only data. This means that an XML parser doesn’t need to understand the meaning of the tags in an XML document. It needs only to be able to find the tags and verify that the document is in fact an XML document. Because parsers aren’t required to understand the tags, any tags can be used. This is why the X in XML stands for extensible. (I guess EML wasn’t catchy enough!)
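To see what "finding the tags without understanding them" can look like in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The choice of Python and the helper name list_tags are assumptions made for illustration; they are not part of the original text.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<Order OrderNo="1234">
    <OrderDate>2001-01-01</OrderDate>
    <Customer>Graeme Malcolm</Customer>
    <Item>
        <ProductID>1</ProductID>
        <Quantity>2</Quantity>
    </Item>
    <Item>
        <ProductID>4</ProductID>
        <Quantity>1</Quantity>
    </Item>
</Order>"""


def list_tags(xml_text):
    """Return every element name in document order.

    The function has no built-in knowledge of Order, Item, or any other tag;
    it simply reports whatever names the parser finds.
    """
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


print(list_tags(ORDER_XML))
```

The same function works unchanged on a document that uses a completely different vocabulary, which is the sense in which the markup is extensible.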

Attribute values in XML can be enclosed in either double or single quotation marks. For example, you can represent an Order element with an OrderNo attribute of 1234 using either <Order OrderNo="1234"> or <Order OrderNo='1234'>. To an XML parser, these tags are the same.
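A quick way to check that the two quoting styles are equivalent is to parse both and compare the results. The sketch below assumes Python's standard xml.etree.ElementTree module and uses self-closing Order elements so that each string is a complete document.

```python
import xml.etree.ElementTree as ET

# The same element written with double and with single quotation marks.
double_quoted = ET.fromstring('<Order OrderNo="1234"/>')
single_quoted = ET.fromstring("<Order OrderNo='1234'/>")

# After parsing, the quoting style is gone; only the name and value remain.
print(double_quoted.attrib == single_quoted.attrib)
```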

XML Tags Up Close

Let’s examine some XML tags and see how an XML document is composed. Each tag indicates the beginning of a new element in the document. Elements can have attributes and can contain values and other elements. For example, the Order element in the XML order example in the previous section has an OrderNo attribute with a value of 1234 and contains the following data:

  • An OrderDate child element containing the value 2001-01-01
  • A Customer child element containing the value Graeme Malcolm
  • Two Item child elements, each containing a ProductID child element and a Quantity child element
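The structure listed above maps directly onto the tree that a parser builds. As a rough illustration, assuming Python's standard xml.etree.ElementTree module (the variable names are invented for this sketch), the attribute and child elements could be read like this:

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<Order OrderNo="1234">
    <OrderDate>2001-01-01</OrderDate>
    <Customer>Graeme Malcolm</Customer>
    <Item>
        <ProductID>1</ProductID>
        <Quantity>2</Quantity>
    </Item>
    <Item>
        <ProductID>4</ProductID>
        <Quantity>1</Quantity>
    </Item>
</Order>"""

order = ET.fromstring(ORDER_XML)

# Attributes are read from the element itself.
order_no = order.get("OrderNo")

# Child elements are looked up by name; findtext returns their text content.
order_date = order.findtext("OrderDate")
customer = order.findtext("Customer")

# findall returns every direct child with the given name, in document order.
items = [
    (item.findtext("ProductID"), item.findtext("Quantity"))
    for item in order.findall("Item")
]

print(order_no, order_date, customer, items)
```

Note that every value comes back as a string; XML itself attaches no data types to element content.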

Of course you might need to include an element in your document that doesn’t contain anything—for example, if you need to indicate an optional piece of data that hasn’t been provided on this occasion. The most obvious way to represent an empty element is simply to place the closing tag immediately after the opening tag, as shown in the MiddleInitial element in the following XML customer data:

 <Customer>
     <FirstName>Graeme</FirstName>
     <MiddleInitial></MiddleInitial>
     <LastName>Malcolm</LastName>
 </Customer>

This approach to representing an empty element is acceptable, but it results in the unnecessary repetition of the element name. To cut down on the amount of text in your XML document, you can use the shorthand syntax for an empty element: a tag with a slash at the end, as shown in this revised customer data:

 <Customer>
     <FirstName>Graeme</FirstName>
     <MiddleInitial/>
     <LastName>Malcolm</LastName>
 </Customer>

Of course, empty elements can still have attributes, as shown in the second PhoneNumber element in the following example:

 <Customer>
     <FirstName>Graeme</FirstName>
     <MiddleInitial/>
     <LastName>Malcolm</LastName>
     <PhoneNumber Location="Home">555 112233</PhoneNumber>
     <PhoneNumber Location="Work"/>
 </Customer>
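The two ways of writing an empty element should be indistinguishable once parsed. The following sketch, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named is_empty, checks this and also shows that an empty element keeps its attributes.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<Customer><MiddleInitial></MiddleInitial></Customer>")
short_form = ET.fromstring("<Customer><MiddleInitial/></Customer>")
work_phone = ET.fromstring('<PhoneNumber Location="Work"/>')


def is_empty(element):
    """True when the element has no text content and no child elements."""
    has_text = bool((element.text or "").strip())
    return not has_text and len(element) == 0


print(is_empty(long_form.find("MiddleInitial")))   # long form
print(is_empty(short_form.find("MiddleInitial")))  # shorthand form
print(is_empty(work_phone), work_phone.get("Location"))
```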

Representing Data in an XML Document

XML documents must be well formed, and to be useful in a business solution they usually need to be valid as well. Although these two terms sound similar, they refer to two different aspects of verifying an XML document. The term well formed indicates that an XML document is suitable for interpretation by an XML parser. In other words, an XML document must follow all the rules for XML documents so that a parser can read it and identify all the elements, attributes, and values it contains. Note that a well-formed document isn’t necessarily a useful document in a business process; being well formed merely ensures that the structure of the document is correct.

To be suitable for processing in a business solution, the XML document must not only be well formed, but also valid. A document is valid if it contains all the data required in a document of that type or class. For example, an order document might be required to contain an Order element with an OrderNo attribute and an OrderDate child element. A validating parser checks an XML document by comparing its contents to a specification defined for that class of document, which could be a Document Type Definition (DTD) or a schema. We’ll discuss document validation later in this appendix; first let’s look at what it takes to make a well-formed document.
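The difference between the two checks can be sketched in code. Python's standard xml.etree.ElementTree module checks well-formedness but does not validate against a DTD or schema, so the function below is only a hand-written stand-in for the example rule mentioned above (an Order element with an OrderNo attribute and an OrderDate child). The name check_order is hypothetical, and real validation would rely on a DTD or schema and a validating parser.

```python
import xml.etree.ElementTree as ET


def check_order(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Step 1: well-formedness. The parser rejects anything that breaks XML syntax.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well formed: {error}"]

    # Step 2: a hand-coded approximation of validity for the order example.
    problems = []
    if root.tag != "Order":
        problems.append("root element is not Order")
    if root.get("OrderNo") is None:
        problems.append("missing OrderNo attribute")
    if root.find("OrderDate") is None:
        problems.append("missing OrderDate child element")
    return problems


print(check_order('<Order OrderNo="1234"><OrderDate>2001-01-01</OrderDate></Order>'))
print(check_order("<Order><Customer>Graeme Malcolm</Customer></Order>"))
```

The second document is well formed, yet it fails the business rule, which is the distinction the text draws between the two terms.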

Creating Well-Formed XML

To be well formed, an XML document must obey the following rules:

  • There must be a single root element that contains all the other elements in the document.
  • Each opening tag must be matched with a corresponding closing tag.
  • XML tags are case sensitive, so each closing tag must exactly match the case of its corresponding opening tag.
  • Each element inside the root element must be wholly nested within its parent element.

The first rule states that to be well formed, an XML document must have a single root element. The following example is not a well-formed document because there is no single top-level element:

 <Product>Chai</Product>
 <Product>Chang</Product>

An XML document with no root element is known as an XML fragment. To make this fragment into a well-formed XML document, we need to add a root element, as shown here:

 <Catalog>
     <Product>Chai</Product>
     <Product>Chang</Product>
 </Catalog>

The second rule states that each opening tag must have a corresponding closing tag. In other words, each tag that is opened must be closed. Empty elements indicated with the shorthand tag (<MiddleInitial/>, for example) are considered self-closing. All other tags must have a closing tag. For example, the following document is not well formed because it contains an <Item> tag with no matching </Item> tag:

 <Order>
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
     <Item>
         <ProductID>1</ProductID>
         <Quantity>2</Quantity>
     <Item>
         <ProductID>4</ProductID>
         <Quantity>1</Quantity>
     </Item>
 </Order>

To make this well formed, a closing tag must be added for the first Item element, as shown here:

 <Order>
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
     <Item>
         <ProductID>1</ProductID>
         <Quantity>2</Quantity>
     </Item>
     <Item>
         <ProductID>4</ProductID>
         <Quantity>1</Quantity>
     </Item>
 </Order>

The third rule is that XML tags are case sensitive, so a tag you use to close an element must match the case of the corresponding opening tag. Thus, in an XML document, <Product> isn’t the same as <product>, so <Product> can’t be closed using </product>. The example below is not well formed because a different case is used for the opening and closing tags of the OrderDate element.

 <Order>
     <OrderDate>2001-01-01</Orderdate>
     <Customer>Graeme Malcolm</Customer>
 </Order>

To be well formed, the case used for all tags must match, as shown in this example:

 <Order>
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
 </Order>

The final rule states that each element must be contained within its parent. In other words, elements can’t overlap. For example, the following document is not well formed because a Category element overlaps a Product element:

 <Catalog>
     <Category CategoryName="Beverages">
         <Product>
             Chai
         </Category>
     </Product>
 </Catalog>

To make this document well formed, the Product element needs to be closed before the Category element, as shown here:

 <Catalog>
     <Category CategoryName="Beverages">
         <Product>
             Chai
         </Product>
     </Category>
 </Catalog>
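The four rules can be exercised with any conforming parser, because a parser must reject a document that breaks them. The sketch below assumes Python's standard xml.etree.ElementTree module, and the helper name is_well_formed is invented for illustration. It feeds the parser condensed versions of the failing examples from this section, plus one corrected document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


examples = {
    # Rule 1: two top-level elements, so there is no single root.
    "no single root": "<Product>Chai</Product><Product>Chang</Product>",
    # Rule 2: the first <Item> is never closed.
    "unclosed tag": "<Order><Item><ProductID>1</ProductID>"
                    "<Item><ProductID>4</ProductID></Item></Order>",
    # Rule 3: </Orderdate> does not match <OrderDate>.
    "case mismatch": "<Order><OrderDate>2001-01-01</Orderdate></Order>",
    # Rule 4: Category is closed while Product is still open.
    "overlapping": "<Catalog><Category><Product>Chai</Category></Product></Catalog>",
    # The corrected catalog obeys all four rules.
    "well formed": "<Catalog><Category><Product>Chai</Product></Category></Catalog>",
}

results = {name: is_well_formed(text) for name, text in examples.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```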

Processing Instructions and Comments

As well as containing data meant for use in a business process, XML documents can contain processing instructions to provide additional information to an XML parser and comments to provide additional information to a human reader.

Processing instructions are contained within <? and ?> tags. They’re commonly used to indicate the version of the XML specification that a document adheres to or to instruct the parser to apply a style sheet. (Strictly speaking, the XML specification calls the version line the XML declaration, although it uses the same syntax as a processing instruction.) Although processing instructions aren’t required to make a document well formed, it’s generally good practice to include one in the XML prologue, before the root element of an XML document, to indicate the XML version. The following example shows an XML document with a processing instruction in the prologue:

 <?xml version="1.0"?>
 <Order OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
     <Item>
         <ProductID>1</ProductID>
         <Quantity>2</Quantity>
     </Item>
     <Item>
         <ProductID>4</ProductID>
         <Quantity>1</Quantity>
     </Item>
 </Order>

You can also include comments in an XML document by enclosing them in <!-- and --> tags, as shown in the following example:

 <?xml version="1.0"?>
 <!-- This is a purchase order. -->
 <Order OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <Customer>Graeme Malcolm</Customer>
     <!-- The items in the order are listed here. -->
     <Item>
         <ProductID>1</ProductID>
         <Quantity>2</Quantity>
     </Item>
     <Item>
         <ProductID>4</ProductID>
         <Quantity>1</Quantity>
     </Item>
 </Order>
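Comments are meant for human readers, but a parser can still expose them when asked. As a small illustration, assuming Python's standard xml.dom.minidom module (which keeps comment nodes in the tree) and a hypothetical helper named collect_comments, the comments in a shortened version of the order could be gathered like this:

```python
from xml.dom import minidom

ORDER_XML = """<?xml version="1.0"?>
<!-- This is a purchase order. -->
<Order OrderNo="1234">
    <OrderDate>2001-01-01</OrderDate>
    <!-- The items in the order are listed here. -->
    <Item><ProductID>1</ProductID><Quantity>2</Quantity></Item>
</Order>"""


def collect_comments(node):
    """Walk the DOM tree and return the text of every comment node."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.COMMENT_NODE:
            found.append(child.data.strip())
        else:
            found.extend(collect_comments(child))
    return found


document = minidom.parseString(ORDER_XML)
comments = collect_comments(document)
print(comments)
# The comments do not affect the data: the root element is still Order.
print(document.documentElement.tagName)
```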

Namespaces

Another important concept in XML is the namespace. A namespace is a mechanism for allowing two different types of element with the same name in a single XML document. For example, the following order might be taken in a bookstore:

 <?xml version="1.0"?>
 <BookOrder OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <Customer>
         <Title>Mr.</Title>
         <FirstName>Graeme</FirstName>
         <LastName>Malcolm</LastName>
     </Customer>
     <Book>
         <Title>Treasure Island</Title>
         <Author>Robert Louis Stevenson</Author>
     </Book>
 </BookOrder>

Close inspection of this document reveals a possible source of confusion. The document has two Title elements, but they represent two different things. The Customer element contains a Title element to identify the title of the customer (Mr., Mrs., Ms., Dr., and so on). The Book element also contains a Title element, but this element identifies the title of the book.

To prevent confusion, you can use namespaces to distinguish between different elements of the same name. A namespace associates elements or attributes with a unique identifier known as a Uniform Resource Identifier (URI). A URI can be a URL or some other universally unique identifier. The namespace doesn’t need to reference an actual Internet location; it just needs to be unique.

Declaring and Using Namespaces

You can declare namespaces in any element by using the xmlns attribute. You can declare a default namespace that will apply to the contents of the element in which you make the declaration. For example, the book order document could be rewritten as shown here:

 <?xml version="1.0"?>
 <BookOrder OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <Customer xmlns="http://www.northwindtraders.com/customer">
         <Title>Mr.</Title>
         <FirstName>Graeme</FirstName>
         <LastName>Malcolm</LastName>
     </Customer>
     <Book xmlns="http://www.northwindtraders.com/book">
         <Title>Treasure Island</Title>
         <Author>Robert Louis Stevenson</Author>
     </Book>
 </BookOrder>

This solves our duplicate name problem because all the elements in the Customer element belong to the http://www.northwindtraders.com/customer namespace while the elements in the Book element belong to the http://www.northwindtraders.com/book namespace.
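One way to confirm that the two Title elements are now distinct is to look at the names a namespace-aware parser reports. The sketch below assumes Python's standard xml.etree.ElementTree module, which writes a qualified name as {namespace URI}local-name. The document is a shortened version of the book order.

```python
import xml.etree.ElementTree as ET

BOOK_ORDER = """<BookOrder OrderNo="1234">
    <OrderDate>2001-01-01</OrderDate>
    <Customer xmlns="http://www.northwindtraders.com/customer">
        <Title>Mr.</Title>
    </Customer>
    <Book xmlns="http://www.northwindtraders.com/book">
        <Title>Treasure Island</Title>
    </Book>
</BookOrder>"""

root = ET.fromstring(BOOK_ORDER)

# Collect the fully qualified name of every element whose local name is Title.
title_names = [el.tag for el in root.iter() if el.tag.endswith("Title")]
for name in title_names:
    print(name)

# Elements outside the two declarations carry no namespace at all.
print(root.tag, root.find("OrderDate").tag)
```

Although both elements are written as Title in the document, the parser sees two different names because each one inherits the default namespace declared on its parent.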

However, what if we need to have a document containing multiple books or customers? Continually changing the default namespace throughout the document would soon become confusing. An alternative approach is to declare an explicit abbreviation (a prefix) for each namespace and to add the appropriate prefix to each element name. With this approach, it is common to declare in the root element all the namespaces that will be used:

 <?xml version="1.0"?>
 <BookOrder xmlns="http://www.northwindtraders.com/order"
     xmlns:cust="http://www.northwindtraders.com/customer"
     xmlns:book="http://www.northwindtraders.com/book"
     OrderNo="1234">
     <OrderDate>2001-01-01</OrderDate>
     <cust:Customer>
         <cust:Title>Mr.</cust:Title>
         <cust:FirstName>Graeme</cust:FirstName>
         <cust:LastName>Malcolm</cust:LastName>
     </cust:Customer>
     <book:Book>
         <book:Title>Treasure Island</book:Title>
         <book:Author>Robert Louis Stevenson</book:Author>
     </book:Book>
 </BookOrder>

Now the document references three namespaces: a default namespace of http://www.northwindtraders.com/order, the http://www.northwindtraders.com/customer namespace (abbreviated as cust), and the http://www.northwindtraders.com/book namespace (abbreviated as book). Elements with no abbreviation, such as BookOrder and OrderDate, belong to the default namespace. The default namespace does not apply to attributes, however: an attribute with no abbreviation, such as OrderNo, is not in any namespace and is simply identified by the element it appears on. To indicate that an element or attribute belongs to a namespace other than the default, the abbreviation for the namespace is included in the element or attribute name.
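When reading such a document in code, it is the namespace URI that matters, not the abbreviation. The following sketch assumes Python's standard xml.etree.ElementTree module. The short prefixes o, c, and b in the NAMESPACES dictionary are arbitrary names chosen for this example and deliberately differ from the cust and book prefixes used in the document.

```python
import xml.etree.ElementTree as ET

BOOK_ORDER = """<BookOrder xmlns="http://www.northwindtraders.com/order"
    xmlns:cust="http://www.northwindtraders.com/customer"
    xmlns:book="http://www.northwindtraders.com/book"
    OrderNo="1234">
    <OrderDate>2001-01-01</OrderDate>
    <cust:Customer>
        <cust:Title>Mr.</cust:Title>
    </cust:Customer>
    <book:Book>
        <book:Title>Treasure Island</book:Title>
    </book:Book>
</BookOrder>"""

# Prefixes used only inside this script; they map to the same URIs as the document's.
NAMESPACES = {
    "o": "http://www.northwindtraders.com/order",
    "c": "http://www.northwindtraders.com/customer",
    "b": "http://www.northwindtraders.com/book",
}

root = ET.fromstring(BOOK_ORDER)

customer_title = root.findtext("c:Customer/c:Title", namespaces=NAMESPACES)
book_title = root.findtext("b:Book/b:Title", namespaces=NAMESPACES)
order_date = root.findtext("o:OrderDate", namespaces=NAMESPACES)

print(customer_title, "|", book_title, "|", order_date)
# The unprefixed root element is reported in the default namespace,
# while the unprefixed OrderNo attribute is reported under its plain name.
print(root.tag, root.attrib)
```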

Later in this appendix we’ll see how to use namespaces to validate XML documents by referencing schemas, but for now let’s look at how to process data in an XML document.



Programming Microsoft SQL Server(TM) 2000 with XML (Pro-Developer)
ISBN: 0735613699
Year: 2005
Pages: 89
