If you have ever seen or worked with HTML, you will find an XML document very similar to it. Both HTML and XML are markup languages based on tags within angle brackets. But unlike HTML, XML does not have a fixed set of tags. You can create your own tags, as long as the tags adhere to the rules defined by the XML specification, and then include your data between those tags. Because of this extensibility, more than 450 other markup languages or standards are today based on XML, which means they use XML syntax (for example, Scalable Vector Graphics [SVG], VoiceXML, Wireless Markup Language [WML], and NewsML).
Whereas HTML's primary focus is presentation, XML's main focus is data and its structure. You can use technologies such as XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO) to present the data in an XML document in different formats. This is one of the reasons XML is being used heavily in content management and document publishing applications.
Unlike HTML, XML is case-sensitive.
The textual nature of XML makes data highly portable, enabling you to send data and integrate applications across platforms. The most common use of XML is for "data on the move": that is, transferring data from one machine to another, across platforms, across networks, and over the Internet.
Let's now see what an XML document looks like.
Well-Formed XML Document
As mentioned earlier, an XML document might look very similar to an HTML document. However, XML has stricter syntax rules than HTML. An XML document must meet the following requirements:
If an XML document meets all these requirements, it is considered a well-formed XML document. Here is an example of a well-formed XML document that contains some details from a survey form:
<IndividualSurvey xmlns="http://schemas.microsoft.com/AW/IndividualSurvey">
  <TotalPurchaseYTD>8248.99</TotalPurchaseYTD>
  <DateFirstPurchase>2001-07-22Z</DateFirstPurchase>
  <BirthDate>1966-04-08Z</BirthDate>
  <MaritalStatus>M</MaritalStatus>
  <YearlyIncome>75001-100000</YearlyIncome>
  <Gender>M</Gender>
  <TotalChildren>2</TotalChildren>
  <NumberChildrenAtHome>0</NumberChildrenAtHome>
  <Education>Bachelors &amp; Masters</Education>
  <Occupation>Professional</Occupation>
  <HomeOwnerFlag>1</HomeOwnerFlag>
  <NumberCarsOwned>0</NumberCarsOwned>
  <CommuteDistance>&lt; 4 Miles</CommuteDistance>
</IndividualSurvey>
The XML specification identifies five characters (that is, <, >, &, ', and ") that have special meanings. If any of these characters is required as data, the corresponding entity reference (that is, &lt;, &gt;, &amp;, &apos;, or &quot;) must be used in its place in an XML document.
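The effect of an unescaped special character can be checked with any XML parser. The following is a minimal sketch in Python, using only the standard library; the helper name is_well_formed and the sample values are illustrative assumptions, not part of the survey schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if the parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


raw_value = "Bachelors & Masters"

# The bare ampersand makes this fragment not well-formed.
broken = f"<Education>{raw_value}</Education>"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
fixed = f"<Education>{escape(raw_value)}</Education>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
print(fixed)
```

When the escaped fragment is parsed, the parser hands the original characters back to the application, so the entity references exist only in the serialized document.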
In addition to elements and attributes, an XML document may contain other special-purpose constructs, such as comments (enclosed in <!-- ... -->), processing instructions (enclosed in <? ... ?>) that pass instructions to the application processing the document, and CDATA sections (enclosed in <![CDATA[ ... ]]>). Everything inside a CDATA section is treated as literal character data rather than parsed as markup, so you can include special characters such as <, >, and & within a CDATA section without escaping them as &lt;, &gt;, and &amp;.
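As a small illustration, the sketch below parses the same value written once with an entity reference and once inside a CDATA section. It uses Python's standard library parser; the variable names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Same element content, written two ways.
escaped = "<CommuteDistance>&lt; 4 Miles</CommuteDistance>"
cdata = "<CommuteDistance><![CDATA[< 4 Miles]]></CommuteDistance>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both forms should yield the same character data to the application.
print(repr(escaped_text))
print(repr(cdata_text))
```

CDATA sections tend to be most convenient when the content holds many special characters, such as an embedded code snippet.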
Valid XML Documents
One of the primary goals of XML is to enable exchange of data between organizations and applications. To do this, the XML document format that is used for the exchange of information must first be defined and agreed upon. W3C provides a specification called XML Schema Definition (XSD) that can be used to define the structure of an XML document, the data types of elements and attributes, and other rules and constraints that an XML document should follow. If a well-formed XML document adheres to the structure and rules defined by an XSD schema document, that XML document is considered a valid XML document. You can find more details on XSD at www.w3.org/XML/Schema.
Namespaces are generally used for two purposes:
XML namespaces also serve these two purposes. The following sample XML document declares two namespaces, and for each namespace, it defines a prefix (s1 and s2), which is a convenient name for the lengthy namespace URI. The first three child elements are in the s1 namespace, and the others are in the s2 namespace (see Listing 10.1).
Listing 10.1. Survey XML Document
<?xml version="1.0" encoding="UTF-8"?>
<StoreSurvey xmlns:s1="http://schemas.microsoft.com/Survey1"
             xmlns:s2="http://schemas.microsoft.com/Survey2">
  <s1:AnnualSales>300000</s1:AnnualSales>
  <s1:AnnualRevenue>30000</s1:AnnualRevenue>
  <s1:BankName>International Bank</s1:BankName>
  <s2:AnnualSales currency="USD">320000</s2:AnnualSales>
  <s2:AnnualSales currency="Euro">2000</s2:AnnualSales>
  <s2:AnnualRevenue>29000</s2:AnnualRevenue>
  <s2:BankName>National Bank</s2:BankName>
</StoreSurvey>
XML does not enforce the use of namespaces, and it is perfectly legal for elements to have the same name. So you could have AnnualSales or BankName elements repeated without using namespaces, but how would you identify in your code which one is which? This is where namespaces come in handy. Note in Listing 10.1 that the root element (StoreSurvey) has no prefix, and because the document declares no default namespace, it is in no namespace.
Also note the XML declaration at the top of Listing 10.1. It uses processing-instruction syntax and tells the XML parser which version of the XML specification the document is based on and which text encoding the document uses.
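To see how a parser resolves the prefixes in Listing 10.1, consider the following sketch, which uses Python's standard library. The helper split_tag is a hypothetical name introduced for this example, and the XML declaration is omitted from the embedded string for brevity.

```python
import xml.etree.ElementTree as ET

# Listing 10.1 without the XML declaration line.
SURVEY_XML = """<StoreSurvey xmlns:s1="http://schemas.microsoft.com/Survey1"
             xmlns:s2="http://schemas.microsoft.com/Survey2">
  <s1:AnnualSales>300000</s1:AnnualSales>
  <s1:AnnualRevenue>30000</s1:AnnualRevenue>
  <s1:BankName>International Bank</s1:BankName>
  <s2:AnnualSales currency="USD">320000</s2:AnnualSales>
  <s2:AnnualSales currency="Euro">2000</s2:AnnualSales>
  <s2:AnnualRevenue>29000</s2:AnnualRevenue>
  <s2:BankName>National Bank</s2:BankName>
</StoreSurvey>"""


def split_tag(tag):
    """Hypothetical helper: split '{uri}local' into (uri, local).

    ElementTree stores a namespaced name as '{uri}local'; a name in
    no namespace is stored as the bare local name.
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(SURVEY_XML)
print(split_tag(root.tag))
for child in root:
    print(split_tag(child.tag), child.text)
```

The prefixes themselves do not survive parsing; what identifies each element is the pair of namespace URI and local name, which is why two AnnualSales elements from different surveys can be told apart.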
Navigating XML by Using XPath
XML Path Language (XPath) is yet another W3C specification (see www.w3.org/TR/xpath), and it is widely used to search through XML documents and to retrieve specific parts of an XML document. XPath is based on the notion that every XML document can be visualized as a hierarchical tree. XPath is a language for expressing paths through such trees, from one node to another. XPath query syntax is similar to the syntax you would use to locate a file or files on a Unix file system (that is, using forward slashes to indicate levels of hierarchy). In addition, XPath enables you to retrieve elements or attributes that satisfy a given set of criteria.
For instance, the following XPath expression can be used to retrieve the annual revenue element in the s2 namespace from Listing 10.1:
The conditions are expressed in square brackets, and the attribute names are preceded with the @ symbol. For instance, the //s2:AnnualSales[@currency="USD"] XPath expression returns annual sales elements that have the currency attribute value USD.
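Queries of this kind can be tried outside a database. The sketch below runs them against a shortened copy of Listing 10.1 using Python's standard library, which implements only a subset of XPath. Two details are assumptions of this example rather than part of the text above: the paths are written relative to the root element (the library does not accept a leading // on an element), and the prefix-to-URI mapping NS has to be passed in explicitly.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Listing 10.1 (s2 elements plus one s1 element).
SURVEY_XML = """<StoreSurvey xmlns:s1="http://schemas.microsoft.com/Survey1"
             xmlns:s2="http://schemas.microsoft.com/Survey2">
  <s1:AnnualSales>300000</s1:AnnualSales>
  <s2:AnnualSales currency="USD">320000</s2:AnnualSales>
  <s2:AnnualSales currency="Euro">2000</s2:AnnualSales>
  <s2:AnnualRevenue>29000</s2:AnnualRevenue>
</StoreSurvey>"""

# Prefixes used in the query; they only need to map to the same URIs
# as the document, not to reuse the document's own prefix names.
NS = {
    "s1": "http://schemas.microsoft.com/Survey1",
    "s2": "http://schemas.microsoft.com/Survey2",
}

root = ET.fromstring(SURVEY_XML)

# Child of the root: the annual revenue element in the s2 namespace.
revenue = root.find("s2:AnnualRevenue", NS)

# Descendant search with a predicate on the currency attribute.
usd_sales = root.findall(".//s2:AnnualSales[@currency='USD']", NS)

print(revenue.text)
print([element.text for element in usd_sales])
```

A full XPath processor would also accept the absolute forms of these expressions; the relative forms are used here only because of the library's restrictions.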
In the following sections, you'll see how SQL Server 2005 supports XML.