XML is a text-based format for describing data. It brings with it new ways to deliver data to Web-enabled applications, allowing nearly any type of data to be sent and used anywhere. This sounds like some type of magic wand, but XML's usefulness stems from two simple features. First, XML is extensible (hence the name, Extensible Markup Language), meaning you can easily extend it by adding your own custom tags and structure. Second, it's text-based, meaning you can create or read XML with any text editor, such as Notepad.
Like HTML, XML uses simple markup tags to describe its contents. Unlike HTML, however, there are no standard tags; you make up your own. This is the power of XML: you can create any tags you want, so you can represent any type of data. This also means that XML is easy to read and modify.
For example, you could make up the following tags, and an XML parser would accept them even though HTML browsers wouldn't recognize them:
<Name>...</Name>
<Occupation>...</Occupation>
<FavRestaurant>...</FavRestaurant>
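As a rough sketch, here is how those made-up tags might look once they're filled in and wrapped in a single enclosing tag. The <Person> wrapper and all of the values are hypothetical, chosen only for illustration.

```xml
<!-- Hypothetical data; <Person> is an assumed wrapper tag, not part of the example above -->
<Person>
  <Name>Jane Doe</Name>
  <Occupation>Chef</Occupation>
  <FavRestaurant>The Corner Bistro</FavRestaurant>
</Person>
```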
Because XML is plain text (you can create XML data in Notepad if you want), it's an easy way to transport data between systems. Most database applications store their data in a database-specific format. When you're working with different data stores (or different computer platforms) in the same project, complex conversions from one format to another are commonly needed. With XML, on the other hand, the data is presented in a structured, textual format, removing the need for such arcane conversions. Due to its textual nature, XML is also easy for users to read and understand. In addition, XML can pass through security measures, such as firewalls, that often block other types of communication or require complex workarounds. These measures typically allow plain text to pass, which makes XML well suited to transporting data between systems.
Internet applications often involve working with multiple data sources residing on different platforms. XML is an ideal way to represent data in such an Internet application.
The XML Data Model
Listing 11.1 contains an example XML file representing the inventory of a bookstore.
Listing 11.1 A Bookstore's Inventory in XML Form
<bookstore>
  <book genre="novel" style="hardcover">
    <title>The Handmaid's Tale</title>
    <price>19.95</price>
    <author>
      <first-name>Margaret</first-name>
      <last-name>Atwood</last-name>
    </author>
  </book>
  <book genre="novel" style="paperback">
    <title>The Poisonwood Bible</title>
    <price>11.99</price>
    <author>
      <first-name>Barbara</first-name>
      <last-name>Kingsolver</last-name>
    </author>
  </book>
  <book genre="novel" style="hardcover">
    <title>Hannibal</title>
    <price>27.95</price>
    <author>
      <first-name>Thomas</first-name>
      <last-name>Harris</last-name>
    </author>
  </book>
  <book genre="novel" style="hardcover">
    <title>Foucault's Pendulum</title>
    <price>22.95</price>
    <author>
      <first-name>Umberto</first-name>
      <last-name>Eco</last-name>
    </author>
  </book>
</bookstore>
Save this file as books.xml. You'll learn about the specifics later today in the section on XML schemas. For now, notice that XML is made up of structured, hierarchical tags. There are four <book> elements, each with its own attributes (genre and style) and several subelements (title, price, author). The actual data is held within these subelement tags. The entire set is wrapped in <bookstore> tags that describe this data set. This type of data representation is often called a document tree or data tree.
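To make the terminology concrete, here is one <book> branch from the listing with comments marking each part of the tree. The comments are added purely for illustration and are not part of books.xml.

```xml
<bookstore>                                  <!-- root element: wraps the whole data set -->
  <book genre="novel" style="paperback">     <!-- element with two attributes -->
    <title>The Poisonwood Bible</title>      <!-- subelement holding text data -->
    <price>11.99</price>                     <!-- subelement holding text data -->
    <author>                                 <!-- subelement that holds further subelements -->
      <first-name>Barbara</first-name>
      <last-name>Kingsolver</last-name>
    </author>
  </book>
</bookstore>
```

Each level of indentation corresponds to one level of the document tree: bookstore at the top, book beneath it, and the data-bearing elements as the leaves.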
In a traditional database, such as Access, the data will look something like Figure 11.1.
Figure 11.1. The XML data from Listing 11.1, viewed in Microsoft Access.
The XML version is more portable, it's easier for others to read and use, and it doesn't need any complex mechanisms to set up. If you haven't already, save this listing in a text file called books.xml for use later, in the c:\inetpub\wwwroot\tyaspnet21days\day11 folder (or somewhere else you'll remember easily). Now try viewing this file from your browser. If you're using a newer browser (IE 5.0 or later), you should see something similar to Figure 11.2.
Figure 11.2. XML viewed from the browser.
Internet Explorer can parse the XML automatically and display it as a hierarchy for you. You can click on the - sign to collapse a branch, or click the + sign to expand it. XML provides a wonderful mechanism for representing data.
XML Schemas
If you define your own tags, how will others know what kind of data you're talking about? XML schemas define this format. They describe what types of data to expect, how fields should be formatted, their sizes, and so on. It isn't necessary for an XML document to have a schema, but it helps tremendously; after all, if you want to speak the same language with someone else, it helps that you both know the language.
There are three major types of schemas: the document type definition (DTD), the Microsoft XML-Data Reduced schema (XDR), and the XML Schema Definition Language (XSD). The actual schemas are simply plain text files that have the appropriate extension (.dtd, .xdr, or .xsd). Any of these can be used with your XML files. XSD is the W3C standard, while XDR is a Microsoft format that the .NET Framework also supports; the examples in this lesson use XDR.
Let's look at the XDR schema that defines Listing 11.1.
Listing 11.2 The XML Schema of Listing 11.1
 1: <?xml version="1.0"?>
 2: <Schema xmlns="urn:schemas-microsoft-com:xml-data"
 3:         xmlns:dt="urn:schemas-microsoft-com:datatypes">
 4:   <ElementType name="first-name" content="textOnly"/>
 5:   <ElementType name="last-name" content="textOnly"/>
 6:   <ElementType name="name" content="textOnly"/>
 7:   <ElementType name="price" content="textOnly"
 8:                dt:type="fixed.14.4"/>
 9:   <ElementType name="author" content="eltOnly" order="one">
10:     <group order="seq">
11:       <element type="name"/>
12:     </group>
13:     <group order="seq">
14:       <element type="first-name"/>
15:       <element type="last-name"/>
16:     </group>
17:   </ElementType>
18:   <ElementType name="title" content="textOnly"/>
19:   <AttributeType name="genre" dt:type="string"/>
20:   <AttributeType name="style" dt:type="enumeration"
21:                  dt:values="paperback hardcover"/>
22:   <ElementType name="book" content="eltOnly">
23:     <attribute type="genre" required="yes"/>
24:     <attribute type="style" required="yes"/>
25:     <element type="title"/>
26:     <element type="price"/>
27:     <element type="author"/>
28:   </ElementType>
29:   <ElementType name="bookstore" content="eltOnly">
30:     <element type="book"/>
31:   </ElementType>
32: </Schema>
Save this listing as books.xdr in the same folder as books.xml. This also looks like plain HTML, but it's much more. Let's examine it more closely. On line 1, you simply specify the type of XML you're creating, version 1.0 in this case. This line is required for these examples to work properly. On line 2, you open your <Schema> tag. xmlns stands for XML namespace, which is a standard group of XML tags that someone has put together to promote further standardization, sort of like providing a standard dictionary to use for XML terms. Typically, if you specify a namespace at all, you'll use a standard one such as schemas-microsoft-com:xml-data, as done here, or use a custom schema, which we'll talk about later today.
There is, however, one important thing to note about the namespace, which will come in very handy in tomorrow's lesson. The namespace in XML is used to categorize elements in the schema, just like namespaces in .NET categorize .NET classes. The difference in XML is that XML uses a colon to separate the namespace name from the element, whereas .NET uses a period. The syntax to define a namespace then is as follows:
xmlns:prefix="namespace"
The prefix is the optional name you'll use to refer to the namespace, and its actual value is arbitrary. So, for example, if you had the following declaration:
xmlns:xs="urn:schemas-microsoft-com:xml-data"
Then each of the elements on lines 4-31 could be prefixed by "xs:":
<xs:ElementType ...> <xs:AttributeType ...> ...
As you can see, there are no prefixes on lines 4-31. Why? Because, and this is important, an xmlns attribute with no prefix declares the default namespace. (If you don't specify an xmlns attribute at all, you don't have any namespace at all.) Anytime you specify a prefix in the xmlns attribute, you are declaring a non-default namespace. If you leave the prefix out, XML assumes that all elements in the schema that are not otherwise prefixed belong to this default namespace.
Which brings up another point: You can have as many namespaces as you want in an XML file or schema. Each will follow the same syntax as xmlns did here, but using a different prefix. Make sure that you prefix your elements accordingly when you have more than the default namespace.
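As an illustration, the fragment below rewrites the opening of Listing 11.2 so that the XML-Data namespace is bound to a prefix instead of being the default. The prefix xs is an arbitrary choice made for this sketch; only the two namespace names come from the listing.

```xml
<?xml version="1.0"?>
<!-- No default namespace here: every schema element carries the xs: prefix -->
<xs:Schema xmlns:xs="urn:schemas-microsoft-com:xml-data"
           xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <xs:ElementType name="title" content="textOnly"/>
  <!-- dt:type is an attribute from the second (datatypes) namespace -->
  <xs:ElementType name="price" content="textOnly" dt:type="fixed.14.4"/>
</xs:Schema>
```

In namespace terms this describes the same elements as the unprefixed version; the only difference is whether the namespace is applied by default or spelled out on each tag.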
The key thing to note about XML namespaces is that they group elements in the XML file. When you learn about searching XML files tomorrow, this will become important.
The <ElementType> tags are where you define the format of data. Lines 4-8 define the first-name, last-name, name, and price elements, which you'll use later in the schema. Placing these definitions here is similar to declaring variables at the top of your pages. The content attribute defines what type of data may go in this tag, and dt:type further specifies the data type and how it's formatted.
Lines 9-17 define another element, author. This element also has a few other tags inside it, defined by the elements described on lines 4-6. Lines 18-21 define a few more elements and attributes that you may use in your schema.
Finally, line 22 defines the book element, which contains all the other elements defined so far. This section should match the format of the XML file; in other words, the title, followed by the price, followed by the author. On line 29, you define another element that in turn contains the book element.
You didn't have to define the elements before you used them; you could have defined them all in the book element. However, doing it this way is a better method for building schemas, just like it's a good idea to define your variables before you use them in ASP.NET.
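One detail of the author definition on lines 9-17 is worth a closer look. Assuming order="one" behaves as the XDR documentation describes (exactly one of the listed groups may appear), an <author> can hold either a single <name> or a <first-name> followed by a <last-name>. Under that assumption, both forms below would be acceptable. The <authors> wrapper is hypothetical, added only so the fragment has a single root, and the values are borrowed from Listing 11.1.

```xml
<!-- <authors> is a hypothetical wrapper so the fragment has one root -->
<authors>
  <!-- First group (lines 10-12): a single <name> -->
  <author>
    <name>Margaret Atwood</name>
  </author>
  <!-- Second group (lines 13-16): <first-name> followed by <last-name> -->
  <author>
    <first-name>Margaret</first-name>
    <last-name>Atwood</last-name>
  </author>
</authors>
```

The books in Listing 11.1 all use the second form.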
The schema, books.xdr, plays a role similar to the column definitions of a database table, and the books.xml file supplies the rows. In this way, these two files can represent nearly any type of data. Now if you want your books.xml file to use the new schema, you need to alter it slightly. Change the first two lines to read:
<?xml version="1.0"?>
<bookstore xmlns="x-schema:books.xdr">
In other words, you need to specify the version of XML you're using, and then just add a reference to your new schema file. You only need to do so for the top element in your XML file; all subelements will follow suit.
We'll skip over the other two types of schemas for now; we'll touch on them briefly again later today. Much of the syntax is similar for all three schema types, so the choice really depends on what you're comfortable with.
An XML file whose tags are properly expressed (meaning they adhere to the XML standards set forth by the World Wide Web Consortium, W3C) is known as a well-formed XML document. A well-formed document that also conforms to its schema is called valid. Creating well-formed XML files ensures that your data will be readable from any XML-compliant application. In general, a well-formed XML file must follow these guidelines:
It must contain at least one element.
It must contain a unique opening and closing tag that contains the entire document, forming the root element.
All other tags must be nested, with opening and closing tags, and cannot overlap.
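The guidelines above can be sketched with a small example. The document below is well-formed; the comment at the end lists three common ways to break the rules. The broken variants are kept inside a comment so that the example itself still parses, and the trimmed-down book entry is for illustration only.

```xml
<?xml version="1.0"?>
<!-- Well-formed: one root element, every tag closed, nothing overlapping -->
<bookstore>
  <book>
    <title>Hannibal</title>
  </book>
</bookstore>
<!-- Not well-formed (shown in a comment so this example still parses):
     1. Two root elements:  <book/> <book/>
     2. Unclosed tag:       <title>Hannibal
     3. Overlapping tags:   <book><title>Hannibal</book></title>
-->
```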
For more information on W3C standards documentation, check out http://webreference.com/xml/reference/standards.html.