Working with XML: System.Xml


XML is certainly among the most important new technologies to emerge in the last few years. Recognizing this, Microsoft has chosen to use XML in many different ways throughout .NET. The company also apparently recognizes that its customers wish to use XML in a variety of ways. Accordingly, the .NET Framework class library includes a substantial amount of support for working with XML, most of it contained in the System.Xml namespace.

Microsoft has gotten XML religion

The XML Technology Family

The basics of XML were described in Chapter 2. To get a feeling for what the System.Xml namespace provides, however, you need to understand a bit more about the family of XML technologies. From its beginning as a way to define documents, elements in those documents, and namespaces for those elements, XML has evolved into a significantly more powerful and more complex group of technologies.

XML is more than angle brackets

XML Infosets

The familiar angle bracket form of XML implies a logical hierarchy of related information. This abstract set of information and relationships is known as the XML document's Information Set, a term that's usually shortened to just Infoset. An Infoset consists of some number of information items, each of which represents some aspect of the XML document from which this Infoset was derived. For example, every Infoset has a document information item that acts as the root of the tree, with a single root element information item just beneath it. Most Infosets have some number of child element information items below this root element.

An Infoset provides an abstract view of the information in an XML document

For example, consider this simple XML document:

 <employees>
    <employee>
       <name>Bob</name>
       <age>36</age>
    </employee>
    <employee>
       <name>Casey</name>
    </employee>
 </employees>

The Infoset for this document can be represented as shown in Figure 5-2. The root of the Infoset's tree is a document information item, while below it is a hierarchy of element information items, one for each element in the XML document. The leaves of the tree are the values of the elements in this simple document.
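
The tree described above can be made concrete with a few lines of code. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module as a stand-in for any XML parser; it is not part of System.Xml. The helper name outline is hypothetical. Note that ElementTree hands back the root element directly rather than a separate document information item.

```python
import xml.etree.ElementTree as ET

XML = """<employees>
   <employee>
      <name>Bob</name>
      <age>36</age>
   </employee>
   <employee>
      <name>Casey</name>
   </employee>
</employees>"""

def outline(element, depth=0):
    """Return one line per element, indented by depth, with leaf values attached."""
    text = (element.text or "").strip()   # whitespace-only text is layout, not data
    label = element.tag + (f" = {text}" if text else "")
    lines = ["  " * depth + label]
    for child in element:                 # child element information items, in order
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(XML)                 # the root element information item
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The printed outline has the same shape as the hierarchy in the figure: one employees element, two employee elements beneath it, and the values at the leaves.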

Figure 5-2. An XML document's Infoset is an abstract representation of the document's contents.

Other XML Technologies

XML documents and the Infosets they imply can provide the foundation for tools that manipulate a document's data. Among the most important of these is XPath, which provides a mechanism for identifying a subset of an Infoset. A simple and quite accurate way to think of XPath is as a query language for information in XML documents (that is, for XML Infosets). Just as SQL provides a standard language for querying information contained in a relational database, XPath provides a language for querying information represented as a hierarchy. Using an XPath expression, a user can identify specific nodes in a tree.

XPath allows querying an XML document

For example, imagine that this query is issued against the simple XML document just described:

 /employees/employee/name 

This simple XPath request first identifies each employee element below the root employees element and then identifies the value of the name element in each of those employee elements. Far more complex queries are also possible, including queries that use comparison operators, compute sums, include wildcards, and much more. With XPath, a developer need not write her own code to search through information. Instead, this abstract language can be used to easily find information in an XML document that is held in memory.
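
To see the query at work, here is a minimal sketch in Python using xml.etree.ElementTree, which implements only a subset of XPath. Because ElementTree evaluates paths relative to an element, the absolute path /employees/employee/name is written as employee/name starting from the root employees element. The second query, which uses a predicate, is an added illustration rather than something taken from the example above.

```python
import xml.etree.ElementTree as ET

XML = """<employees>
   <employee>
      <name>Bob</name>
      <age>36</age>
   </employee>
   <employee>
      <name>Casey</name>
   </employee>
</employees>"""

root = ET.fromstring(XML)   # root is the <employees> element

# Equivalent of /employees/employee/name, expressed relative to the root element.
names = [node.text for node in root.findall("employee/name")]

# A predicate narrows the match: only employees that have an <age> child.
names_with_age = [node.text for node in root.findall("employee[age]/name")]

print(names)
print(names_with_age)
```

No search loop appears in the code; the path expression alone says which nodes are wanted.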

Another technology built on the abstract foundation provided by XML Infosets is the Extensible Stylesheet Language Transformations, universally referred to as XSLT. XSLT is a mechanism for specifying transformations of XML documents, transformations that can be described in an XSLT stylesheet. For instance, a set of XSLT rules that transforms an XML document from one schema to another can be defined. XSLT also relies on the abstract form of an XML document represented by its Infoset, and it relies on XPath for some of its functionality.

XSLT allows transforming XML documents

Figure 5-3 summarizes the relationships among the fundamental XML technologies. An XML Schema definition describes the structure and contents of an XML document (it defines a group of types), while an XML document itself can be thought of as an instance of the document type defined by some schema.[2] This XML document, in turn, is the foundation for an Infoset, which provides an abstract view of the document's data. Technologies for working with that data, such as XPath and XSLT, are effectively defined to work against the Infoset, allowing them to remain independent of the specific representation used for the XML document itself. Note that because these technologies rely on the Infoset rather than the familiar angle-bracket syntax of an XML document, they can actually be used with any data that can be represented in a strict hierarchy. That data need not come from a traditional XML document as long as it can be represented as an Infoset. For example, hierarchical data such as a file system or the Windows registry might be accessed in this way.

[2] In effect, an XML document's metadata is provided by its associated XML Schema definition.

Figure 5-3. XML is a family of technologies, with the Infoset at the center.

XML today is a unified family of technologies
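
The point that hierarchical data need not start life as angle brackets can be sketched in code. The Python example below builds an element tree directly from a nested dictionary standing in for a directory listing, then queries it with the same path syntax used for parsed documents. The listing, the element names dir and file, and the helper build are all hypothetical, chosen only for illustration; no XML text is ever parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical directory listing: a dict maps a directory name to its
# children, and None marks a plain file.
LISTING = {
    "docs": {"notes.txt": None, "drafts": {"ch5.txt": None}},
    "readme.txt": None,
}

def build(parent, entries):
    """Mirror the nested dict as <dir> and <file> elements under parent."""
    for name, children in entries.items():
        if children is None:
            ET.SubElement(parent, "file", name=name)
        else:
            build(ET.SubElement(parent, "dir", name=name), children)

root = ET.Element("root")
build(root, LISTING)

# Path queries work on this tree even though it never existed as XML text.
top_level_files = [f.get("name") for f in root.findall("file")]
all_files = [f.get("name") for f in root.findall(".//file")]
files_in_docs = [f.get("name") for f in root.findall("dir[@name='docs']/file")]

print(top_level_files, all_files, files_in_docs)
```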

XML APIs

The XML standards don't mandate any particular approach to processing the information in an XML document. As it happens, two styles of APIs have come into common use. In one approach, the information in an XML document is read sequentially, traversing a document's tree in a depth-first search. An API that supports this kind of access is referred to as a streaming API, and one common choice for this is the Simple API for XML (SAX). SAX was created by a group of volunteers independent of the W3C or other formal standards groups, but it is supported by many vendors today.

SAX is a streaming API for accessing XML-defined information
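
As an illustration of the streaming style, the following sketch uses the SAX implementation in Python's standard library (the xml.sax module). The parser walks the document from start to finish and calls back into a handler as each element begins and ends; the handler class NameCollector is a hypothetical name for this example.

```python
import xml.sax

XML = b"""<employees>
   <employee>
      <name>Bob</name>
      <age>36</age>
   </employee>
   <employee>
      <name>Casey</name>
   </employee>
</employees>"""

class NameCollector(xml.sax.ContentHandler):
    """Collects the text of every <name> element as the parser streams past it."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None          # a list while inside <name>, otherwise None

    def startElement(self, tag, attrs):
        if tag == "name":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:  # text may arrive in several chunks
            self._buffer.append(content)

    def endElement(self, tag):
        if tag == "name":
            self.names.append("".join(self._buffer))
            self._buffer = None

handler = NameCollector()
xml.sax.parseString(XML, handler)
print(handler.names)
```

The handler keeps only the state it needs. Nothing in the code can go back to an element once the parser has moved past it, which is the defining limitation of the streaming approach.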

In the second approach, the entire document is represented as an in-memory data structure (conceptually, at least), which allows an application to navigate through it, moving back and forth as needed. The most commonly used API for this option is an implementation of the Document Object Model (DOM) defined by the W3C. Because of the style of access it allows, the DOM is an example of a navigational API.

The DOM is a navigational API for accessing XML-defined information
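
By contrast, a navigational API lets code move around the tree freely. The sketch below uses xml.dom.minidom, the small DOM implementation in Python's standard library, to step forward to a sibling, back again, and up to the parent. The sample document is written without indentation so that every child node is an element; with indentation, a DOM tree also contains whitespace text nodes between the elements.

```python
from xml.dom import minidom

XML = ("<employees>"
       "<employee><name>Bob</name><age>36</age></employee>"
       "<employee><name>Casey</name></employee>"
       "</employees>")

doc = minidom.parseString(XML)       # the whole document is now in memory
root = doc.documentElement           # <employees>

first = root.firstChild              # first <employee>
second = first.nextSibling           # move forward to the second <employee>
back = second.previousSibling        # move backward again
parent = second.parentNode           # move up to <employees>

# Element -> <name> element -> its text node -> the string value.
first_name = first.firstChild.firstChild.data
second_name = second.firstChild.firstChild.data

print(first_name, second_name, back is first, parent is root)
```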

What System.Xml Provides

The System.Xml namespace has a great deal of built-in support for working with XML. Among the features available are support for both streaming and navigational APIs, the ability to use XPath queries, the ability to perform XSLT transformations, and more. While describing all of these features in any detail is well beyond the scope of this book, this section provides an overview of their most important aspects.

System.Xml includes support for XPath, XSLT, the DOM, and more

The most fundamental types for handling XML-defined data are contained directly in System.Xml itself. One of these is the abstract class XmlNode. As its name suggests, this class represents a node in an XML document, such as a particular element. Another fundamental type in System.Xml is the abstract class XmlReader. XmlReader provides a streaming interface for accessing XML data. (Note that the .NET Framework does not directly support SAX; instead it bases all streaming access on XmlReader.) XmlReader is the parent for three concrete classes:

  • XmlTextReader: Provides a streaming API that reads the information in an XML document sequentially, much like the SAX API. It's the fastest option for reading XML-defined data, but it's also somewhat limited in that no navigation is possible through the document. Also, this class makes no attempt to determine whether the document is valid, that is, whether it corresponds correctly with some XML Schema definition.

  • XmlValidatingReader: Like XmlTextReader, this class reads the information in an XML document sequentially. It also makes sure that the document corresponds to a specified XML Schema definition, that is, that the document is valid.

  • XmlNodeReader: Rather than reading from an XML document, this class provides forward-only access to a single XmlNode or a tree of XmlNodes. This is the same in-memory structure used to represent an XML document accessed via the DOM, but unlike the navigational DOM interface, XmlNodeReader allows only sequential access.

The XmlReader class allows streaming access to XML-defined information
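
The forward-only reading style can be illustrated outside .NET as well. The sketch below is an analogy, not XmlReader itself: it uses iterparse from Python's xml.etree.ElementTree, where the calling code drives a loop and receives one event at a time in document order. Unlike a pure reader, iterparse also builds elements as it goes; only the order of events is used here.

```python
import io
import xml.etree.ElementTree as ET

XML = b"""<employees>
   <employee>
      <name>Bob</name>
      <age>36</age>
   </employee>
   <employee>
      <name>Casey</name>
   </employee>
</employees>"""

events = []
# The caller pulls events one by one; there is no way to step backward.
for event, element in ET.iterparse(io.BytesIO(XML), events=("start", "end")):
    events.append((event, element.tag))

for event, tag in events:
    print(event, tag)
```

Each element is reported once when it opens and once when it closes, in depth-first order, which is the sequence a sequential reader sees.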

System.Xml also includes an abstract XmlWriter class, along with one concrete implementation, XmlTextWriter, that inherits from XmlWriter. The methods in this class allow writing XML information, angle brackets and all, to a stream. As described earlier in this chapter, a stream can be maintained in memory, written to a file, or used in some other way.

The XmlWriter class allows writing XML documents
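
The idea of writing XML to a stream one piece at a time can be sketched with an analogous class from Python's standard library, xml.sax.saxutils.XMLGenerator. This is a stand-in chosen for illustration, not XmlTextWriter, and the method names belong to the Python class. The stream here is held in memory, though any writable file object could take its place.

```python
import io
from xml.sax.saxutils import XMLGenerator

stream = io.StringIO()                       # in-memory stream
writer = XMLGenerator(stream, encoding="utf-8")

writer.startDocument()                       # emits the XML declaration
writer.startElement("employees", {})
writer.startElement("employee", {})
writer.startElement("name", {})
writer.characters("Bob")
writer.endElement("name")
writer.startElement("age", {})
writer.characters("36")
writer.endElement("age")
writer.endElement("employee")
writer.endElement("employees")
writer.endDocument()

output = stream.getvalue()
print(output)
```

The calling code never concatenates angle brackets by hand; the writer produces the markup and escapes character data as needed.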

System.Xml also includes the XmlDocument class. This class, which inherits from XmlNode, provides an implementation of the DOM API. While the various implementations of XmlReader just described are the fastest way to access information in an XML document, the XmlDocument class is more general because it allows navigation, moving backward and forward through the document at will. A developer is free to choose whichever approach best meets the needs of his application.

An XmlDocument object allows navigational access to XML-defined information

The methods and properties provided by XmlDocument give some idea of the kinds of operations the DOM allows. Those methods include the following:

  • Load: Loads an XML document and parses it into its abstract tree form

  • Save: Saves an in-memory document to a stream, file, or some other location

  • InsertBefore: Inserts a new node, represented as an instance of the XmlNode class, immediately before a specified reference node in the tree

  • InsertAfter: Inserts a new node, once again an XmlNode instance, immediately after a specified reference node in the tree

  • SelectNodes: Allows selecting nodes using an XPath expression

XmlDocument also exposes a number of properties that allow navigation through the tree. They include the following:

  • HasChildNodes: Indicates whether the current node has any nodes beneath it

  • FirstChild: Returns the first child of the current node

  • LastChild: Returns the last child of the current node

  • ParentNode: Returns the parent, that is, the node immediately above the current node
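
Because these members follow the W3C DOM, the same operations can be sketched with any DOM implementation. The example below uses Python's xml.dom.minidom, where the names appear in the DOM's own camelCase form (insertBefore, firstChild, and so on) rather than the capitalized .NET form. The W3C DOM defines no insertAfter, so the sketch emulates it by inserting before the reference node's next sibling. The helper make_employee and the name Dana are hypothetical.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<employees><employee><name>Casey</name></employee></employees>")
root = doc.documentElement
casey = root.firstChild

def make_employee(name):
    """Build an <employee><name>...</name></employee> subtree owned by doc."""
    employee = doc.createElement("employee")
    name_element = doc.createElement("name")
    name_element.appendChild(doc.createTextNode(name))
    employee.appendChild(name_element)
    return employee

bob = make_employee("Bob")
root.insertBefore(bob, casey)                # Bob now precedes Casey

dana = make_employee("Dana")
# Emulated "insert after": a reference of None means append at the end.
root.insertBefore(dana, casey.nextSibling)

order = [e.firstChild.firstChild.data for e in root.childNodes]
has_children = root.hasChildNodes()
first_is_bob = root.firstChild is bob
last_is_dana = root.lastChild is dana
parent_is_root = dana.parentNode is root

print(order, has_children, first_is_bob, last_is_dana, parent_is_root)
```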

Several other namespaces are defined beneath System.Xml:

  • System.Xml.Schema: Contains classes for creating and working with XML Schema definitions. Because this language is quite complex, this namespace contains a large set of classes, including a class for each of the elements in the XML Schema language. Microsoft refers to these classes collectively as the Schema Object Model (SOM).

  • System.Xml.XPath: Contains types that support using XPath expressions to query hierarchical data. Among them are the XPathNavigator class, which allows navigating through a document and issuing XPath queries, and the XPathExpression class, which can contain a compiled XPath query.

  • System.Xml.Xsl: Contains types that support using XSLT. The most important of these is the XslTransform class, which allows transforming data using an XSLT stylesheet.

  • System.Xml.Serialization: Contains types that allow serializing data into an XML format. This is another large namespace, but a key type within it is the XmlSerializer class. This class is similar to the SoapFormatter class described earlier, in that it provides Serialize and Deserialize methods that write and read an object's state in XML. This namespace also contains many other classes that allow customizing the serialization process, working with SOAP, and other aspects of converting between state information stored in a language object and the serialized XML form of that information.

Types for XPath, XSLT, and XML Schema support are provided in separate namespaces

XML has become an essential part of modern computing. By providing a standard way to describe information, it fills an important hole in the complex, multivendor world we live in. The .NET Framework's large set of namespaces and types devoted to XML is intended to make this increasingly important technology significantly easier to use.

The .NET Framework class library has a great deal of support for XML

Understanding .NET: A Tutorial and Analysis (Independent Technology Guides)
ISBN: 0201741628
Year: 2002
Pages: 60
