Using XML in a .NET Application | Microsoft Visual J# .NET (Core Reference) (Pro-Developer)

If you're working in an environment that consists entirely of platforms that support the .NET Framework, you might wonder why you need XML. You can use the facilities of ADO.NET to manipulate and transport your data, so why bother converting to and from a text format? The answer lies largely in the ubiquity of XML: it has become the universal data representation of modern distributed applications. This representation allows data to be passed relatively easily between different platforms, different vendors, new and old applications, and so forth. The generic integration provided by Web services relies on XML to deliver interoperability. Almost all applications now have some ability to import and export XML, regardless of their background. Because of this ubiquity, XML forms an important part of the world that your applications will occupy.

XML as a Data Format

XML documents consist of data and tags that provide meaning and context for that data. XML is a text format rather than a binary one, so if you need to, you can read and write XML using an ordinary text editor, such as Notepad. The tags are delimited by less-than (<) and greater-than (>) characters and surround the text content.

The following is a simple XML document that describes a catalog of cakes:

 <?xml version="1.0" encoding="utf-8" ?>
 <CakeCatalog xmlns="http://www.fourthcoffee.com/CakeCatalog.xsd">
   <CakeType style="Celebration" filling="sponge" shape="square">
     <Message>Congratulations</Message>
     <Description>General achievement</Description>
     <Sizes>
       <Option sizeInInches="10" />
       <Option sizeInInches="12" />
     </Sizes>
   </CakeType>
   <CakeType style="Celebration" filling="fruit" shape="round">
     <Message>Hi!</Message>
     <Description>Quite casual</Description>
     <Sizes>
       <Option sizeInInches="12" />
       <Option sizeInInches="18" />
     </Sizes>
   </CakeType>
   <CakeType style="Christmas" filling="fruit" shape="square">
     <Message>Season's Greetings</Message>
     <Description>Traditional, spiced Christmas cake</Description>
     <Sizes>
       <Option sizeInInches="15" />
       <Option sizeInInches="18" />
       <Option sizeInInches="20" />
     </Sizes>
   </CakeType>
   <CakeType style="Celebration" filling="sponge" shape="hexagonal">
     <Message>Happy Birthday</Message>
     <Description>An excellent cake.</Description>
     <Sizes>
       <Option sizeInInches="15" />
     </Sizes>
   </CakeType>
 </CakeCatalog>

Within this document, you can see some common aspects of XML:

An XML document is made up of a series of XML elements. An element consists of a start tag (<CakeType>, for example), an end tag (</CakeType>, for example), and possibly some content. This content can be a combination of text and other elements.
Each CakeType element has three attributes. Attributes are name/value pairs that provide metadata relating to the text content of an element. An example of an attribute is filling="fruit". In some cases, there might be no text content at all, only attribute values that contain all of the relevant data. Be careful not to confuse XML attributes with the .NET attributes you've already encountered; they are completely unrelated.
An XML document has a defined structure. It starts with an XML declaration, <?xml ?>, that states the version and encoding of the document. Most XML documents are encoded in some form of Unicode, such as UTF-8. The document then has a single root element. The root element contains all the other elements and content, except for the XML declaration and a few other document-level tags. The structure of a document is governed by simple rules, such as the requirement that elements be correctly nested and not overlap. There are also some encoding rules that govern the validity of names that can be used in an XML document. If an XML document conforms to these basic rules as defined in the XML standard, it is said to be well formed.
Parts of a document can be associated with different XML namespaces. A namespace provides a way of differentiating tags defined by different organizations or for different purposes (similar to packages or namespaces in Java and the .NET Framework). A namespace can be applied to all elements in a document or associated with a particular prefix. The namespace attribute (xmlns="http://www.fourthcoffee.com/CakeCatalog.xsd") defined on the CakeCatalog element shown in the example document indicates that the default namespace for the document is http://www.fourthcoffee.com/CakeCatalog.xsd. This default namespace is associated with every element in the document that has no prefix. (Under the Namespaces in XML rules, attributes without a prefix do not pick up the default namespace; they belong to no namespace.) You can alter this definition to indicate that only elements and attributes annotated with a particular prefix are associated with that namespace. To use a prefix, you can alter the namespace attribute to xmlns:cakes="http://www.fourthcoffee.com/CakeCatalog.xsd" so this namespace is associated only with the cakes prefix. You can then annotate particular elements and attributes with that prefix so that, for example, the CakeType element is changed to cakes:CakeType. You can define and use multiple namespace prefixes in a document. The document author can also combine elements and attributes that use namespace prefixes with a default namespace. In this case, any elements that do not have a prefix are associated with the default namespace.
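
The difference between a default namespace and a prefixed one is easier to see by asking a parser what it reports. The following is a minimal sketch only: it uses the standard Java XML classes (javax.xml.parsers and org.w3c.dom) as a stand-in, not the .NET classes this chapter goes on to describe, and the class name NamespaceDemo, its helper methods, and the unprefixed Note element are hypothetical. Note that a default namespace applies to unprefixed elements but not to unprefixed attributes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the chapter's own code.
final class NamespaceDemo {
    static final String CAKES_NS = "http://www.fourthcoffee.com/CakeCatalog.xsd";

    // The namespace is the default for every unprefixed element.
    static final String DEFAULT_DOC =
        "<CakeCatalog xmlns=\"" + CAKES_NS + "\">"
      + "<CakeType style=\"Celebration\"/></CakeCatalog>";

    // The namespace applies only to names carrying the cakes prefix.
    static final String PREFIXED_DOC =
        "<cakes:CakeCatalog xmlns:cakes=\"" + CAKES_NS + "\">"
      + "<cakes:CakeType style=\"Celebration\"/><Note/></cakes:CakeCatalog>";

    // Parses the text and returns the first element with the given local name.
    static Element find(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagNameNS("*", localName).item(0);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Namespace URI of the named element, or null if it has none.
    static String elementNamespace(String xml, String localName) {
        return find(xml, localName).getNamespaceURI();
    }

    // Namespace URI of an attribute on the named element, or null if it has none.
    static String attributeNamespace(String xml, String localName, String attr) {
        return find(xml, localName).getAttributeNode(attr).getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println(elementNamespace(DEFAULT_DOC, "CakeType"));
        System.out.println(attributeNamespace(DEFAULT_DOC, "CakeType", "style"));
        System.out.println(elementNamespace(PREFIXED_DOC, "CakeType"));
        System.out.println(elementNamespace(PREFIXED_DOC, "Note"));
    }
}
```

In the prefixed document, the Note element has no prefix and there is no default namespace, so the parser reports no namespace for it.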

You can define the expected structure of an XML document using one of two mechanisms. The XML specification itself defines a structure definition syntax known as Document Type Definition (DTD). As you'll see later, DTDs have some drawbacks, so they have been superseded by a newer standard for document structure definition called XML Schema. An XML document can be checked against an associated DTD or XML schema; this check is called validation. If the document conforms to the DTD or XML schema, it is said to be valid.
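
The distinction between a well-formed document and a valid one can be sketched in a few lines. This example is an illustration under stated assumptions: it uses the standard Java classes in javax.xml.parsers and javax.xml.validation, not the .NET validation classes, and the class ValidationDemo, the small schema for a Sizes element, and the three sample documents are all hypothetical (the schema is not the CakeCatalog.xsd referred to in the sample document).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the schema below is invented for illustration.
final class ValidationDemo {
    static final String SIZES_SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"Sizes\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"Option\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:attribute name=\"sizeInInches\" type=\"xs:positiveInteger\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static final String VALID = "<Sizes><Option sizeInInches=\"10\"/></Sizes>";
    static final String INVALID = "<Sizes><Option sizeInInches=\"large\"/></Sizes>";
    static final String MALFORMED = "<Sizes><Option></Sizes>"; // Option is never closed

    // True if the text obeys the basic XML rules (nesting, matching tags, and so on).
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            try {
                parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
                return true;
            } catch (SAXParseException e) {
                return false; // the parser reported a fatal error
            }
        } catch (Exception e) {
            throw new IllegalStateException("Parser setup or I/O failed", e);
        }
    }

    // True if the text is well formed and also conforms to SIZES_SCHEMA.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SIZES_SCHEMA)));
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXParseException e) {
                return false; // a schema violation or a well-formedness error
            }
        } catch (Exception e) {
            throw new IllegalStateException("Schema setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        for (String xml : new String[] {VALID, INVALID, MALFORMED}) {
            System.out.println(xml + " wellFormed=" + isWellFormed(xml)
                    + " valid=" + isValid(xml));
        }
    }
}
```

The second sample is well formed but not valid, because "large" is not a positive integer. The third is not well formed, so it cannot be valid either.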

One thing that is not shown in our sample XML document is an entity. Entities are among the more obscure parts of the XML standard. An entity shows up in an XML document as a placeholder, and at some point during the processing of the XML document, a value is typically inserted into the placeholder. Entities can be either internal or external. The value for an internal entity is defined as part of the DTD. The value for an external entity is obtained from an external source, such as a URL. The act of retrieving the value of an external entity is called resolving the entity. We will not cover entities in great detail in this chapter, but you'll learn more about them at certain points, when they're relevant.
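
As a brief illustration of an internal entity, the sketch below declares an entity named shop in a document's internal DTD subset and lets the parser substitute its value. It assumes the standard Java DOM parser (javax.xml.parsers) with its default settings, not the .NET classes; the class EntityDemo, the entity name, and the sample message are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class showing an internal entity being expanded.
final class EntityDemo {
    // The DOCTYPE declares an internal entity; &shop; is the placeholder.
    static final String MESSAGE =
        "<?xml version=\"1.0\"?>"
      + "<!DOCTYPE Message [<!ENTITY shop \"Fourth Coffee\">]>"
      + "<Message>Congratulations from &shop;</Message>";

    // Parses the document and returns the root element's text after expansion.
    static String expandedText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(expandedText(MESSAGE));
    }
}
```

An external entity would name a file or URL in its declaration instead of a literal value, and the parser would have to resolve that reference before it could insert the text.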

Because XML is a text-based format, it has certain disadvantages compared with native data formats:

It is comparatively bulky.
Data generally needs to be converted back and forth between XML and an internal or native format, which adds processing overhead.

However, XML also has advantages:

XML is independent of any platform or vendor.
A text format is comparatively easy to process in almost all programming languages.
The flexibility of XML means that almost any type of data can be described.
Because XML is text-based, there is little danger that XML documents will become obsolete and unreadable in the way that some proprietary binary formats have over the years.

Roles for XML

XML can perform various roles within an enterprise application:

XML can be used to encode data and documents that need to be exchanged between business partners. Earlier initiatives in this area, such as Electronic Data Interchange (EDI), used a fairly specialized, binary format that was comparatively difficult to process and usually required specific software. However, many XML processors are available, some of them free.
All but the simplest applications need to access and store configuration information. This has traditionally taken the form of INI files and various proprietary formats, either text or binary. XML provides a flexible way of storing such information that is easy to read and maintain.
XML can be used for the long-term persistence of data, either within an XML-aware database or as standalone files.
Because XML is machine-readable while still being very flexible, it can be used to encode information at different levels in the application architecture, including middleware and support services. The best example of this is the use of XML in SOAP and other Web service technologies to describe the data being transported and the services being offered.

What Applications Need from XML Support

As you've seen, XML is a flexible, text-based format. However, you still need facilities that make it easy to manipulate and apply XML:

You need high-level abstractions. This means no low-level string parsing and a reduction in the amount of work involved in marshaling XML into and out of memory.
You need simple ways to validate the format of the XML documents. Data might arrive from a variety of sources, so one of the first requirements is to ensure that it has the correct structure; that is, you need to know whether the document is both well formed and valid.
It must be easy to read XML from many sources, such as files, URLs, streams, and so forth.
XML might need to be processed in different ways. When you handle large documents, it is far more convenient to process the data as a stream than to load the whole document into memory. In contrast, some applications move back and forth (or up and down) through the data, adding, changing, and removing as they go. This implies that the document must be held in memory. Both of these processing styles must be accommodated.

Regarding the last point, you'll see throughout this chapter that two distinct techniques are used for manipulating XML data. Stream-based processing reads the data through once, presenting the data as soon as it arrives and discarding it once it has been read. This type of processing is ideal when you're dealing with large amounts of data or data with little context, and it's ideal for filtering or when no manipulation of the data is needed. The use of a stream results in comparatively fast processing and a comparatively small memory footprint. However, one problem with stream-based processing relates to context sensitivity. If the meaning of tags and text in your document is dependent on the context in which you find them, you might have to keep track of the current context when you use stream-based processing. This can mean using many Boolean flags or building complex state models.
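
The context problem can be made concrete with a short sketch. It reads a cut-down version of the cake catalog once, front to back, and collects the sizes offered for cakes with a given filling. Because an Option element does not itself say which cake it belongs to, the code has to carry a Boolean flag. The example assumes Java's standard pull-style stream reader (javax.xml.stream), which is similar in spirit to, but not the same as, the forward-only reader classes in the .NET Framework; the class StreamDemo and its method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class for forward-only, stream-based processing.
final class StreamDemo {
    static final String CATALOG =
        "<CakeCatalog>"
      + "<CakeType filling=\"sponge\"><Sizes>"
      + "<Option sizeInInches=\"10\"/><Option sizeInInches=\"12\"/>"
      + "</Sizes></CakeType>"
      + "<CakeType filling=\"fruit\"><Sizes>"
      + "<Option sizeInInches=\"12\"/><Option sizeInInches=\"18\"/>"
      + "</Sizes></CakeType>"
      + "</CakeCatalog>";

    // Returns the sizes listed under every CakeType that has the given filling.
    static List<Integer> sizesForFilling(String xml, String filling) {
        List<Integer> sizes = new ArrayList<>();
        boolean insideMatchingCake = false; // the context we must remember ourselves
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("CakeType")) {
                        insideMatchingCake =
                                filling.equals(reader.getAttributeValue(null, "filling"));
                    } else if (name.equals("Option") && insideMatchingCake) {
                        sizes.add(Integer.parseInt(
                                reader.getAttributeValue(null, "sizeInInches")));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("CakeType")) {
                    insideMatchingCake = false; // leaving the cake resets the context
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML stream", e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(sizesForFilling(CATALOG, "fruit"));
    }
}
```

One flag is enough here, but each extra level of context the application cares about adds more state of this kind.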

The alternative mechanism, in-memory processing of XML documents, tends to be slower and more memory intensive, but you have completely random access to the document and you can add, remove, or change parts of it as you see fit. In-memory processing does not have the same context issues that are inherent in the stream-based processing model. Because you can revisit any piece of the document, you can work out the context of any part when you need it rather than having to cache the current context.
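
For comparison, here is a minimal in-memory sketch. It loads a cut-down version of the cake catalog into a DOM tree, removes every Option above a size limit, and reports the filling of each cake that was affected by walking back up the tree from the Option. No flag is needed because the context can be looked up on demand. The example assumes the standard Java DOM classes (javax.xml.parsers and org.w3c.dom), not the .NET DOM classes; the class DomEditDemo and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for in-memory (DOM) processing.
final class DomEditDemo {
    static final String CATALOG =
        "<CakeCatalog>"
      + "<CakeType filling=\"sponge\"><Sizes>"
      + "<Option sizeInInches=\"10\"/><Option sizeInInches=\"12\"/>"
      + "</Sizes></CakeType>"
      + "<CakeType filling=\"fruit\"><Sizes>"
      + "<Option sizeInInches=\"12\"/><Option sizeInInches=\"18\"/>"
      + "</Sizes></CakeType>"
      + "</CakeCatalog>";

    // Loads the whole document into memory as a DOM tree.
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Removes each Option larger than maxInches and returns the filling of
    // every cake that lost an option.
    static List<String> removeLargeSizes(Document doc, int maxInches) {
        List<String> affectedFillings = new ArrayList<>();
        NodeList options = doc.getElementsByTagName("Option");
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = options.getLength() - 1; i >= 0; i--) {
            Element option = (Element) options.item(i);
            if (Integer.parseInt(option.getAttribute("sizeInInches")) > maxInches) {
                // Context on demand: Option -> Sizes -> CakeType.
                Element cake = (Element) option.getParentNode().getParentNode();
                affectedFillings.add(cake.getAttribute("filling"));
                option.getParentNode().removeChild(option);
            }
        }
        return affectedFillings;
    }

    public static void main(String[] args) {
        Document doc = load(CATALOG);
        System.out.println(removeLargeSizes(doc, 12));
        System.out.println(doc.getElementsByTagName("Option").getLength());
    }
}
```

The cost is that the entire tree stays in memory for as long as the document is in use, which is the trade-off described above.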

Processing XML Data

Given data in an XML format, what might you do with it? There are certain tasks you will often perform when manipulating XML. These include:

Accessing the data and metadata in an XML document using low-level APIs. You might also need to generate your own XML documents programmatically at the same low level. APIs such as the Document Object Model (DOM) and the Simple API for XML (SAX) provide this level of processing. (The use of APIs is covered later in this chapter.)
Importing or exporting XML data to or from relational databases. In this case, you might need to interact with ADO.NET DataSet objects to exchange such data with a relational database. (More about this later in the chapter.)
Using XML as a convenient, in-memory data format, especially in a Web browser. As such, you'll often find that XML is bound to visual user interface components to provide information for the user. (See Chapter 16.)
Converting XML from one format to another, for example, when data is exchanged with a business partner. You can achieve this by applying XSLT (which is covered in Chapter 6).

Support for XML in Visual J# and the .NET Framework

Visual J# and the .NET Framework provide a great deal of support for the generation, consumption, and manipulation of XML when you develop applications.

Standards and Mechanisms Supported by the .NET Framework

As mentioned previously, there are two primary approaches to processing XML programmatically. The first is to perform forward-only, noncached parsing. This approach is well supported by classes in the .NET Framework. Although there is no official standard for this style of processing, it is commonly used when processing XML. (SAX provides a similar mechanism, but with a different philosophy, and it is not itself a formal standard.) The second approach is to use in-memory manipulation through the DOM. DOM is a standard defined by the World Wide Web Consortium (W3C) and is fully supported by classes in the .NET Framework.

The .NET Framework supports XML standards for document structure, namespaces, XSLT, and XPath. Other applicable XML-related standards might be supported in the future as they are formalized under the W3C.

Classes in the .NET Framework

The .NET classes for XML manipulation are split across several namespaces in the .NET Framework Class Library. These namespaces are:

System.Xml, which contains the core classes for XML document manipulation and validation. This namespace also provides classes that support integration with ADO.NET DataSet objects.
System.Xml.Schema, which holds classes for the manipulation of XML schemas and support classes for performing validation.
System.Xml.Serialization, which defines classes for converting objects into an XML representation for storage or streaming.
System.Xml.XPath, which contains classes that support the navigation of XML documents in a flexible way based on XPath expressions.
System.Xml.Xsl, which comprises classes that support the transformation of XML documents using XSLT stylesheets.

In this chapter, we'll focus primarily on the document manipulation and validation capabilities provided by the classes in the System.Xml and System.Xml.Schema namespaces. Chapter 6 covers the transformational and navigational capabilities supported by the classes in System.Xml.Xsl and System.Xml.XPath. The serialization capabilities provided by the classes under System.Xml.Serialization are discussed in Chapter 10.

Manipulating XML Files in Visual J#

If your application uses XML, you might need to edit XML documents, define XML schemas, and so on. Naturally, you are free to do this manually (in Notepad) or in a specific XML-oriented tool. However, you can perform most XML-related tasks without leaving the Visual Studio .NET environment. You can add any relevant XML files or schemas to your Visual Studio .NET project by importing existing files or by creating new ones, as shown in Figure 5-1.

Figure 5-1. XML File and XML Schema options in the Add New Item dialog box

You can edit an XML document or schema in Visual Studio .NET using the XML Designer. The XML Designer allows you to view and edit an XML file in two ways. You can manipulate the raw XML as shown in Figure 5-2, or you can work with a structured data grid form of the data, as shown in Figure 5-3. You can easily switch between the views by choosing the appropriate command from the View menu. The XML Designer ensures that the two views are kept synchronized: any changes, additions, or deletions made in one view will be reflected in the other.

Note

Data view in the XML Designer can show only regular, structured data, such as the results of a database query or a business document such as a purchase order. Other XML documents that use irregular tagging, such as marked-up text produced by an XML-based word processor, will not display correctly in Data view.

Figure 5-2. Working with raw XML in Visual Studio .NET

Figure 5-3. Working with XML in a data grid in Visual Studio .NET

The XML Designer also lets you create and manipulate XML schema documents (XSD files). Because XML schemas are themselves XML documents, the XML Designer again provides two views of the document. You can view the XML schema as raw XML or in a graphical view that shows the relationships between the different types of data in the document. (If you're familiar with databases, this will look very similar to the representation of a database schema.) You can create a schema from an existing XML document, import a preexisting schema, or create your own schema from scratch. You can then associate the XML file with a schema within the project through the XML document's properties. Once the file has a schema associated with it, it can be validated within Visual Studio .NET.

For more information about the XML Designer, see the Visual Studio .NET documentation.
