XML Schemas and Serialization


So far we’ve seen that the .NET XML serializer knows how to move .NET objects into and out of XML documents, and that we can control this process by using .NET programming attributes at design time or by using code at run time. We’re still missing one piece, however. The XML documents that we deal with can be quite large and quite complex—hundreds of different element types in a document are common, and thousands are not unheard of. Writing a .NET class that properly wraps every XML element or attribute that anyone might send us or that we might send to anyone is essentially impossible. It would take far too long and cost far too much to get any useful program done, and you’d never get it fully debugged. To take full advantage of this serialization capability and make some money, we need a way of automatically generating the wrapper classes from some definition of the XML documents we expect to encounter.

We need automatic generation of .NET classes.

Fortunately, .NET provides us with an easy way to do this, as long as we have a schema describing the XML documents that we will be dealing with. The schema of an XML document is the description, written in XML itself, of everything that a valid XML document in a particular problem domain vocabulary might contain. A schema specifies the names of the XML elements and attributes, how often each may occur and whether it is optional, and their data types. A schema is conceptually similar to the collection of header files that a compiler uses to ensure that all the variables in a program are properly named and of the right types. A schema is often produced and distributed by the standards agency that governs a particular industry, as discussed earlier.

An XML schema describes the contents that an XML document can legally contain.

The .NET Framework SDK provides a command-line utility program called XSD.exe that generates XML schemas from .NET classes and .NET classes from XML schemas. In the former case, the utility reads a .NET assembly and produces an XML schema describing the XML documents that the assembly’s objects would produce if serialized. When I ran XSD against the EXE produced in this chapter’s first example, using the /type command-line option to restrict the schema to the Point class, it produced the schema shown in Listing 7-10. If you thought anyone else cared, you would publish this schema so that anyone who wanted to send you a Point object serialized in XML would know how to construct their XML document. If you had decorated the classes with attributes to control their serialization, the schema would have been different. Try it on the ElementPoint, AttributePoint, and Rectangle classes from the second example and see what you get.

A command-line utility can generate an XML schema from a .NET assembly.

Listing 7-10: Schema of Point class from first example produced by XSD.exe.

start example
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Point" nillable="true" type="Point" />
  <xs:complexType name="Point">
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" name="X" type="xs:int" />
      <xs:element minOccurs="1" maxOccurs="1" name="Y" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
end example

More useful to us is the ability of the XSD tool to read an XML schema and produce a .NET class that properly wraps a document of the type described. Suppose an industry-wide consortium of Microsoft and non-Microsoft users (i.e., me) has agreed on the XML definition of a polygon and published a schema that describes it. An XML document that conforms to this schema is shown in Listing 7-11. The schema itself is somewhat verbose, even at this level of simplicity, so I won’t show it here in the text, but you can examine it in the downloadable sample code.

A schema sample program starts here.

Listing 7-11: Industry-standard XML document describing polygon.

start example
<Polygon>
  <Vertices>
    <Point>
      <X>1</X>
      <Y>1</Y>
    </Point>
    <Point>
      <X>3</X>
      <Y>3</Y>
    </Point>
  </Vertices>
</Polygon>
end example

start sidebar
Tips from the Trenches

If you have an XML document but don’t have a schema, the XSD tool can produce the latter by inference from the former. Obviously, it won’t catch everything—for example, optional schema elements that the XML document could contain but doesn’t—but it will get you most of what you need very quickly.

end sidebar
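
The same kind of inference can also be done from code. The sketch below is not the XSD tool itself; it swaps in the XmlSchemaInference class from System.Xml.Schema, which I am assuming is available in your version of the Framework (it arrived in releases later than the one this chapter was written against). The module name and the inline sample document are made up for illustration.

```vb
Imports System
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module InferSchemaSketch
    Sub Main()
        ' The polygon document from Listing 7-11, held inline for the demo.
        Dim sample As String = _
            "<Polygon><Vertices>" & _
            "<Point><X>1</X><Y>1</Y></Point>" & _
            "<Point><X>3</X><Y>3</Y></Point>" & _
            "</Vertices></Polygon>"

        ' Infer a schema from the one sample document.
        Dim inference As New XmlSchemaInference()
        Dim schemas As XmlSchemaSet
        Using reader As XmlReader = XmlReader.Create(New StringReader(sample))
            schemas = inference.InferSchema(reader)
        End Using

        ' Write every inferred schema to the console.
        For Each schema As XmlSchema In schemas.Schemas()
            schema.Write(Console.Out)
        Next
    End Sub
End Module
```

As with the tool, the result can only describe what the sample actually contains, so treat it as a starting point to edit by hand.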

We now want to write a .NET program that will properly read and write documents that conform to this schema. The sample program is shown in Figure 7-5. After I generate a standard Windows Forms project (see Chapter 5), I need to generate my wrapper class. I do this by running XSD from the command line, passing it my XML schema along with the /classes option and, through the /language option, my choice of output language. The tool then spits out the wrapper class shown in Listing 7-12, which I add to my Visual Studio project. That’s all I have to do. I now use the wrapper class just as I did the classes that I wrote by hand in the first two examples, except that this one took me much less time to write. The code that uses the wrapper class objects is very similar to that in the first two examples, so I won’t show it here. If you’ve ever written an application that uses a generic XML parser, you’ll realize that .NET serialization combined with automatic wrapper-class generation saves as much time as writing COM programs in Visual Basic instead of C (not C++, just plain C).

Figure 7-5: Sample program serializing polygon.

Listing 7-12: Polygon XML document wrapper class produced by XSD.exe.

start example
Public Class Point
    Public X As Integer
    Public Y As Integer
End Class

Public Class Polygon
    Public Vertices() As Point
End Class
end example
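
For readers who skipped the first two examples, here is a minimal sketch of what using such a wrapper class might look like. It is not the sample program's actual code: the module name is made up, the classes are repeated from Listing 7-12 so the sketch stands alone, the file name Polygon.xsd in the comment is hypothetical, and I serialize to a string rather than a file to keep things short.

```vb
Imports System
Imports System.IO
Imports System.Xml.Serialization

' Wrapper classes as in Listing 7-12. A command along these lines would
' generate them (Polygon.xsd is a hypothetical file name):
'     xsd.exe Polygon.xsd /classes /language:VB
Public Class Point
    Public X As Integer
    Public Y As Integer
End Class

Public Class Polygon
    Public Vertices() As Point
End Class

Module PolygonRoundTrip
    Sub Main()
        ' Build a two-vertex polygon like the one in Listing 7-11.
        Dim p1 As New Point()
        p1.X = 1
        p1.Y = 1
        Dim p2 As New Point()
        p2.X = 3
        p2.Y = 3

        Dim poly As New Polygon()
        poly.Vertices = New Point() {p1, p2}

        Dim serializer As New XmlSerializer(GetType(Polygon))

        ' Serialize to an in-memory string instead of a file.
        Dim writer As New StringWriter()
        serializer.Serialize(writer, poly)
        Dim xml As String = writer.ToString()
        Console.WriteLine(xml)

        ' Deserialize the same text back into a new Polygon object.
        Dim reader As New StringReader(xml)
        Dim copy As Polygon = CType(serializer.Deserialize(reader), Polygon)
        Console.WriteLine("Vertices read back: " & copy.Vertices.Length.ToString())
    End Sub
End Module
```

The serializer adds an XML declaration and namespace declarations of its own, so the text it writes will not match Listing 7-11 character for character.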

I need to mention a couple of application notes here. First, to make my sample app look better, I’ve overridden the method ToString in the Polygon class in my generated file. This produces the lines in the list box that show the point values, such as “X=5, Y=3”. Modifying the wrapper class is perfectly legal since no other tools have to look at it after you generate it, and you’ll probably wind up doing this a fair amount. But if you regenerate your wrapper class, you’ll lose all of your changes, so make sure you save a copy. Alternatively, you can derive your own class from the wrapper class and make all modifications to the derived class.

Any alterations you make to the wrapper class are lost when you regenerate it.
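
The derive-your-own-class alternative might look something like the sketch below. DisplayPolygon is a hypothetical name of my own, not something the tool generates, and the output format is only a guess at what the sample's list box shows. The generated classes are repeated so the sketch stands alone.

```vb
Imports System
Imports System.Text

' Generated wrapper classes, as in Listing 7-12 (left untouched).
Public Class Point
    Public X As Integer
    Public Y As Integer
End Class

Public Class Polygon
    Public Vertices() As Point
End Class

' Hypothetical hand-written class. It lives in its own file, so
' regenerating the wrapper classes does not overwrite it.
Public Class DisplayPolygon
    Inherits Polygon

    Public Overrides Function ToString() As String
        If Vertices Is Nothing Then Return "(no vertices)"
        Dim sb As New StringBuilder()
        For Each pt As Point In Vertices
            If pt IsNot Nothing Then
                If sb.Length > 0 Then sb.Append("; ")
                sb.Append("X=" & pt.X.ToString() & ", Y=" & pt.Y.ToString())
            End If
        Next
        Return sb.ToString()
    End Function
End Class

Module DisplayDemo
    Sub Main()
        Dim p As New Point()
        p.X = 5
        p.Y = 3
        Dim poly As New DisplayPolygon()
        poly.Vertices = New Point() {p}
        Console.WriteLine(poly.ToString())   ' X=5, Y=3
    End Sub
End Module
```

One thing to check before relying on this: the serializer has its own rules about derived types (it generally has to be told which derived types to expect), so test serialization of the derived class separately rather than assuming it behaves exactly like the base class.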

I also encountered a funky Visual Basic-against-the-world problem. In C# and Java, arrays are zero-based, so if you declare an array of five integers, you get five elements numbered 0 through 4. In Visual Basic, arrays were originally one-based, so if you declared an array of five integers, you got elements 1 through 5. Starting with Visual Basic 6, arrays contain an extra element, so if you declare an array of five integers, you’ll actually get six, 0 through 5. The Visual Studio .NET team tried to change Visual Basic .NET arrays to be zero-based so as to match the rest of the world, as the first edition of this book reported. They had to rescind that decision under heavy fire from the Visual Basic user community, who threatened to simply add 1 to their existing array declarations rather than change their code. So if you just do the standard Visual Basic thing and have your program count 1 through 5, the element collections in your XML documents will begin with an extra element that’s empty. Some schemas don’t allow empty elements, and it will look like hell in any event. Visual Basic programmers will have to carefully write their code to handle this off-by-one problem, as I did in this sample. And C# programmers, who imagine themselves to be wearing halos, will have to teach their tech support departments how to diagnose this problem and explain it to callers who program in Visual Basic. As the French say, Plus ça change, plus c’est la même chose. (“The more things change, the more they stay the same.”) Does your dual Pentium-2000 PC with 1 gigabyte of RAM process words any faster than your 1-megabyte 286-12 did? Mine neither.

Be careful of zero-based versus one-based arrays in Visual Basic.
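
Here is a small sketch of the off-by-one trap and one way around it. It is an illustration under my own assumptions rather than the sample's actual code: the module and variable names are made up, and the Point class is repeated from Listing 7-12 so the sketch stands alone.

```vb
Imports System

Public Class Point
    Public X As Integer
    Public Y As Integer
End Class

Module ArrayBounds
    Sub Main()
        ' In Visual Basic .NET the number in the declaration is the upper
        ' bound, so this array has six slots, indexed 0 through 5.
        Dim oneBasedHabit(5) As Point
        Console.WriteLine(oneBasedHabit.Length)          ' 6

        ' Counting 1 through 5 leaves slot 0 unset. That unset slot is the
        ' extra empty element that shows up at the front of the collection.
        For i As Integer = 1 To 5
            oneBasedHabit(i) = New Point()
        Next
        Console.WriteLine(oneBasedHabit(0) Is Nothing)   ' True

        ' Declaring with count - 1 and counting from 0 leaves no empty slot.
        Dim count As Integer = 5
        Dim vertices(count - 1) As Point
        For i As Integer = 0 To count - 1
            vertices(i) = New Point()
        Next
        Console.WriteLine(vertices.Length)               ' 5
    End Sub
End Module
```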

start sidebar
Tips from the Trenches

Generic parsers supply some capabilities that wrapper classes don’t; for example, locating elements within the document by ID attribute or through an XPath expression. Even though using a wrapper class eliminates most of the need for such operations, you will still occasionally run across them, particularly the former. The .NET Reflection API (see Chapter 11) allows you to search through an object’s metadata at run time to locate ID attributes and read their values. I’ve written a newsletter article that demonstrates the use of reflection to search through an XML wrapper class to find an object containing a specific ID attribute. It’s online at http://www.rollthunder.com/newslv4n3.htm. The techniques described therein can be extended into a generic XPath processor for wrapper classes.

end sidebar
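
To give a flavor of the reflection approach, here is a rough sketch. It is not the code from the newsletter article. The Room and Building classes, their ID fields, and the FindById function are all hypothetical names invented for this illustration, and the search assumes wrapper classes built from public fields and arrays, as XSD-generated classes of this era are.

```vb
Imports System
Imports System.Reflection
Imports System.Xml.Serialization

' Hypothetical wrapper classes; the ID fields are assumptions.
Public Class Room
    <XmlAttribute()> Public ID As String
    Public Name As String
End Class

Public Class Building
    <XmlAttribute()> Public ID As String
    Public Rooms() As Room
End Class

Module FindByIdSketch
    ' Depth-first search of public fields for an object whose
    ' XmlAttribute-decorated field named "ID" equals the requested value.
    Function FindById(ByVal root As Object, ByVal id As String) As Object
        If root Is Nothing Then Return Nothing
        Dim t As Type = root.GetType()
        If t.IsPrimitive OrElse t Is GetType(String) Then Return Nothing

        ' Arrays: search each element in turn.
        If t.IsArray Then
            For Each item As Object In CType(root, Array)
                Dim hit As Object = FindById(item, id)
                If hit IsNot Nothing Then Return hit
            Next
            Return Nothing
        End If

        Dim fields() As FieldInfo = _
            t.GetFields(BindingFlags.Public Or BindingFlags.Instance)

        ' Does this object itself carry the ID we want?
        For Each field As FieldInfo In fields
            If field.Name = "ID" AndAlso _
               Attribute.IsDefined(field, GetType(XmlAttributeAttribute)) Then
                If id.Equals(field.GetValue(root)) Then Return root
            End If
        Next

        ' Otherwise look inside each field's value.
        For Each field As FieldInfo In fields
            Dim hit As Object = FindById(field.GetValue(root), id)
            If hit IsNot Nothing Then Return hit
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim lobby As New Room()
        lobby.ID = "r1"
        lobby.Name = "Lobby"
        Dim lab As New Room()
        lab.ID = "r2"
        lab.Name = "Lab"

        Dim b As New Building()
        b.ID = "b1"
        b.Rooms = New Room() {lobby, lab}

        Dim found As Object = FindById(b, "r2")
        Console.WriteLine(CType(found, Room).Name)   ' Lab
    End Sub
End Module
```

The sketch does not guard against circular references, which a tree of deserialized XML wrapper objects should not contain, and it ignores properties, so adjust it if your wrapper classes expose their data differently.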




Introducing Microsoft .NET (Pro-Developer)
ISBN: 0735619182
Year: 2003
Pages: 110