Using XML for Metadata Definition | XML Programming Bible

Software and hardware components have long used metadata to identify themselves when communicating. Historically this was done in a proprietary fashion where the sender and receiver first agreed on an interchange format before they could exchange metadata. XML changed this.

Today we can build applications that find and execute remote Web Services without prior knowledge of their existence. We can also publish Web Services of our own without knowing who will use those services, where they will come from, how they will use them, or even what type of client software will make the request. These services can now seamlessly exchange rich and abstract data objects.

This magic is made possible by using XML to exchange metadata. You might ask yourself, "Why all the fuss about XML? Why is a generic markup language taking over the world?" XML has many advantages that make it the ideal carrier for data and metadata on the Internet.

It can be read and written by humans. This was one of the reasons for HTML's success, and it works equally well for XML. This makes the language more accessible to novices and lowers the fear factor for everyone. You can see it, touch it, and use Notepad to create it. This is useful.
It is open. XML is defined by the W3C. No single company owns it. Microsoft has added muscle to the W3C specification by implementing standards-compliant XML parsers and placing them in the eager hands of its developer community. Nevertheless, the language is platform-independent. UNIX clients can exchange XML data with .NET Web Services and vice versa.
It is ubiquitous. XML parsers are everywhere. Getting your hands on an XML toolset and using the technology immediately is easy.
It is flexible. Perhaps the most compelling reason for using XML is that it has no prescribed use. You decide how you want to use it in your application, and you determine its scope, which can range from creating small XML documents, to transmitting data between applications or application layers, to forming your own standards body that regulates the use of XML across an entire industry.

XML is here to stay, so let's see how we can use it to transfer metadata.

Describing an Object

Object serialization is not a new concept to experienced object-oriented programmers. The idea is to take an instantiated object in memory and convert its state, or property values, into a serialized data stream that can be written to disk or transmitted over a network. The serialized version of the object must be complete enough so that the same program, or a completely different program, can reassemble it into an identical instance. As we'll see in this section, both objects and their class definitions can be serialized.
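In .NET, for example, the framework's XmlSerializer class performs this conversion for an object's public fields and properties. The minimal sketch below serializes an object to an XML string and reads it back into a new, separate instance with the same state; the Point class and the Roundtrip helper are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical class used only to illustrate serialization.
public class Point
{
    public int X;
    public int Y;
    public string Label = "";
}

public static class Roundtrip
{
    // Convert an object's public state into XML text.
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }

    // Rebuild a new instance from XML text produced by ToXml.
    public static T FromXml<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var reader = new StringReader(xml);
        return (T)serializer.Deserialize(reader);
    }
}
```

The copy is a different object in memory; only its state traveled through the XML text. XmlSerializer requires a public class with a public parameterless constructor, which Point receives implicitly.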

Visual C++ and Java are two examples of widely used programming languages that allow object serialization. Each uses its own proprietary format to store the object and its metadata, so applications written in different languages can have difficulty sharing serialized objects. Not only do these applications need to understand each other's serialization format, they also need to make sense of the serialized object's metadata to instantiate it.

Once again, XML comes to the rescue. Developers can use XML to facilitate object sharing by creating a schema that defines how objects can be serialized into platform-independent XML documents. Applications can share data by using the new object schema as a translator. Each application needs only a conduit to help marshal native objects into XML format and back again for the two to communicate.

Not all object states can be serialized. References to transient resources, such as database connections, cannot be persisted or transmitted to another program. Languages that support object serialization therefore provide a way for programmers to specify which members should be serialized and which should not; Java uses the transient keyword, for example, and .NET offers attributes such as [NonSerialized] and [XmlIgnore].
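As a sketch of one such mechanism, the hypothetical Order class below marks a stand-in connection field with [XmlIgnore], so .NET's XmlSerializer leaves it out of the XML entirely. A plain object takes the place of a real database connection to keep the example self-contained; Order and OrderXml are illustrative names.

```csharp
using System.IO;
using System.Xml.Serialization;

public class Order
{
    public int Id;
    public string Customer = "";

    // Stand-in for a transient resource such as a database connection.
    // [XmlIgnore] tells XmlSerializer to skip this field in both directions.
    [XmlIgnore]
    public object Connection;
}

public static class OrderXml
{
    public static string Serialize(Order order)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using var writer = new StringWriter();
        serializer.Serialize(writer, order);
        return writer.ToString();
    }

    public static Order Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using var reader = new StringReader(xml);
        return (Order)serializer.Deserialize(reader);
    }
}
```

After deserialization, Connection is null; the receiving program is responsible for opening its own connection.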

The class definition for the Person object is contained in Listing 8-5.

Listing 8-5 person.cs: A C# class definition for a <Person> object.

    using System;

    namespace CSharpWebService
    {
        /// <summary>
        /// A Simple Person Class Definition
        /// </summary>
        public class Person
        {
            public String firstName;
            public String lastName;
            public String birthDate;
            public String hairColor;
            public String favoriteColor;
            // Address is a separate class that this listing does not show.
            public Address address;

            public Person()
            {
                firstName = "";
                lastName = "";
                birthDate = "";
                hairColor = "";
                favoriteColor = "";
                address = new Address();
            }

            public Person(String _firstName, String _lastName, String _birthDate,
                          String _hairColor, String _favoriteColor)
            {
                firstName = _firstName;
                lastName = _lastName;
                birthDate = _birthDate;
                hairColor = _hairColor;
                favoriteColor = _favoriteColor;
                address = new Address();
            }

            public void setFirstName(String _firstName) { firstName = _firstName; }
            public void setLastName(String _lastName) { lastName = _lastName; }
            public void setBirthDate(String _birthDate) { birthDate = _birthDate; }
            public void setHairColor(String _hairColor) { hairColor = _hairColor; }
            public void setFavoriteColor(String _favoriteColor) { favoriteColor = _favoriteColor; }
        }
    }

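Our class definition declares public fields for firstName, lastName, birthDate, hairColor, favoriteColor, and address; each will appear as an element of the same name (<firstName>, <lastName>, and so on) when the object is written out as XML. Now let's create an instance of this object in C#.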

 Person p = new Person("John", "Smith", "11/30/1974", "brown", "blue");

The Object Instance XML Document

This call to the Person constructor will create a Person instance in memory. If we want to save this object to a file or transmit it to another program, we must first serialize it. How do we do this? The simplest solution is to construct an Object Instance XML document that holds the state of the object in memory. This document looks astonishingly similar to a previous incarnation of this data.

    <Person>
      <firstName>John</firstName>
      <lastName>Smith</lastName>
      <birthDate>11/30/1974</birthDate>
      <hairColor>brown</hairColor>
      <favoriteColor>blue</favoriteColor>
      <address type="home">
        <address1>200 Brattle Street</address1>
        <city>Cambridge</city>
        <state>MA</state>
        <zip>02138</zip>
      </address>
    </Person>

Is this enough information to reconstruct the Person object in memory? Almost. Certainly, the program that originally created the object could parse the XML document and call the Person constructor method.
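Both directions can be sketched with LINQ to XML (System.Xml.Linq): write each field of the Person as a child element, then read the elements back by name and pass them to the constructor. The Address class, with its type, address1, city, state, and zip fields, is a hypothetical stand-in because Listing 8-5 does not show it, and PersonXml is likewise an illustrative name.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical Address: Listing 8-5 uses this class but does not show it.
public class Address
{
    public string type = "", address1 = "", city = "", state = "", zip = "";
}

// Trimmed copy of the fields and five-argument constructor from Listing 8-5.
public class Person
{
    public string firstName, lastName, birthDate, hairColor, favoriteColor;
    public Address address = new Address();

    public Person(string first, string last, string birth, string hair, string favorite)
    {
        firstName = first; lastName = last; birthDate = birth;
        hairColor = hair; favoriteColor = favorite;
    }
}

public static class PersonXml
{
    // Write the object's current state as an Object Instance XML document.
    public static XElement Write(Person p) =>
        new XElement("Person",
            new XElement("firstName", p.firstName),
            new XElement("lastName", p.lastName),
            new XElement("birthDate", p.birthDate),
            new XElement("hairColor", p.hairColor),
            new XElement("favoriteColor", p.favoriteColor),
            new XElement("address",
                new XAttribute("type", p.address.type),
                new XElement("address1", p.address.address1),
                new XElement("city", p.address.city),
                new XElement("state", p.address.state),
                new XElement("zip", p.address.zip)));

    // Read a required child element by name, failing loudly if it is missing.
    static string Required(XElement parent, string name) =>
        (string)parent.Element(name) ?? throw new FormatException($"Missing <{name}>");

    // Parse the document and rebuild the object through its constructor.
    public static Person Read(XElement root)
    {
        var p = new Person(
            Required(root, "firstName"), Required(root, "lastName"),
            Required(root, "birthDate"), Required(root, "hairColor"),
            Required(root, "favoriteColor"));

        XElement a = root.Element("address");
        if (a != null)
        {
            p.address.type = (string)a.Attribute("type") ?? "";
            p.address.address1 = Required(a, "address1");
            p.address.city = Required(a, "city");
            p.address.state = Required(a, "state");
            p.address.zip = Required(a, "zip");
        }
        return p;
    }
}
```

Notice what the reader relies on: it already knows the class, its constructor, and what each element means. That knowledge lives in this program's source code, not in the XML. Keeping zip as a string also preserves the leading zero in 02138.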

The Class Definition XML Document

Another program, however, would be unable to do so without more information about the Person class definition. In this case we need another XML document that describes the Person class definition. Listing 8-6 is an excerpt from such a document.

Listing 8-6 An excerpt from createPerson.wsdl, an XML Web Service descriptor, describing the Person class definition.

    <s:complexType name="Person">
      <s:sequence>
        <s:element minOccurs="1" maxOccurs="1" name="firstName" nillable="true" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="lastName" nillable="true" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="birthDate" nillable="true" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="hairColor" nillable="true" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="favoriteColor" nillable="true" type="s:string" />
        <s:element minOccurs="1" maxOccurs="1" name="address" nillable="true" type="s0:Address" />
      </s:sequence>
    </s:complexType>

Notice that all the field information in the original class definition appears in this XML document; the constructors and set methods do not, because only the object's state needs to be described. The most important pieces of information are each field's name and data type. You might also have noticed that the address property is listed as type s0:Address. This is because the address property is a reference to another type of object named Address. That class would require its own complexType definition as well, but it was omitted for brevity's sake.
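Because the descriptor is ordinary XML Schema, a program can recover those names and types with a general-purpose XML API. The sketch below uses LINQ to XML to list each field of a named complexType and resolves prefixed type names such as s:string and s0:Address to full namespace URIs. ClassDescriptor and ReadFields are hypothetical names, and the sketch assumes the fragment is wrapped in an element that declares the s and s0 prefixes (bound, as in the HelloWorld WSDL later in this chapter, to http://www.w3.org/2001/XMLSchema and http://tempuri.org/).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ClassDescriptor
{
    static readonly XNamespace Xsd = "http://www.w3.org/2001/XMLSchema";

    // Return (field name, fully qualified type) for each element in a named complexType.
    public static List<(string Name, XName Type)> ReadFields(string schemaXml, string typeName)
    {
        XElement root = XElement.Parse(schemaXml);
        XElement complexType = root.DescendantsAndSelf(Xsd + "complexType")
            .First(ct => (string)ct.Attribute("name") == typeName);

        var fields = new List<(string Name, XName Type)>();
        foreach (XElement element in complexType.Descendants(Xsd + "element"))
        {
            // type="s:string" or type="s0:Address": resolve the prefix using the
            // declarations in scope. An undeclared prefix yields null, and the
            // + operator below then throws ArgumentNullException.
            string qualified = (string)element.Attribute("type");
            int colon = qualified.IndexOf(':');
            XNamespace ns = colon < 0
                ? element.GetDefaultNamespace()
                : element.GetNamespaceOfPrefix(qualified.Substring(0, colon));
            fields.Add(((string)element.Attribute("name"), ns + qualified.Substring(colon + 1)));
        }
        return fields;
    }
}
```

Resolving prefixes is what lets a program tell a built-in XML Schema type (string, in the XML Schema namespace) from a service-defined type (Address, in the service's namespace) whose own definition must be looked up separately.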

These two XML documents, the class document and the instance document, provide enough information for another program to generate a comparable class definition and create an instance of this class with the desired state. The process of generating these stub classes and instantiating remote objects in Visual Studio .NET will be covered more thoroughly later in the chapter.
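The generation half of that claim can be sketched as well: given field names and resolved types like those in Listing 8-6, emit C# source for a comparable stub class. StubGenerator, its small type mapping, and the rule that any non-XML-Schema type becomes a class of the same local name are assumptions for illustration; production tools such as the .NET Framework's wsdl.exe and xsd.exe utilities handle far more of XML Schema.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StubGenerator
{
    const string XsdNs = "http://www.w3.org/2001/XMLSchema";

    // Small, deliberately incomplete mapping from XSD built-in types to C# types.
    static readonly Dictionary<string, string> BuiltIns = new Dictionary<string, string>
    {
        ["string"] = "string",
        ["int"] = "int",
        ["boolean"] = "bool",
        ["double"] = "double",
        ["dateTime"] = "System.DateTime",
    };

    // Each field is (name, type namespace URI, type local name).
    public static string Generate(string className,
        IEnumerable<(string Name, string TypeNs, string TypeLocal)> fields)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"public class {className}");
        sb.AppendLine("{");
        foreach (var (name, typeNs, typeLocal) in fields)
        {
            // Built-in XSD types map to C# types; anything else is assumed to be
            // another generated class with the same local name (e.g. Address).
            string csType = typeNs == XsdNs && BuiltIns.TryGetValue(typeLocal, out var mapped)
                ? mapped
                : typeLocal;
            sb.AppendLine($"    public {csType} {name};");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Fed the six fields of Listing 8-6, this yields a Person class with five string fields and an Address field: the shape of the original class, minus its methods.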

Describing a Service

The future of distributed computing over the Internet has been hotly contested in recent years. Many object-oriented programmers believed that, one day, distributed applications would be built from objects located all over the Internet. Computers could participate in a global object community by publishing objects that could be run internally or whose code could be transmitted over the Internet. Unfortunately, firewalls, unreliable network connections, and competing technologies have all hindered this movement and have led developers to search for a more practical way to build distributed applications. In this section we will review some existing distributed technologies and then dive into the emerging distributed paradigm of Web Services.

Although these technologies differ in approach and implementation, they all share one common problem: how to describe remote methods and services so they can be discovered and used dynamically. All distributed technologies use some sort of metadata to communicate the features and semantics of the services they host. In the past the format of this metadata has often been proprietary. Today XML is used as an ideal carrier for remote service metadata.

CORBA and IIOP

When the Common Object Request Broker Architecture (CORBA) was introduced in the early 1990s, its enthusiasts speculated that fine-grained distributed objects would one day become widely available. Applications would be able to locate and use business software components via a vast network of Object Request Brokers (ORBs), and then CORBA would allow objects implemented in C++, Java, and even COBOL to be executed remotely. CORBA objects publish their behavior using the Interface Definition Language (IDL) (more metadata!) and register this information with an ORB running on a local network.

Client programs interact with these objects via stub interfaces that remotely execute an object's methods on the machine hosting the software. The ORB sits in the middle and dispatches all intermachine communication. ORBs around the Internet can share information about the objects they manage. The Internet Inter-ORB Protocol (IIOP) was developed for this purpose. CORBA supports run-time discovery of objects and is a flexible architecture for coordinating the exchanges of fine-grained software components.

Java RMI

Like CORBA, Java's RMI subsystem allows objects to be distributed across a network. Unlike CORBA, Java RMI can transport an entire class, code and all, across application boundaries. This is possible because all Java applications run on the Java Virtual Machine, which lets distributed Java applications safely download compiled bytecode and execute it locally. Today Java objects can be shared using either RMI's native protocol (JRMP) or IIOP.

Although the ability to share executable objects dynamically over the Internet opens some exciting possibilities, Java RMI does have shortcomings. The technology works only if both applications involved in the exchange are written in Java. Programs written to use RMI also require a fair amount of preparation: remote stub classes must be generated statically or downloaded dynamically for remote methods to work properly. Finally, RMI uses a proprietary protocol to exchange information, which means more server software and firewall tuning.

Web Services—An Introduction

The push for widely distributed software components might have had more to do with the fact that we could do it, instead of whether or not we actually needed to do it. Although fine-grained distributed software components are extremely valuable at a local level, their usefulness diminishes as the scale of the network grows. Failing networks, uncooperative firewalls, and overall complexity made assembling an application from small, widely distributed pieces prohibitively difficult.

Furthermore, as e-commerce exploded on the Internet, it became apparent that data, not code, was the most precious commodity in Internet applications. HTTP had existed for years, and firewalls were configured to cope with it. XML proved to be a powerful vehicle for transporting any type of information. Surely there was some way to combine the two technologies and produce the next big advance in distributed computing.

Thus, Web Services were born. Web Services are based on two simple ideas.

"The need for distributed business services is greater than the need for distributed business components." What is the difference between a service and a component, anyway? A software component tends to be a fine-grained object, or set of objects, that can be embedded directly in your application. Examples include GUI widgets, APIs to desktop applications, and other tools. A service encompasses a broader interface and is more abstract. An airline reservation system is a good example of a potential Web Service. A corporate intranet application might communicate with this service over the Internet to provide convenient flight booking to the company's employees.
"Web Services should be easy to access." Web Services can be accessed over the Internet in three primary ways. The first is SOAP, which enables applications to send complex objects to one another in XML format. The second is an HTTP POST request, the same mechanism HTML forms use: the arguments are submitted to the Web Service as URL-encoded name-value pairs in the request body, and the XML result comes back in the body of the HTTP response. The third is an HTTP GET request, which sends all of the client arguments to the Web Service in the URL query string; the XML result is again returned in the response body. All three approaches use HTTP to transfer data. Therefore, no special client software or firewall configuration is needed to use a Web Service in your application.

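As a rough sketch of the difference between the three styles, the code below builds, but does not send, one request of each kind for a HelloWorld operation that takes a single name argument, using .NET's System.Net.Http types. The endpoint URL, the /HelloWorld path for GET and POST, the http://tempuri.org/ namespace, and the SOAPAction value are assumptions modeled on the ASP.NET-generated WSDL examined in the next section; HelloWorldRequests is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Xml.Linq;

public static class HelloWorldRequests
{
    // Assumed endpoint, modeled on the localhost address in the HelloWorld WSDL.
    public const string Endpoint = "http://localhost/HelloWorld/HelloWorld.asmx";

    // HTTP GET: the arguments travel in the URL query string.
    public static HttpRequestMessage Get(string name) =>
        new HttpRequestMessage(HttpMethod.Get,
            $"{Endpoint}/HelloWorld?name={Uri.EscapeDataString(name)}");

    // HTTP POST: the same name-value pairs, URL-encoded in the request body.
    public static HttpRequestMessage Post(string name) =>
        new HttpRequestMessage(HttpMethod.Post, $"{Endpoint}/HelloWorld")
        {
            Content = new FormUrlEncodedContent(new[]
            {
                new KeyValuePair<string, string>("name", name)
            })
        };

    // SOAP: an XML envelope posted to the service, plus a SOAPAction header.
    public static HttpRequestMessage Soap(string name)
    {
        XNamespace soap = "http://schemas.xmlsoap.org/soap/envelope/";
        XNamespace tns = "http://tempuri.org/";
        var envelope = new XElement(soap + "Envelope",
            new XAttribute(XNamespace.Xmlns + "soap", soap.NamespaceName),
            new XElement(soap + "Body",
                new XElement(tns + "HelloWorld",
                    new XElement(tns + "name", name))));

        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(envelope.ToString(), Encoding.UTF8, "text/xml")
        };
        request.Headers.Add("SOAPAction", "\"http://tempuri.org/HelloWorld\"");
        return request;
    }
}
```

Any of these messages could be handed to HttpClient.SendAsync. In each case the request is ordinary HTTP, which is why no special firewall configuration is needed.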
Describing a Web Service

Web Services use metadata to publish themselves. The metadata is stored in Web Services Description Language (WSDL) XML documents. WSDL is a contract language that defines the messaging interface to the service. IBM and Microsoft were the chief designers of this language, and the WSDL 1.1 specification was submitted to the W3C, which publishes it as a Note. You can find it at http://www.w3.org/TR/wsdl.

Any client on the Internet can request a WSDL document with an HTTP request. The WSDL document specifies the name of the service, the names and types of the arguments it expects, the names and types of the results it returns, and directions on how to access the service (that is, SOAP message, HTTP POST, HTTP GET, or all three).

Let's look at a "HelloWorld" example that performs the valuable service of saying "Hello" to you. Imagine how much brighter our offices would be if we all used this service every morning. The HelloWorld Web Service requests only a single piece of information from you: your name. Upon receiving this data the service will present you with a happy salutation. The HelloWorld service, because it is so popular, is housed on a supercomputer somewhere directly on an Internet backbone. Say you decide to write an application that accesses this service, so you download the HelloWorld WSDL XML document. A listing of this document, with descriptions, follows.

    <?xml version="1.0" encoding="utf-8" ?>
    <definitions xmlns:s="http://www.w3.org/2001/XMLSchema"
                 xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
                 xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
                 xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/"
                 xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
                 xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
                 xmlns:s0="http://tempuri.org/"
                 targetNamespace="http://tempuri.org/"
                 xmlns="http://schemas.xmlsoap.org/wsdl/">

WSDL documents lead off with a host of references to different namespaces. The s namespace references the W3C's XML Schema (XSD) namespace, with which you should be familiar by now. The http namespace identifies elements that supply HTTP connection information. The mime namespace is used to specify the MIME types of data returned from the service. The tm namespace is Microsoft's text matching namespace; it is not used in the HelloWorld example. The soap namespace is used to isolate SOAP-specific elements in the WSDL document, and the soapenc namespace is used for SOAP encoding elements. The s0 namespace is used to isolate service-specific data structures and is bound to the same URI as the document's targetNamespace. That URI, http://tempuri.org/, is a placeholder used only for testing purposes; in a production environment it would most likely reference the service vendor's domain. Finally, the default namespace is set to the general WSDL namespace, http://schemas.xmlsoap.org/wsdl/.
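Because prefixes are only local abbreviations, a program reading a WSDL document should work with the namespace URIs rather than the prefix letters, which another toolkit might choose differently. This minimal LINQ to XML sketch lists the namespace declarations on the <definitions> element; WsdlNamespaces and Declared are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WsdlNamespaces
{
    // Map each prefix declared on the root element to its namespace URI.
    // The default namespace (xmlns="...") is reported under the empty prefix "".
    // Ordinary attributes such as targetNamespace are not declarations and are skipped.
    public static Dictionary<string, string> Declared(string wsdlXml)
    {
        XElement root = XElement.Parse(wsdlXml);
        return root.Attributes()
            .Where(a => a.IsNamespaceDeclaration)
            .ToDictionary(
                a => a.Name.Namespace == XNamespace.None ? "" : a.Name.LocalName,
                a => a.Value);
    }
}
```

A client that compares URIs, not prefixes, will still understand a WSDL document in which the XML Schema namespace is bound to xsd: instead of s:.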

The types section of the WSDL document tells you the names and types of the arguments and the return result of the HelloWorld service.

    <types>
      <s:schema attributeFormDefault="qualified"
                elementFormDefault="qualified"
                targetNamespace="http://tempuri.org/">

The <HelloWorld> element encapsulates all the incoming arguments to the service. Its content is a <complexType> holding a sequence of arguments; in this example there is exactly one, named name, of type string. WSDL supports arguments more complex than strings: service arguments can be a series of nested objects, and those object classes and subclasses must be fully described in this section of the WSDL document.

        <s:element name="HelloWorld">
          <s:complexType>
            <s:sequence>
              <s:element minOccurs="1" maxOccurs="1" name="name" nillable="true" type="s:string" />
            </s:sequence>
          </s:complexType>
        </s:element>
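Because the types section is ordinary XML Schema, a service (or a careful client) can check a request against it before doing any work. This sketch copies the <HelloWorld> declaration into a standalone schema and validates candidate request elements with .NET's XmlSchemaSet and the XDocument.Validate extension method; HelloWorldValidator is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class HelloWorldValidator
{
    // The <HelloWorld> declaration from the WSDL types section, wrapped in a
    // standalone schema element so XmlSchemaSet can load it.
    const string SchemaXml = @"<s:schema xmlns:s='http://www.w3.org/2001/XMLSchema'
          attributeFormDefault='qualified' elementFormDefault='qualified'
          targetNamespace='http://tempuri.org/'>
  <s:element name='HelloWorld'>
    <s:complexType>
      <s:sequence>
        <s:element minOccurs='1' maxOccurs='1' name='name' nillable='true' type='s:string' />
      </s:sequence>
    </s:complexType>
  </s:element>
</s:schema>";

    // Returns the validation messages; an empty list means the request is valid.
    public static List<string> Validate(string requestXml)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add("http://tempuri.org/", XmlReader.Create(new StringReader(SchemaXml)));

        var errors = new List<string>();
        XDocument.Parse(requestXml).Validate(schemas, (sender, e) => errors.Add(e.Message));
        return errors;
    }
}
```

Because elementFormDefault is "qualified", the <name> child must itself be in the http://tempuri.org/ namespace; an unqualified <name> fails validation, as does a <HelloWorld> element with no <name> at all, since minOccurs is 1.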

The <HelloWorldResponse> element defines the data structure the client can expect to receive from the service. We see here that the service will return a single element, HelloWorldResult, of type string.

        <s:element name="HelloWorldResponse">
          <s:complexType>
            <s:sequence>
              <s:element minOccurs="1" maxOccurs="1" name="HelloWorldResult" nillable="true" type="s:string" />
            </s:sequence>
          </s:complexType>
        </s:element>
        <s:element name="string" nillable="true" type="s:string" />
      </s:schema>
    </types>
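The response definitions can be put to work the same way. Assuming the service returns a literal SOAP envelope to SOAP clients and the bare <string> element (declared at the end of the types section) to HTTP GET and POST clients, as the message definitions below indicate, a client can read either shape with a few lines of LINQ to XML. HelloWorldResponseReader and the sample greeting are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class HelloWorldResponseReader
{
    static readonly XNamespace Tns = "http://tempuri.org/";

    // Extract the greeting from either response shape described in the WSDL:
    // a SOAP envelope containing <HelloWorldResponse><HelloWorldResult>, or the
    // bare <string> element returned to HTTP GET and POST clients.
    public static string ReadResult(string responseXml)
    {
        XElement root = XElement.Parse(responseXml);
        XElement result = root.Name == Tns + "string"
            ? root
            : root.Descendants(Tns + "HelloWorldResult").FirstOrDefault();
        return result?.Value ?? throw new FormatException("No HelloWorld result found");
    }
}
```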

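The following section of the WSDL document describes the protocols that can be used to access the service. Notice that SOAP, HTTP GET, and HTTP POST are all covered. WSDL needs to describe to potential clients the call and return semantics that the service provides, including URLs, MIME types, and input and output message formats. It does so in layers: <message> elements define the data exchanged, <portType> elements group operations by their input and output messages, <binding> elements tie each port type to a concrete protocol and encoding, and the <service> element lists <port> elements that pair each binding with a network address.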

      <message name="HelloWorldSoapIn">
        <part name="parameters" element="s0:HelloWorld" />
      </message>
      <message name="HelloWorldSoapOut">
        <part name="parameters" element="s0:HelloWorldResponse" />
      </message>
      <message name="HelloWorldHttpGetIn">
        <part name="name" type="s:string" />
      </message>
      <message name="HelloWorldHttpGetOut">
        <part name="Body" element="s0:string" />
      </message>
      <message name="HelloWorldHttpPostIn">
        <part name="name" type="s:string" />
      </message>
      <message name="HelloWorldHttpPostOut">
        <part name="Body" element="s0:string" />
      </message>
      <portType name="Service1Soap">
        <operation name="HelloWorld">
          <input message="s0:HelloWorldSoapIn" />
          <output message="s0:HelloWorldSoapOut" />
        </operation>
      </portType>
      <portType name="Service1HttpGet">
        <operation name="HelloWorld">
          <input message="s0:HelloWorldHttpGetIn" />
          <output message="s0:HelloWorldHttpGetOut" />
        </operation>
      </portType>
      <portType name="Service1HttpPost">
        <operation name="HelloWorld">
          <input message="s0:HelloWorldHttpPostIn" />
          <output message="s0:HelloWorldHttpPostOut" />
        </operation>
      </portType>
      <binding name="Service1Soap" type="s0:Service1Soap">
        <soap:binding transport="http://schemas.xmlsoap.org/soap/http" style="document" />
        <operation name="HelloWorld">
          <soap:operation soapAction="http://tempuri.org/HelloWorld" style="document" />
          <input>
            <soap:body use="literal" />
          </input>
          <output>
            <soap:body use="literal" />
          </output>
        </operation>
      </binding>
      <binding name="Service1HttpGet" type="s0:Service1HttpGet">
        <http:binding verb="GET" />
        <operation name="HelloWorld">
          <http:operation location="/HelloWorld" />
          <input>
            <http:urlEncoded />
          </input>
          <output>
            <mime:mimeXml part="Body" />
          </output>
        </operation>
      </binding>
      <binding name="Service1HttpPost" type="s0:Service1HttpPost">
        <http:binding verb="POST" />
        <operation name="HelloWorld">
          <http:operation location="/HelloWorld" />
          <input>
            <mime:content type="application/x-www-form-urlencoded" />
          </input>
          <output>
            <mime:mimeXml part="Body" />
          </output>
        </operation>
      </binding>
      <service name="Service1">
        <port name="Service1Soap" binding="s0:Service1Soap">
          <soap:address location="http://localhost/HelloWorld/HelloWorld.asmx" />
        </port>
        <port name="Service1HttpGet" binding="s0:Service1HttpGet">
          <http:address location="http://localhost/HelloWorld/HelloWorld.asmx" />
        </port>
        <port name="Service1HttpPost" binding="s0:Service1HttpPost">
          <http:address location="http://localhost/HelloWorld/HelloWorld.asmx" />
        </port>
      </service>
    </definitions>
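Following those links programmatically is how a client discovers where and how to call the service. This sketch walks each <port> in the <service> element, follows its binding attribute to the matching <binding>, and reports whether that binding is SOAP or an HTTP verb, along with the address. WsdlPorts is a hypothetical name, and matching bindings by local name alone is a simplification that assumes they are all declared in the document's targetNamespace.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WsdlPorts
{
    static readonly XNamespace Wsdl = "http://schemas.xmlsoap.org/wsdl/";
    static readonly XNamespace Soap = "http://schemas.xmlsoap.org/wsdl/soap/";
    static readonly XNamespace Http = "http://schemas.xmlsoap.org/wsdl/http/";

    // List every <port>: its name, the protocol of the binding it points to,
    // and the address given by its soap:address or http:address child.
    public static List<(string Port, string Protocol, string Address)> Read(string wsdlXml)
    {
        XElement definitions = XElement.Parse(wsdlXml);
        var result = new List<(string Port, string Protocol, string Address)>();

        foreach (XElement port in definitions.Elements(Wsdl + "service").Elements(Wsdl + "port"))
        {
            // binding="s0:Service1Soap": keep the local part and find that <binding>.
            // Assumes every binding is declared in this document's targetNamespace.
            string bindingRef = (string)port.Attribute("binding");
            string bindingName = bindingRef.Substring(bindingRef.IndexOf(':') + 1);
            XElement binding = definitions.Elements(Wsdl + "binding")
                .First(b => (string)b.Attribute("name") == bindingName);

            // A soap:binding child means SOAP; an http:binding child names the HTTP verb.
            string protocol = binding.Element(Soap + "binding") != null
                ? "SOAP"
                : ((string)binding.Element(Http + "binding")?.Attribute("verb") ?? "unknown");

            XElement address = port.Element(Soap + "address") ?? port.Element(Http + "address");
            result.Add(((string)port.Attribute("name"), protocol,
                        (string)address?.Attribute("location")));
        }
        return result;
    }
}
```

Run against the HelloWorld document, it would report three ports, using SOAP, GET, and POST, all at the same .asmx address.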