From Past to Present Platform


In the beginning, the universe was created. This has made a lot of people very angry and been widely regarded as a bad move.

—Douglas Adams

Like all things (even the universe), the Internet and the World Wide Web were not created perfect. When Tim Berners-Lee invented HTML and HTTP in 1990, he did it to make academic papers and other texts easily available to those who wanted to read them. This humble aim is hardly recognizable in the powerful, glitzy, customized, interactive sites of today, as companies sell their wares and plumb the depths of their corporate data for thousands of clients at a time. Still, the goal of data dissemination holds true. Needless to say, HTML 1.0 wasn’t designed with online markets in mind. Only in its sixth incarnation—XHTML 2.0—which came into being while this book was being written (http://www.w3.org/TR/xhtml2)—does it cater to the current and future needs of the Web development community. The same is true of HTTP 1.0. HTTP 1.1 plus authentication digest and cookies, along with HTTPS, provide the backbone of today’s online community.

A great deal has happened to HTML since its inception. Central to that development has been a change of thinking. In 1993, a Web site was thought of as a collection of associated pages, much like a book or a thesis. Today, a Web site is thought of as a Web-based application.

Web Applications

Faster than the adoption of the Web by users was the adoption of server-side programming for the Web by developers. Client-side scripting let us give users the impression of some personalized interaction when they visited a site, when in truth there was little, if any. The Common Gateway Interface (CGI) gave developers control over what users saw, but it was difficult to learn and left everything beyond the basic input/output connection between client and server to the programmer. It was only in 1996 when Philip Carmichael designed and implemented version 1.0 of Active Server Pages (ASP) that we had something to sink our teeth into. Then version 2.0 arrived the following year as part of the Windows NT Option Pack, and we realized just how much we could actually do.

What ASP and its contemporaries—JavaServer Pages (JSP), PHP, and ColdFusion—offered developers was the ability to glue all the resources they had for traditional server-based applications to a Web-based front end. Web sites were no longer simple collections of single pages. They had become individual applications whose appearance and content could be tailored to the user. Developers had a quick and reliable way of maintaining the state of a user’s session across the stateless HTTP protocol and could dynamically generate content tailored to that user by asking for and reacting to the user’s input. They could make use of information stored in databases and create compiled business components on the server side, leading to faster reactions to a user’s clicks and requests. They could design these components just like any other three-tier application. (See Figure 1-1.)

Figure 1-1: A Web application as a three-tier application

That we could approach Web development much as we did a windowed application was a revelation. Very quickly, huge sites (and not just e-commerce ones) appeared on the Web, enabled by server-side technology. They kick-started the dot-com boom.

XML

Soon after he published HTML 1.0, Tim Berners-Lee realized that the Web was becoming a digital Wild West with different lawmen (browsers, companies) making their own laws and developers struggling to provide the same sites to users under different jurisdictions. In 1994, he and Michael Dertouzos founded the World Wide Web Consortium (W3C) with the purpose of creating open standards for the Web that everyone should adhere to instead of building and using their own technologies. HTML was one of the first Web technologies they published as a standard (with HTML 3.2 coming in 1997), and a little over a year later, in February 1998, they released version 1.0 of the XML specification.

XML has been a cornerstone in the development of the programmable Internet, thanks to what it does and what it doesn’t do. Instead of being designed for a specific purpose, like HTML, which is designed to identify and mark up the various elements in a Web page, XML is designed as a metalanguage. That is, it provides the foundation for marking up anything we want—record collections, books, mathematical equations, chemical formulas, the contents of our databases, and so on—in plain text. By using XML to mark up data, we can make that data available in a platform-neutral format that can be shared online regardless of the operating system, database, and firewalls the host platform uses. All that’s required of the receiving system is the ability to parse the XML data and act on it. The parsing implies a slight performance hit, of course, but the gains of universally acceptable data far outweigh the hit.
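
To make the idea concrete, here is a minimal sketch in C# using the .NET Framework’s System.Xml classes. The record-collection markup is invented purely for illustration; any platform with an XML parser could read the same text.

    using System;
    using System.Xml;

    class ParseExample
    {
        static void Main()
        {
            // A hypothetical record collection marked up as plain text.
            string xml =
                "<collection>" +
                "  <record><artist>Miles Davis</artist>" +
                "  <title>Kind of Blue</title></record>" +
                "</collection>";

            // Any XML parser, on any platform, can load and navigate this.
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);

            foreach (XmlNode record in doc.SelectNodes("/collection/record"))
            {
                Console.WriteLine("{0} - {1}",
                    record["artist"].InnerText,
                    record["title"].InnerText);
            }
        }
    }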

A few other problems needed getting around, of course. The first was simply that two XML grammars might define a tag with the same name. How would parsers translate the tag? Namespaces were introduced to solve that problem, offering ways to define an unambiguous context within which to place tags. Together, XML and namespaces form the basis of all XML standards.
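
For instance, two grammars might both define a title element, one describing a book and one describing a person. The namespace URIs and prefixes below are invented, but the sketch shows how a parser (here through the .NET XmlNamespaceManager class) keeps the two apart.

    using System;
    using System.Xml;

    class NamespaceExample
    {
        static void Main()
        {
            // Two grammars both define <title>; the (invented) namespace URIs
            // tell the parser which grammar each element belongs to.
            string xml =
                "<order xmlns:bk='urn:example:books' " +
                "       xmlns:per='urn:example:people'>" +
                "  <bk:title>Programming Web Services</bk:title>" +
                "  <per:title>Dr</per:title>" +
                "</order>";

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);

            // The namespace manager maps prefixes to URIs for XPath queries.
            XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
            ns.AddNamespace("bk", "urn:example:books");
            ns.AddNamespace("per", "urn:example:people");

            Console.WriteLine(doc.SelectSingleNode("//bk:title", ns).InnerText);
            Console.WriteLine(doc.SelectSingleNode("//per:title", ns).InnerText);
        }
    }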

Even text itself presented developers with a few obstacles. Base64 encoding offered one solution for sharing pictures and sounds, but what about the encoding of the text itself? Every machine running Windows presented at least two encoding schemes—MS-DOS codepages and ANSI encoding—and many localized character sets. The solution was to base XML on the then-current version of Unicode (2.0) and have XML documents declare what character set and language their text was written in. XML parsers are required to understand Unicode and its most important encodings, so data can be safely exchanged between platforms. Now that Unicode 3.1 is stable, XML also needs to be updated, and sure enough, XML 1.1 is being prepared accordingly.
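
A small, hypothetical example of what the declaration buys us: writing a document with an explicit encoding produces a declaration that any conforming parser can detect and honor when it decodes the text. The file name here is made up.

    using System;
    using System.Text;
    using System.Xml;

    class EncodingExample
    {
        static void Main()
        {
            // Writing with an explicit encoding emits a declaration such as
            // <?xml version="1.0" encoding="utf-8"?> at the top of the file.
            XmlTextWriter writer = new XmlTextWriter("greeting.xml", Encoding.UTF8);
            writer.WriteStartDocument();
            writer.WriteElementString("greeting", "Grüße aus München");
            writer.WriteEndDocument();
            writer.Close();

            // Reading it back, the parser detects the declared encoding itself.
            XmlTextReader reader = new XmlTextReader("greeting.xml");
            reader.Read(); // the XML declaration
            Console.WriteLine(reader.Encoding.WebName); // "utf-8"
            reader.Close();
        }
    }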

XML Schemas

The biggest issue with XML is making sure that both sender and recipient agree on the grammar of the XML so they know how to mark it up at one end and parse it back into text at the other. XML itself has a few rules—for example, a document is well-formed only if no elements overlap—but only a few. The rest of the grammar was initially laid down in Document Type Definitions (DTDs), an idea inherited from XML’s parent language, Standard Generalized Markup Language (SGML). By creating a DTD, you specified the order in which data elements would appear in your document, the attributes they could be given, and the child elements they could have. An XML document could then be checked and validated against the DTD while being parsed.
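
As a rough sketch, here is how the .NET Framework’s XmlValidatingReader checks a document that carries its own DTD while parsing it. The order/item grammar is invented for the example.

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Schema;

    class DtdValidationExample
    {
        static void Main()
        {
            // A document carrying its own (hypothetical) DTD: an order must
            // contain exactly one item, which holds parsed character data.
            string xml =
                "<?xml version='1.0'?>" +
                "<!DOCTYPE order [" +
                "<!ELEMENT order (item)>" +
                "<!ELEMENT item (#PCDATA)>" +
                "]>" +
                "<order><item>Widget</item></order>";

            XmlValidatingReader reader = new XmlValidatingReader(
                new XmlTextReader(new StringReader(xml)));
            reader.ValidationType = ValidationType.DTD;
            reader.ValidationEventHandler += new ValidationEventHandler(OnError);

            // Validation happens while the document is being parsed.
            while (reader.Read()) { }
            reader.Close();
            Console.WriteLine("Finished validating.");
        }

        static void OnError(object sender, ValidationEventArgs e)
        {
            Console.WriteLine("Validation problem: " + e.Message);
        }
    }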

Like every other standard we’ve mentioned in this chapter, however, DTDs were only the first step toward the ideal and were far from perfect. They allowed us only to define content in terms of parsed or unparsed text, and, more important, they didn’t give us a means to map content to programming language data types such as integer or Boolean. Unless the systems on each end of a data exchange use the same means to convert numbers, dates, or Boolean values to text, we can’t guarantee how well, if at all, they will interoperate.

In 2001, the W3C released a successor to DTDs called XML Schemas that addressed the limitations of their predecessor (http://www.w3.org/XML/Schema). The key advance of XML Schemas is the ability to strongly type the data being passed as XML. There are 44 base types (known as XSD types, from the abbreviation for XML Schema Definition language) from which you can define any complex type for a piece of data and convert the data to and from text in a standardized manner. Indeed, as you read this, many W3C XML standards are being rewritten to make use of the strong typing made possible by XML Schemas.
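
The practical payoff is that sender and receiver agree on the lexical form of every value. As a small sketch, the .NET XmlConvert class reads and writes values using the XSD forms (xsd:boolean, xsd:double, and so on) rather than whatever the local culture settings dictate; the item element below is hypothetical.

    using System;
    using System.Xml;

    class XsdTypingExample
    {
        static void Main()
        {
            // Serialize typed values using the XSD lexical forms.
            string inStock = XmlConvert.ToString(true);   // "true"
            string price   = XmlConvert.ToString(12.5);   // "12.5"
            Console.WriteLine("<item inStock='{0}' price='{1}'/>", inStock, price);

            // The receiving end turns the text back into typed data the same way.
            bool   stockAgain = XmlConvert.ToBoolean(inStock);
            double priceAgain = XmlConvert.ToDouble(price);
            Console.WriteLine("{0}, {1}", stockAgain, priceAgain);
        }
    }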

Distributed Applications

Parallel to the development of Web-based applications, the industry also made inroads into the construction of distributed applications. Distributed in this case meant applications splitting their workload privately and securely over more than one machine using their system’s own Remote Procedure Call (RPC) protocol to call methods and send information from machine to machine to machine as if they were all the same machine.

However, not for want of trying, the architectures and frameworks on which distributed applications were built—DCOM, CORBA, Java RMI—worked exclusively with themselves. The machines were tightly coupled together, and each architecture had its own RPC protocol, message format, and message description language. That is, an application written in CORBA would not cooperate with—or, indeed, understand—the same application written in DCOM or Java RMI on another machine, and vice-versa.

More important, your machines’ setups had to be almost identical to get a distributed application to work at all. All three architectures existed on several operating systems, but you still faced an uphill struggle to reconcile different data types, security systems, and debugging environments. For Java users, this was slightly less of a problem, but in general, such applications weren’t as distributable as you might have wished.

Building the Platform

Rather than trying to build a replacement for distributed application frameworks and protocols, those working to evolve the Web application beyond n-tier applications realized they could take the idea of a distributed application’s binary calls across machine boundaries and turn them into platform-neutral calls—matching a Web application’s indifference to platform and operating system. By 1998, the term Web services had been coined by either Andrew Layman or John Montgomery of Microsoft (they both think the other guy did it) to characterize this Web-based framework, and the model for the platform that enabled this ability was given some serious thought. Like any other distributed application framework, it needed the following:

  • A platform-independent format language for structured data exchange

  • A way of describing the structure of the data being exchanged

  • A standard method of packaging the data for transmission over the Internet

  • A way for Web services to describe their public interface to clients

  • A framework for programmatically locating Web services via their capabilities or description

The model was sound, but of course large pieces of it didn’t exist yet. Developers already had XML and XML Schemas for the transmission and description of data, and they had HTTP as a transport protocol. All of these were common to all systems, but how could they transform RPC calls over a proprietary protocol into something any system could receive and understand?

SOAP

The answer once again was to use XML, and in 1999 version 0.1 of the Simple Object Access Protocol (SOAP) was released by Don Box, Tim Ewald, and a few others. Now in version 1.2, SOAP describes a message framework for a function call and answer from one machine to another in the manner of RPC but formatted as an XML text stream rather than as a binary call. A SOAP message, be it a request or a response, is written in plain XML text and adheres to the SOAP standard, so any system can understand what it says and act accordingly. SOAP 1.2 even caters to the serialization of complex objects into text-based XSD-typed collections of their properties. Thus you can make calls in SOAP to any remote method regardless of the complexity of the method’s parameters or return type.
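
To see just how plain that text is, here is a rough sketch of a SOAP 1.1 call hand-rolled with nothing more than the .NET HttpWebRequest class. The endpoint URL, method name, and namespace are all invented; in practice a toolkit builds and sends the envelope for you.

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class RawSoapCall
    {
        static void Main()
        {
            // A hand-built SOAP 1.1 request envelope. The method name,
            // parameter, and namespace are hypothetical.
            string envelope =
                "<soap:Envelope " +
                "xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>" +
                "<soap:Body>" +
                "<GetPrice xmlns='urn:example:catalog'>" +
                "<title>Kind of Blue</title>" +
                "</GetPrice>" +
                "</soap:Body>" +
                "</soap:Envelope>";

            // The endpoint URL is invented for illustration.
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
                "http://localhost/catalog/service.asmx");
            request.Method = "POST";
            request.ContentType = "text/xml; charset=utf-8";
            request.Headers.Add("SOAPAction", "urn:example:catalog/GetPrice");

            byte[] body = Encoding.UTF8.GetBytes(envelope);
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(body, 0, body.Length);
            }

            // The response is simply more XML text coming back over HTTP.
            using (StreamReader reader = new StreamReader(
                request.GetResponse().GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }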

SOAP has developed rapidly since its inception, but it has stuck to one of its goals—to keep things simple. You’ll see how simple a little later on.

WSDL and UDDI

With a way for client and service to converse and exchange data, the next question was how to describe the interface between the two and the required resources: the type and structure of the calls, required parameters, return values, protocol bindings, and so on. Another XML grammar called Web Services Description Language (WSDL) was developed for this task. However, unlike SOAP, WSDL is not so simple that you can write a WSDL document in a text editor. Instead, most Web service toolkits contain a tool that generates the WSDL description of a service and its methods for you. WSDL does perform the task it was designed for, but it is still evolving and becoming (we hope) more straightforward to use. We’ll look at it in detail in Chapter 3.

With WSDL able to describe the service details to potential clients, all that was needed was a way to discover that description once the service was published. In other words, Web search engines tailored specifically to locating services and their WSDL were required. The means to build them arrived in the form of Universal Description, Discovery, and Integration (UDDI).

UDDI is the standard for Web service cataloging. Much like a business directory, a UDDI server lets you store the contact locations of your Web service, a broad description of the Web service’s purpose, and the location of the service’s WSDL documents. For example, you can create a Web service to return shipping costs for DVDs, return a weather report, or return the string ‘Hello World’. Then you can create an entry on a UDDI server to let the world know about your Web service.

Developers looking for a service with certain functionality can also use UDDI servers to help them find what they seek. For these clients, the servers act as an open directory supporting idle browsing and directed searches by name, business type, or binding template. If the developer finds a matching service, he can follow the link to the service’s WSDL document for more information.

So Where Are We Now?

Every platform and development language now has some effort ongoing toward the use of Web services. It wasn’t difficult to see that Web services could affect a great many systems, be they Internet, intranet, or extranet connected. The applications are not limited to returning simple information or performing simple functions, as you can see in Figure 1-2.

Figure 1-2: Web services can be at the heart of everything.

The continuing growth of available bandwidth and network communication speed across LANs, WANs, and the Internet means that calls to such services will not take very long. Legacy data systems can be integrated into enterprise LANs at much less cost than having their data ported to a more contemporary setup. Businesses can expose their information and expert systems to the public and other companies, and they can expose additional functionality as Web services as well. Employees working off site can use their company’s system through a Web service interface. When a company updates its system, those consuming its services can make use of the updates automatically. Unlike components in DCOM applications, which talk to each other over the wire and nothing else, these Web services have taken the concepts of application service providers, distributed applications, open standards, and platform agnosticism and rolled them all into one grand scheme. In short, they have the potential to alter the way we think about and develop applications.

With the plumbing (XML, SOAP, WSDL, UDDI, HTTP) more or less complete, the universal adoption of Web services is assured as long as those developing the applications keep to the following three tenets:

  • Systems are loosely coupled, connected by nothing more than SOAP messages transmitted over HTTP or another open transport protocol (such as TCP or SMTP).

  • A service must be described in a widely supported open interface definition language (such as WSDL).

  • If service and client need to exchange data, the exchange must be done in a universal data format with agreement on how data types are serialized (using XML and XML Schemas, for example).

With Web service toolkits appearing for most development platforms, programmers now have the option of either working with the plumbing directly or letting the toolkits’ APIs take care of the boilerplate code. It’s an interesting time, and nowhere more so than in the Microsoft camp. First they gave us the SOAP Toolkit for COM developers, and now we have .NET.

.NET

When Microsoft announced .NET as the replacement for COM and DCOM, one new feature it placed front and center was its intrinsic support for Web services. Whereas the SOAP Toolkit gave COM developers the ability to create and consume Web services, the .NET Framework and Visual Studio .NET made it considerably easier to do both. The creation and transmission of a request to a Web service were reduced to a single method call, if you chose to leave it at that. The .NET Web service classes are also open enough that you can alter almost any aspect of a call or its XML if you want.
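
As a sketch of how little that single method call demands of you, the following hypothetical .asmx service (the class, namespace URI, and method are all invented) is everything .NET needs in order to expose a SOAP endpoint and generate its WSDL description.

    // Server side: a minimal .asmx Web service compiled against
    // System.Web.Services.dll. Names are invented for illustration.
    using System.Web.Services;

    [WebService(Namespace = "urn:example:catalog")]
    public class CatalogService : WebService
    {
        // The [WebMethod] attribute is all that is needed to expose the
        // method over SOAP; .NET handles the envelope and the WSDL.
        [WebMethod]
        public double GetPrice(string title)
        {
            // A real implementation would consult a database here.
            return 12.5;
        }
    }

On the client, a proxy class generated by Visual Studio .NET or the SDK’s wsdl.exe tool reduces the whole exchange to a single line such as double price = new CatalogService().GetPrice("Kind of Blue");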

Web services are not just a part of the .NET Framework; they’re at the heart of Microsoft’s strategy for future application development. Microsoft’s n-tier design for enterprise applications hasn’t changed the division between data, business logic, and presentation logic, but it has refined the categorization of that logic into seven service layers. Figure 1-3 shows the .NET model for a distributed enterprise application.

Figure 1-3: The model for a .NET distributed enterprise application

In this model, the Web services layer acts as an intermediary between the presentation and business rules tiers, receiving and returning input and data from clients and acting accordingly. Of course, this assumes that the presentation and business rules tiers are on different machines that are connectable over some type of network. If they’re both on the same machine, the Business façade layer takes on this middleman role.

One of the first products Microsoft tried to bring out under the .NET banner was a set of consumer-oriented Web services that replaced and enhanced the Microsoft Passport and Wallet products. However, .NET My Services, as it was known, was rejected by potential users, who were not convinced that Microsoft could provide and be trusted with a secure central data store on which the services would run.

Public trust aside, the protracted rollout of .NET-based servers and applications means that the Microsoft world will be a place where almost every document is in XML and every server can expose some of its functionality as a Web service that applications can make use of remotely. As far as Microsoft is concerned, interoperability is the key to the future.



