In some of your past mainframe assignments, you may have had the pleasure of exploring the Electronic Data Interchange (EDI) format. [15] You may have had an assignment where you needed to exchange data with a business partner. After struggling with delimited and positional files, perhaps you and your business partner both agreed that EDI would be "the" format. You found an EDI specification that matched your needs. While your systems department was setting up your expensive EDI server, you began to learn the cryptic EDI document specifications. Finally, after much effort and preparation, you formatted your data accordingly, and the data exchange was then established.
Yes, legacy applications (particularly business-to-business applications) survived with EDI. However, XML has taken the idea of data packaging and data description many steps forward.
| Note | Those of you (lucky ones) who never touched EDI have probably worked with COBOL copybooks, the COBOL Data Division file descriptions (FDs) and record descriptions, and COBOL PICTURE clauses and VALUE clauses. Well, what you can do with these COBOL legacy technologies you can do with XML. However, the opposite is far from true; let's just say that XML is almost in a league of its own. | 
You have used these COBOL legacy technologies as a way to design, package, and deliver data (even if just within your COBOL program), right? The COBOL syntax (or the EDI specification) has helped document the electronic transport of data. Likewise, XML helps you accomplish similar objectives within a program, but more important, it helps you do so outside your programs and outside your company.
You can understand the significance of XML if you first realize that Microsoft has integrated their entire .NET product offering with XML. It is virtually impossible to find a portion of .NET that does not use XML; it is everywhere, inside and out. So, what is XML? XML is a complete set of rules for designing, representing, and delivering text-based, delimited, self-documenting, structured data. Now, that was a mouthful, which reminds me of another point: XML is referred to as being "verbose by design" (as a compliment).
| Cross-Reference | The W3C has several Recommendations as well as Proposed Recommendations in progress for the various XML related technologies. You can find more information about these Recommendations and the various kinds of W3C technical reports available for XML at http://www.w3.org/TR/#About . | 
Looking at the sample XML file in Figure 4-5, you will notice that XML looks similar to HTML when you view it in the Notepad editor.
 
  Figure 4-6 presents the same XML file in the IE browser after it's been double-clicked. Notice that the same file is displayed in two separate browser windows to demonstrate the ability to collapse and expand each XML element.
 
  You can type XML statements into a regular text file using Notepad. [16] Then, save the file with an .xml extension. That's it. When you double-click it, the XML parser that lives in IE parses it and creates the collapsible XML tree structure display shown in Figure 4-6. Although XML looks like HTML, XML differs from HTML in a couple of ways:
With XML, you have the flexibility to create your own names for your tags. For example, a valid XML element could look like this: <createYourOwnName></createYourOwnName>. With HTML, however, you must use the standard, predefined tags.
With XML, you are packaging and documenting data, whereas with HTML you are formatting a screen display.
One additional difference between XML and HTML is that XML must be well-formed. This is one of the main reasons that the W3C has created XHTML, or well-formed HTML. To give you an idea of what it means for XML to be well-formed, consider the following list of rules:
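The XML document should begin with the XML declaration ( <?xml version="1.0"?> ). XML 1.0 makes the declaration optional, but when it is present it must be the very first thing in the document.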
Any elements must not overlap, but they may nest.
Elements must have both a start and end tag if there is data in the element.
Empty elements must end with a /> if only a single tag is used.
Quote marks must be used on attribute values.
I present this list of rules just to give you a sense of what well-formed means. There are more rules that you will want to learn about. Also, the rules for well-formed HTML are slightly different. As you continue your retraining exploration, be sure to keep these points in mind, and for more information, please visit http://www.w3.org/TR/REC-xml#sec-well-formed .
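If you would like to see these rules enforced rather than just read about them, here is a minimal sketch in Python (not a .NET language, just a convenient way to reach an XML parser from a few lines of code). It assumes only Python's standard xml.etree.ElementTree module; the sample documents and the is_well_formed helper name are my own inventions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: two follow the rules listed above, three break them.
samples = {
    "nested elements": '<?xml version="1.0"?><order><item>Widget</item></order>',
    "empty element closed with />": '<order><item sku="A1"/></order>',
    "overlapping elements": "<order><item>Widget</order></item>",
    "missing end tag": "<order><item>Widget</order>",
    "unquoted attribute value": "<order><item sku=A1/></order>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

The parser rejects a document at the first violation it finds, which is the practical meaning of "must be well-formed": there is no partial credit, unlike the forgiving way browsers treat sloppy HTML.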
| Note | In the previous chapter and this one, I reiterated the importance of "knowing" your browser. I pointed out that you really want to know the type and version of browser that sits on your desktop. Well, even with XML, your browser choice will have an impact. Your browser (type and version) choice influences what version (if any) of an XML parser and XSL processor you may already have, not to mention that it impacts the HTML, DHTML, and scripting-related technologies discussed earlier. By the way, you can install an XML parser separately, without having any specific browser to go along with it. | 
To give your XML learning experience a little kick start, I have included the following sampling of the basic XML terminology:
XML: XML, or Extensible Markup Language, is a markup language that allows you to create your own tags to document the data contained within the tags. Both humans and machines can then read the entire XML document. Additionally, XML files are text based, which makes it easy to transmit them over HTTP and through security firewalls. (For more information, visit http://www.w3.org/XML/ .)
XML declaration: <?xml version="1.0"?> is the one line that you should place first in your XML document; when it is present, nothing may come before it. Beyond that, there are many other terms and syntax rules to dig into. Be on the lookout for terms such as "elements," "attributes," "data islands," "entity," "template," and "nodes." Please take advantage of the references that I have provided for your continued learning at the end of the chapter.
XML Schema and XML Schema Definition (XSD) : XML Schema is to XML as COBOL copybooks are to mainframe VSAM files and QSAM files. You can create your own customized schemas. In addition, there are "approved" schemas for most major industries (similar to the proprietary EDI specifications). XSD, simply put, is the W3C language standard that you use to define XML Schemas (which have their own W3C Recommendation). The XML Schema defines the structure and semantics of the well-formed XML documents that you will be creating. XML Schemas and XSD replace the older approach of document type definition (DTD) and XML-Data Reduced (XDR).
| Cross-Reference | You can find more information about the W3C's XML Schema Recommendation at http://www.w3.org/XML/Schema . | 
Although I referred to the topics just listed as "basic XML terminology," allow me to point out that I am using the word "basic" rather loosely. Be prepared to devote some time to these "basic"-level XML topics during your retraining effort. Later, you may need to extend your XML knowledge with other, more advanced topics. When that time comes, the following section should be useful.
The XML technologies are so vast that a person could easily specialize in this area and remain there. For your immediate purposes (to learn .NET for Windows and Web programming), I suggest learning the basics of XML and XML Schemas. Then revisit the remaining XML topics later. Considering the overwhelming presence that XML has in the industry, you will not want to exclude these additional XML-related technologies from your future training.
The upcoming sections discuss these advanced topics:
XSL, XSLT, XPath, and XQuery
DOM, SAX, and SOM
MSXML
Generally speaking, Extensible Stylesheet Language (XSL) is a W3C technology that you use to transform XML into HTML. XSL is really much more than that, though. To begin with, XSL is three interrelated technologies combined into one: XSL Transformations (XSLT), XSL Formatting Objects, and XML Path Language (XPath). Together, these XSL technologies offer a very flexible and powerful way to format, calculate, arrange, add to, delete from, and otherwise transform XML data into HTML [17] for display. The style sheet processor and formatter used with XSL are built right into your Web browser.
| Cross-Reference | For more information about XSL and XSLT, visit http://www.w3.org/Style/XSL/ and http://www.w3.org/Style/XSL/WhatIsXSL.html, respectively. | 
Specifically, XSL is an XML-based language that you use to express style sheets. Your XML document and your XSL style sheet are processed together by the XSL style sheet processor to create the desired display result. This process involves the transformation and formatting of the XML document. The XSL Formatting Objects and formatting properties assist in the actual formatting semantics. To manipulate your XML beyond the Formatting Objects' capabilities, you can use the XPath language. You use XPath to navigate the hierarchical node structure of your XML document to perform conditional testing, expressions, functions, and logic for varying desired results.
| Cross-Reference | For more information about XPath, go to http://www.w3.org/TR/xpath . | 
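To get a feel for XPath-style navigation, consider the following minimal sketch. It assumes Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath (path steps, descendant searches, and simple attribute predicates), so treat it as a taste of the idea and not a full XPath processor. The catalog document and its element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; names and values are invented for illustration.
CATALOG = """<?xml version="1.0"?>
<catalog>
  <book id="b1" language="COBOL"><title>Batch Basics</title><price>30</price></book>
  <book id="b2" language="VB"><title>Forms First</title><price>45</price></book>
  <book id="b3" language="COBOL"><title>Copybook Craft</title><price>25</price></book>
</catalog>"""

root = ET.fromstring(CATALOG)

# Path steps: every <title> that is a child of a <book> directly under the root.
all_titles = [t.text for t in root.findall("./book/title")]

# Attribute predicate: only the <book> elements whose language attribute is COBOL.
cobol_titles = [b.findtext("title") for b in root.findall("./book[@language='COBOL']")]

# Descendant search: every <price> anywhere below the root.
prices = [int(p.text) for p in root.findall(".//price")]

print(all_titles)
print(cobol_titles)
print(sum(prices))
```

Each expression walks the hierarchical node structure from the root downward, which is the same style of navigation that an XSLT style sheet relies on when it selects the nodes to transform.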
On the other hand, you might use XML Query (XQuery) instead of XPath. XQuery makes it possible to query XML documents much as you would an actual relational database. It has its own syntax that some claim is easier than XPath. I liken it to the comparison between programming against a mainframe IMS database and programming against a mainframe DB2 database. With IMS, you had a hierarchical approach to querying for data. However, with DB2 you had a more direct, dynamic approach due to the relational database structure. Given this comparison, my bet is on the direct, dynamic approach.
| Cross-Reference | For more information about XQuery, go to http://www.w3.org/XML/Query . | 
It has been said that a picture is worth a thousand words. I have provided a few figures to help further explain the XML technologies. First, as shown in Figure 4-7, an XML document is displayed as-is using the Notepad editor. Following that figure, you will notice in Figure 4-8 that I have prepared the XSLT that will be used to transform the XML. It is shown here using the Notepad editor. Lastly, in Figure 4-9, the transformed HTML is shown in the Web browser. I trust that Figures 4-7 through 4-9 will help introduce the XSL technologies.
 
   
   
  To further appreciate what XSL has to offer, imagine that you have created a very informative Web site with your best HTML skills [18] (using lots of the HTML <p></p> tags for your data). Now, let's say that you want to create two versions of your HTML (static) Web site. Each Web site will display mostly the same content, but the content will be formatted very differently on each site. The format will be so different, in fact, that even the use of CSS is not a practical choice. Therefore, you go into development mode.
First, you edit the old HTML file to copy and paste your content (data). Next, you create the second page and paste in your content. You deploy both HTML pages to their respective Web sites. Great, mission accomplished. Congratulations, you have just created a maintenance problem. When the time comes to update your "static" data, you will now need to maintain both versions.
Let's say that you later learn about XML and the XSL technologies. You open one of your HTML files and copy the data (again), except this time you paste your data into a new XML document, making the appropriate changes to meet the well-formed XML requirement. Next, you create separate XSLT for each Web site as per each Web site's formatting requirement. Then, you copy the respective XSLT to each Web site. Each distinct XSLT will read the same XML data and output uniquely formatted HTML for a customized Web site display. Now, when it is time to maintain the data, you edit your one XML document and affect the multiple Web site implementations. Later, you can read the same XML document for other Web sites (or other devices), all the while not disturbing the original two Web sites. This is just one example of the value that XSL can bring to XML. The list of benefits is not short.
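The "one XML document, many displays" idea can be sketched in a few lines. Please note the assumption loudly: the following Python code is not XSLT. Python's standard library has no XSLT processor, so two ordinary functions stand in for the two style sheets. The content document, its element names, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content; the element names are invented for this sketch.
CONTENT = """<articles>
  <article><title>Retraining Tips</title><body>Start with the basics.</body></article>
  <article><title>XML &amp; You</title><body>One source, many displays.</body></article>
</articles>"""


def read_articles(xml_text):
    """Return (title, body) pairs from the single shared XML document."""
    root = ET.fromstring(xml_text)
    return [(a.findtext("title"), a.findtext("body")) for a in root.findall("article")]


def render_as_list(articles):
    """Stand-in for the first site's style sheet: a bulleted list."""
    items = "".join(
        f"<li><b>{escape(title)}</b>: {escape(body)}</li>" for title, body in articles
    )
    return f"<ul>{items}</ul>"


def render_as_table(articles):
    """Stand-in for the second site's style sheet: a two-column table."""
    rows = "".join(
        f"<tr><td>{escape(title)}</td><td>{escape(body)}</td></tr>"
        for title, body in articles
    )
    return f"<table>{rows}</table>"


articles = read_articles(CONTENT)
site_one_html = render_as_list(articles)
site_two_html = render_as_table(articles)
print(site_one_html)
print(site_two_html)
```

Editing CONTENT once changes both outputs, which is the maintenance benefit described above. A real XSLT style sheet expresses the same mapping declaratively, in XML, instead of in procedural code.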
Recall that the DOM is also used for HTML. When you want to dynamically access and/or update the structure, content, and style of XML (or HTML) documents, the DOM is available for you to exploit. Because the DOM is a W3C Specification, it is appropriate for multibrowser use. A similar technology is Simple API for XML (SAX). SAX is also multibrowser-friendly. However, SAX is not a W3C Specification (yet).
Both SAX and DOM have specific advantages and disadvantages. Depending on your application and design, you may want to choose one or the other. By the way, when you are done reading up on SAX and the DOM, remember to look up Schema Object Model (SOM), a complementary technology for the DOM.
| Cross-Reference | For more information about SAX, the DOM, and SOM, go to http://msdn.microsoft.com/library/en-us/xmlsdk/htm/sax_concepts_8kaa.asp, http://msdn.microsoft.com/library/en-us/xmlsdk/htm/sax_concepts_0v1p.asp, and http://msdn.microsoft.com/library/en-us/xmlsdk/htm/som_devguide_overview_73g7.asp, respectively. | 
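To make the trade-off between the two approaches a little more concrete, here is a minimal sketch that reads the same document both ways. It assumes Python's standard xml.dom.minidom and xml.sax modules as stand-ins for a DOM and a SAX implementation (MSXML exposes its own interfaces, which are not shown here). The orders document and the ItemHandler class are hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document.
DOC = (
    "<orders>"
    "<order id='1'><item>Widget</item></order>"
    "<order id='2'><item>Gadget</item></order>"
    "</orders>"
)

# DOM: the whole document is loaded into an in-memory tree that you can
# navigate in any order and also update.
dom = xml.dom.minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]

first_item = dom.getElementsByTagName("item")[0]
first_item.firstChild.data = "Sprocket"  # change the text in place
updated_xml = dom.documentElement.toxml()


# SAX: the parser streams through the document once and calls your handler
# for each event; no tree is kept, so there is nothing to update afterward.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items

print(dom_items)
print(sax_items)
print(updated_xml)
```

Generally speaking, the DOM style is the natural fit when you need random access or in-place changes, while the SAX style suits a single forward pass over a large document where holding the whole tree in memory would be wasteful.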
MSXML is the name of Microsoft's XML parser. This is the parser that I mentioned earlier that does all of the XML magic for you in the browser. The actual translation of the acronym MSXML used to be Microsoft XML Parser. Now, the translation is Microsoft XML Core Services. My guess is that they changed the name to more accurately reflect the impressive power and functionality of MSXML.
At the time of this writing, MSXML 4.0 is the latest version of the parser and is available for download from the MSDN site. At some future date, it is likely that MSXML will be included with a version of the IE browser. Today, the default MSXML parser that comes with IE is an older version (version 2.0). However, it is simple to upgrade to one of the newer MSXML parsers (version 2.6 or 3.0) if you have IE 5.5 or later.
| Cross-Reference | For more information about MSXML, go to http://msdn.microsoft.com/library/en-us/dnmsxml/html/dnmsxmlnewinjuly.asp . | 
You should become comfortable with XML because I discuss it further in the following chapters:
Chapter 11 covers accessing data with the new .NET technology ADO.NET. Guess what? When you use ADO.NET to return data from a Web service, it is in XML format. How convenient!
Chapter 12 explores the topics of relational data and accessing it using T-SQL, and introduces the subject of Microsoft SQLXML 3.0 technology. With SQLXML, you are able to query your database and receive data in XML format. Being comfortable with XML will certainly help you leverage the new opportunities that SQLXML offers.
Chapter 13 covers Web services and explores three technologies that Web services depend on: Simple Object Access Protocol (SOAP), Web Services Description Language (WSDL), and Universal Description, Discovery, and Integration (UDDI). Here's the catch: If you haven't established a good comfort level with XML, you'll feel very uncomfortable during the discussion about Web services and its supportive technologies.
Chapter 18 presents examples where you'll need to configure some aspect of your .NET environment (for example, manually editing a file in your .NET development environment). You can bet that you will be editing an XML file.
As noted in the preceding section, the use of XML is apparent in several portions of the .NET platform. It was this fact that convinced me to include XML in this .NET prerequisite chapter. My most recent example of needing to know XML came as I was installing the beta version of Fujitsu's NetCOBOL for .NET [19] to integrate it into the VS .NET environment. Fujitsu's installation instructions included the following step:
". . .you must incorporate the configuration changes found in the file Examples\Web\web.config into your machine's global configuration file."
The \web.config file contained XML code similar to the following:
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <system.web>
    <compilation debug="false" explicit="true" defaultLanguage="vb">
      <compilers>
        <compiler language="COBOL;cob" extension=".cob"
                  type="Fujitsu.COBOL.COBOLCodeProvider, Fujitsu.COBOL.CodeDom,version=99.9.9.9, Culture=neutral,PublicKeyToken=999999999999999" />
      </compilers>
    </compilation>
  </system.web>
</configuration>
Next, I opened up my machine's global configuration file ( \CONFIG\machine.config) to discover that there were a total of 793 lines of XML code. Interesting. So, where should I make the change mentioned in the Fujitsu NetCOBOL for .NET installation document? Well, if you understand XML, the answer is obvious. Do you get my point? Yes, you will need to learn XML.
Because I felt comfortable editing XML, I proceeded to update my machine's global configuration file. For my first step, I made a complete backup copy of my machine.config file (just to be safe). I then opened the file (I used VS .NET, but I could have used Notepad) to begin the navigation down into the 793 lines of XML. The goal now was to look for the node names mentioned in Fujitsu's XML sample. For example, first I looked for <configuration>, which was on the second line. Then I looked for <system.web>, which was on line 113. Finally, I looked for <compilers>, which was on line 157. Bingo! To finish the process off, I copied and pasted the <compiler ... /> XML line below the other similar lines that were there already.
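The same navigation can be expressed in code, which may help show why the nesting of the node names matters. The following is a minimal sketch in Python using the standard xml.etree.ElementTree module. It works on a tiny in-memory stand-in and not on a real machine.config, and the existing compiler entry, the type attribute values, and the child helper are hypothetical placeholders of my own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a global configuration file.
MACHINE_CONFIG = """<configuration>
  <system.web>
    <compilation debug="false" explicit="true" defaultLanguage="vb">
      <compilers>
        <compiler language="vb" extension=".vb" type="Example.VBProvider" />
      </compilers>
    </compilation>
  </system.web>
</configuration>"""

# Placeholder for the vendor-supplied line; the type value is invented.
NEW_COMPILER = (
    '<compiler language="COBOL;cob" extension=".cob" type="Example.COBOLProvider" />'
)


def child(parent, tag):
    """Return the first direct child with the given tag, or raise LookupError."""
    for element in parent:
        if element.tag == tag:
            return element
    raise LookupError(f"<{tag}> not found under <{parent.tag}>")


root = ET.fromstring(MACHINE_CONFIG)
assert root.tag == "configuration"

# Walk down through the same node names that the vendor sample is nested in.
system_web = child(root, "system.web")
compilation = child(system_web, "compilation")
compilers = child(compilation, "compilers")

# Add the new entry below the similar lines that are already there.
compilers.append(ET.fromstring(NEW_COMPILER))

languages = [c.get("language") for c in compilers]
print(languages)
```

The point of the sketch is the path: the new <compiler> line belongs wherever the vendor sample's parent elements already exist in the target file, and once you can read XML, locating that spot is a mechanical exercise.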
My guess is that this vendor's assumption that the reader is comfortable editing an XML file is a reasonable approach to take. After all, other platforms (such as the mainframe) assume that certain technologies are known. Take JCL, for example. Recall how JCL tended to be rather ubiquitous. It was common for mainframe programmers to use JCL for everything, so it was assumed that every mainframe programmer knew JCL. Yes, JCL was used for "packaging and describing" your program execution and request for resources. XML, on the other hand, is used to package and describe your data. Nevertheless, XML is every bit as omnipresent on the .NET platform as JCL was on the mainframe.
[15] You may know the EDI standard by a different name: The American National Standards Institute (ANSI) and The Accredited Standards Committee version is "X12"; the European version is Guidelines on Trade Data Interchange (GTDI); and The International Organization for Standardization (ISO) and United Nations' version is called Electronic Data Interchange for Administration, Commerce, and Transport (EDIFACT).
[16] Later you will use VS .NET for most of your programming and editing. However, you will still find Notepad useful for some quick and dirty editing, especially in emergency situations when you need an editor and no other editor is installed on the server that you happen to be working on.
[17] XSLT can transform XML into text-based formats other than HTML.
[18] Even when you are creating powerful, dynamic Web pages with ASP.NET, there is still much value in what the XSL technologies offer.
[19] I promise I will spend almost a whole chapter (Chapter 6) talking about Fujitsu's NetCOBOL for .NET product. After all, it is one of the .NET language choices.
