We Need XML


Faced with the SGML committee's reluctance to modernize the standard, a group of Internet professionals approached the W3C (a relative newcomer to the standard-setting world) and proposed a slimmed-down version of SGML that would be usable on the Web while remaining compatible with HTML and SGML. The group worked mostly via e-mail through 1996 and 1997 and came up with a specification they dubbed the Extensible Markup Language, or XML. The group designed XML with 10 goals in mind:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

XML is a syntax that allows users to create markup languages. Languages that are used to create markup languages are commonly known as meta markup languages.

XML is a technical recommendation from the W3C. XML is owned by the W3C, not by any vendor, which means users aren't locked into any single platform or processing language.

XML is easy to learn and use. It is small, simple, and optimized for use on the Internet. Like SGML files, XML files are sequential text files that can easily pass through firewalls and be sent over existing networks.
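Because an XML document is plain text, moving it between systems amounts to moving bytes. The following Python sketch illustrates this with the standard-library `xml.etree.ElementTree` module; the `order` and `item` element names are hypothetical and chosen only for the example, and no actual network transfer takes place.

```python
import xml.etree.ElementTree as ET

# Build a tiny document in memory; the element names are hypothetical.
order = ET.Element("order", id="42")
ET.SubElement(order, "item").text = "widget"

# Serialize to UTF-8 bytes, the form that would travel over a network.
payload = ET.tostring(order, encoding="utf-8")

# The receiver only needs an XML parser to recover the same structure.
received = ET.fromstring(payload)
item_text = received.find("item").text

print(payload.decode("utf-8"))
print(item_text)
```

Nothing in the payload depends on the platform or programming language that produced it, so any system with an XML parser could read it.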

XML is also free. The W3C maintains a trademark on the term "XML" but provides the specification for free on its Web site at http://www.w3c.org. XML is just a syntax defined by the W3C specification; by itself, it doesn't actually do anything.

XML Myths

I have heard some claims about XML over the years. To explain what XML is, I find it helpful to point out what it is not—that is, to point out some commonly held misconceptions.

Myth: XML Is a Markup Language

XML is not actually a markup language, even though "markup" is part of its name. XML is a standard that specifies a syntax that allows you to create your own markup language. The markup language you create will depend on the task you are trying to accomplish.

Imagine a block of steel sitting in your driveway. In its current form, it doesn't do anything. However, you can apply a set of processes to the steel and create a car. Each type of car calls for its own particular set of processes.

You can apply a different set of processes to the same block of steel and create a bicycle. You used the same technology (steel), but a different combination of processes to achieve an alternative result. Now you have two ways to get to the store. (You can also take that same block of steel and create something else entirely: a toaster. You can't get to the store on a toaster, but you can make toasters fly through your screen saver.)

XML is like that block of steel. XML is the enabling technology, but it doesn't do anything by itself. HTML is a bicycle. The airline maintenance manual markup language is a car. The training course markup language is a toaster. All are applications of XML technology—just as the car, bicycle, and toaster are applications of steel technology.
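To make the analogy concrete, here is a minimal Python sketch that feeds two different markup languages to the same standard-library parser. Both vocabularies (`course` and `lesson`, `manual` and `task`) are invented for illustration and are not taken from any real standard. The point is only that the parser needs no prior knowledge of either tag set.

```python
import xml.etree.ElementTree as ET

# Two hypothetical markup languages; every element name here is invented
# for illustration and is not part of any published vocabulary.
COURSE_DOC = """<course title="Intro to XML">
  <lesson number="1">What is markup?</lesson>
  <lesson number="2">Elements and attributes</lesson>
</course>"""

MANUAL_DOC = """<manual aircraft="example">
  <task id="t1">Inspect landing gear</task>
</manual>"""


def vocabulary(xml_text):
    """Return the sorted set of element names used in a document."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()})


course_tags = vocabulary(COURSE_DOC)
manual_tags = vocabulary(MANUAL_DOC)

print(course_tags)
print(manual_tags)
```

The parser is the steel-working process here: the same call handles both documents, and each document brings its own set of tags.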

Myth: XML Is Only for the Web

XML was conceived and developed as a syntax for delivering content more effectively on the Web than HTML or SGML could. In the early stages of XML development, many people thought that companies such as weapons contractors and aircraft manufacturers would continue to use SGML to manage their complex document assets and would simply translate those documents to XML for delivery on the Web.

In fact, XML coeditor Tim Bray described XML as an "on-ramp" that people can use to enter the SGML highway. XML is a less threatening route than full-blown SGML and could therefore get people hooked on the joys of descriptive, hierarchical data markup. From XML, the jump to the rigors of SGML would be tolerable.

To create XML, the developers started with SGML and stripped out all the optional features. Then they made some hard decisions concerning backward compatibility with SGML so that XML users could use the tools that already existed to process SGML. And XML did start out as a proper subset of SGML. Since then, however, XML has added extensions such as namespaces and schemas that make it incompatible with SGML.

Also, since the adoption of XML, companies that used or were considering SGML have found that XML does almost everything that SGML does. Most SGML users weren't using many of the optional features in SGML anyway, so the fact that those features weren't available in XML wasn't a real problem. Applications that would have been ideal for SGML are now being implemented with XML instead.

Myth: HTML Is a Subset of XML

Is a car a subset of steel? No—it is an application of steel. Similarly, HTML is not a subset of XML, but it can be a language expressed in XML syntax if it follows a set of rules called well-formedness constraints.
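As a rough illustration of what well-formedness means in practice, the Python sketch below runs two versions of the same fragment through the standard-library XML parser. The helper name `is_well_formed` and the sample markup are assumptions made for this example. The check covers well-formedness only and says nothing about whether the markup is valid HTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML habits: an unclosed <br> and an unquoted attribute value.
html_style = '<p class=intro>Hello<br>world</p>'

# The same content with every element closed and the attribute quoted.
xml_style = '<p class="intro">Hello<br/>world</p>'

print(is_well_formed(html_style))
print(is_well_formed(xml_style))
```

Only the second fragment satisfies the constraints, even though a browser would happily display either one.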

A couple of years ago, someone sent me an e-mail asking me to send him a list of all the XML tags. This was impossible, of course, because XML does not define a language. It defines a syntax that allows you to create your own markup language, which can describe whatever information set you want. HTML is just one of those information sets.

Myth: XML Stands for "Excellent Marketing Language"

In September 1997, Microsoft Chairman Bill Gates called XML a "breakthrough technology." The press has been all over XML ever since.

XML generates a lot of interest in the press, and putting "XML" in your press release tends to get an unexceptional release more attention than it would get otherwise. I once saw a press release excitedly announcing the release of the "Question and Answer Markup Language, QAML." It had two elements: question and answer.

Another misconception about XML is that it is a government plot to tax us more. People point out that, in Roman numerals, XML equals 1040, the despised form U.S. citizens use to report to the IRS how much of their money they need to send to the government. This Roman numeral syntax isn't quite right, but hey, I'm just reporting what I heard. I've also heard that when you ask the department of motor vehicles for the XML license plate and tell them it stands for Extensible Markup Language, they'll send you a plate that says EML because they think you must have made a mistake.

And to clear up one last misconception: XML is not a t-shirt size.



XML and SOAP Programming for BizTalk(TM) Servers (DV-MPS Programming)
ISBN: 0735611262
Year: 2000
Pages: 150
