1.1 XML

XML is an open-specification, platform-independent, extensible, and increasingly successful profile of the Standard Generalized Markup Language (SGML) [ISO 8879].

The World Wide Web Hypertext Markup Language (HTML) is also a profile of SGML. HTML markup is primarily concerned with the appearance of the material. In addition, HTML lets a particular browser render some standard types of information in a specific format (appearance) [HTML]. In contrast, XML markup is primarily concerned with the user-extensible structure of the data [XML].

One of the basic goals of XML was to make serving, receiving, and processing general information on the Web as simple as HTML had made it for display information. XML design goals include ease of implementation and the ability to use SGML tools with XML. The uses of XML go well beyond Web pages, however. They extend not just to static documents but also to general protocol messages between computer processes and to general data storage within computers.

In many of its newly emerging roles, XML needs security. That is, it must be able to authenticate information and/or keep it confidential.


The term "markup" has its origins in the publishing industry. In traditional publishing, markup happens after the writing is complete but before the book goes to typesetting. An editor annotates the text with handwritten instructions for the typesetter. These instructions, which specify the layout, are known as markup. Many contemporary word processing programs insert electronic markup automatically as the user creates the text.

1.1.1 Origins of XML

The World Wide Web Consortium [W3C] established the XML Working Group in 1996. The W3C is the lead organization for the development and maintenance of interoperable specifications for the content of the World Wide Web. Its membership includes representatives from universities, technical organizations, and industry. The consortium is responsible for the development of the XML 1.0 W3C Recommendation [XML] and has change control over XML. The XML working group was initially known as the SGML Editorial Review Board.

The Web originated as a way to publish scientific documents with HTML as its markup language. At that time, HTML had only a few dozen tags. Today, most of the billions of Web pages in existence continue to use HTML, but the Web has grown into a full-fledged interactive medium supporting such diverse applications as e-mail, electronic banking and commerce, sophisticated search engines, streaming video, voice interaction, and multicasting.


Because of its publishing and "markup language" background, XML uses the term "document" very liberally. Do not be surprised to see the term "document" when, in other computer contexts, you might see "message," "object," "PDU" (Protocol Data Unit), or the like.

HTML has grown, along with the expanding uses of the Web, well beyond its initial design. The current version of HTML, version 4.01, is a complex language containing approximately 100 tags, and there has been pressure to add more. Additionally, the browser wars between Netscape Navigator and Microsoft Internet Explorer, and side skirmishes with other browsers, have littered the battlefield with inconsistencies and quirks.

Despite its phenomenal success, HTML has a number of shortcomings:

  • HTML generally assumes that any additional tags will be standard and universally known, at least to browsers supporting that version or feature. No standard mechanism provides for private tag use between consenting parties.

  • It takes many tags to format a page supporting current Web technologies. Downloading and displaying such a page can be time-consuming.

  • Devices such as a Palm Pilot or smart phone are not as powerful as a modern PC. They can generally process standardized, well-formed HTML. Unfortunately, these devices cannot adequately process extended and malformed HTML or HTML with the embedded scripting languages that are in frequent use today. (Because it was a competitive advantage in the browser wars to try to do something reasonable with syntactically incorrect HTML, most browsers have such a "feature.")

  • Adding more universal tags to an already burdened language might have less than satisfactory results. Some applications would benefit from having to know about fewer tags.

You can use XML to express a diverse range of information: for example, the contents of a letter or Web page, or a generic piece of information such as a remote procedure call. The W3C Extensible Hypertext Markup Language (XHTML) Recommendation [XHTML] is an XML version of HTML. XHTML is a language for content that both conforms to XML and, by following some simple guidelines, operates in HTML 4-conforming user agents. XML addresses some of the shortcomings of HTML that were exposed by HTML's success. XML incorporates many features of HTML and introduces new possibilities. XML, however, does not attempt to replace HTML. The two will coexist for a long time.
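Because XHTML conforms to XML, an ordinary XML parser accepts it, whereas HTML-style markup that browsers tolerate, such as an unclosed <br>, is rejected as not well-formed. The following minimal sketch assumes Python's standard xml.etree.ElementTree parser; the helper is_well_formed and the two sample fragments are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # XML treats well-formedness errors as fatal instead of guessing a repair.
        return False

# XHTML style: every element is closed, including empty ones such as <br/>.
xhtml_fragment = "<p>First line<br/>Second line</p>"

# HTML 4 style: <br> has no end tag, which browsers accept but XML does not.
html_fragment = "<p>First line<br>Second line</p>"

print(is_well_formed(xhtml_fragment))  # True
print(is_well_formed(html_fragment))   # False
```

The strictness is deliberate: unlike the lenient browsers described above, an XML processor does not try to "do something reasonable" with broken markup, which keeps parsing simple enough for small devices.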

1.1.2 XML Goals

The XML Recommendation [XML] specifies the following goals:

  • XML should be usable over the Internet.

  • XML should support a variety of applications, for example, authoring tools, filters, and translators.

  • XML and SGML should be compatible. For example, SGML tools should be able to read and write XML data.

  • Programs to process XML documents should be easy to write. Specifically, the committee wanted developers to be able to create useful XML programs that do not depend on reading a Document Type Definition (DTD). (A DTD is the standard SGML mechanism for profiling XML by specifying its permitted syntax; we discuss DTDs extensively in Chapter 4.)

  • For compatibility between XML documents, optional features in XML are to be kept to a minimum, ideally zero.

  • An XML document should be human-legible and clear. For example, an XML document should have a textual format rather than a binary format.

  • The XML design should be prepared quickly, to provide open, nonproprietary, textual data formats that meet the Web's obvious need for extensibility [XML A].

  • The design of XML should be formal and concise.

  • XML documents should be easy to create.

  • Terseness in XML markup is of minimal importance.


What is missing from this list? These goals never mention security! Security is best and most simply done when it is part of the original design, not when it is a later add-on. Thus it is not surprising that we will find difficulties and complexities in securing XML, particularly in the area of canonicalization (see Chapter 9).

Many of these goals have been achieved, to a greater or lesser degree. Some would say XML was particularly successful in not achieving terseness.
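One goal that has clearly been met is that programs need not read a DTD: a parser can recover a document's structure from its markup alone. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module, the following reads a letter that has no DOCTYPE declaration and therefore no DTD. The element names and the summarize helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A well-formed document with no DOCTYPE, so no DTD is available.
LETTER = """<letter>
  <to>Alice</to>
  <from>Bob</from>
  <body>The meeting has moved to Tuesday.</body>
</letter>"""

def summarize(xml_text: str) -> dict:
    """Map each child element's tag to its text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The structure comes directly from the start and end tags;
    # nothing tells the program in advance which elements to expect.
    return {child.tag: child.text for child in root}

print(summarize(LETTER))
# {'to': 'Alice', 'from': 'Bob', 'body': 'The meeting has moved to Tuesday.'}
```

A DTD would still be useful for validating that a letter contains the expected elements, but well-formedness alone is enough to process it.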

1.1.3 Advantages and Disadvantages of XML

XML, like most languages, has both advantages and disadvantages.

XML allows the creation of individual markup tags that you can tailor to describe the specific structure of one or more documents. There is no need to rely on a generic set of tags or to wait for standards organizations to adopt tags appropriate to specific applications, as with HTML. With XML, the needs of a specific industry can be met without imposing multiple industry-specific tags on everyone's browsers. In this area, XML offers greater flexibility than ASN.1 DER, for example, which typically requires precise and complete predefinition of a binary format. ([ASN.1] and [DER] provide a standard syntax and binary encoding intended for specifying protocols.)
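To illustrate tags tailored to one application, the sketch below invents a small purchase-order vocabulary that two trading partners might agree on privately. The tag names order, item, sku, qty, and price are hypothetical, not part of any industry standard. A program written for that vocabulary, here assuming Python's standard xml.etree.ElementTree, navigates the data by those names without any browser or standards body knowing about them.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical private vocabulary agreed between two trading partners.
ORDER = """<order id="PO-17" currency="USD">
  <item><sku>BOLT-M6</sku><qty>200</qty><price>0.05</price></item>
  <item><sku>NUT-M6</sku><qty>200</qty><price>0.03</price></item>
  <item><sku>WASHER-M6</sku><qty>400</qty><price>0.01</price></item>
</order>"""

def order_total(xml_text: str) -> Decimal:
    """Sum qty * price over every <item> child (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    total = Decimal("0")
    for item in root.findall("item"):
        qty = int(item.findtext("qty"))
        # Decimal avoids binary floating-point rounding in money arithmetic.
        price = Decimal(item.findtext("price"))
        total += qty * price
    return total

root = ET.fromstring(ORDER)
print(root.get("id"), root.get("currency"))  # PO-17 USD
print(order_total(ORDER))                     # 20.00
```

Adding a new field, such as a delivery date, only requires the two partners to agree on another tag; existing code that looks up sku, qty, and price keeps working.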

XML tags can represent formatting rules, as is true with HTML. In addition, XML tags can represent data descriptions, data relationships, or even business rules.

Browser-oriented XML abstracts the presentation rules into separate documents, whereas HTML embeds much of the appearance within the data. For example, you might present information as a list or as a table. A decision to change the display later requires recoding documents in HTML; in XML, however, you could accomplish this goal by creating a different extensible style sheet. ASN.1 provides no such style sheet support. XML can carry arbitrary information, including structural and display information.

XML tags can be chosen to be self-describing, which makes messages and documents more readable. This technique can be used in only the most limited way with HTML and not at all in binary-encoded ASN.1. The text basis for XML and HTML also means that, while customized tools work better, you can manipulate these languages with any general text tools. By comparison, binary formats such as ASN.1 are painful to deal with and require special tools.

The major disadvantages of XML relate to its relative lack of automated processing libraries and its verbosity. However, code libraries and tools continue to improve in quality and availability and, for many applications, terseness is not important.
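The verbosity trade-off can be made concrete by encoding the same two-field record both ways. The sketch below assumes only Python's standard library: it builds the XML with xml.etree.ElementTree and hand-encodes the ASN.1 DER form of a SEQUENCE of two INTEGERs. The record (payment, account, amount) is hypothetical, and the tiny DER encoder handles only non-negative integers with short-form lengths; it is an illustration, not a general ASN.1 library.

```python
import xml.etree.ElementTree as ET

def der_integer(n: int) -> bytes:
    """DER-encode a non-negative INTEGER (tag 0x02), short-form length only."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Minimal two's-complement length; adds a leading 0x00 when the high bit is set.
    content = n.to_bytes((n.bit_length() + 8) // 8, "big")
    return bytes([0x02, len(content)]) + content

def der_sequence(*encoded: bytes) -> bytes:
    """Wrap already-encoded elements in a SEQUENCE (tag 0x30), short-form length."""
    content = b"".join(encoded)
    if len(content) > 127:
        raise ValueError("this sketch supports short-form lengths only")
    return bytes([0x30, len(content)]) + content

# Hypothetical two-field payment record.
account, amount = 1234, 500

payment = ET.Element("payment")
ET.SubElement(payment, "account").text = str(account)
ET.SubElement(payment, "amount").text = str(amount)
xml_bytes = ET.tostring(payment)  # default encoding emits no XML declaration

der_bytes = der_sequence(der_integer(account), der_integer(amount))

print(xml_bytes.decode())  # <payment><account>1234</account><amount>500</amount></payment>
print(len(xml_bytes), "bytes as XML")               # 62 bytes as XML
print(der_bytes.hex(), len(der_bytes), "bytes as DER")  # 3008020204d2020201f4 10 bytes as DER
```

The XML form is about six times larger, but anyone can read it, and its tag names say what each value means; the DER form is compact but meaningless without the ASN.1 definition and a decoder.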

See Table 1-1 for a comparison of XML, HTML, and ASN.1.

Table 1-1. XML, HTML, and ASN.1

Property               XML    HTML          ASN.1
Flexibility            High   Low           Medium
Human Readability      High   Medium        Low
Verbosity              High   Medium-High   Low
Presentation Support   High   Medium        Low

1.1.4 Uses of XML

XML is spreading rapidly. Some of its uses now include the following:

  • Electronic commerce and banking, including business-to-consumer and business-to-business dealings

  • Creation of new languages, for example, Voice Extensible Markup Language [VXML]

  • Communications with handheld devices and smart phones

  • Sharing and storing of data and information exchange between incompatible systems

  • Inter- and intra-organization exchange of information

  • Integration of data from multiple sources in a single display and rearrangement of data on the fly

  • News or press releases where the content is available to multiple Web sites

  • Development of scientific applications and profession-specific markup languages, such as music notation, chemical symbols and formulae, and mathematical notation

  • Loading and unloading of databases

  • Maintenance of large Web sites so that XML tools can be used to convert the data to the most appropriate format for the client, including formats adapted to disabled users

  • Preservation of data in a human-readable format

  • Court filings [Georgia, New Mexico]

  • Electronic books

Secure XML: The New Syntax for Signatures and Encryption
ISBN: 0201756056
Year: 2005
Pages: 186