XML is an open-specification, platform-independent, extensible, and increasingly successful profile of the Standard Generalized Markup Language (SGML) [ISO 8879]. The World Wide Web Hypertext Markup Language (HTML) is also a profile of SGML. HTML markup is primarily concerned with the appearance of the material. In addition, a particular browser can translate some standard types of information in HTML into a specific format (appearance) [HTML]. In contrast, XML markup is primarily concerned with the user-extensible structure of the data [XML]. Among the basic goals of XML was to enable the serving, receiving, and processing of general information on the Web in the same simplified manner that HTML has made possible for display information. XML design goals include ease of implementation and the ability to use SGML tools with XML. The uses of XML go well beyond Web pages, however, extending not just to static documents but also to general protocol messages between computer processes and general data storage within computers. In many of its newly emerging roles, XML needs security. That is, it must be able to authenticate information and/or keep it confidential.
1.1.1 Origins of XML

The World Wide Web Consortium [W3C] established the XML Working Group in 1996. The W3C is the lead organization for the development and maintenance of interoperable specifications for the content of the World Wide Web. Its membership includes representatives from universities, technical organizations, and industry. The consortium is responsible for the development of the XML 1.0 W3C Recommendation [XML] and has change control over XML. The XML Working Group was initially known as the SGML Editorial Review Board.

The Web originated as a way to publish scientific documents with HTML as its markup language. At that time, HTML had only a few dozen tags. Today, most of the billions of Web pages in existence continue to use HTML, but the Web has grown into a full-fledged interactive medium supporting such diverse applications as e-mail, electronic banking and commerce, sophisticated search engines, streaming video, voice interaction, and multicasting.
HTML has grown, along with the expanding uses of the Web, well beyond its initial design. The current version of HTML, version 4.01, is a complex language containing approximately 100 tags, and there has been pressure to add more. Additionally, the browser wars between Netscape Navigator and Microsoft Internet Explorer, along with side skirmishes involving other browsers, have littered the battlefield with inconsistencies and quirks. Despite its phenomenal success, HTML has a number of shortcomings:
You can use XML to express a diverse range of information: for example, the contents of a letter or Web page, or a generic piece of information such as a remote procedure call. The W3C Extensible Hypertext Markup Language (XHTML) Recommendation [XHTML] is an XML version of HTML. XHTML is a language for content that both conforms to XML and, by following some simple guidelines, operates in HTML 4-conforming user agents. XML addresses some of the shortcomings of HTML that were exposed by HTML's success. XML incorporates many features of HTML and introduces new possibilities. XML, however, does not attempt to replace HTML. The two will coexist for a long time.

1.1.2 XML Goals

The XML Recommendation [XML] specifies the following goals:
Many of these goals have been achieved, to a greater or lesser degree. Some would say XML was particularly successful in not achieving terseness.

1.1.3 Advantages and Disadvantages of XML

XML, like most languages, has both advantages and disadvantages. XML allows the creation of individual markup tags that you can tailor to describe the specific structure of one or more documents. There is no need to rely on a generic set of tags or to wait for standards organizations to adopt tags appropriate to specific applications, as with HTML. With XML, the needs of a specific industry can be met without imposing multiple industry-specific tags on everyone's browsers. In this area, XML offers greater flexibility than ASN.1 DER, for example, which typically requires precise and complete predefinition of a binary format. ([ASN.1] and [DER] provide a standard syntax and binary encoding intended for specifying protocols.)

XML tags can represent formatting rules, as is true with HTML. In addition, XML tags can represent data descriptions, data relationships, or even business rules. Browser-oriented XML abstracts the presentation rules into separate documents, whereas HTML embeds much of the appearance within the data. For example, you might present information as a list or as a table. A decision to change the display later requires recoding documents in HTML; in XML, however, you could accomplish this goal by creating a different extensible style sheet. ASN.1 provides no such style sheet support.

XML can carry arbitrary information, including structural and display information. XML tags can be chosen to be self-describing, which makes messages and documents more readable. This technique can be used in only the most limited way with HTML and not at all in binary-encoded ASN.1. The text basis for XML and HTML also means that, while customized tools work better, you can manipulate these languages with any general text tools.
By comparison, binary formats such as ASN.1 are painful to deal with and require special tools.

The major disadvantages of XML relate to its relative lack of automated processing libraries and its verbosity. However, code libraries and tools continue to improve in quality and availability and, for many applications, terseness is not important. See Table 1-1 for a comparison of XML, HTML, and ASN.1.
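
The separation of data from presentation described above can be illustrated with a minimal sketch. It is not taken from the XML Recommendation: the `order`, `item`, `name`, and `quantity` tags and the helper functions are hypothetical, chosen only to show self-describing markup, and the sketch uses Python's standard `xml.etree.ElementTree` module in place of an extensible style sheet. The same XML data is rendered first as a list and then as a table without touching the data itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; the tag names are invented for illustration
# and are meant to be self-describing.
ORDER_XML = """
<order id="A-17">
  <item sku="1001"><name>Widget</name><quantity>3</quantity></item>
  <item sku="1002"><name>Gadget</name><quantity>1</quantity></item>
</order>
"""


def load_items(xml_text):
    """Parse the order and return a list of (name, quantity) pairs."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("name"), int(item.findtext("quantity")))
        for item in root.findall("item")
    ]


def as_list(items):
    """Render the items as a plain-text bulleted list."""
    return "\n".join(f"* {name} x{qty}" for name, qty in items)


def as_table(items):
    """Render the same items as a plain-text two-column table."""
    width = max(len("Name"), *(len(name) for name, _ in items))
    header = f"{'Name'.ljust(width)} | Qty"
    rows = [f"{name.ljust(width)} | {qty}" for name, qty in items]
    return "\n".join([header] + rows)


items = load_items(ORDER_XML)
print(as_list(items))   # one presentation of the data
print(as_table(items))  # a different presentation, same data
```

Switching between the two presentations changes only the rendering function, which plays the role that a separate style sheet plays for browser-oriented XML. An equivalent HTML page would need its markup recoded to move from a list to a table.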
1.1.4 Uses of XML

XML is spreading rapidly. Some of its uses now include the following: