XML is a way of defining structured documents or, in programmer terms, data. Programmers typically store data in some kind of format. Java programmers might store their data using Java persistence, which writes the data to a binary file in a format specific to Java. Java-persisted data is useful only to Java programs, not to programs written in C++ or other languages. Using a Structured Query Language (SQL) database allows a developer to exchange data with programs written in different languages on different platforms. However, SQL databases require you to use specific Application Programming Interfaces (APIs), which adds extra programming steps. Although SQL adds complexity to a program, it is still used because a SQL database can efficiently manage large volumes of data.
If SQL works, why use XML? The answer lies in the architecture of XML itself. Consider a stereo system. When someone buys a stereo system, do they ever worry about the various pieces not working together? The answer is no. A stereo system consists of a receiver and a set of speakers. To connect the receiver to the speakers, the owner uses a cable that has a red connector and a black connector on each end. The owner connects one end to the receiver and the other end to the speakers. At that point, the owner can listen to music.
This sounds very obvious, but there is a bigger picture. Consider the case where the owner of the stereo system decides that the quality of the sound is not good enough and buys an equalizer. To make the equalizer work, the owner gets another cable with the red and black connectors. This time, however, the owner connects the stereo-out jacks to the equalizer-in jacks, and the equalizer-out jacks to the speaker-in jacks. Again the owner can listen to music, but with better sound quality. The owner can improve the sound quality even more: by simply adding more cables, the owner can use multiple sound channels and so on. Devices can be added or removed without buying different kinds of cables.
The sound industry has managed to build an infrastructure that revolves around plugging standard audio cables in and out. In addition, users of the stereo system can improve or alter the sound by directing the audio content into the appropriate devices.
In this analogy, XML is the audio signal carried by the cable as it moves from one device to another. An XML database is the XML storage, which is like a Digital Audio Tape (DAT) deck that is responsible for reading and writing the audio. The programs that manipulate the XML are like the equalizer that reads audio from one channel, modifies the audio, and writes the audio to another channel. XML is the stream of information that connects devices. And as in a stereo system, an XML processor can be added to or removed from the chain without affecting the other devices. This contrasts starkly with the way most technologies work today.
Our current software industry suffers from the problem of different pieces of software being incompatible with each other because we have created a dependency on the programs that manipulate the data. For example, can a person use today's word processors to edit content written with very early versions of a word processor? No. The translators have not been written because the data was stored in a proprietary format. When a piece of software writes data today, the written data becomes a legacy because there is no assurance that the program that created the data will be available tomorrow.
An argument could be made that cassettes and CDs store their data in very different formats, and hence cannot be compared to XML but are more like the early versions of the word processor. This objection is partly true, but it misses the point. An XML database can store the data in a binary XML format or in standard XML format; how the data is stored depends on the XML database. What is important is that the XML database can be addressed using Internet technologies and XML queries. An XML front end could be written for the early word processors, but no such front end exists, and writing one would require a de-emphasis on the proprietary nature of the word processing document.
The difference in an XML approach is that the focus lies in taking an XML data structure, transforming it, and sending it to the next device. The XML approach assumes that the data is more important than the application and allows a developer to plug and play with various XML processing devices.
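The plug-and-play idea described above can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree module. This is only an illustration under stated assumptions: the processor functions uppercase_keys and add_stamp, and the helper run_chain, are hypothetical names invented for this sketch and are not part of any XML standard or library.

```python
import xml.etree.ElementTree as ET

def uppercase_keys(doc):
    """Hypothetical processor: upper-case the text of every <key> element."""
    for key in doc.iter("key"):
        if key.text:
            key.text = key.text.upper()
    return doc

def add_stamp(doc):
    """Hypothetical processor: mark the document as processed."""
    doc.set("processed", "true")
    return doc

def run_chain(xml_text, processors):
    """Parse the XML, pass it through each processor in order, and serialize it."""
    doc = ET.fromstring(xml_text)
    for process in processors:
        doc = process(doc)
    return ET.tostring(doc, encoding="unicode")

source = '<doc attribute="value"><key>value</key></doc>'

# Full chain: both "devices" are plugged in.
full = run_chain(source, [uppercase_keys, add_stamp])

# Shorter chain: one device is removed; the remaining one is unaffected.
shorter = run_chain(source, [add_stamp])

print(full)
print(shorter)
```

Each processor only reads and writes XML, so removing uppercase_keys from the list does not require any change to add_stamp, much like unplugging the equalizer from the stereo system.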
At the simplest level, XML can be learned in a minute or two. Consider the XML in Listing 1.4.
<doc attribute="value"> <key>value</key> </doc>
In XML, the special characters < and > delimit the tags that define blocks of XML. The blocks of XML are encapsulated one in another, as if you took smaller boxes and packed them in a larger one. In Listing 1.4, the outermost XML block is doc; nested within it is the XML block key. An XML block by definition has an opening XML tag and a closing XML tag. In an opening XML tag, the < character is followed by a complete word, which in Listing 1.4 is either doc or key, and the tag ends with the > character. A closing XML tag is the < character followed by the / character, a complete word, and the > character. The complete word must match, in nesting terms, the most recently opened XML tag that has not yet been closed. XML blocks cannot overlap each other; Listing 1.5 shows such an illegal overlap. Within an XML block, however, you can embed text, like the text value in Listings 1.4 and 1.5.
<doc attribute="value"> <key>value </doc></key>
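The nesting rule can be checked with any XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is an assumption made for this illustration. It parses the text of Listing 1.4 successfully and rejects the overlapping blocks of Listing 1.5.

```python
import xml.etree.ElementTree as ET

listing_1_4 = '<doc attribute="value"> <key>value</key> </doc>'
listing_1_5 = '<doc attribute="value"> <key>value </doc></key>'

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(listing_1_4)
print(root.tag)                     # doc
print(root.find("key").text)        # value
print(is_well_formed(listing_1_4))  # True
print(is_well_formed(listing_1_5))  # False: </doc> appears before </key>
```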
You use XML attributes to describe the contained XML, just as you would write something on a box to describe its contents. Listings 1.4 and 1.5 both contain the XML attribute named attribute, which has the value value. A specific XML attribute can occur only once within an opening XML tag. Note also that XML is case sensitive, which means the XML identifiers open and Open are not identical.
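The attribute rules can be observed the same way. This sketch, again assuming Python's standard xml.etree.ElementTree module and a hypothetical helper named parses, reads the attribute from Listing 1.4, shows that a repeated attribute is rejected, and shows that names differing only in case are treated as different names.

```python
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring('<doc attribute="value"> <key>value</key> </doc>')
print(root.get("attribute"))  # value

# The same attribute may not appear twice in one opening tag.
print(parses('<doc attribute="a" attribute="b"/>'))  # False

# Case matters: </Open> does not close <open>.
print(parses('<open></Open>'))  # False

# For the same reason, name and Name are two distinct attributes.
mixed = ET.fromstring('<doc name="a" Name="b"/>')
print(mixed.get("name"), mixed.get("Name"))  # a b
```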
This book assumes you have certain knowledge about XML. If you don't know the following XML technologies and want to learn about them, please see another text, such as XML: Introduction, Second Edition (Book & CD), by Michael Hazard (ISBN: 0758048386), or Learning XML, by Erik T. Ray and Christopher R. Maden (ISBN: 0596000464).
Extensible Stylesheet Language Transformations (XSLT): This XML programming language is used to transform an XML document into another type of content, which may or may not be XML.
XML Schema: This XML syntax is used to validate an XML document for correct structure.
Simple Object Access Protocol (SOAP): This XML description is used to define protocol communications between a client and server.
XML Document Object Model (DOM): This XML programming model is used to manipulate XML documents.
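Extensible HyperText Markup Language (XHTML): This is XML data that is HTML based but conforms to the XML specification. For example, in earlier versions of HTML, the tag <br> was legal on its own; XML, however, requires every tag to be closed, so in XHTML it must be written as <br></br> or as the empty-element tag <br />.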
XLink/XPointer: These XML extensions are used to reference other XML documents or resources from within an XML document.
XPath: This language is used in XSLT and other XML extensions to query and reference specific XML nodes in a document.
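As a brief taste of two of the technologies listed above, the following sketch queries a small document with XPath-style expressions and then walks the same document through a DOM interface. It assumes Python's standard library: xml.etree.ElementTree implements only a limited subset of XPath, and xml.dom.minidom is a minimal DOM implementation. The sample document and its id attributes are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Invented sample document, loosely based on Listing 1.4.
xml_text = (
    '<doc attribute="value">'
    '<key id="a">first</key>'
    '<key id="b">second</key>'
    '</doc>'
)

# XPath-style queries (ElementTree supports only a subset of XPath).
root = ET.fromstring(xml_text)
all_keys = [key.text for key in root.findall("./key")]
second = root.find("./key[@id='b']").text
print(all_keys)  # ['first', 'second']
print(second)    # second

# The same document manipulated through the DOM programming model.
dom = minidom.parseString(xml_text)
dom_keys = dom.getElementsByTagName("key")
print(dom.documentElement.tagName)     # doc
print(dom_keys[0].getAttribute("id"))  # a
print(dom_keys[0].firstChild.data)     # first
```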