Example 1-1 shows a simple XML document. This particular XML document might be seen in an inventory-control system or a stock database. It marks up the data with tags and attributes describing the color, size, bar-code number, manufacturer, name of the product, and so on.
Example 1-1. An XML document
<?xml version="1.0"?>
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
  <description>floppy disks</description>
</product>
This document is text and can be stored in a text file. You can edit this file with any standard text editor such as BBEdit, jEdit, UltraEdit, Emacs, or vi. You do not need a special XML editor. Indeed, we find most general-purpose XML editors to be far more trouble than they're worth and much harder to use than simply editing documents in a text editor.
Programs that actually try to understand the contents of the XML document (that is, programs that do more than merely treat it as any other text file) will use an XML parser to read the document. The parser is responsible for dividing the document into individual elements, attributes, and other pieces. It passes the contents of the XML document to an application piece by piece. If at any point the parser detects a violation of the well-formedness rules of XML, then it reports the error to the application and stops parsing. In some cases, the parser may read further in the document, past the original error, so that it can detect and report other errors that occur later in the document. However, once it has detected the first well-formedness error, it will no longer pass along the contents of the elements and attributes it encounters.
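The piece-by-piece hand-off described above can be sketched with the SAX parser in Python's standard library. This is only an illustration under stated assumptions: the `Recorder` class, the `parse` helper, and the deliberately broken copy of the document (its `</name>` end tag removed) are our own inventions, not part of Example 1-1.

```python
import xml.sax

# A shortened copy of Example 1-1.
DOCUMENT = b"""<?xml version="1.0"?>
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
</product>"""

# Hypothetical malformed variant: the </name> end tag is missing.
BROKEN = DOCUMENT.replace(b"</name>", b"")


class Recorder(xml.sax.ContentHandler):
    """Plays the role of the application: records each piece the parser passes along."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def parse(data):
    """Return (events received, well-formedness error or None)."""
    handler = Recorder()
    error = None
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException as exc:
        # The parser reports the error and stops delivering content.
        error = exc
    return handler.events, error


good_events, good_error = parse(DOCUMENT)
bad_events, bad_error = parse(BROKEN)
```

For the well-formed document, `good_error` stays `None` and the recorded events run through the end of `product`. For the broken copy, the application still receives everything that preceded the error, but nothing after it.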
Individual XML applications normally dictate more precise rules about exactly which elements and attributes are allowed where. For instance, you wouldn't expect to find a G_Clef element when reading a biology document. Some of these rules can be precisely specified with a schema written in any of several languages, including the W3C XML Schema Language, RELAX NG, and DTDs. A document may contain a URL indicating where the schema can be found. Some XML parsers will notice this and compare the document to its schema as they read it to see if the document satisfies the constraints specified there. Such a parser is called a validating parser. A violation of those constraints is called a validity error, and the whole process of checking a document against a schema is called validation. If a validating parser finds a validity error, it will report it to the application on whose behalf it's parsing the document. This application can then decide whether it wishes to continue parsing the document. However, validity errors are not necessarily fatal (unlike well-formedness errors), and an application may choose to ignore them. Not all parsers are validating parsers. Some merely check for well-formedness.
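The difference between the two kinds of error can be seen with a non-validating parser. The sketch below uses `xml.etree.ElementTree` from Python's standard library, which checks well-formedness only. The small DTD embedded in the document is hypothetical: we made it up for illustration, and it requires `product` to contain exactly one `name` child and to carry a `barcode` attribute.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid against its own (hypothetical) DTD:
# the DTD demands a <name> child and a barcode attribute; neither is present.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE product [
  <!ELEMENT product (name)>
  <!ELEMENT name (#PCDATA)>
  <!ATTLIST product barcode CDATA #REQUIRED>
]>
<product>
  <color>black</color>
</product>"""

# Not well-formed: <color> is never closed.
MALFORMED = "<product><color>black</product>"


def parses_without_error(text):
    """True if the non-validating parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


accepts_invalid = parses_without_error(INVALID_BUT_WELL_FORMED)
accepts_malformed = parses_without_error(MALFORMED)
```

Because this parser does not validate, it accepts the first document despite its validity errors and rejects only the second. A validating parser would report the first document's problems as well and leave the decision about whether to continue to the application.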
The application that receives data from the parser may be:
A web browser, such as Netscape Navigator or Internet Explorer, that displays the document to a reader
A word processor, such as StarOffice Writer, that loads the XML document for editing
A database, such as Microsoft SQL Server, that stores the XML data in a new record
A drawing program, such as Adobe Illustrator, that interprets the XML as two-dimensional coordinates for the contents of a picture
A spreadsheet, such as Gnumeric, that parses the XML to find numbers and functions used in a calculation
A personal finance program, such as Microsoft Money, that sees the XML as a bank statement
A syndication program that reads the XML document and extracts the headlines for today's news
A program that you yourself wrote in Java, C, Python, or some other language that does exactly what you want it to do
Almost anything else
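As an illustration of the "program that you yourself wrote" case, here is a minimal Python sketch that reads the product record from Example 1-1. The `load_product` function and the dictionary layout it returns are assumptions made for this example; a real inventory system would choose its own data model.

```python
import xml.etree.ElementTree as ET

# The document from Example 1-1.
DOCUMENT = """<?xml version="1.0"?>
<product barcode="2394287410">
  <manufacturer>Verbatim</manufacturer>
  <name>DataLife MF 2HD</name>
  <quantity>10</quantity>
  <size>3.5"</size>
  <color>black</color>
  <description>floppy disks</description>
</product>"""


def load_product(text):
    """Turn a product document into a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(text)
    # The bar code is an attribute of the root element.
    record = {"barcode": root.get("barcode")}
    # Each child element becomes one key/value pair.
    for child in root:
        record[child.tag] = child.text
    # The quantity arrives as text; this application wants a number.
    record["quantity"] = int(record["quantity"])
    return record


product = load_product(DOCUMENT)
print(f'{product["quantity"]} x {product["name"]} ({product["color"]})')
# prints: 10 x DataLife MF 2HD (black)
```

The parser only delivers elements, attributes, and text. Deciding that `quantity` is an integer, or that the bar code identifies the record, is the application's job.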
XML is an extremely flexible format for data. The uses listed above are all real examples, and XML is used for a lot more besides. In theory, any data that can be stored in a computer can be stored in XML. In practice, XML is suitable for storing and exchanging any data that can plausibly be encoded as text. It's only really unsuitable for digitized data such as photographs, recorded sound, video, and other very large bit sequences.