You have seen several examples (for instance, in Chapters 4 and 10) of the use of property files to describe the configuration of a program. A property file contains a set of name/value pairs, such as

    fontname=Times Roman
    fontsize=12
    windowsize=400 200
    color=0 50 100

You can use the Properties class to read in such a file with a single method call. That's a nice feature, but it doesn't really go far enough. In many cases, the information that you want to describe has more structure than the property file format can comfortably handle. Consider the fontname/fontsize entries in the example. It would be more object oriented to have a single entry:

    font=Times Roman 12

But then parsing the font description gets ugly: you have to figure out when the font name ends and when the font size starts.

Property files have a single flat hierarchy. You can often see programmers work around that limitation with key names such as

    title.fontname=Helvetica
    title.fontsize=36
    body.fontname=Times Roman
    body.fontsize=12

Another shortcoming of the property file format is caused by the requirement that keys be unique. To store a sequence of values, you need another workaround, such as

    menu.item.1=Times Roman
    menu.item.2=Helvetica
    menu.item.3=Goudy Old Style

The XML format solves these problems because it can express hierarchical structures and thus is more flexible than the flat table structure of a property file.

An XML file for describing a program configuration might look like this:

    <configuration>
       <title>
          <font>
             <name>Helvetica</name>
             <size>36</size>
          </font>
       </title>
       <body>
          <font>
             <name>Times Roman</name>
             <size>12</size>
          </font>
       </body>
       <window>
          <width>400</width>
          <height>200</height>
       </window>
       <color>
          <red>0</red>
          <green>50</green>
          <blue>100</blue>
       </color>
       <menu>
          <item>Times Roman</item>
          <item>Helvetica</item>
          <item>Goudy Old Style</item>
       </menu>
    </configuration>

The XML format allows you to express the structure hierarchy and repeated elements without contortions.
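To make the property file workarounds concrete, here is a minimal sketch that loads the sample entries with the standard `java.util.Properties` class and then recovers the structure by hand. The class name `PropertyFileSketch` and the helper methods `parse` and `menuItems` are hypothetical names chosen for illustration, and the text is read from a string rather than a file to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration class; not part of any library.
class PropertyFileSketch {
    // Reads name/value pairs from text in property file format.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text)); // the single method call that reads all pairs
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Collects menu.item.1, menu.item.2, ... and stops at the first missing number.
    static List<String> menuItems(Properties props) {
        List<String> items = new ArrayList<>();
        for (int i = 1; props.getProperty("menu.item." + i) != null; i++) {
            items.add(props.getProperty("menu.item." + i));
        }
        return items;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "fontname=Times Roman",
            "fontsize=12",
            "windowsize=400 200",
            "menu.item.1=Times Roman",
            "menu.item.2=Helvetica",
            "menu.item.3=Goudy Old Style");
        Properties props = parse(text);

        // A structured value such as the window size must be split by hand.
        String[] size = props.getProperty("windowsize").split(" ");
        int width = Integer.parseInt(size[0]);
        int height = Integer.parseInt(size[1]);

        System.out.println(width + " x " + height); // prints "400 x 200"
        System.out.println(menuItems(props));
    }
}
```

Both the string splitting and the numbered keys are conventions that the program has to invent and enforce itself. The property file format knows nothing about them.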
As you can see, the format of an XML file is straightforward. It looks similar to an HTML file. There is a good reason: both the XML and HTML formats are descendants of the venerable Standard Generalized Markup Language (SGML).

SGML has been around since the 1970s for describing the structure of complex documents. It has been used with good success in some industries that require ongoing maintenance of massive documentation, in particular, the aircraft industry. However, SGML is quite complex, so it has never caught on in a big way. Much of that complexity arises because SGML has two conflicting goals. SGML wants to make sure that documents are formed according to the rules for their document type, but it also wants to make data entry easy by allowing shortcuts that reduce typing.

XML was designed as a simplified version of SGML for use on the Internet. As is often true, simpler is better, and XML has enjoyed the immediate and enthusiastic reception that has eluded SGML for so long.

NOTE
Even though XML and HTML have common roots, there are important differences between the two.
The Structure of an XML Document

An XML document should start with a header such as

    <?xml version="1.0"?>

or

    <?xml version="1.0" encoding="UTF-8"?>

Strictly speaking, a header is optional, but it is highly recommended.
The header can be followed by a document type definition, such as

    <!DOCTYPE web-app PUBLIC
       "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
       "http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">

Document type definitions are an important mechanism to ensure the correctness of a document, but they are not required. We discuss them later in this chapter.

Finally, the body of the XML document contains the root element, which can contain other elements. For example,

    <?xml version="1.0"?>
    <!DOCTYPE configuration . . .>
    <configuration>
       <title>
          <font>
             <name>Helvetica</name>
             <size>36</size>
          </font>
       </title>
       . . .
    </configuration>

An element can contain child elements, text, or both. In the example above, the font element has two child elements, name and size. The name element contains the text "Helvetica".
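The relationship between the root element, child elements, and text can be explored with the DOM parser that ships with the JDK (`javax.xml.parsers` and `org.w3c.dom`). The following is only a sketch under a few assumptions: the class name `XmlStructureSketch` and the helpers `parseRoot` and `firstDescendantText` are hypothetical, the document is supplied as a string, and the DOCTYPE line is omitted because no DTD is available here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of any library.
class XmlStructureSketch {
    // Parses the XML text and returns the root element of the document.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the given tag name.
    static String firstDescendantText(Element ancestor, String tagName) {
        Element found = (Element) ancestor.getElementsByTagName(tagName).item(0);
        return found.getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml =
            "<?xml version=\"1.0\"?>\n"
            + "<configuration>\n"
            + "   <title>\n"
            + "      <font>\n"
            + "         <name>Helvetica</name>\n"
            + "         <size>36</size>\n"
            + "      </font>\n"
            + "   </title>\n"
            + "</configuration>\n";

        Element root = parseRoot(xml);
        System.out.println(root.getTagName());                 // prints "configuration"
        System.out.println(firstDescendantText(root, "name")); // prints "Helvetica"
        System.out.println(firstDescendantText(root, "size")); // prints "36"
    }
}
```

The sketch looks elements up by tag name instead of walking the child list directly, because the whitespace between tags also shows up as text children in the parsed tree.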
XML elements can contain attributes, such as

    <size unit="pt">36</size>

There is some disagreement among XML designers about when to use elements and when to use attributes. For example, it would seem easier to describe a font as

    <font name="Helvetica" size="36"/>

than

    <font>
       <name>Helvetica</name>
       <size>36</size>
    </font>

However, attributes are much less flexible. Suppose you want to add units to the size value. If you use attributes, then you must add the unit to the attribute value:

    <font name="Helvetica" size="36 pt"/>

Ugh! Now you have to parse the string "36 pt", just the kind of hassle that XML was designed to avoid. Adding an attribute to the size element is much cleaner:

    <font>
       <name>Helvetica</name>
       <size unit="pt">36</size>
    </font>

A commonly used rule of thumb is that attributes should be used only to modify the interpretation of a value, not to specify values. If you find yourself engaged in metaphysical discussions about whether a particular setting is a modification of the interpretation of a value or not, then just say "no" to attributes and use elements throughout. Many useful DTDs don't use attributes at all.
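The difference between the two designs shows up when a program reads the font size. The sketch below, again using the JDK's DOM parser, reads the value from the element text and the unit from the `unit` attribute, so no string such as "36 pt" needs to be taken apart. The class name `AttributeSketch` and the methods `parseRoot` and `describeSize` are hypothetical names for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of any library.
class AttributeSketch {
    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Reads the numeric value from the size element's text and the unit from its attribute.
    static String describeSize(Element font) {
        Element size = (Element) font.getElementsByTagName("size").item(0);
        int value = Integer.parseInt(size.getTextContent().trim());
        String unit = size.getAttribute("unit"); // empty string if the attribute is absent
        return unit.isEmpty() ? String.valueOf(value) : value + " " + unit;
    }

    public static void main(String[] args) {
        Element font = parseRoot(
            "<font><name>Helvetica</name><size unit=\"pt\">36</size></font>");
        System.out.println(describeSize(font)); // prints "36 pt"
    }
}
```

Here the attribute only modifies how the value 36 is interpreted, which matches the rule of thumb above. The value itself stays in the element content and can be parsed as a plain integer.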
Elements and text are the "bread and butter" of XML documents. Here are a few other markup instructions that you may encounter: