XML in a Nutshell, 2nd Edition

By Elliotte Rusty Harold, W. Scott Means
Publisher : O'Reilly
Pub Date : June 2002
ISBN : 0-596-00292-0
Pages : 634  

This powerful new edition provides developers with a comprehensive guide to the rapidly evolving XML space. Serious users of XML will find topics on just about everything they need, from fundamental syntax rules, to details of DTD and XML Schema creation, to XSLT transformations, to APIs used for processing XML documents. Simply put, this is the only reference of its kind among XML books.





What This Book Covers

What's New in the Second Edition

Organization of the Book

Conventions Used in This Book

Request for Comments


Part I: XML Concepts

Chapter 1. Introducing XML

1.1 The Benefits of XML

1.2 Portable Data

1.3 How XML Works

1.4 The Evolution of XML

Chapter 2. XML Fundamentals

2.1 XML Documents and XML Files

2.2 Elements, Tags, and Character Data

2.3 Attributes

2.4 XML Names

2.5 Entity References

2.6 CDATA Sections


2.8 Processing Instructions

2.9 The XML Declaration

2.10 Checking Documents for Well-Formedness

Chapter 3. Document Type Definitions (DTDs)

3.1 Validation

3.2 Element Declarations

3.3 Attribute Declarations

3.4 General Entity Declarations

3.5 External Parsed General Entities

3.6 External Unparsed Entities and Notations

3.7 Parameter Entities

3.8 Conditional Inclusion

3.9 Two DTD Examples

3.10 Locating Standard DTDs

Chapter 4. Namespaces

4.1 The Need for Namespaces

4.2 Namespace Syntax

4.3 How Parsers Handle Namespaces

4.4 Namespaces and DTDs

Chapter 5. Internationalization

5.1 Character-Set Metadata

5.2 The Encoding Declaration

5.3 Text Declarations

5.4 XML-Defined Character Sets

5.5 Unicode

5.6 ISO Character Sets

5.7 Platform-Dependent Character Sets

5.8 Converting Between Character Sets

5.9 The Default Character Set for XML Documents

5.10 Character References

5.11 xml:lang

Part II: Narrative-Centric Documents

Chapter 6. XML as a Document Format

6.1 SGML's Legacy

6.2 Narrative Document Structures

6.3 TEI

6.4 DocBook

6.5 Document Permanence

6.6 Transformation and Presentation

Chapter 7. XML on the Web


7.2 Direct Display of XML in Browsers

7.3 Authoring Compound Documents with Modular XHTML

7.4 Prospects for Improved Web-Search Methods

Chapter 8. XSL Transformations (XSLT)

8.1 An Example Input Document

8.2 xsl:stylesheet and xsl:transform

8.3 Stylesheet Processors

8.4 Templates and Template Rules

8.5 Calculating the Value of an Element with xsl:value-of

8.6 Applying Templates with xsl:apply-templates

8.7 The Built-in Template Rules

8.8 Modes

8.9 Attribute Value Templates

8.10 XSLT and Namespaces

8.11 Other XSLT Elements

Chapter 9. XPath

9.1 The Tree Structure of an XML Document

9.2 Location Paths

9.3 Compound Location Paths

9.4 Predicates

9.5 Unabbreviated Location Paths

9.6 General XPath Expressions

9.7 XPath Functions

Chapter 10. XLinks

10.1 Simple Links

10.2 Link Behavior

10.3 Link Semantics

10.4 Extended Links

10.5 Linkbases

10.6 DTDs for XLinks

Chapter 11. XPointers

11.1 XPointers on URLs

11.2 XPointers in Links

11.3 Bare Names

11.4 Child Sequences

11.5 Namespaces

11.6 Points

11.7 Ranges

Chapter 12. Cascading Style Sheets (CSS)

12.1 The Three Levels of CSS

12.2 CSS Syntax

12.3 Associating Stylesheets with XML Documents

12.4 Selectors

12.5 The Display Property

12.6 Pixels, Points, Picas, and Other Units of Length

12.7 Font Properties

12.8 Text Properties

12.9 Colors

Chapter 13. XSL Formatting Objects (XSL-FO)

13.1 XSL Formatting Objects

13.2 The Structure of an XSL-FO Document

13.3 Laying Out the Master Pages

13.4 XSL-FO Properties

13.5 Choosing Between CSS and XSL-FO

Chapter 14. Resource Directory Description Language (RDDL)

14.1 What's at the End of a Namespace URL?

14.2 RDDL Syntax

14.3 Natures

14.4 Purposes

Part III: Data-Centric XML

Chapter 15. XML as a Data Format

15.1 Why Use XML for Data?

15.2 Developing Data-Oriented XML Formats

15.3 Sharing Your XML format

Chapter 16. XML Schemas

16.1 Overview

16.2 Schema Basics

16.3 Working with Namespaces

16.4 Complex Types

16.5 Empty Elements

16.6 Simple Content

16.7 Mixed Content

16.8 Allowing Any Content

16.9 Controlling Type Derivation

Chapter 17. Programming Models

17.1 Common XML Processing Models

17.2 Common XML Processing Issues

Chapter 18. Document Object Model (DOM)

18.1 DOM Foundations

18.2 Structure of the DOM Core

18.3 Node and Other Generic Interfaces

18.4 Specific Node-Type Interfaces

18.5 The DOMImplementation Interface

18.6 Parsing a Document with DOM

18.7 A Simple DOM Application

Chapter 19. Simple API for XML (SAX)

19.1 The ContentHandler Interface

19.2 SAX Features and Properties

19.3 Filters

Part IV: Reference

Chapter 20. XML 1.0 Reference

20.1 How to Use This Reference

20.2 Annotated Sample Documents

20.3 XML Syntax

20.4 Constraints

20.5 XML Document Grammar

Chapter 21. Schemas Reference

21.1 The Schema Namespaces

21.2 Schema Elements

21.3 Primitive Types

21.4 Instance Document Attributes

Chapter 22. XPath Reference

22.1 The XPath Data Model

22.2 Data Types

22.3 Location Paths

22.4 Predicates

22.5 XPath Functions

Chapter 23. XSLT Reference

23.1 The XSLT Namespace

23.2 XSLT Elements

23.3 XSLT Functions

23.4 TrAX

Chapter 24. DOM Reference

24.1 Object Hierarchy

24.2 Object Reference

Chapter 25. SAX Reference

25.1 The org.xml.sax Package

25.2 The org.xml.sax.helpers Package

25.3 SAX Features and Properties

25.4 The org.xml.sax.ext Package

Chapter 26. Character Sets

26.1 Character Tables

26.2 HTML4 Entity Sets

26.3 Other Unicode Blocks




Copyright 2002, 2001 O'Reilly & Associates, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly & Associates books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles. For more information contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a peafowl and the topic of XML is a trademark of O'Reilly & Associates, Inc. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and other countries. O'Reilly & Associates, Inc. is independent of Sun Microsystems.

While every precaution has been taken in the preparation of this book, the publisher and the author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.


XML is one of the most important developments in document syntax in the history of computing. In the last few years it has been adopted in fields as diverse as law, aeronautics, finance, insurance, robotics, multimedia, hospitality, travel, art, construction, telecommunications, software, agriculture, physics, journalism, theology, retail, and comics. XML has become the syntax of choice for newly designed document formats across almost all computer applications. It's used on Linux, Windows, Macintosh, and many other computer platforms. Mainframes on Wall Street trade stocks with one another by exchanging XML documents. Children playing games on their home PCs save their documents in XML. Sports fans receive real-time game scores on their cell phones in XML. XML is simply the most robust, reliable, and flexible document syntax ever invented.

XML in a Nutshell is a comprehensive guide to the rapidly growing world of XML. It covers all aspects of XML, from the most basic syntax rules, to the details of DTD and schema creation, to the APIs you can use to read and write XML documents in a variety of programming languages.

What This Book Covers

There are hundreds of formally established XML applications from the W3C and other standards bodies, such as OASIS and the Object Management Group. There are even more informal, unstandardized applications from individuals and corporations, such as Microsoft's Channel Definition Format and John Guajardo's Mind Reading Markup Language. This book cannot cover them all, any more than a book on Java could discuss every program that has ever been or might ever be written in Java. This book focuses primarily on XML itself. It covers the fundamental rules that all XML documents and authors must adhere to, whether a web designer uses SMIL to add animations to web pages or a C++ programmer uses SOAP to exchange serialized objects with a remote database.

This book also covers generic supporting technologies that have been layered on top of XML and are used across a wide range of XML applications. These technologies include:


XLinks
An attribute-based syntax for hyperlinks between XML and non-XML documents that provides the simple, one-directional links familiar from HTML, multidirectional links between many documents, and links between documents to which you don't have write access.

XSLT
An XML application that describes transformations from one document to another, in either the same or different XML vocabularies.

XPointers
A syntax for URI fragment identifiers that selects particular parts of the XML document referred to by the URI, often used in conjunction with an XLink.

XPath
A non-XML syntax used by both XPointer and XSLT for identifying particular pieces of XML documents. For example, an XPath can locate the third address element in the document, or all elements with an email attribute whose value is elharo@metalab.unc.edu.

Namespaces
A means of distinguishing between elements and attributes from different XML vocabularies that have the same name; for instance, the title of a book and the title of a web page in a web page about books.

Schemas
An XML vocabulary for describing the permissible contents of XML documents from other XML vocabularies.

SAX
The Simple API for XML, an event-based application programming interface implemented by many XML parsers.

DOM
The Document Object Model, a language-neutral tree-oriented API that treats an XML document as a set of nested objects with various properties.

XHTML
An XMLized version of HTML that can be extended with other XML applications such as MathML and SVG.

RDDL
The Resource Directory Description Language, an XML application based on XHTML for documents placed at the end of namespace URLs.

All these technologies, whether defined in XML (XLinks, XSLT, Namespaces, Schemas, XHTML, and RDDL) or in another syntax (XPointers, XPath, SAX, and DOM), are used in many different XML applications.
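The two XPath lookups mentioned above (the third address element in the document, and every element whose email attribute has a given value) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document, its element names, and its street values are invented here, and Python's standard xml.etree.ElementTree module supports only a subset of XPath, so the first lookup walks the tree in document order rather than evaluating a full XPath expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = """<people>
  <person email="elharo@metalab.unc.edu">
    <address>First Street</address>
    <address>Second Street</address>
  </person>
  <person email="someone@example.com">
    <address>Third Street</address>
  </person>
</people>"""

root = ET.fromstring(DOC)

# Third address element in document order. In full XPath 1.0 this could be
# written /descendant::address[3]; ElementTree implements only a subset of
# XPath, so this sketch iterates over the tree in document order instead.
third_address = list(root.iter("address"))[2]

# All descendant elements with an email attribute equal to the given value.
matches = root.findall(".//*[@email='elharo@metalab.unc.edu']")

print(third_address.text)
print([m.tag for m in matches])
```

A processor with complete XPath support would evaluate both expressions directly; the tree walk here is a substitute chosen only so that the sketch needs nothing beyond the standard library.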

This book does not specifically cover XML applications that are relevant to only some users of XML, such as:


SVG
Scalable Vector Graphics, a W3C-endorsed standard XML encoding of line art.

MathML
The Mathematical Markup Language, a W3C-endorsed standard XML application used for embedding equations in web pages and other documents.

RDF
The Resource Description Framework, a W3C-standard XML application used for describing resources, with a particular focus on the sort of metadata one might find in a library card catalog.

Occasionally we use one or more of these applications in an example, but we do not cover all aspects of the relevant vocabulary in depth. While interesting and important, these applications (and hundreds more like them) are intended primarily for use with special software that knows their format intimately. For instance, most graphic designers do not work directly with SVG. Instead, they use their customary tools, such as Adobe Illustrator, to create SVG documents. They may not even know they're using XML.

This book focuses on standards that are relevant to almost all developers working with XML. We investigate XML technologies that span a wide range of XML applications, not those that are relevant only within a few restricted domains.

What's New in the Second Edition

XML has hardly stood still in the 18 months since the first edition of XML in a Nutshell was published. To answer the most frequent request from readers of the first edition, there are now two new chapters covering schemas. Furthermore, other chapters throughout the book have been rewritten to reflect the impact of schemas on their subject matter. We added several other new topics as well, including RDDL, the Transformations API for XML (TrAX), the Java API for XML Processing (JAXP), and SAX filters.

In addition, the treatment of many topics has been upgraded to the latest versions of various specifications, including:

  • XSL Formatting Objects 1.0

  • XLink 1.0

  • XPointer 2nd Candidate Recommendation

  • XHTML 1.1

  • Unicode 3.1.1

Finally, many small errors and omissions were corrected throughout the book.

Organization of the Book

Part I, XML Concepts, introduces you to the fundamental standards that form the essential core of XML to which all XML applications and software must adhere. It teaches you about well-formed XML, DTDs, namespaces, and Unicode as quickly as possible.

Part II, Narrative-Centric Documents, explores technologies that are used mostly for narrative XML documents, such as web pages, books, articles, diaries, and plays. You'll learn about XSLT, CSS, XSL-FO, XLinks, XPointers, XPath, and RDDL.

One of the most unexpected developments in XML was its enthusiastic adoption for data-heavy structured documents such as spreadsheets, financial statistics, mathematical tables, and software file formats. Part III, Data-Centric XML, explores the use of XML for such record-like documents. This part focuses on the tools and APIs needed to write software that processes XML, including SAX, DOM, and schemas.

Finally, Part IV, Reference, is a series of quick-reference chapters that form the core of any Nutshell Handbook. These chapters give you detailed syntax rules for the core XML technologies, including XML, DTDs, schemas, XPath, XSLT, SAX, and DOM. Turn to this section when you need to find out the precise syntax quickly for something you know you can do but don't remember exactly how to do.

Conventions Used in This Book

Constant width is used for:

  • Code examples and fragments.

  • Anything that might appear in an XML document, including element names, tags, attribute values, entity references, and processing instructions.

  • Anything that might appear in a program, including keywords, operators, method names, class names, and literals.

Constant-width bold is used for:

  • User input.

  • Signifying emphasis in code examples and fragments.

Constant-width italic is used for:

  • Replaceable elements in code statements.

Italic is used for:

  • New terms where they are defined.

  • Signifying emphasis in body text.

  • Pathnames, filenames, and program names. (However, if the program name is also the name of a Java class, it is written in constant-width font, like other class names.)

  • Host and domain names (cafeconleche.org).

This icon indicates a tip, suggestion, or general note.

This icon indicates a warning or caution.

Significant code fragments, complete programs, and documents are generally placed into a separate paragraph like this:

<?xml version="1.0"?>
<?xml-stylesheet href="person.css" type="text/css"?>
<person>
  Alan Turing
</person>

XML is case sensitive. The PERSON element is not the same thing as the person or Person element. Case-sensitive languages do not always allow authors to adhere to standard English grammar. It is usually possible to rewrite the sentence so the two do not conflict, and when possible we have endeavored to do so. However, on rare occasions when there is simply no way around the problem, we let standard English come up the loser.
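The practical effect of case sensitivity can be seen with any XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module; the people wrapper element and the is_well_formed helper are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# A small document with a lowercase person element inside a hypothetical
# people wrapper, invented for this sketch.
wrapper = ET.fromstring("<people><person>Alan Turing</person></people>")

# Element names are compared exactly, so only the lowercase name matches.
found_lower = wrapper.find("person")   # the person element
found_upper = wrapper.find("PERSON")   # None: no element has this name


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A start-tag and end-tag that differ only in case do not match each other,
# so the parser rejects the second document.
matched = is_well_formed("<person>Alan Turing</person>")
mismatched = is_well_formed("<person>Alan Turing</PERSON>")

print(found_lower is not None, found_upper is None, matched, mismatched)
```

The second document is rejected because, to the parser, person and PERSON are simply two different names.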

Finally, although most of the examples used here are toy examples unlikely to be reused, a few have real value. Please feel free to reuse them or any parts of them in your own code. No special permission is required. As far as we are concerned, they are in the public domain (though the same is definitely not true of the explanatory text).

Request for Comments

We enjoy hearing from readers with general comments about how this book could be better, specific corrections, or topics you would like to see covered. You can reach the authors by sending email to elharo@metalab.unc.edu and smeans@enterprisewebmachines.com. Please realize, however, that we each receive several hundred pieces of email a day and cannot respond to everyone personally. For the best chance of getting a personal response, please identify yourself as a reader of this book. And please send the message from the account you want us to reply to and make sure that your reply-to address is properly set. There's nothing so frustrating as spending an hour or more carefully researching the answer to an interesting question and composing a detailed response, only to have it bounce because the correspondent sent the message from a public terminal and neglected to set the browser preferences to include his actual email address.

The information in this book has been tested and verified, but you may find that features have changed (or you may even find mistakes). We believe the old saying, "If you like this book, tell your friends. If you don't like it, tell us." We're especially interested in hearing about mistakes. As hard as the authors and editors worked on this book, inevitably there are a few mistakes and typographical errors that slipped by us. If you find a mistake or a typo, please let us know so we can correct it in a future printing. Please send any errors you find directly to the authors at the previously listed email addresses.

You can also address comments and questions concerning this book to the publisher:

O'Reilly & Associates, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

We have a web site for the book, where we list errata, examples, and any additional information. You can access this site at:


Before reporting errors, please check this web site to see if we have already posted a fix. To ask technical questions or comment on the book, you can send email to the authors directly or send your questions to the publisher at:


For more information about other O'Reilly books, conferences, software, Resource Centers, and the O'Reilly Network, see the web sites at:



Many people were involved in the production of this book. The original editor, John Posner, got this book rolling and provided many helpful comments that substantially improved the book. When John moved on, Laurie Petrycki shepherded this book to its completion. The eagle-eyed Jeni Tennison read the entire manuscript from start to finish and caught many errors large and small. Without her attention, this book would not be nearly as accurate. Stephen Spainhour deserves special thanks for his work on the reference section. His efforts in organizing and reviewing material helped create a better book. We'd like to thank Matt Sergeant and Didier P. H. Martin for their thorough technical review of the manuscript and thoughtful suggestions. James Kass's Code2000 font was invaluable in producing Chapter 26.

We'd also like to thank everyone who has worked so hard to make XML such a success over the last few years and thereby given us something to write about. There are so many of these people that we can only list a few. In alphabetical order we'd like to thank Tim Berners-Lee, Jonathan Borden, Jon Bosak, Tim Bray, David Brownell, Mike Champion, James Clark, Charles Goldfarb, Jason Hunter, Arnaud Le Hors, Michael Kay, Keiron Liddle, Murata Makoto, Eve Maler, Brett McLaughlin, David Megginson, David Orchard, Walter E. Perry, Simon St.Laurent, C. M. Sperberg-McQueen, Jonathan Robie, Arved Sandstrom, James Tauber, Henry S. Thompson, B. Tommie Usdin, Daniel Veillard, Norm Walsh, Lauren Wood, and Mark Wutka. Our apologies to everyone we unintentionally omitted.

Elliotte would like to thank his agent, David Rogelberg, who convinced him that it was possible to make a living writing books like this rather than working in an office. The entire Sunsite crew (now ibiblio.org) has also helped him to communicate better with his readers in a variety of ways over the last several years. All these people deserve much thanks and credit. Finally, as always, he offers his largest thanks to his wife, Beth, without whose love and support this book would never have happened.

Scott would most like to thank his lovely wife, Celia, who has already spent way too much time as a "computer widow." He would also like to thank his daughter Selene for understanding why Daddy can't play with her when he's "working" and Skyler for just being himself. Also, he'd like to thank the team at Enterprise Web Machines for helping him make time to write. Finally, he would like to thank John Posner for getting him into this and Laurie Petrycki for working with him when things got tough.

Elliotte Rusty Harold, elharo@metalab.unc.edu

W. Scott Means, smeans@enterprisewebmachines.com
