XSLT and XPATH: A Guide to XML Transformations

By John Robert Gardner, Zarella L. Rendon

Publisher: Prentice Hall PTR
Pub Date: July 01, 2001
ISBN: 0-13-040446-2
Pages: 588

Mastering XSLT and XPath gives you unprecedented control over your information and helps you leverage virtually every new XML technology, from XLink to schemas. Discover XSLT's powerful vocabulary of programming-like features and learn how to build custom solutions that resist obsolescence. By the end of the first chapter, you'll be performing XML-to-HTML conversions for display in any web browser. Then build on your knowledge through a series of hands-on examples that transform you into an XSLT/XPath expert!

Copyright
Preface
Why Should You Use XSLT?
Who Is This Book For?
Organization
Conventions
Versions
Acknowledgments
Readers and Contributors

Chapter 1. Anatomy of an XSLT Stylesheet
Section 1.1. What Is Markup?
Section 1.2. What Is XSLT?
Section 1.3. What Is XPath?
Section 1.4. XSLT Stylesheet Concepts
Section 1.5. Terminology for XSLT
Section 1.6. Climbing 'Round the Family Tree: Addressing in XSLT

Chapter 2. Fundamental Concepts of XSLT Stylesheets
Section 2.1. Boilerplates for XSLT Stylesheets
Section 2.2. Embedding Stylesheets in XML Documents
Section 2.3. XSLT Stylesheet Terminology
Section 2.4. XML Components of XSLT Stylesheets

Chapter 3. Advanced Stylesheet Concepts
Section 3.1. Templates: The Building Blocks of Transformations
Section 3.2. Built-in Template Rules

Chapter 4. XPath Expressions
Section 4.1. XPath Syntax and Terminology
Section 4.2. Abbreviations

Chapter 5. XPath Functions
Section 5.1. XPath Function Library
Section 5.2. The Node-set Core Function Group
Section 5.3. String Core Function Group
Section 5.4. Boolean Core Function Group
Section 5.5. Number Core Function Group

Chapter 6. Building New XML Documents with XSLT
Section 6.1. Creating Elements with LREs
Section 6.2. The <xsl:element> Instruction Element
Section 6.3. Creating Attributes with the <xsl:attribute> Instruction Element
Section 6.4. The <xsl:attribute-set> Top-Level Element
Section 6.5. The <xsl:text> Instruction Element
Section 6.6. Adding Attributes to LREs
Section 6.7. Comments and Processing-Instructions
Section 6.8. Namespace Aliases

Chapter 7. Using Multiple Stylesheets
Section 7.1. Working with External Stylesheets
Section 7.2. Template Rule Processing and Priorities

Chapter 8. Working with Variables
Section 8.1. Declaring and Binding Variables
Section 8.2. Result Tree Fragments
Section 8.3. Using Variable References
Section 8.4. Comparing <xsl:variable> and <xsl:param>
Section 8.5. Comparing <xsl:with-param> to <xsl:param> and <xsl:variable>

Chapter 9. Duplication, Iteration, and Conditional XSLT Elements
Section 9.1. The <xsl:copy-of> Instruction Element
Section 9.2. The <xsl:copy> Instruction Element
Section 9.3. The <xsl:for-each> Instruction Element
Section 9.4. The <xsl:sort> Element
Section 9.5. The <xsl:if> Instruction Element
Section 9.6. The <xsl:choose> Instruction Element
Section 9.7. The <xsl:number> Instruction Element

Chapter 10. Controlling Output Options
Section 10.1. The <xsl:output> Top-Level Element
Section 10.2. The <xsl:strip-space> and <xsl:preserve-space> Top-Level Elements
Section 10.3. Generating Error Messages and Logs

Chapter 11. XSLT Functions and Related XSLT Elements
Section 11.1. XSLT Function Groups
Section 11.2. String XSLT Functions
Section 11.3. The Boolean XSLT Function Group

Chapter 12. XSLT Processors, Extensions, and Java
Section 12.1. XSLT Processors
Section 12.2. Extension Elements and Functions
Section 12.3. Namespaces
Section 12.4. Java
Section 12.5. Commercial XSLT Processors

Chapter 13. Xalan, Saxon, and XT
Section 13.1. Xalan
Section 13.2. Saxon
Section 13.3. XT
Section 13.4. Generating Multiple Output Files Using Saxon, Xalan, or XT

Appendix A. Case Studies
Section A.1. Lists
Section A.2. MARC Records: The ATLAS Project from ATLA-CERTR at Emory University
Section A.3. The Harvard-Kyoto Classics Project with Vedic Literature

Appendix B. Grouping Using the Muenchian Method
by Jeni Tennison

Appendix C. Using XSLT for the Artificial Intelligence "N-Queens" Problem
by Oren Ben-Kiki
Section C.1. Architecture
Section C.2. The Stylesheet
Section C.3. Final notes

Copyright

Library of Congress Cataloging-in-Publication Data

Gardner, James Robert.

XSLT and XPath: a guide to XML transformations / James Robert Gardner and Zarella L. Rendon.

p. cm.

Includes index.

ISBN 0-13-040446-2

1. XML (Document markup language) 2. Internet programming. I. Rendon, Zarella L. II. Title. III. Series.

QA76.76.H94 R46 2001

005.7'2—dc21 2001016422

© 2002 Prentice Hall PTR

Prentice-Hall, Inc.

Upper Saddle River, NJ 07458

Prentice Hall books are widely used by corporations and government agencies for training, marketing, and resale.

The publisher offers discounts on this book when ordered in bulk quantities.

For more information, contact:

Corporate Sales Department,

Phone: 800-382-3419; FAX: 201-236-7141

E-mail: corpsales@prenhall.com; or write:

Prentice Hall PTR

Corp. Sales Dept.

One Lake Street

Upper Saddle River, NJ 07458

All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher.

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

Pearson Education LTD.

Pearson Education Australia PTY, Limited

Pearson Education Singapore, Pte. Ltd

Pearson Education North Asia Ltd

Pearson Education Canada, Ltd.

Pearson Educación de Mexico, S.A. de C.V.

Pearson Education—Japan

Pearson Education Malaysia, Pte. Ltd

Pearson Education, Upper Saddle River, New Jersey

Credits

Editorial/Production Supervision: Donna Cullen-Dolce

Acquisitions Editor: Mark L. Taub

Editorial Assistant: Sarah Hand

Marketing Manager: Bryan Gambrel

Manufacturing Buyer: Maura Zaldivar

Cover Design: DesignSource

Cover Design Direction: Jerry Votta

Interior Design: Gail Cocker-Bogusz

Dedication

For Dale

Preface

You've heard of XML; your manager wants you to use it in your applications. Now what?

You've used HTML, and you know what a tag is; you know that it is somehow related to XML. You may even know what XML is and what it does. What you may not know is that, while XML identifies and adds structure to the content of a document, it does not tell you anything about how to process that content, or how to do anything useful with it beyond storage. This is good news: it means your content can be used for many different purposes.

There are many tools you can use to process content once it is marked up in XML. However, we have chosen to focus on XSLT, the only standard application that lets you do many different things with that content. With XSLT, you can add style to XML, convert it to other XML, or simply chop it up and regenerate it in a different form.

XSLT is the power behind the throne of XML. It ensures that every level of every piece of XML data is accessible and reusable across platforms and forward in time. It is not an exaggeration to say that XSLT and its companion XPath are the very glue and mortar that hold together and build the endlessly varying applications of markup data for any industry, academy, or individual. XSLT is the fastest cure for the fear of obsolescence in a data or information architecture design.

XSLT is easy to use. In fact, XSLT itself is XML: every stylesheet is a well-formed XML document. XSLT "speaks the language," or the syntax, of XML with a powerful vocabulary of programming-like features that are nonetheless easy to learn and understand.
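Because a stylesheet is an ordinary XML document, any namespace-aware XML parser can read it before an XSLT processor ever sees it. The Java sketch below illustrates this under a few assumptions. It relies on the JDK's built-in JAXP classes (javax.xml.parsers and javax.xml.transform), whose default processor implements XSLT 1.0. The class name StylesheetIsXml and the tiny greeting document are hypothetical, made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example: a stylesheet is an ordinary XML document.
class StylesheetIsXml {
    // A complete XSLT 1.0 stylesheet, written as a plain XML string.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Step 1: read the stylesheet with a generic, namespace-aware XML parser.
    static Document parseAsXml(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT instructions are identified by their namespace
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Step 2: hand that same DOM tree to the XSLT processor and run it on an input.
    static String run(Document stylesheet, String input) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(new DOMSource(stylesheet));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAsXml(XSLT);
        // The root element is xsl:stylesheet, in the XSLT namespace.
        System.out.println(doc.getDocumentElement().getNamespaceURI());
        System.out.println(run(doc, "<greeting to='world'/>")); // Hello, world!
    }
}
```

Passing the parsed DOM straight to newTransformer through a DOMSource underlines the point. The processor consumes the very same XML tree that a generic parser produced.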

XSLT attempts to be a bridge to nonprogrammers, bringing the easily understood syntax of XML together with a powerful scripting mechanism and simple pathing approach to document navigation.
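The "pathing" approach is the XPath location path. It is a slash-separated route from the root (or from the current node) down to the nodes you want, optionally filtered by predicates in square brackets. Below is a minimal sketch using the JDK's javax.xml.xpath API, which implements XPath 1.0. The class PathNavigation and its sample book document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example: XPath location paths as a simple way to navigate a document.
class PathNavigation {
    // Invented sample document for this sketch.
    static final String BOOK =
        "<book>"
      + "<chapter n='1'><title>Anatomy of an XSLT Stylesheet</title></chapter>"
      + "<chapter n='4'><title>XPath Expressions</title></chapter>"
      + "</book>";

    // Evaluate an XPath 1.0 expression and return its string value.
    static String eval(String expr, String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Step down from the root: book, then its second chapter child, then title.
        System.out.println(eval("/book/chapter[2]/title", BOOK));
        // Filter with a predicate on an attribute instead of a position.
        System.out.println(eval("/book/chapter[@n='1']/title", BOOK));
        // '//' searches all descendants; count() is an XPath core function.
        System.out.println(eval("count(//title)", BOOK));
    }
}
```

The same expressions can appear unchanged in the select and match attributes of an XSLT stylesheet.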

It is our belief—and our approach in writing this book—that both the experienced programmer and the newly trained markup technologist can become more comfortable with the potent set of tools for preserving, augmenting, updating, and delivering XML data—whether it's on the Web or your corporation's intranet or B2B.

If you are constantly wishing you had just a little more control over your information, this book will deliver that, and much more. In fact, by the end of the first chapter, you will be able to perform basic conversions from XML documents to HTML that will display in any Web browser. Subsequent chapters build upon and enhance that base of knowledge, matching examples with detailed explanations and focusing on commonly misunderstood areas.

When you read this book, have your computer handy. Take the time to load up one of the XSLT processors and work along as you read. Learning by doing is always best, especially with XSLT and XPath. Chapter 13 will show you how to install the software included on the CD. Every example in the book can be found on the CD in the examples directory, organized by chapter.

XSLT is rewarding and creative to use. Be prepared to enjoy this learning experience. You will be surprised by how quickly you become productive with this technology.

Why Should You Use XSLT?

Browsers display HTML, not general XML tags. You have to do something with the XML once you have it. Can you print with XML? Can you send XML to the Web? Can you browse XML? Yes, but not alone.

XSLT lets you convert XML to HTML, to other types of XML, or to just plain text. With a little creativity and the proper knowledge of XSLT, you can generate practically any form of output from XML.
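As a rough illustration, the sketch below runs one small source document through three stylesheets. They differ mainly in their <xsl:output> method (html, xml, text). It assumes the JDK's built-in XSLT 1.0 processor. The processors document, the tools vocabulary, and the class name ThreeOutputs are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: one XML source, three kinds of output.
class ThreeOutputs {
    static final String SOURCE = "<processors><p>Xalan</p><p>Saxon</p><p>XT</p></processors>";

    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // XML to HTML: literal result elements (ul, li) become HTML tags.
    static final String TO_HTML = HEAD
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'><ul><xsl:for-each select='processors/p'>"
      + "<li><xsl:value-of select='.'/></li></xsl:for-each></ul></xsl:template>"
      + "</xsl:stylesheet>";

    // XML to a different XML vocabulary: element content becomes an attribute value.
    static final String TO_XML = HEAD
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'><tools><xsl:for-each select='processors/p'>"
      + "<tool name='{.}'/></xsl:for-each></tools></xsl:template>"
      + "</xsl:stylesheet>";

    // XML to plain text: no markup at all in the result.
    static final String TO_TEXT = HEAD
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:for-each select='processors/p'>"
      + "<xsl:value-of select='.'/><xsl:if test='position() != last()'>, </xsl:if>"
      + "</xsl:for-each></xsl:template>"
      + "</xsl:stylesheet>";

    // Apply one stylesheet to one input document and return the serialized result.
    static String transform(String xslt, String xml) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(TO_HTML, SOURCE));
        System.out.println(transform(TO_XML, SOURCE));
        System.out.println(transform(TO_TEXT, SOURCE)); // Xalan, Saxon, XT
    }
}
```

Exact whitespace and indentation in the HTML result depend on the processor's serializer. The element structure does not.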

XSLT provides quick, easy solutions to most XML transformation problems. However, the designers of XSLT did not intend for you to use the specification without additional help.

"This book, along with the proper tools, is what is required for XML to succeed with the average business application."

—Sharon Adler, Co-Chair W3C XSL Working Group

The latest version of XSLT (for which this book is written) is 1.0. There are many additional features that are being considered by the W3C XSL committee, and version 2.0 promises to add some of these new features, as well as provide support for XML Schema, XML Query, and others.

Who Is This Book For?

This book is for anyone who works with electronic data and wants to enable XML transformations without a difficult programming language learning curve. If you are comfortable working with SGML, XML, or even HTML, you will benefit greatly from the common markup syntax.

Some people may find XSLT difficult because it is not a procedural programming language: rather than spelling out each step of control flow, you write template rules, and the processor decides when to apply them. Most programming languages have a very structured, concise syntax. The syntax of XSLT is XML, which is designed to be human readable and easily understood. You will need some knowledge of markup before using XSLT.
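The non-procedural point works like this. Instead of writing a loop that walks the tree, you declare template rules, and <xsl:apply-templates> asks the processor to find the rule whose match pattern fits each node. The sketch below is hypothetical: it uses a class named TemplateRules and a sample memo document, and assumes the JDK's XSLT 1.0 processor. The rules appear in a different order from the elements they handle, yet the output follows document order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: declarative template rules instead of procedural control flow.
class TemplateRules {
    // The rules are declared "out of order"; the processor, not the author,
    // decides which rule fires for each node it is asked to process.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='body'>Body: <xsl:value-of select='.'/>.</xsl:template>"
      + "<xsl:template match='to'>To: <xsl:value-of select='.'/>; </xsl:template>"
      + "<xsl:template match='/'><xsl:apply-templates select='memo/*'/></xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Output follows document order: To: reader; Body: Welcome to XSLT.
        System.out.println(transform("<memo><to>reader</to><body>Welcome to XSLT</body></memo>"));
    }
}
```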

Some people may find XSLT difficult to use because it does not provide solutions to every transformation situation. For example, you cannot use XSLT to convert plain text to XML, because its input must already be XML. In such situations, additional processing may be required. However, for most of your day-to-day XML transformations, XSLT is the tool of choice.
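An XSLT 1.0 processor works on a tree built from well-formed XML, so unstructured text is rejected before any template runs. The sketch below shows this with the JDK's built-in processor. The class name TextIsNotXml and both sample inputs are hypothetical. The exact error message, which the processor may also print to standard error, differs between implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: XSLT 1.0 input must be XML, so plain text is rejected.
class TextIsNotXml {
    // A trivial stylesheet that just outputs the document's text content.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='.'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Returns true if the processor could parse and transform the input.
    static boolean accepts(String input) throws Exception {
        // Compile the stylesheet outside the try, so only input errors are caught below.
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        try {
            t.transform(new StreamSource(new StringReader(input)), new StreamResult(new StringWriter()));
            return true;
        } catch (TransformerException e) {
            return false; // the processor could not build a tree from this input
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepts("<note>plain words inside an element</note>")); // true
        System.out.println(accepts("Dear reader, this is plain text."));          // false
    }
}
```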

Organization

The book is organized to build a base of knowledge that will be added to chapter by chapter. Basic XSLT concepts and a brief overview of XML are covered in Chapter 1. The remaining chapters add functionality as required when creating stylesheets. The more complex the problem, the later it is covered.

Chapter 1 provides everything you need to know about XML and XSLT in a nutshell. This chapter gives a good overview with minimum syntax, and can be used by people at any level of markup experience as a review or for general information.

Chapter 2 covers stylesheet concepts that are crucial to understanding XSLT, as well as general stylesheet terminology.

Chapter 3 adds more concepts, a little more explanation and usage, and an in-depth study of templates to the basics covered in Chapters 1 and 2.

Chapter 4 defines and explains XPath expressions and patterns.

Chapter 5 covers XPath functions, which are crucial to using most of the elements in XSLT.

Chapter 6 walks through the creation of new XML elements and attributes using several different methods.

Chapter 7 discusses using multiple stylesheets by including and importing them, and explains template priority.

Chapter 8 shows how to work with variables and parameters.

Chapter 9 covers anything that is in some way iterative or conditional, as well as the utilities required to copy XML from the input to the output.

Chapter 10 details the options for controlling output types, as well as stripping and preserving whitespace, and generating error messages.

Chapter 11 covers XSLT functions and their related elements, including importing external XML documents with the document() function, and using keys with <xsl:key>.

Chapter 12 discusses extensions, processors, and Java, as well as three "commercial" XSLT processors.

Chapter 13 describes three "freeware" processors: Xalan, Saxon, and XT, along with installation instructions and extension implementations.

There are three appendices that cover a variety of topics and case studies, as well as contributed material.

Conventions

XML, XSLT, and HTML elements, when discussed in the text, always appear as markup. For example, <xsl:stylesheet> will always have the opening < and closing >, and will be in Courier font. Any expression or function, such as count(), will also be in Courier.

Each element has an element model definition, taken directly from the XSLT specification where provided, as shown below. The element model is organized as an XML element with an optional category description (as a comment), followed by the start tag with any available attributes, the content (as a comment), and a closing tag (unless the element is empty). Attributes are bold if they are required, and their value is shown in italic if it has a special defined content type, or in quotes if it has a literal value. Elements in the content can be optional, designated with a ?, or optional and repeatable, designated with a *.

<!-- Category: instruction -->
<xsl:for-each
  select = node-set-expression>
  <!-- Content: (xsl:sort*, template) -->
</xsl:for-each>
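Filling in that element model, a concrete <xsl:for-each> holds an optional <xsl:sort> followed by a template body. The sketch below assumes the JDK's XSLT 1.0 processor. The class ForEachSort, its toc document, and the dataType switch are hypothetical. Switching data-type between number and text shows why the attribute matters when sorting chapter numbers.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: <xsl:for-each> with an <xsl:sort> child, per its element model.
class ForEachSort {
    static final String TOC =
        "<toc><chapter n='2'>Fundamentals</chapter>"
      + "<chapter n='10'>Output</chapter>"
      + "<chapter n='1'>Anatomy</chapter></toc>";

    // Build the stylesheet; dataType ("number" or "text") is a hypothetical switch.
    static String stylesheet(String dataType) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='toc/chapter'>"
          // xsl:sort must come first in the content model: (xsl:sort*, template)
          + "<xsl:sort select='@n' data-type='" + dataType + "'/>"
          + "<xsl:value-of select='@n'/>. <xsl:value-of select='.'/>"
          // position() and last() refer to the sorted order
          + "<xsl:if test='position() != last()'>; </xsl:if>"
          + "</xsl:for-each></xsl:template></xsl:stylesheet>";
    }

    static String run(String dataType) throws Exception {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheet(dataType))))
            .transform(new StreamSource(new StringReader(TOC)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("number")); // 1. Anatomy; 2. Fundamentals; 10. Output
        System.out.println(run("text"));   // 1. Anatomy; 10. Output; 2. Fundamentals
    }
}
```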

Function prototypes are taken directly from the XPath and XSLT specifications, and are formatted with the keyword Function, followed by a colon, followed by the return type in italics, the name of the function in bold, and parentheses containing the argument types in italics, as follows.

Function: number sum(node-set)
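That prototype says sum() takes a node-set and returns a number. Each node's string-value is converted to a number and the results are added, so a single non-numeric value makes the whole result NaN. Below is a minimal sketch using the JDK's javax.xml.xpath API (XPath 1.0). The class SumFunction and the order documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example: the XPath core function  number sum(node-set).
class SumFunction {
    // Evaluate sum() over the node-set selected by 'path' and return the number.
    static double sum(String path, String xml) throws Exception {
        Object result = XPathFactory.newInstance().newXPath().evaluate(
            "sum(" + path + ")",
            new InputSource(new StringReader(xml)),
            XPathConstants.NUMBER); // an XPath number comes back as java.lang.Double
        return (Double) result;
    }

    public static void main(String[] args) throws Exception {
        String order = "<order><item price='2.50'/><item price='3.25'/><item price='4'/></order>";
        String broken = "<order><item price='2.50'/><item price='n/a'/></order>";
        System.out.println(sum("/order/item/@price", order));  // 9.75
        System.out.println(sum("/order/item/@price", broken)); // NaN, because 'n/a' is not a number
    }
}
```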

Versions

This book is written according to XSL Transformations (XSLT) Version 1.0, XML Path Language (XPath) Version 1.0, and Extensible Markup Language (XML) 1.0. Additional reference material came from Namespaces in XML REC-xml-names-19990114.

The version of James Clark's XT used for the tests in this book is 19991105. The version of Michael Kay's Saxon used is 6.2.2.

Acknowledgments

We would like to acknowledge and give proper thanks to those individuals whose patience and consideration contributed in no small part to the writing of this book. In addition, there are readers, proofers, and contributors whom we acknowledge at the end of this section.

First and foremost—apart from the irreplaceable contributions and support of those to whom this book is dedicated—we want to acknowledge each other. Working together in such a pressured way and still spending more time laughing than anything else is a feat in itself. Without figuring out each other's varied ways and memes, this book would never have gotten under way. Following that, we each have our own "cast of heroes" to thank.

John Robert thanks those loved ones, friends, and colleagues who have provided much support: Dale Leeser; the Ferntheils; Jeff Leeser; Jay Semel and the great team at the University of Iowa's Obermann Center for Advanced Studies for providing the research base where this book could begin; Michael Witzel at Harvard for providing texts and challenges which made XSLT important early on; Leslie Sims and Mary Sue Coleman; Moya and Jonathan for being so kind and cool; Dr. Mikhail Gorokhov of Atlanta for a revolutionary treatment of therapy and non-narcotic pain relief for tired and sore hands; the gang at Emory University's Center for Electronic Texts in Theology and Religion from ATLA, especially John Wagner for making us prove XSLT was necessary to a relational database world; John Bagby, Russell and Elaine at UAQA.com; Deborah Norris; and Nichiren Nietszche Daishonin for tolerance and his own brand of support.

Practically speaking, the support of Sun Microsystems, specifically from Steven Butler and Karsten Riemer, proved make-or-break in allowing the time and focus to write properly. While we were finishing and proofing, the fine spirits, food, and folk at the Red Rock Bistro in Swampscott, MA, and at Ohio's Golden Lamb Inn were both delightful and indispensable.

Zarella would personally like to thank Sharon Adler, Carla Corkern, Ellen Campbell, Charles Goldfarb, G. Ken Holman, Steven Newcomb, Paul Prescod, and Jeremy Richman for all their contributions, suggestions, and support. Also a special thanks to all family and friends for their support and encouragement.

Thanks also to the team at ISOGEN for their support and contributions.

Readers and Contributors

Thanks to David Bertoni and David Marston of IBM/Lotus; Sharon Adler at IBM; Norm Walsh, David Hoffert, Donald Kerr, Marc Cannava, Caron Newman, Floyd Jones, and Scott Hudson of Sun Microsystems; Jonathan Marsh of Microsoft; Steve Muench of Oracle; Michael Kay of Software AG; Eric Lawson of ISOGEN; and G. Ken Holman of Crane Softwrights, Ltd.

Thanks to Jeni Tennison for her work on Muenchian grouping with the key() function; Oren Ben-Kiki for his offering on the unique use of XSLT to solve the classic N-Queens puzzle from the artificial intelligence community; and Eric Lawson, of ISOGEN, for providing a test GUI for Xalan-J.

Special thanks to the members of the W3C XSL working group for their contributions, especially Sharon Adler, Scott Boag, Michael Kay, Bob Lojek, Jonathan Marsh, Steve Muench, Norm Walsh, and of course, James Clark for making it all work.

We also extend our thanks to Deborah Norris for her work preparing the final graphics in this book.

A very special thanks goes to the patient, knowledgeable, and versatile team at Prentice Hall for their support and eagle-eyed proofing: Mark Taub, Donna Cullen-Dolce, Carol Lallier, and Camie Goffi.