XML has gained popularity as a data-exchange and message-passing format. As web services become more widespread, XML plays an even more important role in a developer's life. With the help of a few extensions, PHP lets you read and write XML for every occasion.
XML provides developers with a structured way to mark up data with tags arranged in a tree-like hierarchy. One perspective on XML is to treat it as CSV on steroids. You can use XML to store records broken into a series of fields. But instead of merely separating each field with a comma, you can include a field name, a type, and attributes alongside the data.
Another view of XML is as a document representation language. For instance, this book was written using XML. The book is divided into chapters; each chapter into recipes; and each recipe into Problem, Solution, and Discussion sections. Within any individual section, we further subdivide the text into paragraphs, tables, figures, and examples. An article on a web page can similarly be divided into the page title and headline, the authors of the piece, the story itself, and any sidebars, related links, and additional content.
XML content looks similar to HTML. Both use tags bracketed by < and > for marking up text. But XML is both stricter and looser than HTML. It's stricter because all container tags must be properly closed. No opening elements are allowed without a corresponding closing tag. It's looser because you're not forced to use a set list of tags, such as <a>, <img>, and <h1>. Instead, you have the freedom to choose a set of tag names that best describe your data.
Other key differences between XML and HTML are case sensitivity, attribute quoting, and whitespace. In HTML, <B> and <b> are the same bold tag; in XML, they're two different tags. In HTML, you can often omit quotation marks around attributes; XML, however, requires them, so you must always quote every attribute value.
Additionally, HTML parsers generally ignore whitespace, so a run of 20 consecutive spaces is treated the same as one space. XML parsers preserve whitespace, unless explicitly instructed otherwise. Because all elements must be closed, empty elements must end with />. For instance, in HTML, the line break is <br>, while in XHTML, which is HTML that validates as XML, it's written as <br />.
There is another restriction on XML documents. When XML documents are parsed into a tree of elements, the outermost element is known as the root element. Just as a tree has only one trunk, an XML document must have exactly one root element. In the previous book example, this means chapters must be bundled inside a book tag. If you want to place multiple books inside a document, you need to package them inside a bookcase or another container. This limitation applies only to the document root. Again, just as a tree can have multiple branches off the trunk, it's legal to store multiple books inside a bookcase.
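The rules covered so far (closed tags, case-sensitive names, quoted attributes, and a single root element) can be checked from PHP by asking a parser to load the document. The following is only a minimal sketch, assuming the DOM extension is available; the helper name is_well_formed() and the sample strings are our own illustrations, not part of any library.

```php
<?php
// Hypothetical helper: returns true if $xml parses as well-formed XML.
function is_well_formed($xml)
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Illustrative samples, one per rule discussed in the text.
$samples = array(
    'closed tags'     => '<book><title>PHP Cookbook</title></book>',
    'unclosed tag'    => '<book><title>PHP Cookbook</book>',
    'mismatched case' => '<B>bold</b>',
    'unquoted attr'   => '<a href=index.html>home</a>',
    'two roots'       => '<book/><book/>',
);

foreach ($samples as $label => $xml) {
    printf("%-16s %s\n", $label, is_well_formed($xml) ? 'well-formed' : 'not well-formed');
}
```

Only the first sample should be reported as well-formed; each of the others breaks exactly one of the rules.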
This chapter doesn't aim to teach you XML; for an introduction to XML, see Learning XML by Erik T. Ray (O'Reilly). A solid nuts-and-bolts guide to all aspects of XML is XML in a Nutshell by Elliotte Rusty Harold and W. Scott Means (O'Reilly).
Now that we've covered the rules, here's an example: if you are a librarian and want to convert your card catalog to XML, start with this basic set of XML tags:
<book>
    <title>PHP Cookbook</title>
    <author>Sklar, David and Trachtenberg, Adam</author>
    <subject>PHP</subject>
</book>
From there, you can add new elements or modify existing ones. For example, <author> can be divided into first and last name, or you can allow for multiple records so two authors aren't placed in one field.
PHP 5 has a completely new set of XML extensions that address major problems in PHP 4's XML extensions. While PHP 4 allows you to manipulate XML, its XML tools are only superficially related. Each tool covers one part of the XML experience, but they aren't designed to work together, and PHP 4 support for the more advanced XML features is often patchy. Not so in PHP 5, where the new XML extensions are designed to work together and offer fuller support for those advanced features.
Additionally, following the PHP tenet that creating web applications should be easy, there's a new XML extension that makes it simple to read and alter XML documents. The aptly named SimpleXML extension allows you to interact with the information in an XML document as though these pieces of information are arrays and objects, iterating through them with foreach loops and editing them in place merely by assigning new values to variables.
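To give a feel for this style of access, here is a short sketch using SimpleXML on a variation of the card catalog record. Splitting the authors into two separate author elements is our own assumption for illustration.

```php
<?php
// Sample catalog record; the two separate <author> elements are an
// illustrative assumption, not a required layout.
$xml = <<<XML
<book>
  <title>PHP Cookbook</title>
  <author>Sklar, David</author>
  <author>Trachtenberg, Adam</author>
  <subject>PHP</subject>
</book>
XML;

$book = simplexml_load_string($xml);

// Child elements are exposed as object properties.
echo $book->title, "\n";

// Repeated elements can be walked with foreach.
foreach ($book->author as $author) {
    echo "Author: ", $author, "\n";
}

// Assigning to a property edits the document in place.
$book->subject = 'PHP programming';

// Serialize the modified document back to a string.
echo $book->asXML();
```

Values read this way are SimpleXMLElement objects, so cast them with (string) when you need a plain PHP string for comparisons or array keys.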
The first two recipes in this chapter cover writing XML. Recipe 12.1 shows how to write XML without additional tools. To use the DOM extension to write XML in a standardized fashion, see Recipe 12.2.
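As a rough preview of the two approaches, the sketch below produces the same record twice: once by concatenating strings, and once with the DOM extension. It is not the recipes' own code; the $record array and the variable names are assumptions made for this illustration.

```php
<?php
// Illustrative data; the array layout is an assumption for this sketch.
$record = array(
    'title'   => 'PHP Cookbook',
    'author'  => 'Sklar, David and Trachtenberg, Adam',
    'subject' => 'PHP',
);

// Approach 1: build the markup by hand, escaping &, <, > and quotes.
$manual = "<book>\n";
foreach ($record as $tag => $value) {
    $manual .= "  <$tag>" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "</$tag>\n";
}
$manual .= "</book>\n";
echo $manual;

// Approach 2: let the DOM extension handle escaping and serialization.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$book = $doc->appendChild($doc->createElement('book'));
foreach ($record as $tag => $value) {
    $element = $book->appendChild($doc->createElement($tag));
    $element->appendChild($doc->createTextNode($value));
}
echo $doc->saveXML();
```

The hand-built version is shorter, but it is up to you to escape every value and close every tag; the DOM version does both for you.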
The complement to writing XML is parsing XML. That's the subject of the next three recipes. They're divided based upon the complexity and size of the XML document you're trying to parse. Recipe 12.3 covers how to parse basic XML documents. If you need more sophisticated XML parsing tools, move on to Recipe 12.4. When your XML documents are extremely large and memory intensive, turn to Recipe 12.5. If this is your first time using XML, and you're unsure which recipe is right for you, try them in order, as the code becomes increasingly complex as your requirements go up.
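For the large-document case, one option is a streaming parser that looks at a single node at a time rather than loading the whole tree. The sketch below uses the XMLReader extension for this; choosing XMLReader here is our assumption, as is the tiny inline bookcase document standing in for a large file.

```php
<?php
// A small inline document stands in for a large file.
$xml = '<bookcase>'
     . '<book><title>PHP Cookbook</title></book>'
     . '<book><title>Learning XML</title></book>'
     . '</bookcase>';

$reader = new XMLReader();
$reader->XML($xml);   // for a file on disk, $reader->open($path) would be used instead

$titles = array();
// read() advances one node at a time, so the full tree is never built in memory.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'title') {
        // readString() returns the text content of the current element.
        $titles[] = $reader->readString();
    }
}
$reader->close();

print_r($titles);
```

The trade-off is convenience: you can only move forward through the document, so this style suits extraction jobs better than editing.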
XPath is the topic of Recipe 12.6. It's a W3C standard for extracting specific information from XML documents. We like to think of it as regular expressions for XML. XPath is one of the most useful, yet underused, parts of the XML family of specifications. If you process XML on a regular basis, you should be familiar with XPath.
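A brief sketch shows the flavor of an XPath query. It assumes a bookcase document of our own invention and uses SimpleXML's xpath() method to pick out the titles of all books on a given subject.

```php
<?php
// Illustrative bookcase; the structure is an assumption for this sketch.
$xml = <<<XML
<bookcase>
  <book><title>PHP Cookbook</title><subject>PHP</subject></book>
  <book><title>Learning XML</title><subject>XML</subject></book>
  <book><title>XML in a Nutshell</title><subject>XML</subject></book>
</bookcase>
XML;

$bookcase = simplexml_load_string($xml);

// Select the title of every book whose subject is exactly "XML".
$matches = $bookcase->xpath('/bookcase/book[subject="XML"]/title');

// xpath() returns an array of SimpleXMLElement objects; cast them to strings.
$titles = array();
foreach ($matches as $title) {
    $titles[] = (string) $title;
}
print_r($titles);
```

The single query string replaces what would otherwise be a pair of nested loops and an if test.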
With XSLT, you can take an XSL stylesheet and turn XML into viewable output. By separating content from presentation, you can make one stylesheet for web browsers, another for PDAs, and a third for cell phones, all without changing the content itself. This is the subject of Recipe 12.7.
After introducing XSLT, the two recipes that follow show how to pass information back and forth between PHP and XSLT. Recipe 12.8 tells how to send data from PHP to an XSLT stylesheet; Recipe 12.9 shows how to call out to PHP from within an XSLT stylesheet.
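To make the PHP-to-XSLT direction concrete, here is a minimal sketch of a transformation that receives one value from PHP. It assumes the optional xsl extension is installed; the stylesheet, the parameter name label, and the sample book are all hypothetical.

```php
<?php
// The xsl extension is not always enabled, so stop early if it is missing.
if (!class_exists('XSLTProcessor')) {
    exit("The xsl extension is not available.\n");
}

$xml = '<book><title>PHP Cookbook</title><subject>PHP</subject></book>';

// Stylesheet that turns a <book> into an HTML fragment. The "label"
// parameter is a hypothetical name, used to show data passed in from PHP.
// A nowdoc (quoted identifier) keeps PHP from interpolating $label.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:param name="label"/>
  <xsl:template match="/book">
    <p><xsl:value-of select="$label"/>: <b><xsl:value-of select="title"/></b></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

$source = new DOMDocument();
$source->loadXML($xml);

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xsl);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);
// Hand a value from PHP to the stylesheet (empty string = no namespace).
$processor->setParameter('', 'label', 'Now reading');

$html = $processor->transformToXml($source);
echo $html;
```

Because the content lives in $xml and the presentation in $xsl, swapping in a different stylesheet changes the output without touching the data.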
As long as your XML document abides by the structural rules of XML, it is known as well-formed. However, unlike HTML, which has a specific set of elements and attributes that must appear in set places, XML has no such restrictions.
Yet, in some cases, such as XHTML, the XML version of HTML, it's useful to make sure your XML documents abide by a specification. This allows tools, such as web browsers, RSS readers, or your own scripts, to easily process the input. When an XML document follows all the rules set out by a specification, then it is known as valid. Recipe 12.10 covers how to validate an XML document.
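The difference between well-formed and valid is easy to see in a small experiment. The sketch below assumes a DTD as the specification and uses the DOM extension's validate() method; the DTD itself and both sample records are invented for illustration, and other schema languages could serve equally well.

```php
<?php
// An internal DTD: a <book> must contain title, author, subject, in that order.
$dtd = <<<DTD
<!DOCTYPE book [
  <!ELEMENT book (title, author, subject)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT subject (#PCDATA)>
]>
DTD;

$good = $dtd . '<book><title>PHP Cookbook</title><author>Sklar, David</author><subject>PHP</subject></book>';
// Well-formed, but <publisher> is not declared and author/subject are missing.
$bad  = $dtd . '<book><title>PHP Cookbook</title><publisher>Example Press</publisher></book>';

libxml_use_internal_errors(true);   // keep validation errors out of the output

$results = array();
foreach (array('good' => $good, 'bad' => $bad) as $label => $xml) {
    $doc = new DOMDocument();
    $wellFormed = $doc->loadXML($xml);            // checks structure only
    $valid = $wellFormed && $doc->validate();     // checks against the DTD
    $results[$label] = array($wellFormed, $valid);
    printf("%s: well-formed=%s valid=%s\n", $label,
        $wellFormed ? 'yes' : 'no', $valid ? 'yes' : 'no');
}
```

Both documents load successfully, but only the first passes validation, which is exactly the distinction between the two terms.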
One of PHP 5's major limitations is its handling of character sets and document encodings. PHP strings are not associated with a particular encoding, but all the XML extensions require UTF-8 input and emit UTF-8 output. Therefore, if you use a character set incompatible with UTF-8, you must manually convert your data both before sending it into an XML extension and after you receive it back. Recipe 12.11 explores the best ways to handle this process.
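The convert-in, convert-out pattern looks roughly like the sketch below. It assumes the source data is ISO-8859-1 and uses iconv() for the conversion; both are our assumptions for illustration, and other conversion functions follow the same shape.

```php
<?php
// "Café" encoded as ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "Caf\xE9";

// Convert to UTF-8 before handing the string to an XML extension.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->appendChild($doc->createElement('name'))
    ->appendChild($doc->createTextNode($utf8));

// Text read back from the DOM is UTF-8 as well ...
$fromDom = $doc->documentElement->textContent;

// ... so convert it again if the rest of the application expects ISO-8859-1.
$roundTrip = iconv('UTF-8', 'ISO-8859-1', $fromDom);

// True when the value survives the round trip unchanged.
var_dump($roundTrip === $latin1);
```

Skipping the first conversion is the usual source of trouble: a lone 0xE9 byte is not valid UTF-8, so the XML extension cannot interpret it correctly.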
The chapter concludes with several recipes dedicated to reading and writing common types of XML documents, specifically RSS and Atom. These are the two most popular data syndication formats, and are useful for exchanging many types of data, including blog posts, podcasts, and even mapping information.
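Reading a feed is mostly a matter of walking a known structure. As a taste of what's involved, this sketch parses a tiny RSS 2.0 document with SimpleXML; the feed contents and URLs are made-up sample data, and a real script would fetch the feed from a remote server.

```php
<?php
// A tiny RSS 2.0 feed; the channel and item values are made-up sample data.
$rss = <<<RSS
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://www.example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>First post</title>
      <link>http://www.example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/2</link>
    </item>
  </channel>
</rss>
RSS;

$feed = simplexml_load_string($rss);

// Map each item's link to its title.
$items = array();
foreach ($feed->channel->item as $item) {
    $items[(string) $item->link] = (string) $item->title;
}
print_r($items);
```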
PHP Cookbook also covers all the popular types of web services: REST, XML-RPC, and SOAP. This topic is so important that it gets two dedicated chapters of its own. Chapter 14 describes how to consume web services, while Chapter 15 tells how you can implement web services of your very own.