Chapter 3: PHP and Document Object Model | Integrating PHP and XML 2004

With the tree-based Document Object Model (DOM) approach to parsing XML data, PHP loads the complete XML document into system memory and provides standard classes to navigate and manipulate it. The DOM implementation in PHP is based on the libxml library, which provides functions for parsing and modifying XML documents.

This chapter explains how to implement DOM in PHP to parse XML documents. It also compares the event-based Simple API for XML (SAX) approach with the tree-based DOM approach to parsing XML documents. Finally, it explains the DOM architecture, the standard DOM classes, and the uses of the DOM approach to modify and query XML documents.

Introducing DOM

DOM is a programming interface that you can use to create and edit XML documents. It specifies the logical structure of a document and the way in which you access that structure to add, delete, or modify the document's contents.

DOM represents a parsed XML document as a hierarchical tree of standard objects. Each object encapsulates the methods and properties that you can use to manipulate and navigate the DOM tree. For example, the DomAttribute object encapsulates attribute properties, such as name and value. The basic interface shared by every DOM object is the node. The tree structure contains multiple nodes that are related to each other in a parent-child relationship.
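The attribute properties and the parent-child relationship described above can be sketched in a few lines. This is a minimal illustration, assuming the DOM extension bundled with PHP 5 and later (DOMDocument and DOMAttr), whose class names may differ from those used elsewhere in this chapter. The inline XML fragment is only a stand-in for a real document.

```php
<?php
// Sketch only: assumes the PHP 5+ DOM extension (DOMDocument, DOMAttr).
$doc = new DOMDocument();
$doc->loadXML('<Name first="John" middle="E" last="Williams"/>');

$name = $doc->documentElement;             // the <Name> element node
$attr = $name->getAttributeNode('first');  // the attribute object for first="John"

echo $attr->name, '=', $attr->value, "\n"; // the attribute's name and value
echo $attr->ownerElement->nodeName, "\n";  // the element that owns the attribute
echo $name->parentNode->nodeName, "\n";    // the parent of the root element is the document node
```

Every object touched here (the document, the element, and the attribute) exposes the same node properties, such as nodeName, which is what makes generic tree navigation possible.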

DOM is platform-independent and can work with both HTML and XML documents. You can implement the DOM approach in languages such as PHP, Java, Python, Visual Basic, Perl, and Delphi.

The W3C has defined the DOM specification in multiple levels:

DOM Level 1: Contains the core features of DOM, published by the W3C in October 1998.
DOM Level 2: Contains DOM features such as core functions, document traversal, and event handling.
DOM Level 3: Incorporates features such as XPath and abstract schemas.

A DOM parser reads the entire XML document into memory, converts it into a hierarchical tree structure, and provides an API to access the tree nodes and the contents attached to each node. For example, the DOM parser can represent the contents of emp.xml, which stores employee information such as the company name and department name, as a hierarchical structure.

Listing 3-1 shows how to create the emp.xml document:

Listing 3-1: Creating an XML Document

<?xml version="1.0" encoding="UTF-8"?>
<Company name="Unique Systems">
  <Department>Marketing</Department>
  <Level>Middle-Level</Level>
  <EmployeeInformation>
    <Name first="John" middle="E" last="Williams"/>
    <DateOfHiring>10/01/1982</DateOfHiring>
  </EmployeeInformation>
</Company>

The above listing creates an XML document, emp.xml, which stores employee information, such as the employee name, company name, department name, and hiring date.

The DOM tree contains various nodes linked to each other in parent-child relationships. For example, in the emp.xml document, the Marketing text node is the child of the Department node, and Name is a child of the EmployeeInformation node. You can parse this XML document into a hierarchical structure using the DOM parser.

Figure 3-1 shows a representation of the emp.xml file as a DOM tree structure generated by the DOM parser:

Figure 3-1: DOM Tree Representation of emp.xml
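The same tree can be printed from code. The following is a minimal sketch, assuming the PHP 5+ DOM extension. The helper describeTree() is a hypothetical name introduced here for illustration, and the XML from Listing 3-1 is embedded inline so that the example does not depend on an emp.xml file on disk.

```php
<?php
// Sketch only: assumes the PHP 5+ DOM extension. The XML mirrors Listing 3-1.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<Company name="Unique Systems">
  <Department>Marketing</Department>
  <Level>Middle-Level</Level>
  <EmployeeInformation>
    <Name first="John" middle="E" last="Williams"/>
    <DateOfHiring>10/01/1982</DateOfHiring>
  </EmployeeInformation>
</Company>
XML;

// describeTree() is a hypothetical helper: it returns one indented line per node.
function describeTree(DOMNode $node, $depth = 0)
{
    $indent = str_repeat('  ', $depth);
    if ($node->nodeType === XML_TEXT_NODE) {
        $text = trim($node->nodeValue);
        // Whitespace-only text nodes are just the indentation of the source file.
        return $text === '' ? array() : array($indent . '"' . $text . '"');
    }
    $lines = array($indent . $node->nodeName);
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $lines = array_merge($lines, describeTree($child, $depth + 1));
        }
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML($xml);                  // the whole document is parsed into memory
$lines = describeTree($doc->documentElement);
echo implode("\n", $lines), "\n";     // one line per element or text node
```

Attributes such as name or first do not appear in the output because they are attached to their element rather than stored among its child nodes.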

Comparing DOM with SAX

You can parse an XML document using both the DOM and SAX parsers. Both the DOM and SAX approaches have their advantages and disadvantages when working with XML documents.

Table 3-1 lists comparisons between the SAX and DOM parsing approaches:

Table 3-1: Comparison between DOM and SAX
Mode of Comparison	DOM	SAX
Approach	Uses a tree-based approach to parse XML documents. The DOM approach loads the whole XML document into memory as a tree structure that contains various nodes.	Uses an event-based approach that processes XML files in a linear manner. The SAX approach parses XML documents in chunks without creating a tree.
Memory	Consumes more system memory, because the DOM parser loads the whole XML document into memory.	Consumes less system memory than DOM, because it does not load the whole XML structure into memory and parses the XML document in chunks.
Efficiency	Lets you parse XML documents in a non-sequential manner and manipulate complex XML documents, because the DOM parser stores information about how the nodes of the tree are related to each other.	Faster than the DOM approach at parsing simple XML documents, because it processes XML documents in a sequential manner.
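The difference between the two approaches in Table 3-1 can be sketched side by side. This example assumes PHP 5.3 or later, and uses PHP's event-based XML parser functions (xml_parser_create and related calls) for the SAX-style half and DOMDocument for the tree-based half. The shortened XML string is illustrative only.

```php
<?php
// Sketch only: contrasts event-based and tree-based parsing of the same input.
$xml = '<Company name="Unique Systems"><Department>Marketing</Department>'
     . '<Level>Middle-Level</Level></Company>';

// Event-based: a callback fires as each start tag streams past; no tree is kept.
$seen = array();
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0); // keep the original tag case
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$seen) { $seen[] = $name; }, // start-tag event
    function ($parser, $name) { }                                         // end-tag event, ignored here
);
xml_parse($parser, $xml, true);

// Tree-based: the whole document is in memory, so nodes can be read in any order.
$doc = new DOMDocument();
$doc->loadXML($xml);
$level      = $doc->getElementsByTagName('Level')->item(0)->nodeValue;
$department = $doc->getElementsByTagName('Department')->item(0)->nodeValue;

echo implode(', ', $seen), "\n";       // element names in document order
echo $level, ' / ', $department, "\n"; // values fetched out of document order
```

The event handlers see each element exactly once, in document order, whereas the DOM half reads Level before Department without reparsing the input.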

The DOM Architecture

The DOM architecture is divided into modules, each of which covers one domain of the DOM API, such as XML, HTML, tree events, or Cascading Style Sheets (CSS).

Figure 3-2 shows the block diagram of the DOM architecture:

Figure 3-2: The DOM Architecture

The various modules in the DOM architecture are:

Core: Represents the internal tree-like structure of the document and enables you to move through the hierarchy of the DOM tree elements.
XML: Provides interfaces for XML-specific constructs, such as processing instructions, entities, and CDATA sections.
HTML: Manipulates HTML documents.
Events: Defines an event model for the tree, including events that are raised when the XML tree is manipulated.
Cascading Style Sheets: Manipulates CSS style sheets.
Load and Save: Provides parameters that control the load and save operations of XML documents. This module lets you load an XML document into a DOM tree and save a DOM tree as an XML document.
Validation: Provides methods, such as canSetAttribute and canInsertBefore, that check whether a change would keep the DOM document valid.
XPath: Provides methods to query a DOM tree.
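Several of these modules can be seen working together in one short script. The following is a minimal sketch, assuming the PHP 5.1+ DOM extension: it loads a document (Load and Save), queries it (XPath), changes the tree (Core), and serializes the result. The inline XML is a shortened, illustrative version of Listing 3-1.

```php
<?php
// Sketch only: load, query, modify, and save with the PHP 5.1+ DOM extension.
$doc = new DOMDocument();
$doc->loadXML(
    '<Company name="Unique Systems"><Department>Marketing</Department>'
    . '<EmployeeInformation><Name first="John" middle="E" last="Williams"/>'
    . '</EmployeeInformation></Company>'
);

// Query: read an attribute value with an XPath expression.
$xpath = new DOMXPath($doc);
$lastName = $xpath->evaluate('string(/Company/EmployeeInformation/Name/@last)');

// Modify: create a <Level> element and insert it right after <Department>.
$level = $doc->createElement('Level', 'Middle-Level');
$department = $xpath->query('/Company/Department')->item(0);
$department->parentNode->insertBefore($level, $department->nextSibling);

// Save: serialize the modified tree back to an XML string.
$saved = $doc->saveXML();
echo $lastName, "\n", $saved;
```

To write the result to a file instead of a string, DOMDocument also offers a save() method that takes a file name.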