Creating an XML Parser

   

Before you can use PHP's XML extension to read XML files, you must create a parser. Because XML is extensible, PHP cannot know in advance which elements a document contains or what you want to do with them, so you must tell PHP how to handle the document. You do this by creating a new XML parser instance and then registering element handlers and a character data handler. An element handler is simply a function that runs when the parser encounters an element. You define two of them: one that runs when the parser reaches an element's start tag and another that runs when it reaches the element's end tag. You also specify a handler for the character data that sits between the tags. If you are a little rusty on your XML-speak, here is a short example of an XML document:

 <?xml version="1.0"?>
 <document>
    <title>XML Is Easy</title>
    <body>Demystifying XML. Read about it here.</body>
 </document>

Looks simple enough, right? That's because it is. Since you are already familiar with HTML, the basics of XML should be readily apparent. Tags are enclosed by "<" and ">". Start and end tags are differentiated by the absence or presence of the "/" symbol. Think of a first-level heading: <h1>A Heading</h1>. The <h1> tag starts the element, and the </h1> tag closes it. The characters in between are the character data. This is a hugely simplified example, but if you are unfamiliar with XML, it should shed some light on what we are about to do as we create a parser.

Defining the XML Parser

You define an XML Parser by using the xml_parser_create() function:

 $parser = xml_parser_create(ENCODING);  

You assign the value returned by xml_parser_create() to a variable, which you then pass to the other functions required for parsing XML. You can optionally specify the character encoding that the parser should use. You can choose one of three encodings:

  • ISO-8859-1 (the default in PHP 4; PHP 5.0.2 and later default to UTF-8)

  • US-ASCII

  • UTF-8

Once you have defined a new parser instance, you can then create your handlers to do the actual work of reading through an XML file.
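As a minimal sketch of the encoding argument, the following creates one parser for each of the three encodings and reads the setting back with xml_parser_get_option(). The variable names are illustrative only and do not appear in the examples in this chapter.

```php
<?php
// Create one parser per supported encoding and read the setting back.
$encodings = ['ISO-8859-1', 'US-ASCII', 'UTF-8'];
$reported = [];

foreach ($encodings as $encoding) {
    $parser = xml_parser_create($encoding);
    // XML_OPTION_TARGET_ENCODING reports the encoding the parser was given.
    $reported[] = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);
}

print(implode(", ", $reported) . "\n");
```

Passing the encoding explicitly, rather than relying on the default, keeps the behavior the same across PHP versions.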

Defining the Element Handlers

Element handlers are defined using the xml_set_element_handler() function:

 xml_set_element_handler(XML_PARSER, START_FUNCTION,  END_FUNCTION); 

xml_set_element_handler() takes three arguments:

  • XML_PARSER The variable that you created when you called the xml_parser_create() function.

  • START_FUNCTION The name of the function to call when the parser encounters a start element.

  • END_FUNCTION The name of the function to call when the parser encounters an end element.
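The sketch below registers a pair of element handlers and records the order in which they fire. The handler names on_start() and on_end() and the $GLOBALS['xml_events'] log are hypothetical, chosen only for this illustration. Note that the parser folds element names to uppercase by default, so the handlers receive DOCUMENT rather than document.

```php
<?php
// Hypothetical log of handler calls, kept in $GLOBALS so the handlers can reach it.
$GLOBALS['xml_events'] = [];

// Called for every start tag; $attributes is an array of the tag's attributes.
function on_start($parser, $name, $attributes)
{
    $GLOBALS['xml_events'][] = "start:$name";
}

// Called for every end tag.
function on_end($parser, $name)
{
    $GLOBALS['xml_events'][] = "end:$name";
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "on_start", "on_end");

// The third argument marks this string as the complete document.
xml_parse($parser, "<document><title>XML Is Easy</title></document>", true);

print(implode(" ", $GLOBALS['xml_events']) . "\n");
```

The handlers fire in document order, so nested elements produce their start and end events between the start and end events of the enclosing element.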

Defining Character Handlers

Character handlers are defined using the xml_set_character_data_handler() function:

 xml_set_character_data_handler(XML_PARSER,  CHARACTER_FUNCTION); 

xml_set_character_data_handler() takes two arguments:

  • XML_PARSER The variable that you created when you called the xml_parser_create() function.

  • CHARACTER_FUNCTION The name of the function to call when character data is encountered.
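Here is a small sketch of a character data handler that collects the text of an element. The handler name on_text() and the $GLOBALS['xml_text'] buffer are hypothetical. The handler appends rather than assigns because the parser may deliver a single run of text in more than one call.

```php
<?php
// Hypothetical buffer for the text the parser reports.
$GLOBALS['xml_text'] = '';

// Called with each piece of character data found between tags.
function on_text($parser, $data)
{
    // One run of text may arrive in several calls, so append each piece.
    $GLOBALS['xml_text'] .= $data;
}

$parser = xml_parser_create();
xml_set_character_data_handler($parser, "on_text");
xml_parse($parser, "<title>XML Is Easy</title>", true);

print($GLOBALS['xml_text'] . "\n");
```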

Starting the Parser

The final piece to the puzzle is the function that starts the whole process, xml_parse():

 xml_parse(XML_PARSER, XML);  

xml_parse() takes two arguments:

  • XML_PARSER The variable that you created when you called the xml_parser_create() function.

  • XML The XML that is to be parsed.
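When xml_parse() fails, the XML extension can tell you what went wrong and where. The following is a sketch only: describe_parse_error() is a hypothetical helper, not part of PHP, built on the xml_get_error_code(), xml_error_string(), and xml_get_current_line_number() functions. It also passes xml_parse()'s optional third argument, which tells the parser that the string is the last (here, the only) piece of the document.

```php
<?php
// Hypothetical helper: returns null on success, or a description of the first error.
function describe_parse_error($xml)
{
    $parser = xml_parser_create();

    // true = this string is the final piece of the document.
    if (xml_parse($parser, $xml, true)) {
        return null;
    }

    return sprintf(
        "%s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

// A document with a stray space inside an end tag is not well-formed.
$bad = "<document><body>Read about it here.</ body></document>";
$error = describe_parse_error($bad);
print(($error === null ? "Parsed OK" : "ERROR PARSING XML: $error") . "\n");
```

The exact wording of the message comes from the underlying XML library, so it can differ between PHP builds.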

Cleaning Up

After you have finished parsing the document, you should free the memory holding the parser by calling the xml_parser_free() function:

 xml_parser_free(XML_PARSER);  

xml_parser_free() takes one argument, XML_PARSER, which is the variable that you created when you called the xml_parser_create() function.
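If your script has to run on several PHP versions, cleanup can be wrapped in a small helper. This is a sketch under one assumption worth stating: from PHP 8.0 onward the parser is an object that PHP releases automatically, and xml_parser_free() no longer does anything. The helper name release_parser() is hypothetical.

```php
<?php
// Hypothetical helper: release a parser in a version-aware way.
function release_parser(&$parser)
{
    if (PHP_VERSION_ID < 80000) {
        // Before PHP 8.0 the parser is a resource that must be freed explicitly.
        xml_parser_free($parser);
    }
    // Drop our reference so the parser cannot be reused by accident.
    $parser = null;
}

$parser = xml_parser_create();
release_parser($parser);
```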

Let's look at an example to solidify the principles we've just discussed. It uses all of the functions covered so far: it opens a simple XML file, aptly named simple.xml, and parses the XML within. Figure 10-2 displays the output. Here is simple.xml, which you'll need to create and place in the same directory as the xml1.php example:

Script 10-1 simple.xml
 <?xml version="1.0"?>
 <document>
 <title>XML Exposed</title>
 <body>Demystifying XML. Read about it here.</body>
 </document>

Script 10-2 xml1.php
  1.  <?php
  2.  function startElement($xml_parser, $name, $attributes) {
  3.    print("<p><i>Encountered Start Element For:</i>$name\n");
  4.  }
  5.
  6.  function endElement($xml_parser, $name) {
  7.    print("<p><i>Encountered End Element For:</i>$name\n");
  8.  }
  9.
 10.  function characterData($xml_parser, $data) {
 11.    if($data != "\n") {
 12.      print("<p><i>Encountered Character Data:</i>$data\n");
 13.    }
 14.  }
 15.
 16.  function load_data($file) {
 17.    $fh = fopen($file, "r") or die ("<P>COULD NOT OPEN FILE!");
 18.    $data = fread($fh, filesize($file));
 19.    return $data;
 20.  }
 21.  /***** MAIN *****/
 22.  $file = "simple.xml";
 23.  $xml_parser = xml_parser_create();
 24.  xml_set_element_handler($xml_parser, "startElement", "endElement");
 25.  xml_set_character_data_handler($xml_parser, "characterData");
 26.  xml_parse($xml_parser, load_data($file)) or die ("<P>ERROR PARSING XML!");
 27.  xml_parser_free($xml_parser);
 28.  ?>

Figure 10-2. xml1.php


Script 10-2. xml1.php Line-by-Line Explanation

LINE     DESCRIPTION

2        Create a function called startElement() to handle any start elements that the script encounters as it parses the XML. The function takes the following arguments (required by PHP):
           • $xml_parser (the parser that is calling the handler)
           • $name (the name of the element)
           • $attributes (an array of the element's attributes)

3        Print a message to the screen when a start element is encountered.

4        End the function declaration.

6        Create a function called endElement() to handle any end elements that the script encounters as it parses the XML. The function takes the following arguments (required by PHP):
           • $xml_parser
           • $name

7        Print a message to the screen when an end element is encountered.

8        End the function declaration.

10       Create a function called characterData() to handle the data found between elements as the script parses the XML. The function takes the following arguments (required by PHP):
           • $xml_parser
           • $data

11-13    If there is actual character data between the elements (not just a newline character), print a message to the screen.

14       End the function declaration.

16       Create a function called load_data() to read the data from an XML file into the script so that it may be parsed. The function takes one argument, $file, which is the name (with an optional path) of the XML file that you want to parse.

17       Attempt to open the file and assign a file handle to it. If unsuccessful, kill the script and print out an error message.

18       If the file was opened successfully, read the entire file into the $data variable. Note that this holds the whole file in memory at once, which is wasteful for large files.

19       Return the $data variable to the calling program.

20       End the function declaration.

21       Begin the main program.

22       Assign a file name to the $file variable. Note that you can use a full path to the file, such as:
           • Windows: $file = 'C:\winnt\xml\myxmlfile.xml'; (single quotes keep the backslashes literal)
           • Linux: $file = "/home/me/xml/myxmlfile.xml";

23       Create a variable called $xml_parser and assign it a new PHP XML parser using the xml_parser_create() function.

24       Register the custom start and end element handler functions that you created above using the xml_set_element_handler() function. Whenever PHP encounters a start or end element, it will call the respective function.

25       Register the custom character data handler function that you created above using the xml_set_character_data_handler() function. Whenever PHP encounters character data, it will call the characterData() function.

26       Begin parsing the XML with the xml_parse() function. The xml_parse() function requires the XML parser variable ($xml_parser) and the XML data that you want to parse. In this case, we provide the XML data by calling the custom load_data($file) function. If the xml_parse() function fails, kill the script and display an error message.

27       After the xml_parse() function has finished parsing the XML, free the memory associated with the parser by using the xml_parser_free() function.
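Reading the whole file in one call, as load_data() does, is simple but holds the entire document in memory. One alternative, sketched here under the assumption that you are free to call xml_parse() more than once, is to feed the parser one chunk at a time and use xml_parse()'s optional third argument to mark the final chunk. The function name parse_file_in_chunks(), its $chunkSize parameter, and the temporary file are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: parse $file in $chunkSize-byte pieces; returns true on success.
function parse_file_in_chunks($parser, $file, $chunkSize = 4096)
{
    $fh = fopen($file, "r");
    if ($fh === false) {
        return false;
    }

    $ok = true;
    while ($ok && !feof($fh)) {
        $chunk = (string) fread($fh, $chunkSize);
        // The third argument tells the parser whether this is the last chunk.
        $ok = (bool) xml_parse($parser, $chunk, feof($fh));
    }

    fclose($fh);
    return $ok;
}

// Usage: write a small document to a temporary file and parse it 16 bytes at a time.
$path = tempnam(sys_get_temp_dir(), "xml");
file_put_contents($path, "<document><title>XML Exposed</title></document>");

$parser = xml_parser_create();
$parsed = parse_file_in_chunks($parser, $path, 16);
print($parsed ? "Parsed OK\n" : "ERROR PARSING XML!\n");

unlink($path);
```

Any handlers registered on the parser fire as each chunk is processed, so the script can start producing output before the whole file has been read.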


   


Advanced PHP for Web Professionals
ISBN: 0130085391
EAN: 2147483647
Year: 2005
Pages: 92
