Introduction to SAX


The SAX parser is an event-based, non-validating parser that reads data from an XML document. The current version of SAX is SAX 2. SAX2 processes documents sequentially: it reads a part of the XML document, generates an event when it finds an XML tag, and then reads the next part of the document. Because it does not build the whole document in memory, you typically use the SAX parser to read and query an XML document; to modify or write a document, you process the events yourself and generate the new output, for example with filters.
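The following minimal sketch makes the event-driven, sequential model concrete. It records the events that PHP's built-in xml extension reports for a tiny document, which is supplied in two pieces. The document and the $events array are illustrative assumptions. Only the xml_* functions are PHP's own.

```php
<?php
// Minimal sketch: record, in order, the events the parser reports for a
// small in-memory document. Uses only PHP's built-in xml extension.
$events = [];

$parser = xml_parser_create();
// Keep element names as written (PHP upper-cases them by default).
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($p, $name) use (&$events) {
        $events[] = "end:$name";
    }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$events) {
    $events[] = "text:$data";
});

// Feed the document in two pieces: the parser keeps its state between
// calls, so the whole document never has to be supplied at once.
xml_parse($parser, "<note><to>Ann", false);
xml_parse($parser, "</to></note>", true);
xml_parser_free($parser);

echo implode("\n", $events), "\n";
```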

Architecture of the SAX Parser

The SAX parser checks that the structure of an XML document is well-formed, but it does not validate the document against a DTD or schema. The SAX parser uses various handlers that are invoked for each XML tag. The handlers are user-defined functions, which are also called callback functions.
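The parser checks that a document is well-formed. Because of this, xml_parse() returns 0 for a malformed document, and you can turn the resulting error code into a message. The following sketch uses a hypothetical helper, check_well_formed(). It returns null for a well-formed string and an error description otherwise.

```php
<?php
// Sketch: the parser rejects documents that are not well-formed.
// check_well_formed() is a hypothetical helper name used for illustration.
function check_well_formed(string $xml): ?string
{
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, true);   // 1 on success, 0 on error
    $error = null;
    if (!$ok) {
        // Build a readable message from the error code and line number.
        $error = sprintf(
            "%s at line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        );
    }
    xml_parser_free($parser);
    return $error;
}

var_dump(check_well_formed("<a><b></b></a>"));  // well-formed
var_dump(check_well_formed("<a><b></a></b>"));  // mismatched tags
```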

Figure 2-1 shows the architecture of the SAX parser:

The figure shows that, while parsing an XML document, the SAX parser invokes a callback function to handle each event as it occurs.
Figure 2-1: Architecture of the SAX Parser

Working with the SAX Parser in PHP

The SAX parser invokes handlers for each opening and closing tag in an XML document. It also invokes handlers for character data and processing instructions of the XML document. To use the SAX parser:

  1. Initialize the SAX parser using the PHP function, xml_parser_create(). The code to initialize the SAX parser is:

     $xparser=xml_parser_create(); 

    The above code initializes the SAX parser and stores a reference to it in the $xparser variable.

  2. Identify the events, and set the callback functions to be invoked for the events. The code to identify the events and set the callback functions is:

     xml_set_element_handler($xparser, "startingHandler", "endingHandler");
     xml_set_character_data_handler($xparser, "cdataHandler");

    In the above code:

    • The xml_set_element_handler() function is a built-in PHP function that registers the callback functions the parser invokes for the opening and closing tags of an XML document.

    • The startingHandler function is the callback function that is invoked when the SAX parser finds an opening tag.

    • The endingHandler function is the callback function that is invoked when the SAX parser finds a closing tag.

    • The xml_set_character_data_handler() function is a built-in PHP function that registers the callback function to be invoked for character data within the tags of the XML document.

    • The cdataHandler function is the callback function that is invoked when the SAX parser finds character data within the XML document.

  3. Provide the code for the callback functions, startingHandler(), endingHandler(), and cdataHandler(), in the PHP script.

  4. Open the XML document using the fopen() function, as shown in the following code:

     if (!($fp = fopen("student.xml", "r"))) {
         die("File does not exist");
     }

    The above code creates the $fp variable, which holds the file handle for the student.xml file. In the above code:

    • The fopen() function opens the student.xml file in read mode. If fopen() fails, it returns false, and the code displays the error message, File does not exist.

    • The die() function is a built-in PHP function that displays the message specified as its argument and terminates the execution of the script.

  5. Parse the XML document using the xml_parse() function, as shown in Listing 2-3:

    Listing 2-3: Parsing the XML Document

    start example
     while ($data = fread($fp, 4096)) {
         if (!xml_parse($xparser, $data, feof($fp))) {
             // Concatenate: function calls inside "..." are not evaluated.
             die("XML parse error: " . xml_error_string(xml_get_error_code($xparser)));
         }
     }
    end example
     

    In the above code:

    • The fread() function reads the content of the XML document in chunks of 4 KB (4096 bytes), and each chunk is passed to the parser.

    • The xml_parse() function parses each chunk in turn. The loop continues until fread() reaches the end of the XML document.

    • The feof() function returns true when the end of the file is reached. Its result is passed as the third argument of xml_parse(), which tells the parser that the current chunk is the last one.

    • The die() function terminates the execution when an error occurs in parsing the XML document.

    • The xml_get_error_code() function returns the error code and the xml_error_string() function returns the error description corresponding to the error code.

  6. Release the resources of the XML parser using the xml_parser_free() function of PHP, as shown in the following code:

     xml_parser_free($xparser); 

    The above code frees the XML parser after the document has been parsed.
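The steps above can be combined into one script. This sketch makes a few assumptions:

- A temporary file stands in for student.xml.
- The chunk size is deliberately small (16 bytes), so the document spans several xml_parse() calls.
- Closures stand in for the named callback functions.

The loop tests feof() before each read. As a result, the final xml_parse() call is made with the last-chunk flag set, even when the file size is an exact multiple of the chunk size.

```php
<?php
// Sketch of the SAX steps in one script. A temporary file stands in for
// student.xml; the 16-byte chunk size is only for illustration.
$file = tempnam(sys_get_temp_dir(), 'sax');
file_put_contents($file, '<studentdata><student><name>George</name></student></studentdata>');

$names = [];
$inName = false;

// Create the parser.
$xparser = xml_parser_create();
// Register the callbacks (closures instead of named functions).
// Element names arrive upper-cased because case folding is on by default.
xml_set_element_handler(
    $xparser,
    function ($p, $name, $attrs) use (&$inName) { $inName = ($name === 'NAME'); },
    function ($p, $name) use (&$inName) { $inName = false; }
);
xml_set_character_data_handler($xparser, function ($p, $data) use (&$names, &$inName) {
    if ($inName) {
        $names[] = $data;   // the text may arrive in several pieces
    }
});

// Open the file.
if (!($fp = fopen($file, "r"))) {
    die("File does not exist");
}
// Parse chunk by chunk; feof() marks the final chunk.
while (!feof($fp)) {
    $data = fread($fp, 16);
    if (!xml_parse($xparser, $data, feof($fp))) {
        die("XML parse error: " . xml_error_string(xml_get_error_code($xparser)));
    }
}
// Release resources.
fclose($fp);
xml_parser_free($xparser);
unlink($file);

echo implode('', $names), "\n";
```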

Note  

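To specify the encoding in which the parser passes data to your handlers, pass the encoding name to xml_parser_create(). The supported encodings are ISO-8859-1, UTF-8, and US-ASCII, for example:

 $xparser=xml_parser_create("ISO-8859-1"); 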
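As a sketch of this note, assuming PHP 5 or later: the PHP manual documents the argument of xml_parser_create() as the target (output) encoding. The supported values are ISO-8859-1, UTF-8, and US-ASCII, and you can read the current setting back with xml_parser_get_option().

```php
<?php
// Sketch: the optional argument of xml_parser_create() sets the target
// encoding used for data passed to the handlers.
$xparser = xml_parser_create("ISO-8859-1");
$before = xml_parser_get_option($xparser, XML_OPTION_TARGET_ENCODING);

// The target encoding can also be changed after the parser is created.
xml_parser_set_option($xparser, XML_OPTION_TARGET_ENCODING, "UTF-8");
$after = xml_parser_get_option($xparser, XML_OPTION_TARGET_ENCODING);
xml_parser_free($xparser);

echo "$before -> $after\n";
```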

Implementing the SAX Parser

The SAX parser uses various functions, known as handlers. Each handler is invoked when the SAX parser encounters a particular event, such as an opening tag, closing tag, character data, processing instruction, or comment.

To parse an XML file, you need to provide the name of the XML data file to the SAX parser, as shown:

 $xfile="student.xml"; 

The above code creates the xfile variable that contains the name of the XML document to be parsed by the SAX parser. You can refer to the student.xml file using the $xfile variable within the PHP script.

To implement the SAX parser, you need to create callback functions for handling all events. PHP passes three parameters to the startingHandler() callback function, which are:

  • Reference to SAX parser

  • Element name

  • Element attributes

Listing 2-4 shows the startingHandler() callback function:

Listing 2-4: Handling the Opening Tag Event
start example
 function startingHandler($xparser, $element_name, $attributes) {
     echo "Opening Tag:<b>$element_name</b><br>";
     // $attributes is an associative array: attribute name => value
     foreach ($attributes as $key => $value) {
         echo "Attribute:<b><i>$key=$value</i></b><br>";
     }
 }
end example
 

In the above listing:

  • The parser invokes the startingHandler() function when it finds the opening tag.

  • The function displays the names of opening tags in bold.

  • The foreach loop accesses each name and value in the $attributes array.

  • The function also displays the name and value of each attribute of the element in bold italics.
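The attribute loop in Listing 2-4 can also be written with foreach, which is the usual idiom in current PHP (each() was removed in PHP 8). This sketch also shows case folding. By default, PHP upper-cases element names before passing them to the start handler, and setting XML_OPTION_CASE_FOLDING to 0 turns that off. describe_start_tags() is a hypothetical helper used only for illustration.

```php
<?php
// Sketch: a start-tag handler that walks the attributes with foreach.
// describe_start_tags() is a hypothetical helper for this example.
function describe_start_tags(string $xml, bool $caseFolding): array
{
    $seen = [];
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $p,
        function ($parser, $element_name, $attributes) use (&$seen) {
            $entry = $element_name;
            // $attributes is an associative array: attribute name => value.
            foreach ($attributes as $key => $value) {
                $entry .= " $key=$value";
            }
            $seen[] = $entry;
        },
        function ($parser, $element_name) {
            // Nothing to do on closing tags in this sketch.
        }
    );
    xml_parse($p, $xml, true);
    xml_parser_free($p);
    return $seen;
}

$xml = '<student><name id="s001">George</name></student>';
print_r(describe_start_tags($xml, true));   // element names upper-cased
print_r(describe_start_tags($xml, false));  // names as written in the file
```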

Unlike the start tag handler, the endingHandler() callback function receives only two parameters from PHP, because a closing tag does not contain attributes. The endingHandler() callback function is the end tag handler, which is invoked when the parser finds a closing tag. Its parameters are the reference to the SAX parser and the element name.

The code to define the endingHandler() callback function is:

 function endingHandler($xparser, $element_name) {
     echo "Closing Tag:<b>$element_name</b><br>";
 }

The above code shows that the parser invokes the endingHandler() function when it finds the closing tag. It displays the names of the closing tags of the XML document in bold.
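Start and end handlers are often used as a pair. In the following sketch, which uses a hypothetical outline() helper, the two handlers share a depth counter, so each opening tag is printed indented by its nesting level.

```php
<?php
// Sketch: start and end handlers share a depth counter to build an
// indented outline of the elements. outline() is a hypothetical helper.
function outline(string $xml): string
{
    $depth = 0;
    $out = '';
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $p,
        function ($parser, $name, $attrs) use (&$depth, &$out) {
            $out .= str_repeat('  ', $depth) . $name . "\n";
            $depth++;   // children are one level deeper
        },
        function ($parser, $name) use (&$depth) {
            $depth--;   // back to the parent's level
        }
    );
    xml_parse($p, $xml, true);
    xml_parser_free($p);
    return $out;
}

echo outline('<studentdata><student><name>George</name><age>15</age></student></studentdata>');
```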

PHP passes two parameters to the character data handler. The parameters passed to the character data handler include the reference to the SAX parser and the character data.

The code to define the character data callback function, cdataHandler, is:

 function cdataHandler($xparser, $cdata) {
     echo "CDATA: <i><u>$cdata</u></i><br>";
 }

The above code shows that the cdataHandler() function is invoked when the parser finds text between the opening and closing tags. The cdataHandler() function displays the text between the opening and closing tags in underlined and italicized format.
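A parser may deliver one run of text in several pieces, for example around an entity reference such as &amp;. For this reason, a character data handler that needs the complete text usually appends each piece to a buffer and reads the buffer when the element closes. The following minimal sketch uses a hypothetical element_texts() helper, which keeps only the last text seen for each element name.

```php
<?php
// Sketch: buffer character data, because the handler may be called more
// than once for the same text. element_texts() is a hypothetical helper.
function element_texts(string $xml): array
{
    $texts = [];
    $buffer = '';
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $p,
        function ($parser, $name, $attrs) use (&$buffer) {
            $buffer = '';               // start a fresh buffer
        },
        function ($parser, $name) use (&$buffer, &$texts) {
            $texts[$name] = $buffer;    // the text is complete at the end tag
            $buffer = '';
        }
    );
    xml_set_character_data_handler($p, function ($parser, $cdata) use (&$buffer) {
        $buffer .= $cdata;              // may run several times per element
    });
    xml_parse($p, $xml, true);
    xml_parser_free($p);
    return $texts;
}

print_r(element_texts('<student><name>Tom &amp; Jerry</name><age>15</age></student>'));
```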

You can implement the SAX parser in a PHP script, as shown in Listing 2-5:

Listing 2-5: Implementing the SAX Parser
start example
 <html><head>
 <basefont face="Times New Roman">
 </head>
 <body>
 <?php
 function startingHandler($xparser, $element_name, $attributes) {
     echo "Opening Tag:<b>$element_name</b><br>";
     foreach ($attributes as $key => $value) {
         echo "Attribute:<b><i>$key=$value</i></b><br>";
     }
 }
 function endingHandler($xparser, $element_name) {
     echo "Closing Tag:<b>$element_name</b><br>";
 }
 function cdataHandler($xparser, $cdata) {
     echo "CDATA: <i><u>$cdata</u></i><br>";
 }
 $xfile = "student.xml";
 $xparser = xml_parser_create();
 xml_set_element_handler($xparser, "startingHandler", "endingHandler");
 xml_set_character_data_handler($xparser, "cdataHandler");
 if (!($fp = fopen($xfile, "r"))) {
     die("File Input/Output error: $xfile");
 }
 while ($data = fread($fp, 4096)) {
     if (!xml_parse($xparser, $data, feof($fp))) {
         die("XML parser error: " . xml_error_string(xml_get_error_code($xparser)));
     }
 }
 xml_parser_free($xparser);
 ?>
 </body>
 </html>
end example
 

The above listing shows that the SAX parser of PHP parses the XML document, student.xml. In the above code:

  • The $xparser variable indicates the reference to the SAX parser.

  • The xml_set_element_handler() function registers the startingHandler() and endingHandler() functions to handle the opening and closing tags, respectively.

  • The xml_set_character_data_handler() function registers the cdataHandler() function to handle the text between the opening and closing tags.

The content of the student.xml file is:

 <?xml version="1.0"?>
 <studentdata>
   <student>
     <name id="s001">George</name>
     <age>15</age>
     <address>New York</address>
     <standard>10</standard>
   </student>
 </studentdata>
Note  

The cdataHandler() function is also invoked for any white space in the XML file, such as the line breaks and indentation between elements.
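A handler can ignore this formatting whitespace by skipping any piece that is empty after trim(). This sketch assumes a hypothetical text_pieces() helper and a small indented document.

```php
<?php
// Sketch: skip whitespace-only character data in the handler.
// text_pieces() is a hypothetical helper for this example.
function text_pieces(string $xml, bool $skipWhitespace): array
{
    $pieces = [];
    $p = xml_parser_create();
    xml_set_character_data_handler($p, function ($parser, $cdata) use (&$pieces, $skipWhitespace) {
        if ($skipWhitespace && trim($cdata) === '') {
            return;                 // line breaks and indentation only
        }
        $pieces[] = $cdata;
    });
    xml_parse($p, $xml, true);
    xml_parser_free($p);
    return $pieces;
}

$xml = "<student>\n  <name>George</name>\n  <age>15</age>\n</student>";
echo json_encode(text_pieces($xml, false)), "\n";  // includes the whitespace
echo json_encode(text_pieces($xml, true)), "\n";   // only the element text
```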

Figure 2-2 shows the output of Listing 2-5:

The figure shows the opening and closing tags in bold, the attributes and their values in bold italics, and the character data in underlined italics.
Figure 2-2: Output of Listing 2-5
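Listing 2-5 echoes HTML as the events arrive. As an alternative sketch, the same three kinds of callback can instead collect the student.xml data into PHP arrays. The XML string below mirrors student.xml, and load_students() is a hypothetical helper name.

```php
<?php
// Sketch: collect student records into arrays instead of echoing HTML.
// load_students() is a hypothetical helper; the string mirrors student.xml.
function load_students(string $xml): array
{
    $students = [];
    $current = null;   // record for the currently open <student>
    $field = null;     // child element currently open, e.g. "name"
    $p = xml_parser_create();
    xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $p,
        function ($parser, $name, $attrs) use (&$current, &$field) {
            if ($name === 'student') {
                $current = [];
            } elseif ($current !== null) {
                $field = $name;
                $current[$field] = '';
                if (isset($attrs['id'])) {
                    $current['id'] = $attrs['id'];
                }
            }
        },
        function ($parser, $name) use (&$students, &$current, &$field) {
            if ($name === 'student') {
                $students[] = $current;
                $current = null;
            }
            $field = null;
        }
    );
    xml_set_character_data_handler($p, function ($parser, $cdata) use (&$current, &$field) {
        if ($field !== null) {
            $current[$field] .= $cdata;   // text may arrive in pieces
        }
    });
    if (!xml_parse($p, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($p)));
    }
    xml_parser_free($p);
    return $students;
}

$xml = '<?xml version="1.0"?><studentdata><student><name id="s001">George</name>'
     . '<age>15</age><address>New York</address><standard>10</standard></student></studentdata>';
print_r(load_students($xml));
```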

Using the Expat Parser

The Expat parser is a SAX parser that supports the event-driven approach to parsing a document. The Expat parser is the default parser for the PHP scripting language (in PHP 5 and later, PHP's xml extension is usually built on libxml2 instead). You can use wrapper classes and filters with this parser to perform advanced processing on XML documents, such as transforming, updating, and querying them.

The PHPXML class is an API of wrapper classes that implement filters on top of the SAX parser. The class_sax_filters.php file is an example of such a wrapper class.

Note  

You can download the PHPXML class from http://phpxmlclasses.sourceforge.net/

The AbstractSAXParser class implements the SAX parser. This class parses the XML document and passes the events it generates to listener objects, which are objects of classes derived from the AbstractFilter class. The methods of the AbstractSAXParser class are:

  • AbstractSAXParser() : The constructor of the AbstractSAXParser class. It creates a parser for the XML document passed as an argument.

  • Parse() : Parses the XML document and invokes the StartElementHandler($xml_file, $element_name, $attributes), EndElementHandler($xml_file, $element_name), and CharacterDataHandler($cdata) functions.

  • SetListener() : Assigns the listeners to a parser object.

The ExpatParser class extends the AbstractSAXParser class to parse an XML document. The constructor of the ExpatParser class accepts the XML document as an argument. The code to use the ExpatParser class is:

 $xml_parser = new ExpatParser("file.xml");
 $f1 = new FilterAddStudent();
 $xml_parser->SetListener($f1);
 $xml_parser->parse();

In the above code:

  • file.xml is the name of the XML document, and the xml_parser variable refers to the Expat parser.

  • FilterAddStudent is a user-defined filter class that extends the AbstractFilter class.

  • The SetListener() method accepts the object of the filter class as an argument and sets the filter class as listener of the events generated by the SAX parser.
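You can sketch the relationship between the parser and its listener with PHP's built-in functions. The SketchSAXParser and RecordingListener classes below are hypothetical. They only imitate the method names described above and are not part of the PHPXML classes. The sketch also takes an XML string rather than a file name, so that it is self-contained.

```php
<?php
// Hypothetical sketch in the spirit of ExpatParser; NOT the PHPXML code.
// It forwards each event from PHP's xml extension to a listener object.
class SketchSAXParser
{
    private $xml;
    private $listener;

    public function __construct(string $xml) { $this->xml = $xml; }

    public function SetListener($listener): void { $this->listener = $listener; }

    public function parse(): void
    {
        $l = $this->listener;
        $p = xml_parser_create();
        xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
        // Forward each parser event to the listener's handler methods.
        xml_set_element_handler(
            $p,
            function ($parser, $name, $attrs) use ($l) { $l->StartElementHandler($name, $attrs); },
            function ($parser, $name) use ($l) { $l->EndElementHandler($name); }
        );
        xml_set_character_data_handler($p, function ($parser, $cdata) use ($l) {
            $l->CharacterDataHandler($cdata);
        });
        if (!xml_parse($p, $this->xml, true)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($p)));
        }
        xml_parser_free($p);
    }
}

// Hypothetical listener that simply records the events it receives.
class RecordingListener
{
    public $events = [];
    public function StartElementHandler($name, $attrs) { $this->events[] = "start $name"; }
    public function EndElementHandler($name) { $this->events[] = "end $name"; }
    public function CharacterDataHandler($cdata) { $this->events[] = "text $cdata"; }
}

$parser = new SketchSAXParser('<student><name>George</name></student>');
$listener = new RecordingListener();
$parser->SetListener($listener);
$parser->parse();
print_r($listener->events);
```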

Filters are user-defined classes that accept SAX events from a parser, process them, and pass the result to other filters or to the Web browser. A filter class implements the methods that are defined in the AbstractFilter class. To create a user-defined filter, you extend the AbstractFilter class. The handlers implemented by a filter class are:

  • StartElementHandler($element_name, $attributes) : Accepts two arguments, the name and the attributes of the element in the starting tag. This handler is invoked when a parser finds the starting tag of an element of the XML document.

  • EndElementHandler($element_name) : Accepts only one argument, the element name, because an ending tag does not contain attributes. This handler is invoked when a parser finds the ending tag.

  • CharacterDataHandler($cdata) : Is invoked when a parser finds text within the starting and ending tags.

  • SetListener($object) : Sets the listener object of the filter. It accepts the object of the filter class as an argument. You do not need to implement the SetListener() function in the filter class, because the AbstractFilter abstract class already provides it.

Filters that do not pass events on to other filters, but display the output on the Web browser, are called finalizers. The FilterOutput class is a finalizer, which displays the output of the XML document on the Web browser.
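A hypothetical sketch can illustrate how filters are chained and what role a finalizer plays. SketchAbstractFilter, FilterUppercaseText, and SketchFilterOutput are invented for illustration and only mimic AbstractFilter and FilterOutput. Here, the finalizer collects XML text in a string instead of writing to the browser.

```php
<?php
// Hypothetical filter chain; the classes only mimic the PHPXML ones.
abstract class SketchAbstractFilter
{
    protected $listener = null;

    public function SetListener($listener): void { $this->listener = $listener; }

    // Default behaviour: pass every event on to the next filter unchanged.
    public function StartElementHandler($name, $attributes)
    {
        if ($this->listener) { $this->listener->StartElementHandler($name, $attributes); }
    }
    public function EndElementHandler($name)
    {
        if ($this->listener) { $this->listener->EndElementHandler($name); }
    }
    public function CharacterDataHandler($cdata)
    {
        if ($this->listener) { $this->listener->CharacterDataHandler($cdata); }
    }
}

// A filter that upper-cases character data and forwards everything else.
class FilterUppercaseText extends SketchAbstractFilter
{
    public function CharacterDataHandler($cdata)
    {
        parent::CharacterDataHandler(strtoupper($cdata));
    }
}

// A finalizer: it forwards nothing and writes the events back out as XML.
class SketchFilterOutput extends SketchAbstractFilter
{
    public $output = '';
    public function StartElementHandler($name, $attributes)
    {
        $this->output .= "<$name";
        foreach ($attributes as $key => $value) {
            $this->output .= " $key=\"" . htmlspecialchars($value) . '"';
        }
        $this->output .= '>';
    }
    public function EndElementHandler($name) { $this->output .= "</$name>"; }
    public function CharacterDataHandler($cdata) { $this->output .= htmlspecialchars($cdata); }
}

// Wire the chain: parser -> $f1 -> $f2.
$f1 = new FilterUppercaseText();
$f2 = new SketchFilterOutput();
$f1->SetListener($f2);

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $p,
    function ($parser, $name, $attrs) use ($f1) { $f1->StartElementHandler($name, $attrs); },
    function ($parser, $name) use ($f1) { $f1->EndElementHandler($name); }
);
xml_set_character_data_handler($p, function ($parser, $cdata) use ($f1) {
    $f1->CharacterDataHandler($cdata);
});
xml_parse($p, '<name id="s001">George</name>', true);
xml_parser_free($p);

echo $f2->output, "\n";
```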

Listing 2-6 shows how to extend the AbstractFilter abstract class:

Listing 2-6: Implementing the AbstractFilter Class
start example
 <?php
 // Adjust the path to where the PHPXML classes are installed.
 include_once("class_sax_filters.php");

 class FilterAddStudent extends AbstractFilter {
     function StartElementHandler($element_name, $attributes) {
         // Implementation of the start tag handler
     }
     function EndElementHandler($element_name) {
         // Implementation of the end tag handler
     }
     function CharacterDataHandler($cdata) {
         // Implementation of the character data handler
     }
 }

 $xml_parser = new ExpatParser("file.xml");
 $f1 = new FilterAddStudent();
 $f2 = new FilterOutput();
 $f1->SetListener($f2);          // FilterAddStudent passes events to FilterOutput
 $xml_parser->SetListener($f1);  // the parser passes events to FilterAddStudent
 $xml_parser->parse();
 ?>
end example
 

The above listing shows that the FilterAddStudent filter class extends the AbstractFilter class. The $f2 variable refers to the object of the FilterOutput class, which displays the output of the XML document on the Web browser. The parser passes its events to $f1, and $f1 passes them on to $f2.
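The handlers in Listing 2-6 are left empty. The version below is a hypothetical sketch of what a filter such as FilterAddStudent might do. It passes every event through and, just before the closing </studentdata> tag, emits the events for one extra student. PassThroughFilter, XmlWriterListener, and the new student's data are assumptions made for illustration and are not part of the PHPXML classes.

```php
<?php
// Hypothetical sketch: PassThroughFilter, XmlWriterListener and the extra
// student's data are illustrative assumptions, not PHPXML code.
abstract class PassThroughFilter
{
    protected $listener;
    public function SetListener($listener): void { $this->listener = $listener; }
    public function StartElementHandler($name, $attributes) { $this->listener->StartElementHandler($name, $attributes); }
    public function EndElementHandler($name) { $this->listener->EndElementHandler($name); }
    public function CharacterDataHandler($cdata) { $this->listener->CharacterDataHandler($cdata); }
}

class FilterAddStudent extends PassThroughFilter
{
    private $newStudent;
    public function __construct(array $newStudent) { $this->newStudent = $newStudent; }

    public function EndElementHandler($name)
    {
        if ($name === 'studentdata') {
            // Emit the extra record as ordinary SAX-style events.
            $this->listener->StartElementHandler('student', []);
            foreach ($this->newStudent as $field => $value) {
                $this->listener->StartElementHandler($field, []);
                $this->listener->CharacterDataHandler($value);
                $this->listener->EndElementHandler($field);
            }
            $this->listener->EndElementHandler('student');
        }
        parent::EndElementHandler($name);
    }
}

// Finalizer: serializes the events it receives back into XML text.
class XmlWriterListener
{
    public $xml = '';
    public function StartElementHandler($name, $attributes)
    {
        $this->xml .= "<$name";
        foreach ($attributes as $k => $v) {
            $this->xml .= " $k=\"" . htmlspecialchars($v) . '"';
        }
        $this->xml .= '>';
    }
    public function EndElementHandler($name) { $this->xml .= "</$name>"; }
    public function CharacterDataHandler($cdata) { $this->xml .= htmlspecialchars($cdata); }
}

$f1 = new FilterAddStudent(['name' => 'Maria', 'age' => '14']);
$f2 = new XmlWriterListener();
$f1->SetListener($f2);

$p = xml_parser_create();
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler(
    $p,
    function ($parser, $name, $attrs) use ($f1) { $f1->StartElementHandler($name, $attrs); },
    function ($parser, $name) use ($f1) { $f1->EndElementHandler($name); }
);
xml_set_character_data_handler($p, function ($parser, $cdata) use ($f1) {
    $f1->CharacterDataHandler($cdata);
});
xml_parse($p, '<studentdata><student><name>George</name></student></studentdata>', true);
xml_parser_free($p);

echo $f2->xml, "\n";
```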




Integrating PHP and XML 2004
Year: 2004
Pages: 51
