Controlling Parser Behavior


Currently, PHP's XML parser allows you to control the following:

  • Case folding

  • Target encoding

  • Whitespace processing

All of these attributes can be controlled via the xml_parser_set_option() function, which accepts three parameters:

  • A handle for the parser to be modified

  • The attribute name

  • The attribute value (either string or Boolean)
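
As a minimal sketch of the three-parameter call described above, the following applies several attributes to one parser and records whether each call succeeded. The helper name applyParserOptions() is hypothetical and is not part of PHP's XML extension; only xml_parser_create() and xml_parser_set_option() are built-in functions.

```php
<?php
// Hypothetical helper (not part of PHP): applies a map of
// attribute name => attribute value to a parser handle and
// records the return value of each xml_parser_set_option() call.
function applyParserOptions($parser, array $options): array
{
    $results = [];
    foreach ($options as $name => $value) {
        // parameters: parser handle, attribute name, attribute value
        $results[$name] = xml_parser_set_option($parser, $name, $value);
    }
    return $results;
}

$xml_parser = xml_parser_create();

$results = applyParserOptions($xml_parser, [
    XML_OPTION_CASE_FOLDING    => false,    // Boolean value
    XML_OPTION_TARGET_ENCODING => 'UTF-8',  // string value
]);
```

Checking the return value is a design choice for this sketch; how an invalid attribute is reported depends on the PHP version in use.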

The sections that follow describe each of these attributes in greater detail, with examples.

Case Folding

Within the context of an XML document, case folding simply involves replacing lowercase characters in element names with their uppercase equivalents. XML element names are case-sensitive; typically, you use case folding to impose consistency on mixed-case element names so that they can be handled in a predictable manner.

This option is controlled via the XML_OPTION_CASE_FOLDING attribute and is set to true by default.

In order to see how this works, take a look at Listing 2.15, which modifies Listing 2.3 to turn off case folding (element names will no longer be uppercase).

Listing 2.15 Demonstration of Case Folding
 ...
 // initialize parser
 $xml_parser = xml_parser_create();

 // turn off case folding
 xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, FALSE);

 // set callback functions
 xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
 xml_set_character_data_handler($xml_parser, "characterDataHandler");
 ...

Here's the output:

 Found opening tag of element:  sentence
 Found CDATA:  The
 Found opening tag of element:  animal
 Found attribute:  color = blue
 Found CDATA:  fox
 Found closing tag of element:  animal
 Found CDATA:  leaped over the
 Found opening tag of element:  vegetable
 Found attribute:  color = green
 Found CDATA:  cabbage
 Found closing tag of element:  vegetable
 Found CDATA:  patch and vanished into the darkness.
 Found closing tag of element:  sentence
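
Because Listing 2.3 is not reproduced in this section, here is a small self-contained sketch that contrasts the two settings. The sample document and the function name collectElementNames() are assumptions made for illustration; the parser calls themselves are the standard ones used above.

```php
<?php
// Hypothetical sample document with lowercase element names
$xml = '<sentence>The <animal color="blue">fox</animal> vanished.</sentence>';

// Hypothetical helper: returns the element names exactly as the
// start element handler receives them, with case folding on or off.
function collectElementNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    xml_set_element_handler(
        $parser,
        // start handler: remember each element name as it arrives
        function ($parser, $name, $attributes) use (&$names) {
            $names[] = $name;
        },
        // end handler: nothing to do in this sketch
        function ($parser, $name) {
        }
    );

    xml_parse($parser, $xml, true);
    return $names;
}

$folded   = collectElementNames($xml, true);   // default behavior
$unfolded = collectElementNames($xml, false);  // case folding off

print_r($folded);
print_r($unfolded);
```

With case folding on, the handler sees SENTENCE and ANIMAL; with it off, the names arrive as written in the document.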

Target Encoding

You already know that it's possible to specify a character set for document encoding when an XML parser is created with the xml_parser_create() function. (Refer to the "Speaking Different Tongues" sidebar at the beginning of this chapter.) In geek lingo, this is referred to as source encoding.

PHP also allows you to specify a target encoding, which is the encoding the parser uses when it passes data to a handler function.

By default, this encoding is the same as the source encoding; however, you can alter it via the XML_OPTION_TARGET_ENCODING attribute, which supports any one of the following encodings: ISO-8859-1, US-ASCII, and UTF-8.

The following example sets the target encoding for the parser to UTF-8:

 xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, "UTF-8"); 
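
To make the effect visible, the following sketch parses the same ISO-8859-1 document twice and compares the bytes that reach the character data handler. The sample document and the function name parseText() are assumptions for illustration, and the byte values in the comments assume a current PHP build of the XML extension.

```php
<?php
// Hypothetical ISO-8859-1 document; \xE9 is "e acute" in that encoding
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\xE9</word>";

// Hypothetical helper: returns the character data exactly as the
// handler receives it for the given target encoding.
function parseText(string $xml, string $targetEncoding): string
{
    $text = '';
    $parser = xml_parser_create('ISO-8859-1');
    xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $targetEncoding);

    // character data may arrive in several pieces, so append each one
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data;
        }
    );

    xml_parse($parser, $xml, true);
    return $text;
}

$utf8   = parseText($xml, 'UTF-8');
$latin1 = parseText($xml, 'ISO-8859-1');

echo bin2hex($utf8), "\n";    // e acute encoded as two bytes (c3 a9)
echo bin2hex($latin1), "\n";  // e acute encoded as one byte (e9)
```

The document itself is unchanged in both runs; only the encoding of the strings handed to the handler differs.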

Whitespace Processing

You can tell the parser to skip the whitespace it encounters by setting the XML_OPTION_SKIP_WHITE attribute to true. This attribute can come in handy if your XML document contains tabs or spaces that could interfere with your program logic.

The following example tells the parser to skip whitespace:

 xml_parser_set_option($xml_parser, XML_OPTION_SKIP_WHITE, 1); 
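
The sketch below shows one place where this attribute has a visible effect. It relies on xml_parse_into_struct(), which this section does not otherwise use, because in current PHP versions (to the best of my knowledge) that is where whitespace-only values are dropped. The sample document and the function name parseToStruct() are assumptions for illustration.

```php
<?php
// Hypothetical document with indentation between the elements
$xml = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>";

// Hypothetical helper: parses the document into PHP's flat
// "values" array with whitespace skipping switched on or off.
function parseToStruct(string $xml, bool $skipWhite): array
{
    $values = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, $skipWhite ? 1 : 0);
    xml_parse_into_struct($parser, $xml, $values);
    return $values;
}

// Collect the "type" field (open, complete, cdata, close) of each entry
$typesSkipped = array_column(parseToStruct($xml, true), 'type');
$typesKept    = array_column(parseToStruct($xml, false), 'type');

print_r($typesSkipped);
print_r($typesKept);
```

With skipping on, only the element entries remain; with it off, the indentation between the item elements appears as extra cdata entries.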

You can obtain the current value of any of the parser's attributes with the xml_parser_get_option() function, which returns the value of the specified attribute. For example:

 xml_parser_get_option($xml_parser, XML_OPTION_CASE_FOLDING); 
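
As a rough illustration of reading attributes back, this sketch sets two attributes and then collects their current values into an array. The array keys are arbitrary labels chosen for this example, and the (bool) cast is there because the exact return type for case folding (integer or Boolean) varies between PHP versions.

```php
<?php
$xml_parser = xml_parser_create();

// change two attributes from their defaults
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, false);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, 'ISO-8859-1');

// read the current values back; the keys are illustrative labels only
$settings = [
    'case_folding'    => (bool) xml_parser_get_option($xml_parser, XML_OPTION_CASE_FOLDING),
    'target_encoding' => xml_parser_get_option($xml_parser, XML_OPTION_TARGET_ENCODING),
];

var_dump($settings);
```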


XML and PHP
ISBN: 0735712271
EAN: 2147483647
Year: 2002
Pages: 84
