Chapter 2: PHP and Simple Application Programming Interface for XML



Hypertext Preprocessor (PHP) is a server-side scripting language that runs on various platforms, such as Linux and Windows. You can use PHP to read and parse an eXtensible Markup Language (XML) document using the Simple Application Programming Interface (API) for XML (SAX) parser. A PHP script can then display the parsed data in a Web browser.

This chapter introduces XML parsers and explains how to implement the SAX parser to parse XML documents. It also describes the object-oriented framework of parsers.

Parsing an XML Document

Parsing examines the syntax of an XML document to check its structure. A parser can also validate the document against a Document Type Definition (DTD), which specifies the elements that the document can contain and how they can be arranged. The XML parser checks that the opening and closing tags in the document match and, when a DTD is used, that the document follows the rules specified in the DTD.

Various Web browsers, such as Mozilla, Konqueror, and Internet Explorer, contain built-in capabilities to parse XML documents. To provide better readability and presentation of data, you can use PHP to parse the XML documents. PHP 3.0 and later versions support XML parsing. To parse an XML document, you need to initialize the XML parser in the PHP code. You can use the data of the parsed XML document in other applications.

Introducing Parser

An XML parser is a program that checks whether an XML document is well-formed. A document that satisfies the syntax rules specified by the World Wide Web Consortium (W3C) for XML is called a well-formed document. There are two types of XML parsers:

  • Non-validating parser: Checks if an XML document is well-formed according to the XML syntax rules.

  • Validating parser: Checks if the XML document is well-formed and valid according to the rules specified by DTD.

The parser translates the data of an XML document into a form that a program can use. Parsers follow one of two approaches to do this: event-driven or tree-based. In PHP, the event-driven SAX parser is a non-validating parser, whereas a tree-based DOM parser can also validate a document against a DTD.

In the event-driven parsing approach, the parser reads the XML tags and data sequentially, one at a time, without keeping the whole document in memory. Whenever the parser finds the start of an element, the end of an element, or data within an element, it generates an event. The application handles each event in a function that it registers with the parser. An example of the event-driven parser is the SAX parser.
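The event-driven approach can be sketched with PHP's SAX functions. This is a minimal illustration, not the chapter's own listing: the sample document and the $events array are hypothetical, and the closures and short array syntax assume PHP 5.4 or later.

```php
<?php
// Hypothetical sample document, used only for this sketch.
$xml = '<NOTE><TO>Ann</TO><BODY>Hello</BODY></NOTE>';

$events = [];   // records each event in the order the parser reports it

$parser = xml_parser_create();

// Register one handler for start tags and one for end tags.
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end:$name";
    }
);

// Register a handler for the text found between tags.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$events) {
        if (trim($data) !== '') {
            $events[] = "text:$data";
        }
    }
);

// Passing true as the third argument marks this as the last chunk of data.
$ok = xml_parse($parser, $xml, true);
xml_parser_free($parser);

echo implode("\n", $events), "\n";
```

The parser never builds a complete picture of the document. The script sees one event at a time and decides what to keep.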

In the tree-based parsing approach, the parser reads the whole XML document and organizes it in a hierarchical form, all at one time. This approach is the basis of the Document Object Model (DOM), which is a platform-independent and language-independent model. An example of the tree-based parser is the DOM parser. The DOM parser reads an XML document and divides it into various objects, such as elements, attributes, and comments. It arranges these objects in a single tree structure and stores the tree in memory. As a result, once the document is loaded, the DOM parser can search for any element in the XML document faster than the SAX parser, at the cost of using more memory.
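For comparison, the tree-based approach might look like the following sketch. It assumes PHP 5 or later with the DOM extension (the DOMDocument class), which is newer than the PHP versions this chapter targets. The employee data is borrowed from the examples in this chapter.

```php
<?php
// Assumption: PHP 5+ with the DOM extension enabled.
$xml = '<EMPLOYEEDETAIL>'
     . '<EMPLOYEE><NAME>Michael</NAME><AGE>25</AGE></EMPLOYEE>'
     . '<EMPLOYEE><NAME>John</NAME><AGE>26</AGE></EMPLOYEE>'
     . '</EMPLOYEEDETAIL>';

// Load the whole document into an in-memory tree in one step.
$doc = new DOMDocument();
$doc->loadXML($xml);

// Because the tree is already built, any element can be looked up directly.
$names = [];
foreach ($doc->getElementsByTagName('NAME') as $node) {
    $names[] = $node->textContent;
}

echo 'Root element: ', $doc->documentElement->nodeName, "\n";
echo 'Names: ', implode(', ', $names), "\n";
```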

Creating a Parser in PHP

You can implement an XML parser by including the xml_parser_create() function in the PHP code. The syntax to create an XML parser is:

 resource xml_parser_create(string encoding) 

The above syntax shows that the xml_parser_create() function creates an XML parser and returns a value of type resource. This resource is a handle to the new parser, which you pass to the other XML functions that work with the parser.

The parameter of the xml_parser_create() function is optional and specifies the character encoding that the parser uses. The character encoding schemes supported by the XML parser are ISO-8859-1, UTF-8, and US-ASCII. UTF stands for Universal Character Set (UCS) Transformation Format. In PHP 4, the default character encoding is ISO-8859-1. PHP 5.0.2 and later versions use UTF-8 by default.
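As a small sketch of the encoding parameter, the following script creates three parsers and reads back the target encoding of each with xml_parser_get_option(). The variable names are illustrative, and the value reported for the default parser depends on the PHP version, so no output is assumed for it.

```php
<?php
// A parser created without an argument uses the default encoding
// of the installed PHP version.
$defaultParser = xml_parser_create();

// Parsers created with an explicit encoding.
$utf8Parser  = xml_parser_create('UTF-8');
$latinParser = xml_parser_create('ISO-8859-1');

// XML_OPTION_TARGET_ENCODING reports the encoding in which the parser
// passes data to handler functions.
$defaultEncoding = xml_parser_get_option($defaultParser, XML_OPTION_TARGET_ENCODING);
$utf8Encoding    = xml_parser_get_option($utf8Parser, XML_OPTION_TARGET_ENCODING);
$latinEncoding   = xml_parser_get_option($latinParser, XML_OPTION_TARGET_ENCODING);

echo "Default parser: $defaultEncoding\n";
echo "UTF-8 parser:   $utf8Encoding\n";
echo "Latin-1 parser: $latinEncoding\n";

xml_parser_free($defaultParser);
xml_parser_free($utf8Parser);
xml_parser_free($latinParser);
```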

PHP 4.0 or later versions also support the creation of an XML parser with namespace support. The syntax to create an XML parser with namespace support is:

 resource xml_parser_create_ns(string encoding, string separator) 

The above syntax shows that the xml_parser_create_ns() function creates an XML parser that supports XML namespaces. The function returns a handle to the new parser, which you pass to the other XML functions that work with the parser. The xml_parser_create_ns() function takes two optional parameters, encoding and separator. In the element names that the parser passes to handler functions, the separator string is placed between the namespace URI and the local name of the tag. The default separator is a colon (:).
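The sketch below shows how a namespace-aware parser reports element names, assuming the two-argument form of xml_parser_create_ns() documented in the PHP manual (encoding and separator). The namespace URI, the sample document, and the choice of | as separator are hypothetical. Case folding is switched off so that the URI is not converted to uppercase.

```php
<?php
// Hypothetical document that places its elements in a namespace.
$xml = '<b:BOOK xmlns:b="http://example.com/books">'
     . '<b:TITLE>PHP</b:TITLE>'
     . '</b:BOOK>';

// Use "|" instead of the default ":" so the separator is easy to tell
// apart from the colon inside the URI.
$parser = xml_parser_create_ns('UTF-8', '|');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

$names = [];   // element names exactly as the parser reports them

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$names) {
        $names[] = $name;
    },
    function ($parser, $name) {
        // End tags are not needed for this sketch.
    }
);

$ok = xml_parse($parser, $xml, true);
xml_parser_free($parser);

echo implode("\n", $names), "\n";
```

Each reported name consists of the namespace URI, the separator, and the local name, so a handler can split on the separator to recover both parts.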

The xml_parser_free() function in PHP frees the resources handled by an XML parser and releases the reference to the XML parser. You should call the function after parsing is complete and before the PHP script ends. If you do not use the xml_parser_free() function, the parser's resources stay allocated until the script ends. On older PHP installations, this could reportedly cause the connection with the Web server to close or a segmentation fault error to be displayed. The syntax to release the reference of an XML parser is:

 bool xml_parser_free(resource parser) 

In the above code:

  • The parser parameter specifies the reference of the XML parser to be released.

  • The function returns a Boolean value, true or false. The true value indicates that the function executed successfully and freed the reference to the XML parser. The false value indicates that the function failed because the provided parameter, parser, is invalid.
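A minimal sketch of the full life cycle of a parser (create, parse, check for errors, free) could look like this. The sample document is hypothetical. On PHP 8.0 and later the parser is an object that is released automatically, so xml_parser_free() has no effect there, but the call still returns true.

```php
<?php
// 1. Create the parser.
$parser = xml_parser_create('UTF-8');

// 2. Parse a (hypothetical) document. xml_parse() returns 1 on success.
$status = xml_parse($parser, '<GREETING>Hello</GREETING>', true);

// 3. Report the error, if any, while the parser still exists.
if ($status !== 1) {
    $code = xml_get_error_code($parser);
    echo 'Parse error: ', xml_error_string($code), "\n";
}

// 4. Release the parser once it is no longer needed.
$freed = xml_parser_free($parser);

echo $freed ? "Parser released\n" : "Parser could not be released\n";
```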

Working with Well-formed XML Documents

An XML document starts with the XML declaration statement, which uses the same syntax as a processing instruction (PI). The declaration specifies the XML version and the encoding scheme used to process the XML document. The code to use the XML declaration in an XML document is:

 <?xml version="1.0" encoding="UTF-8"?> 

The above code shows the XML version that you need to use in an XML document. The encoding attribute indicates the encoding scheme that you use for creating the XML document. UTF-8 is commonly used to create Web pages in English because it represents each ASCII character in a single 8-bit byte, which keeps it compatible with ASCII. Other characters take two to four bytes in UTF-8. You can use the UTF-16 encoding scheme when an application uses scripts other than Latin, such as Japanese Katakana and Cyrillic. UTF-16 represents most characters in 16 bits and the remaining characters as a pair of 16-bit units.
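The variable length of UTF-8 can be checked with a short sketch. PHP's strlen() counts bytes, not characters, so it shows how many bytes each character occupies. The three sample characters are illustrative and are written as explicit byte sequences so that the result does not depend on the encoding of the script file.

```php
<?php
$ascii    = 'A';              // U+0041, an ASCII letter
$eAcute   = "\xC3\xA9";       // U+00E9, Latin small letter e with acute
$katakana = "\xE3\x82\xA2";   // U+30A2, Katakana letter A

// strlen() returns the number of bytes in each string.
$lengths = [
    'ascii'    => strlen($ascii),     // 1 byte
    'eAcute'   => strlen($eAcute),    // 2 bytes
    'katakana' => strlen($katakana),  // 3 bytes
];

foreach ($lengths as $label => $bytes) {
    echo "$label: $bytes byte(s)\n";
}
```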

A well-formed XML document meets the standards and rules for XML provided by W3C. Rules for creating a well-formed XML document are:

  • Ensuring that every element in the XML document has a start and an end tag. For example, you need to provide the starting and the ending tags of an element, as shown:

     <LI>First</LI> <LI>Second</LI> 

  • Closing an empty tag with a slash mark (/). Empty tags do not have closing tags because they do not contain data. The <IMG> and <BR> tags are examples of empty tags in XML, as shown:

     <IMG src="image.gif" /> <BR /> 

The above code shows that the <IMG> and <BR> tags are empty tags that contain the slash mark preceding the closing angle bracket (>). The <IMG> tag contains the src attribute that indicates the source of the image file. The <BR> tag inserts a new line.

  • Using quotation marks to provide attribute values, as shown:

     <P align="right"> 

The above code shows that the <P> tag contains the align attribute with the value right. The value of the align attribute is enclosed within quotation marks.

  • Closing the innermost tag before the outermost tag. For example,

     <NAME> Michael <AGE> 25 </AGE></NAME> 

The above code shows that the innermost tag, <AGE>, is closed before the outermost tag, <NAME>.

  • Matching the case of the starting and the ending tags. XML tags are case-sensitive. The mismatch of the cases of the starting and ending tags generates an error, as shown:

     <NAME> Michael </name> 

The above code generates an error because the cases of the starting and ending tags are mismatched. The correct code to use the tags is:

     <NAME> Michael </NAME> 

The above code shows that both the starting and ending tags are in the uppercase. You can also specify the starting and ending tags in lowercase.

  • Including all other elements within the root element. Listing 2-1 shows that the root element contains all other elements of an XML document:

Listing 2-1: Using the Root Element of an XML Document

 <EMPLOYEEDETAIL>
   <EMPLOYEE>
     <NAME>Michael</NAME>
     <AGE>25</AGE>
     <ADDRESS>New York</ADDRESS>
   </EMPLOYEE>
   <EMPLOYEE>
     <NAME>John</NAME>
     <AGE>26</AGE>
     <ADDRESS>New Jersey</ADDRESS>
   </EMPLOYEE>
 </EMPLOYEEDETAIL>

The above listing shows that <EMPLOYEEDETAIL> is the root element, which contains other elements of the XML document. The <EMPLOYEEDETAIL> element contains two <EMPLOYEE> elements, which also contain other elements, <NAME>, <AGE>, and <ADDRESS>.

  • Providing unique names to the attributes of the elements. You cannot create two attributes with the same name. For example:

     <TITLE heading="Employee Details" heading="Employee Information"> 

The above code shows that the <TITLE> element contains two attributes with the same name, heading. As a result, the <TITLE> element is not well-formed, and the parser reports an error.

A document is well-formed if it adheres to all the above rules.
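The rules above can be tried out with a short sketch that passes small fragments to the parser and records whether each one is accepted. The isWellFormed() helper and the sample labels are hypothetical names introduced for this illustration. The fragments are adapted from the examples in this section.

```php
<?php
// Hypothetical helper: returns true if the parser accepts the fragment.
function isWellFormed($xml)
{
    $parser = xml_parser_create();
    $ok = (xml_parse($parser, $xml, true) === 1);
    xml_parser_free($parser);
    return $ok;
}

$samples = [
    'matching tags'       => '<LI>First</LI>',
    'empty tag'           => '<IMG src="image.gif" />',
    'quoted attribute'    => '<P align="right"></P>',
    'unquoted attribute'  => '<P align=right></P>',
    'proper nesting'      => '<NAME> Michael <AGE> 25 </AGE></NAME>',
    'improper nesting'    => '<NAME> Michael <AGE> 25 </NAME></AGE>',
    'case mismatch'       => '<NAME> Michael </name>',
    'duplicate attribute' => '<TITLE heading="a" heading="b"></TITLE>',
];

$results = [];
foreach ($samples as $label => $xml) {
    $results[$label] = isWellFormed($xml);
    printf("%-20s %s\n", $label, $results[$label] ? 'well-formed' : 'not well-formed');
}
```

A fresh parser is created for every fragment because a parser cannot be reused after it has finished parsing a document.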

Listing 2-2 shows a well-formed XML document:

Listing 2-2: Creating a Well-Formed XML Document

 <?xml version="1.0" encoding="UTF-8"?>
 <STUDENTDETAIL>
   <STUDENT>
     <NAME ID="S001">George</NAME>
     <AGE>15</AGE>
     <ADDRESS>New York</ADDRESS>
     <STANDARD>10</STANDARD>
   </STUDENT>
   <STUDENT>
     <NAME ID="S001">John</NAME>
     <AGE>15</AGE>
     <ADDRESS>New York</ADDRESS>
     <STANDARD>10</STANDARD>
   </STUDENT>
 </STUDENTDETAIL>

In the above listing:

  • The XML document is well-formed because it adheres to all the rules of a well-formed document.

  • The XML document contains the XML declaration at the start of the document.

  • The root element, <STUDENTDETAIL>, contains other elements.

  • All tags are closed and are written in the same case.

  • The value of the ID attribute of the <NAME> tag is enclosed within quotation marks.
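As a sketch of how a PHP script might read the student document shown above, the following code uses the SAX functions to collect each student into an array. The variable names and the shape of the resulting array are assumptions made for this illustration, and the syntax assumes PHP 5.4 or later.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<STUDENTDETAIL>
  <STUDENT>
    <NAME ID="S001">George</NAME>
    <AGE>15</AGE>
    <ADDRESS>New York</ADDRESS>
    <STANDARD>10</STANDARD>
  </STUDENT>
  <STUDENT>
    <NAME ID="S001">John</NAME>
    <AGE>15</AGE>
    <ADDRESS>New York</ADDRESS>
    <STANDARD>10</STANDARD>
  </STUDENT>
</STUDENTDETAIL>
XML;

$students = [];    // finished student records
$current  = null;  // record being built, or null outside <STUDENT>
$field    = null;  // element whose text is currently being read

$parser = xml_parser_create('UTF-8');

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$current, &$field) {
        if ($name === 'STUDENT') {
            $current = [];                 // start a new record
        } elseif ($current !== null) {
            $field = $name;                // e.g. NAME, AGE
            $current[$name] = '';
            if (isset($attrs['ID'])) {
                $current['ID'] = $attrs['ID'];
            }
        }
    },
    function ($parser, $name) use (&$students, &$current, &$field) {
        if ($name === 'STUDENT') {
            $students[] = $current;        // record is complete
            $current = null;
        }
        $field = null;
    }
);

// Text may arrive in several pieces, so append rather than assign.
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$current, &$field) {
        if ($current !== null && $field !== null) {
            $current[$field] .= $data;
        }
    }
);

$ok = xml_parse($parser, $xml, true);
xml_parser_free($parser);

foreach ($students as $student) {
    echo $student['NAME'], ' (', $student['ID'], '), age ', $student['AGE'], "\n";
}
```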




Integrating PHP and XML 2004
Year: 2004
Pages: 51
