How Is XML Structured?


When you define XML languages and create XML documents, you must follow a few rules. DTD documents help you keep your XML documents consistent, but a few more rules must be followed.

Always Remember the Header

In all the examples, you might have noticed the special XML header:

<?xml version="1.0" encoding="UTF-8"?>

The declaration is optional in XML 1.0, but you should treat it as required: it tells the parser what version of XML and what encoding you are using so that the parser can read the XML document correctly. When it is present, it must be the very first thing in the document. If you leave it out, the parser assumes the document is encoded as UTF-8 or UTF-16, and a document in any other encoding produces an error or garbled characters. Every XML parser must support two encodings, UTF-8 and UTF-16; other encodings, such as ISO-8859-1, can be declared but work only if the parser chooses to support them. When you set the XML encoding, you tell the parser which encoding to use to turn the bytes of the document back into characters. (A computer works only with numbers, so an encoding states how each character in the XML maps to its numeric counterpart.) The mapping differs depending on the encoding you set. In other words, the same letter is stored as different bytes depending on whether you use UTF-8 or UTF-16.
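
To make the letter-to-number idea concrete, the following is a minimal sketch that prints the bytes a character becomes under each encoding. It assumes PHP 7 or later and the mbstring extension; encoded_bytes is a helper name made up for this example.

```php
<?php
// Sketch: show the bytes one character becomes under each encoding.
// Assumes PHP 7+ (for "\u{...}" escapes) and the mbstring extension.

// Return the hex bytes of $text (given as UTF-8) once converted to $encoding.
// encoded_bytes() is a hypothetical helper, not a built-in function.
function encoded_bytes(string $text, string $encoding): string
{
    return bin2hex(mb_convert_encoding($text, $encoding, 'UTF-8'));
}

$samples = ['A', "\u{00E9}"]; // the letter A and an accented e

foreach ($samples as $char) {
    echo $char, ': UTF-8 = ', encoded_bytes($char, 'UTF-8'),
         ', UTF-16BE = ', encoded_bytes($char, 'UTF-16BE'), "\n";
}
```

The letter A occupies one byte in UTF-8 and two in UTF-16, which is why the parser has to know which encoding it is reading.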

Always Close Nodes

XML elements always need a close tag. Unlike, say, HTML, which allows single tags as well as open and close pairs, XML requires every open tag to have a matching close tag. For example, our XML language has the NAME tag, which we open with <NAME> and close with </NAME>.

Note

If an XML element has no tags or text within it, do you still need a close tag? In theory, yes, but you can use a shorthand syntax. For example, in HTML, the <IMG> tag has no tags within it. However, the XHTML version of the tag still needs to be closed. Thus, you can use <IMG/>, a shorthand that opens and closes the tag in one step. Also note that XML is case-sensitive and XHTML defines its tag names in lowercase, so the correct syntax would be <img/>. Furthermore, some older browsers require a space before the ending slash, or they won't recognize the tag, so it's good form to write it as <img />.
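
If you want to check the close-tag rule for yourself, a sketch like the following asks PHP's XML extension whether a fragment parses. The is_well_formed helper is a name made up for this example, not part of the extension.

```php
<?php
// Sketch: ask PHP's XML extension whether a fragment is well formed.
// is_well_formed() is a hypothetical helper, not part of the extension.
function is_well_formed(string $xml): bool
{
    $parser = xml_parser_create();
    // The third argument marks this as the final (here: only) piece of data.
    return xml_parse($parser, $xml, true) === 1;
}

var_dump(is_well_formed('<NAME>Andrew</NAME>')); // open tag and close tag
var_dump(is_well_formed('<IMG/>'));              // shorthand close
var_dump(is_well_formed('<NAME>Andrew'));        // close tag missing
```

Only the last fragment should be rejected, because its NAME tag is never closed.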


Never Cross Nodes

XML tags must never cross one another: an element opened inside another element must also be closed inside it. For example, the following is not well formed:

<?xml version="1.0" encoding="UTF-8"?> 
<PEOPLE> 
     <PERSON> 
          <NAME>Andrew</PERSON></NAME> 
</PEOPLE>

Here, the PERSON node is closed before the NAME node is closed. A document like this is not well formed, and a conforming XML parser must stop with an error rather than guess what you meant. A DTD goes a step further by declaring which elements may appear inside which, so a validating parser can also reject nesting that is well formed but not allowed by your XML language, which is another reason why it is important to have one.
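
As a rough illustration, the following sketch feeds the crossed-tag document above to PHP's XML extension and prints whatever error the parser reports. The describe_parse helper is a name made up for this example, and the exact error text depends on the parser library PHP was built with.

```php
<?php
// Sketch: parse the crossed-tag document and report what the parser says.
$crossed = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<PEOPLE>
     <PERSON>
          <NAME>Andrew</PERSON></NAME>
</PEOPLE>
XML;

// describe_parse() is a hypothetical helper name.
// Returns null on success, or a short error description on failure.
function describe_parse(string $xml): ?string
{
    $parser = xml_parser_create();
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }
    return sprintf(
        '%s at line %d',
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    );
}

$error = describe_parse($crossed);
echo $error === null ? "well formed\n" : "not well formed: $error\n";
```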


Using PHP with XML

Now that you have an idea of what XML is and how it is made up, you can start using it with PHP. The place to start is with Expat.

PHP and Expat

As I explained earlier, PHP's native XML support is built on the Expat library. Expat is an open-source XML parser that is used in several applications; the PHP XML extension is just one. You can find out more about Expat at http://www.jclark.com/xml/.

XML support is built into PHP, so you can use it without modifying your php.ini file. To start, load and display your XML file:

<?php 

//path to XML file (single quotes, with the final backslash doubled so that
//it does not escape the closing quote)
$file_path = 'C:\Book\Code\Chapter8_XML\XML\\';

//XML file 
$xml_file = $file_path . "people.xml"; 

//display the contents of the XML file 
function display($xmlp, $data) {

     Print($data); 

} 

//create the XML parser 
$xmlp = xml_parser_create(); 

//set what function to call when you call the xml_parse method 
xml_set_character_data_handler($xmlp, 'display'); 

//open the XML file as read-only 
$file = fopen($xml_file, 'r') or die('cannot open xml file'); 

//loop through XML file contents 
while($data = fread($file, filesize($xml_file))) {

     //call xml_parse method to read XML file 
     xml_parse($xmlp, $data, feof($file)) or die ('xml error'); 

} 

//free XML parser from memory 
xml_parser_free($xmlp); 

?>

First, you define the location and filename of the XML file you want to read:

//path to XML file (single quotes, with the final backslash doubled so that
//it does not escape the closing quote)
$file_path = 'C:\Book\Code\Chapter8_XML\XML\\';

//XML file 
$xml_file = $file_path . "people.xml";
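
One detail worth checking when you write a Windows path as a PHP string is how the backslashes are escaped, because a backslash directly before the closing quote escapes that quote. The sketch below shows three spellings of the same directory; the variable names are made up for this example.

```php
<?php
// Sketch: three ways to write the same Windows directory in PHP.

// In a single-quoted string only \\ and \' are escapes, so just the
// trailing backslash has to be doubled.
$single = 'C:\Book\Code\Chapter8_XML\XML\\';

// In a double-quoted string, doubling every backslash is the safe habit.
$double = "C:\\Book\\Code\\Chapter8_XML\\XML\\";

// Forward slashes need no escaping, and PHP on Windows accepts them.
$forward = 'C:/Book/Code/Chapter8_XML/XML/';

$xml_file = $single . 'people.xml';
echo $xml_file, "\n";
```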

Next, you define a function that you use to display the contents of the XML document when character data is encountered:

//display the contents of the XML file 
function display($xmlp, $data) {

     Print($data); 

}

Then you create an XML parser instance and return a handle to it:

$xmlp = xml_parser_create();

Then you specify how to handle data from the XML document using the xml_set_character_data_handler function:

xml_set_character_data_handler($xmlp, 'display');

Here you supply the name of the function (in this case, display) that the parser calls whenever it encounters character data while parsing the XML document.

Next you open the XML file and cycle through its contents:

//open the XML file as read-only 
$file = fopen($xml_file, 'r') or die('cannot open xml file'); 

//loop through XML file contents 
while($data = fread($file, filesize($xml_file))) {

Next you call the xml_parse function of the XML extension to parse through the XML document:

xml_parse($xmlp, $data, feof($file)) or die ('xml error');

Here the xml_parse function passes the contents of the XML file to the parser. Whenever the parser encounters character data, it calls the handler you registered earlier with the xml_set_character_data_handler function. That handler, the display function, then displays the character data, as shown in Figure 8.3.
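
To see this calling sequence in isolation, here is a minimal sketch that parses a short string instead of a file and collects what the handler receives. The XML string and the $received variable are made up for this example.

```php
<?php
// Sketch: register a character data handler and see what it receives.
// The XML string is made up for this example.
$xml = '<PEOPLE><PERSON><NAME>Andrew</NAME></PERSON></PEOPLE>';

$received = [];

$xmlp = xml_parser_create();

// The handler is called by the parser, not by your own code. It may be
// called more than once for a single run of text.
xml_set_character_data_handler($xmlp, function ($parser, $data) use (&$received) {
    $received[] = $data;
});

// xml_parse feeds the document to the parser, which calls the handler.
$ok = xml_parse($xmlp, $xml, true) === 1;

// Tags never reach this handler, only the text between them.
echo $ok ? implode('', $received) . "\n" : "xml error\n";
```

Collecting the data in an array rather than printing it straight away is a design choice for the sketch; it makes it easy to inspect exactly what the parser handed over.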

Figure 8.3. The XML file displayed using the PHP XML extension.


If your XML file contains a lot of data, the xml_parse function lets you read the file in chunks. To do this, all you need to do is modify the fread function as follows:

while($data = fread($file, 4000)) {

     //call xml_parse method to read XML file 
     xml_parse($xmlp, $data, feof($file)) or die ('xml error'); 

}

Here the XML document is read in 4000-byte chunks rather than the whole document (as you did previously using the filesize function to pass the entire XML document to the fread function).
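
The chunking behavior can also be tried without a file. The following sketch splits a short string into pieces and feeds them to the parser one at a time; the XML string and the 16-byte piece size are made up for this example.

```php
<?php
// Sketch: feed a document to the parser in small pieces.
// The XML string and the 16-byte piece size are made up for this example.
$xml = '<PEOPLE><PERSON><NAME>Andrew</NAME></PERSON>'
     . '<PERSON><NAME>Emma</NAME></PERSON></PEOPLE>';

$text = '';

$xmlp = xml_parser_create();
xml_set_character_data_handler($xmlp, function ($parser, $data) use (&$text) {
    $text .= $data;
});

$pieces = str_split($xml, 16);
$last   = count($pieces) - 1;
$ok     = true;

foreach ($pieces as $i => $piece) {
    // The third argument must be true only for the final piece.
    if (xml_parse($xmlp, $piece, $i === $last) !== 1) {
        $ok = false;
        break;
    }
}

echo $ok ? "parsed: $text\n" : "xml error\n";
```

The parser keeps its state between calls, so a tag or a name that is split across two pieces is still handled correctly.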

PHP and MSXML

PHP also lets you work with another XML parser: the Microsoft XML (MSXML) parser. Using the MSXML parser differs slightly from using the Expat parser in that the MSXML parser exists as a COM object as opposed to a native library like Expat. Using MSXML from PHP also differs in a few details from using it in ASP. For comparison, here is the ASP version first:

<% 

'create MSXML parser 
set source = Server.CreateObject("Microsoft.XMLDOM") 

source.async = False 

'load XML document into parser 
source.load("http://localhost/phpbook/Chapter8_XML/XML/people.xml") 

'node to find 
set tagtofind = source.getElementsByTagName("NAME") 

'how many tags we have 
taglength = tagtofind.length 
'iterate through nodes 
For i = 0 To taglength - 1 

'display data within tags 
Response.write(tagtofind(i).Text & "<BR>") 

Next 

'free MSXML parser from memory 
set source = nothing 
%>

Here, using ASP, you use the MSXML parser to load the XML document and look up the NAME node. This returns a collection holding a COM reference to each NAME node. You then iterate through the collection, displaying the value of each NAME node in it. Using PHP, this looks like the following:

<?php 

//create an MSXML parser as a COM object 
$source = new COM("Microsoft.XMLDOM"); 

$source->async = false; 

//load an XML document into the parser 
$source->load("http://localhost/Book/Code/Chapter8_XML/XML/people.xml"); 

//tag to find 
$nodecoll = $source->getElementsByTagName("NAME"); 

//how many tags we have 
$numnodes = $nodecoll->length; 

//iterate through tags 
for ($i = 0; $i < $numnodes; $i++) {
    $item =& $nodecoll->item($i); 
    print "Found: {$item->xml}\n"; 
} 

?>

First, you load the MSXML parser COM object:

$source = new COM("Microsoft.XMLDOM");

Next you load the XML document into the MSXML parser:

$source->load("http://localhost/Book/Code/Chapter8_XML/XML/people.xml");

Next you look up the tag you want to find:

$nodecoll = $source->getElementsByTagName("NAME");

Next you see how many tags you have:

$numnodes = $nodecoll->length;

You then iterate through each of the tags:

for ($i = 0; $i < $numnodes; $i++) {

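Next you display data within each of the tags. PHP does not support anonymous object dereferencing when using COM (that is, you cannot write $nodecoll->item($i)->xml as a single expression), so you first assign the node to a variable and then read its xml property: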

$item =& $nodecoll->item($i); 
print "Found: {$item->xml}\n";
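
COM is available only on Windows, so if you want to try the same sequence of calls elsewhere, the following sketch uses PHP's bundled DOM extension instead of MSXML. This is a substitute technique and not the approach this section describes; the XML string is made up for this example.

```php
<?php
// Sketch: the same DOM calls using PHP's DOM extension instead of COM.
// Assumption: the DOM extension is available; the XML string is made up.
$xml = '<PEOPLE><PERSON><NAME>Andrew</NAME></PERSON>'
     . '<PERSON><NAME>Emma</NAME></PERSON></PEOPLE>';

$source = new DOMDocument();
$source->loadXML($xml);

// Look up every NAME element in the document.
$nodecoll = $source->getElementsByTagName('NAME');

// How many matching elements were found.
$numnodes = $nodecoll->length;

$names = [];
for ($i = 0; $i < $numnodes; $i++) {
    $item    = $nodecoll->item($i);
    $names[] = $item->textContent; // the text inside the tag
}

echo implode("\n", $names), "\n";
```

Note that textContent returns only the text of each node, like the Text property in the ASP version, whereas the xml property used with MSXML returns the node's markup as well.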

Although it's easy enough to use the MSXML parser directly in your code, wrapping the MSXML calls in a COM object of your own gives you additional speed and modularization.

<?php

// Create an MSXML parser as a COM object. 
$source = new COM("Microsoft.XMLDOM"); 

$source->async = false; 

// Load an XML document into the parser. 
$source->load("http://localhost/Book/Code/Chapter8_XML/XML/people.xml"); 

// Search for any NAME tags. 
$nodecoll = $source->getElementsByTagName("NAME"); 

// Number of matching nodes in the document. 
$numnodes = $nodecoll->length; 

echo "Number of matching tags: $numnodes.\n"; 

// Iterate over the collection of found nodes. 
for ($i = 0; $i < $numnodes; $i++) {
    $item =& $nodecoll->item($i); 
    echo "Found: {$item->xml}\n"; 
} 

?>

Creating the MSXML Wrapper COM Object

Your COM object takes the XML file you want to query, along with the name of the node you want to look up, and returns an array holding the text of every matching node. If you start Visual Basic and create a new ActiveX DLL project, you can add the following code:

Public Function DisplayXML(ByRef xmlfile As Variant, ByRef TagName As Variant)

'array to hold tags 
Dim Tags(10) 

'load MSXML COM object 
Dim source As DOMDocument 
Set source = New DOMDocument 

source.async = False 

'load XML file 
source.Load (xmlfile) 

'look up node 
Set TagsToLookup = source.getElementsByTagName(TagName) 

'look up number of nodes 
TagsAmount = TagsToLookup.length 

'iterate through nodes 
For i = 0 To TagsAmount - 1 

    'copy tag contents to array 
    Tags(i) = TagsToLookup(i).Text 

Next 

'return array 
DisplayXML = Tags 

'unload MSXML 
Set source = Nothing 

End Function

I set my project name to Chapter8 and class name to XML, but you can use anything you want. Before you compile the DLL, remember to add a reference to the MSXML COM object in your Visual Basic project references, as shown in Figure 8.4.

Figure 8.4. Adding the MSXML COM object to your Visual Basic project references.


The code for the COM object works much like it did before. First, you define the array that will hold the data of the tag you are looking up:

Dim Tags(10)

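Visual Basic requires you to set the size of a fixed array when you declare it. Dim Tags(10) creates indexes 0 through 10, which is quite small. Remember to set it large enough to accommodate all the matching nodes in your XML document, because writing past the upper bound raises a runtime error.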

Next you load the MSXML parser COM object:

Dim source As DOMDocument 
Set source = New DOMDocument

Next you load the XML document into the MSXML parser. Note that you reference the parameter (that is, the path to the XML document) that you pass to your COM object:

source.Load (xmlfile)

Next you look up the node you want to find in the XML document. Note that you reference the parameter (that is, the node to find) that you pass to your COM object:

Set TagsToLookup = source.getElementsByTagName(TagName)

Next you see how many nodes occur within the XML document:

TagsAmount = TagsToLookup.length

Next you iterate through each of these nodes:

For i = 0 To TagsAmount - 1

As you iterate through the nodes, you populate your array with the data in the nodes:

Tags(i) = TagsToLookup(i).Text

Finally, after your array is populated, you use it as the return value of your COM object's function:

DisplayXML = Tags

Using the MSXML Feeder COM Object in PHP

After you have built and compiled the COM object, you can use it in PHP as follows:

<?php 

$source = new COM("chapter8.xml"); 
$xmlnode = $source->DisplayXML("http://localhost/phpbook/Chapter8_XML/XML/people.xml",
     "NAME"); 

//valid indexes run from 0 to count() - 1 
for ($i = 0; $i < count($xmlnode); $i++) 
{
     //skip array elements that were never filled 
     if($xmlnode[$i] <> "") {
          print($xmlnode[$i] . "<BR>"); 
     } 

} 

?>

First, you load your COM object:

$source = new COM("chapter8.xml");

Next, to obtain an array of results, you call the DisplayXML function, remembering to pass the location of your XML file and the node to find:

$xmlnode = $source->DisplayXML("http://localhost/phpbook/Chapter8_XML/XML/people.xml",
     "NAME");

Next you iterate through the array, using the PHP count function to find out how many elements it holds. Because the indexes start at 0, the loop runs while $i is less than the count:

for ($i = 0; $i < count($xmlnode); $i++)

The array might be larger than the number of nodes in your XML document, so some elements in the array might be empty. You avoid displaying these elements as follows:

if($xmlnode[$i] <> "") {

Finally, to display the data within your array, you use the following:

print($xmlnode[$i] . "<BR>");
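
Because the COM object needs Windows and a compiled DLL, the loop is easier to experiment with on a stand-in array. The following sketch imitates a result with two names in an 11-element array; the values and the $shown variable are made up for this example.

```php
<?php
// Sketch: loop over a fixed-size result and skip the unused elements.
// The array imitates what DisplayXML might return for two NAME nodes
// in an 11-element array; the values are made up for this example.
$xmlnode = array_pad(['Andrew', 'Emma'], 11, '');

$shown = [];

// Valid indexes run from 0 to count() - 1, so the test is <, not <=.
for ($i = 0; $i < count($xmlnode); $i++) {
    // Skip elements that were never filled.
    if ($xmlnode[$i] != '') {
        $shown[] = $xmlnode[$i];
        print($xmlnode[$i] . "<BR>");
    }
}
```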

If you run this script, you should see data from your NAME nodes displayed, as shown in Figure 8.5.

Figure 8.5. The data from the NAME nodes displayed using PHP and your MSXML feeder COM object.
