PHP's core XML functions are built on expat, a seminal open-source XML parser written by James Clark. It is an event-driven parser, and it does an excellent job within its limited scope. It can unwind the syntax of an XML string, identifying each of its components and elaborating its external references. The parser does not validate XML, nor does it maintain any sense of state we would find useful.

We can use one of two distinct approaches to XML parsing. In Flash ActionScript we were able to depend on a DOM-based parser (sometimes called a tree parser). When this kind of parser is launched, it reads in the whole XML stream, parses the string entirely, and returns a single data object. The returned structure is a tree of well-defined node objects that precisely replicates the content and relationships of the XML-formatted data. This static data structure is revealed all at once to the application. Another DOM-based parser is the one built into Internet Explorer. It parsed our XML data island with no effort on our part, and we could access the results of the parser in a fully realized structure.

By contrast, an event-driven parser continuously reacts to the different bits of data it encounters as the stream of XML runs through it (Figure 15.1). Unlike the DOM-based parser, this parser is in tight communication with the application throughout the parsing process. The two share the work: the parser identifies the points in time when something needs to be done, and the application actually does it.

Figure 15.1. Event-Driven vs. DOM-Based Parsers
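The contrast between the two approaches can be sketched in a few lines. This is only an illustration, not the PHP code this chapter goes on to develop: it uses Python, whose standard `xml.parsers.expat` module wraps the same expat library, and `xml.dom.minidom` as a stand-in for a tree parser. The sample document and the function names `parse_with_events` and `parse_to_tree` are assumptions made for the example.

```python
import xml.parsers.expat
from xml.dom import minidom

# Hypothetical sample message, invented for this sketch.
XML = "<player><name>Jimmy</name><score>42300</score></player>"


def parse_with_events(text):
    """Event-driven: the parser calls back as each piece streams past."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text in a single call
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(text, True)  # True marks this as the final chunk
    return events


def parse_to_tree(text):
    """DOM-based: the whole document comes back as one finished tree."""
    return minidom.parseString(text)


# The event parser reports things in document order, one at a time.
for event in parse_with_events(XML):
    print(event)

# The tree parser lets us ask questions after the fact.
tree = parse_to_tree(XML)
print(tree.getElementsByTagName("score")[0].firstChild.data)
```

With the event-driven version, the application decides what to remember as each callback arrives. With the tree version, everything is kept and the application navigates afterward.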
Event-based parsers have the following advantages.
The advantages of DOM-based parsers follow.
It is worth noting that a tree parser can be built on an event-driven parser. (Indeed, it is hard to imagine building a tree parser without the underlying methods found in an event-driven parser.) PHP will soon have a reliable tree parser written on top of expat. Several candidates with a reasonably full feature set are emerging from the clutter of complications and bugs. Of the three or four serious efforts, the most advanced is the DOMXML package, though it is considered difficult to install, incomplete, and incompatible with the DOM standard. Consider also the tiny but serviceable extension by Sebastian Will (Sol Folango) as a reliable alternative to a more complex parser.

Parsing with expat

Users of the expat library are required to write their own functions to handle the various parts of an XML document. These functions receive a set of parameters that are established in the expat spec. Special installers are called to set the event handler for each of the eight events that expat recognizes.
ActionScript Relevance

Most of the handlers for these events are unlikely to be used in interpreting a message sent from ActionScript. The start and end of elements and the arrival of character data are probably the only events of interest in this context.

In particular (as we noted), the processing instruction is language specific and cannot be used to send ActionScript to the client. In theory, it could be used by Flash to send instructions to execute on the server, but this is a very indirect way to get things done. More important, such a technique throws the front door wide open to the most pernicious security attacks.

The three events that process entities and notations are of little use in Flash development. They are far more sophisticated than is required for the simple client-server messaging generally employed. These three are guaranteed to be meaningful only on a validating parser. Although the PHP parser is not validating, it can be made to resolve external entity invocations (including a DTD). It can do so, however, only after you write handlers that open new files and recursively invoke the parser.

With enough effort on your part, the PHP parser will support these three XML features. The ActionScript parser will not, no matter what you do. It is a nonvalidating parser and a fairly minimal one. So implementing external entities will be meaningful only on the server side.

It is possible for a client and server to traffic in XML objects that only one of them can read. ActionScript can be forced to generate fully valid XML code (just as it could be forced to generate meaningful LISP instructions). Generally this means stupidly echoing static strings that the developer has prepared. The purpose of XML is not served by parroting code that is understood only on one side. It is meant as a pathway in which data objects can be passed from one process to another while remaining lucid to both.
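To show what "write handlers that open new files and recursively invoke the parser" involves, here is a minimal sketch of resolving one external entity with expat. It is written in Python (whose `xml.parsers.expat` module binds the same library) rather than PHP, and it is not the chapter's own code. The dictionary `FILES` stands in for files on disk. The entity name, the file name `bonus.xml` and the function `parse_with_entities` are all hypothetical.

```python
import xml.parsers.expat

# Hypothetical stand-in for external files; a real handler would open
# the file named by system_id instead of looking it up here.
FILES = {"bonus.xml": "<bonus>500</bonus>"}

DOC = """<?xml version="1.0"?>
<!DOCTYPE player [
  <!ENTITY bonus SYSTEM "bonus.xml">
]>
<player><name>Jimmy</name>&bonus;</player>"""


def parse_with_entities(text):
    """Return element names in document order, following external entities."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        seen.append(name)

    def external_entity(context, base, system_id, public_id):
        # Recursive step: create a sub-parser for the entity's content
        # and feed it the replacement text.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FILES[system_id], True)
        return 1  # nonzero tells expat to carry on with the main document

    parser.StartElementHandler = start
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(text, True)
    return seen


print(parse_with_entities(DOC))
```

Even in this toy form the extra machinery is plain to see, which is the cost the text has in mind when it calls these features more sophisticated than simple client-server messaging requires.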
Real-World Relevance

The XML features ignored by ActionScript are not meaningless for the Flash developer. For example, in developing our game, we might choose to publish player profiles. A profile might include a photograph, so our XML document must refer to external (non-XML) files. The proper XML route is to declare notations for the GIF89 and JPEG file formats and create unparsed entity references that point to photographs on a server.

With considerable pain, we can do this in ActionScript. We can write functions that clumsily extend its XML generator (not its parser). They could compose unparsed entity code and inject it into the XML string.

This correct XML can be quite useful. XSL transforms could be written to prettily embed the photographs in HTML player profile pages. PHP code could be created to manage the database. Even better, if a native XML database server is being used, it will hum happily when this data arrives.

The unparsed entity and the notation were specifically created for this sort of use. But they are at once delicate and clumsy. The specification, unlike the sparse elegance of most XML, is sprawling and nearly ambiguous, and it is devoid of any sense of design. Proper XML photograph links require us to wade through half a dozen features that appear nowhere else in XML (such as the PublicID).

The result is an XML object that would be meaningful only to the most sophisticated parsers. In particular, it would be unreadable by the Flash code that first wrote it. If the player profiles were to be eventually displayed by Flash, these exquisite XML objects would be useless.

NOTE

An increasingly popular alternative to this method of referring to external resources is XLINK. It provides a set of standardized link elements modeled after (and greatly extending) the HTML <A/> tag.
Like the classic hypertext markup, XLINK relies on these defined elements and especially attributes (like href) to specify linkages, in contrast to strict XML, whose system of external entities uses a variety of features but not standard elements and attributes. ActionScript will process an XLINK document, but it will choke on strict XML linkages.

In the following example, ActionScript ignores the namespace reference on the second line and imperfectly resolves the use of that namespace. Rather than interpret xlink:href properly as "href as defined by the owner of www.w3.org/1999/xlink", Flash just interprets it as "some attribute called xlink:href". In practice, this makes very little difference.

    <?xml version="1.0" encoding="UTF-8"?>
    <player xmlns:xlink="http://www.w3.org/1999/xlink">
      <name>Jimmy Commender</name>
      <home>Staten Island, NY</home>
      <photo xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"
             xlink:href="hotcop.jpg" width="100" height="200"/>
      <score>42300</score>
      <favorite-url xlink:type="simple"
                    xlink:href="http://www.commender.com/~jimmy">My home page</favorite-url>
      <favorite-url xlink:type="simple"
                    xlink:href="http://www.bigfun.net">My favorite game site</favorite-url>
    </player>

This is a far more appropriate mechanism for external file references than the complexity of unparsed entities. The simple XLINK mechanism fits the simple ActionScript parser and the simple PHP implementation we prefer. Since it uses only the element mechanism, not notation and unparsed entities, XLINK information passes easily through even the simplest XML parsers. It is easily preserved and easily repeated, even by parsers that are ignorant of XLINKs and incapable of using the data.

If we choose to create our own ActionScript linkage functions, choosing XLINKs as a basis jumps us forward with a powerful and clearly specified design, a simple XML format, and a community of support that offers tools, content, and a much greater return on our development effort.
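The "some attribute called xlink:href" behavior is easy to reproduce with any parser that is not namespace aware. The sketch below uses Python's `xml.parsers.expat` binding with namespace processing left off, so the prefix stays a literal part of the attribute name, and collects every link in a shortened profile. The abbreviated document and the function name `collect_links` are assumptions for illustration, not code from this chapter.

```python
import xml.parsers.expat

# Shortened version of the player profile, for illustration only.
PROFILE = """<?xml version="1.0" encoding="UTF-8"?>
<player xmlns:xlink="http://www.w3.org/1999/xlink">
  <name>Jimmy Commender</name>
  <photo xlink:type="simple" xlink:href="hotcop.jpg" width="100" height="200"/>
  <favorite-url xlink:type="simple" xlink:href="http://www.bigfun.net">My favorite game site</favorite-url>
</player>"""


def collect_links(text):
    """Return (element name, href) pairs for every element carrying a link."""
    links = []
    # No namespace separator is passed, so expat does no namespace
    # processing: "xlink:href" arrives as an ordinary attribute name.
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        href = attrs.get("xlink:href")
        if href is not None:
            links.append((name, href))

    parser.StartElementHandler = start
    parser.Parse(text, True)
    return links


for element, href in collect_links(PROFILE):
    print(element, href)
```

The links survive intact even though the parser knows nothing about XLINK, which is the property the note relies on.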
Relevance Decision

Let us limit the ambition of our PHP parsing: we want simply to match the capabilities of the ActionScript parser. Our little sigh of regret for the skipped features is overwhelmed by a big sigh of relief for the skipped work. We need to handle only element tags and text. If we ignore the exotica, we can concentrate on doing the elements correctly: on managing their context, their contents, and their attribute bundles as we build a document tree.
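As a rough picture of where this decision leads, the sketch below builds a document tree from nothing but the start-element, end-element, and character-data events. It is a Python illustration over the same expat library, not the PHP implementation itself. The node layout (a dictionary with `name`, `attrs`, `text`, and `children` keys) and the function name `build_tree` are assumptions chosen for brevity.

```python
import xml.parsers.expat


def build_tree(text):
    """Build a nested-dictionary tree from element and text events only."""
    root = None
    stack = []  # currently open elements, innermost last: the context
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # merge adjacent text into one callback

    def start(name, attrs):
        nonlocal root
        node = {"name": name, "attrs": dict(attrs), "text": "", "children": []}
        if stack:
            stack[-1]["children"].append(node)  # attach to the open parent
        else:
            root = node  # first element seen is the document root
        stack.append(node)

    def end(name):
        stack.pop()  # this element is complete; its parent is current again

    def chars(data):
        if stack:
            stack[-1]["text"] += data  # contents belong to the innermost element

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return root


tree = build_tree('<player id="7"><name>Jimmy</name><score>42300</score></player>')
print(tree["name"], tree["attrs"])
for child in tree["children"]:
    print(" ", child["name"], child["text"])
```

The stack is the "context" mentioned above: at any moment it holds the chain of elements from the root down to the one being read, so each event knows where its data belongs.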