1.5 Beyond Basic XML-Other Standards | FileMaker Pro 6 Developers Guide to XML/XSL (Wordware Library for FileMaker)

1.5 Beyond Basic XML—Other Standards

So far we have studied well-formed and valid documents containing data and other elements. XML is a language that allows other standards to be built upon it. Included in the list of additions to the XML family is XSL (Extensible Stylesheet Language). You will read more about XSL and how it can be used to transform XML data into neatly formatted output in Chapter 7.

The World Wide Web Consortium has also recommended additional standards for interconnecting documents and addressing precise locations within XML documents. Among these other XML standards are XPointer and XPath, which extend XML. This section gives an overview of each of these and the URI (Uniform Resource Identifier) standard for identifying and locating resources used by XML documents. These recommendations have been grouped together here, as they often work together. However, they can also work independently.

Keep in mind that this section is a very basic overview to help you understand these additions to XML, parsing of XML with FileMaker Pro, and how these standards work with XML and FileMaker Pro. Remember, too, that the specifications and recommendations may change, although it is unlikely that these changes will affect the current technology. The changes may enhance the current specifications just as XPath and XPointer have added to the functionality of XML. You may consult the World Wide Web Consortium for the latest information, http://www.w3.org/.

1.51 URI, URL, and URN (The Uniform Resource Standards)

Uniform Resource Identifiers (URIs) encompass all references to web files: text, images, mailboxes, and other resources. URIs include URLs (Uniform Resource Locators): ftp, gopher, http, mailto, file, news, https, and telnet, common protocols for accessing information on the Internet. Some examples of these are found in Listing 1.18. Remember that the World Wide Web is only a part of the Internet. URIs may be used in XPaths and XPointers if they refer to an address on the Internet.

Another URI type is the URN (Uniform Resource Name). The URN has globally persistent significance; only the name of the resource need be known, not the location of it as in the URL. The Uniform Resource Name can be associated with Uniform Resource Characteristics (URC), which allows descriptive information to be associated with a URN. A URN can also have a URL. A more complete URL is found in Listing 1.17.

Listing 1.17: URL with more information

 <link href="http://anyserver/documents/myPaper.txt">
       <author>Me!</author>
       <date>03 JAN 1999</date>
       <revised>05 FEB 1999</revised>
       <title>My Important Paper</title>
 </link>

Uniform Resource Identifiers can be absolute or relative. Relative paths assume the current document location, and every link from there builds upon the path. A document can have a BASE path specified at the beginning of the document.
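To see how a relative path builds upon the current document location, here is a minimal sketch in Python using the standard urllib.parse module. The base address reuses the hypothetical server from Listing 1.17, and the relative file names are invented for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical base location, in the style of Listing 1.17.
BASE = "http://anyserver/documents/myPaper.txt"

def resolve(relative_ref, base=BASE):
    """Resolve a relative reference against an absolute base URI."""
    return urljoin(base, relative_ref)

print(resolve("notes.txt"))        # a sibling of myPaper.txt
print(resolve("images/fig1.gif"))  # below the current folder
print(resolve("../index.html"))    # one level up
print(resolve("/top.html"))        # from the top of the server
```

Each relative reference is resolved against the folder that holds the base document, which is the behavior the BASE path controls.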

Warning

While the password may be included in a URI, it is not advisable, as it may be a security risk. The URI format is:

 protocol://user:password@host:port/path/document?query#fragment

Listing 1.18: Example URIs

 http://www.mydomain.com/mypage.html
 ftp://username:password@server.domin.org/
 file:///myDesktop/Documents/fmpxmllayout_dtd.txt
 urn:here://iris
 mailto:me@mydomain.com?subject=Inquiry%20About%20Your%20Site
 ftp://anonymous@server.domain.net:591/index/images/downloads/
 telnet://myServer.edu/
 http://myDomain.com/fmpro?-db=myDatabase&-lay=web&-format=-fmp_xml&-findall
 news:comp.databases.filemaker
 https://secureServer.net/thisLink.html#sectionThree
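The parts of the URI format can be pulled apart with a few lines of code. The following is only a sketch, using Python's standard urllib.parse module; the describe function and its dictionary keys are names chosen here to mirror the format shown above, not part of any FileMaker Pro feature.

```python
from urllib.parse import urlsplit

def describe(uri):
    """Split a URI into the parts named in the URI format."""
    parts = urlsplit(uri)
    return {
        "protocol": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,   # returned in lowercase
        "port": parts.port,       # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

for uri in (
    "ftp://anonymous@server.domain.net:591/index/images/downloads/",
    "http://myDomain.com/fmpro?-db=myDatabase&-lay=web&-format=-fmp_xml&-findall",
    "https://secureServer.net/thisLink.html#sectionThree",
):
    print(describe(uri))
```

Parts that are missing from a URI come back empty or as None, which makes it easy to check, for example, whether a password has been embedded in a link.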

Request for Comments (RFC) document number 2396 specifies the standards for Uniform Resource Identifiers. This document, "Uniform Resource Identifiers (URI): Generic Syntax", can be found at http://www.ietf.org/rfc/rfc2396.txt. Notable are the standards for naming these URIs, which you should read.

Suggestions for naming URIs include using the alphanumeric characters: a-z, A-Z, and 0-9. Any character not within these ranges can be escaped or translated to an octet sequence consisting of "%" and the hexadecimal representation of the character. This means that the space character is often encoded as "%20" in a URL so that it may pass safely as a valid URI. There are other characters used to format a URL that are reserved to specify the format of the URL. These are: ";", "/", ":", "#", "%", "@", "&", "=", "+", "$", and ",". There are also unreserved characters that may be used for specific purposes: "-", "_", ".", "!", "~", "'", "(", and ")". Characters listed as unwise to use include: "{", "}", "|", "\", "^", "[", "]", and "`". If you stick with the alphanumeric characters for your own naming standards, you are less likely to disrupt any usage for the URI itself.
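The escaping described above is easy to try out. This short sketch uses the quote and unquote functions from Python's standard urllib.parse module; the escape helper is a name introduced here for illustration.

```python
from urllib.parse import quote, unquote

def escape(text):
    """Percent-encode every character that is not a letter, a digit,
    or one of the few unreserved marks that quote() leaves alone."""
    return quote(text, safe="")

print(escape("My Important Paper.txt"))
print(escape("today & tomorrow"))
print(unquote("Inquiry%20About%20Your%20Site"))
```

The space becomes "%20" and the ampersand becomes "%26", each a "%" followed by the hexadecimal value of the character; unquote reverses the translation.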

Mailto Is a Special URL

Another document, "RFC 2368, The mailto URL scheme", http://www.ietf.org/rfc/rfc2368.txt, gives us more specifics for the mailto protocol. This particular URI is often used to send email and can easily be created from calculations in a FileMaker Pro field. The most basic form of this URI is mailto:yourEmail@yourDomain.com. It simply provides the protocol (mailto) and the Internet address. To send the same message to multiple people, you may list them all after the protocol as comma-separated values. An example mailto format is shown here:

 mailto:joe@hisDomain.com,betty@herDomain.net?body=This%20is%20a%20short%20message.

The body of the message can be included in a mailto URI, but since the URI cannot contain spaces (or other reserved characters), these are converted. The body attribute was never intended to include a very large message. Some email cannot be sent without a subject, so that also can be included in the URI. The subject must also be converted or encoded. The space character is %20. Additional attributes are separated with the "&", so if your subject or message body contains this character, encode it as "%26". The "from" is implied by the email application sending the message. The mailto protocol is often used on web pages as a hyperlink. You can use double or single quotes for the link, but do not include these within the URI.

Mailto as a link:

 <a href="mailto:Joe_Brown@eddress.org?subject=Call%20Me!&body=I&apos;ll%20be%20at%20home%20today%20%26%20tomorrow." >call me</a>

The link, as it appears in an email client:

 to: Joe_Brown@eddress.org
 from: me@myDomain.com
 subject: Call Me!
 I'll be at home today & tomorrow.
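The same link can be assembled and taken apart again in code. The following Python sketch is one possible approach, not the method used by any particular email client; build_mailto is a hypothetical helper name, and it encodes every character that could be mistaken for part of the URI format.

```python
from urllib.parse import quote, urlsplit, parse_qs

def build_mailto(recipients, subject="", body=""):
    """Build a mailto URI with an encoded subject and body."""
    uri = "mailto:" + ",".join(recipients)
    fields = []
    if subject:
        fields.append("subject=" + quote(subject, safe=""))
    if body:
        fields.append("body=" + quote(body, safe=""))
    if fields:
        uri += "?" + "&".join(fields)   # "&" separates the attributes
    return uri

link = build_mailto(
    ["Joe_Brown@eddress.org"],
    subject="Call Me!",
    body="I'll be at home today & tomorrow.",
)
print(link)

# Decoding the query shows what the email client would display.
decoded = parse_qs(urlsplit(link).query)
print(decoded["subject"][0])
print(decoded["body"][0])
```

Because the ampersand inside the body is encoded, it survives the round trip instead of being read as the start of another attribute.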

You can create this link by calculation and use the OpenURL script step in FileMaker Pro to "send" the message. It actually opens your email client if one is mapped as the default and pastes these fields into the proper location of the new email. In the process of pasting into the proper locations, any encoding is converted back. In reality, your email client may be retaining these for sending and receiving, but you do not see them. The message must still be sent by you; it may only be placed in your "outbox" by FileMaker Pro. The Web Companion external function Web-ToHTTP is a convenient way to encode any characters in the subject or body that need it.

The calculation:

 SendMessage = "mailto:" & ToField & "?subject=" & External("Web-ToHTTP", subjectField) & "&body=" & External("Web-ToHTTP", bodyField)

The script step:

 OpenURL [ no dialog, SendMessage ]

FileMaker Pro Help will help you use the OpenURL script step correctly for each platform. If you use OpenURL to send email, it will use whatever your default email client is in the URL.DLL for Windows. On a Macintosh, the Internet Config settings will determine which email client will send the message. On Macintosh OS X, the Send Mail script step with mail.app is not supported in the first release of FileMaker Pro for OS X. Also, remember that some browsers do not process the mailto protocol properly. Several FileMaker Pro plug-ins may be used in conjunction with web-published databases for sending and receiving email.

1.52 XPath

XML Path Language (XPath), http://www.w3.org/TR/xpath, is a language for addressing parts of an XML document and is used by XPointer and XSLT (Extensible Stylesheet Language Transformations). XPath expressions often occur in attributes of elements of XML documents. XPath uses the tree-like structure of an XML document and acts upon the branches or nodes. The nodes are not merely the elements of the document, but also include the comments, processing instructions, attribute nodes, and text nodes. The human family tree has aunts, uncles, cousins, grandparents, sisters, brothers, parents, sons, and daughters. XPath uses similar designators for the branches of the XML tree. All of the branches of the tree (axes) are related to each other. We'll look again at the people.xml example, shown in Listing 1.19, to understand the XPath language.

Listing 1.19: people.xml

 <people>
       <vendor>
             <firstname>John</firstname>
             <company>Paper Cutters</company>
       </vendor>
       <customer>
             <firstname>Jane</firstname>
             <lastname>Doe</lastname>
       </customer>
       <customer>
             <firstname>John</firstname>
             <lastname>Doe</lastname>
       </customer>
 </people>

The child:: is a direct node from any location or the successor of a particular location source. The child node is also the default and can often be omitted from an XPath.

 <anyNode>
       <child>
       </child>
 </anyNode>

In the people.xml example, the children of people are vendor and customer. There are multiple customer children. There could also be multiple vendor children. The element firstname occurs as a child of vendor or customer; however, company is only a child of vendor. Because the child is the default node in the path, you can specify firstname with the XPath format as full or shortcut:

 people/vendor/firstname
 child::people/child::vendor/child::firstname
 child::people/child::customer/child::firstname
 people/customer/firstname
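The child paths can be tried against people.xml with a small script. This sketch uses Python's standard xml.etree.ElementTree module, which is an assumption made for illustration and not a tool used elsewhere in this book. ElementTree understands only a subset of the shortcut syntax (it does not accept the full child:: form), and its paths start from the element you search in, here people itself.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people>
  <vendor>
    <firstname>John</firstname>
    <company>Paper Cutters</company>
  </vendor>
  <customer>
    <firstname>Jane</firstname>
    <lastname>Doe</lastname>
  </customer>
  <customer>
    <firstname>John</firstname>
    <lastname>Doe</lastname>
  </customer>
</people>"""

people = ET.fromstring(PEOPLE_XML)   # the <people> element

def texts(path):
    """Return the text of every element matched by a child path."""
    return [node.text for node in people.findall(path)]

print([child.tag for child in people])  # the children of people
print(texts("vendor/firstname"))
print(texts("customer/firstname"))
print(texts("customer/company"))        # company is never a child of customer
```

The last path returns an empty list, which matches the observation that company is only a child of vendor.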

The descendant:: is a sub-part of a node and can be children, grand-children, or other offspring. The descendants of people are vendor, firstname, company, customer, and lastname. An example is shown here:

 <anyNode>
       <descendant1>
             <descendant3></descendant3>
       </descendant1>
       <descendant2 />
 </anyNode>
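As a rough check of the descendant list given above, the following Python sketch walks every node below an element. It relies on the standard xml.etree.ElementTree module; descendant_names is a helper name made up for this example.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people>
  <vendor><firstname>John</firstname><company>Paper Cutters</company></vendor>
  <customer><firstname>Jane</firstname><lastname>Doe</lastname></customer>
  <customer><firstname>John</firstname><lastname>Doe</lastname></customer>
</people>"""

people = ET.fromstring(PEOPLE_XML)

def descendant_names(element):
    """List each distinct element name found below the given element."""
    names = []
    for node in element.iter():      # iter() also yields the element itself
        if node is element:
            continue                 # skip self; we want descendants only
        if node.tag not in names:
            names.append(node.tag)
    return names

print(descendant_names(people))
print(descendant_names(people.find("vendor")))
```

Children and grandchildren are reported alike, since the descendant axis does not care how many levels down a node sits.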

The ancestor:: is the super-part of a node, so that the ancestor contains the node. If we use firstname from our example, it has the ancestors vendor, customer, and people. Not all firstname elements have a vendor or customer ancestor.

 <ancestor>
       <anyNode></anyNode>
 </ancestor>
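The ancestors of each firstname can be listed with a short script. This is a sketch only, written with Python's standard xml.etree.ElementTree module; because that module keeps no link from a child back to its parent, the example first builds a lookup table (parent_of, a name chosen here) and then climbs it.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people>
  <vendor><firstname>John</firstname><company>Paper Cutters</company></vendor>
  <customer><firstname>Jane</firstname><lastname>Doe</lastname></customer>
  <customer><firstname>John</firstname><lastname>Doe</lastname></customer>
</people>"""

people = ET.fromstring(PEOPLE_XML)

# Map every child element to the element that contains it.
parent_of = {child: parent for parent in people.iter() for child in parent}

def ancestor_names(element):
    """Climb from an element to the top, collecting ancestor names."""
    names = []
    while element in parent_of:
        element = parent_of[element]
        names.append(element.tag)
    return names

for first in people.iter("firstname"):
    print(first.text, ancestor_names(first))
```

Every firstname has people as an ancestor, but only one of them has vendor and the other two have customer.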

The attribute:: node is relative to the referenced node and can be selected with the name of the attribute.

 <node attribute="attrName" />

The namespace:: node contains the namespace. More about the namespace will be discussed in Chapter 7 with XSL.

The self:: node is the reference node and another way to specify where you already are, but it may be used in conjunction with ancestor or descendant (ancestor-or-self:: and descendant-or-self::).

XPath expressions (statements) have one or more location steps separated by a slash ("/"). The location steps have one of the above axis items, a node test, and an optional predicate. The node test is used to determine the principal node type. Node types are root, element, text, attribute, namespace, processing instruction, and comment. For the attribute axis, the principal node type is attribute, and for the namespace axis, the principal node type is namespace. For all others, the element is the principal node type. The predicate will filter a node-set with respect to the axis to produce a new node-set. This is the real power of XPath using the syntax shortcuts, functions, and string-values as the predicate to select fragments of an XML document.

Table 1.4: XPath shortcuts
*	Selects all matches. This is similar to the notation in UNIX for all, or the wildcard for zero or more characters in FileMaker Pro's find symbols. Searching people.xml for people/vendor/* selects the elements firstname and company. If you searched for */*/firstname, you would select every firstname element with two ancestors. In our example, this would select all matches for firstname. As long as every firstname sits at the same depth below the root, this extracts all of the firstnames in the document.
/	As the first character in an XPath statement, selects the root of the document. A quick way to navigate back to the root is to use the "/" shortcut. Navigating the XML document starts at this root point. If you happen to end up at vendor/company, for example, and wish to navigate to customer/lastname, you can quickly get back to the root of the document with /people/customer/lastname because people is the root element and customer is its child.
//	Selects all elements that match the criteria within and including the current node. This is equivalent to the descendant-or-self::node(). Using our people.xml example again, we can quickly select all firstname elements with //firstname. Regardless of the descendant level for this element, it is selected.
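@	Specifies an attribute and is equivalent to attribute::. For the example <element attribute="attrName" />, the attribute can be addressed as element/attribute::attribute or element/@attribute. Inside square brackets, element[@attribute] selects the element that carries the attribute rather than the attribute itself.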
.	Selects the context node and is equivalent to self::node(). As you address a particular location, it is convenient to include where you are rather than needing to use the full name of the element. For example, if you were at the element customer and wished to get the children of this element, you would use ./firstname and ./lastname. Since the child:: axis can be implied, "./firstname" is the same as "firstname."
..	Selects the parent of the context node and is equivalent to parent::node(). This is similar to the relative paths used in UNIX and in URIs to go up a directory, such as <img src="../images/mypic.gif">. If you are in the /people/customer/firstname element and want to reach vendor/firstname, go up two levels and back down with ../../vendor/firstname; ../firstname only leads back to the firstname of the same customer.
[]	Gives the position of the child in a family. child[1] is the first child. These square brackets are also used when a test of the value of the element is needed: parent[child="test"]. We have two children of people called customer. We can navigate to the second occurrence of this child with /people/customer[2].
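Several of the shortcuts in Table 1.4 can be tested with Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. Treat this as a sketch under some assumptions: the preferred="yes" attribute is added to one customer purely so that the @ shortcut has something to match, and all paths are written relative to the people element because that is how this module searches.

```python
import xml.etree.ElementTree as ET

# people.xml, with a hypothetical "preferred" attribute added for the @ test.
PEOPLE_XML = """<people>
  <vendor><firstname>John</firstname><company>Paper Cutters</company></vendor>
  <customer preferred="yes"><firstname>Jane</firstname><lastname>Doe</lastname></customer>
  <customer><firstname>John</firstname><lastname>Doe</lastname></customer>
</people>"""

people = ET.fromstring(PEOPLE_XML)

def select(path):
    """Return (name, text) for every element the path selects."""
    return [(node.tag, node.text) for node in people.findall(path)]

print(select("vendor/*"))                 # *  : every child of vendor
print(select("*/firstname"))              # *  : firstname one level down
print(select(".//firstname"))             # // : firstname at any depth
print(select("customer[2]/firstname"))    # [] : position in the family
print(select("customer[lastname='Doe']/firstname"))   # [] : value test
print(select("customer[@preferred]/firstname"))       # @  : has the attribute
print(select("vendor/firstname/.."))      # .. : back up to the parent
```

A full XPath processor accepts more than this subset, including the long axis names, so results from other tools may be written differently even when they select the same nodes.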

XPath String-Values

Each of the nodes has a string-value, which is what the xsl:value-of instruction returns. This is the key to getting the content of your XML document. This section explains each node's string-value.

The root() node string-value is the concatenation of the string-values of all text node descendants of the root node. If you want the text of the entire document, this will give it to you. Take note that white space will be ignored and you will lose the meaning of the individual elements. One possible benefit of using this value is to search an entire document for a particular value. In our people.xml example, the root is the outermost element, <people>…</people>. The value of the root() is all the text (contents) of all the elements in the document.

The element() node string-value is the concatenation of the string-values of all text node descendants of the element node. The element can have text and other elements, so all text of a particular element is returned here. The value of vendor is John Paper Cutters. The value of customer[1] is Jane Doe.

The attribute() node string-value is the value of the attribute of the parent element. However, the attribute is not a child of the element. If you had an element, <customer preferred="yes">… </customer>, the attribute preferred has the value "yes."

The namespace() node is like the attribute node, as an element can have a namespace. The string-value of the namespace node is the URI or other link specified in the namespace. Namespaces will be discussed more fully in Chapter 7.

The processing instruction() node has the local name of the processing instruction's target. The string-value of the processing instruction node is the part of the processing instruction following the target. A common processing instruction is for an XSL stylesheet. In <?xml-stylesheet href="headlines.xsl" type="text/xsl" ?>, the target is xml-stylesheet, and the string-value is the remainder, href="headlines.xsl" type="text/xsl".

The comment() node string-value is the content of the comment not including the surrounding markup (<!-- and -->). The comment <!-- here is a comment --> has a string-value of "here is a comment."

The text() node contains the character data in the element that is the string-value of the text node. The value of /people/vendor/firstname/text() is the same as the value of /people/vendor/firstname, or John.
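The string-value of an element is simply its text pieces joined together, which the following Python sketch imitates with the standard xml.etree.ElementTree module. The string_value helper is a name invented here, and the result only approximates what an XSL processor would return, since white space handling differs between tools.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate the text of the element and all of its descendants."""
    return "".join(element.itertext())

compact = ET.fromstring(
    "<vendor><firstname>John</firstname><company>Paper Cutters</company></vendor>"
)
indented = ET.fromstring("""<vendor>
  <firstname>John</firstname>
  <company>Paper Cutters</company>
</vendor>""")

print(repr(string_value(compact)))                   # no space is added between elements
print(repr(string_value(indented)))                  # line breaks and indents are kept
print(" ".join(string_value(indented).split()))      # white space tidied for display
print(string_value(compact.find("firstname")))       # a single element's text
```

Nothing is inserted between the pieces, so any space you see between John and Paper Cutters comes from the white space between the elements in the document itself.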

XPath Functions

There are additional functions as a part of the XPath language. These can extract more precisely the particular text you need. FileMaker Pro has similar text functions such as Left(text, number) or MiddleWords(text, start, number). These additional XPath functions are not discussed here. The standards are changing, and these new functions may not be fully supported by all XML processors at this time. Your particular choice of XML parser may allow you to use the full set of functions. See Chapter 6 for some of these XPath functions.

XPointer Related to XPath

XML Pointer Language (XPointer) is another method of extracting the content of an XML document. Some applications use XPointer or a combination of XPointer and XPath to parse the XML data tree. The notation is different from XPath and uses the locators root(), child(), descendant(), and id().

root() is similar to XPath "/" or the entire document. The paths to the elements are based off the root() with a "." dot notation. For example, root().child().child() would be similar to "/parent/child."

id() is similar to root() but is a specific element's ID attribute. Because the ID of an element is unique for each element in an XML document, it does not matter what path the element is on. The XPointer request for "ID(890)" will jump right to that element and return the element and any of its descendants. Listing 1.20 is a small XML document used to explain the XML Pointer Language.

Listing 1.20: Example for XPointer references

 <elements>
       <element >xyz</element>
       <element  />
       <element >
             <element >1245</element>
       </element>
 </elements>

The child() node has some parameters that will narrow down which child. The first parameter is a number or "all." The number is the number of the child in the document. "root().child(1).child(3)" is the same as calling "ID(890)" because the third child of the first element of the entire document has the ID attribute of 890. The parameter of "all" will return all elements in a path. "root().child(1).child(all)" returns all elements except the first element.

 child(# or all, NodeName, AttributeName="")
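The two ways of reaching the same element, by position and by ID, can be imitated in a few lines of Python. This is not an XPointer processor, only a sketch with the standard xml.etree.ElementTree module. The document below follows the shape of Listing 1.20, but apart from 890, which the text mentions, every ID value is a placeholder invented for this example, and child and by_id are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Same shape as Listing 1.20; IDs other than 890 are invented placeholders.
DOC = """<elements>
  <element ID="100">xyz</element>
  <element ID="200" />
  <element ID="890">
    <element ID="900">1245</element>
  </element>
</elements>"""

first = ET.fromstring(DOC)   # stands in for root().child(1): <elements>

def child(element, number):
    """Return the Nth child element, counting from 1 as child(N) does."""
    return list(element)[number - 1]

def by_id(element, value):
    """Search the element and its descendants for a matching ID attribute."""
    for node in element.iter():
        if node.get("ID") == value:
            return node
    return None

third = child(first, 3)                  # like root().child(1).child(3)
print(third is by_id(first, "890"))      # both routes reach the same element
print([node.text for node in third])     # its descendants come along with it
```

The positional route depends on the layout of the document, while the ID route keeps working even if the element is moved to another branch.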

The descendant() node is similar to the child() node, except it can be anywhere as a reference to any element's descendants.

You can read more about XPointer at http://www.w3.org/TR/xptr. This book does not use this language in any of the examples.