Chapter 5. XPath Functions

CONTENTS
  •  5.1 XPath Function Library
  •  5.2 The Node-set Core Function Group
  •  5.3 String Core Function Group
  •  5.4 Boolean Core Function Group
  •  5.5 Number Core Function Group
  • XPath Functions

  • boolean(), ceiling(), concat(), contains(), count(), false(), floor(), id(), lang(), last(), local-name(), name(), namespace-uri(), normalize-space(), not(), number(), position(), round(), starts-with(), string(), string-length(), substring(), substring-after(), substring-before(), sum(), translate(), true()

In Chapter 4 the principles underlying XPath were introduced, but XPath cannot deliver its full value without functions. Functions provide a "programming" aspect to XPath. Where XPath location paths and patterns are navigational tools, functions are like little programs: they do something. For example, the XSLT document() function "goes and gets" another XML instance, similar to an import macro in a word processor.

In the XPath specification, functions are declared with a function prototype, which contains a function name, a function return type, and an argument set. Actually using functions is quite simple, though. Since the processing software maintains a library of functions, you only need a function call to invoke them.

This chapter will describe the functions in the XPath function library,[1] which are divided into groups based on the four object types: node-set, string, Boolean, and number. We will begin by introducing the specific principles and terminology for understanding and explaining functions.

Note that most of the examples in this chapter include the basic XPath expression, followed by an XSLT example that can be used directly in a stylesheet to get a result. XPath expressions do not of themselves return a value in a stylesheet; they must be used within the context of an XSLT attribute, such as match or select. The XSLT element <xsl:value-of> is a good test of an XPath expression and is used in most cases to demonstrate extracting the value of the expression.

5.1 XPath Function Library

The function library for XPath includes 27 functions, called Core Functions, which are divided into four subcategories: node-set, Boolean, string, and number. The four kinds of functions within the function library are categorized not only by the kind of object they yield, but also by the kind of object on which they act. The object on which a function acts is called an argument.

In the following section, each function will be discussed individually within the core function group subcategory based upon either the kind of object yielded or the kind received (i.e., "acted upon") as an argument for that function.

Functions will sometimes yield a different kind of object than the name of their core function group suggests. Likewise, just because a function such as substring() is included in the string core function group does not mean that it cannot accept number arguments within its parentheses. The following function table (Table 5-1) can be used to determine both the kind of object acted upon and the kind of object yielded for a given function. The reference number in the last column provides a quick lookup for the section of this chapter specifically applicable to that function.

Table 5-1. XPath functions
Function Name       Core Function Group  Returns   Arguments               Argument Type                 Ref
boolean()           Boolean              Boolean   Object                  required                      5.4.1
ceiling()           Number               Number    Number                  required                      5.5.4
concat()            String               String    String, String, String  required, required, optional  5.3.3
contains()          String               Boolean   String, String          required, required            5.3.9
count()             Node-set             Number    Node-set                required                      5.2.7
false()             Boolean              Boolean   None                    none                          5.4.3
floor()             Number               Number    Number                  required                      5.5.5
id()                Node-set             Node-set  Object                  required                      5.2.1
lang()              Boolean              Boolean   String                  required                      5.4.5
last()              Node-set             Number    None                    none                          5.2.5
local-name()        Node-set             String    Node-set                optional                      5.2.2
name()              Node-set             String    Node-set                optional                      5.2.3
namespace-uri()     Node-set             String    Node-set                optional                      5.2.4
normalize-space()   String               String    String                  optional                      5.3.7
not()               Boolean              Boolean   Boolean                 required                      5.4.6
number()            Number               Number    Object                  optional                      5.5.1
position()          Node-set             Number    None                    none                          5.2.6
round()             Number               Number    Number                  required                      5.5.6
starts-with()       String               Boolean   String, String          required, required            5.3.10
string()            String               String    Object                  optional                      5.3.1
string-length()     String               Number    String                  optional                      5.3.11
substring()         String               String    String, Number, Number  required, required, optional  5.3.4
substring-after()   String               String    String, String          required, required            5.3.5
substring-before()  String               String    String, String          required, required            5.3.6
sum()               Number               Number    Node-set                required                      5.5.3
translate()         String               String    String, String, String  required, required, required  5.3.8
true()              Boolean              Boolean   None                    none                          5.4.4

5.1.1 XPath Function Library Terminology

There are two general categories of functions: core functions and extension functions. In this chapter, we will work with the XPath core functions, which every XSLT processor conforming to the XPath specification is required to implement. Extension functions are functions that a particular processor implementation of XSLT and XPath adds to the core set. Extension functions are discussed in Chapters 12 and 13.

Before presenting the functions themselves, we will discuss the terminology for the parts that make up a function.

5.1.1.1 Function Prototype

The XPath specification presents each function in a structure called a function prototype, which specifies the function return type (the specification's way of saying what kind of object, whether node-set, string, Boolean, or number, it returns), the name of the function, and the type of the arguments (if any) that are either required or optional for the function. The function prototype is not used directly in an expression; it is only the means by which functions are declared. It is possible for functions other than those in the core function library to be declared, but they must follow the same function prototype. XPath functions are declared in the specification, using the function prototype, in the following structure:

Function: return type name(arguments) 

For example:

Function: number sum(node-set) 

In this example, the key word Function: states that a function is being declared; number is the function return type, sum is the function name, and (node-set) is the kind of argument that the function will accept.

5.1.1.2 Function Return Types

There are four possible function return types, corresponding to the four possible objects that an XPath expression can yield. A function return type is the category of object that a given function yields, or returns, when evaluated. Evaluation is what happens when the XSLT stylesheet containing the XPath expression is run through a conforming XSLT processor. It is consistent with the expected mathematical meaning of "evaluate/evaluation."

The node-set function return type describes those functions which return node-sets; the string function return type describes those which return strings; the Boolean function return type describes those which return a Boolean value (either true or false); and the number function return type describes those which return a number.

evaluate: To work out the value of, to "reckon up," ascertain the amount of, to express in terms of something already known.

evaluation: The action of appraising statement of value determining estimating. (OED, 2000, V: 447)

Function return type values may vary from function to function and are not necessarily tied to the type of function. It is the kind of object that is returned which determines the function return type for any given function. The function return type does not always determine the core function group to which the function belongs.

5.1.1.3 Function Names

The function name is the name of the function as it is declared in the function prototype, and as it actually occurs in the XPath expression where the function is used. For example, in the expression predicate [position() = 1], position is the function name. It only becomes a function (a part of an expression that does something) when the argument space, (), is added to it: position().

5.1.1.4 Function Arguments

Function arguments[2] are the objects in the function space that are evaluated by the processor to return a value. Whether implicitly or explicitly stated, the argument (the object being evaluated) is simply what will be acted upon by the function.

argument: A quantity from which another required quantity may be deduced, on which [the] calculation depends. (OED, 2000, I: 625)

Explicit function arguments are not necessarily required for all functions. For instance, position() maintains the argument space, the (), but nothing is ever placed within the parentheses. All functions will have this syntactic argument space represented by the parentheses, regardless of whether the function uses arguments. When an optional argument is not explicitly stated, the object evaluated by the processor is the current node; functions that take no arguments at all, such as position() and last(), draw on the evaluation context instead.

The argument is not limited to a numerical value; it can be any of the possible object types. For example, with the node-set function count(), which has a function return type of number, the argument is that of a node-set. In other words, the argument for count() simply specifies that the type of object it is expecting is a node-set. The value of that node-set will be expressed as a number: the number of nodes. In this way, a function can effectively "convert" an object of one type to another.

A function can have optional arguments, denoted by a ? in the function's prototype, or optional and repeatable arguments, denoted by a *. When neither a ? nor a * follows the declared object type for an argument, that argument is required.

Failure to provide required arguments causes the function to fail. Further evaluation of the given XPath expression containing the function, and the XSLT template within which the expression occurs, will also fail or return an empty result. Syntactically, when more than one argument is furnished, the arguments must be separated by commas; a space after each comma is optional but aids readability. If strings of text are used as arguments, they must be enclosed in quote marks with the required comma separator outside those marks, as in the following example:

[function name]("some text", */my_element,@my_attribute) 

In addition to the four basic argument types, some function prototypes specify a value of object for the argument type. This allows for more than one of the four kinds of XPath object types to be used as an argument. With arguments of type object, the argument provided can be of any type.

Note

The argument of type object is used only in the four basic conversion functions, string(), boolean(), id(), and number(), which are used mainly to convert objects from one type to another.

The types and values of arguments can be prescribed, or expected in a certain order. For example, the function substring() takes arguments in the form of (string, number, number?). The third number argument is optional, as denoted by the ?; thus it can be omitted. These three arguments specify not only the order, but also the argument types that are permitted.
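As a minimal sketch of how the ? and * denotations play out in a function call, the following stylesheet calls substring() with and without its optional third argument, and concat() with more than its two required arguments. The literal strings are illustrative assumptions rather than values drawn from a source document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- substring(string, number, number?): all three arguments supplied -->
    <xsl:value-of select="substring('September', 1, 3)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- optional third argument omitted: the result runs to the end of the string -->
    <xsl:value-of select="substring('September', 4)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- concat(string, string, string*): the third argument may repeat -->
    <xsl:value-of select="concat('March', ', ', 'April', ', ', 'May')"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any well-formed input, this should write three lines: Sep, then tember, then March, April, May.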

5.1.1.5 Function Calls

Functions are declared using the function prototype, but they are used with a function call. The function call is the name of the function, followed by the argument space, indicated by the (). For example, in the expression [position() = 1], position() is the function call. The processor evaluates the function call by looking up the function name in the function library and evaluating each of the arguments in the argument space. Each argument is converted to the type the function requires (indicated by the argument types in the function prototype), the function is called with the converted arguments, and its result is passed back to the expression containing the call. XPath functions can be used wherever XPath expressions are appropriate in XSLT templates.
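The conversion of arguments during a function call can be seen in a small sketch. Each call below is handed an argument of a different type from the one its prototype declares; the literal values are assumptions chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- string-length() expects a string: the number 2001 becomes '2001' first -->
    <xsl:value-of select="string-length(2001)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- floor() expects a number: the string '3.7' becomes the number 3.7 first -->
    <xsl:value-of select="floor('3.7')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- not() expects a Boolean: the empty string converts to false -->
    <xsl:value-of select="not('')"/>
  </xsl:template>
</xsl:stylesheet>
```

The three lines written should be 4, 3, and true.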

5.1.1.6 Core Function Groups

A core function group corresponds to one of the four groupings of XPath functions, node-set, Boolean, string, or number. There are seven functions in the node-set group, ten in the string group, and five each in the Boolean and number groups.

  1. Node-set Functions - contain those functions that perform conversions involving node-set objects, always converting from or to a node-set.

    1. from a node-set, which is converted to another object type, such as a number, as with the count() function

    2. to a node-set from a string or other type, as with some arguments allowed in the id() function

  2. String Functions - contain those functions that perform conversions involving strings, always converting from or to a string.

    1. from a set of strings, which are then converted to something else, such as a Boolean, as with the starts-with() function

    2. to a string from a number, node-set, Boolean, or other object, as with the string() function

  3. Boolean Functions - those functions that perform conversions involving Booleans, always converting from or to a Boolean.

    1. from a Boolean to a Boolean, as with the not() function.

    2. to a Boolean from a node-set, string, number, or other object, as with the boolean() function itself

  4. Number Functions - those functions that perform conversions involving numbers, always converting from or to a number.

    1. from a number to a number, as with the floor(), ceiling(), and round() functions

    2. to a number from a node-set, string, Boolean, or other object, as with the number() function itself

Note

The function return type for all five functions in the Boolean core function group is always a Boolean. In other words, Boolean functions only return Booleans, i.e., only return true or false.

Note

Like the Boolean core function group, the function return type for all five functions in the number core function group is always a number. In other words, number functions only return numbers.

Boolean and number functions are restricted in their function return types; they only return Boolean or number values respectively. By contrast, the node-set and string core function groups contain functions whose function return type can be something other than the respective object type designated by the core group. Some string functions can return objects other than strings; some node-set functions can return objects other than node-sets.

Another characteristic of each of the core function groups is that each one contains a function which, in effect, converts any given object to the respective type of the function group. For instance, the string() function converts objects to a string, number() converts objects to a number, and boolean() converts objects to a Boolean. The node-set core function group, unlike the other three function groups, does not contain a function of the same name as its primary core function type. In other words, there is no "node-set()" function. The only function that returns a node-set is id().

5.2 The Node-set Core Function Group

There are seven node-set functions. Each of these operates upon the current node-set at the given stage in the evaluation of the expression in which the function is called. Our <year> example is duplicated for convenience in Example 5-1.

Consider the expression count(//harvest//month). In this expression, harvest is the ancestor node from which descendant month elements are to be counted. Notice how we have used a pattern expression as the argument to the count() function. This is because count() is a node-set function, expecting a node-set as an argument, and pattern expressions return node-sets. The node-set acted upon in this example is the set of <month> elements descended from <harvest>. Because it acts upon a set of nodes, count() is considered a node-set function. What is returned, however, will be a number. The resulting number in this case is the number of <month> nodes that are descended from <harvest>, which is 6.

Example 5-1 Example of our <year>.
<?xml version="1.0"?>
<year>
      <planting>
            <season period="spring">
                  <month>March</month>
                  <month>April</month>
                  <month>May</month>
            </season>
            <season period="summer">
                  <month>June</month>
                  <month>July</month>
                  <month>August</month>
            </season>
      </planting>
      <harvest>
            <season period="fall">
                  <month>September</month>
                  <month>October</month>
                  <month>November</month>
            </season>
            <season period="winter">
                  <month>December</month>
                  <month>January</month>
                  <month>February</month>
            </season>
      </harvest>
</year>
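As a minimal sketch, the count() expression discussed above can be placed in a complete stylesheet and run against Example 5-1. The second <xsl:value-of> is an added illustration that counts every <month> in the document for comparison.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- node-set in, number out: months descended from <harvest> -->
    <xsl:value-of select="count(//harvest//month)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- every <month> in the document, whatever its ancestor -->
    <xsl:value-of select="count(//month)"/>
  </xsl:template>
</xsl:stylesheet>
```

With Example 5-1 as input, the output should be 6 on the first line and 12 on the second.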

The node-set core function group, unlike the other three function groups, does not contain a function of the same name as its primary core function type. In other words, there is no "node-set()" function. If considered for a moment, this makes sense. A node-set has an implicit structure, each component of which can be of seven possible types, each with a specific syntax and hierarchy. It is not within the scope of a standard such as XSLT or XPath to have a single function that can render a node-set from, for instance, a string or a number. There are far too many subjective options in such a process that preclude making such a function possible or even viable.

There is, however, one function whose function return type is node-set. The id() function can accept most kinds of objects as an argument and returns a node-set. In spite of the length of its explanation, which concerns how the ID itself is identified, this is a relatively simple function. It has a few limits for use with XSLT processors, however, because it requires the processor to reference the ID attribute declaration in the DTD.

Presented next are three functions that return strings, all of which have optional node-set arguments: local-name(), name(), and namespace-uri(). These three functions work together to access the various parts of namespaced nodes: the part of the name that comes after the colon (the local name), the entire qualified name (the QName, prefix and local name together), and the actual identifier, or Uniform Resource Identifier (URI), respectively. In each case, then, the name or URI returned is of the return type string.

Following those are three functions that return numbers; two with no arguments and a third, which requires a node-set (last(), position(), and count()). These functions are not unlike "inventory" functions for determining quantitatively the numerical values represented in each function name. Thus, the last() function gives the count value of the final node in the node-set, which is, accordingly, the total number of nodes in the node-set being evaluated by the expression. Slightly different is the count() function, which totals all nodes in the node-set referenced by the argument. The position() function simply gives the numerical count of where the node is with respect to its location within the node-set, in document order, based on the expression's evaluation context. No functions in the node-set core function group return or operate on Booleans.
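A brief sketch of the two argument-free "inventory" functions, assuming Example 5-1 as the input document: inside <xsl:for-each>, position() reports where the current node sits in the selected node-set and last() reports the size of that node-set.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- the selected node-set is the six <month> elements under <harvest> -->
    <xsl:for-each select="//harvest//month">
      <xsl:value-of select="position()"/>
      <xsl:text> of </xsl:text>
      <xsl:value-of select="last()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The output should run from "1 of 6: September" through "6 of 6: February". Here last() and count(//harvest//month) give the same number, because the node-set being iterated is the same one that count() would be handed.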

5.2.1 The id() Function

The id() function will select an element based on the unique ID of the element. The id() function operates mainly on node-sets, but will also accept any other object, which it converts to a string prior to processing. The id() function is the only node-set core function that returns a node-set. Therefore, its function return type is that of node-set, and it has a required argument of object, as shown in the following function prototype.

Function: node-set id(object)
Function Name   Core Function Group   Returns    Arguments   Argument Type
id()            Node-set              Node-set   Object      required

The id() function specifically looks for an element with an attribute of type ID in the given node-set. IDs are a particular type of attribute in XML, which must be declared as such in the DTD. Thus the id() function can only be used with valid XML documents, that is, those conforming to a DTD.

While many XSLT processors have been designed to read attribute definitions and therefore recognize ID attribute content model types, they are not required to do so. If you use id() with a processor that does not read attribute definitions, the id() function will always return an empty result, or no object whatsoever. Note that the specification for XSLT does not require parsing of a document in relation to a DTD, so this should not be considered a fault or bug of the processing software.

Note

You are encouraged to review the XSLT key() function, presented in Chapter 11, as in many cases it can be more advantageous for processing efficiency and wider applicability than the XPath id() function.

To use the id() function, you are assumed to know the ID structure of the source XML instance. For example, consider the following function call:

id('n13-9-63') 

This function call will return the single element node with an attribute of type ID that is precisely equal to n13-9-63. It does not matter, in this case, what the element is; the only thing that will be selected for matching is an attribute with a value of n13-9-63. The result will be the node-set of the one node that corresponds to the element with that attribute having that value.

Notice in the example that the object type of the argument for the id() function in this case is a string, denoted by the quotes. If the object type of the argument is anything other than a string, the object is converted to a string prior to processing the id() function. The process essentially takes an object of a given kind (node-set or non-node-set) and turns it into a string in order for the processor to find the ID, as follows.

  1. If the id() function's argument is not a node-set, the argument is converted to a string according to the string() function rules (see Section 5.3.1).

  2. If the id() function's argument type is a node-set, the same rules as above apply for producing a string; however, each node in the node-set is processed. Each node in the node-set is converted to a string according to the string() rules, based on the type of the node.

The resulting string is treated as a whitespace-separated list of tokens. Each token is compared against the ID values in the document, and every element whose ID matches one of the tokens is included in the resulting node-set.

If an ID is not found, it is not an error, but the function will produce an empty node-set after being evaluated. For instance, in the example above with id('n13-9-63'), if there is no such ID, then the result node-set is empty.

In order to use the id() function properly, the following items must be provided:

  1. An XML source instance with an attribute whose content model type is specified as, and conforms to, type ID

  2. A DTD that defines the attributes as the declared attribute content model type ID

  3. An XSLT processor that is, at the very least, equipped to recognize attribute declarations

  4. A knowledge of the possible IDs and their syntax in the source document such that you have a chance of targeting a match in the id() function call.
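The requirements above can be sketched with a small source document and stylesheet. Everything in the source document is hypothetical (the <notes> and <note> element names, the ref attribute, and the text content); only the ID value n13-9-63 is borrowed from the earlier function call. The DTD is given as an internal subset so that the ID declaration travels with the document.

```xml
<?xml version="1.0"?>
<!DOCTYPE notes [
  <!ELEMENT notes (note+)>
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note ref ID #REQUIRED>
]>
<notes>
  <note ref="n13-9-63">First frost recorded.</note>
  <note ref="n13-9-64">Harvest complete.</note>
</notes>
```

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- one token: selects the element whose ID is n13-9-63 -->
    <xsl:value-of select="id('n13-9-63')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- two whitespace-separated tokens: both matching elements are selected -->
    <xsl:value-of select="count(id('n13-9-63 n13-9-64'))"/>
    <xsl:text>&#10;</xsl:text>
    <!-- no such ID: not an error, just an empty node-set -->
    <xsl:value-of select="count(id('no-such-id'))"/>
  </xsl:template>
</xsl:stylesheet>
```

Assuming the processor reads the attribute declarations, the output should be "First frost recorded.", then 2, then 0. A processor that ignores the DTD would be expected to return empty node-sets for all three calls.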

5.2.2 The local-name() Function

The local-name() function returns the local part of the expanded name of the first node in the node-set. The function has a string as its function return type, that is, its result object is a string. It has one optional argument, which is a node-set, as shown in the following function prototype. If no argument is supplied, then the current node will be used. This function only operates on node-sets.

Function: string local-name(node-set?)
Function Name   Core Function Group   Returns   Arguments   Argument Type
local-name()    Node-set              String    Node-set    optional

In practice, this means that when you have a node (for example, an attribute or element) with an expanded namespace, the local-name() function returns the portion of the name that follows the colon (:). For example, if iowa:harvest is the name, the word harvest is the result of evaluating the function.

This function is particularly useful if you have a source node-set that has a lot of namespaced nodes (nodes with qualified names, or QNames) and the intended output will only have element-type names from a single namespace. In some cases, it might be preferred that the output data be simplified by removing the namespace prefix (in this case, the iowa:). This function can be used to extract only the local name (in this case, harvest), in effect removing the namespace prefix, or the portion preceding and including the colon.

The local-name() function returns the name of only the first node, in document order, of the node-set supplied as its argument. In other words, you could supply an argument that selects more than one node and still get a single name in return:

local-name(//*[@period="winter"]/.. | //*[@period="spring"]/..)

Here, you are using predicates, with [], to identify any element (*) that has a period attribute (@period) with a value of winter or of spring, and then stepping to the parent (..) of each. In the Iowa-specific <year> of Example 5-2, this matches <iowa:season period="winter">, the parent of which is <iowa:harvest>, and (using the | operator) <iowa:season period="spring">, the parent of which is <iowa:planting>. The union therefore contains two nodes, and the first in document order is <iowa:planting>. The return is planting, as that is the local-name() of the first node selected by the expression, in document order. You could not get more than one name without using some of the other XSLT elements.
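The prefix-stripping use of local-name() described earlier in this section can be sketched as a stylesheet that rebuilds each element under its local name only. This is one possible approach, assuming the desired output places every element in no namespace; attributes are copied unchanged, so a namespaced attribute would keep its namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Rebuild every element under its local name, in no namespace -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <!-- carry the attributes across as they are -->
      <xsl:copy-of select="@*"/>
      <!-- recurse into child elements; text is copied by the built-in rule -->
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a document such as the Iowa-specific <year> of Example 5-2, an element like <iowa:harvest> would be written out as <harvest>. Because the template is applied to each element in turn, this is also one way to obtain more than one local name from a single pass.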

If there is no colon in the matched node's name string, then the entire name is taken to be the local name, and the entire node's name is returned. For example, there might be a source document with a <figure> element in the default namespace and a <math:figure> element in an additional namespace, such as one for MathML. If you want your XPath expression to act upon any element with the element-type name figure, regardless of its namespace, you would use local-name() as part of an equality test. For instance:

//*[local-name()='figure'] 

or

<xsl:value-of select="//*[local-name()='figure']" /> 

In this expression, which has a node test of any element (represented by the * abbreviation) at any level in the node-tree (represented by the //), using the function for the local name of figure returns figure elements of any type, whether or not the element has a namespace declared.

A few additional considerations are worth noting. While commonly used to select the local-name of an element or attribute node, elements and attributes are not the only possible nodes that can be supplied as the node-set argument to local-name(). In addition to attribute and element nodes, there are comment, namespace, processing-instructions (PIs), root (or document root) nodes, and text nodes. Each, in turn, is treated differently when supplied as the argument to local-name().

If the argument to the local-name() function is the root, there can be no distinction allowing for a local-name. The root is simply the document root; it does not have a name or namespace, so the result is an empty string. Text and comment nodes also return an empty string.

If the node supplied in the argument is a namespace node, then the prefix is returned. If it is the default namespace for the input XML document or data, an empty string is returned. If you think about it, this makes more sense than the technical description perhaps suggests: the name that is local to a namespace node is the namespace prefix itself, in other words, that which comes before the colon.

If the argument is a processing-instruction, the "target" of the processing-instruction, which says what kind of software is supposed to be called by the PI, is returned (see Section 6.7.2 for more information on PIs). For example, in <?Pub Caret?>, the Pub is a signal to a specific publishing software (Arbortext), and Caret is the instruction (Caret is the location last edited by an author in Arbortext). Using local-name() on this PI would return Pub.

Table 5-2 indicates the various objects returned from the local-name() function, in the form of a string, when applied to different types of nodes.

Table 5-2. Summary of object types returned by the local-name() function
Node Type                Object Returned by local-name()
Attribute                Local part of the attribute name
Comment                  Empty
Element                  Local part of the element name
Namespace                Namespace prefix - if defined
Processing-instruction   Target
Root                     Empty
Text                     Empty

The local-name() function provides the user of XPath with the ability to access the part of a node name after the colon when there is an expanded namespace. This function is part of a trio of functions that includes the name() function and the namespace-uri().

The three functions can be used to access all the various parts of an expanded namespace. There are some qualifications to this general statement; however, it is useful to have this conceptual understanding of local-name() as a frame of reference before proceeding with the name() function.

5.2.3 The name() Function

The name() function returns the QName, or the entire expanded name of any namespaced node. QNames are composed of a namespace prefix, a separator (the :), and a local name, and the name() function returns the entire string.

This function returns a string, so string is also its function return type. The only argument it accepts, which is optional, is a node-set, as shown in the following function prototype. If no argument is supplied, the default node-set is the current node at that point in the XPath expression. The name() function operates only on node-sets.

Function: string name(node-set?)
Function Name   Core Function Group   Returns   Arguments   Argument Type
name()          Node-set              String    Node-set    optional

In the previous section we discussed getting figure elements, but in this case we are using the expanded namespace version, applying the name() function. The following expression will return the contents of the first node in the node-set of all elements (*) in any context (//), whose expanded name, or the QName, is math:figure.

//*[name()='math:figure'] 

or

<xsl:template match="/">
      <xsl:value-of select="//*[name()='math:figure']" />
</xsl:template>

The same thing would be the case with an attribute that has a namespace: the entire QName is returned. If the attribute was html:href to identify the HTML hypertext reference attribute, then name() applied to that attribute node would return html:href. The name() function is useful, then, as a means of analyzing what the various names of different nodes in a document might be, or for performing transformations that preserve the entire QName of a node.

Like the local-name() function, each of the seven kinds of nodes is treated differently when used as an argument to name() (see Table 5-3). For example, text, root, and comment nodes, when furnished as an argument to name(), return an empty string. Namespace nodes return the namespace prefix, that is, the text that comes before the colon. Since the namespace URI is null on all processing-instructions, the return value is the PI's "target," or the identity of the application being called by the PI, when furnished as the argument to name().

Table 5-3. Summary of object types returned by the name() function
Node Type                Object Returned by name()
Attribute                QName - name of the attribute
Comment                  Empty
Element                  QName - name of the element
Namespace                Namespace prefix - if defined
Processing-instruction   Target
Root                     Empty
Text                     Empty

5.2.4 The namespace-uri() Function

The namespace-uri() function, together with local-name() and name(), completes the trio of namespace functions that selectively accesses the various components of namespaced nodes. Where local-name() retrieved the portion of the name following the colon, and name() retrieved the entire name, namespace-uri() accesses the URI itself.

The return type of the namespace-uri() function is a string, as shown in the following function prototype. The string returned is the URI (if any) that was declared as the identifier for the namespace in the namespace declaration. This function accepts an optional node-set argument, and only operates on node-sets.

Function: string namespace-uri(node-set?)
Function Name    Core Function Group    Returns    Arguments    Argument Type
namespace-uri    Node-set               String     Node-set     optional

It is very important to understand that the namespace-uri() function does not return the namespace prefix, which is the part of a QName that comes before the colon. The result of this function is the value defined for the namespace. More specifically, the function returns a string equivalent to the attribute value of the namespace declaration that applies to the first node, in document order, of the node-set supplied in the argument.

Note

There is no specific function to access the prefix portion of an expanded namespace, but using creative combinations of these and other functions could provide the value.
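The split between the whole QName, the local part, the URI, and the prefix can be seen outside XSLT as well. The following is a minimal sketch in Python's standard xml.dom.minidom module, not an XPath evaluation: nodeName, localName, and namespaceURI are DOM property names that happen to correspond to name(), local-name(), and namespace-uri() for element nodes. The one-element document is a trimmed-down assumption based on the iowa namespace used in this section, and the prefix is recovered the way substring-before(name(), ':') would recover it.

```python
from xml.dom import minidom

# Trimmed, hypothetical document using the iowa namespace from this section.
DOC = """<year xmlns:iowa="http://www.iowa_climate.org/almanac/">
  <iowa:harvest/>
</year>"""

doc = minidom.parseString(DOC)
# getElementsByTagName matches on the full qualified name.
harvest = doc.getElementsByTagName("iowa:harvest")[0]

qname = harvest.nodeName        # counterpart of name(): the whole QName
local = harvest.localName       # counterpart of local-name()
uri = harvest.namespaceURI      # counterpart of namespace-uri()

# No single function returns the prefix; take the text before the colon,
# much as substring-before(name(), ':') would.
prefix = qname.split(":", 1)[0] if ":" in qname else ""

print(qname, local, uri, prefix)
```

For node types other than elements and attributes, the DOM names differ from what name() returns (a DOM text node reports #text, for instance, where name() gives an empty string), so the correspondence should not be pushed further than this.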

If no argument is supplied, then the context node at that point in the evaluation of the XPath expression is the node whose namespace URI is returned. If the node has no namespace URI (for example, an unprefixed element where no default namespace is in scope, or an unprefixed attribute), an empty string is returned.

The namespace-uri() function will only return a non-empty string for an element or attribute node. This makes sense because a text, comment, or PI node does not have a URI, and a namespace prefix does not, itself, have a URI. This may sound contradictory, but the prefix represents a URI; it does not have its own URI per se. If it did, this would in fact be redundant and would amount to a namespace for a namespace! Example 5-2 shows our <year> specific to Iowa.

Example 5-2 The <year> specific to Iowa.
<?xml version="1.0"?>
<year xmlns:iowa="http://www.iowa_climate.org/almanac/">
      <iowa:planting>
            <iowa:season period="spring">
                  <month>March</month>
                  <month>April</month>
                  <month>May</month>
            </iowa:season>
            <iowa:season period="summer">
                  <month>June</month>
                  <month>July</month>
                  <month>August</month>
            </iowa:season>
      </iowa:planting>
      <iowa:harvest>
            <iowa:season period="fall">
                  <month>September</month>
                  <month>October</month>
                  <month>November</month>
            </iowa:season>
            <iowa:season period="winter">
                  <month>December</month>
                  <month>January</month>
                  <month>February</month>
            </iowa:season>
      </iowa:harvest>
</year>

Since our example has declared the iowa namespace for the element <harvest>, the value returned by namespace-uri() would be the attribute value of the xmlns:iowa attribute, or http://www.iowa_climate.org/almanac/. However, you can't reference a namespaced node by its prefixed name in an XSLT stylesheet unless the stylesheet declares the same namespace itself, so without that declaration you have to do a little manipulating to get the value. For example, to match on a <harvest> element, use the following XSLT template rule:

<xsl:template match="//*[local-name() = 'harvest']">
      <xsl:value-of select="namespace-uri()"/>
</xsl:template>

Note

Be aware that even though we have used the http:// URL format for the URI, it is not necessary that it actually point to a specific place on the Web. The URI is intended primarily to provide a method for following up on the source of a particular set of element-type names or attribute names.

If we have more than one namespace, the node-set argument enables us to be more specific, as in Example 5-3.

The result is http://www.us_geological.service.gov/, the value of the xmlns:usgeo attribute declared on the <year> element, returned here because the matched <usgeo:planting> element belongs to the usgeo namespace.

If we change this example slightly, we can take advantage of the default document order processing, using wildcards to get the first child node of <year>, which is <usgeo:planting>:

namespace-uri(//year//*) 

In this case, http://www.us_geological.service.gov/ will still be the URI returned. The pattern expression //year//* includes all descendants of <year>, but of these, the first in document order is usgeo:planting, so its URI is the one returned. Notice that pattern expressions can be arguments to functions, provided the function accepts a node-set as its argument.
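The "first in document order" behavior can be checked with a small sketch in Python's standard xml.etree.ElementTree module. This is an illustration under assumptions rather than an XPath evaluation: ElementTree stores each element name as {uri}local, so the hypothetical helper namespace_uri() simply reads the URI out of that form, and the document is a shortened copy of the two-namespace input.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the two-namespace input (one month per season).
DOC = """<year xmlns:iowa="http://www.iowa_climate.org/almanac/"
      xmlns:usgeo="http://www.us_geological.service.gov/">
  <usgeo:planting>
    <iowa:season period="spring"><month>March</month></iowa:season>
  </usgeo:planting>
  <usgeo:harvest>
    <iowa:season period="fall"><month>September</month></iowa:season>
  </usgeo:harvest>
</year>"""

def namespace_uri(elem):
    """Hypothetical helper: read the URI from ElementTree's '{uri}local' tag."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

root = ET.fromstring(DOC)
# iter() yields the root first, then its descendants in document order.
descendants = list(root.iter())[1:]

# namespace-uri(//year//*) looks only at the first node of the node-set.
first_uri = namespace_uri(descendants[0])
all_uris = [namespace_uri(e) for e in descendants]

print(first_uri)
print(all_uris)
```

The list of all URIs shows what the single-value result hides: the <iowa:season> descendants carry the iowa URI, and the unprefixed <month> elements have no namespace URI at all.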

Example 5-3 Declaration of multiple namespaces.
INPUT:
<?xml version="1.0"?>
<year xmlns:iowa="http://www.iowa_climate.org/almanac/"
      xmlns:usgeo="http://www.us_geological.service.gov/">
      <usgeo:planting>
            <iowa:season period="spring">
                  <month>March</month>
                  <month>April</month>
                  <month>May</month>
            </iowa:season>
            <iowa:season period="summer">
                  <month>June</month>
                  <month>July</month>
                  <month>August</month>
            </iowa:season>
      </usgeo:planting>
      <usgeo:harvest>
            <iowa:season period="fall">
                  <month>September</month>
                  <month>October</month>
                  <month>November</month>
            </iowa:season>
            <iowa:season period="winter">
                  <month>December</month>
                  <month>January</month>
                  <month>February</month>
            </iowa:season>
      </usgeo:harvest>
</year>

TEMPLATE RULE:
<xsl:template match="//*[local-name() = 'planting']">
      <xsl:value-of select="namespace-uri()"/>
</xsl:template>

5.2.5 The last() Function

The last() function returns the total number of nodes in the context node-set. This number is also equivalent to the context size of the node-set, because the numerical value of the last node for the context node-set is equal to the total number of nodes in the node-set (not including descendants). The last() function is one of three node-counting functions in the node-set core function group, which return numbers. It does not accept an argument and operates only on node-sets, as shown in the following function prototype.

Function: number last()
Function Name    Core Function Group    Returns    Arguments    Argument Type
last()           Node-set               Number     None

The last() function returns a number but accepts no arguments, so the number it yields depends entirely on the context in which it is called. It is most often used inside a predicate or a comparison rather than on its own, although an expression such as <xsl:value-of select="last()"/> will output the context size directly. The number() function itself is described in Section 5.5.1.

We can, however, use this function to access the contents of the last node in a node-set:

//harvest//month[last()] 

or

<xsl:template match="/">
      <xsl:value-of select="//harvest//month[last()]" />
</xsl:template>

The last() function will give us the total number of months in the first season of the harvest period, which is 3, but the entire XSLT expression with <xsl:value-of> will actually return November, which is the contents of the last <month> in the first <season> element in the <harvest> element.

Recall from Chapter 4 that the context node is the node from which the current portion of an expression is being evaluated. Because the <harvest> element contains two <season> elements, each one is used as the context node for the evaluation of the month[last()] portion of the expression.
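The per-context behavior of [last()] can be reproduced with a short sketch in Python's standard xml.etree.ElementTree module, which supports a small subset of XPath including the [last()] predicate. The input is a hypothetical un-namespaced version of the harvest data, so that //harvest//month matches without any prefix handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical un-namespaced harvest data, assumed for illustration.
DOC = """<year><harvest>
  <season period="fall">
    <month>September</month><month>October</month><month>November</month>
  </season>
  <season period="winter">
    <month>December</month><month>January</month><month>February</month>
  </season>
</harvest></year>"""

root = ET.fromstring(DOC)

# [last()] is evaluated once per parent, so each <season> contributes
# its own last <month> to the result.
last_months = [m.text for m in root.findall(".//harvest//month[last()]")]

# <xsl:value-of> outputs only the first node of the selected node-set.
value_of_result = last_months[0]

print(last_months, value_of_result)
```

Two nodes are selected, one per <season>, and only the first of them in document order is what <xsl:value-of> would print.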

The last() function can also be used as a sort of "on/off" testing switch if you are performing a transformation on a number of nodes, but do not want to perform that transformation on the last node.

//harvest//month[position()!=last()] 

or

<xsl:template match="/">
      <xsl:value-of select="//harvest//month[position()!=last()]" />
</xsl:template>

When last() is combined with position(), another function in the node-set core function group, and the operator != (not equal to), the expression selects every <month> that is not the last in the set specified by the current context (assuming you are processing each <month> with some sort of looping or iterative mechanism, like the <xsl:for-each> element described in Chapter 9).
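As a rough cross-check of which nodes position()!=last() keeps, the following Python sketch walks each parent under <harvest> and drops the final <month> child. ElementTree's XPath subset has no != predicate, so the filtering is done by hand with a list slice; the un-namespaced input is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical un-namespaced harvest data, assumed for illustration.
DOC = """<year><harvest>
  <season period="fall">
    <month>September</month><month>October</month><month>November</month>
  </season>
  <season period="winter">
    <month>December</month><month>January</month><month>February</month>
  </season>
</harvest></year>"""

root = ET.fromstring(DOC)

not_last = []
for harvest in root.iter("harvest"):
    for parent in harvest.iter():
        months = parent.findall("month")   # <month> children of this parent
        # Keep every month whose position differs from the context size.
        not_last.extend(m.text for m in months[:-1])

print(not_last)
```

The last month of each season is excluded separately, because the context size is computed per parent, not across the whole document.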

Note

Predicates can change the size of the node-set by selecting or eliminating nodes, so the number returned by the last() function is the size of the current node-set at the stage of the XPath expression where the function is called. In other words, as the context changes in the course of evaluating successive predicates, so can the context size.

5.2.6 The position() Function

Like the last() function, the position() function returns a number, but since it doesn't accept arguments, the function is only really useful when used in combination with other functions. For example, this function can be used in an equivalence expression like position() = 1, or in XSLT elements like <xsl:value-of> that can pull out the value of the position().

This function has a return type of number and is part of a trio with last() and count(), which together enable a range of inventory and numerical sequence-based operations on node-sets. The position() function does not accept an argument, as shown in the following function prototype. The implicit argument is the current node, which is why this function is classified as a node-set function. It only operates on node-sets.

Function: number position()
Function Name    Core Function Group    Returns    Arguments    Argument Type
position()       Node-set               Number     None

Using the Markup City model, slightly revised in Example 5-4, we can navigate without knowing the street names. Note that it may be necessary to remove any extra space and tabs from this example because some processors consider whitespace objects as "children" and they are counted as text nodes.

If our local expert, at whose mercy we are when asking directions, doesn't know the street names (and we've all had that frustrating occurrence) but does know that the store we're seeking is in the fifth block, it's still possible to find it. With XPath, choosing the fifth block as we "drive" (or traverse, in proper XPath terminology) the <boulevard> is quite simple:

//boulevard/block[position() = 5] 

or

<xsl:template match="/">
      <xsl:value-of select="//boulevard/block[position() = 5]" />
</xsl:template>

This expression, using <xsl:value-of>, returns the text value of the fifth <block> element in the <boulevard>, or Old Chimney Road.

Example 5-4 Revised Markup City model.
<?xml version="1.0"?>
<main>
      <parkway>
            <thoroughfare>Governor Drive</thoroughfare>
            <thoroughfare name="Whitesburg Drive">
                 <sidestreet>Bob Wallace Avenue</sidestreet>
                 <sidestreet>Woodridge Street</sidestreet>
            </thoroughfare>
            <thoroughfare name="Bankhead">
                 <sidestreet>Tollgate Road</sidestreet>
                 <sidestreet>Oak Drive</sidestreet>
            </thoroughfare>
      </parkway>
      <boulevard>
            <block>Panorama Street</block>
            <block>Highland Plaza</block>
            <block>Hutchens Avenue</block>
            <block>Wildwood Drive</block>
            <block>Old Chimney Road</block>
            <block>Carrol Circle</block>
      </boulevard>
</main>

As demonstrated previously, position() is often used together with last(), either with the "not equal" operator (!=) to exclude the last node, or with equal (=) when the last node is to be used.

//boulevard/block[position() = last()]

or

<xsl:template match="/">
      <xsl:value-of select="//boulevard/block[position() = last()]" />
</xsl:template>

The result of this expression would be the content of the last <block> element in the <boulevard>, or Carrol Circle.
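Both selections can be tried quickly in Python's standard xml.etree.ElementTree module, whose limited XPath subset writes position() = 5 as the abbreviated predicate [5] and position() = last() as [last()]. This is a sketch for checking the results, not a substitute for an XSLT processor; the input is the <boulevard> portion of the Markup City model.

```python
import xml.etree.ElementTree as ET

# The <boulevard> portion of the Markup City model.
DOC = """<main><boulevard>
  <block>Panorama Street</block>
  <block>Highland Plaza</block>
  <block>Hutchens Avenue</block>
  <block>Wildwood Drive</block>
  <block>Old Chimney Road</block>
  <block>Carrol Circle</block>
</boulevard></main>"""

root = ET.fromstring(DOC)

# [5] is the abbreviated form of [position() = 5]; positions start at 1.
fifth_block = root.find(".//boulevard/block[5]").text

# [last()] corresponds to [position() = last()].
last_block = root.find(".//boulevard/block[last()]").text

print(fifth_block, "/", last_block)
```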

5.2.6.1 The position() Function with Regard to Context

Because nodes are counted in context, the numbering for position() resets at the context element. Context is not determined by the position() function itself, but by the XSLT element in which the position() function is used. Most XSLT elements get their context from the <xsl:template> element (see Chapter 3, Section 3.1.2 for more on the <xsl:template> element).

When matching beginning from the root, the context is the root, and position() starts at the first element with 1, and sequentially counts elements to the end, as shown in Example 5-5.

Example 5-5 Using position() in an iteration.
<xsl:template match="/">
      <xsl:for-each select="//*">
            <xsl:value-of select="position()"/>
      </xsl:for-each>
</xsl:template>

The result of this template using our Markup City from Example 5-4 is a count of all elements starting at 1 and ending at 16: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16.

Matching from the root, the context for the following template is the root, so the position of the sidestreets is counted sequentially without regard to direct parentage:

<xsl:template match="/">
      <xsl:for-each select="//sidestreet">
           <xsl:value-of select="position()"/>
      </xsl:for-each>
</xsl:template>

The result of this template is a sequential count of the sidestreets starting at 1, regardless of their parent: 1 2 3 4.

Changing the template to match on any sidestreet, the context is changed to the parent element of sidestreet (recall that even though the // is used to denote any sidestreet, the context of the sidestreet is still its parent node):

<xsl:template match="//sidestreet">
      <xsl:value-of select="position()"/>
</xsl:template>

The result of this template is the count of each sidestreet in context of its parent, or 1 2 1 2. The processor takes each element as the context element, starting from the root (//), and looks for sidestreets within that element, first resetting the position() count to 1. Note that if you have any whitespace in your input file, it may be counted as a child of <thoroughfare> and cause your resulting numbers to be off.
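The difference between the two numberings can be imitated in plain Python. The sketch below assumes the <parkway> portion of the Markup City model and uses enumerate() to stand in for position(): once over a single document-wide list of sidestreets, and once restarting inside each <thoroughfare>. It illustrates the counting only and does not model how an XSLT processor builds its current node list.

```python
import xml.etree.ElementTree as ET

# The <parkway> portion of the Markup City model.
DOC = """<main><parkway>
  <thoroughfare>Governor Drive</thoroughfare>
  <thoroughfare name="Whitesburg Drive">
    <sidestreet>Bob Wallace Avenue</sidestreet>
    <sidestreet>Woodridge Street</sidestreet>
  </thoroughfare>
  <thoroughfare name="Bankhead">
    <sidestreet>Tollgate Road</sidestreet>
    <sidestreet>Oak Drive</sidestreet>
  </thoroughfare>
</parkway></main>"""

root = ET.fromstring(DOC)

# One node list for the whole document: numbering never restarts.
document_wide = [pos for pos, _ in enumerate(root.iter("sidestreet"), start=1)]

# One node list per parent: numbering restarts at 1 in each <thoroughfare>.
per_parent = [
    pos
    for parent in root.iter("thoroughfare")
    for pos, _ in enumerate(parent.findall("sidestreet"), start=1)
]

print(document_wide, per_parent)
```

ElementTree keeps whitespace as text attached to elements rather than as separate child nodes, so the whitespace caveat for XSLT processors does not affect this sketch.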

5.2.7 The count() Function

The count() function simply counts the nodes (not including their descendants) in the node-set specified in the argument. This function is the third component of the node sequencing and numerical inventory trio of functions. Of the three (count(), last(), and position()), only count() has an argument, which is required and must be a node-set. The count() function operates on node-sets and returns a number, as shown in the following function prototype.

Function: number count(node-set)
Function Name    Core Function Group    Returns    Arguments    Argument Type
count()          Node-set               Number     Node-set     required

If we wanted to know how many blocks were in the entire Markup City, we would use count() as shown in Example 5-6.

Example 5-6 Using count() to count blocks.
count(//block)

or

<xsl:template match="/">
      <xsl:value-of select="count(//block)" />
</xsl:template>

This would give us the total number of blocks, in this case 6, regardless of their parents. Of course, in our current city, we have only <block>s along the <boulevard>. We could use the union operator | with the count() function to find the total number of either <block>s or <sidestreet>s.

count(//block | //sidestreet) 

or

<xsl:template match="/">
      <xsl:value-of select="count(//block | //sidestreet)" />
</xsl:template>

This expression would result in 10, the total of 6 <block>s and 4 <sidestreet>s. Working with more complex expressions within the argument, we could add a predicate that limits which <sidestreet>s we count.

count(//block | //*[@name='Bankhead']/sidestreet) 

or

<xsl:template match="/">
      <xsl:value-of select="count(//block | //*[@name='Bankhead']/sidestreet)" />
</xsl:template>

This counts all <block>s, regardless of the node from which they are descended (//), which gives 6. It also counts every <sidestreet> that is a child (the / preceding sidestreet steps down from parent to child) of any element (*) whose name attribute (@name) has the value Bankhead, which gives 2, for an expression total of 8. If a kind of street other than <thoroughfare> had a name attribute with the value Bankhead, its <sidestreet> children would be counted too. To be more specific, but with the same result in this case, we could do the following:

count(//block | //thoroughfare[@name='Bankhead']/sidestreet) 

or

<xsl:template match="/">
      <xsl:value-of select="count(//block | //thoroughfare[@name='Bankhead']/sidestreet)" />
</xsl:template>
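The three totals in this section can be verified with a short Python sketch against the Markup City model. ElementTree's XPath subset has neither count() nor the union operator, so the hypothetical helper count_union() stands in for both: it merges the node lists, counts each node once (a union never holds the same node twice), and returns the size.

```python
import xml.etree.ElementTree as ET

# The Markup City model.
DOC = """<main>
  <parkway>
    <thoroughfare>Governor Drive</thoroughfare>
    <thoroughfare name="Whitesburg Drive">
      <sidestreet>Bob Wallace Avenue</sidestreet>
      <sidestreet>Woodridge Street</sidestreet>
    </thoroughfare>
    <thoroughfare name="Bankhead">
      <sidestreet>Tollgate Road</sidestreet>
      <sidestreet>Oak Drive</sidestreet>
    </thoroughfare>
  </parkway>
  <boulevard>
    <block>Panorama Street</block><block>Highland Plaza</block>
    <block>Hutchens Avenue</block><block>Wildwood Drive</block>
    <block>Old Chimney Road</block><block>Carrol Circle</block>
  </boulevard>
</main>"""

def count_union(*node_lists):
    """Hypothetical stand-in for count(a | b): count each distinct node once."""
    return len({id(node) for nodes in node_lists for node in nodes})

root = ET.fromstring(DOC)
blocks = root.findall(".//block")
sidestreets = root.findall(".//sidestreet")
bankhead_sidestreets = root.findall(".//*[@name='Bankhead']/sidestreet")

total_blocks = count_union(blocks)                               # count(//block)
blocks_or_sidestreets = count_union(blocks, sidestreets)         # count(//block | //sidestreet)
blocks_or_bankhead = count_union(blocks, bankhead_sidestreets)   # with the @name predicate

print(total_blocks, blocks_or_sidestreets, blocks_or_bankhead)
```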

5.3 String Core Function Group

There are ten functions in the string core function group, seven of which return strings. The string() function serves to convert the range of objects supplied as an argument into a string according to the string conversion rules for that object type (covered in Section 5.3.2). The concat() function concatenates the group of strings supplied in the argument. The substring(), substring-after(), and substring-before() functions operate on subsets of strings. The next two functions manipulate the characters of the strings themselves. The first, normalize-space(), normalizes the whitespace in the string supplied in the argument. The other, translate(), takes a string and two sets of characters, and replaces each occurrence of a character from the first set with the corresponding character from the second.

Two functions in the string core function group return Booleans. The starts-with() function returns a true/false value indicating whether a given string begins with another string. The contains() function checks whether the first string supplied in the arguments contains the second string.

The string-length() function has a number return type, and returns the total count of the characters in the supplied string.

5.3.1 The string() Function

The string() function is the basic conversion function that takes a given object of the four types specified in XPath and converts it to a string.[3] This object argument is optional, as shown in the following function prototype. The string() function operates on any type of input, designated by the argument type of object.

Function: string string(object?)
Function Name    Core Function Group    Returns    Arguments    Argument Type
string()         String                 String     Object       optional

To understand the string() function, it's best to think of it as a process of segmenting the input arguments into a form that is simply a sequence of characters with no syntactic meaning, other than the separating whitespace. In other words, when XML-processing or XSLT-processing software sees, for instance, <block>, it reads the less-than < and greater-than > in such a way that makes block an element-type name and <block> an element. Similarly, when an XSLT processor sees true or false in a Boolean context, then that literally means an answer to a query of sorts as to whether or not a given condition exists. Taken together, then, the characters that comprise <block> and those that comprise the word true have a meaning that goes beyond the simple combination of characters. The syntax, or arrangement of those characters, and the context in which they occur add meaning to what they signify.

string: A thread or file with a number of objects strung upon it. (circa 1488)

string: computers: A linear sequence of records or data. (1956) (OED, 2000, XVI: 922)

In each of the cases just described, the so-called "combination of characters," or "sequence of characters," can also be a string. In fact, they literally are strings in the everyday sense of the word string: A number of things in a line; a row, chain, range (OED, 2000, XVI: 922). However, when considered as a whole, the string of t, r, u, and e means an answer in a given context (that of a Boolean), which affirms the presence of a particular condition about which an inquiry has been made. All of these significations accompany the word true in a Boolean context. Converting it to a string means, among other things, removing the "sense" of being a Boolean, which that syntax and context conveys as an additional meaning to just the word constructed of those letters.

When the word true becomes a string with the conversion afforded by the string() function, it is still visible to human-reading as "true." The attendant significations are gone, however, as far as the computer software is concerned.

In a related way, when a node-set is converted to a string, the significations of the tags as well as the hierarchy of elements within the given node-set, are all removed. The result is simply the text within the node, without all the XML markup.

There are particular reasons for doing this. As seen with id(), when the various object types become strings, then it is a simple matter to match character-for-character. In addition, as will be seen with the other functions in the string core function group that take strings as arguments, if in a given operation you wish to affect an element or number in a certain way (such as translating an element-type name from uppercase to lowercase), the only way to do that is to have the element-type name be a string as an argument to the translate() function.

So, much like the ubiquity of position(), the string() function is both powerful and essential for transforming all manner of structures in the XML data source instance. There is a great irony here, however, as the string() function is rarely, if ever, explicitly called. As in the case of the id() function (and also with node-sets when given as arguments to the number() function), the string() function is implicitly invoked in the course of performing other operations, still adhering to the string conversion rules. Lots of string conversions take place in the course of evaluating other XPath functions, but string() itself is not often explicitly invoked.

5.3.2 String Conversion Rules

To convert objects to a string, certain rules of order must be observed so there is a predictable structure in the resulting string. When you construct an XPath expression using the string() function (explicitly or implicitly), it is helpful to know exactly what is going to happen to the input data in order to successfully work with the resulting output.

  1. Numbers are converted to strings as follows (refer to Section 5.5.2 on number conversions for a more detailed explanation of NaN, positives, negatives, etc.):

    1. A non-numerical number value (e.g., sequenced letters), denoted as NaN, is converted to the string NaN.

    2. Any zero number, positive or negative, is converted to the string 0.

    Note

    Negative zero typically arises from rounding. The round() function returns the nearest integer, with ties going toward positive infinity, so 0.3 rounds to 0 and 0.5 rounds to 1. Any negative number from -0.5 up to, but not including, 0 rounds to a negative 0 (-0.000000001 still rounds to a negative 0). Converted to a string, a negative 0 is simply 0.

    3. Positive infinity is converted to the string Infinity, and negative infinity to the string -Infinity.

    4. Numbers that are integers are represented in decimal form as numbers (according to IEEE 754) with no decimal point and no leading zeros, with a preceding minus sign if negative.

    5. Other numbers are presented as numbers (IEEE 754), including a decimal point, at least one digit before the decimal point, and at least one digit after it, with a leading minus sign if negative. There are no leading zeros other than the one required digit before the decimal point, and only as many digits beyond the decimal point as are needed to distinguish the number from all other IEEE 754 numeric values.

  2. Boolean true or false values are converted to the strings true and false respectively.

  3. Node-sets are converted to strings by taking the string-value of the first node in the node-set, in document order. Each type of node has a specific string conversion rule, resulting in a string-value. In some cases, a node's string-value will be part of the node. In other cases, the string-value of the node comes from the string-value of the text nodes of the node's descendants.

  4. If a string is passed to the string() function, it remains a string.

  5. Objects other than the four basic types are converted to strings according to the type of object.
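The number and Boolean rules above can be written out as a small Python sketch. The function names xpath_number_to_string() and xpath_boolean_to_string() are hypothetical, and the final branch leans on Python's own shortest-digits float formatting as an approximation of the "only as many digits as needed" rule: Python switches to exponent notation for very large or very small magnitudes, which XPath does not, so the sketch is only reliable for ordinary values.

```python
import math

def xpath_number_to_string(value: float) -> str:
    """Approximate sketch of converting an XPath number to a string."""
    if math.isnan(value):
        return "NaN"
    if value == 0:                      # true for both 0.0 and -0.0
        return "0"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    if value == int(value):             # integers: no decimal point
        return str(int(value))
    # Assumption: repr() gives the shortest digits that identify the float;
    # it may use exponent notation for extreme magnitudes, unlike XPath.
    return repr(value)

def xpath_boolean_to_string(value: bool) -> str:
    """Booleans become the lowercase strings true and false."""
    return "true" if value else "false"

for sample in (float("nan"), -0.0, float("-inf"), 42.0, -7.0, 0.5, -0.3):
    print(repr(sample), "->", xpath_number_to_string(sample))
```

The order of the tests matters: the infinity check has to come before the integer check, because converting an infinite float to an integer raises an error in Python.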

Table 5-4 shows the resulting string-values for each converted node type:

Table 5-4. String values for converted node types
Node Type                 String-Value
Attribute                 value of the attribute, if specified
Comment                   text of the comment
Element                   text nodes of the element's descendants, concatenated together in document order
Namespace                 the text of the URI
Processing-instruction    text of the processing-instruction (the part following the target)
Root                      text nodes of the root's descendants, concatenated together in document order
Text                      text of the node
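The element and attribute rows of the table can be illustrated with a brief Python sketch. ElementTree's itertext() walks an element's text in document order, which is used here as a stand-in for the string-value of an element; the fragment is the Bankhead <thoroughfare> from the Markup City model, written on one line so that no whitespace text is involved.

```python
import xml.etree.ElementTree as ET

# The Bankhead <thoroughfare> from Markup City, with no whitespace between tags.
FRAGMENT = (
    '<thoroughfare name="Bankhead">'
    "<sidestreet>Tollgate Road</sidestreet>"
    "<sidestreet>Oak Drive</sidestreet>"
    "</thoroughfare>"
)

elem = ET.fromstring(FRAGMENT)

# Element: descendant text, concatenated in document order, markup removed.
element_value = "".join(elem.itertext())

# Attribute: the attribute's value.
attribute_value = elem.get("name")

print(element_value)
print(attribute_value)
```

Note that nothing separates the two street names in the element's string-value: the markup is removed, and only the characters of the text nodes remain.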

The resulting string is treated as a whitespace-separated string of tokens[4] (basically, words or groups of characters separated by spaces).

The string() function is often used to prepare a particular object for being operated upon by some other function. By itself, it has limits. This is consistent with its basic function because the output has no other inherent meaning than the sequence of characters it produces.

5.3.3 The concat() Function

The concat() function operates on strings, and returns a string that is a result of joining (or concatenating) two or more strings, which are furnished as arguments. At least two strings are required as arguments, though an unlimited number may be optionally furnished. Optional additional strings are specified in the function prototype below, using an asterisk * following the argument declaration, which technically means zero or more. A comma separates each argument. If a non-string object is specified as an argument, it will be converted to a string according to the string conversion rules (see Section 5.3.2) prior to processing.

Function: string concat(string, string, string*)
Function Name    Core Function Group    Returns    Arguments                  Argument Type
concat()         String                 String     String, String, String*    required, required, optional

It is useful to think of concat() as a sort of daisy-chaining function.[5] It ties the strings provided as arguments together end-upon-end (it does not loop them, however). You can supply a number of items in the arguments, which can be used to generate output text.

With concat() you can construct content with a mixture of text strings, pattern expressions, and functions. For instance, we could get the contents of the first and last <block> in Markup City, as shown in Example 5-7.

Example 5-7 : XML input from Markup City for concat() example.
<?xml version="1.0"?>
<main>
      <parkway>
          <thoroughfare>Governor Drive</thoroughfare>
          <thoroughfare name="Whitesburg Drive">
               <sidestreet>Bob Wallace Avenue</sidestreet>
               <sidestreet>Woodridge Street</sidestreet>
          </thoroughfare>
          <thoroughfare name="Bankhead">
               <sidestreet>Tollgate Road</sidestreet>
               <sidestreet>Oak Drive</sidestreet>
          </thoroughfare>
      </parkway>
      <boulevard>
          <block>Panorama Street</block>
          <block>Highland Plaza</block>
          <block>Hutchens Avenue</block>
          <block>Wildwood Drive</block>
          <block>Old Chimney Road</block>
          <block>Carrol Circle</block>
      </boulevard>
</main>

In the following example, the first argument (//block) is a simple pattern expression to get the name of the first <block>, the second (' and ') is a string to put in the word and after it, and the third (//block[last()]) is a pattern expression with a predicate:

concat(//block, ' and ', //block[last()]) 

or

<xsl:template match="/">
    <xsl:value-of select="concat(//block, ' and ', //block[last()])" />
</xsl:template>

As a result, we're able to get the following narrative output:

Panorama Street and Carrol Circle 

Recall from the string() function that any node-set that is converted to a string returns the value of the first node, which is why the first argument (in this case, //block) returns the text contents of the first <block> element.

The two <block> names are separated by the word and (in the English language, not operand, sense) by using a literal string as the second argument. The third argument returns the value of the last <block>, testing whether it is the last <block> by using the last() function.
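The way concat() converts each argument before joining can be sketched in Python. The helper names xpath_string() and concat() are hypothetical: a list of elements plays the part of a node-set and is reduced to the string-value of its first member, while plain strings pass through unchanged. The input is the <boulevard> portion of the Markup City model.

```python
import xml.etree.ElementTree as ET

# The <boulevard> portion of the Markup City model.
DOC = """<main><boulevard>
  <block>Panorama Street</block><block>Highland Plaza</block>
  <block>Hutchens Avenue</block><block>Wildwood Drive</block>
  <block>Old Chimney Road</block><block>Carrol Circle</block>
</boulevard></main>"""

def xpath_string(nodes):
    """A node-set becomes the string-value of its first node (empty if none)."""
    return "".join(nodes[0].itertext()) if nodes else ""

def concat(*args):
    """Hypothetical sketch of concat(): at least two arguments, joined end to end."""
    if len(args) < 2:
        raise TypeError("concat() requires at least two arguments")
    return "".join(a if isinstance(a, str) else xpath_string(a) for a in args)

root = ET.fromstring(DOC)
result = concat(
    root.findall(".//block"),          # node-set: only the first <block> is used
    " and ",                           # literal string, spaces included
    root.findall(".//block[last()]"),  # node-set holding the last <block>
)

print(result)
```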

Another possible use might be to determine all names of <thoroughfare>s along the <parkway> in Markup City. You will remember from the example above that some names of <thoroughfare>s were given as element content, and others as values to the name attribute.

We could put together a list in human-sensible syntax by using several arguments with concat() and the local-name() function.

Assume you knew there were three <thoroughfare>s and which ones had the attribute name instead of a name in the content.

concat('There are several ', local-name(//parkway/*), 's, the first is called ', //parkway/*, ', the second is called ', //parkway/*[position()=2]/@name, ', and the third is called ', //parkway/*[position()=3]/@name, '.')

Notice that it is necessary to include any additional text inside quote marks, including the spaces. Element and attribute names as parts of pattern expressions do not require quote marks, as is also the case with other functions, such as the use of local-name(). The output of this XPath function expression, using concat() on the input file from Markup City, is:

There are several thoroughfares, the first is called Governor Drive, the second is called Whitesburg Drive, and the third is called Bankhead. 

The parts of that sentence generated from function or pattern expressions are thoroughfare, Governor Drive, Whitesburg Drive, and Bankhead; everything else comes from the literal strings. Notice, for instance, how the commas and the "s" after "thoroughfare" are attached by concat().

If you didn't know that there were three <thoroughfare>s, or which ones had the attribute name instead of a name in the content, you would need to use more advanced techniques, such as the XSLT instruction element <xsl:for-each>. See Chapter 9 for more information on the <xsl:for-each> instruction element.

5.3.4 The substring() Function

The substring() function is one of a trio of functions, including substring-after() and substring-before(), in the string core function group, which work, in some way or another, with subcomponents of strings. The substring trio functions begin with a string and then provide additional processing granularity for accessing and manipulating smaller portions of the initial string. The substring() function accepts three arguments, as shown in the following function prototype.

Function: string substring(string, number, number?)
Function Name    Core Function Group    Returns    Arguments                 Argument Type
substring()      String                 String     String, Number, Number    required, required, optional

The first two arguments (a string and a number) are required, and the third (a number) is optional. Although the function only operates on strings, the input string can be any object that is implicitly converted to a string with the string conversion rules (see Section 5.3.2).

The substring() function is a positional function, based on the parameters delineated by the supplied arguments. In other words, this function operates on the initial string using the position of the characters within that string to extract a substring. The first argument is always the starting, or input, string. The second argument specifies the character at which to begin the substring, numerically counted from the first character in the string, using a base of 1 (not 0, as in some programming languages). The new substring created is the subset of the initial string that starts from the positional character in the second argument and continues to the end of the initial string. The substring will include all the characters to the end of the initial string unless there is a third argument to the function to specify where to stop.

If provided, the third argument specifies how many sequential characters in the input string, beginning with the character specified in the second argument, are to be included in the resulting substring. For example, given the string my string, count the characters in the string as follows (numbers below my string indicate which character, in numerical sequence, each is when counted):

my string
123456789

Using the function substring('my string', 4, 3), the resulting substring is str, which comes from the fourth, fifth, and sixth characters. This function essentially states: Start from the fourth character in the string my string and count three characters.

Think of the second argument as saying "keep every character of the input string whose position is greater than or equal to me." Think of the third argument, if given, as a count rather than a position: it says "keep only the characters whose position is less than the second argument plus me."

5.3.4.1 Using substring() on Node-sets.

If the first argument of the substring() function happens to be a node-set, the result of the conversion of the node-set to a string would be the value of the first node in the node-set, as is shown in Example 5-8 of our Markup City.

The result of this function call would be the three characters ora from the fourth, fifth, and sixth characters in the string value of the first <block> element, or Panorama Street.

The substring() function can be considered "destructive" only in the sense that all portions of the starting string not selected by the second and third arguments (if used) are absent from the returned string, and so play no part in the subsequent evaluation of the XPath expression containing this function. The source document itself is not modified.

Example 5-8 XML for substring() function example.
INPUT:
<?xml version="1.0"?>
<boulevard>
      <block>Panorama Street</block>
      <block>Highland Plaza</block>
      <block>Hutchens Avenue</block>
      <block>Wildwood Drive</block>
      <block>Old Chimney Road</block>
      <block>Carrol Circle</block>
</boulevard>

FUNCTION:
substring(//block, 4, 3)

or

<xsl:template match="/">
      <xsl:value-of select="substring(//block, 4, 3)" />
</xsl:template>
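The node-set-to-string conversion used in Example 5-8 can be imitated outside a stylesheet. The following is a minimal Python sketch, not an XSLT processor: it relies on the standard xml.etree.ElementTree module, and the helper name string_value is an assumption made for illustration. It takes the first <block> in document order, as the conversion does, and then picks out positions 4 through 6.

```python
import xml.etree.ElementTree as ET

# The Markup City data from Example 5-8.
XML = """<boulevard>
    <block>Panorama Street</block>
    <block>Highland Plaza</block>
    <block>Hutchens Avenue</block>
    <block>Wildwood Drive</block>
    <block>Old Chimney Road</block>
    <block>Carrol Circle</block>
</boulevard>"""


def string_value(element):
    """Concatenate all text inside an element (hypothetical helper)."""
    return "".join(element.itertext())


root = ET.fromstring(XML)

# A node-set converted to a string yields the value of its first node
# in document order; iter() walks the tree in that order.
first_block = next(root.iter("block"))
value = string_value(first_block)

# XPath positions 4, 5, and 6 are Python indexes 3, 4, and 5.
result = value[3:6]

print(value)   # Panorama Street
print(result)  # ora
```

The other five <block> elements never enter the calculation, which is why the XPath call returns ora and nothing else.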
5.3.4.2 Using substring() on Numbers.

The substring() function becomes a bit complex when the arguments supplied are numbers. Number objects have specific properties, which affect XPath expressions that are using them in differing ways. All of these properties have a universal set of rules to which they conform, as defined in the IEEE 754 set of rules for numerical values. We will deal with these in more detail in the number core function group, but must introduce them here as different types of properties affecting numbers will, in turn, affect how the sequences supplied as arguments to the substring() function are counted.

It is likely that some of the following discussion will not be essential to extracting substrings. If, as in the example above, your use of substring() always involves everyday, bread-and-butter whole numbers, then you can rest assured that the IEEE 754 rules simply ensure that 3 will always mean 3.

However, when the second and third arguments are decimals, negative values, or zeroes, for instance, IEEE 754 rules and the principles of rounding numbers come into play (see Section 5.5.6 for more information on the principles of the round() function). This rounding results in the following kinds of results for a simple input string of abcde, as used in the W3C XPath specification.

  1. substring('abcde', 1.5, 2.6) returns bcd, because 1.5 and 2.6 round to 2 and 3 respectively. The characters kept are those whose position is greater than or equal to 2 and less than 2 + 3, or 5. Those are positions 2, 3, and 4, which hold b, c, and d.

  2. substring('abcde', 0, 3) returns ab. The 0 remains 0, which represents a point just before the beginning of the string (because XPath positions start at 1, not 0 as in some programming languages). The characters kept are those whose position is greater than or equal to 0 and less than 0 + 3, or 3, so only a and b, at positions 1 and 2, qualify.

  3. substring('abcde', 0 div 0, 3) returns "", the empty string. The div is the numeric operator for "divided by." Because 0 divided by 0 is not a number (NaN), and no position is greater than or equal to NaN, no characters are selected.

  4. substring('abcde', 1, 0 div 0) also returns "", for a similar reason: the stopping point, 1 plus NaN, is itself NaN, and no position is less than NaN.

  5. substring('abcde', -42, 1 div 0) returns abcde. Every position is greater than or equal to -42, so the substring starts at the first character, and because 1 div 0 is Infinity, the stopping point is also Infinity and all the characters are selected.

  6. substring('abcde', -1 div 0, 1 div 0) returns "". Here -1 divided by 0 is -Infinity and 1 divided by 0 is Infinity. Although it might seem sensible to return all the characters, the stopping point is the sum of -Infinity and Infinity, which is NaN, and no position is less than NaN.
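The six cases above all follow from one rule in the XPath 1.0 specification: a character is kept when its position is greater than or equal to the rounded second argument and less than the sum of the rounded second and third arguments. The following Python sketch models that rule so the cases can be checked by hand. The function names xpath_round and xpath_substring are assumptions made for this illustration; they are not part of any XSLT processor.

```python
import math
from typing import Optional


def xpath_round(x: float) -> float:
    """Model of XPath round(): nearest integer, ties toward +Infinity.

    NaN and the infinities are returned unchanged.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    return float(math.floor(x + 0.5))


def xpath_substring(s: str, start: float, length: Optional[float] = None) -> str:
    """Keep characters whose 1-based position p satisfies
    p >= round(start) and, if length is given,
    p < round(start) + round(length)."""
    first = xpath_round(start)
    if length is None:
        limit = math.inf
    else:
        # This sum can be NaN, for example -Infinity + Infinity.
        limit = first + xpath_round(length)
    # Any comparison with NaN is False, so a NaN bound selects nothing.
    return "".join(
        ch for p, ch in enumerate(s, start=1) if p >= first and p < limit
    )


nan = float("nan")
inf = math.inf

print(xpath_substring("my string", 4, 3))   # str
print(xpath_substring("abcde", 1.5, 2.6))   # bcd
print(xpath_substring("abcde", 0, 3))       # ab
print(repr(xpath_substring("abcde", nan, 3)))     # ''
print(repr(xpath_substring("abcde", 1, nan)))     # ''
print(xpath_substring("abcde", -42, inf))   # abcde
print(repr(xpath_substring("abcde", -inf, inf)))  # ''
```

Python's built-in round() is deliberately avoided here, because it rounds ties to the nearest even integer, whereas XPath rounds 1.5 up to 2 and 2.5 up to 3.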

5.3.5 The substring-after() Function

The second in the trio of functions that work with subcomponents of strings is the substring-after() function, which also returns a string. It takes two required arguments, which are themselves strings, as shown in the following function prototype:

Function: string substring-after(string, string)
Function Name | Core Function Group | Returns | Arguments | Argument Type
substring-after() | String | String | String, String | required, required

The substring-after() function operates on strings, or objects that are converted to strings prior to processing. The first argument to substring-after() is an input or initial string, and the second specifies the character or group of characters in the input string after which the remainder of the input string is returned.

For example, given the expression substring-after("1963/02/13", "/"), the function returns everything after the first match on the second argument (reading the string from left to right), in this case the forward slash (/), and so yields 02/13.

If there is no match for the second argument, the empty string is returned. The empty string is also returned when the second argument matches the very end of the input string, because the second argument marks the point after which the returned substring begins, and nothing is left beyond that point. Thus substring-after("1963/02/13", "02/13") returns the empty string because nothing comes after 02/13.
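The behavior just described (first match wins, empty string when there is no match or nothing follows the match) can be modelled in a few lines. This is a Python sketch of the rule rather than processor code, and the name xpath_substring_after is an assumption made for illustration.

```python
def xpath_substring_after(s: str, marker: str) -> str:
    """Return what follows the first occurrence of marker in s,
    or the empty string when marker does not occur."""
    i = s.find(marker)          # index of the first match, or -1
    if i < 0:
        return ""
    return s[i + len(marker):]


print(xpath_substring_after("1963/02/13", "/"))            # 02/13
print(repr(xpath_substring_after("1963/02/13", "02/13")))  # ''
print(repr(xpath_substring_after("1963/02/13", "-")))      # ''

# Nesting one call inside another skips past two separators.
print(xpath_substring_after(xpath_substring_after("1963/02/13", "/"), "/"))  # 13
```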

In Example 5-9 of Markup City, it may be necessary to know what kinds of <block>s (avenues, streets, plazas, and so on) are along the <boulevard>.

Example 5-9 XML for the following substring-before() and substring-after() functions.
<?xml version="1.0"?>
<boulevard>
      <block>Panorama Street</block>
      <block>Highland Plaza</block>
      <block>Hutchens Avenue</block>
      <block>Wildwood Drive</block>
      <block>Old Chimney Road</block>
      <block>Carrol Circle</block>
</boulevard>

Using the substring-after() function with a space as its second argument returns the part of a <block> name that follows the first space:

substring-after(//boulevard/block, ' ') 

or

<xsl:template match="/">
      <xsl:value-of select="substring-after(//boulevard/block, ' ')" />
</xsl:template>

Tested this way, the function returns only the string Street. The context of the <xsl:value-of> element is the root, and when the node-set //boulevard/block is converted to a string, only its first <block> is used. You can add a position() test to examine each block individually, but this may be a time-consuming process:

      substring-after(//boulevard/block[position()=2], ' ')
      substring-after(//boulevard/block[position()=3], ' ')
      ... etc.

The best solution is to change the context of the previous test by changing the match attribute of <xsl:template>:

<xsl:template match="//boulevard/block">
      <xsl:value-of select="substring-after(., ' ')" />
</xsl:template>

Recall from Chapter 4 that the "self" token passes the value of the current node to the expression, in this case the text of each <block>.

The resulting strings from each <block> would be:

Street
Plaza
Avenue
Drive
Chimney Road
Circle

Note that the fifth <block> will not return Road for the expression above, but instead Chimney Road, as the space after Old is the first of two. We could then add an additional substring-after() function, nesting the first substring-after() function within it, to correct this occurrence.

substring-after(substring-after(//boulevard/block[position()=5], ' '), ' ')

or

<xsl:template match="/">
      <xsl:value-of select="substring-after(substring-after(//boulevard/block[position()=5], ' '), ' ')" />
</xsl:template>

This results in the substring Road.

5.3.6 The substring-before() Function

The substring-before() function completes the substring trio of functions and performs, as might be expected, the reverse of the substring-after() function. It also returns a string and takes two required arguments, which are themselves strings, as shown in the following function prototype. The substring-before() function operates on strings or on objects that are converted to strings prior to processing.

Function: string substring-before(string, string)
Function Name | Core Function Group | Returns | Arguments | Argument Type
substring-before() | String | String | String, String | required, required

The first argument in substring-before() furnishes the initial or input string. The function returns the portion of the input string that precedes the first occurrence of the second argument, not including the matched portion itself. Thus, with the previous example of dates, substring-before("1963/02/13", "/") returns 1963, because this is what precedes the first forward slash in the initial string.

If no part of the first argument is matched by the second argument, the result is an empty string. The result is also an empty string when the characters in the second argument exactly match the first characters in the first argument, because nothing comes before the first character of the string:

substring-before("1963/02/13", "1963") 

The function will return the empty string because nothing comes before 1963.
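The same rules can be modelled as a mirror image of substring-after(). The sketch below is Python, written only to make the edge cases easy to check; the name xpath_substring_before is an assumption made for illustration.

```python
def xpath_substring_before(s: str, marker: str) -> str:
    """Return what precedes the first occurrence of marker in s,
    or the empty string when marker does not occur."""
    i = s.find(marker)          # index of the first match, or -1
    if i < 0:
        return ""
    return s[:i]


print(xpath_substring_before("1963/02/13", "/"))           # 1963
print(repr(xpath_substring_before("1963/02/13", "1963")))  # '' (match at the start)
print(repr(xpath_substring_before("1963/02/13", "-")))     # '' (no match)
```

An empty result therefore does not say which of the two situations occurred. Where that distinction matters, a test with contains() or starts-with() can be made first.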

Suppose we wanted the names of any <block>s in Markup City, as shown in Example 5-10, that were also "circle" types of roads.

Example 5-10 Using the substring-before() function.
<xsl:template match="//boulevard/block">
      <xsl:value-of select="substring-before(., ' ')" />
</xsl:template>

We could simply reverse our example from the substring-after() function to get the names of all the streets.

Applied to each <block>, as the template rule above does, this returns the first word of every name (so "Old Chimney" is, of course, truncated to Old), but it does not tell us which one is a circle:

Panorama Highland Hutchens Wildwood Old Carrol 

The easiest way to get the name of the <block> that is a circle is to use the word Circle in the test string:

substring-before(., 'Circle') 

or

<xsl:template match="//boulevard/block">
      <xsl:value-of select="substring-before(., 'Circle')" />
</xsl:template>

This would result in the string 'Carrol ' (which also includes the space between Carrol and Circle). Adding a space to the second argument would, in effect, erase the space in the result:

substring-before(., ' Circle')

5.3.7 The normalize-space() Function

The normalize-space() function returns a string with the extra spaces within it removed. It takes a single optional argument, which is also a string, as shown in the following function prototype. If the argument is not supplied, the string corresponding to the current node at that point in the evaluation of the XPath expression is used.

Function: string normalize-space(string?)
Function Name | Core Function Group | Returns | Arguments | Argument Type
normalize-space() | String | String | String | optional

The normalize-space() function is used to prepare data for subsequent actions. The "normal" in normalize-space() refers to the default use of a single space between words in a markup document. The function reduces any run of white space (spaces, tabs, and line breaks) down to the normal single space. A nonbreaking space (the character reference &#160;, or the &nbsp; entity where a DTD declares it) is not white space in the XML sense and is left untouched. The xml:space attribute, even with the value set to preserve, does not stop normalize-space() from collapsing the white space in the string it is given. See Chapter 2 for more information on the xml:space attribute.

Note

The result of preserving space using the xml:space attribute is similar to using the "pre" tag for preformatted text in HTML, except the font is not affected. The normalize-space() function could be used to override this setting in the input XML data instance if needed, for instance, to make a single space a point for substring manipulations.

If the source string has multiple spaces between words, these will be reduced to one space separating each word. Leading and trailing spaces before and after strings will also be stripped by normalize-space().

As an example, if we tried to use the example from substring-after(), where we were searching for a space and returning the string after the space, and if the input from the Markup City were a bit more sloppy, the results would be very unpredictable, as Example 5-11 shows.

Example 5-11 XML with disorderly white space to demonstrate the normalize-space() function.
<boulevard>
      <block> Panorama    Street</block>
      <block>Highland  Plaza </block>
      <block>   Hutchens          Avenue</block>
      <block>Wildwood    Drive   </block>
      <block> Old    Chimney  Road</block>
      <block>Carrol    Circle  </block>
</boulevard>

Using normalize-space() on this data will clean it up to ensure that the substring functions mentioned previously work properly.

normalize-space(//boulevard/block) 

or

<xsl:template match="//boulevard/block">
      <xsl:value-of select="normalize-space(.)" />
</xsl:template>

Once the data has been normalized, the expressions can be used as shown before, and the result will still come out right.

substring-after(normalize-space(//boulevard/block), ' ') 

or

<xsl:template match="//boulevard/block">
      <xsl:value-of select="substring-after(normalize-space(.), ' ')" />
</xsl:template>
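To see why the normalization step matters, the two functions can be modelled together. The following Python sketch treats only the four XML white space characters (space, tab, carriage return, line feed) as white space, which is how the XPath 1.0 specification defines it. The function names are assumptions made for illustration.

```python
import re


def xpath_normalize_space(s: str) -> str:
    """Collapse runs of XML white space to one space, then trim the ends."""
    collapsed = re.sub(r"[ \t\r\n]+", " ", s)
    return collapsed.strip(" ")


def xpath_substring_after(s: str, marker: str) -> str:
    """Return what follows the first occurrence of marker, else ''."""
    i = s.find(marker)
    return "" if i < 0 else s[i + len(marker):]


messy = " Old    Chimney  Road"   # fifth <block> of Example 5-11

# Without normalizing, the first space is the leading one.
print(repr(xpath_substring_after(messy, " ")))   # 'Old    Chimney  Road'

# After normalizing, the first space separates the first two words.
clean = xpath_normalize_space(messy)
print(repr(clean))                               # 'Old Chimney Road'
print(repr(xpath_substring_after(clean, " ")))   # 'Chimney Road'

# A nonbreaking space (U+00A0) is not XML white space and survives.
print(xpath_normalize_space("a\u00a0\u00a0b") == "a\u00a0\u00a0b")   # True
```

Python's str.split() is avoided on purpose: it also splits on the nonbreaking space, which normalize-space() leaves alone.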

5.3.8 The translate() Function

The translate() function is used to convert one set of characters to another.[6] It returns a string as its function return type, as shown in the following function prototype. This function requires three arguments, the first of which provides an initial or input string. The second argument specifies which characters of the first string are to be replaced. The third argument provides the characters that will be used as the replacement characters. This function operates on strings or on objects that will be converted to strings prior to processing.

Function: string translate(string, string, string)
Function Name | Core Function Group | Returns | Arguments | Argument Type
translate() | String | String | String, String, String | required, required, required

Before proceeding, the example from the W3C XPath specification serves to make this abstract notion of substitution more concrete. If you have a string such as bar and you want to capitalize only those letters that match a specific set, such as abc, then you would supply bar as the first argument (the starting or input string), abc as the second argument, and ABC as the third argument, indicating what the matched characters are to be translated to.

So, for translate('bar', 'abc', 'ABC'), the processor will first check which characters in the second argument match characters in the first argument (in this case, a and b). Then the processor will look up the appropriate conversion for the a and b matches in the corresponding positions among the characters in the third argument (in this case, A and B). The result would then be BAr, because only b and a of the first argument were matched by the options in the second argument, and the appropriate replacements for them from the third argument were the uppercase B and the uppercase A.

Think of the first argument as you would a starting or input string. The second argument is the list of what is to be replaced in the first string. The third argument is simply a lookup table of appropriate substitutions, where the position of each character corresponds to the position of a character specified in the second argument. If the first argument has characters unmatched by any from the second argument, they are output unaffected.

Working with the translate() function is always predicated on supplying a complete set for what the input string is to be translated to. There is nothing "intuitive" about the translate() function. It is better thought of as a sophisticated search and replace function, and also as a tool for changing the case from upper to lower or vice versa for many, but not all, character sets of various languages.

Because of the rules governing the syntax of the three arguments to the translate() function, it is also possible to remove characters from the first argument by matching them in the second argument and supplying no replacement for them in the third argument. This necessarily means the second argument will be longer (contain more characters) than the third argument.

In the example above, we could change the second argument as follows: translate('bar', 'abcr', 'ABC'). This would allow the second argument to match all characters of the first argument. However, the third argument has no fourth character with which to match the r, which comes fourth in the second argument. The return would therefore be BA. This means also that the third argument can simply be empty, meaning that the characters listed for replacement in the second argument are, in effect, deleted or removed.
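The lookup-table behavior, including deletion when the third argument runs out, can be modelled as follows. This is a Python sketch of the rules in the XPath 1.0 specification, and the name xpath_translate is an assumption made for illustration. It also reflects two further rules from the specification: when a character is repeated in the second argument the first occurrence decides, and surplus characters in the third argument are ignored.

```python
def xpath_translate(s: str, src: str, dst: str) -> str:
    """Replace each character of s found in src by the character at the
    same position in dst; delete it when dst has no such position."""
    table = {}
    for i, ch in enumerate(src):
        if ch in table:
            continue                      # first occurrence wins
        table[ch] = dst[i] if i < len(dst) else None   # None means delete
    out = []
    for ch in s:
        if ch not in table:
            out.append(ch)                # unmatched: copied unchanged
        elif table[ch] is not None:
            out.append(table[ch])         # matched: replaced
        # matched with no replacement: dropped
    return "".join(out)


print(xpath_translate("bar", "abc", "ABC"))       # BAr
print(xpath_translate("bar", "abcr", "ABC"))      # BA
print(xpath_translate("bar", "ar", ""))           # b
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
```

The last call is the second example given in the W3C specification: the hyphen sits in the fourth position of the second argument, the third argument has only three characters, and so every hyphen is removed.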

For example, if our Markup City <block> names were in lowercase, you might want to use the translate() function to capitalize them. We will use a couple of other functions in combination with translate() to accomplish this in Example 5-12.

Example 5-12 XML for the translate() function examples.
<boulevard>
      <block>panorama street</block>
      <block>highland plaza</block>
      <block>hutchens avenue</block>
      <block>wildwood drive</block>
      <block>old chimney road</block>
      <block>carrol circle</block>
</boulevard>

The XPath function expression could be written as follows.

translate(substring(//block, 1, 1), 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') 

or

<xsl:template match="/">
      <xsl:value-of select="translate(substring(//block, 1, 1),
          'abcdefghijklmnopqrstuvwxyz',
          'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" />
</xsl:template>

This function will return the first letter of the first node, converted to uppercase. In this specific example, it returns a P because panorama street is the content of the first node. Combine this with the concat() function (Section 5.1.3.2) to get the rest of the string as follows:

concat(translate(substring(//block, 1, 1), 'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ'), substring(//block, 2)) 

or

<xsl:template match="/">
      <xsl:value-of select="concat(translate(substring(//block, 1, 1),
          'abcdefghijklmnopqrstuvwxyz',
          'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
          substring(//block, 2))" />
</xsl:template>

The result of this expression will be Panorama street. This of course doesn't address the lower case "s" in street, so we would use yet more functions, substring-after() and substring-before(), as shown in Example 5-13.

Example 5-13 Extended example of nested functions.
concat(translate(substring(//block, 1, 1),
           'abcdefghijklmnopqrstuvwxyz',
           'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
       substring(substring-before(//block, ' '), 2),
       ' ',
       translate(substring(substring-after(//block, ' '), 1, 1),
           'abcdefghijklmnopqrstuvwxyz',
           'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
       substring(substring-after(//block, ' '), 2))

or

<xsl:template match="/">
      <xsl:value-of select="concat(translate(substring(//block, 1, 1),
              'abcdefghijklmnopqrstuvwxyz',
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
          substring(substring-before(//block, ' '), 2),
          ' ',
          translate(substring(substring-after(//block, ' '), 1, 1),
              'abcdefghijklmnopqrstuvwxyz',
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
          substring(substring-after(//block, ' '), 2))" />
</xsl:template>

After all this trouble of using several functions, in many cases nested inside other functions, we finally get our result, Panorama Street. While this may seem a daunting task to get such a simple result, consider running this on a database of millions of names. A few lines of nested expressions actually seem worth the time spent to get the correct results. Change the XSLT example slightly to get a list of all the blocks, with the case changed in each:

<xsl:template match="//block">
      <xsl:value-of select="concat(translate(substring(., 1, 1),
              'abcdefghijklmnopqrstuvwxyz',
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
          substring(substring-before(., ' '), 2),
          ' ',
          translate(substring(substring-after(., ' '), 1, 1),
              'abcdefghijklmnopqrstuvwxyz',
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'),
          substring(substring-after(., ' '), 2))" />
</xsl:template>

There are many possible uses for the translate() function, the most common being uppercase/lowercase conversions. It is important to consider this function also as a way of weeding out specific characters by not supplying appropriate replacements in the third argument for those selected for action by the second argument. Other functions can be used to supply the value for the second argument, so that a single translate() function could be contingent upon a number of factors in the input XML data instance.
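The nested expression in Example 5-13 is easier to follow when each of its five concat() pieces is written on its own line. The Python sketch below mirrors the expression piece by piece; the helper names are assumptions made for illustration, and str.maketrans stands in for the two alphabet arguments to translate().

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO_UPPER = str.maketrans(LOWER, UPPER)   # same pairing as translate()


def before_space(s: str) -> str:
    """Model of substring-before(s, ' ')."""
    i = s.find(" ")
    return "" if i < 0 else s[:i]


def after_space(s: str) -> str:
    """Model of substring-after(s, ' ')."""
    i = s.find(" ")
    return "" if i < 0 else s[i + 1:]


def capitalize_block(name: str) -> str:
    """Mirror of the five concat() arguments in Example 5-13."""
    return (
        name[:1].translate(TO_UPPER)                  # translate(substring(., 1, 1), ...)
        + before_space(name)[1:]                      # substring(substring-before(., ' '), 2)
        + " "                                         # ' '
        + after_space(name)[:1].translate(TO_UPPER)   # translate(substring(substring-after(., ' '), 1, 1), ...)
        + after_space(name)[1:]                       # substring(substring-after(., ' '), 2)
    )


print(capitalize_block("panorama street"))    # Panorama Street
print(capitalize_block("carrol circle"))      # Carrol Circle
print(capitalize_block("old chimney road"))   # Old Chimney road
```

The last line shows a limit of the approach: because the expression splits on the first space only, just the first two words of a three-word name are capitalized.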

5.3.9 The contains() Function

The contains() function tests for the existence of a substring within an initial string. Even though the contains() function is in the string core function group, it has the return type of boolean, because it will only return a Boolean true or false answer. It requires two string arguments, as shown in the following function prototype, an input or initial string, and a second argument, which is the string being matched or tested for in the input string. This function operates on strings or objects that will be converted to strings prior to processing, using the string conversion rules as described in Section 5.3.2.

Function: boolean contains(string, string)
Function Name | Core Function Group | Returns | Arguments | Argument Type
contains() | String | Boolean | String, String | required, required

The contains() function is an existential function because, while it is used to match on strings, it returns a Boolean true or false value, not the string. It is actually testing for the existence of the substring within the initial string. For example:

contains('xml', 'x') 

This expression will result in a true value because the string xml actually does contain the substring x. On the other hand, the following expression using our Markup City will result in a false value:

contains(//block, 'x') 

This is because the expression tests for an "x" in the string that results from converting the node-set, which is the value of the first <block>, Panorama Street. That string does not contain an x (nor, as it happens, does any other <block>).

The function can also be used in a predicate, alone or in combination with other functions, so that the nodes for which the test is true are themselves returned. For example,

//block[contains(., 'Circle')] 

or

<xsl:template match="/">
      <xsl:value-of select="//block[contains(., 'Circle')]" />
</xsl:template>

In this use of the function, any <block> element with the word Circle in it is returned. The result would be

Carrol Circle 

As noted previously, in this example for the <block> element, the first argument to the contains() function is often the self:: axis, or a "." in abbreviated form. Inside the predicate, the "." stands, in turn, for each <block> being tested, and its value is the full contents of that block, such as Highland Plaza (no match there), Hutchens Avenue (no match there), or Carrol Circle (match found, answer returned is true). Only a <block> for which the test returns true is kept. If no block matches, the contains() test in the predicate fails every time, and no <block> name is output.

A few little details about contains() require further explanation, as they are not, perhaps, self-evident. If there is an empty string furnished for the second argument, the return will always be true. If the first string is empty, the answer is false unless the second is also empty, in which case the return is true. You might wonder how this could be the case. For clarity in our examples, we have supplied functions in expressions as arguments to other functions; however, it is possible that a subexpression as an argument to contains() might produce an empty string, so there must be an accounting for this contingency.
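These edge cases are easy to tabulate. Python's in operator on strings happens to follow the same rules, so the sketch below uses it as a stand-in; the name xpath_contains is an assumption made for illustration.

```python
def xpath_contains(s: str, sub: str) -> bool:
    """Model of contains(): True when sub occurs anywhere in s."""
    return sub in s


print(xpath_contains("xml", "x"))              # True
print(xpath_contains("Panorama Street", "x"))  # False
print(xpath_contains("xml", ""))               # True  (empty second argument)
print(xpath_contains("", "x"))                 # False (empty first argument)
print(xpath_contains("", ""))                  # True  (both empty)
```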

In Markup City, it might be a dark and stormy night and quite late, and all we know is that there is some turn we must take (our cellphone blipped out of range while getting directions) from one of the roads in town that has "Street" in its name. We don't know anything more than this. The local expert says he doesn't know that part of town, but thinks there may be something called Street off one of the <thoroughfare>s along the <parkway>. This disquieting absence of information is not a problem for our XPath navigator, however, because as Example 5-14 shows, we need only find <thoroughfare>s with turns going off of them that have Street in their name.

Example 5-14 XML for the contains() function example.
<?xml version="1.0"?>
<parkway>
      <thoroughfare>Governor Drive</thoroughfare>
      <thoroughfare name="Whitesburg Drive">
          <sidestreet>Bob Wallace Avenue</sidestreet>
          <sidestreet>Woodridge Street</sidestreet>
      </thoroughfare>
      <thoroughfare name="Bankhead">
          <sidestreet>Tollgate Road</sidestreet>
          <sidestreet>Oak Drive</sidestreet>
      </thoroughfare>
</parkway>

The XPath expression using the contains() function selects any descendant of a <thoroughfare>, with a predicate that tests true if that descendant has the desired Street component in its name.

//thoroughfare//*[contains(., 'Street')] 

or

<xsl:template match="/">
      <xsl:value-of select="//thoroughfare//*[contains(., 'Street')]" />
</xsl:template>

The result would be the string value of the node that contains Street, or Woodridge Street. We are simply looking for any descendant of <thoroughfare> whose own contents (designated by the "." abbreviation given as the first argument to contains()) contain Street. We could also get the name of that <thoroughfare> by selecting on its attribute value.

//thoroughfare[.//*[contains(., 'Street')]]/@name

or

<xsl:template match="/">
      <xsl:value-of select="//thoroughfare[.//*[contains(., 'Street')]]/@name" />
</xsl:template>

This gives the value of the attribute (@) called name belonging to any <thoroughfare> that has a descendant containing the word Street somewhere in it. The predicate is placed on the <thoroughfare> and begins with "." so that the search for Street is confined to that element's own descendants; an attribute has no descendants to test, and a predicate beginning with // would search the entire document instead. The return would be Whitesburg Drive. Thus, with the contains() function, you can include or exclude various nodes and strings from the output result based on the presence or absence of a particular set of characters.
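The two searches over Example 5-14 can be traced step by step outside a stylesheet. The following Python sketch uses the standard xml.etree.ElementTree module, which does not itself support contains(), so the test is written as an ordinary loop. The helper names string_value and has_street_descendant are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The Markup City data from Example 5-14.
XML = """<parkway>
    <thoroughfare>Governor Drive</thoroughfare>
    <thoroughfare name="Whitesburg Drive">
        <sidestreet>Bob Wallace Avenue</sidestreet>
        <sidestreet>Woodridge Street</sidestreet>
    </thoroughfare>
    <thoroughfare name="Bankhead">
        <sidestreet>Tollgate Road</sidestreet>
        <sidestreet>Oak Drive</sidestreet>
    </thoroughfare>
</parkway>"""


def string_value(element):
    """Concatenate all text inside an element (hypothetical helper)."""
    return "".join(element.itertext())


def has_street_descendant(thoroughfare):
    """True when some descendant element's value contains 'Street'."""
    return any(
        el is not thoroughfare and "Street" in string_value(el)
        for el in thoroughfare.iter()   # iter() yields the element itself first
    )


root = ET.fromstring(XML)

# Descendants of any <thoroughfare> whose contents contain 'Street'.
hits = [
    string_value(el)
    for th in root.iter("thoroughfare")
    for el in th.iter()
    if el is not th and "Street" in string_value(el)
]

# name attributes of the thoroughfares that have such a descendant.
names = [
    th.get("name")
    for th in root.iter("thoroughfare")
    if has_street_descendant(th)
]

print(hits)    # ['Woodridge Street']
print(names)   # ['Whitesburg Drive']
```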

5.3.10 The starts-with() Function

The starts-with() function provides a way to test a string for a starting value. Although this function is in the string core function group, it has the return type of boolean. It will only return a Boolean true or false answer. It requires two string arguments, as shown in the following function prototype, an input or initial string, and a second argument, which is the string being matched or tested for in the input string. The starts-with() function operates on strings or objects that will be converted to strings prior to processing.

Function: boolean starts-with(string, string)
Function Name | Core Function Group | Returns | Arguments | Argument Type
starts-with() | String | Boolean | String, String | required, required

The starts-with() function is an existential function because, while it is used to match on strings, it returns a Boolean true or false value, not the string. It is actually testing for the existence of a substring within the string. It acts somewhat like a position-specific contains() function, checking to see if the string contains a substring, but specifically at the beginning of the string. For example, starts-with('xml', 'x') will result in a true value, because xml starts with an x.

On the other hand, the following expression from Markup City will result in a false value:

starts-with(//block, 'x') 

This function call returns a false because the expression is testing to see if the string that is the result of the node-set conversion, which yields Panorama Street, starts with an x, which is false.

The arguments to starts-with() are the same as for contains(). The first is an input string and the second is the required match test. The distinction, of course, is that starts-with() can only return true if the test string in the second argument is matched in exact order by the corresponding first characters of the string in the first argument. As above, if the second argument is an empty string, the result is true, but if the first argument is empty, the result is true only if the second argument is also empty.

As long as there is a match in the first argument, beginning with its first character, then the processor will continue testing character-by-character until all the characters of the second set have been matched in order, and the return is true. It stands to reason, then, if the second argument is longer than the first, there can only be a return of false, because all its characters cannot be matched.
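The same cases can be checked with a small model. Python's str.startswith() follows the rules described here, so the sketch below uses it as a stand-in; the name xpath_starts_with is an assumption made for illustration.

```python
def xpath_starts_with(s: str, prefix: str) -> bool:
    """Model of starts-with(): True when s begins with prefix."""
    return s.startswith(prefix)


print(xpath_starts_with("xml", "x"))              # True
print(xpath_starts_with("Panorama Street", "x"))  # False
print(xpath_starts_with("xml", "ml"))             # False (present, but not at the start)
print(xpath_starts_with("xml", "xmlns"))          # False (second argument is longer)
print(xpath_starts_with("xml", ""))               # True  (empty second argument)
print(xpath_starts_with("", "x"))                 # False (empty first argument)
print(xpath_starts_with("", ""))                  # True  (both empty)
```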

5.3.11 The string-length() Function

The string-length() function counts the number of characters in the initial string. It takes the string supplied in the argument (or, if no argument is supplied, the current node converted to a string) and returns the number of characters the string contains. The string-length() function is the only string function that returns a number. It takes one optional string argument, as shown in the following function prototype. Like most string functions, it operates on strings or any object that will be converted to a string prior to processing.

Function: number string-length(string?)
Function Name | Core Function Group | Returns | Arguments | Argument Type
string-length() | String | Number | String | optional

The string-length() function, because it returns a number, can be used in comparisons with the various number operators, such as less-than, equals, greater-than, and so on. Its result can also be manipulated by other functions that act on numbers.

We can use the string-length() function, as shown in Example 5-15, if we want to take the names of our <block>s and automatically center them on a printed page (assuming that we don't have automatic centering on our printer). We would start with the length of the longest block name, in this case Old Chimney Road, which is the fifth <block>.

The expression in Example 5-15 returns 16. Assuming a page 80 characters wide, we would subtract 16 from 80, which leaves 64, then divide by 2, which is 32. So we would need 32 blank characters before our centered title, which would then start on the thirty-third character of the page. If we wanted to list all of the names, we could work with <xsl:for-each> and use a similar expression with string-length().

Example 5-15 Using the string-length() function.
INPUT:
<?xml version="1.0"?>
<boulevard>
      <block>Panorama Street</block>
      <block>Highland Plaza</block>
      <block>Hutchens Avenue</block>
      <block>Wildwood Drive</block>
      <block>Old Chimney Road</block>
      <block>Carrol Circle</block>
</boulevard>

FUNCTION:
string-length(//boulevard//block[5])

TEMPLATE RULE:
<xsl:template match="/">
      <xsl:value-of select="string-length(//boulevard//block[5])" />
</xsl:template>
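The centering arithmetic can be worked through for every block name. The following Python sketch assumes an 80-character page, as the discussion does; the names PAGE_WIDTH, left_padding, and centered are assumptions made for illustration. Where the spare width is odd, the sketch rounds the padding down, which is a choice made here and not something the text prescribes.

```python
PAGE_WIDTH = 80   # assumed page width in characters


def left_padding(title: str, width: int = PAGE_WIDTH) -> int:
    """Blank characters needed before the title to center it."""
    return (width - len(title)) // 2   # len() plays the role of string-length()


def centered(title: str, width: int = PAGE_WIDTH) -> str:
    """The title preceded by its padding."""
    return " " * left_padding(title, width) + title


print(len("Old Chimney Road"))           # 16
print(left_padding("Old Chimney Road"))  # 32
print(left_padding("Panorama Street"))   # 32
print(left_padding("Carrol Circle"))     # 33
print(centered("Old Chimney Road"))
```

In a stylesheet, the equivalent calculation would be an expression along the lines of floor((80 - string-length(.)) div 2), evaluated once for each <block>.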

Note

Character and entity references, such as &nbsp; (which must be declared in a DTD) or &lt;, for the non-breaking space and for the less-than symbol, respectively, will each be counted as one character. The XML parser resolves the references before the XPath expression is evaluated, so the string-length() function sees only the characters they stand for, never the references themselves.
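The effect of resolving references before counting can be sketched with Python's standard XML parser. This is an illustration rather than an XPath processor, and it uses the numeric reference &#160; for the non-breaking space so that no DTD is needed.

```python
import xml.etree.ElementTree as ET

# &lt; is predefined in XML; &#160; is the numeric reference for a non-breaking space.
markup = "A &lt; B&#160;C"
text = ET.fromstring("<block>" + markup + "</block>").text

raw_length = len(markup)      # length of the references as typed
resolved_length = len(text)   # length after the parser has resolved them

print(raw_length, resolved_length)
```

The second figure corresponds to what string-length() would report, because the function only ever receives the resolved text.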

5.4 Boolean Core Function Group

The Boolean core function group deals with all four kinds of XPath objects; however, the results are always converted to a Boolean. The Boolean functions build upon principles that lie at the core of all programming languages, propositional logic, and artificial intelligence. Boolean functions yield one of two values, true or false, which act as switches in the course of an XPath expression. They determine whether a given context's content meets a specific criterion and, depending upon the outcome and syntax, whether the evaluation of the expression will continue.

All five Boolean functions have a function return type of Boolean. They are discussed in order of the type of arguments they accept, beginning with the boolean() function itself, which requires an argument of the general type object and converts it to a Boolean. Next come the two functions that take no arguments, the false() and true() functions. The lang() function is unique: it tests whether the language named in its required string argument matches the language declared by the xml:lang attribute on the context node or on its nearest ancestor that has one. The xml:lang attribute is not treated specially by XSLT, which has its own XSLT-specific attribute (lang). The XSLT lang attribute is discussed in Chapter 9. Finally, there is a Boolean-to-Boolean function, the not() function, which requires a Boolean argument.

5.4.1 The boolean() Function

The boolean() function is the fundamental conversion function for producing a true or false value from any type of object furnished as its required argument. Very much like the string() function, the boolean() conversion is applied implicitly, "under the hood," wherever a Boolean is expected, for example in a predicate or in the argument to not(). It will rarely be called explicitly, other than to deliberately convert an object to a Boolean in a context that does not already do so.

Function: boolean boolean(object)
Function Name Core Function Group Returns Arguments Argument Type
boolean() Boolean Boolean Object required

Since this function is rarely called explicitly, concocting a representative example is difficult and prompts a certain punchiness on the part of the authors. Listeners who either love or hate Rush Limbaugh will notice a frequent, and arguably funky, guitar riff when he returns to the microphone from a station or sponsor break. This song's primary refrain is "I went back to Ohio, and my city was gone."

If we went back to the original Markup City (Example 5-4) on the aforementioned dark and stormy night, we might have visibility problems and simply want to know if the city was there at all. We could expeditiously test for the document element of <main> using boolean(), as shown in Example 5-16.

Example 5-16 Boolean test for existence.
boolean(main)

or

<xsl:template match="/">
    <xsl:value-of select="boolean(main)" />
</xsl:template>

This simply asks, is the main element there? Since the city is indeed there, we get the result of true. Then we could ask if it has a given kind of road, as follows.

boolean(//block | //lane | //sidestreet) 

or

<xsl:template match="/">
    <xsl:value-of select="boolean(//block | //lane | //sidestreet)" />
</xsl:template>

This will return true because there are blocks. Note that | is not an abbreviation of the or operator; it forms the union of the three node-sets, and boolean() returns true as long as that union contains at least one node. We could easily make this false as follows.

boolean(//block and //lane and //sidestreet)

or

<xsl:template match="/">
    <xsl:value-of select="boolean(//block and //lane and //sidestreet)" />
</xsl:template>

With the and operator, each operand is converted to a Boolean in turn. As soon as one turns up false, the rest of the expression is not evaluated and the answer returned is false, so the result is false whenever any one of the three kinds of element is missing from the document.
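The difference between the two expressions can be sketched in Python. The fragment below is a hypothetical stand-in for Markup City, assumed to contain <block> and <sidestreet> elements but no <lane>; the select() helper only approximates //name.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: it has <block> and <sidestreet> elements but no <lane>.
root = ET.fromstring(
    "<main><block>1st Street</block><sidestreet>Oak Drive</sidestreet></main>"
)


def select(tag):
    # Rough stand-in for //tag: every element with that name, in document order.
    return list(root.iter(tag))


# boolean(//block | //lane | //sidestreet): one combined node-set, true if non-empty.
union = select("block") + select("lane") + select("sidestreet")
union_result = bool(union)

# boolean(//block and //lane and //sidestreet): each operand is converted separately.
and_result = (bool(select("block")) and bool(select("lane"))
              and bool(select("sidestreet")))

print(union_result, and_result)
```

Python's and also stops at the first false operand, which matches the left-to-right evaluation described above.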

5.4.2 Boolean Conversion Rules

The basic criteria for evaluating a boolean() function according to object type are as follows:

  1. Node-sets - As long as the node-set is non-empty, that is, it contains at least one node, this argument will permit the boolean() function to return true.

  2. Strings - As long as the string is of a length other than zero, this argument will permit the boolean() function to return true.

  3. Numbers - In this case, we see some degree of interpretation of the rather esoteric positive and negative zeros: as long as the number is neither type of zero and is not NaN ("not a number"), this argument will permit the boolean() function to return true.

  4. Other - Any other object is converted to a Boolean according to rules dependent upon its respective type. If this is unsettling, rest assured that the lion's share of objects in XPath will always be of the four basic types, with exceptions that are hard to imagine.
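The conversion rules above can be summarized as a small Python function. This is a sketch under stated assumptions, not a processor's implementation: the name xpath_boolean is hypothetical, and a node-set is modelled as a plain Python list.

```python
import math


def xpath_boolean(value):
    """Approximate boolean() for Python stand-ins of the four XPath object types."""
    if isinstance(value, bool):            # Boolean: returned unchanged
        return value
    if isinstance(value, (int, float)):    # number: false for +0, -0, and NaN
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):             # string: true when the length is non-zero
        return len(value) > 0
    if isinstance(value, (list, tuple)):   # node-set (modelled as a list): true when non-empty
        return len(value) > 0
    raise TypeError("no conversion rule for this object type")


for sample in (["a node"], [], "false", "", 0.5, -0.0, float("nan")):
    print(repr(sample), xpath_boolean(sample))
```

One consequence worth noticing is that the string "false" converts to true, because only its length is examined.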

5.4.3 The false() Function

The false() function always returns the Boolean value false. It accepts no arguments, as shown in the following function prototype.

Function: boolean false()
Function Name Core Function Group Returns Arguments Argument Type
false() Boolean Boolean None none

Although it is not very useful alone, this function can be used in combination with other functions and in equivalence tests. For example, the function string() expects an object as an argument, but passing in the bare word false results in an empty string, because an unquoted name is read as a location path that selects child elements named false, and there are usually none. Passing in false() results in the string value of false.

5.4.4 The true() Function

The true() function always returns the Boolean value true. It accepts no arguments and does not depend on the context node, as shown in the following function prototype.

Function: boolean true()
Function Name Core Function Group Returns Arguments Argument Type
true() Boolean Boolean None none

Although it is not very useful alone, this function can be used in combination with other functions and in equivalence tests. For example, the function string() expects an object as an argument, but passing in the bare word true results in an empty string, because an unquoted name is read as a location path that selects child elements named true, and there are usually none. Passing in true() results in the string value of true.
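The difference between a bare word and a function call inside string() can be sketched as follows. The function xpath_string is a hypothetical and deliberately partial model that covers only Booleans and node-sets, with a node-set represented as a list of string-values.

```python
def xpath_string(value):
    """Partial model of string(): Booleans and node-sets only."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, list):
        # A node-set yields the string-value of its first node, or "" when empty.
        return value[0] if value else ""
    raise TypeError("this sketch covers only Booleans and node-sets")


# string(true()) and string(false()): the argument is a Boolean.
print(xpath_string(True), xpath_string(False))

# string(true) or string(false): the bare name is a location path. With no child
# elements of that name, the node-set is empty.
no_such_children = []
print(repr(xpath_string(no_such_children)))
```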

5.4.5 The lang() Function

The Boolean lang() function is used to verify the current language as specified in the context node, or an ancestor of the context node, with the xml:lang attribute. The lang() function operates on a string, provided as its required argument, and returns either a true or a false Boolean when evaluated. The following function prototype describes the structure of the lang() function.

Function: boolean lang(string)
Function Name Core Function Group Returns Arguments Argument Type
lang() Boolean Boolean String required

When a node has a language declared with the xml:lang attribute, this information is available to the node and all its descendants, and can be tested for using the lang() function. The language codes tested for are either two-letter codes specified in the ISO 639 table of language abbreviations, or a code with a subcode for country specific languages, specified in ISO 3166.[7] If there is no language specified, the lang() function returns false.

Regrettably, there is nothing intuitive about this function. It does not check the actual strings of the node to see if they are, indeed, French, for instance. So, it is worth noting that just because someone declares a chapter to be in Greek (with <chapter xml:lang="el">, el being the ISO 639 code for Greek), the entire following text could be English, and using the function call lang("el") will still return true. In effect, regardless of what kind of string content is contained in the <chapter xml:lang="el"> node, the XSLT processor will rightly say, "It's all Greek to me," (or, of course, more literally and less euphemistically, true).

The lang() function is case insensitive and ignores a suffix that begins with a hyphen on the attribute value when testing the language. For example, from the XPath specification, lang("en") will be true if the context node is any of the following elements.

<para xml:lang="en"/>
<div xml:lang="en"><para/></div>
<para xml:lang="EN"/>
<para xml:lang="en-us"/>
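The comparison that lang() performs can be sketched in a few lines of Python. The function name lang_matches is hypothetical, and the sketch assumes the caller has already found the nearest xml:lang value on the context node or its ancestors, passing None when there is none.

```python
def lang_matches(xml_lang, wanted):
    """Sketch of the lang() comparison for an already-located xml:lang value."""
    if xml_lang is None:
        return False                 # no language declared anywhere: always false
    have, want = xml_lang.lower(), wanted.lower()
    # Equal ignoring case, or equal once a suffix starting with "-" is dropped.
    return have == want or have.startswith(want + "-")


for declared in ("en", "EN", "en-us", "fr", None):
    print(declared, lang_matches(declared, "en"))
```

The test is one-directional: a declared value of en-us satisfies lang("en"), but a declared value of en does not satisfy lang("en-us").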

5.4.6 The not() Function

The not() function returns true when its argument is false, and false if its argument is true. The single required argument is a Boolean, as shown in the following function prototype. The not() function operates on Boolean objects, or any object that can be converted to a Boolean value prior to processing.

Function: boolean not(boolean)
Function Name Core Function Group Returns Arguments Argument Type
not() Boolean Boolean Boolean required

With the Boolean not() function, a great deal of contradictory layering can be strung together, producing results that are amusing or infuriating, depending on your relative patience with extended contradictory expressions. For example, not(true()) returns a false value, while not(false()) returns a true value. We will use the XML in Example 5-17 for the following examples.

Example 5-17 XML for the not() function.
<?xml version="1.0"?>
<parkway>
    <thoroughfare>Governor Drive</thoroughfare>
    <thoroughfare name="Whitesburg Drive">
        <sidestreet>Bob Wallace Avenue</sidestreet>
        <block>1st Street</block>
        <block>2nd Street</block>
        <block>3rd Street</block>
        <sidestreet>Woodridge Street</sidestreet>
    </thoroughfare>
    <thoroughfare name="Bankhead">
        <sidestreet>Tollgate Road</sidestreet>
        <block>First Street</block>
        <block>Second Street</block>
        <block>Third Street</block>
        <sidestreet>Oak Drive</sidestreet>
    </thoroughfare>
</parkway>

Using the expression //thoroughfare[block] we could search for only thoroughfares that contained block elements. If we add a not() to the predicate, we could then search for only thoroughfares that did not contain blocks.

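<xsl:template match="/">
    <xsl:value-of select="//thoroughfare[not(block)]"/>
</xsl:template>

The expression selects every thoroughfare that does not contain a <block>, and <xsl:value-of> returns the value of the first one selected. Here only one thoroughfare qualifies, so the result is Governor Drive.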
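The effect of the predicate can be imitated in Python to see which elements each form keeps. The XML below is a shortened, hypothetical version of Example 5-17, and the list comprehensions only approximate the two XPath expressions.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for Example 5-17.
parkway = ET.fromstring("""
<parkway>
  <thoroughfare>Governor Drive</thoroughfare>
  <thoroughfare name="Whitesburg Drive">
    <sidestreet>Bob Wallace Avenue</sidestreet>
    <block>1st Street</block>
  </thoroughfare>
  <thoroughfare name="Bankhead">
    <sidestreet>Tollgate Road</sidestreet>
    <block>First Street</block>
  </thoroughfare>
</parkway>
""")

# //thoroughfare[block]: keep elements that have at least one <block> child.
with_blocks = [t for t in parkway.iter("thoroughfare")
               if t.find("block") is not None]

# //thoroughfare[not(block)]: keep elements whose set of <block> children is empty.
without_blocks = [t for t in parkway.iter("thoroughfare")
                  if t.find("block") is None]

print([t.get("name") for t in with_blocks])
print([t.text for t in without_blocks])
```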

In another case, we might want to make sure we use only the <block>s whose names begin with a numeral, for example 3rd Street, but not Third Street. XPath has no NaN literal and no function that asks directly whether a value is a number, so the following XPath expression combines the substring(), number(), and string() functions with the not() function, as follows.

not(string(number(substring(., 1, 1))) = 'NaN')

or

<xsl:template match="block">
    <xsl:value-of select="not(string(number(substring(., 1, 1))) = 'NaN')" />
</xsl:template>

This expression will return the value true for every <block> element that contains a digit as the first character. The logic behind this expression can be unpacked (inside-out) as shown in Table 5-5.

Table 5-5. Possible function returns for the first <block> in Example 5-17
Expression | Object Returned | Markup City Equivalent
. | The string-value of the context <block> node | 1st Street
substring(., 1, 1) | The first character of that string | "1"
number(substring(., 1, 1)) | The string "1" converted to a number | 1
string(number(substring(., 1, 1))) | The number converted back to a string, which is "NaN" only if the conversion failed | "1"
not(string(number(substring(., 1, 1))) = 'NaN') | Tests that the result of the substring() and number() functions is not NaN (not a number), using the not() function | true

It is tempting to shorten the test to number(substring(., 1, 1)) = not(NaN), but that expression does not do what it appears to. An unquoted NaN is read as a location path that selects child elements named NaN; since there are none, not(NaN) is simply true. Comparing a number with a Boolean then converts the number to a Boolean, so the shortcut returns true for the digits 1 through 9 but false for 0. The only other way to ask if the first character is a number would be to ask if it is equal to 1, 2, 3, 4, 5, 6, 7, 8, 9, or 0. This is a long and clearly inefficient way to make such determinations and is open to many possible errors. It is simplest to remember that to ask if something is a number, convert it with number() and ask if the result is not "not a number."
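A property of NaN that makes "is it a number?" tests possible is that NaN is the only value not equal to itself. The Python sketch below illustrates this; the helper names and the 0th Street entry are hypothetical, and first_char_as_number() only stands in for number(substring(., 1, 1)).

```python
import math


def first_char_as_number(name):
    # Stand-in for number(substring(., 1, 1)): NaN unless the first character is a digit.
    first = name[:1]
    if first != "" and first in "0123456789":
        return float(first)
    return math.nan


def is_number(value):
    # NaN is the only value that is not equal to itself.
    return value == value


blocks = ["1st Street", "2nd Street", "3rd Street", "First Street", "0th Street"]
numbered = [b for b in blocks if is_number(first_char_as_number(b))]
print(numbered)

# Converting the number to a Boolean instead would wrongly reject a leading zero.
print(bool(first_char_as_number("0th Street")))
```

In XPath the same self-comparison can be written as number(x) = number(x), which is false only when the conversion produced NaN.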

5.5 Number Core Function Group

The number functions all return a number as their function return type. The group contains five functions, one of which, number(), serves as the fundamental conversion of any object to a number.

The five functions in the number core function group are covered below, beginning with the number() conversion function, which takes an optional object argument to be converted to a number. This function is similar to string() and boolean() in that it works ubiquitously under the hood while not frequently being explicitly called itself. Following this is the sum() function, which converts the string-value of each node in a required node-set argument to a number and adds the results together. The remaining three functions are the ceiling() function, which rounds up to an integer, the floor() function, which rounds down to an integer, and the round() function, to which we referred earlier.

5.5.1 The number() Function

The number() function is the primary conversion function used to convert an object to a number. It has one optional argument of type object, as shown in the following function prototype. If no argument is supplied, the default is the current node.

Function: number number(object?)
Function Name Core Function Group Returns Arguments Argument Type
number() Number Number Object optional

The number() conversion is applied implicitly wherever a number is expected, for example in the operands of arithmetic operators or in the number arguments of functions such as round() and substring(). Accordingly, number() is not often called on its own.

Because the argument to the number() function is of type object, any of the four object types (node-set, string, Boolean, or number) can be used. Converting each of these object types to a number has a specific set of rules, which are covered in the next section.

5.5.2 Number Conversion Rules

The four object types, node-set, string, Boolean, or number can be converted to a number following the conversion rules specified by the W3C XPath recommendation (section 4.4) as follows.

  1. A string that consists of optional whitespace followed by an optional minus sign followed by a Number followed by whitespace is converted to the IEEE 754 number that is nearest (according to the IEEE 754 round-to-nearest rule) to the mathematical value represented by the string; any other string is converted to NaN.

    This basically says that if the string looks like a number, it is converted to that number. Whitespace is stripped, and the value is given for that number according to IEEE 754 and rounding; otherwise it is not a number, and its numerical value is NaN. For example, using our Markup City, number(//block) will return the value NaN because //block resolves to the text value of the node, a string that does not look like a number. If the resolved value of an expression is a string that looks like a number, it can be converted to a number for use by other functions that are expecting a number. For example, number('1235') returns the number 1235, which can then be interpreted by another function.

    A more useful example includes using number(substring('1st Street', 1, 1)) to extract the street number from the element <block>1st Street</block>, using the substring() and number() functions.

    This would return the number 1. If you did not use the number() function, the returned value would be the string "1".

  2. Boolean true is converted to 1; boolean false is converted to 0.

    This rule follows basic "bit" rules where 0 is off and 1 is on; a 1 (the existence of something) will always be true, a 0 (the absence of something) will always be false.[8] Recall from our contains() example that we had an expression that returned a true value. If we wrap that same function in the number() function, number(contains('xml', 'x')), the result will be to convert the Boolean true to the number 1.

  3. A node-set is first converted to a string as if by a call to the string() function and then converted in the same way as a string argument.

    Node-sets are converted to a string according to the string conversion rules in Section 5.3.2, in which the node-set's first node is converted to a string and its string-value is then converted to a number according to the string-to-number conversion rules (see 1 above). Again, if the resulting string is not a number, the value will be NaN.

  4. An object of a type other than the four basic types is converted to a number in a way that is dependent on that type.

    The fact that there can be other types of objects leaves the interpretation of the conversion of those objects up to the creator of those objects. If a new object type is created and managed, for example, by an extension element, the namespace for that extension should supply the conversion rules for that object.
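The four rules can be gathered into one Python function as a rough model. The name xpath_number is hypothetical, a node-set is modelled as a list of string-values in document order, and the regular expression is an approximation of the specification's Number production (digits with an optional fraction, no exponent, no leading plus sign).

```python
import math
import re

# Optional whitespace, optional minus sign, digits with an optional fraction
# (or a fraction alone), then optional whitespace.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def xpath_number(value):
    """Approximate number() for Python stand-ins of the four XPath object types."""
    if isinstance(value, bool):           # rule 2: true is 1, false is 0
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):   # a number is returned unchanged
        return float(value)
    if isinstance(value, list):           # rule 3: use the first node's string-value
        value = value[0] if value else ""
    if isinstance(value, str):            # rule 1: number-like strings, else NaN
        return float(value) if _NUMBER.fullmatch(value) else math.nan
    raise TypeError("rule 4: conversion depends on the object type")


for sample in ("1235", " -3.5 ", "1st Street", "1st Street"[:1], True, ["47", "48"], []):
    print(repr(sample), xpath_number(sample))
```

Strings such as "1e3" or "+5", which many programming languages accept as numbers, fall through to NaN under this reading of the rule.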

5.5.3 The sum() Function

The sum() function converts the string-value of each node in a node-set to a number and adds the results together. This of course means that every node in the node-set must contain a number; otherwise the result of this function will be NaN. The sum() function returns a number as a function return type. It requires a single argument of type node-set, as shown in the following function prototype. Each node in the node-set is converted to a number according to the string conversion rules mentioned in the previous section on number().

Function: number sum(node-set)
Function Name Core Function Group Returns Arguments Argument Type
sum() Number Number Node-set required

If our Markup City had yet another set of streets, perhaps in the French Quarter, called <rue>s, we could then use the sum() function to add up the contents of the <rue> elements,[9] as shown in Example 5-18.

Example 5-18 XML for the sum() function example.
<parkway>
    <thoroughfare>Governor Drive</thoroughfare>
    <thoroughfare name="Whitesburg Drive">
        <sidestreet>Bob Wallace Avenue</sidestreet>
        <block>1st Street</block>
        <block>2nd Street</block>
        <block>3rd Street</block>
        <sidestreet>Woodridge Street</sidestreet>
    </thoroughfare>
    <thoroughfare name="Concord">
        <rue>47</rue>
        <rue>48</rue>
        <rue>49</rue>
    </thoroughfare>
</parkway>

FUNCTION:
sum(//rue)

TEMPLATE RULE:
<xsl:template match="/">
    <xsl:value-of select="sum(//rue)"/>
</xsl:template>

Using the sum() function would return the value of the sum of 47, 48, and 49, which is equal to 144.
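The behavior of sum(), including what happens when one node is not numeric, can be sketched in Python. The to_number() helper is a simplified assumption: it leans on Python's float(), which accepts a few forms that XPath would reject.

```python
import math
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<thoroughfare name='Concord'>"
    "<rue>47</rue><rue>48</rue><rue>49</rue>"
    "</thoroughfare>"
)


def to_number(text):
    # Simplified string-to-number step: anything float() rejects becomes NaN.
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


# sum(//rue): convert each node's string-value, then add the results.
total = sum(to_number(rue.text) for rue in doc.iter("rue"))
print(total)

# A single non-numeric value turns the whole sum into NaN.
mixed = sum(to_number(text) for text in ["47", "Rue Royale"])
print(mixed)
```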

5.5.4 The ceiling() Function

The ceiling() function finds the smallest integer (that is, a number with no decimal part) that is not less than the number supplied as its argument. In effect, it works like a rounding function that always rounds up. It has one required argument, which is of type number, as shown in the following function prototype.

The ceiling() function is one of a pair with a complementary function, the floor() function, discussed immediately below. The ceiling() function operates on numbers or objects that are converted to numbers prior to processing.

Function: number ceiling(number)
Function Name Core Function Group Returns Arguments Argument Type
ceiling() Number Number Number required

For example, the function call ceiling(2.8) would return 3; the ceiling() for 2.1 is also 3. Negative numbers work the same way, finding the nearest integer greater than the argument; ceiling(-36.3) would be -36, and so on. An argument that is already an integer is returned unchanged.

The ceiling() function is similar to round(), except that ceiling() always rounds up, whereas the round() function rounds up or down, depending on the number.
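The ceiling() values can be verified with Python's math.ceil, which follows the same always-round-up rule for ordinary finite numbers. The sample values are the ones used in this section plus one whole number.

```python
import math

# math.ceil models ceiling() for ordinary finite numbers only.
ceiling_results = {value: math.ceil(value) for value in (2.8, 2.1, -36.3, 5.0)}
print(ceiling_results)
```

Note that rounding up means moving toward positive infinity, which for a negative argument is toward zero.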

5.5.5 The floor() Function

The floor() function provides a sort of backward rounding functionality. It finds the largest integer (that is, a number with no decimal part) that is not greater than the number supplied as its argument. Where normal rounding goes upward or downward, depending on the number, floor() always rounds down. The floor() function has one required argument, which is a number, as shown in the following function prototype. It is one of a pair with a complementary function, the ceiling() function, and performs the reverse of ceiling().

Function: number floor(number)
Function Name Core Function Group Returns Arguments Argument Type
floor() Number Number Number required

For example, the function call floor(2.8) returns a value of 2, and floor(2.1) is also 2. With negative numbers, rounding down moves away from zero. For example, floor(-36.3) gives -37, which is the nearest integer still smaller than -36.3.

The floor() function is similar to round(), except that floor() always rounds down, whereas the round() function rounds up or down, depending on the number.
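As a quick check of the floor() values, Python's math.floor applies the same always-round-down rule to ordinary finite numbers. The last two lines are an added illustration of how floor and ceiling relate.

```python
import math

floor_results = {value: math.floor(value) for value in (2.8, 2.1, -36.3, 5.0)}
print(floor_results)

# For a number with a decimal part, floor and ceiling sit exactly one apart.
gap = math.ceil(-36.3) - math.floor(-36.3)
print(gap)
```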

5.5.6 The round() Function

The round() function returns the integer that is closest to its argument. If two integers are equally close, it returns the one that is closer to positive infinity. It requires a single number input argument and operates on numbers or objects that are converted to numbers prior to processing. The following function prototype shows the structure of the round() function.

Function: number round(number)
Function Name Core Function Group Returns Arguments Argument Type
round() Number Number Number required

In XPath, the round() function works the way you would normally expect rounding to work for positive numbers. The decimal counting idea of "5 or more rounds upward" is observed consistently. Accordingly, when using the function call round(1.5), the result would be rounded up to 2, since 1.5 is not an integer, but 2 is. The rounding is always done to the nearest integer. Obviously, then, 1.8 would also round to 2, and 1.4 would round down to 1.

Things get more complicated with negative numbers, however. A tie still goes toward positive infinity, so round(-1.5) is -1, not -2. An input argument that is less than zero but greater than or equal to -0.5 rounds to negative zero. Positive zero is always positive zero after rounding, and the same is true of negative zero. The same also holds for positive and negative infinity, which remain positive and negative infinity, respectively, when rounded. If the input argument is NaN (not a number), then it remains NaN.
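The special cases are easier to see in code. The following Python function is a sketch of the rounding rules as described in the XPath specification; the name xpath_round is hypothetical, and Python's own round() is not used because it resolves ties differently.

```python
import math


def xpath_round(x):
    """Sketch of XPath round(): nearest integer, ties toward positive infinity."""
    if math.isnan(x) or math.isinf(x) or x == 0:
        return x                 # NaN, the infinities, and both zeros are unchanged
    if -0.5 <= x < 0:
        return -0.0              # small negative values round to negative zero
    lower = math.floor(x)
    # Move up when the fractional part is one half or more.
    return float(lower + 1 if x - lower >= 0.5 else lower)


for sample in (1.5, 1.8, 1.4, 2.5, -1.5, -0.3, math.inf, math.nan):
    print(sample, xpath_round(sample))
```

Printing the result for -0.3 shows -0.0 rather than 0.0, which is how Python displays negative zero.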

[1] XSLT also has a set of functions described in section 12 of the W3C XSLT specification. These are introduced separately in Chapter 11.

[2] You may not immediately realize that in this usage, argument does not have a connotation of contention. It derives from the Latin connotation of argumentum, which refers to a particular quantum that influences a consideration, rather than to an act of arguing. However, this nuance is still synonymous, as the computer meaning of argument is to provide content to influence a particular process. Of course, when a given function employing one or more arguments is not yielding the desired result, you might easily feel a contentious sense of argument with the given function!

[3] A glance in the Oxford English Dictionary indicates that the meaning of string is surely not singular, even without considering its programming connotations. We could call this a moment's pondering on "string theory," but that too is still another connotation (the latest and greatest rage in the never-ending "theory of everything quest" among physicists...in terms of being a viable goal in markup terms, consider it as sort of like finding a set of semantics suitable for all possible data).

[4] Tokens are equivalent to the series of characters. If there is more than one string, it is demarcated by whitespace (spaces separating each set of characters). In effect, whether the sequence is a token or a word to the human eye, it is still nothing more than just a line-up of digital "stuff" to the computer, once it has become a string.

[5] If you're familiar with UNIX, concat() performs the same function as the UNIX "cat" function. Another example is the cons and concatenate functions in Lisp, except in Reverse-Polish Notation (RPN) order, wherein the input string in concat() is the first argument, not the last, as with cons in Lisp. Unlike Lisp concatenate, concat() does not require specification of an object type because, of course, its input arguments are predefined by the W3C specification to be of type string.

[6] Programmers may recognize translate() as a function somewhat like intersection in the Lisp programming language, which takes arguments, the results of which reflect the commonality between the arguments.

[7] See http://www.w3.org/TR/1998/REC-xml-19980210#ISO3166.

[8] What that "something" is can potentially fuel conversations on existentialism. What does it mean to be something that signifies nothing? One has to wonder, do bits and bytes ask "why am I here?"

[9] This summing up of a node-set could get quite complicated if there were elements of the same name nested inside each other, for example if our Markup City was in Bangkok, where a side street is called a soi and a soi can get big enough to have its own branching sois.

XSLT and XPATH: A Guide to XML Transformations
ISBN: 0130404462