Functions and XSLT | The Official XMLSPY Handbook

In this chapter and in Chapter 6, I have used XPath functions such as position without much explanation. In XPath there are a number of functions that perform specific operations. When writing XPath, you need to know what the functions are and what they can do. The XPath functions can be broken into the following groups: node-set, string, Boolean, number, and XSLT-defined functions. These functions are discussed in the following sections.

XSLT data types and conversion

XML works because it is based on text, which can be either eight-bit or Unicode; in either case, it is still text. Text is a better approach because you don’t have to worry about translation errors between computers. Not all operations in XSLT, however, are based on text. XSLT includes five different data types:

Boolean: A true or false value
Number: A double precision floating-point number
String: A sequence of ASCII or Unicode characters
Node-set: A set of XML nodes from the original XML document
External object: Anything that doesn’t fit into one of the previous four data types. This data type is specifically geared toward the integration of external objects and references. This is made possible by the XSLT specification that allows the referencing of external functions.

Depending on the function or context, XSLT automatically converts data from one type to another.
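As a rough illustration of these automatic conversions, consider the following sketch. It assumes a hypothetical source document of the form `<order qty="3"><item/><item/></order>`; the element and attribute names are invented for this example and do not come from the documents used elsewhere in the book.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: <order qty="3"><item/><item/></order> -->
  <xsl:template match="order">
    <!-- node-set to number: @qty becomes the string "3", then the number 3 -->
    <xsl:value-of select="@qty * 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- node-set to boolean: a non-empty node-set is treated as true -->
    <xsl:if test="item">has items</xsl:if>
    <xsl:text>&#10;</xsl:text>
    <!-- number to string: concat expects strings, so the product is converted -->
    <xsl:value-of select="concat('total: ', @qty * 2)"/>
  </xsl:template>
</xsl:stylesheet>
```

In each case no explicit conversion function is called; the operator or function determines which type is required, and the processor converts the value accordingly.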

Node-set functions

The node-set functions enable you to know what node-set is being managed. This is important when using repetitive XSLT instructions. The following list provides the functions that you can use to figure out the properties of a node-set:

number last(): Returns the last position of the current context node-set as a number, which is the same as the node-set size.
number position(): Returns the current position of the current context node-set as a number.
number count( node-set): Returns the total number of nodes in the node-set passed in as a parameter.
node-set id( object): Selects elements by their unique ID; the argument supplies one or more ID values, and the elements carrying those IDs are returned. However, in all the examples used thus far in the book, the id function would not work. The id function requires a DTD that declares the attribute as type ID. This means that if an ID attribute exists, but no DTD is referenced, the id function returns an empty node-set. If IDs are active in the document, no two XML nodes can have the same ID. IDs must be unique; if they are not, a validating XML parser will report an error.
string local-name( node-set): Retrieves the local name of the first node of the node-set.
string namespace-uri( node-set): Retrieves the namespace of the first node of the node-set. If there is no namespace, an empty string is returned.
string name( node-set): Retrieves the QName of the first node of the node-set. A QName is the combination of the namespace prefix and the local name of the node, separated by a colon. A catch, though, is that the returned name identifies the namespace only through its prefix. For example, if an XSL document is being transformed, name typically returns xsl:value-of for an xsl:value-of node, in which the xsl prefix has not been expanded. The expanded name pairs the namespace URI http://www.w3.org/1999/XSL/Transform (available through namespace-uri) with the local name value-of (available through local-name).
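The following sketch shows how count, position, last, and name might be combined inside a repetitive instruction. It assumes a hypothetical source document `<list><entry>a</entry><entry>b</entry><entry>c</entry></list>`; the names list and entry are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical input: a list element with three entry children -->
  <xsl:template match="list">
    <!-- count() takes an explicit node-set argument -->
    <xsl:value-of select="count(entry)"/>
    <xsl:text> entries: </xsl:text>
    <xsl:for-each select="entry">
      <!-- inside for-each, position() and last() refer to the selected entry nodes -->
      <xsl:value-of select="concat(name(), ' ', position(), ' of ', last())"/>
      <!-- emit a separator after every entry except the last one -->
      <xsl:if test="position() != last()">, </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input, the output should read `3 entries: entry 1 of 3, entry 2 of 3, entry 3 of 3`. Note that last() and count(entry) agree here only because the for-each selects exactly the entry children.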

String functions

The following string functions enable you to manipulate strings by extracting substrings or building new strings:

string string( object): Converts an object to a string using the following conversions:
- Nodeset: Converts the nodes into a string value using default processing rules.
- Number: Converted using the following notations:
  - NaN is converted to the string NaN.
  - Negative/positive zero is converted to the string 0.
  - Positive infinity is converted to the string Infinity.
  - Negative infinity is converted to the string -Infinity.
  - Integer-based number is converted to a number with no decimal point and no leading zeros. A negative integer has a minus sign prepended.
  - All other numbers are converted to a string with decimal points and a negative sign if necessary.
- Boolean true value is converted to the string true, and false value is converted to the string false.
- Object type is converted to a string, but the exact format of the string is dependent on the implementation of the object.
string concat( string, string, string...): Combines various strings. The concat function can, in theory, have as many parameters as needed. These are then combined into one large string.
boolean starts-with( string, string): Tests the first parameter string to see if it starts with the second parameter string. A return value of true indicates that the first parameter string begins with the second parameter string. Otherwise a false value is returned.
boolean contains( string, string): Tests the first parameter string to see if it contains the second parameter string. A return value of true indicates that the first parameter string contains the second parameter string. Otherwise a false value is returned.
string substring-before( string, string): Returns the substring of the first parameter string that precedes the first occurrence of the second string. For example, if the first parameter string is “hello world how are you” and the second parameter string is “world”, then the return value is “hello ” (notice the included trailing space).
string substring-after( string, string): Returns the substring of the first parameter string that follows the first occurrence of the second string. For example, if the first parameter string is “hello world how are you” and the second parameter string is “world”, then the return value is “ how are you” (notice the included leading space).
string substring( string, number, number): Returns a substring of the first parameter based on the indices of the start point of the second parameter and the length of the third parameter. For example, if the first parameter is “12345”, the second parameter is “3”, and the third parameter “3”, the return value is “345”. The second parameter is an index, where the string buffer starts with an index of “1”. If the third parameter is not specified, the buffer starting at the second parameter and ending with the end of the buffer is returned.
number string-length( string): Returns the number of characters in the string. If the parameter is not passed in, the function defaults to the string value of the current context node.
string normalize-space( string): A useful function that strips whitespace from the start and the end of the string. If, within the string, there are sequences of multiple whitespace characters, each sequence is replaced with a single space. If the parameter is not passed in, the function defaults to the string value of the current context node.
string translate( string, string, string): An odd function that replaces text according to a pattern. If the first parameter is “hello” and the second parameter “l” and the third parameter “w”, the return value is “hewwo”. The function does a character-by-character comparison of the first parameter with the characters of the second parameter. If a character in the second parameter is found, the character in the first parameter is replaced by the corresponding character in the third parameter. The mapping of the second parameter to the third parameter is an index. So, if in the second parameter a character corresponds to the index of two, it is replaced with the third parameter character at the index of two. If a character is found in the second parameter buffer, which does not have a corresponding index in the third parameter, that character is deleted from the first parameter buffer. This can occur because the second parameter buffer is longer than the third parameter buffer.
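The string functions above can be tried out with a small stylesheet such as the following sketch. It works on literal strings, so any well-formed XML document can serve as input; the variable name s is an assumption made for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Illustrative test string, taken from the examples in the list above -->
  <xsl:variable name="s" select="'hello world how are you'"/>

  <xsl:template match="/">
    <!-- "hello " including the trailing space -->
    <xsl:value-of select="substring-before($s, 'world')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- " how are you" including the leading space -->
    <xsl:value-of select="substring-after($s, 'world')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "345": three characters starting at index 3 (indices start at 1) -->
    <xsl:value-of select="substring('12345', 3, 3)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- 23: the number of characters in $s -->
    <xsl:value-of select="string-length($s)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "a b": outer whitespace stripped, inner runs collapsed -->
    <xsl:value-of select="normalize-space('   a    b  ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "hewwo": every l is mapped to w -->
    <xsl:value-of select="translate('hello', 'l', 'w')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "heww": o has no counterpart in the third parameter, so it is deleted -->
    <xsl:value-of select="translate('hello', 'lo', 'w')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "true": $s begins with hello and also contains how -->
    <xsl:value-of select="starts-with($s, 'hello') and contains($s, 'how')"/>
  </xsl:template>
</xsl:stylesheet>
```

The second translate call illustrates the deletion case: the second parameter is longer than the third, so the unmatched character is removed from the result.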

Boolean functions

The following Boolean functions enable you to manipulate Boolean values.

boolean boolean( object): Converts the argument to a Boolean using the following conversions:
- number: Converted to false if the number is positive zero, negative zero, or NaN; otherwise, it is converted to true.
- node-set: Converted to true if the node-set is non-empty; otherwise converted to false.
- string: Converted to true if the string is non-empty; otherwise converted to false.
- object: State of either true or false is dependent on the implementation of the object.
boolean not( boolean): Inverts the Boolean value, which means a true will be turned into a false and a false into a true.
boolean true(): Function that returns the value true. XPath has no Boolean literals, so use this function wherever a true value is needed, for example when comparing against true or passing a true value as a parameter.
boolean false(): Function that returns the value false. Use this function wherever a false value is needed, for example when comparing against false or passing a false value as a parameter.
boolean lang( string): Tests to see if the current context node is a specific language as specified by the xml:lang attribute. For example, if the following XML were used as a basis
```
<data xml:lang="en" />
```
and the function lang( 'en') were called, a true return value would result.
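The following sketch exercises the Boolean functions against the `<data xml:lang="en" />` document shown above. The child element name note used in the last test is hypothetical and deliberately absent from that document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="data">
    <!-- lang() tests the xml:lang value in effect for the context node -->
    <xsl:choose>
      <xsl:when test="lang('en')">English</xsl:when>
      <xsl:otherwise>other language</xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
    <!-- "false": zero converts to false -->
    <xsl:value-of select="boolean(0)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "true": any non-empty string converts to true, even the text 'false' -->
    <xsl:value-of select="boolean('false')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- "true": there is no note child, so the empty node-set is false and not() inverts it -->
    <xsl:value-of select="not(note)"/>
  </xsl:template>
</xsl:stylesheet>
```

The third test is worth remembering: the string 'false' is not the Boolean false, which is one reason the false() function exists.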

Number functions

The following number functions enable you to manipulate numbers and figure out what a number represents:

number number( object): Converts the argument to a number using the following conversions:
- string: Converted to a number if the string is a number like 1, -1.1, or 0.1. Otherwise a NaN is generated.
- boolean: A true value is converted to 1 and a false value is converted to 0.
- node-set: A double conversion occurs. The first conversion is to a string and the next is to a number, as per the string to number conversion rule.
- object: Value of the number is dependent on the implementation of the object.
number sum( node-set): Returns the sum of the nodes in the argument node-set. The numbers are generated by converting each node into a string, and the strings are converted to numbers. These numbers are added together. An example of adding numbers would be as follows:
```
<nums>
  <val>2</val>
  <val>2</val>
  <val>2</val>
  <val>2</val>
</nums>
```
and the XSLT:
```
<xsl:template match="nums">
  Sum (<xsl:value-of select="sum( child::*)" />)
</xsl:template>
```
If any of the nodes in the XML result in a NaN, the addition is also a NaN.
number floor( number): Returns the largest integer that is not greater than the argument. If the number were 2.345, then floor would return 2. But if the number were -2.345, then floor would return -3. The negative result is correct because -3 is the nearest integer below -2.345, whereas -2 would be larger than the argument.
number ceiling( number): Returns the smallest integer that is not less than the argument. If the number were 2.345, then ceiling would return 3. But if the number were -2.345, then ceiling would return -2.
number round( number): Returns the number that is closest to the argument and is an integer. If the number is on a split (for example, 0.5), then the integer closest to positive infinity is used. Therefore, a round of 1.5 is 2, and round of –1.5 is –1. If the number is 0 a round is 0, and if the number is –0 a round is –0. If the number is –0.5, then a round is –0.
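The rounding and conversion rules above can be checked with a sketch like the following, which uses only literal values and therefore runs against any well-formed XML input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- floor: 2 and -3 -->
    <xsl:value-of select="floor(2.345)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="floor(-2.345)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- ceiling: 3 and -2 -->
    <xsl:value-of select="ceiling(2.345)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="ceiling(-2.345)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- round: 2 and -1, because ties go toward positive infinity -->
    <xsl:value-of select="round(1.5)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="round(-1.5)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- number: true() becomes 1, a non-numeric string becomes NaN -->
    <xsl:value-of select="number(true())"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="number('abc')"/>
  </xsl:template>
</xsl:stylesheet>
```

Each result is written through xsl:value-of, so the number-to-string rules listed under the string function also apply to what appears in the output.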

Additional XSLT functions

The additional XSLT functions are functions that are not part of the XPath specification but are part of the XSLT specification. Typically, these functions are very specific to XSLT and are shown in the following list:

node-set document( object, node-set): Allows access to XML documents other than the main source document. For example, if the first parameter were a string, the string would be treated as a URI that references another XML document.
node-set key( string, object): The XPath id function is interesting in that it makes it possible to uniquely identify a node. But as was stated in the section defining the id function, this only works if the DTD and ID specifications are present. An XSLT processor is not required to recognize DTDs. This means that using the ID identifier is prone to error. Another way of declaring unique identifiers is to use a key. An XSL key is more flexible because it does not rely on a specific attribute. If we go back to the original XML document, the elements nodes had index attributes used to represent a unique element. Using the xsl:key instruction, a definition of a key would be as follows:
```
<xsl:key name="element-identifier" match="elements" use="@index" />
```
In this declaration, the name attribute defines the name of the key. The match attribute defines which XML nodes are to be matched, and the use attribute defines the expression whose value serves as the key for each matched node. The key is referenced using the following notation:
```
key( 'element-identifier', .)
```
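What is returned is a node-set that contains all nodes whose key value, as per our key definition, matches the value of the second parameter.
string format-number( number, string, string): Converts the first argument into a string using the format pattern given in the second parameter (for example, #,##0.00). The optional third parameter names a decimal format declared with the xsl:decimal-format instruction; if it is omitted, the default decimal format is used.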
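node-set current(): Returns the current node, that is, the node being processed by the enclosing XSLT instruction. At the outermost level of an XPath expression, this is the same node that the period abbreviated operator selects. However, the context node shifts as an XPath is evaluated, for example inside a predicate, so there the period refers to the node being tested while current still returns the original node.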
string unparsed-entity-uri( string): Returns the URI of the unparsed entity with a specific name in the document of the context node.
string generate-id( node-set): Generates a unique identifier for the first node in the node-set; if the parameter is omitted, the context node is used. The XSLT processor is free to generate any type of unique identifier.
object system-property( string): Retrieves information specific to the XSLT processor:
- xsl:version: Returns a version number of the XSLT implemented
- xsl:vendor: Returns a string identifying the vendor. The XSLT processor for XMLSPY is identified as follows: Altova GmbH & Altova, Inc.
- xsl:vendor-url: Returns a string that contains the URL of the vendor of the XSLT processor.
boolean element-available( string): When using extensions within the XSLT processor, it is often necessary to test if an element is available. The element represents an instruction, much like XSL, except that it is specific to the extension. The parameter input is a QName that represents an instruction. If the instruction can be processed, the return value is true; otherwise, a false value is returned.
boolean function-available( string): This function is identical in purpose to the element-available function, except that the test is for a function and not for an instruction. If the function can be processed, a return value of true is generated; otherwise, false is returned.
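To tie several of these functions together, here is a minimal sketch that reuses the element-identifier key declared above. It assumes a hypothetical source document `<catalog><elements index="1">first</elements><elements index="2">second</elements><ref to="2"/></catalog>`; the catalog and ref elements and the to attribute are invented for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Key declaration as shown earlier in this section -->
  <xsl:key name="element-identifier" match="elements" use="@index"/>

  <!-- Hypothetical ref element that points at an elements node by index -->
  <xsl:template match="ref">
    <!-- key(): selects the elements node whose @index equals this ref's @to -->
    <xsl:value-of select="key('element-identifier', @to)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- current(): inside the predicate "." is an elements node,
         while current() is still the ref being processed -->
    <xsl:value-of select="../elements[@index = current()/@to]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- format-number(): pattern with grouping separator and two decimals -->
    <xsl:value-of select="format-number(1234.5, '#,##0.00')"/>
  </xsl:template>

  <!-- Suppress the text of all other nodes so only the ref output appears -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

For the assumed input, the first two lines of output should both be `second`, and the third should be `1,234.50`. The key lookup and the predicate with current select the same node; the key form is usually preferable on larger documents because the processor can index the key values.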