The XPath String Operators and Functions

XPath 1.0 doesn't have any operators that are specially designed to work with strings, but it does have many string functions:

concat(string string1, string string2, ...) This function returns all the strings you pass to it concatenated (that is, joined) together.
contains(string string1, string string2) This function returns true if the first string contains the second one.
normalize-space(string string1) This function returns string1 after leading and trailing whitespace is stripped and each run of consecutive whitespace is replaced with a single space.
starts-with(string string1, string string2) This function returns true if the first string starts with the second string.
string(object object1) This function returns the argument you pass to it in string form.
string-length(string string1) This function returns the number of characters in string1.
substring(string string1, number offset, number length) This function returns length characters from the string, starting at offset. Character positions count from 1, and length is optional.
substring-after(string string1, string string2) This function returns the part of string1 after the first occurrence of string2.
substring-before(string string1, string string2) This function returns the part of string1 up to the first occurrence of string2.
translate(string string1, string string2, string string3) This function returns string1 with all occurrences of the characters that occur in string2 replaced by the corresponding characters (that is, characters that occur at the same position) in string3.

We'll take a look at these various string functions at work here.

The `concat` Function

The concat function concatenates (joins) as many strings together as you pass to it, returning the concatenated string. You can see XPath Visualiser evaluating concat("Now ", "is ", "the ", "time.") in Figure 4.4.

Figure 4.4. Using the `concat` function.

graphics/04fig04.jpg

Here are some XSLT templates that concatenate the value of various elements in ch04_01.xml with their units:

    <xsl:template match="radius">
        <xsl:value-of select="concat(., ' ', @units)"/>
    </xsl:template>
    <xsl:template match="mass">
        <xsl:value-of select="concat(., ' ', @units)"/>
    </xsl:template>
    <xsl:template match="day">
        <xsl:value-of select="concat(., ' ', @units)"/>
    </xsl:template>
    </xsl:stylesheet>

By concatenating the values of elements with their units, these templates display values such as 43.4 million miles, and so on.

The `contains` Function

The contains function checks to see if one string is contained inside another, and returns a value of true if so, false otherwise. Here's how you use this function: contains(container-string, contained-string).

Here's an example of a template using XSLT and the contains function; in this case, we'll check each element's units attribute for the text "days", and where it is found, we'll substitute the text "Why not use years instead?" in the result document:

    <xsl:template match="//*[contains(@units, 'days')]">
        <xsl:text>Why not use years instead?</xsl:text>
    </xsl:template>

Here's the result document:

    <HTML>
        <HEAD>
            <TITLE>
                Planetary Data
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Planetary Data
            </H1>
            <TABLE BORDER="2">
                <TR>
                    <TD>Name</TD>
                    <TD>Mass</TD>
                    <TD>Radius</TD>
                    <TD>Day</TD>
                    <TD>Distance</TD>
                </TR>
                <TR>
                    <TD>Mercury</TD>
                    <TD>.0553 (Earth = 1)</TD>
                    <TD>1516 miles</TD>
                    <TD>Why not use years instead?</TD>
                    <TD>43.4 million miles</TD>
                </TR>
                <TR>
                    <TD>Venus</TD>
                    <TD>.815 (Earth = 1)</TD>
                    <TD>3716 miles</TD>
                    <TD>Why not use years instead?</TD>
                    <TD>66.8 million miles</TD>
                </TR>
                <TR>
                    <TD>Earth</TD>
                    <TD>1 (Earth = 1)</TD>
                    <TD>2107 miles</TD>
                    <TD>Why not use years instead?</TD>
                    <TD>124.4 million miles</TD>
                </TR>
            </TABLE>
        </BODY>
    </HTML>

The `normalize-space` Function

You use the normalize-space function to remove leading and trailing whitespace and condense all internal adjacent whitespace into a single space, returning the resulting string. You can see the XPath Visualiser evaluating the expression normalize-space(" Now is the time. ") in Figure 4.5.

Figure 4.5. Using the `normalize-space` function.

graphics/04fig05.jpg

Here's an example that uses XSLT; in this case, we might start by adding extra whitespace to the units attribute in all of the ch04_01.xml document's <distance> elements:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xml" href="planets.xsl"?>
    <planets>
        <planet>
            <name>Mercury</name>
            <mass units="(Earth = 1)">.0553</mass>
            <day units="days">54.65</day>
            <radius units="miles">1516</radius>
            <density units="(Earth = 1)">.983</density>
            <distance units="million       miles">43.4</distance><!--At perihelion-->
        </planet>
            .
            .
            .

You can remove this extra whitespace in XSLT using the normalize-space function like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/planets">
            <HTML>
            .
            .
            .
            </HTML>
        </xsl:template>
        <xsl:template match="distance">
            <xsl:value-of select="."/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="normalize-space(@units)"/>
        </xsl:template>
            .
            .
            .

You can see in the results that the extra whitespace has indeed been removed:

    <HTML>
        <HEAD>
            <TITLE>
                Planetary Data
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Planetary Data
            </H1>
            <TABLE BORDER="2">
                <TR>
                    <TD>Name</TD>
                    <TD>Mass</TD>
                    <TD>Radius</TD>
                    <TD>Distance</TD>
                </TR>
                <TR>
                    <TD>Mercury</TD>
                    <TD>.0553 (Earth = 1)</TD>
                    <TD>1516 miles</TD>
                    <TD>43.4 million miles</TD>
                </TR>
                .
                .
                .

This function is useful in string handling because when you extract text from elements, you're often left with extra spaces (as when the text is indented).
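The behavior of normalize-space can be modeled outside XPath. The following is a minimal Python sketch, not an XPath engine: the helper name normalize_space is hypothetical, and the sketch assumes the XPath 1.0 definition of whitespace (space, tab, carriage return, and line feed only).

```python
import re

# XPath 1.0 whitespace is exactly: space, tab, carriage return, line feed.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s):
    """Model of normalize-space(): collapse whitespace runs, then trim the ends."""
    collapsed = _XPATH_WS.sub(" ", s)   # every run becomes one space
    return collapsed.strip(" ")         # drop the leading/trailing space, if any


# The padded units attribute from the <distance> element above.
print(normalize_space("million       miles"))       # million miles
print(normalize_space("  Now   is the\n time.  "))  # Now is the time.
```

A regular expression is used here instead of Python's str.split, because str.split also treats characters such as form feed as whitespace, which XPath 1.0 does not.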

The `starts-with` Function

You use the starts-with function to determine whether one string starts with another. Here's how you use it: starts-with(string-to-examine, possible-start-string). This function returns a Boolean value of true if string-to-examine does indeed start with possible-start-string, and false otherwise.

Here's an example using XPath Visualiser. In this case, we'll look for text nodes whose text starts with the letter "E" like this: //text()[starts-with(., "E")] (recall that . refers to the context node). You can see the results in Figure 4.6, where we've located the Earth.

Figure 4.6. Using the `starts-with` function.

graphics/04fig06.jpg

On the other hand, if we had wanted to locate elements whose text content starts with "E", we could have used this location path: //*[starts-with(., "E")]. This works because an element's string value is the concatenation of all the text it contains, which for the Earth's <name> element is simply "Earth". We can make use of that fact in an XSLT template that matches elements whose text content starts with "E", which means the Earth's <name> element. In this example, we'll replace that element's text with "The Home Planet" like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="*[starts-with(., 'E')]">
            <xsl:text>The Home Planet</xsl:text>
        </xsl:template>
            .
            .
            .
    </xsl:stylesheet>

And here's the result. Note that the text for Earth has indeed become "The Home Planet":

    <HTML>
        <HEAD>
            <TITLE>
                Planetary Data
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Planetary Data
            </H1>
            <TABLE BORDER="2">
                <TR>
                    <TD>Name</TD>
                    <TD>Mass</TD>
                    <TD>Radius</TD>
                    <TD>Day</TD>
                </TR>
                .
                .
                .
                <TR>
                    <TD>The Home Planet</TD>
                    <TD>1 (Earth = 1)</TD>
                    <TD>2107 miles</TD>
                    <TD>1 days</TD>
                </TR>
            </TABLE>
        </BODY>
    </HTML>
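Which elements a predicate such as starts-with(., 'E') selects depends on each element's string value. The following Python sketch models that using only the standard library; it is not an XPath engine. The two-element sample document is a trimmed, hypothetical stand-in for ch04_01.xml, and string_value is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for the planets document (indented as usual).
doc = ET.fromstring("""<planets>
    <planet>
        <name>Earth</name>
        <mass units="(Earth = 1)">1</mass>
    </planet>
</planets>""")


def string_value(elem):
    """String value of an element: all descendant text, in document order."""
    return "".join(elem.itertext())


# Model of //*[starts-with(., 'E')]: test every element's string value.
matches = [e.tag for e in doc.iter() if string_value(e).startswith("E")]
print(matches)  # ['name']
```

In an indented document, the string values of <planets> and <planet> begin with the newline and spaces used for indentation, so only the <name> element passes the test. That is consistent with the result above, where only the name cell is replaced.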

The `string` Function

The string function just converts the item you pass it to a string, and returns that string. In fact, you don't usually need to use this function, because conversions like this are made automatically. Even when an object is returned by an XPath function, it's converted automatically into a string if you want to display its value.

SHOP TALK: THE `STRING` FUNCTION

The truth is that the string function has almost no uses in XPath 1.0. The only use for it I've ever found is when you want to check a string value but node tests would give you a node-set instead.

Here's how that might work, using XSLT. Say that you wanted to keep track of the order of the singers at various performances, and so had <name> elements for the opera stars Mike, Todd, and Songlin, giving their singing order at various performances like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <performances>
        <performance>
            <name>Mike</name>
            <name>Todd</name>
            <name>Songlin</name>
            <month>12</month>
            <day>24</day>
        </performance>
        <performance>
            <name>Todd</name>
            <name>Songlin</name>
            <name>Mike</name>
            <month>12</month>
            <day>24</day>
        </performance>
        <performance>
            <name>Songlin</name>
            <name>Mike</name>
            <name>Todd</name>
            <month>12</month>
            <day>24</day>
        </performance>
    </performances>

Now what if you wanted to find the performance where Songlin was to sing first? A test like the following won't work, because name returns a node-set of all the context node's <name> children, and because every <performance> element has a child <name> element with the name Songlin in it, this test will always be true:

    <xsl:template match="performance">
        <xsl:if test="name='Songlin'">
            <TR>
                <TD><xsl:value-of select="name"/></TD>
                <TD><xsl:apply-templates select="month"/></TD>
                <TD><xsl:apply-templates select="day"/></TD>
            </TR>
        </xsl:if>
    </xsl:template>

Instead of working with a node-set, if you want to test only the first <name> element in each <performance> element, you can use the string function. When you pass it a node-set, string returns the string value of only the first node in document order, as a string rather than a node-set:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/performances">
            <HTML>
            .
            .
            .
            </HTML>
        </xsl:template>
        <xsl:template match="performance">
            <xsl:if test="string(name)='Songlin'">
                <TR>
                    <TD><xsl:value-of select="name"/></TD>
                    <TD><xsl:apply-templates select="month"/></TD>
                    <TD><xsl:apply-templates select="day"/></TD>
                </TR>
            </xsl:if>
        </xsl:template>
        <xsl:template match="month">
            <xsl:value-of select="."/>
        </xsl:template>
            .
            .
            .
        <xsl:template match="day">
            <xsl:value-of select="."/>
        </xsl:template>
    </xsl:stylesheet>

Having said all this, however, note that you can solve the same problem by explicitly matching to name[1] in a template without having to use the string function at all.
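The difference between the two tests in this example can be modeled in a few lines of Python. This is only a sketch of the XPath 1.0 comparison rules, not an XPath engine: each node-set is represented as a plain list of string values, and the helper names nodeset_equals and xpath_string are hypothetical.

```python
def nodeset_equals(values, target):
    """node-set = string: true if ANY node's string value equals the string."""
    return any(v == target for v in values)


def xpath_string(values):
    """string(node-set): string value of the FIRST node, or "" if it is empty."""
    return values[0] if values else ""


# The <name> children of each <performance>, in document order.
performances = [
    ["Mike", "Todd", "Songlin"],
    ["Todd", "Songlin", "Mike"],
    ["Songlin", "Mike", "Todd"],
]

any_match = [nodeset_equals(p, "Songlin") for p in performances]
first_match = [xpath_string(p) == "Songlin" for p in performances]

print(any_match)    # [True, True, True]   like test="name='Songlin'"
print(first_match)  # [False, False, True] like test="string(name)='Songlin'"
```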

The `string-length` Function

The string-length function returns the length of a string you pass to it. You can see an example in XPath Visualiser in Figure 4.7, where we're checking the length of the string "Now is the time."

Figure 4.7. Using the `string-length` function.

graphics/04fig07.jpg

You can see an example using this function in XSLT, where we're using string-length to determine the length of each planet's name, in ch04_03.xsl (see Listing 4.2).

Listing 4.2 Using the `string-length` Function (`ch04_03.xsl`)

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/planets">
            <HTML>
                <HEAD>
                    <TITLE>
                        Using string-length
                    </TITLE>
                </HEAD>
                <BODY>
                    <H1>
                        Using string-length
                    </H1>
                    <xsl:apply-templates/>
                </BODY>
            </HTML>
        </xsl:template>
        <xsl:template match="planet">
            <xsl:value-of select="name"/> is <xsl:value-of select="string-length(name)"/> letters in length.
            <BR/>
        </xsl:template>
        <xsl:template match="*">
        </xsl:template>
    </xsl:stylesheet>

And here's the result of ch04_03.xsl when applied to ch04_01.xml:

    <HTML>
        <HEAD>
            <TITLE>
                Using string-length
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Using string-length
            </H1>
            Mercury is 7 letters in length.
            <BR>
            Venus is 5 letters in length.
            <BR>
            Earth is 5 letters in length.
            <BR>
        </BODY>
    </HTML>
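The lengths above are counts of characters, not bytes. A small Python sketch of the same counting follows; it is a model only, the helper name string_length is hypothetical, and the accented sample word is not part of ch04_01.xml.

```python
def string_length(s):
    """Model of string-length(): the number of characters in the string."""
    return len(s)  # Python's len() on a str counts characters, not bytes


for name in ["Mercury", "Venus", "Earth"]:
    print(name, "is", string_length(name), "letters in length.")

# A character outside ASCII still counts once, however many bytes it needs.
accented = "Ma\u00f1ana"
print(string_length(accented))         # 6
print(len(accented.encode("utf-8")))   # 7
```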

The `substring` Function

The substring function returns a substring from a string. It returns the part of the source string that starts at the starting position and continues for the number of characters you've specified, or to the end of the string if you haven't specified a number of characters. Here's how you use this function: substring(source-string, start-position, number-of-characters). You pass this function a source-string, a start-position (the first character is at position 1), and, optionally, a number-of-characters. If you ask for more characters than the string contains, no error occurs; you simply get the characters up to the end of the string.

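You can see an example in the XPath Visualiser in Figure 4.8, where we're evaluating the expression substring("Now is the time.", 0, 3). Because positions are counted from 1, position 0 falls before the start of the string, so under the XPath 1.0 rules this expression returns "No" rather than "Now".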

Figure 4.8. Using the `substring` function.

graphics/04fig08.jpg

The substring function is one of three substring functions: substring-before, which returns the string preceding a matched substring; substring itself, which returns substrings that you specify; and substring-after, which returns the substring following a match. We'll see an XSLT example that uses all three functions after taking a look at the other two, substring-after and substring-before.
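The position arithmetic behind substring is easy to get wrong, so here is a minimal Python model of the XPath 1.0 rule: a character at position p (counting from 1) is returned when p is at least round(start) and less than round(start) + round(length). The helper names xpath_round and xpath_substring are hypothetical, and the sketch assumes finite numeric arguments (no NaN or infinity).

```python
import math


def xpath_round(x):
    """Model of XPath round(): nearest integer, ties toward positive infinity."""
    return math.floor(x + 0.5)


def xpath_substring(s, start, length=None):
    """Model of substring(): keep characters at 1-based positions in [first, last)."""
    first = xpath_round(start)
    if length is None:
        last = len(s) + 1                    # no length: run to the end of the string
    else:
        last = first + xpath_round(length)   # exclusive upper bound
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < last)


print(xpath_substring("12345", 2, 3))             # 234
print(xpath_substring("12345", 2))                # 2345
print(xpath_substring("Now is the time.", 0, 3))  # No
print(xpath_substring("Earth", 4, 100))           # th
```

The last call asks for more characters than remain and simply returns what is there.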

The `substring-after` Function

The substring-after function returns the substring following a matched string. You pass this function a source string, and a string to match inside the source string. It will return the substring of the source string following the match if there was a match, or an empty string (that is, "") otherwise. Here's how you use this function:

    substring-after(string, string-to-match)

The `substring-before` Function

You can pass substring-before a source string, and a string to match inside the source string. It will return the substring in the source string preceding the match if there is a match; otherwise, it returns an empty string (that is, ""). Here's how you use substring-before :

    substring-before(string, string-to-match)

You can see an XSLT example showing how to use the substring-before, substring, and substring-after functions in ch04_04.xsl (Listing 4.3). Here, we'll use substring-before to get the substring of "Earth" before the "r", the substring function to get the "r" itself, and substring-after to get the text after the "r".

Listing 4.3 Using the `substring-before`, `substring`, and `substring-after` Functions (`ch04_04.xsl`)

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.1" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="/planets">
            <HTML>
                <HEAD>
                    <TITLE>
                        Planetary Information
                    </TITLE>
                </HEAD>
                <BODY>
                    <H1>
                        Planetary Information
                    </H1>
                    <xsl:apply-templates/>
                </BODY>
            </HTML>
        </xsl:template>
        <xsl:template match="planet">
            <xsl:if test="name='Earth'">
                You are on
                <xsl:value-of
                    select="concat(substring-before(name, 'r'), substring(name, 3, 1),
                            substring-after(name, 'r'))"/>.
                <BR/>
            </xsl:if>
        </xsl:template>
        <xsl:template match="*">
        </xsl:template>
    </xsl:stylesheet>

Here's the result, where we've reassembled the Earth from its parts:

    <HTML>
        <HEAD>
            <TITLE>
                Planetary Information
            </TITLE>
        </HEAD>
        <BODY>
            <H1>
                Planetary Information
            </H1>
            You are on Earth.
            <BR>
        </BODY>
    </HTML>
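The way the three pieces fit back together can be traced with a short Python model. This is a sketch of the behavior described above, not an XPath engine, and the helper names substring_before and substring_after are hypothetical.

```python
def substring_before(s, match):
    """Model of substring-before(): text before the first match, or "" if none."""
    i = s.find(match)
    return s[:i] if i != -1 else ""


def substring_after(s, match):
    """Model of substring-after(): text after the first match, or "" if none."""
    i = s.find(match)
    return s[i + len(match):] if i != -1 else ""


name = "Earth"
parts = (
    substring_before(name, "r"),  # "Ea"
    name[2:3],                    # "r": Python's 0-based slice for substring(name, 3, 1)
    substring_after(name, "r"),   # "th"
)
print(parts)           # ('Ea', 'r', 'th')
print("".join(parts))  # Earth

# With no match, both functions return the empty string.
print(repr(substring_before(name, "z")))  # ''
```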

The `translate` Function

You use the translate function to translate characters. You pass three strings: the first is the string to work on, the next is a list of characters to match, and the last is a list of characters to replace the matched characters with. Each character in the first string that matches a character in the match string is replaced with the character in the same position in the replace string. If the match string is longer than the replace string, matched characters that have no counterpart are removed. Here's how you use this function:

    translate(string, from-characters, to-characters)

For example, to convert "XSLT" (or any string) to lowercase, you could evaluate the expression translate("XSLT", "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), as you see in XPath Visualiser in Figure 4.9.

Figure 4.9. Using the `translate` function.

graphics/04fig09.jpg
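The character-by-character mapping that translate performs can be sketched in Python. This is a model of the XPath 1.0 rules rather than an XPath engine, and the helper name xpath_translate is hypothetical. It assumes two details from the XPath 1.0 definition: a character listed in the match string with no counterpart in the replace string is removed, and if a character appears more than once in the match string, its first occurrence decides the replacement.

```python
def xpath_translate(s, from_chars, to_chars):
    """Model of translate(): map each character by its position in from_chars."""
    mapping = {}
    for i, ch in enumerate(from_chars):
        if ch in mapping:
            continue  # first occurrence in from_chars wins
        # None marks "no counterpart in to_chars", which means delete.
        mapping[ch] = to_chars[i] if i < len(to_chars) else None

    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)           # unmatched characters pass through
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # matched characters are replaced
    return "".join(out)


UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

print(xpath_translate("XSLT", UPPER, LOWER))      # xslt
print(xpath_translate("bar", "abc", "ABC"))       # BAr
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
```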
