Some Convenient RegEx UDFs


The listings for this chapter include a file called RegExFunctions.cfm, which creates several user-defined functions (UDFs) that might come in handy when you're working with regular expressions. To use the library, simply <cfinclude> it in your own templates. Table 13.5 lists the functions included in this simple UDF library.

Table 13.5. Functions in UDF Library RegExFunctions.cfm

FUNCTION                       DESCRIPTION

reFindString()                 Performs a regular expression match and returns the matched string. Returns an empty string if no match is found. This is a shortcut for using the pos[1] and len[1] values as shown in Listing 13.3.

reFindMatches()                Performs a regular expression match and returns a query object that contains a row for every substring that matched. This is a shortcut for the looping technique shown in Listing 13.5.

adjustNewlinesToLinefeeds()    Replaces any CRLF or CR sequences in a chunk of text with LF characters. This is handy when using multiline mode with (?m). Accepts just one argument, str, as shown in the section "Understanding Multiline Mode" in this chapter.
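
As a rough illustration of the last function in the table, the following sketch normalizes a string that uses Windows-style line endings and then runs a multiline search against it. The variable names (rawText, cleanText, secondLine) and the sample text are invented for this example, and it assumes RegExFunctions.cfm sits in the same directory as the calling template.

```cfml
<!--- Sketch only: sample text and variable names are hypothetical --->
<cfinclude template="RegExFunctions.cfm">

<!--- Build a three-line string that uses CRLF line endings --->
<cfset crlf = chr(13) & chr(10)>
<cfset rawText = "first line" & crlf & "second line" & crlf & "third line">

<!--- Convert CRLF and CR sequences to plain LF characters --->
<cfset cleanText = adjustNewlinesToLinefeeds(rawText)>

<!--- With (?m), ^ and $ can match at the start and end of each line --->
<cfset secondLine = reFindString("(?m)^second.*$", cleanText)>

<cfoutput><p>Line found: #secondLine#</p></cfoutput>
```

Normalizing the text first means the pattern does not have to account for stray CR characters at the end of each line.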


Using reFindString()

The reFindString() UDF takes two required arguments and two optional arguments:

 reFindString(regex, string [, start] [, casesensitive]) 

The required regex and string arguments are the regular expression to use and the chunk of text to search, respectively. The optional start argument is the text position at which to start the search (the default is 1). The optional casesensitive argument is a Boolean value that indicates whether the search should be case-sensitive (the default is False, meaning the search ignores case).

The function returns the matched string. If no match is found, it returns an empty string. Listing 13.10 is a simple example that shows how the function might be used. It is similar to Listing 13.3, except that it uses the UDF instead of calling reFind() directly.

Listing 13.10. RegExFindEmail2a.cfm: Using the reFindString() Function from the UDF Library
 <!---
  Filename: RegExFindEmail2a.cfm
  Author: Nate Weiss (NMW)
  Purpose: Demonstrates use of RegExFunctions.cfm library
 --->
 <html>
 <head><title>Using a Regular Expression</title></head>
 <body>
 <!--- Include UDF library of regular expression functions --->
 <!--- This allows us to use the REFindString() function --->
 <cfinclude template="RegExFunctions.cfm">
 <!--- The text to search --->
 <cfset text = "My email address is nate@nateweiss.com. Write to me anytime.">
 <!--- Attempt to find a match --->
 <cfset matchedString = reFindString("([\w._]+)\@([\w_]+(\.[\w_]+)+)", text)>
 <!--- Display the result --->
 <cfif matchedString neq "">
  <cfoutput><p>A match was found: #matchedString#</p></cfoutput>
 <cfelse>
  <p>No matches were found.</p>
 </cfif>
 </body>
 </html>
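
Listing 13.10 relies on the default values for the two optional arguments. The sketch below shows one way the start and casesensitive arguments might be used. The sample text and the variable names (sampleText, firstNumber, secondNumber, strictMatch, looseMatch) are invented for illustration, and the example assumes RegExFunctions.cfm is in the same directory.

```cfml
<!--- Sketch only: sample text and variable names are hypothetical --->
<cfinclude template="RegExFunctions.cfm">

<cfset sampleText = "Call 555-1234 or 555-9876 today.">

<!--- Defaults: start at position 1, ignore case --->
<cfset firstNumber = reFindString("\d{3}-\d{4}", sampleText)>

<!--- Start at position 14, just past the first phone number --->
<cfset secondNumber = reFindString("\d{3}-\d{4}", sampleText, 14)>

<!--- Case-sensitive search: "call" does not match "Call" --->
<cfset strictMatch = reFindString("call", sampleText, 1, true)>

<!--- Case-insensitive search (the default): "call" matches "Call" --->
<cfset looseMatch = reFindString("call", sampleText, 1, false)>

<cfoutput>
 <p>First: #firstNumber#, second: #secondNumber#</p>
 <p>Strict: [#strictMatch#], loose: [#looseMatch#]</p>
</cfoutput>
```

Because the optional arguments are positional, a value for start has to be supplied whenever casesensitive is passed.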

Using reFindMatches()

The reFindMatches() UDF takes two required arguments and two optional arguments, as follows:

 reFindMatches(regex, string [, casesensitive] [, subexprcolumnnames]) 

The required regex and string arguments are the regular expression to use and the chunk of text to search, respectively. The optional casesensitive argument is a Boolean value that indicates whether the search should be case-sensitive (the default is False, meaning the search ignores case). The optional subexprcolumnnames argument is a comma-separated list of column names to assign to the matches of the RegEx's subexpressions.

The function returns a query object with the following column names:

  • Found. The actual matched text.

  • Len. The length of the matched text.

  • Pos. The position at which the matched text was found.

  • SubExprCount. The number of subexpressions in the RegEx. This value is the same for all rows of the query object.

The query object will also contain an additional column for each subexpression in the RegEx (the number of additional columns is available in the SubExprCount column). If you provide a subexprcolumnnames argument, the additional columns will be named accordingly. If no subexprcolumnnames argument is provided, the additional columns will be named SubExpr1, SubExpr2, and so on. Just like any other query, the column names are available in the automatic ColumnList property of the query object.

The main purpose of this function is to make it easier to work with regular expressions that might match multiple substrings within a chunk of text. Rather than having to work carefully with the pos and len arrays to advance the value passed to reFind()'s start argument, you can use this function to get a query object full of all matches in one shot.

For instance, consider the following line, which would match boldfaced areas within HTML-formatted text:

 <cfset matchQuery = reFindMatches("<b>(.*?)</b>", text)> 

This would create a query object called matchQuery that could be used just like any other query object. For instance, the following snippet would display all the matches found:

 <cfoutput query="matchQuery">
  A match was found at position #Pos#: #Found#<br>
 </cfoutput>
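
Since the pattern above contains one subexpression and no column names are supplied, the query should also carry a SubExpr1 column holding the text between the tags. The following sketch pulls those pieces together; the sample text is invented, and htmlEditFormat() is used only so that the matched tags are displayed rather than rendered by the browser.

```cfml
<!--- Sketch only: the sample text is hypothetical --->
<cfinclude template="RegExFunctions.cfm">

<cfset text = "Use <b>bold</b> for emphasis and <b>more bold</b> for warnings.">
<cfset matchQuery = reFindMatches("<b>(.*?)</b>", text)>

<!--- The query behaves like any other: recordCount and columnList are available --->
<cfoutput>
 <p>#matchQuery.recordCount# matches; columns: #matchQuery.columnList#</p>
</cfoutput>

<!--- Found holds the whole match; SubExpr1 holds the first subexpression --->
<cfoutput query="matchQuery">
 Full match: #htmlEditFormat(Found)#, inner text: #SubExpr1#<br>
</cfoutput>
```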

You can also specify column names for subexpressions. This line would match simple text links within HTML-formatted text (the pattern is wrapped in single quotes because it contains double quotes):

 reFindMatches('<a[^>]+href="([^"]*)"[^>]*>([^<]*)</a>', text, True, "url,link") 

In this case, the resulting query would include additional columns called url and link (in addition to the usual Found, Len, Pos, and SubExprCount columns) that correspond to the URL and linked text portions of each match.
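
A minimal sketch of reading named subexpression columns follows. The HTML fragment is invented, and the column names href and label are hypothetical choices made for this example (picked so that the column names cannot be confused with the built-in URL scope); any valid column names could be passed instead.

```cfml
<!--- Sketch only: sample HTML and column names are hypothetical --->
<cfinclude template="RegExFunctions.cfm">

<!--- Single quotes let the string contain double quotes unescaped --->
<cfset html = '<p>Visit <a href="http://www.example.com/">Example</a> or <a href="/help.cfm">the help page</a>.</p>'>

<!--- Name the two subexpressions: the link target and the link text --->
<cfset linkQuery = reFindMatches('<a[^>]+href="([^"]*)"[^>]*>([^<]*)</a>', html, false, "href,label")>

<!--- Each row exposes the named columns alongside Found, Len, and Pos --->
<cfoutput query="linkQuery">
 "#label#" points to #href# (match starts at position #Pos#)<br>
</cfoutput>
```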

Listing 13.11 provides the code for the RegExFunctions.cfm UDF library.

Listing 13.11. RegExFunctions.cfm: A UDF Library for Working with Regular Expressions
 <!---
  Filename: RegExFunctions.cfm
  Author: Nate Weiss (NMW)
  Purpose: Implements a UDF library for working with regular expressions
 --->
 <!--- REFindString() function --->
 <cffunction name="reFindString" returntype="string" output="false">
  <!--- Function arguments --->
  <cfargument name="regEx" type="string" required="yes">
  <cfargument name="string" type="string" required="yes">
  <cfargument name="start" type="numeric" required="no" default="1">
  <cfargument name="caseSensitive" type="boolean" required="no" default="no">
  <!--- The value to return (start off with an empty string) --->
  <cfset var result = "">
  <cfset var foundStruct = "">
  <!--- Perform the regular expression operation --->
  <cfif ARGUMENTS.caseSensitive>
   <cfset foundStruct = reFind(regEx, string, start, true)>
  <cfelse>
   <cfset foundStruct = reFindNoCase(regEx, string, start, true)>
  </cfif>
  <!--- If a match was found, use the found string as the result --->
  <cfif foundStruct.pos[1] gt 0>
   <cfset result = mid(string, foundStruct.pos[1], foundStruct.len[1])>
  </cfif>
  <!--- Return the result --->
  <cfreturn result>
 </cffunction>
 <!--- REFindMatches() function --->
 <cffunction name="reFindMatches" returntype="query" output="false">
  <!--- Function arguments --->
  <cfargument name="regex" type="string" required="yes">
  <cfargument name="string" type="string" required="yes">
  <cfargument name="caseSensitive" type="boolean" required="no" default="no">
  <cfargument name="subExprColumnNames" type="string" required="no" default="">
  <!--- Local variables (visible to this function only) --->
  <cfset var queryColNames = "Found,Len,Pos,SubExprCount">
  <cfset var result = queryNew(queryColNames)>
  <cfset var startPos = 1>
  <cfset var regExMatch = "">
  <cfset var subExprMatch = "">
  <cfset var thisColName = "">
  <cfset var numSubexpressions = 0>
  <cfset var subExpColNames = "">
  <cfset var i = 0>
  <!--- Begin looping: this continues looping until a <CFBREAK> tag --->
  <!--- The first time through this loop will find the first match, --->
  <!--- the second iteration will find the second match, and so on. --->
  <cfloop condition="true">
   <!--- Perform the actual regular expression search --->
   <!--- Use the case sensitive or insensitive function as appropriate --->
   <cfif ARGUMENTS.caseSensitive>
    <cfset regExMatch = reFind(regEx, string, startPos, "Yes")>
   <cfelse>
    <cfset regExMatch = reFindNoCase(regEx, string, startPos, "Yes")>
   </cfif>
   <!--- If a match was found --->
   <cfif regExMatch.len[1] gt 0>
    <!--- If this is the first time through the loop --->
    <cfif startPos eq 1>
     <!--- How many subexpressions are in the regular expression? --->
     <cfset numSubexpressions = arrayLen(regExMatch.pos)-1>
     <!--- If there are subexpressions... --->
     <cfif numSubexpressions gt 0>
      <cfset subExpColNames = arrayNew(1)>
      <!--- We will add a column to the query for each subexpression --->
      <cfloop from="1" to="#numSubexpressions#" index="i">
       <!--- If possible to use SubExprColumnNames argument --->
       <cfif i lte listLen(ARGUMENTS.subExprColumnNames)>
        <cfset thisColName = listGetAt(ARGUMENTS.subExprColumnNames, i)>
       <!--- Otherwise, use name like SubExpr1, SubExpr2, etc --->
       <cfelse>
        <cfset thisColName = "SubExpr#i#">
       </cfif>
       <cfset arrayAppend(subExpColNames, thisColName)>
      </cfloop>
      <!--- Re-create query object with the new list of column names --->
      <cfset queryColNames =
       listAppend(queryColNames, arrayToList(subExpColNames))>
      <cfset result = queryNew(queryColNames)>
     </cfif>
    </cfif>
    <!--- Add a row to the Result query --->
    <cfset queryAddRow(result, 1)>
    <cfset querySetCell(result, "Pos", regExMatch.pos[1])>
    <cfset querySetCell(result, "Len", regExMatch.len[1])>
    <cfset querySetCell(result, "Found",
     mid(string, regExMatch.pos[1], regExMatch.len[1]))>
    <cfset querySetCell(result, "SubExprCount", numSubexpressions)>
    <!--- If there are subexpressions... --->
    <cfif numSubexpressions gt 0>
     <!--- For each subexpression --->
     <cfloop from="1" to="#numSubexpressions#" index="i">
      <!--- If this subexpression matched (it may not have --->
      <!--- matched if the subexpression uses ? to be optional) --->
      <cfif regExMatch.pos[i+1] gt 0>
       <cfset subExprMatch =
        mid(string, regExMatch.pos[i+1], regExMatch.len[i+1])>
      <!--- If there is no match, use an empty string --->
      <cfelse>
       <cfset subExprMatch = "">
      </cfif>
      <!--- Place the value into the appropriate subexpression column --->
      <cfset querySetCell(result, subExpColNames[i], subExprMatch)>
     </cfloop>
    </cfif>
    <!--- Advance the StartPos variable, so that the next --->
    <!--- iteration of the loop will start right after this match --->
    <cfset startPos = regExMatch.pos[1] + regExMatch.len[1]>
   <!--- If no match was found, then our work here is done --->
   <cfelse>
    <cfbreak>
   </cfif>
  </cfloop>
  <!--- Return the completed query object --->
  <cfreturn result>
 </cffunction>
 <!--- AdjustNewlinesToLinefeeds() function --->
 <cffunction name="adjustNewlinesToLinefeeds" returntype="string" output="false">
  <!--- argument: string --->
  <cfargument name="string" type="string" required="yes">
  <!--- Replace all CRLF sequences with just LF --->
  <cfset var result = reReplace(string, chr(13)&chr(10), chr(10), "ALL")>
  <!--- Replace any remaining CR characters with LF --->
  <!--- (operate on result, not string, so the first replacement is kept) --->
  <cfset result = reReplace(result, chr(13), chr(10), "ALL")>
  <!--- Return the result --->
  <cfreturn result>
 </cffunction>

NOTE

For information about the <cffunction>, <cfargument>, and <cfreturn> code used here, see Chapter 19, "Creating Advanced ColdFusion Components."


NOTE

These UDFs could have been implemented using <cfscript> and function syntax rather than <cffunction> blocks. If you want the functions to be compatible with ColdFusion 5, it would be a simple task to convert the functions to the older syntax as described in Chapter 12, "ColdFusion Scripting."
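
To give a sense of what such a conversion might look like, here is a rough sketch of reFindString() rewritten in the older <cfscript> function syntax. The name reFindStringScript and the parameter name str are hypothetical, chosen so the sketch does not collide with the <cffunction> version; the optional arguments are read by position from the arguments array, since this syntax has no <cfargument> defaults. Treat it as an illustration under those assumptions rather than as part of the chapter's listings.

```cfml
<cfscript>
// Sketch only: a script-syntax variant of reFindString().
// reFindStringScript and str are hypothetical names.
function reFindStringScript(regEx, str) {
  // Local variables must be declared before any other statements
  var start = 1;
  var caseSensitive = false;
  var foundStruct = "";
  var result = "";

  // Optional third and fourth arguments are read by position
  if (arrayLen(arguments) gte 3) start = arguments[3];
  if (arrayLen(arguments) gte 4) caseSensitive = arguments[4];

  // Perform the search, returning the pos and len arrays
  if (caseSensitive)
    foundStruct = reFind(regEx, str, start, true);
  else
    foundStruct = reFindNoCase(regEx, str, start, true);

  // If a match was found, extract the matched text
  if (foundStruct.pos[1] gt 0)
    result = mid(str, foundStruct.pos[1], foundStruct.len[1]);

  return result;
}
</cfscript>

<!--- Usage: a deliberately simple pattern that avoids Perl-style classes --->
<cfoutput>
 #reFindStringScript("[A-Za-z]+@[A-Za-z.]+", "Write to nate@nateweiss.com")#
</cfoutput>
```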




Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
Year: 2006
Pages: 240
Authors: Ben Forta, et al