Functions and CALL Routines


Definitions of Functions and CALL Routines

Definition of Functions

A SAS function performs a computation or system manipulation on arguments and returns a value. Most functions use arguments supplied by the user, but a few obtain their arguments from the operating environment.

In Base SAS software, you can use SAS functions in DATA step programming statements, in a WHERE expression, in macro language statements, in PROC REPORT, and in Structured Query Language (SQL).

Some statistical procedures also use SAS functions. In addition, some other SAS software products offer functions that you can use in the DATA step. Refer to the documentation that pertains to the specific SAS software product for additional information about these functions.

Definition of CALL Routines

A CALL routine alters variable values or performs other system functions. CALL routines are similar to functions, but differ from functions in that you cannot use them in assignment statements.

All SAS CALL routines are invoked with CALL statements; that is, the name of the routine must appear after the keyword CALL on the CALL statement.
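As a minimal sketch of the difference (the data set WORK.FC_DEMO and its variable names are hypothetical), the following DATA step uses the MAX function on the right side of an assignment statement. It then uses the CALL MISSING routine, which is invoked with a CALL statement and changes the values of its arguments in place:

```sas
data work.fc_demo;
   /* A function returns a value, so it can appear in an assignment statement. */
   x = max(3, 7, 5);

   /* A CALL routine is invoked with a CALL statement and alters its arguments. */
   a = 'first';
   b = 'second';
   call missing(a, b);    /* A and B are now set to missing (blank) */

   put x= a= b=;
run;
```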

Syntax of Functions and CALL Routines

Syntax of Functions

The syntax of a function is as follows:

function-name (argument-1<, ...argument-n>)

function-name (OF variable-list)

function-name (OF array-name{*})

Here is an explanation of the syntax:

function-name

  • names the function.

argument

  • can be a variable name, constant, or any SAS expression, including another function. The number and kind of arguments that SAS allows are described with individual functions. Multiple arguments are separated by a comma.

  • Tip: If the value of an argument is invalid (for example, missing or outside the prescribed range), then SAS writes a note to the log indicating that the argument is invalid, sets _ERROR_ to 1, and sets the result to a missing value.

  • Examples:

    • x=max(cash,credit);

    • x=sqrt(1500);

    • NewCity=left(upcase(City));

    • x=min(YearTemperature-July,YearTemperature-Dec);

    • s=repeat('----+',16);

    • x=min((enroll-drop),(enroll-fail));

    • dollars=int(cash);

    • if sum(cash,credit)>1000 then put 'Goal reached';

variable-list

  • can be any form of a SAS variable list, including individual variable names. If more than one variable list appears, separate them with a space or with a comma and another OF.

  • Examples:

    • a=sum(of x y z);

    • z=sum(of y1-y10);

    • The following two examples are equivalent:

      a=sum(of x1-x10 y1-y10 z1-z10);

      a=sum(of x1-x10, of y1-y10, of z1-z10);

array-name {*}

  • names a currently defined array. Specifying an array in this way causes SAS to treat the array as a list of the variables instead of processing only one element of the array at a time.

  • Examples:

    • array y{10} y1-y10;

      x=sum(of y{*});
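The three syntax forms can be compared in one DATA step. This is a sketch with hypothetical names (WORK.FORMS, S_ARGS, S_LIST, S_ARRAY); each statement computes the sum of Y1, Y2, and Y3, so all three results should be 60:

```sas
data work.forms;
   array y{3} y1-y3 (10 20 30);    /* initial values 10, 20, 30 for Y1-Y3 */
   s_args  = sum(y1, y2, y3);      /* individual arguments separated by commas */
   s_list  = sum(of y1-y3);        /* OF followed by a variable list */
   s_array = sum(of y{*});         /* OF followed by an array reference */
run;
```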

Syntax of CALL Routines

The syntax of a CALL routine is as follows:

CALL routine-name (argument-1<, ...argument-n>);

CALL routine-name (OF variable-list);

CALL routine-name (argument-1 | OF variable-list-1 <, ...argument-n | OF variable-list-n>);

Here is an explanation of the syntax:

routine-name

  • names a SAS CALL routine.

argument

  • can be a variable name, a constant, any SAS expression, an external module name, an array reference, or a function. Multiple arguments are separated by a comma. The number and kind of arguments that are allowed are described with individual CALL routines in the SAS Language Reference: Dictionary.

  • Examples:

    • call rxsubstr (rx,string,position);

    • call set(dsid);

    • call ranbin(Seed_1,n,p,X1);

    • call label(abc{j},lab);

variable-list

  • can be any form of a SAS variable list, including variable names. If more than one variable list appears, separate them with a space or with a comma and another OF.

  • Examples:

    • call cats(inventory, of y1-y15, of z1-z15);

      call catt(of item17-item23 pack17-pack23);
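The following sketch (hypothetical variable names) passes an ordinary argument followed by an OF variable list to CALL CATS, which removes leading and trailing blanks from the values and appends them to its first argument:

```sas
data work.callcat;
   length result $ 20;                /* hypothetical result variable */
   a1 = 'x';
   a2 = ' y ';                        /* leading and trailing blanks are removed */
   a3 = 'z';
   result = 'Item:';
   /* One ordinary argument (RESULT) followed by an OF variable list. */
   call cats(result, of a1-a3);
run;
```

After the step runs, RESULT should contain Item:xyz.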

Using Functions

Restrictions on Function Arguments

If the value of an argument is invalid, then SAS writes a message to the log and sets the result to a missing value. Here are some common restrictions on function arguments:

  • Some functions require that their arguments be restricted within a certain range. For example, the argument of the LOG function must be greater than 0.

  • Most functions do not permit missing values as arguments. Exceptions include some of the descriptive statistics functions and financial functions.

  • The allowed range of the arguments can be platform-dependent, as with the EXP function.

  • For some probability functions, combinations of extreme values can cause convergence problems.
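A minimal sketch of an invalid argument, using the LOG function from the list above (the data set and variable names are hypothetical). As described earlier, SAS writes a note to the log and returns a missing value for the invalid call, while the valid call returns a normal result:

```sas
data work.invalid;
   x = log(0);       /* invalid: LOG requires an argument greater than 0, so X is missing */
   y = log(1);       /* valid: Y is 0 */
   err = _error_;    /* copy of _ERROR_, which SAS sets to 1 for the invalid argument */
run;
```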

Notes on Descriptive Statistic Functions

SAS provides functions that return descriptive statistics. Except for the MISSING function, the functions correspond to the statistics produced by the MEANS procedure. The computing method for each statistic is discussed in 'SAS Elementary Statistics Procedures' in Base SAS Procedures Guide. SAS calculates descriptive statistics for the nonmissing values of the arguments.
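For example, in the following sketch (hypothetical names) the missing value of B is ignored, so MEAN divides the sum by the two nonmissing values:

```sas
data work.stats;
   a = 2;
   b = .;
   c = 4;
   n_nonmiss = n(a, b, c);       /* 2: number of nonmissing values */
   n_miss    = nmiss(a, b, c);   /* 1: number of missing values    */
   total     = sum(a, b, c);     /* 6: sum of nonmissing values    */
   avg       = mean(a, b, c);    /* 3: (2 + 4) / 2                 */
run;
```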

Notes on Financial Functions

SAS provides a group of functions that perform financial calculations. The functions are grouped into the following types:

Table 4.3: Types of Financial Functions

Function type                 Functions      Description
Cashflow                      CONVX, CONVXP  calculates convexity for cashflows
                              DUR, DURP      calculates modified duration for cashflows
                              PVP, YIELDP    calculates present value and yield-to-maturity for a periodic cashflow
Parameter calculations        COMPOUND       calculates compound interest parameters
                              MORT           calculates amortization parameters
Internal rate of return       INTRR, IRR     calculates the internal rate of return
Net present and future value  NETPV, NPV     calculates net present and future values
                              SAVING         calculates the future value of periodic saving
Depreciation                  DACCxx         calculates the accumulated depreciation up to the specified period
                              DEPxxx         calculates depreciation for a single period
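As a sketch of the parameter-calculation functions (the amounts and rates are made-up values), MORT and COMPOUND each take four arguments, and you set exactly one of them to missing so that the function calculates it. Here MORT computes the monthly payment on a 30-year loan, and COMPOUND computes a future value:

```sas
data work.finance;
   /* MORT(a, p, r, n): initial amount, periodic payment, periodic rate, number of periods. */
   /* The payment argument is missing, so MORT returns the payment.                        */
   payment = mort(100000, ., 0.06/12, 360);

   /* COMPOUND(a, f, r, n): initial amount, future amount, periodic rate, number of periods. */
   future = compound(1000, ., 0.05, 10);
run;
```

PAYMENT should be about 599.55, and FUTURE should be about 1628.89 (1000 * 1.05**10).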

Special Considerations for Depreciation Functions

The period argument for depreciation functions can be fractional for all of the functions except DEPDBSL and DACCDBSL. For fractional arguments, the depreciation is prorated between the two consecutive time periods preceding and following the fractional period.

CAUTION:

  • Verify the depreciation method for fractional periods. You must verify whether this method is appropriate to use with fractional periods because many depreciation schedules, specified as tables, have special rules for fractional periods.
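As a minimal sketch of a fractional period, the following step uses DACCSL, the accumulated straight-line depreciation function, with a depreciable value of 1000 and a 10-year lifetime (made-up values and hypothetical variable names). For period 2.5, the result is prorated between periods 2 and 3:

```sas
data work.deprec;
   /* DACCSL(p, v, y): period, depreciable initial value, lifetime. */
   acc2   = daccsl(2,   1000, 10);   /* accumulated after two periods */
   acc3   = daccsl(3,   1000, 10);   /* accumulated after three periods */
   acc2_5 = daccsl(2.5, 1000, 10);   /* fractional period: prorated between ACC2 and ACC3 */
run;
```

With straight-line depreciation each period contributes the same amount, so ACC2, ACC3, and ACC2_5 should be 200, 300, and 250.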

Using DATA Step Functions within Macro Functions

The macro functions %SYSFUNC and %QSYSFUNC can call DATA step functions to generate text in the macro facility. %SYSFUNC and %QSYSFUNC have one difference: %QSYSFUNC masks special characters and mnemonics and %SYSFUNC does not. For more information on these functions, see %QSYSFUNC and %SYSFUNC in SAS Macro Language: Reference.

%SYSFUNC arguments are a single DATA step function and an optional format, as shown in the following examples:

    %sysfunc(date(),worddate.)
    %sysfunc(attrn(&dsid,NOBS))

You cannot nest DATA step functions within %SYSFUNC. However, you can nest %SYSFUNC functions that call DATA step functions. For example:

    %sysfunc(compress(%sysfunc(getoption(sasautos)),
                      %str(%)%(%')));

All arguments in DATA step functions within %SYSFUNC must be separated by commas. You cannot use argument lists that are preceded by the word OF.

Because %SYSFUNC is a macro function, you do not need to enclose character values in quotation marks as you do in DATA step functions. For example, the arguments to the OPEN function are enclosed in quotation marks when you use the function alone, but the arguments do not require quotation marks when used within %SYSFUNC.

    dsid=open("sasuser.houses","i");
    dsid=open("&mydata","&mode");
    %let dsid=%sysfunc(open(sasuser.houses,i));
    %let dsid=%sysfunc(open(&mydata,&mode));

You can use these functions to call all of the DATA step SAS functions except those that pertain to DATA step variables or processing. These prohibited functions are: DIF, DIM, HBOUND, INPUT, IORCMSG, LAG, LBOUND, MISSING, PUT, RESOLVE, SYMGET, and all of the variable information functions (for example, VLABEL).
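A minimal sketch that combines OPEN, ATTRN, and CLOSE through %SYSFUNC to count observations in open code. It assumes the SASHELP.CLASS sample data set that ships with SAS; the macro variable names are hypothetical. As with the OPEN example above, the data set name, the open mode, and the NOBS attribute are not quoted:

```sas
%let dsname = sashelp.class;                  /* assumes the SASHELP sample library */

/* Each %SYSFUNC call invokes exactly one DATA step function. */
%let dsid = %sysfunc(open(&dsname,i));        /* no quotation marks around the arguments */
%let nobs = %sysfunc(attrn(&dsid,NOBS));      /* number of observations */
%let rc   = %sysfunc(close(&dsid));

%put &dsname has &nobs observations.;
```

SASHELP.CLASS has 19 observations, so the %PUT statement should write that number to the log.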

Using Functions to Manipulate Files

SAS manipulates files in different ways, depending on whether you use functions or statements. If you use functions such as FOPEN, FGET, and FCLOSE, you have more opportunity to examine and manipulate your data than when you use statements such as INFILE, INPUT, and PUT.

When you use external files, the FOPEN function allocates a buffer called the File Data Buffer (FDB) and opens the external file for reading or updating. The FREAD function reads a record from the external file and copies the data into the FDB. The FGET function then moves the data to the DATA step variables. The function returns a value that you can check with statements or other functions in the DATA step to determine how to further process your data. After the records are processed, the FWRITE function writes the contents of the FDB to the external file, and the FCLOSE function closes the file.
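The following sketch follows that sequence for an external file. It is illustrative only: the fileref DEMO and the data set WORK.LINES are hypothetical, FILENAME with the TEMP option is used so that the example does not depend on an existing file, and FPUT (not described above) is the function that moves text into the FDB before FWRITE writes it. The first step writes two records; the second reads them back with FREAD and FGET:

```sas
filename demo temp;    /* hypothetical fileref for a temporary external file */

/* Write two records: FPUT moves text into the FDB, FWRITE writes the FDB to the file. */
data _null_;
   fid = fopen('demo', 'o');
   if fid = 0 then stop;
   rc = fput(fid, 'first line');
   rc = fwrite(fid);
   rc = fput(fid, 'second line');
   rc = fwrite(fid);
   rc = fclose(fid);
run;

/* Read the records back: FREAD fills the FDB, FGET copies it to a variable. */
data work.lines;
   length text $ 40;
   fid = fopen('demo', 'i');
   if fid = 0 then stop;
   do while (fread(fid) = 0);     /* FREAD returns 0 while a record was read */
      rc = fget(fid, text, 40);   /* copy up to 40 characters from the FDB */
      output;
   end;
   rc = fclose(fid);
   keep text;
run;
```

Because each successful FREAD call reads one record, WORK.LINES should contain two observations.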

When you use SAS data sets, the OPEN function opens the data set. The FETCH and FETCHOBS functions read observations from an open SAS data set into the Data Set Data Vector (DDV). The GETVARC and GETVARN functions then move the data to DATA step variables. The functions return a value that you can check with statements or other functions in the DATA step to determine how you want to further process your data. After the data is processed, the CLOSE function closes the data set.
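A comparable sketch for a SAS data set, again assuming the SASHELP.CLASS sample data set. VARNUM (not mentioned above) returns the position of a variable, which GETVARC and GETVARN require; the output data set name is hypothetical:

```sas
data work.fetched;
   length name $ 8;
   keep name age;

   dsid = open('sashelp.class', 'i');   /* open the data set for input */
   if dsid = 0 then do;
      putlog 'ERROR: Could not open SASHELP.CLASS.';
      stop;
   end;

   vname = varnum(dsid, 'Name');        /* variable positions used by GETVARC and GETVARN */
   vage  = varnum(dsid, 'Age');

   do while (fetch(dsid) = 0);          /* FETCH returns 0 after reading an observation into the DDV */
      name = getvarc(dsid, vname);      /* character value to a DATA step variable */
      age  = getvarn(dsid, vage);       /* numeric value to a DATA step variable */
      output;
   end;

   rc = close(dsid);                    /* close the data set */
run;
```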

For complete descriptions and examples, see the functions and CALL routines in SAS Language Reference: Dictionary.

Using Random-Number Functions and CALL Routines

Seed Values

Random-number functions and CALL routines generate streams of random numbers from an initial starting point, called a seed, that either the user or the computer clock supplies. A seed must be a nonnegative integer with a value less than 2^31-1 (2,147,483,647). If you use a positive seed, you can always replicate the stream of random numbers by using the same DATA step. If you use zero as the seed, the computer clock initializes the stream, and the stream of random numbers is not replicable.

Each random-number function and CALL routine generates pseudo-random numbers from a specific statistical distribution. Every random-number function requires a seed value expressed as an integer constant, or a variable that contains the integer constant. Every CALL routine takes a variable that contains the seed value. Additionally, every CALL routine requires a variable that receives the generated random number.

The seed variable must be initialized before the first execution of the function or CALL routine. After each execution of a function, the current seed is updated internally, but the value of the seed argument remains unchanged. After each execution of the CALL routine, however, the seed variable contains the current seed in the stream that generates the next random number. With a function, you cannot control the seed values (and therefore the random numbers) after initialization.
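A minimal sketch of that difference (the variable names are hypothetical, and 12345 is an arbitrary positive seed). After one call, the CALL routine's seed variable holds a new seed, while the function's seed variable keeps its initial value. Because both streams start from the same seed, X and Y should be the same first random number, just as X1 and Y1 agree in the first observation of the two examples that follow:

```sas
data work.seeds;
   seed = 12345;               /* seed variable for the CALL routine */
   call ranuni(seed, x);       /* X receives a random number; SEED is updated */

   fseed = 12345;              /* seed variable for the function */
   y = ranuni(fseed);          /* Y receives a random number; FSEED is unchanged */
run;
```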

Comparison of Random-Number Functions and CALL Routines

Except for the NORMAL and UNIFORM functions, which are equivalent to the RANNOR and RANUNI functions, respectively, SAS provides a CALL routine that has the same name as each random-number function. Using CALL routines gives you greater control over the seed values.

With a CALL routine, you can generate multiple streams of random numbers within a single DATA step. If you supply a different seed value to initialize each of the seed variables, the streams of the generated random numbers are computationally independent. With a function, however, you cannot generate more than one stream by supplying multiple seeds within a DATA step. The following two examples illustrate the difference.

Example 1: Generating Multiple Streams from a CALL Routine

This example uses the CALL RANUNI routine to generate three streams of random numbers from the uniform distribution, with ten numbers each. See the results in Output 4.1.

Output 4.1: The CALL Routine Example

                   Multiple Streams from a CALL Routine                    1

  Obs        Seed_1        Seed_2        Seed_3         X1         X2         X3

    1    1394231558     512727191     367385659    0.64924    0.23876    0.17108
    2    1921384255    1857602268    1297973981    0.89471    0.86501    0.60442
    3     902955627     422181009     188867073    0.42047    0.19659    0.08795
    4     440711467     761747298     379789529    0.20522    0.35472    0.17685
    5    1044485023    1703172173     591320717    0.48638    0.79310    0.27536
    6    2136205611    2077746915     870485645    0.99475    0.96753    0.40535
    7    1028417321    1800207034    1916469763    0.47889    0.83829    0.89243
    8    1163276804     473335603     753297438    0.54169    0.22041    0.35078
    9     176629027    1114889939    2089210809    0.08225    0.51916    0.97286
   10    1587189112     399894790     284959446    0.73909    0.18622    0.13269
 
    options nodate pageno=1 linesize=80 pagesize=60;

    data multiple(drop=i);
       /* Each stream has its own seed variable. */
       retain Seed_1 1298573062 Seed_2 447801538
              Seed_3 631280;
       do i=1 to 10;
          /* CALL RANUNI updates its seed variable on every call. */
          call ranuni(Seed_1,X1);
          call ranuni(Seed_2,X2);
          call ranuni(Seed_3,X3);
          output;
       end;
    run;

    proc print data=multiple;
       title 'Multiple Streams from a CALL Routine';
    run;

Example 2: Assigning Values from a Single Stream to Multiple Variables

Using the same three seeds that were used in Example 1, this example uses a function to create three variables. The results that are produced are different from those in Example 1 because the values of all three variables are generated by the first seed. When you use an individual function more than once in a DATA step, the function accepts only the first seed value that you supply and ignores the rest.

    options nodate pageno=1 linesize=80 pagesize=60;

    data single(drop=i);
       do i=1 to 3;
          /* Only the first seed (1298573062) initializes the RANUNI stream. */
          Y1=ranuni(1298573062);
          Y2=ranuni(447801538);
          Y3=ranuni(631280);
          output;
       end;
    run;

    proc print data=single;
       title 'A Single Stream across Multiple Variables';
    run;

The following output shows the results. The values of Y1, Y2, and Y3 in this example come from the same random-number stream generated from the first seed. You can see this by comparing the values by observation across these three variables with the values of X1 in Output 4.1.

Output 4.2: The Function Example

         A Single Stream across Multiple Variables                  1

    Obs         Y1         Y2         Y3

      1    0.64924    0.89471    0.42047
      2    0.20522    0.48638    0.99475
      3    0.47889    0.54169    0.08225
 

Pattern Matching Using SAS Regular Expressions (RX) and Perl Regular Expressions (PRX)

Definition of Pattern Matching and Regular Expressions

Pattern matching enables you to search for and extract multiple matching patterns from a character string in one step, as well as to make several substitutions in a string in one step. The DATA step supports two kinds of pattern-matching functions and CALL routines:

  • SAS regular expressions (RX)

  • Perl regular expressions (PRX).

Regular expressions are a pattern language which provides fast tools for parsing large amounts of text. Regular expressions are composed of characters and special characters that are called metacharacters.

The asterisk (*) and the question mark (?) are two examples of metacharacters. In a UNIX shell, the asterisk (*) matches zero or more characters, and the question mark (?) matches exactly one character. For example, if you issue the ls data*.txt command from a UNIX shell prompt, UNIX displays all the files in the current directory that begin with the letters 'data' and end with the file extension 'txt'.

The asterisk (*) and the question mark (?) are a limited form of regular expressions. Perl regular expressions build on the asterisk and the question mark to make searching more powerful and flexible.

Definition of SAS Regular Expression (RX) Functions and CALL Routines

SAS regular expression (RX) functions and CALL routines are a group of functions and CALL routines that use SAS regular expression pattern matching to parse character strings. You can search for character strings that have a pattern that you specify, and change a matched substring to a different substring.

SAS regular expressions consist of CALL RXCHANGE, CALL RXFREE, CALL RXSUBSTR, RXMATCH, and RXPARSE, and are part of the character string matching category for functions and CALL routines. For more information on these functions and CALL routines, see SAS Language Reference: Dictionary.

Definition of Perl Regular Expression (PRX) Functions and CALL Routines

Perl regular expression (PRX) functions and CALL routines are a group of functions and CALL routines that use a modified version of Perl as a pattern-matching language to parse character strings. PRX functions enable you to do the following:

  • search for a pattern of characters within a string

  • extract a substring from a string

  • search and replace text with other text

  • parse large amounts of text, such as Web logs or other text data, more quickly than with SAS regular expressions.

Perl regular expressions consist of CALL PRXCHANGE, CALL PRXDEBUG, CALL PRXFREE, CALL PRXNEXT, CALL PRXPOSN, CALL PRXSUBSTR, PRXPAREN, PRXMATCH, and PRXPARSE, and are part of the character string matching category for functions and CALL routines. For more information on these functions and CALL routines, see SAS Language Reference: Dictionary.

Benefits of Using Perl Regular Expressions in the DATA Step

Using Perl regular expressions in the DATA step enhances search and replace options in text. You can use Perl regular expressions to do the following:

  • validate data

  • replace text

  • extract a substring from a string

  • write Perl debug output to the SAS log.

You can write SAS programs that do not use regular expressions to produce the same results as you do when you use Perl regular expressions. The code without the regular expressions, however, requires more function calls to handle character positions in a string and to manipulate parts of the string.

Perl regular expressions combine most, if not all, of these steps into one expression. As a result, the code tends to be:

  • less prone to error

  • easier to maintain

  • clearer to read

  • more efficient in terms of system performance.
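As a small illustration of that trade-off (the names and data values are hypothetical), the following step checks whether a value is exactly five digits, once with a Perl regular expression and once with the LENGTH and VERIFY functions. The two flags should agree for every observation:

```sas
data work.zip_check;
   length zip $ 10;
   drop re;
   retain re;
   if _n_ = 1 then re = prxparse('/^\d{5}$/');   /* compile once: five digits, nothing else */
   input zip $;

   /* Regular-expression approach: one pattern describes the whole rule. */
   ok_regex = (prxmatch(re, trim(zip)) > 0);

   /* Non-regex approach: check the length, then check every character. */
   ok_manual = (length(zip) = 5 and verify(trim(zip), '0123456789') = 0);
   datalines;
27513
2751
27a13
123456
;
```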

Using Perl Regular Expressions in the DATA Step - License Agreement

The following paragraph complies with sections 3 (a) and 4 (c) of the artistic license:

The PRX functions use a modified version of Perl 5.6.1 to perform regular expression compilation and matching. Perl is compiled into a library for use with SAS. The modified and original Perl 5.6.1 files are freely available from the SAS Web site at http://support.sas.com/rnd/base. Each of the modified files has a comment block at the top of the file describing how and when the file was changed. The executables were given non-standard Perl names. The standard version of Perl can be obtained from http://www.perl.com.

Only Perl regular expressions are accessible from the PRX functions. Other parts of the Perl language are not accessible. The modified version of Perl regular expressions does not support the following:

  • Perl variables.

  • the regular expression options /c, /g, and /o and the /e option with substitutions.

  • named characters, which use the \N{name} syntax.

  • the metacharacters \pP, \PP, and \X.

  • executing Perl code within a regular expression. This includes the syntax (?{code}), (??{code}), and (?p{code}).

  • Unicode pattern matching.

  • using ?PATTERN?. The ? metacharacter is treated like a regular expression start-and-end delimiter.

  • the metacharacter \G.

  • Perl comments between a pattern and replacement text. For example: s{regexp} # perl comment {replacement}.

  • matching backslashes with m/\\\\/. Instead m/\\/ should be used to match a backslash.

Syntax of Perl Regular Expressions

Perl regular expressions are composed of characters and special characters that are called metacharacters. When performing a match, SAS searches a source string for a substring that matches the Perl regular expression that you specify. Using metacharacters enables SAS to perform special actions when searching for a match:

  • If you use the metacharacter \d, SAS matches a digit from 0 to 9.

  • If you use /\d+/, SAS finds the digits in the string 'Raleigh, NC 27506'.

  • If you use /world/, SAS finds the substring 'world' in the string 'Hello world!'.
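The following sketch (hypothetical names) runs the two searches above, for the digits in 'Raleigh, NC 27506' (using the pattern /\d+/) and for 'world' in 'Hello world!', with PRXPARSE, PRXMATCH, and CALL PRXSUBSTR. PRXMATCH returns the starting position of the first match, and CALL PRXSUBSTR also returns its length, so SUBSTR can extract the matched digits:

```sas
data work.positions;
   length digits $ 5;
   re_digits = prxparse('/\d+/');     /* one or more digits */
   re_world  = prxparse('/world/');   /* the literal substring "world" */

   pos_world = prxmatch(re_world, 'Hello world!');              /* 7 */

   call prxsubstr(re_digits, 'Raleigh, NC 27506', pos, len);   /* POS=13, LEN=5 */
   if pos > 0 then digits = substr('Raleigh, NC 27506', pos, len);
run;
```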

The following table contains a short list of Perl regular expression metacharacters that you can use when you build your code. You can find a complete list of metacharacters on the Perl man page at http://www.perldoc.com/perl5.6.1/pod/perlre.html.

Metacharacter

Description

\

marks the next character as either a special character, a literal, a back reference, or an octal escape:

  • "n" matches the character "n"

  • "\n" matches a new line character

  • "\\" matches "\"

  • "\(" matches "("

|

specifies the or condition when you compare alphanumeric strings.

^

matches the position at the beginning of the input string.

$

matches the position at the end of the input string.

*

matches the preceding subexpression zero or more times:

  • "zo*" matches "z" and "zoo"

  • * is equivalent to {0,}

+

matches the preceding subexpression one or more times:

  • "zo+" matches "zo" and "zoo"

  • "zo+" does not match "z"

  • + is equivalent to {1,}

?

matches the preceding subexpression zero or one time:

  • "do(es)?" matches the "do" in "do" or "does"

  • ? is equivalent to {0,1}

{n}

n is a non-negative integer that matches exactly n times:

  • "o{2}" matches the two o's in "food"

  • "o{2}" does not match the "o" in "Bob"

{n,}

n is a non-negative integer that matches n or more times:

  • "o{2,}" matches all the o's in "foooood"

  • "o{2,}" does not match the "o" in "Bob"

  • "o{1,}" is equivalent to "o+"

  • "o{0,}" is equivalent to "o*"

{n,m}

m and n are non-negative integers, where n<=m. They match at least n and at most m times:

  • "o{1,3}" matches the first three o's in "fooooood"

  • "o{0,1}" is equivalent to "o?"

 

Note: You cannot put a space between the comma and the numbers.

period (.)

matches any single character except newline. To match any character including newline, use a pattern such as "[\s\S]".

(pattern)

matches a pattern and captures the match. To retrieve the position and length of the match that is captured, use CALL PRXPOSN. To match parentheses characters, use "\(" or "\)".

x|y

matches either x or y:

  • "z|food" matches "z" or "food"

  • "(z|f)ood" matches "zood" or "food"

[xyz]

specifies a character set that matches any one of the enclosed characters:

  • "[abc]" matches the "a" in "plain"

[^xyz]

specifies a negative character set that matches any character that is not enclosed:

  • "[^abc]" matches the "p" in "plain"

[a-z]

specifies a range of characters that matches any character in the range:

  • "[a-z]" matches any lowercase alphabetic character in the range "a" through "z"

[^a-z]

specifies a range of characters that does not match any character in the range:

  • "[^a-z]" matches any character that is not in the range "a" through "z"

\b

matches a word boundary (the position between a word and a space):

  • "er\b" matches the "er" in "never"

  • "er\b" does not match the "er" in "verb"

\B

matches a non-word boundary:

  • "er\B" matches the "er" in "verb"

  • "er\B" does not match the "er" in "never"

\d

matches a digit character that is equivalent to [0-9].

\D

matches a non-digit character that is equivalent to [^0-9].

\s

matches any white space character, including space, tab, newline, carriage return, and form feed, and is equivalent to [ \t\n\r\f].

\S

matches any character that is not a white space character and is equivalent to [^ \t\n\r\f].

\t

matches a tab character and is equivalent to "\x09".

\w

matches any word character including the underscore and is equivalent to [A-Za-z0-9_].

\W

matches any non-word character and is equivalent to [^A-Za-z0-9_].

\num

matches num, where num is a positive integer. This is a reference back to captured matches:

  • "(.)\1" matches two consecutive identical characters.
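A short sketch (hypothetical names) that applies two entries from the table, the back reference "(.)\1" and the word boundary "er\b", to sample words from the examples above. TRIM removes the trailing blanks that SAS pads onto character values; otherwise two padding blanks would also match "(.)\1":

```sas
data work.meta;
   length text $ 20;
   retain re_double re_word;
   drop re_double re_word;
   if _n_ = 1 then do;
      re_double = prxparse('/(.)\1/');   /* two consecutive identical characters */
      re_word   = prxparse('/er\b/');    /* "er" at the end of a word */
   end;
   input text $;
   has_double = (prxmatch(re_double, trim(text)) > 0);
   ends_er    = (prxmatch(re_word, trim(text)) > 0);
   datalines;
never
verb
food
;
```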

Example 1: Validating Data

You can test for a pattern of characters within a string. For example, you can examine a string to determine whether it contains a correctly formatted telephone number. This type of test is called data validation.

The following example validates a list of phone numbers. To be valid, a phone number must have one of the following forms: (XXX) XXX-XXXX or XXX-XXX-XXXX.

    data _null_;                                                  /* [1] */
       if _N_ = 1 then
          do;
             paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";          /* [2] */
             dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";                /* [3] */
             regexp = "/(" || paren || ")|(" || dash || ")/";      /* [4] */
             retain re;
             re = prxparse(regexp);                                 /* [5] */
             if missing(re) then                                    /* [6] */
                do;
                   putlog "ERROR: Invalid regexp " regexp;          /* [7] */
                   stop;
                end;
          end;
       length first last home business $ 16;
       input first last home business;
       if ^prxmatch(re, home) then                                  /* [8] */
          putlog "NOTE: Invalid home phone number for " first last home;
       if ^prxmatch(re, business) then                              /* [9] */
          putlog "NOTE: Invalid business phone number for " first last business;
       datalines;
    Jerome Johnson (919)319-1677 (919)846-2198
    Romeo Montague 800-899-2164 360-973-6201
    Imani Rashid (508)852-2146 (508)366-9821
    Palinor Kent . 919-782-3199
    Ruby Archuleta . .
    Takei Ito 7042982145 .
    Tom Joad 209/963/2764 2099-66-8474
    ;

The following items correspond to the lines that are numbered in the DATA step that is shown above.

[1]  

Create a DATA step.

[2]  

Build a Perl regular expression to identify a phone number that matches (XXX)XXX-XXXX, and assign the variable PAREN to hold the result. Use the following syntax elements to build the Perl regular expression:

\(

matches the open parenthesis in the area code.

[2-9]

matches the digits 2-9. This is the first number in the area code.

\d

matches a digit. This is the second number in the area code.

\d

matches a digit. This is the third number in the area code.

\)

matches the closed parenthesis in the area code.

?

matches the space (which is the preceding subexpression) zero or one time. Spaces are significant in Perl regular expressions. They match a space in the text that you are searching. If a space precedes the question mark metacharacter (as it does in this case), the pattern matches either zero spaces or one space in this position in the phone number.

[3]  

Build a Perl regular expression to identify a phone number that matches XXX-XXX-XXXX, and assign the variable DASH to hold the result.

[4]  

Build a Perl regular expression that concatenates the regular expressions for (XXX)XXX-XXXX and XXX-XXX-XXXX. The concatenation enables you to search for both phone number formats from one regular expression.

The PAREN and DASH regular expressions are placed within parentheses. The bar metacharacter (|) that is located between PAREN and DASH instructs the compiler to match either pattern. The slashes around the entire pattern tell the compiler where the regular expression starts and ends.

[5]  

Pass the Perl regular expression to PRXPARSE and compile the expression. PRXPARSE returns an identifier for the compiled pattern. Using this identifier with other Perl regular expression functions and CALL routines enables SAS to perform operations with the compiled Perl regular expression.

[6]  

Use the MISSING function to check whether the regular expression was successfully compiled.

[7]  

Use the PUTLOG statement to write an error message to the SAS log if the regular expression did not compile.

[8]  

Search for a valid home phone number. PRXMATCH uses the value from PRXPARSE along with the search text and returns the position where the regular expression was found in the search text, or 0 if it was not found. If there is no match for the home phone number, the PUTLOG statement writes a note to the SAS log.

[9]  

Search for a valid business phone number. PRXMATCH uses the value from PRXPARSE along with the search text and returns the position where the regular expression was found in the search text, or 0 if it was not found. If there is no match for the business phone number, the PUTLOG statement writes a note to the SAS log.

The following lines are written to the SAS log:

    NOTE: Invalid home phone number for Palinor Kent
    NOTE: Invalid home phone number for Ruby Archuleta
    NOTE: Invalid business phone number for Ruby Archuleta
    NOTE: Invalid home phone number for Takei Ito 7042982145
    NOTE: Invalid business phone number for Takei Ito
    NOTE: Invalid home phone number for Tom Joad 209/963/2764
    NOTE: Invalid business phone number for Tom Joad 2099-66-8474

Example 2: Replacing Text

You can use Perl regular expressions to find specific characters within a string. You can then remove the characters or replace them with other characters. In this example, the two occurrences of the less-than character (<) are replaced by &lt; and the two occurrences of the greater-than character (>) are replaced by &gt;.

    data _null_;                                          /* [1] */
       if _N_ = 1 then
          do;
             retain lt_re gt_re;
             lt_re = prxparse('s/</&lt;/');               /* [2] */
             gt_re = prxparse('s/>/&gt;/');               /* [3] */
             if missing(lt_re) or missing(gt_re) then     /* [4] */
                do;
                   putlog "ERROR: Invalid regexp.";       /* [5] */
                   stop;
                end;
          end;
       input;
       call prxchange(lt_re, -1, _infile_);               /* [6] */
       call prxchange(gt_re, -1, _infile_);               /* [7] */
       put _infile_;
       datalines4;
    The bracketing construct (...) creates capture buffers. To refer to
    the digit'th buffer use \<digit> within the match. Outside the match
    use "$" instead of "\". (The \<digit> notation works in certain
    circumstances outside the match. See the warning below about
    \1 vs $1 for details.) Referring back to another part of the match is called a
    backreference.
    ;;;;

The following items correspond to the numbered lines in the DATA step that is shown above.

[1]  

Create a DATA step.

[2]  

Use metacharacters to create a substitution syntax for a Perl regular expression, and compile the expression. The substitution syntax specifies that a less-than character (<) in the input is replaced by the value &lt; in the output.

[3]  

Use metacharacters to create a substitution syntax for a Perl regular expression, and compile the expression. The substitution syntax specifies that a greater-than character (>) in the input is replaced by the value &gt; in the output.

[4]  

Use the MISSING function to check whether the Perl regular expression compiled without error.

[5]  

Use the PUTLOG statement to write an error message to the SAS log if either of the regular expressions did not compile.

[6]  

Call the PRXCHANGE routine. Pass the LT_RE pattern-id, and search for and replace all matching patterns (a value of -1 replaces every occurrence). The changed text is stored back in _INFILE_.

[7]  

Call the PRXCHANGE routine. Pass the GT_RE pattern-id, and search for and replace all matching patterns. The changed text is stored back in _INFILE_, and the PUT statement then writes the line to the SAS log.

The following lines are written to the SAS log:

    The bracketing construct (...) creates capture buffers. To refer to
    the digit'th buffer use \&lt;digit&gt; within the match. Outside the match
    use "$" instead of "\". (The \&lt;digit&gt; notation works in certain
    circumstances outside the match. See the warning below about
    \1 vs $1 for details.) Referring back to another part of the match is called a
    backreference.

Example 3: Extracting a Substring from a String

You can use Perl regular expressions to find and extract text from a string. In this example, the DATA step reads names with home and business phone numbers and identifies the business phone numbers that are in North Carolina. The program extracts the area code from each business number and checks it against a list of North Carolina area codes.

data _null_;                                                   /* [1] */
   if _N_ = 1 then
      do;
         paren = "\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d";         /* [2] */
         dash = "([2-9]\d\d)-[2-9]\d\d-\d\d\d\d";                /* [3] */
         regexp = "/(" || paren || ")|(" || dash || ")/";       /* [4] */
         retain re;
         re = prxparse(regexp);                                 /* [5] */
         if missing(re) then                                    /* [6] */
            do;
               putlog "ERROR: Invalid regexp " regexp;          /* [7] */
               stop;
            end;
         retain areacode_re;
         areacode_re = prxparse("/828|336|704|910|919|252/");   /* [8] */
         if missing(areacode_re) then
            do;
               putlog "ERROR: Invalid area code regexp";
               stop;
            end;
      end;
   length first last home business $ 16;
   length areacode $ 3;
   input first last home business;
   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;
   if prxmatch(re, business) then                               /* [9] */
      do;
         which_format = prxparen(re);                           /* [10] */
         call prxposn(re, which_format, pos, len);              /* [11] */
         areacode = substr(business, pos, len);
         if prxmatch(areacode_re, areacode) then                /* [12] */
            put "In North Carolina: " first last business;
      end;
   else
      putlog "NOTE: Invalid business phone number for " first last business;
   datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent 704-782-4673 704-782-3199
Ruby Archuleta 905-384-2839 905-328-3892
Takei Ito 704-298-2145 704-298-4738
Tom Joad 515-372-4829 515-389-2838
;

The following items correspond to the numbered lines in the DATA step that is shown above.

[1]  

Create a DATA step.

[2]  

Build a Perl regular expression to identify a phone number that matches (XXX)XXX-XXXX, and assign the variable PAREN to hold the result. Use the following syntax elements to build the Perl regular expression:

\(

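matches the open parenthesis that precedes the area code. The backslash makes the parenthesis a literal character; the unescaped open parenthesis that follows it marks the start of the submatch that captures the area code.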

[2-9]

matches a single digit from 2 through 9. This is the first number in the area code.

\d

matches a digit. This is the second number in the area code.

\d

matches a digit. This is the third number in the area code.

\)

matches the closing parenthesis that follows the area code. The unescaped closing parenthesis just before it marks the end of the submatch.

?

matches the space (the preceding subexpression) zero or one time. Spaces are significant in Perl regular expressions: they match a space in the text that you are searching. Because a space precedes the question mark metacharacter here, the pattern matches either zero spaces or one space in this position in the phone number.

[2-9]\d\d-\d\d\d\d

matches the seven-digit local number: a digit from 2 through 9, two more digits, a hyphen, and four digits.

[3]  

Build a Perl regular expression to identify a phone number that matches XXX-XXX-XXXX, and assign the variable DASH to hold the result.

[4]  

Build a Perl regular expression that concatenates the regular expressions for (XXX)XXX-XXXX and XXX-XXX-XXXX. The concatenation enables you to search for both phone number formats from one regular expression.

The PAREN and DASH regular expressions are each placed within parentheses. The bar metacharacter (|) between them instructs the compiler to match either pattern. The slashes around the entire pattern tell the compiler where the regular expression starts and ends.

[5]  

Pass the Perl regular expression to PRXPARSE to compile it. PRXPARSE returns an identifier for the compiled pattern. Other Perl regular expression functions and CALL routines use this identifier to refer to the compiled expression.

[6]  

Use the MISSING function to check whether the Perl regular expression compiled without error.

[7]  

Use the PUTLOG statement to write an error message to the SAS log if the regular expression did not compile.

[8]  

Compile a Perl regular expression that searches a string for a valid North Carolina area code.

[9]  

Search for a valid business phone number.

[10]  

Use the PRXPAREN function to determine which submatch to use. PRXPAREN returns the number of the highest-numbered submatch that was matched. If the phone number matches the form (XXX)XXX-XXXX, the area code is in submatch 2, so PRXPAREN returns the value 2. If the phone number matches the form XXX-XXX-XXXX, the area code is in submatch 4, so PRXPAREN returns the value 4.

[11]  

Call the PRXPOSN routine to retrieve the position and length of the submatch.

[12]  

Use the PRXMATCH function to determine whether the area code is a valid North Carolina area code, and write the observation to the log.
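To see which strings the PAREN pattern from step [2] accepts, the following sketch compiles it on its own and writes the value that PRXMATCH returns (the position where the match starts, or 0 when there is no match). The variable names and test strings are illustrative. The first two strings should return 1. The third returns 0 because it has two spaces after the area code, the fourth because it has no parentheses, and the fifth because its area code begins with 1.

```sas
data _null_;
   retain paren_re;
   if _N_ = 1 then
      paren_re = prxparse("/\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d/");
   input candidate $char20.;
   /* PRXMATCH returns the starting position of the match, or 0 */
   pos = prxmatch(paren_re, candidate);
   put candidate= pos=;
   datalines;
(919)846-2198
(919) 846-2198
(919)  846-2198
919-846-2198
(119)846-2198
;
```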

The following lines are written to the SAS log:

In North Carolina: Jerome Johnson (919)846-2198
In North Carolina: Palinor Kent 704-782-3199
In North Carolina: Takei Ito 704-298-4738
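The following sketch isolates steps [4], [10], and [11]. It writes the combined expression to the log and then reports the submatch number, position, and length that PRXPAREN and CALL PRXPOSN produce for one number in each format. Counting opening parentheses from left to right, submatch 1 is the whole (XXX)XXX-XXXX alternative, submatch 2 is its area code, submatch 3 is the whole XXX-XXX-XXXX alternative, and submatch 4 is its area code. Unlike the example, this sketch declares lengths for PAREN, DASH, and REGEXP, so it uses TRIM to keep their padding blanks out of the pattern. The variable PHONE and the two test numbers are illustrative. The first number should report which_format=2 pos=2 len=3 areacode=919, and the second should report which_format=4 pos=1 len=3 areacode=704.

```sas
data _null_;
   length paren dash $ 40 regexp $ 100 phone $ 20 areacode $ 3;
   retain re;
   if _N_ = 1 then
      do;
         paren = "\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d";
         dash  = "([2-9]\d\d)-[2-9]\d\d-\d\d\d\d";
         /* TRIM keeps the padding blanks of PAREN and DASH out of the pattern */
         regexp = "/(" || trim(paren) || ")|(" || trim(dash) || ")/";
         put regexp;
         re = prxparse(trim(regexp));
      end;
   input phone $ 1-20;
   if prxmatch(re, phone) then
      do;
         which_format = prxparen(re);               /* 2 or 4 */
         call prxposn(re, which_format, pos, len);  /* location of that submatch */
         areacode = substr(phone, pos, len);
         put phone= which_format= pos= len= areacode=;
      end;
   datalines;
(919)846-2198
704-782-3199
;
```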

Writing Perl Debug Output to the SAS Log

The DATA step provides debugging support with the CALL PRXDEBUG routine. CALL PRXDEBUG enables you to turn on and off Perl debug output messages that are sent to the SAS log.

The following example writes Perl debug output to the SAS log.

data _null_;
      /* CALL PRXDEBUG(1) turns on Perl debug output. */
   call prxdebug(1);
   putlog 'PRXPARSE: ';
   re = prxparse('/[bc]d(ef*g)+h[ij]k$/');
   putlog 'PRXMATCH: ';
   pos = prxmatch(re, 'abcdefg_gh_');
      /* CALL PRXDEBUG(0) turns off Perl debug output. */
   call prxdebug(0);
run;

SAS writes the following output to the log.

Output 4.3: SAS Debugging Output
PRXPARSE:
Compiling REx '[bc]d(ef*g)+h[ij]k$'
size 41 first at 1
rarest char g at 0
rarest char d at 0
   1: ANYOF[bc](10)
  10: EXACT <d>(12)
  12: CURLYX[0] {1,32767}(26)
  14:   OPEN1(16)
  16:     EXACT <e>(18)
  18:     STAR(21)
  19:       EXACT <f>(0)
  21:     EXACT <g>(23)
  23:     CLOSE1(25)
  25:   WHILEM[1/1](0)
  26: NOTHING(27)
  27: EXACT <h>(29)
  29: ANYOF[ij](38)
  38: EXACT <k>(40)
  40: EOL(41)
  41: END(0)
anchored 'de' at 1 floating 'gh' at 3..2147483647 (checking floating) stclass
  'ANYOF[bc]' minlen 7
PRXMATCH:
Guessing start of match, REx '[bc]d(ef*g)+h[ij]k$' against 'abcdefg_gh_'...
Did not find floating substr 'gh'...
Match rejected by optimizer

For a detailed explanation of Perl debug output, see "CALL PRXDEBUG Routine" in SAS Language Reference: Dictionary.

Base SAS Functions for Web Applications

Four functions that manipulate Web-related content are available in Base SAS software. HTMLENCODE and URLENCODE return encoded strings. HTMLDECODE and URLDECODE return decoded strings. For information about Web-based SAS tools, follow the Communities link on the SAS customer support home page, at support.sas.com.
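The following minimal sketch calls all four functions with their default behavior; the strings and variable names are illustrative. HTMLENCODE should replace <, >, and & with &lt;, &gt;, and &amp;, and URLENCODE should replace each space with %20. The decoding functions reverse those changes, so PLAIN and BACK should match the original strings.

```sas
data _null_;
   length html plain url back $ 100;
   html  = htmlencode('<b>Fish & Chips</b>');   /* encode <, >, and & */
   plain = htmldecode(html);                     /* reverse the HTML encoding */
   url   = urlencode('Fish and Chips');          /* spaces become %20 */
   back  = urldecode(url);                       /* reverse the URL encoding */
   put html= / plain= / url= / back=;
run;
```

Compared with the PRXCHANGE approach in Example 2, HTMLENCODE also encodes the ampersand, which a complete HTML encoding requires.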




SAS 9.1 Language Reference: Concepts
ISBN: 1590471989
Year: 2004
Pages: 255
