INFILE Statement


Identifies an external file to read with an INPUT statement

Valid: in a DATA step

Category: File-handling

Type: Executable

See: INFILE Statement in the documentation for your operating environment.

Syntax

INFILE file-specification <options> <operating-environment-options>;

INFILE DBMS-specifications;

Arguments

file-specification

  • identifies the source of the input data records, which is an external file or instream data. File-specification can have these forms:

    • external-file

      • specifies the physical name of an external file. The physical name is the name that the operating environment uses to access the file.

    • fileref

      • specifies the fileref of an external file.

      • Requirement: You must have previously associated the fileref with an external file in a FILENAME statement, FILENAME function, or an appropriate operating environment command.

      • See: FILENAME Statement on page 1169

    • fileref(file)

      • specifies a fileref of an aggregate storage location and the name of a file or member, enclosed in parentheses, that resides in that location.

      • Operating Environment Information: Different operating environments call an aggregate grouping of files by different names, such as a directory, a MACLIB, or a partitioned data set. For details about how to specify external files, see the SAS documentation for your operating environment.

      • Requirement: You must have previously associated the fileref with an external file in a FILENAME statement, a FILENAME function, or an appropriate operating environment command.

      • See: FILENAME Statement on page 1169

    • CARDS | CARDS4

      • for a definition, see DATALINES.

      • Alias: DATALINES | DATALINES4

    • DATALINES | DATALINES4

      • specifies that the input data immediately follows the DATALINES or DATALINES4 statement in the DATA step. This allows you to use the INFILE statement options to control how the INPUT statement reads instream data lines.

      • Alias: CARDS | CARDS4

      • Featured in: Example 1 on page 1234
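As a minimal sketch of the fileref and DATALINES forms, the following program assumes a scratch file created with the FILENAME statement's TEMP device. The fileref RAW, the data set names, and the data values are hypothetical.

```sas
/* Assign a fileref to a temporary scratch file (hypothetical name RAW) */
filename raw temp;

/* Write two sample records to the external file */
data _null_;
   file raw;
   put 'East 100';
   put 'West 250';
run;

/* fileref form: RAW was associated with a file by the FILENAME statement */
data from_file;
   infile raw;
   input region $ amount;
run;

/* DATALINES form: INFILE options such as TRUNCOVER now apply to instream data */
data from_lines;
   infile datalines truncover;
   input region $ amount;
   datalines;
East 100
West 250
;
```

Both steps should produce the same two observations. The DATALINES form is useful only when you need an INFILE option for instream data.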

Options

  • BLKSIZE= block-size

    • specifies the block size of the input file.

    • Default: Dependent on the operating environment

      • Operating Environment Information: For details, see the SAS documentation for your operating environment.

  • COLUMN= variable

    • names a variable that SAS uses to assign the current column location of the input pointer. Like automatic variables, the COLUMN= variable is not written to the data set.

    • Alias: COL=

    • See Also: LINE= on page 1225

    • Featured in: Example 8 on page 1238

  • DELIMITER= delimiter(s)

    • specifies an alternate delimiter (other than a blank) to be used for LIST input, where delimiter is

      • list-of-delimiting-characters

        • specifies one or more characters to read as delimiters.

        • Requirement: Enclose the list of characters in quotation marks.

        • Featured in: Example 1 on page 1234

      • character-variable

        • specifies a character variable whose value becomes the delimiter.

  • DSD (delimiter-sensitive data)

    • specifies that when data values are enclosed in quotation marks, delimiters within the value be treated as character data. The DSD option changes how SAS treats delimiters when you use LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two consecutive delimiters as a missing value and removes quotation marks from character values.

    • Interaction: Use the DELIMITER= option to change the delimiter.

    • Tip: Use the DSD option and LIST input to read a character value that contains a delimiter within a string that is enclosed in quotation marks. The INPUT statement treats the delimiter as a valid character and removes the quotation marks from the character string before the value is stored. Use the tilde (~) format modifier to retain the quotation marks.

    • See: Reading Delimited Data on page 1231

    • See Also: DELIMITER= on page 1223

    • Featured in: Example 1 on page 1234 and Example 2 on page 1235

  • ENCODING= encoding-value

    • specifies the encoding to use when reading from the external file. The value for ENCODING= indicates that the external file has a different encoding from the current session encoding.

    • When you read data from an external file, SAS transcodes the data from the specified encoding to the session encoding.

    • For valid encoding values, see Encoding Values in SAS Language Elements in SAS National Language Support (NLS): User's Guide.

    • Default: SAS assumes that an external file is in the same encoding as the session encoding.

    • Featured in: Example 11 on page 1241

  • END= variable

    • names a variable that SAS sets to 1 when the current input data record is the last in the input file. Until SAS processes the last data record, the END= variable is set to 0. Like automatic variables, this variable is not written to the data set.

    • Restriction: You cannot use the END= option with

      • the UNBUFFERED option

      • the DATALINES or DATALINES4 statement

      • an INPUT statement that reads multiple input data records.

  • EOF= label

    • specifies a statement label that is the object of an implicit GO TO when the INFILE statement reaches end of file. When an INPUT statement attempts to read from a file that has no more records, SAS moves execution to the statement label indicated.

    • Interaction: Use EOF= instead of the END= option with

      • the UNBUFFERED option

      • the DATALINES or DATALINES4 statement

      • an INPUT statement that reads multiple input data records.

  • EOV= variable

    • names a variable that SAS sets to 1 when the first record in a file in a series of concatenated files is read. The variable is set only after SAS encounters the next file. Like automatic variables, the EOV= variable is not written to the data set.

    • Tip: Reset the EOV= variable back to 0 after SAS encounters each boundary.

    • See Also: END= on page 1224 and EOF= on page 1224

  • EXPANDTABS | NOEXPANDTABS

    • specifies whether to expand tab characters to the standard tab setting, which is set at 8-column intervals that start at column 9.

    • Default: NOEXPANDTABS

    • Tip: EXPANDTABS is useful when you read data that contains the tab character that is native to your operating environment.

  • FILENAME= variable

    • names a variable that SAS sets to the physical name of the currently opened input file. Like automatic variables, the FILENAME= variable is not written to the data set.

    • Tip: Use a LENGTH statement to make the variable length long enough to contain the value of the filename.

    • See Also: FILEVAR= on page 1225

    • Featured in: Example 5 on page 1237

  • FILEVAR= variable

    • names a variable whose change in value causes the INFILE statement to close the current input file and open a new one. When the next INPUT statement executes, it reads from the new file that the FILEVAR= variable specifies. Like automatic variables, this variable is not written to the data set.

    • Restriction: The FILEVAR= variable must contain a character string that is a physical filename.

    • Interaction: When you use the FILEVAR= option, the file-specification is just a placeholder, not an actual filename or a fileref that has been previously assigned to a file. SAS uses this placeholder for reporting processing information to the SAS log. It must conform to the same rules as a fileref.

    • Tip: Use FILEVAR= to dynamically change the currently opened input file to a new physical file.

    • See Also: Updating External Files in Place on page 1230

    • Featured in: Example 5 on page 1237

  • FIRSTOBS= record-number

    • specifies a record number that SAS uses to begin reading input data records in the input file.

    • Default: 1

    • Tip: Use FIRSTOBS= with OBS= to read a range of records from the middle of a file.

    • Example: This statement processes records 50 through 100:

       infile file-specification firstobs=50 obs=100;

  • LENGTH= variable

    • names a variable that SAS sets to the length of the current input line. SAS does not assign the variable a value until an INPUT statement executes. Like automatic variables, the LENGTH= variable is not written to the data set.

    • Tip: This option in conjunction with the $VARYING informat on page 963 is useful when the field width varies.

    • Featured in: Example 4 on page 1236 and Example 7 on page 1238

  • LINE= variable

    • names a variable that SAS sets to the line location of the input pointer in the input buffer. Like automatic variables, the LINE= variable is not written to the data set.

    • Range: 1 to the value of the N= option

    • Interaction: The value of the LINE= variable is the current relative line number within the group of lines that is specified by the N= option or by the # n line pointer control in the INPUT statement.

    • See Also: COLUMN= on page 1223 and N= on page 1226

    • Featured in: Example 8 on page 1238

  • LINESIZE= line-size

    • specifies the record length that is available to the INPUT statement.

    • Operating Environment Information: Values for line-size are dependent on the operating environment record size. For details, see the SAS documentation for your operating environment.

    • Alias: LS=

    • Range: up to 32767

    • Interaction: If an INPUT statement attempts to read past the column that is specified by the LINESIZE= option, then the action that is taken depends on whether the FLOWOVER, MISSOVER, SCANOVER, STOPOVER, or TRUNCOVER option is in effect. FLOWOVER is the default.

    • Tip: Use LINESIZE= to limit the record length when you do not want to read the entire record.

    • Example: If your data lines contain a sequence number in columns 73 through 80, then use this INFILE statement to restrict the INPUT statement to the first 72 columns:

       infile file-specification linesize=72;

  • LRECL= logical-record-length

    • specifies the logical record length.

    • Operating Environment Information: Values for logical-record-length are dependent on the operating environment. For details, see the SAS documentation for your operating environment.

    • Default: Dependent on the file characteristics of your operating environment

    • Restriction: LRECL= is not valid when you use the DATALINES file specification.

    • Tip: LRECL= specifies the physical line length of the file. LINESIZE= tells the INPUT statement how much of the line to read.

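  • MISSOVER

    • prevents an INPUT statement from reading a new input data record when it does not find values in the current record for all the variables in the statement. Variables without any values assigned are set to missing.

    • See: Reading Past the End of a Line on page 1232

    • Featured in: Example 2 on page 1235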

  • N= available-lines

    • specifies the number of lines that are available to the input pointer at one time.

    • Default: The highest value following a # pointer control in any INPUT statement in the DATA step. If you omit a # pointer control, then the default value is 1.

    • Interaction: This option affects only the number of lines that the pointer can access at a time; it has no effect on the number of lines an INPUT statement reads.

    • Tip: When you use # pointer controls in an INPUT statement that are less than the value of N=, you might get unexpected results. To prevent this, include a # pointer control that equals the value of the N= option. Here is an example:

       infile 'external-file' n=5;
       input #2 name : $25. #3 job : $25. #5;

    • The INPUT statement includes a #5 pointer control, even though no data is read from that record.

    • Featured in: Example 8 on page 1238

  • NBYTE= variable

    • specifies the name of a variable that contains the number of bytes to read from a file when you are reading data in stream record format (RECFM=S in the FILENAME statement).

    • Default: The LRECL value of the file

    • Interaction: If the number of bytes to read is set to -1, then the FTP and SOCKET access methods return the number of bytes that are currently available in the input buffer.

    • See: The RECFM= option on page 1200 in the FILENAME statement, SOCKET access method, and the RECFM= option on page 1194 in the FILENAME statement, FTP access method

  • OBS= record-number | MAX

    • record-number

      specifies the record number of the last record to read in an input file that is read sequentially.

      MAX

      specifies the maximum number of observations to process, which will be at least as large as the largest signed 32-bit integer. The absolute maximum depends on your host operating environment.

    • Default: MAX

    • Tip: Use OBS= with FIRSTOBS= to read a range of records from the middle of a file.

    • Example: This statement processes only the first 100 records in the file:

       infile file-specification obs=100;

  • PAD | NOPAD

    • controls whether SAS pads the records that are read from an external file with blanks to the length that is specified in the LRECL= option.

    • Default: NOPAD

    • See Also: LRECL= on page 1226

  • PRINT | NOPRINT

    • specifies whether the input file contains carriage-control characters.

    • Tip: To read a file in a DATA step without having to remove the carriage-control characters, specify PRINT. To read the carriage-control characters as data values, specify NOPRINT.

  • RECFM= record-format

    • specifies the record format of the input file.

    • Operating Environment Information: Values for record-format are dependent on the operating environment. For details, see the SAS documentation for your operating environment.

  • SCANOVER

    • causes the INPUT statement to scan the input data records until the character string that is specified in the @ character-string expression is found.

    • Interaction: The MISSOVER, TRUNCOVER, and STOPOVER options change how the INPUT statement behaves when it scans for the @ character-string expression and reaches the end of the record. By default (FLOWOVER option), the INPUT statement scans the next record while these other options cause scanning to stop.

    • Tip: It is redundant to specify both SCANOVER and FLOWOVER.

    • See: Reading Past the End of a Line on page 1232

    • See Also: FLOWOVER on page 1225, MISSOVER on page 1226, STOPOVER on page 1228, and TRUNCOVER on page 1229

    • Featured in: Example 3 on page 1236

  • SHAREBUFFERS

    • specifies that the FILE statement and the INFILE statement share the same buffer.

    • Alias: SHAREBUFS

      • CAUTION:

        • When using SHAREBUFFERS, RECFM=V, and _INFILE_, use caution if you read a record with one length and update the file with a record of a different length. The length of the record can change by modifying _INFILE_. One option to avoid this potential problem is to pad or truncate _INFILE_ so that the original record length is maintained.

    • Tip: Use SHAREBUFFERS with the INFILE, FILE, and PUT statements to update an external file in place. This saves CPU time because the PUT statement output is written straight from the input buffer instead of the output buffer.

    • Tip: Use SHAREBUFFERS to update specific fields in an external file instead of an entire record.

    • Featured in: Example 6 on page 1238

  • START= variable

    • names a variable whose value SAS uses as the first column number of the record that the PUT _INFILE_ statement writes. Like automatic variables, the START= variable is not written to the data set.

    • See Also: _INFILE_ option in the PUT statement

  • TRUNCOVER

    • overrides the default behavior of the INPUT statement when an input data record is shorter than the INPUT statement expects. By default, the INPUT statement automatically reads the next input data record. TRUNCOVER enables you to read variable-length records when some records are shorter than the INPUT statement expects. Variables without any values assigned are set to missing.

    • Tip: Use TRUNCOVER to assign the contents of the input buffer to a variable when the field is shorter than expected.

    • See: Reading Past the End of a Line on page 1232

    • See Also: FLOWOVER on page 1225, MISSOVER on page 1226, SCANOVER on page 1228, and STOPOVER on page 1228

    • Featured in: Example 3 on page 1236

  • UNBUFFERED

    • tells SAS not to perform a buffered (look-ahead) read.

    • Alias: UNBUF

    • Interaction: When you use UNBUFFERED, SAS never sets the END= variable to 1.

    • Tip: When you read instream data with a DATALINES statement, UNBUFFERED is in effect.

  • _INFILE_= variable

    • names a character variable that references the contents of the current input buffer for this INFILE statement. You can use the variable in the same way as any other variable, even as the target of an assignment. The variable is automatically retained and initialized to blanks. Like automatic variables, the _INFILE_= variable is not written to the data set.

    • Restriction: variable cannot be a previously defined variable. Ensure that the _INFILE_= specification is the first occurrence of this variable in the DATA step. Do not set or change the length of _INFILE_= variable with the LENGTH or ATTRIB statements. However, you can attach a format to this variable with the ATTRIB or FORMAT statement.

    • Interaction: The maximum length of this character variable is the logical record length for the specified INFILE statement. However, SAS does not open the file, and so does not know its LRECL= value, until just before the execution phase. During the compilation phase, the designated size for this variable is therefore 32,767.

    • Tip: Modification of this variable directly modifies the INFILE statement's current input buffer. Any PUT _INFILE_ (when this INFILE is current) that follows the buffer modification reflects the modified buffer contents. The _INFILE_= variable accesses only the current input buffer of the specified INFILE statement even if you use the N= option to specify multiple buffers.

    • Tip: To access the contents of the input buffer in another statement without using the _INFILE_= option, use the automatic variable _INFILE_.

    • Tip: The _INFILE_ variable does not have a fixed width. When you assign a value to the _INFILE_ variable, the length of the variable changes to the length of the value that is assigned.

    • Main Discussion: Accessing the Contents of the Input Buffer on page 1230

    • Featured in: Example 9 on page 1239 and Example 10 on page 1240
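Several of these options are commonly combined. The following sketch assumes a hypothetical scratch file NOTES created with the TEMP device. It uses FILENAME= with a LENGTH statement, as the tip recommends, and LENGTH= with the $VARYING informat. It also uses END= to detect the last record and TRUNCOVER to handle short records.

```sas
/* Scratch file with records of different lengths (hypothetical fileref NOTES) */
filename notes temp;

data _null_;
   file notes;
   put 'alpha';
   put 'beta gamma';
   put 'delta';
run;

data lines;
   length fname $ 256;                 /* room for the physical filename */
   infile notes filename=fname length=len end=last truncover;
   input @;                            /* load the record so that LEN is set */
   input text $varying200. len;        /* read exactly LEN characters */
   nchar = len;                        /* copy: LEN itself is not written to the data set */
   if last then put 'Last record: ' text= fname=;
run;
```

FNAME, LEN, and LAST behave like automatic variables and are dropped from LINES. That is why the sketch copies LEN into NCHAR.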

Operating Environment Options

Operating Environment Information: For descriptions of operating environment-specific options in the INFILE statement, see the SAS documentation for your operating environment.

DBMS Specifications

DBMS-Specifications

  • enable you to read records from some DBMS files. You must license SAS/ACCESS software to be able to read from DBMS files. See the SAS/ACCESS documentation for the DBMS that you use.

Details

Operating Environment Information: The INFILE statement contains operating environment-specific material. See the SAS documentation for your operating environment before using this statement.

How to Use the INFILE Statement

Because the INFILE statement identifies the file to read, it must execute before the INPUT statement that reads the input data records. You can use the INFILE statement in conditional processing, such as an IF-THEN statement, because it is executable. This enables you to control the source of the input data records.

Usually, you use an INFILE statement to read data from an external file. When data is read from the job stream, you must use a DATALINES statement. However, to take advantage of certain data-reading options that are available only in the INFILE statement, you can use an INFILE statement with the file-specification DATALINES and a DATALINES statement in the same DATA step. See Reading Long Instream Data Records on page 1232 for more information.

When you use more than one INFILE statement for the same file specification and you use options in each INFILE statement, the effect is additive. To avoid confusion, use all the options in the first INFILE statement for a given external file.
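Because INFILE is executable, an IF-THEN/ELSE statement can choose the input file at run time. The sketch below assumes two hypothetical scratch files, Q1 and Q2, and a hypothetical macro variable QUARTER that acts as the switch.

```sas
/* Two scratch files standing in for alternative data sources (hypothetical) */
filename q1 temp;
filename q2 temp;

data _null_;
   file q1;
   put '10';
   put '20';
   file q2;
   put '30';
   put '40';
run;

/* Hypothetical switch; in practice it might come from a parameter */
%let quarter = 2;

/* INFILE is executable, so IF-THEN/ELSE decides which file INPUT reads */
data totals;
   if &quarter = 1 then infile q1;
   else infile q2;
   input amount;
run;
```

With QUARTER set to 2, only the ELSE branch executes. TOTALS therefore contains the values from Q2.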

Reading Multiple Input Files

You can read from multiple input files in a single iteration of the DATA step in one of two ways:

  • to keep multiple files open and change which file is read, use multiple INFILE statements.

  • to dynamically change the current input file within a single DATA step, use the FILEVAR= option in an INFILE statement. The FILEVAR= option enables you to read from one file, close it, and then open another. See Example 5 on page 1237.
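The FILEVAR= approach can be sketched as follows. The sketch assumes that the WORK library's directory, returned by the PATHNAME function, is writable scratch space. The file names, the data set FILES, and the placeholder fileref DUMMY are hypothetical. Each new value of FNAME that SET supplies causes INFILE to close the current file and open the next one. The END= variable then controls the inner DO WHILE loop.

```sas
/* Use the WORK library folder as scratch space for two small files */
%let dir = %sysfunc(pathname(work));

data _null_;
   file "&dir/part1.txt";
   put 'a 1';
   put 'b 2';
   file "&dir/part2.txt";
   put 'c 3';
run;

/* One physical filename per observation (hypothetical file names) */
data files;
   length fname $ 300;
   fname = "&dir/part1.txt"; output;
   fname = "&dir/part2.txt"; output;
run;

/* A new FILEVAR= value closes the current file and opens the next one */
data combined;
   set files;
   infile dummy filevar=fname end=done truncover;
   do while (not done);
      input id $ value;
      output;
   end;
run;
```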

Updating External Files in Place

You can use the INFILE statement in combination with the FILE statement to update records in an external file. Follow these steps:

  1. Specify the INFILE statement before the FILE statement.

  2. Specify the same fileref or physical filename in each statement.

  3. Use options that are common to both the INFILE and FILE statements in the INFILE statement instead of the FILE statement. (Any such options that are used in the FILE statement are ignored.)

See Example 6 on page 1238.

To update individual fields within a record instead of the entire record, see the term SHAREBUFFERS under Arguments on page 1222.
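The three steps above, combined with SHAREBUFFERS, can be sketched as follows. The scratch file ROSTER and its contents are hypothetical. The sketch assumes that the operating environment permits updating this file in place.

```sas
/* Scratch file whose state codes will be corrected in place (hypothetical) */
filename roster temp;

data _null_;
   file roster;
   put 'ny 212-555-0100';
   put 'ca 213-555-0199';
run;

/* INFILE comes first, both statements name the same fileref, and
   SHAREBUFFERS lets PUT change only the columns it writes */
data _null_;
   infile roster sharebuffers;
   file roster;
   input state $ 1-2;
   state = upcase(state);
   put @1 state $2.;    /* overwrite columns 1-2; the rest of the record is kept */
run;
```

Because the record is written from the shared input buffer, the phone numbers in columns 4 through 15 are preserved. The record length does not change.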

Accessing the Contents of the Input Buffer

In addition to the _INFILE_= variable, you can use the automatic _INFILE_ variable to reference the contents of the current input buffer for the most recent execution of the INFILE statement. This character variable is automatically retained and initialized to blanks. Like other automatic variables, _INFILE_ is not written to the data set.

When you specify the _INFILE_= option in an INFILE statement, the automatic _INFILE_ variable also references that variable indirectly. If the automatic _INFILE_ variable is present and you omit _INFILE_= in a particular INFILE statement, then SAS creates an internal _INFILE_= variable for that INFILE statement. Otherwise, SAS does not create an _INFILE_= variable for that statement.

During execution and at the point of reference, the maximum length of this character variable is the maximum length of the current _INFILE_= variable. However, because _INFILE_ merely references other variables whose lengths are not known until just before the execution phase, its designated length during the compilation phase is 32,767. For example, if you assign _INFILE_ to a new variable whose length is undefined, then the default length of the new variable is 32,767. You cannot use the LENGTH or ATTRIB statement to set or override the length of _INFILE_, but you can use the FORMAT or ATTRIB statement to assign a format to _INFILE_.

As with other SAS variables, you can update the _INFILE_ variable in an assignment statement. You can also use a format with _INFILE_ in a PUT statement. For example, the following PUT statement writes the contents of the input buffer in hexadecimal format:

 put _infile_ $hex100.; 

Any modification of _INFILE_ directly modifies the current input buffer for the current INFILE statement. Any PUT _INFILE_ statement that executes after this modification writes the contents of the modified buffer.

_INFILE_ only accesses the contents of the current input buffer for an INFILE statement, even when you use the N= option to specify multiple buffers. You can access all the N= buffers, but you must use an INPUT statement with the # line pointer control to make the desired buffer the current input buffer.
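A common use of the automatic _INFILE_ variable is to edit a record before INPUT parses it. In this minimal sketch, with hypothetical data values, the first INPUT statement loads the record and holds it with a trailing @. The assignment statement then changes the input buffer, and the second INPUT statement reads the modified buffer.

```sas
data upper;
   infile datalines truncover;
   input @;                        /* read a record into the buffer and hold it */
   _infile_ = upcase(_infile_);    /* change the input buffer itself */
   input name $ city $;            /* INPUT now reads the modified buffer */
   datalines;
ann boston
bob denver
;
```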

Reading Delimited Data

By default, the delimiter to read input data records with list input is a blank space. Both the delimiter-sensitive data (DSD) option and the DELIMITER= option affect how list input handles delimiters. The DELIMITER= option specifies that the INPUT statement use a character other than a blank as a delimiter for data values that are read with list input. When the DSD option is in effect, the INPUT statement uses a comma as the default delimiter.

To read a value as missing between two consecutive delimiters, use the DSD option. By default, the INPUT statement treats consecutive delimiters as a unit. When you use DSD, the INPUT statement treats consecutive delimiters separately. Therefore, a value that is missing between consecutive delimiters is read as a missing value. To change the delimiter from a comma to another value, use the DELIMITER= option.

For example, this DATA step program uses list input to read data that is separated with commas. The second data line contains a missing value. Because list input treats consecutive delimiters as a single delimiter by default, the INPUT statement cannot detect the missing value.

    data scores;
       infile datalines delimiter=',';
       input test1 test2 test3;
       datalines;
    91,87,95
    97,,92
    ,1,1
    ;

With the FLOWOVER option in effect, the data set SCORES contains two, not three, observations. The second observation is built incorrectly:

    OBS    TEST1    TEST2    TEST3
     1       91       87       95
     2       97       92        1

To correct the problem, use the DSD option in the INFILE statement.

 infile datalines dsd; 

Now the INPUT statement detects the two consecutive delimiters and therefore assigns a missing value to variable TEST2 in the second observation.

    OBS    TEST1    TEST2    TEST3
     1       91       87       95
     2       97        .       92
     3        .        1        1

The DSD option also enables list input to read a character value that contains a delimiter within a quoted string. For example, if data is separated with commas, DSD enables you to place the character string in quotation marks and read a comma as a valid character. SAS does not store the quotation marks as part of the character value. To retain the quotation marks as part of the value, use the tilde (~) format modifier in an INPUT statement. See Example 1 on page 1234.
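The DELIMITER= option also accepts a character variable (see DELIMITER= under Options). The following sketch, with hypothetical data, combines that form with DSD. Two consecutive vertical bars produce a missing value, and a vertical bar inside a quoted string is read as data.

```sas
data mixed;
   length sep $ 1;
   sep = '|';                           /* delimiter held in a character variable */
   infile datalines dsd delimiter=sep;  /* DSD: '||' is missing, quotes protect '|' */
   input id name :$10. score;
   drop sep;
   datalines;
1|Ann|90
2||85
3|"Lee|Jr"|77
;
```

In the third observation, NAME is stored as Lee|Jr without the quotation marks.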

Reading Long Instream Data Records

You can use the INFILE statement with the DATALINES file specification to process instream data. An INPUT statement reads the data records that follow the DATALINES statement. If you use the CARDIMAGE system option, or if this option is the default for your system, then SAS processes the data lines exactly like 80-byte punched card images that are padded with blanks. The default FLOWOVER option in the INFILE statement causes the INPUT statement to read the next record if it does not find values in the current record for all of the variables in the statement. To ensure that your data is processed correctly, use an external file for input when record lengths are greater than 80 bytes.

Note: The NOCARDIMAGE system option (see CARDIMAGE System Option on page 1487) specifies that data lines not be treated as if they were 80-byte card images. The end of a data line is always treated as the end of the last token, except for strings that are enclosed in quotation marks.

Reading Past the End of a Line

By default, if the INPUT statement tries to read past the end of the current input data record, then it moves the input pointer to column 1 of the next record to read the remaining values. This default behavior is specified by the FLOWOVER option. A message is written to the SAS log:

    NOTE: SAS went to a new line when INPUT statement reached past the end of a line.

Several options are available to change the INPUT statement behavior when the end of a line is reached. The STOPOVER option treats this condition as an error and stops building the data set. The MISSOVER option sets the remaining INPUT statement variables to missing values. The SCANOVER option, used with @'character-string', scans the input record until it finds the specified character string. The FLOWOVER option restores the default behavior.

The TRUNCOVER and MISSOVER options are similar: both set the remaining INPUT statement variables to missing values. However, MISSOVER also sets a value to missing when the INPUT statement cannot read an entire field because the record ends before the field width that is specified in the INPUT statement. TRUNCOVER instead writes whatever characters are read to the appropriate variable, so that you know what the input data record contained.

For example, an external file with variable-length records contains these records:

    ----+----1----+----2
    1
    22
    333
    4444
    55555

The following DATA step reads this data to create a SAS data set. Only one of the input records is as long as the width of the informat (5.) that is used to read the variable TESTNUM.

    data numbers;
       infile 'external-file';
       input testnum 5.;
    run;

This DATA step creates only three observations from the five input records, because by default the FLOWOVER option is used to read the input records.

If you use the MISSOVER option in the INFILE statement, then the DATA step creates five observations. However, all the values that were read from records that were too short are set to missing. Use the TRUNCOVER option in the INFILE statement to correct this problem:

    infile 'external-file' truncover;

The DATA step now reads the same input records and creates five observations. See Table 7.5 on page 1233 to compare the SAS data sets.

Table 7.5: The Value of TESTNUM Using Different INFILE Statement Options

    OBS    FLOWOVER    MISSOVER    TRUNCOVER
     1           22           .            1
     2         4444           .           22
     3        55555           .          333
     4                        .         4444
     5                    55555        55555
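To reproduce Table 7.5 without an existing external file, the following sketch writes the five records to a scratch file (hypothetical fileref SHORT, TEMP device) and reads them three times. It assumes that the operating environment stores the scratch file with variable-length records and that PAD is not in effect. With fixed-length or padded records, the results can differ.

```sas
/* Recreate the five variable-length records (hypothetical fileref SHORT) */
filename short temp;

data _null_;
   file short;
   put '1';
   put '22';
   put '333';
   put '4444';
   put '55555';
run;

/* FLOWOVER (the default): short records flow into the next record */
data flow;
   infile short;
   input testnum 5.;
run;

/* MISSOVER: a field cut short by the end of the record is set to missing */
data miss;
   infile short missover;
   input testnum 5.;
run;

/* TRUNCOVER: whatever characters are present are assigned */
data trunc;
   infile short truncover;
   input testnum 5.;
run;
```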

Comparisons

  • The INFILE statement specifies the input file for any INPUT statements in the DATA step. The FILE statement specifies the output file for any PUT statements in the DATA step.

  • An INFILE statement usually identifies data from an external file. A DATALINES statement indicates that data follows in the job stream. You can use the INFILE statement with the file specification DATALINES to take advantage of certain data-reading options that affect how the INPUT statement reads instream data.
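The division of labor between INFILE and FILE shows clearly in a minimal copy program. The filerefs SRC and DEST are hypothetical scratch files. The bare INPUT statement reads each record into the input buffer, and PUT _INFILE_ writes it to the file that the FILE statement names.

```sas
filename src temp;    /* hypothetical input file */
filename dest temp;   /* hypothetical output file */

data _null_;
   file src;
   put 'line one';
   put 'line two';
run;

/* INFILE names the file that INPUT reads; FILE names the file that PUT writes */
data _null_;
   infile src;
   file dest;
   input;           /* read the next record into the input buffer */
   put _infile_;    /* write that buffer unchanged to DEST */
run;
```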

Examples

Example 1: Changing How Delimiters Are Treated

By default, the INPUT statement uses a blank as the delimiter. This DATA step uses a comma as the delimiter:

    data num;
       infile datalines dsd;
       input x y z;
       datalines;
    ,2,3
    4,5,6
    7,8,9
    ;

The argument DATALINES in the INFILE statement allows you to use an INFILE statement option to read instream data lines. The DSD option sets the comma as the default delimiter. Because a comma precedes the first value in the first data line, a missing value is assigned to variable X in the first observation, and the value 2 is assigned to variable Y.

If the data uses multiple delimiters or a single delimiter other than a comma, then simply specify the delimiter values with the DELIMITER= option. In this example, the characters a and b function as delimiters:

    data nums;
       infile datalines dsd delimiter='ab';
       input X Y Z;
       datalines;
    1aa2ab3
    4b5bab6
    7a8b9
    ;

The output that PROC PRINT generates shows the resulting NUMS data set. Values are missing for variables in the first and second observations because DSD causes list input to detect two consecutive delimiters. If you omit DSD, the characters a, b, aa, ab, ba, or bb function as the delimiter and no variables are assigned missing values.

Output 7.5: The NUMS Data Set
    The SAS System                  1

    OBS    X    Y    Z
     1     1    .    2
     2     4    5    .
     3     7    8    9

This DATA step uses modified list input and the DSD option to read data that is separated by commas and that might contain commas as part of a character value:

    data scores;
       infile datalines dsd;
       input Name : $9. Score
             Team : $25. Div $;
       datalines;
    Joseph,76,"Red Racers, Washington",AAA
    Mitchel,82,"Blue Bunnies, Richmond",AAA
    Sue Ellen,74,"Green Gazelles, Atlanta",AA
    ;

The output that PROC PRINT generates shows the resulting SCORES data set. The delimiter (comma) is stored as part of the value of TEAM while the quotation marks are not.

Output 7.6: Data Set SCORES
The SAS System                                            1

OBS    NAME         SCORE    TEAM                       DIV
  1    Joseph          76    Red Racers, Washington     AAA
  2    Mitchel         82    Blue Bunnies, Richmond     AAA
  3    Sue Ellen       74    Green Gazelles, Atlanta    AA
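If you want to keep the quotation marks as part of the value, list input also provides the tilde (~) format modifier. The following sketch assumes that your release supports ~ with DSD (see INPUT Statement, List). The data set name SCORES_QUOTED is hypothetical. Because the stored values now include both quotation marks, the informat for TEAM must leave room for them. The longest value, "Green Gazelles, Atlanta", is 25 characters with its quotation marks.

```sas
data scores_quoted;
   infile datalines dsd;
   /* ~ reads the quoted comma as data and keeps the quotation
      marks in TEAM, so $25. allows for 23 characters plus 2 quotes */
   input Name : $9. Score Team ~ $25. Div $;
   datalines;
Joseph,76,"Red Racers, Washington",AAA
Mitchel,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,74,"Green Gazelles, Atlanta",AA
;

proc print data=scores_quoted;
run;
```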
 

Example 2: Handling Missing Values and Short Records with List Input

This example shows how to prevent short records from causing problems when you read data with list input. Some data lines in this example contain fewer than five temperature values. Use the MISSOVER option so that the variables for which no values are found are set to missing.

data weather;
   infile datalines missover;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

SAS reads the three values on the first data line as the values of TEMP1, TEMP2, and TEMP3. The MISSOVER option causes SAS to set the values of TEMP4 and TEMP5 to missing for the first observation because no values for those variables are in the current input data record.

When you omit the MISSOVER option or use FLOWOVER, SAS moves the input pointer to line 2 and reads values for TEMP4 and TEMP5. The next time the DATA step executes, SAS reads a new line, which in this case is line 3. This message appears in the SAS log:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
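You can observe the FLOWOVER behavior described above by naming the option explicitly. FLOWOVER is the default, so omitting the option has the same effect. In this sketch, the data set name WEATHER_FLOW is hypothetical. The first observation takes 97.9, 98.1, and 98.3 from line 1 and 98.6 and 99.2 from line 2. The remaining values on line 2 are never read, and the second observation comes from line 3. The data set therefore has two observations instead of three.

```sas
data weather_flow;
   /* FLOWOVER (the default): a short record continues on the next line */
   infile datalines flowover;
   input temp1-temp5;
   datalines;
97.9 98.1 98.3
98.6 99.2 99.1 98.5 97.5
96.2 97.3 98.3 97.6 96.5
;

proc print data=weather_flow;
run;
```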

You can also use the STOPOVER option in the INFILE statement. This causes the DATA step to halt execution when an INPUT statement does not find enough values in a record of raw data:

 infile datalines stopover; 

Because SAS does not find a TEMP4 value in the first data record, it sets _ERROR_ to 1, stops building the data set, and prints data line 1.

Example 3: Scanning Variable-Length Records for a Specific Character String

This example shows how to use TRUNCOVER in combination with SCANOVER to pull phone numbers from a phone book. Each phone number is preceded by the string 'phone:'. Because some of the phone numbers are international, the maximum length is 32 characters.

filename phonebk host-specific-path;
data _null_;
   file phonebk;
   input line $80.;
   put line;
   datalines;
   Jenny's Phone Book
   Jim Johanson phone: 619-555-9340
      Jim wants a scarf for the holidays.
   Jane Jovalley phone: (213) 555-4820
      Jane started growing cabbage in her garden.
      Her dog's name is Juniper.
   J.R. Hauptman phone: (49)12 34-56 78-90
      J.R. is my brother.
;
run;

Use @'phone:' to scan the lines of the file for a phone number and to position the input pointer where the phone number begins. SCANOVER causes SAS to skip the lines that do not contain 'phone:', and TRUNCOVER enables the $32. informat to read a value that ends before 32 characters at the end of a record. As a result, only the phone numbers are written to the log.

data _null_;
   infile phonebk truncover scanover;
   input @'phone:' phone $32.;
   put phone=;
run;

The program writes the following lines to the SAS log:

----+----1----+----2----+----3
phone=619-555-9340
phone=(213) 555-4820
phone=(49)12 34-56 78-90

Example 4: Reading Files That Contain Variable-Length Records

This example shows how to use LENGTH=, in combination with the $VARYING. informat, to read a file that contains variable-length records:

data a;
   infile file-specification length=linelen;
   input firstvar 1-10 @;  /* assign LINELEN   */
   varlen=linelen-10;      /* Calculate VARLEN */
   input @11 secondvar $varying500. varlen;
run;

The following occurs in this DATA step:

  • The INFILE statement creates the variable LINELEN but does not assign it a value.

  • When the first INPUT statement executes, SAS determines the line length of the record and assigns that value to the variable LINELEN. The single trailing @ holds the record in the input buffer for the next INPUT statement.

  • The assignment statement uses the two known lengths (the length of FIRSTVAR and the length of the entire record) to compute VARLEN, the number of characters that remain in the record after FIRSTVAR.

  • The second INPUT statement uses the VARLEN value with the informat $VARYING500. to read the variable SECONDVAR.

See $VARYINGw. Informat on page 963 for more information.

Example 5: Reading from Multiple Input Files

The following DATA step reads from two input files during each iteration of the DATA step. As SAS switches from one file to the next, each file remains open. The input pointer remains in place to begin reading from that location the next time an INPUT statement reads from that file.

data qtrtot(drop=jansale febsale marsale
                 aprsale maysale junsale);
      /* identify location of 1st file */
   infile file-specification-1;
      /* read values from 1st file     */
   input name $ jansale febsale marsale;
   qtr1tot=sum(jansale,febsale,marsale);
      /* identify location of 2nd file */
   infile file-specification-2;
      /* read values from 2nd file     */
   input @7 aprsale maysale junsale;
   qtr2tot=sum(aprsale,maysale,junsale);
run;

The DATA step terminates when SAS reaches an end of file on the shortest input file.

This DATA step uses FILEVAR= to read from a different file during each iteration of the DATA step:

data allsales;
   length fileloc myinfile $ 300;
   input fileloc $;  /* read instream data */
      /* The INFILE statement closes the current file
         and opens a new one if FILELOC changes value
         when INFILE executes                          */
   infile file-specification filevar=fileloc
          filename=myinfile end=done;
      /* DONE set to 1 when last input record read */
   do while(not done);
         /* Read all input records from the currently */
         /* opened input file, write to ALLSALES       */
      input name $ jansale febsale marsale;
      output;
   end;
   put 'Finished reading ' myinfile=;
   datalines;
external-file-1
external-file-2
external-file-3
;

The FILENAME= option assigns the name of the current input file to the variable MYINFILE. The LENGTH statement ensures that the FILENAME= and FILEVAR= variables are long enough to hold the name of the file. The PUT statement writes the physical name of the currently open input file to the SAS log.

Example 6: Updating an External File

This example shows how to use the INFILE statement with the SHAREBUFFERS option and the INPUT, FILE, and PUT statements to update an external file in place:

data _null_;
      /* The INFILE and FILE statements */
      /* must specify the same file.    */
   infile file-specification-1 sharebuffers;
   file file-specification-1;
   input state $ 1-2 phone $ 5-16;
      /* Replace area code for NC exchanges */
   if state='NC' and substr(phone,5,3)='333' then
      phone='910-'||substr(phone,5,8);
   put phone 5-16;
run;

Example 7: Truncating Copied Records

The LENGTH= option is useful when you copy the input file to another file with the PUT _INFILE_ statement. Use LENGTH= to truncate the copied records. For example, these statements truncate the last 20 columns from each input data record before the input data record is written to the output file:

data _null_;
   infile file-specification-1 length=a;
   input;
   a=a-20;
   file file-specification-2;
   put _infile_;
run;

The START= option is also useful when you want to truncate what the PUT _INFILE_ statement copies. For example, if you do not want to copy the first 10 columns of each record, these statements copy from column 11 to the end of each record in the input buffer:

data _null_;
   infile file-specification start=s;
   input;
   s=11;
   file file-specification-2;
   put _infile_;
run;

Example 8: Listing the Pointer Location

This DATA step assigns the value of the current pointer location in the input buffer to the variables LINEPT and COLUMNPT:

data _null_;
   infile datalines n=2 line=Linept col=Columnpt;
   input name $ 1-15 #2 @3 id;
   put linept= columnpt=;
   datalines;
J. Brooks
  40974
T. R. Ansen
  4032
;

These statements produce the following line for each execution of the DATA step because the input pointer is on the second line in the input buffer when the PUT statement executes:

Linept=2 Columnpt=9
Linept=2 Columnpt=8

Example 9: Working with Data in the Input Buffer

The _INFILE_ variable always contains the most recent record that is read from an INPUT statement. This example illustrates the use of the _INFILE_ variable to

  • read an entire record that you want to parse without using the INPUT statement.

  • read an entire record that you want to write to the SAS log.

  • modify the contents of the input record before parsing the line with an INPUT statement.

The example file contains phone bill information. The numeric data, minutes and charge, are enclosed in angle brackets (< >).

filename phonbill host-specific-filename;
data _null_;
   file phonbill;
   input line $80.;
   put line;
   datalines;
City Number Minutes Charge
Jackson 415-555-2384 <25> <2.45>
Jefferson 813-555-2356 <15> <1.62>
Joliet 913-555-3223 <65> <10.32>
;
run;

The following code reads each record after the header line and parses it to extract the city and the number of minutes.

data _null_;
   infile phonbill firstobs=2;
   input;
   city = scan(_infile_, 1, ' ');
   char_min = scan(_infile_, 3, ' ');
   char_min = substr(char_min, 2, length(char_min)-2);
   minutes = input(char_min, BEST12.);
   put city= minutes=;
run;

The program writes the following lines to the SAS log:

----+----1----+----2----+----3
city=Jackson minutes=25
city=Jefferson minutes=15
city=Joliet minutes=65
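If you also need the charge, one alternative is to pass a list of delimiter characters as the third argument of SCAN. If blanks and both angle brackets are delimiters, the third and fourth words of each record are the bare numbers, and no SUBSTR call is needed. This is a sketch rather than the method used above. To keep it self-contained, it reads the three detail lines as instream data instead of from PHONBILL, and the data set name CHARGES is hypothetical.

```sas
data charges;
   length city $ 12;
   /* Instream copy of the detail lines, for illustration only */
   infile datalines;
   input;
   /* Blank, < and > all act as delimiters, so word 3 is the
      minutes and word 4 is the charge, without angle brackets */
   city    = scan(_infile_, 1, ' <>');
   minutes = input(scan(_infile_, 3, ' <>'), best12.);
   charge  = input(scan(_infile_, 4, ' <>'), best12.);
   put city= minutes= charge=;
   datalines;
Jackson 415-555-2384 <25> <2.45>
Jefferson 813-555-2356 <15> <1.62>
Joliet 913-555-3223 <65> <10.32>
;
```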

The INPUT statement in the following code reads a record from the file. The automatic _INFILE_ variable is used in the PUT statement to write the record to the log.

data _null_;
   infile phonbill;
   input;
   put _infile_;
run;

The program writes the following lines to the SAS log:

----+----1----+----2----+----3----+----4
City Number Minutes Charge
Jackson 415-555-2384 <25> <2.45>
Jefferson 813-555-2356 <15> <1.62>
Joliet 913-555-3223 <65> <10.32>

In the following code, the first INPUT statement reads and holds the record in the input buffer. The assignment statement uses the COMPRESS function to remove the angle brackets (< >) from the _INFILE_ automatic variable, which modifies the contents of the input buffer. The second INPUT statement then parses the modified value in the buffer.

data _null_;
   length city number $16 minutes charge 8;
   infile phonbill firstobs=2;
   input @;
   _infile_ = compress(_infile_, '<>');
   input city number minutes charge;
   put city= number= minutes= charge=;
run;

The program writes the following lines to the SAS log:

----+----1----+----2----+----3----+----4----+----5----+----6
city=Jackson number=415-555-2384 minutes=25 charge=2.45
city=Jefferson number=813-555-2356 minutes=15 charge=1.62
city=Joliet number=913-555-3223 minutes=65 charge=10.32

Example 10: Accessing the Input Buffers of Multiple Files

This example uses both the _INFILE_ automatic variable and the _INFILE_= option to read multiple files and access the input buffer of each. The following code creates four files: three data files and one file that contains the names of all the data files. The second DATA step reads the filenames file, opens each data file, and writes the contents to the log. Because the PUT statement needs the input buffers of both the filenames file and the current data file, the _INFILE_= option gives the buffer of the filenames file the name FNAME, and the _INFILE_ automatic variable refers to the buffer of the data file. Because FNAME holds the name of a data file, it also serves as the FILEVAR= variable in the second INFILE statement.

data _null_;
   do i = 1 to 3;
      fname = 'external-data-file'||put(i,1.)||'.dat';
      file datfiles filevar=fname;
      do j = 1 to 5;
         put i j;
      end;
      file 'external-filenames-file';
      put fname;
   end;
run;

data _null_;
   infile 'external-filenames-file' _infile_=fname;
   input;
   infile datfiles filevar=fname end=eof;
   do while(^eof);
      input;
      put fname _infile_;
   end;
run;

The program writes the following lines to the SAS log:

----+----1----+----2----+----3----+----4----+----5----+----6
NOTE: The infile external-filenames-file is:
      File Name=external-filenames-file,
      RECFM=V, LRECL=256
NOTE: The infile DATFILES is:
      File Name=external-data-file1.dat,
      RECFM=V, LRECL=256
external-data-file1.dat 1 1
external-data-file1.dat 1 2
external-data-file1.dat 1 3
external-data-file1.dat 1 4
external-data-file1.dat 1 5
NOTE: The infile DATFILES is:
      File Name=external-data-file2.dat,
      RECFM=V, LRECL=256
external-data-file2.dat 2 1
external-data-file2.dat 2 2
external-data-file2.dat 2 3
external-data-file2.dat 2 4
external-data-file2.dat 2 5
NOTE: The infile DATFILES is:
      File Name=external-data-file3.dat,
      RECFM=V, LRECL=256
external-data-file3.dat 3 1
external-data-file3.dat 3 2
external-data-file3.dat 3 3
external-data-file3.dat 3 4
external-data-file3.dat 3 5

Example 11: Specifying an Encoding When Reading an External File

This example creates a SAS data set from an external file. The external file's encoding is UTF-8, and the current SAS session encoding is Wlatin1. By default, SAS assumes that the external file is in the same encoding as the session encoding, which causes the character data to be written to the new SAS data set incorrectly.

To tell SAS what encoding to use when reading the external file, specify the ENCODING= option. When you tell SAS that the external file is in UTF-8, SAS then transcodes the external file from UTF-8 to the current session encoding when writing to the new SAS data set. Therefore, the data is written to the new data set correctly in Wlatin1.

libname myfiles 'SAS-data-library';
filename extfile 'external-file';

data myfiles.unicode;
   infile extfile encoding="utf-8";
   input Make $ Model $ Year;
run;
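You can also attach the encoding to the fileref instead of the INFILE statement, so that every step that reads EXTFILE transcodes from UTF-8. The following is a sketch only. It assumes that the FILENAME statement in your release accepts the ENCODING= option; see FILENAME Statement on page 1169. The placeholders are the same as in the example above.

```sas
libname myfiles 'SAS-data-library';
/* Assumption: FILENAME accepts ENCODING=, so the INFILE
   statement below does not need to repeat it */
filename extfile 'external-file' encoding="utf-8";

data myfiles.unicode;
   infile extfile;
   input Make $ Model $ Year;
run;
```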

See Also

Statements:

  • FILENAME Statement on page 1169

  • INPUT Statement on page 1245

  • PUT Statement on page 1342




SAS 9.1 Language Reference Dictionary, Volumes 1, 2 and 3
Year: 2004
Pages: 704
