Reading Raw Data with the INPUT Statement


Choosing an Input Style

The INPUT statement reads raw data from instream data lines or external files into a SAS data set. You can use the following different input styles, depending on the layout of data values in the records:

  • list input

  • column input

  • formatted input

  • named input.

You can also combine styles of input in a single INPUT statement. For details about the styles of input, see the INPUT statement in SAS Language Reference: Dictionary.

List Input

List input uses a scanning method for locating data values. Data values are not required to be aligned in columns but must be separated by at least one blank (or another defined delimiter). List input requires only that you specify the variable names, plus a dollar sign ($) after each character variable. You do not have to specify the location of the data fields.

An example of list input follows:

data scores;
   length name $ 12;
   input name $ score1 score2;
   datalines;
Riley 1132 1187
Henderson 1015 1102
;

List input has several restrictions on the type of data that it can read:

  • Input values must be separated by at least one blank (the default delimiter) or by the delimiter specified with the DELIMITER= option in the INFILE statement. If you want SAS to read consecutive delimiters as though there is a missing value between them, specify the DSD option in the INFILE statement.

  • Blanks cannot represent missing values. A real value, such as a period, must be used instead.

  • To read and store a character input value longer than 8 bytes, define the variable's length with a LENGTH, INFORMAT, or ATTRIB statement before the INPUT statement, or use modified list input, which consists of an informat and the colon modifier in the INPUT statement. See 'Modified List Input' for more information.

  • Character values cannot contain embedded blanks when the file is delimited by blanks.

  • Fields must be read in order.

  • Data must be in standard numeric or character format.

Note: Nonstandard numeric values, such as packed decimal data, must use the formatted style of input. See 'Formatted Input' for more information.
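The delimiter and missing-value rules above can be sketched as follows. This is a minimal illustration, not an excerpt from the reference: the data set names blank_delim and comma_delim and the data values are assumptions made for the example. The first step uses a period as the placeholder for a missing value in blank-delimited data. The second step uses the DSD option so that two consecutive commas are read as a missing value.

```sas
/* Blank-delimited data: a period stands in for the missing score1. */
data blank_delim;
   length name $ 12;
   input name $ score1 score2;
   datalines;
Riley . 1187
Henderson 1015 1102
;

/* Comma-delimited data: DSD makes the comma the default delimiter */
/* and treats consecutive delimiters as a missing value.           */
data comma_delim;
   infile datalines dsd;
   length name $ 12;
   input name $ score1 score2;
   datalines;
Riley,,1187
Henderson,1015,1102
;
```

In both steps, score1 should be missing for Riley while score2 is read from the correct field.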

Modified List Input

A more flexible version of list input, called modified list input, includes format modifiers. The following format modifiers enable you to use list input to read nonstandard data by using SAS informats:

  • The & (ampersand) format modifier enables you to read character values that contain embedded blanks with list input and to specify a character informat. SAS reads until it encounters two or more consecutive blanks.

  • The : (colon) format modifier enables you to use list input but also to specify an informat after a variable name, whether character or numeric. SAS reads until it encounters a blank column.

  • The ~ (tilde) format modifier enables you to read and retain single quotation marks, double quotation marks, and delimiters within character values.
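As a minimal sketch of the & modifier, the following step reads character values that contain single embedded blanks. The data set name teams, the variable names, and the data values are assumptions made for this illustration. Each value must be followed by at least two blanks so that SAS can tell where it ends.

```sas
/* The & modifier lets a value contain single blanks;     */
/* two or more consecutive blanks mark the end of a value. */
data teams;
   input team & $20. city & $15. wins;
   datalines;
Green Hornets  Atlanta  12
High Volts  Portland  9
Vulcans  Las Vegas  17
;
```

If only one blank separated Las Vegas from 17, SAS would treat the number as part of the city value.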

The following is an example of the : and ~ format modifiers:

data scores;
   infile datalines dsd;
   input Name : $9. Score1-Score3 Team ~ $25. Div $;
   datalines;
Smith,12,22,46,"Green Hornets, Atlanta",AAA
Mitchel,23,19,25,"High Volts, Portland",AAA
Jones,09,17,54,"Vulcans, Las Vegas",AA
;
proc print data=scores noobs;
run;

Output 21.1: Output from Example with Format Modifiers

Name       Score1    Score2    Score3    Team                        Div
Smith          12        22        46    "Green Hornets, Atlanta"    AAA
Mitchel        23        19        25    "High Volts, Portland"      AAA
Jones           9        17        54    "Vulcans, Las Vegas"        AA

Column Input

Column input enables you to read standard data values that are aligned in columns in the data records. Specify the variable name, followed by a dollar sign ($) if it is a character variable, and specify the columns in which the data values are located in each record:

data scores;
   infile datalines truncover;
   input name $ 1-12 score2 17-20 score1 27-30;
   datalines;
Riley           1132       987
Henderson       1015      1102
;

Note: Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.

To use column input, data values must be:

  • in the same field on all the input lines

  • in standard numeric or character form.

Note: You cannot use an informat with column input.

Features of column input include the following:

  • Character values can contain embedded blanks.

  • Character values can be from 1 to 32,767 characters long.

  • Placeholders, such as a single period (.), are not required for missing data.

  • Input values can be read in any order, regardless of their position in the record.

  • Values or parts of values can be reread.

  • Both leading and trailing blanks within the field are ignored.

  • Values do not need to be separated by blanks or other delimiters.
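Several of these features can be seen together in a short sketch. The variable initial and the second data line are assumptions added for illustration; the column positions follow the layout used in the example above. The step reads fields out of order, rereads column 1, keeps an embedded blank in a name, and leaves a blank field as a missing value.

```sas
data scores;
   infile datalines truncover;
   input score1  17-20     /* fields can be read in any order        */
         name    $ 1-12    /* embedded blank in "Van Horn" is kept   */
         initial $ 1       /* column 1 is reread as its own variable */
         score2  27-30;
   datalines;
Riley           1132      1187
Van Horn                  1102
;
```

Columns 17-20 are blank in the second record, so score1 is set to missing without a period placeholder.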

Formatted Input

Formatted input combines the flexibility of using informats with many of the features of column input. By using formatted input, you can read nonstandard data for which SAS requires additional instructions. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data.

The INPUT statement in the following DATA step uses formatted input and pointer controls. Note that $12. and COMMA5. are informats and +4 and +6 are column pointer controls.

data scores;
   input name $12. +4 score1 comma5. +6 score2 comma5.;
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;

Note: You can also use informats to read data that is not aligned in columns. See 'Modified List Input' for more information.

Important points about formatted input are:

  • Character values can contain embedded blanks.

  • Character values can be from 1 to 32,767 characters long.

  • Placeholders, such as a single period (.), are not required for missing data.

  • With the use of pointer controls to position the pointer, input values can be read in any order, regardless of their positions in the record.

  • Values or parts of values can be reread.

  • Formatted input enables you to read data stored in nonstandard form, such as packed decimal or numbers with commas.
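The points about pointer controls and rereading can be sketched with absolute (@n) column pointer controls instead of the relative (+n) controls shown earlier. The variable initial is an assumption added for illustration; the data layout matches the formatted input example above.

```sas
data scores;
   infile datalines truncover;
   input @17 score1  comma5.   /* jump to column 17 and read 1,132 as 1132 */
         @1  name    $12.      /* move back to column 1                    */
         @28 score2  comma5.
         @1  initial $1.;      /* reread column 1 with a different informat */
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;
```

Because each field is located by its pointer control, the order of the variables in the INPUT statement does not have to match their order in the record.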

Named Input

You can use named input to read records in which data values are preceded by the name of the variable and an equal sign (=). The following INPUT statement reads the data lines containing equal signs.

data games;
   input name=$ score1= score2=;
   datalines;
name=riley score1=1132 score2=1187
;
proc print data=games;
run;

Note: When an equal sign follows a variable in an INPUT statement, SAS expects that data remaining on the input line contains only named input values. You cannot switch to another form of input in the same INPUT statement after using named input.

Also, note that any variable that exists in the input data but is not defined in the INPUT statement generates a note in the SAS log indicating a missing field.
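Although you cannot switch away from named input once it starts, an INPUT statement can begin with another style and then change to named input. The following is a minimal sketch of that pattern. The variable id and the data values are assumptions made for this illustration.

```sas
data games;
   length name $ 12;
   input id name=$ score1= score2=;   /* list input for id, then named input */
   datalines;
101 name=riley score1=1132 score2=1187
102 name=henderson score1=1015 score2=1102
;
proc print data=games;
run;
```

The LENGTH statement is included so that henderson, which is longer than 8 bytes, is not truncated.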

Additional Data-Reading Features

In addition to different styles of input, there are many tools to meet the needs of different data-reading situations. You can use options in the INFILE statement in combination with the INPUT statement to gain additional control over the reading of data records. Table 21.3 lists common data-reading tasks and the appropriate features available in the INPUT and INFILE statements.

Table 21.3: Additional Data-Reading Features

| Input Data Feature | Goal | Use |
|---|---|---|
| multiple records | create a single observation | #n or / line pointer control in the INPUT statement with a DO loop. |
| a single record | create multiple observations | trailing @@ in the INPUT statement; trailing @ with multiple INPUT and OUTPUT statements. |
| variable-length data fields and records | read delimited data | list input with or without a format modifier in the INPUT statement and the TRUNCOVER, DELIMITER=, and/or DSD options in the INFILE statement. |
| | read non-delimited data | $VARYINGw. informat in the INPUT statement and the LENGTH= and TRUNCOVER options in the INFILE statement. |
| a file with varying record layouts | | IF-THEN statements with multiple INPUT statements, using trailing @ or @@ as necessary. |
| hierarchical files | | IF-THEN statements with multiple INPUT statements, using trailing @ as necessary. |
| more than one input file or to control the program flow at EOF | | EOF= or END= option in an INFILE statement. |
| | | multiple INFILE and INPUT statements. |
| | | FILEVAR= option in an INFILE statement. |
| | | FILENAME statement with concatenation, wildcard, or piping. |
| only part of each record | | LINESIZE= option in an INFILE statement. |
| some but not all records in the file | | FIRSTOBS= and OBS= options in an INFILE statement; FIRSTOBS= and OBS= system options; #n line pointer control. |
| instream datalines | control the reading with special options | INFILE statement with DATALINES and appropriate options. |
| starting at a particular column | | @ column pointer controls. |
| leading blanks | maintain them | $CHARw. informat in an INPUT statement. |
| a delimiter other than blanks (with list input or modified list input with the colon modifier) | | DELIMITER= option and/or DSD option in an INFILE statement. |
| the standard tab character | | DELIMITER= option in an INFILE statement; or the EXPANDTABS option in an INFILE statement. |
| missing values (with list input or modified list input with the colon modifier) | create observations without compromising data integrity; protect data integrity by overriding the default behavior | TRUNCOVER option in an INFILE statement; DSD and/or DELIMITER= options might also be needed. |
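Two of the line-hold features listed in the table can be sketched as follows. The data set names, the record-type codes S and T, the variable bonus, and the data values are all assumptions made for this illustration. The first step uses a trailing @@ to create several observations from one record. The second uses a trailing @ with IF-THEN statements to read a file whose records have different layouts.

```sas
/* One record, several observations: the trailing @@ holds the */
/* record across iterations of the DATA step.                  */
data scores;
   input name $ score @@;
   datalines;
Riley 1132 Chen 1187 Lopez 1015
;

/* Varying record layouts: the trailing @ holds the record so that */
/* a second INPUT statement can read the rest of it after the      */
/* record type has been tested.                                    */
data results;
   length name $ 12;
   input type $ @;
   if type = 'S' then input name $ score;
   else if type = 'T' then input name $ score bonus;
   datalines;
S Riley 1132
T Henderson 1015 50
;
```

The first step should produce three observations from a single data line. In the second step, bonus is missing for records of type S because those records are never asked for a third value.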

For further information on data-reading features, see the INPUT and INFILE statements in SAS Language Reference: Dictionary.




SAS 9.1 Language Reference. Concepts
ISBN: 1590471989
Year: 2004
Pages: 255
