Scans the input data record for input values and assigns them to the corresponding SAS variables
Valid: in a DATA step
Category: File-handling
Type: Executable
INPUT <pointer-control> variable <$> <&> <@ | @@>;
INPUT <pointer-control> variable <:|&|~> <informat.> <@ | @@>;
pointer-control
moves the input pointer to a specified line or column in the input buffer.
See: Column Pointer Controls on page 1247 and Line Pointer Controls on page 1249
Featured in: Example 2 on page 1271
variable
names a variable that is assigned input values.
$
specifies that the variable value is stored as a character value rather than as a numeric value.
Tip: If the variable is previously defined as character, $ is not required.
Featured in: Example 1 on page 1270
&
indicates that a character value may have one or more single embedded blanks. This format modifier reads the value from the next non-blank column until the pointer reaches two consecutive blanks, the defined length of the variable, or the end of the input line, whichever comes first.
Restriction: The & modifier must follow the variable name and $ sign that it affects.
Tip: If you specify an informat after the & modifier, the terminating condition for the format modifier remains two blanks.
See: Modified List Input on page 1269
Featured in: Example 2 on page 1271
:
enables you to specify an informat that the INPUT statement uses to read the variable value. For a character variable, this format modifier reads the value from the next non-blank column until the pointer reaches the next blank column, the defined length of the variable, or the end of the data line, whichever comes first. For a numeric variable, this format modifier reads the value from the next non-blank column until the pointer reaches the next blank column or the end of the data line, whichever comes first.
Tip: If the length of the variable has not been previously defined, then its value is read and stored with the informat length.
Tip: The pointer continues to read until the next blank column is reached. However, if the field is longer than the formatted length, then the value is truncated to the length of the variable.
See: Modified List Input on page 1269
Featured in: Example 3 on page 1271 and Example 5 on page 1272
~
specifies that single quotation marks, double quotation marks, and delimiters in character values are treated in a special way. This format modifier reads delimiters within quoted character values as characters instead of as delimiters and retains the quotation marks when the value is written to a variable.
Restriction: You must use the DSD option in an INFILE statement. Otherwise, the INPUT statement ignores this modifier.
See: Modified List Input on page 1269
Featured in: Example 5 on page 1272
informat.
specifies an informat to use to read the variable values.
Tip: Decimal points in the actual input values always override decimal specifications in a numeric informat.
See Also: Definition of Informats on page 930
Featured in: Example 3 on page 1271 and Example 5 on page 1272
@
holds an input record for the execution of the next INPUT statement within the same iteration of the DATA step. This line-hold specifier is called trailing @.
Restriction: The trailing @ must be the last item in the INPUT statement.
Tip: The trailing @ prevents the next INPUT statement from automatically releasing the current input record and reading the next record into the input buffer. It is useful when you need to read from a record multiple times.
See: Using Line-Hold Specifiers on page 1253
@@
holds an input record for the execution of the next INPUT statement across iterations of the DATA step. This line-hold specifier is called double trailing @.
Restriction: The double trailing @ must be the last item in the INPUT statement.
Tip: The double trailing @ is useful when each input line contains values for several observations.
See: Using Line-Hold Specifiers on page 1253
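The two line-hold specifiers can be sketched as follows. This is a minimal illustration, not an excerpt from the reference examples: the data set names READINGS and SCORES_LONG, the variable names, and the data lines are all hypothetical.

```sas
/* Trailing @: hold the record so that a second INPUT statement can   */
/* continue reading it within the same iteration of the DATA step.    */
/* (Data set, variables, and data lines are made up for illustration.) */
data readings;
   input type $ @;
   if type = 'N' then input value;   /* numeric record   */
   else input note $;                /* character record */
   datalines;
N 42
C checked
;

/* Double trailing @: hold the record across iterations so that one   */
/* data line can supply several observations.                         */
data scores_long;
   input name $ score @@;
   datalines;
Joe 11 Susan 14 Mitchel 13
Ann 17 Lee 12
;
```

In the second step each NAME and SCORE pair becomes its own observation, so the two data lines should yield five observations.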
When to Use List Input
List input requires that you specify the variable names in the INPUT statement in the same order that the fields appear in the input data records. SAS scans the data line to locate the next value but ignores additional intervening blanks. List input does not require that the data be located in specific columns. However, you must separate each value from the next by at least one blank unless the delimiter between values is changed. By default, the delimiter for data values is one blank space or the end of the input record. List input will not skip over any data values to read subsequent values, but it can ignore all values after a given point in the data record. However, pointer controls enable you to change the order in which the data values are read.
There are two types of list input:
simple list input
modified list input.
Modified list input makes the INPUT statement more versatile because you can use a format modifier to overcome several of the restrictions of simple list input. See Modified List Input on page 1269.
Simple List Input
Simple list input places several restrictions on the type of data that the INPUT statement can read:
By default, at least one blank must separate the input values. Use the DELIMITER= option or the DSD option in the INFILE statement to specify a delimiter other than a blank.
Represent each missing value with a period (not a blank) or with two adjacent delimiters.
Character input values cannot be longer than 8 bytes unless the variable is given a longer length in an earlier LENGTH, ATTRIB, or INFORMAT statement.
Character values cannot contain embedded blanks unless you change the delimiter.
Data must be in standard numeric or character format. [*]
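The restrictions above can be illustrated with a short sketch. The data set WEATHER, its variables, and the data lines are hypothetical. A LENGTH statement lets CITY hold values longer than 8 bytes, and a period stands in for each missing numeric value.

```sas
/* Simple list input with blank-delimited values (illustrative data). */
data weather;
   length city $ 12;          /* allow character values longer than 8 bytes */
   input city $ high low;
   datalines;
Sacramento 31 18
Boston . 4
Miami 29 .
;
```

If a period were replaced by a blank, list input would skip ahead to the next nonblank value and the remaining fields would be assigned to the wrong variables.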
Modified List Input
List input is more versatile when you use format modifiers. The format modifiers are as follows:
Format Modifier | Purpose |
---|---|
& | reads character values that contain embedded blanks. |
: | reads data values that need the additional instructions that informats can provide but that are not aligned in columns. [**] |
~ | reads delimiters within quoted character values as characters and retains the quotation marks. |

[**] Use formatted input and pointer controls to quickly read data values that are aligned in columns.
For example, use the : modifier with an informat to read character values that are longer than 8 bytes or numeric values that contain nonstandard values.
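As a minimal sketch of that use of the : modifier, the step below reads a character value longer than 8 bytes and a numeric value that contains a comma. The data set name JANSALES2, the informat widths, and the data lines are assumptions chosen for illustration.

```sas
/* The : modifier scans to the next blank, then applies the informat. */
data jansales2;
   input item : $12. amount : comma8.;
   datalines;
trucks 1,382
convertibles 12,391
;
```

Because each value ends at the next blank, the fields do not need to be aligned in columns.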
Because list input interprets a blank as a delimiter, use modified list input to read values that contain blanks. The & modifier reads character values that contain single embedded blanks. However, the data values must be separated by two or more blanks. To read values that contain leading, trailing, or embedded blanks with list input, use the DELIMITER= option in the INFILE statement to specify another character as the delimiter. See Example 5 on page 1272. If your input data use blanks as delimiters and they contain leading, trailing, or embedded blanks, you may need to use either column input or formatted input. If quotation marks surround the delimited values, you can use list input with the DSD option in the INFILE statement.
How Modified List Input and Formatted Input Differ
Modified list input has a scanning feature that can use informats to read data that are not aligned in columns. Formatted input causes the pointer to move like that of column input to read a variable value. The pointer moves the length that is specified in the informat and stops at the next column.
This DATA step uses modified list input to read the first data value and formatted input to read the second:
data jansales;
   input item : $10. amount comma5.;
   datalines;
trucks 1,382
vans 1,235
sedans 2,391
;
The value of ITEM is read with modified list input. The INPUT statement stops reading when the pointer finds a blank space. The pointer then moves to the second column after the end of the field, which is the correct position to read the AMOUNT value with formatted input.
Formatted input, on the other hand, continues to read the entire width of the field. This INPUT statement uses formatted input to read both data values:
input item $10. +1 amount comma5.;
To read this data correctly with formatted input, the second data value must occur after the 10th column of the first value, as shown here:
----+----1----+----2
trucks     1,382
vans       1,235
sedans     2,391
Also, after the value of ITEM is read with formatted input, you must use the pointer control +1 to move the pointer to the column where the value AMOUNT begins.
When Data Contains Quotation Marks
When you use the DSD option in an INFILE statement, which sets the delimiter to a comma, the INPUT statement removes quotation marks before a value is written to a variable. When you also use the tilde (~) modifier in an INPUT statement, the INPUT statement maintains quotation marks as part of the value.
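The difference can be sketched by reading the same quoted value twice, once without and once with the tilde modifier. The data set QUOTES, the variables PLAIN and KEPT, and the $25. width are hypothetical choices for this illustration.

```sas
/* DSD removes quotation marks by default; the ~ modifier retains them. */
data quotes;
   infile datalines dsd;
   input plain : $25. kept ~ $25.;
   datalines;
"Red Racers, Washington","Red Racers, Washington"
;
```

Based on the behavior described above, PLAIN should contain the value without quotation marks and KEPT should contain it with the quotation marks retained. In both cases the comma inside the quoted string is read as data, not as a delimiter.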
The INPUT statement in this DATA step uses simple list input to read the input data records:
data scores;
   input name $ score1 score2 score3 team $;
   datalines;
Joe 11 32 76 red
Mitchel 13 29 82 blue
Susan 14 27 74 green
;
The next INPUT statement reads only the first four fields in the previous data lines, which demonstrates that you are not required to read all the fields in the record:
input name $ score1 score2 score3;
The INPUT statement in this DATA step uses the & format modifier with list input to read character values that contain embedded blanks.
data list;
   infile file-specification;
   input name $ & score;
run;
It can read these input data records:
----+----1----+----2----+----3----+
Joseph  11  Joergensen  red
Mitchel  13  Mc Allister  blue
Su Ellen  14  Fischer-Simon  green
The & modifier follows the variable it affects in the INPUT statement. Because this format modifier follows NAME, at least two blanks must separate the NAME field from the SCORE field in the input data records.
You can also specify an informat with a format modifier, as shown here:
input name $ & +3 lastname & $10. team $;
In addition, this INPUT statement reads the same data to demonstrate that you are not required to read all the values in an input record. The +3 column pointer control moves the pointer past the score value in order to read the value for LASTNAME and TEAM.
This DATA step uses modified list input to read data values with an informat:
data jansales;
   input item : $10. amount;
   datalines;
trucks 1382
vans 1235
sedans 2391
;
The $10. informat allows a character variable of up to ten characters to be read.
This DATA step uses the DELIMITER= option in the INFILE statement to read list input values that are separated by commas instead of blanks. The example uses an informat to read the date, and a format to write the date.
options pageno=1 nodate ls=80 ps=64;
data scores2;
   length Team $ 14;
   infile datalines delimiter=',';
   input Name $ Score1-Score3 Team $ Final_Date:MMDDYY10.;
   format final_date weekdate17.;
   datalines;
Joe,11,32,76,Red Racers,2/3/2003
Mitchell,13,29,82,Blue Bunnies,4/5/2003
Susan,14,27,74,Green Gazelles,11/13/2003
;
proc print data=scores2;
   var Name Team Score1-Score3 Final_Date;
   title 'Soccer Player Scores';
run;
                          Soccer Player Scores                           1

Obs    Name        Team              Score1    Score2    Score3    Final_Date

 1     Joe         Red Racers          11        32        76      Mon, Feb 3, 2003
 2     Mitchell    Blue Bunnies        13        29        82      Sat, Apr 5, 2003
 3     Susan       Green Gazelles      14        27        74      Thu, Nov 13, 2003
This DATA step uses the DSD option in an INFILE statement and the tilde (~) format modifier in an INPUT statement to retain the quotation marks in character data and to read a character in a string that is enclosed in quotation marks as a character instead of as a delimiter.
data scores;
   infile datalines dsd;
   input Name : $9. Score1-Score3 Team ~ $25. Div $;
   datalines;
Joseph,11,32,76,"Red Racers, Washington",AAA
Mitchel,13,29,82,"Blue Bunnies, Richmond",AAA
Sue Ellen,14,27,74,"Green Gazelles, Atlanta",AA
;
The output that PROC PRINT generates shows the resulting SCORES data set. The values for TEAM contain the quotation marks.
                             The SAS System                              1

OBS    Name         Score1    Score2    Score3    Team                         Div

 1     Joseph         11        32        76      "Red Racers, Washington"     AAA
 2     Mitchel        13        29        82      "Blue Bunnies, Richmond"     AAA
 3     Sue Ellen      14        27        74      "Green Gazelles, Atlanta"    AA
Statements:
INFILE Statement on page 1222
INPUT Statement on page 1245
INPUT Statement, Formatted on page 1263
[*] See SAS Language Reference: Concepts for information about standard and nonstandard data values.