Dictionary


CATALOG Procedure

Manages entries in SAS catalogs

UNIX specifics: FILE= option in the CONTENTS statement

See: CATALOG Procedure in Base SAS Procedures Guide

Syntax

PROC CATALOG CATALOG=<libref.>catalog <ENTRYTYPE=etype> <KILL>;
   CONTENTS <OUT=SAS-data-set> <FILE=fileref>;

Note  

This is a simplified version of the CATALOG procedure syntax. For the complete syntax and its explanation, see the CATALOG procedure in Base SAS Procedures Guide .

fileref

  • names a file specification that is specific to the UNIX operating environment.

Details

The FILE= option in the CONTENTS statement of the CATALOG procedure accepts a fileref. If the name specified does not correspond to a fileref, a file with that name and an extension of .lst is created in the current directory. For example, if MyFile is not a fileref, the following code creates the file MyFile.lst in your current directory:

   proc catalog catalog=sasuser.profile;
      contents file=myfile;
   run;

Note  

The filename that is created is always in lowercase, even if you specified it in uppercase.
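
As a minimal sketch of the other case, the following statements define a fileref first, so that the listing is written to the file associated with the fileref rather than to a .lst file in the current directory. The fileref name CatList and the path are assumptions chosen for illustration only.

```sas
/* Hypothetical path; substitute a location you can write to. */
filename catlist '/tmp/profile_contents.lst';

/* Because CatList is a defined fileref, FILE= routes the     */
/* CONTENTS listing to the file associated with it.           */
proc catalog catalog=sasuser.profile;
   contents file=catlist;
run;
quit;
```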

CIMPORT Procedure

Restores a transport file created by the CPORT procedure

UNIX specifics: name and location of transport file

See: CIMPORT Procedure in Base SAS Procedures Guide

Syntax

PROC CIMPORT destination=libref | <libref.>member-name <option(s)>;

Note  

This is a simplified version of the CIMPORT procedure syntax. For the complete syntax and its explanation, see the CIMPORT procedure in Base SAS Procedures Guide .

destination

  • identifies the file(s) in the transport file as a single SAS data set, single SAS catalog, or multiple members of a SAS data library.

libref | <libref.>member-name

  • specifies the name of the SAS data set, catalog, or library to be created from the transport file.

Details

Note  

Starting in SAS 9.1, you can use the MIGRATE procedure to convert your SAS files. For more information, see "Migrating 32-Bit SAS Files to 64-Bit in UNIX Environments" on page 106.

The CIMPORT procedure imports a transport file that was created (exported) by the CPORT procedure. The transport file can contain a SAS data set, a SAS catalog, or an entire SAS library.

Typically the INFILE= option is used to designate the source of the transport file. If this option is omitted, CIMPORT uses the default file Sascat.dat in the current directory as the transport file.

Note  

CIMPORT works only with transport files created by the CPORT procedure. If the transport file was created using the XPORT engine with the COPY procedure, then another PROC COPY must be used to restore the transport file. For more information about PROC COPY, see Base SAS Procedures Guide .
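
A minimal sketch of that alternative, assuming the transport file was written with the XPORT engine. The librefs XptIn and NewLib are illustrative names, and transport-file and SAS-data-library are placeholders for your own paths.

```sas
/* Read the transport file through the XPORT engine. */
libname xptin xport 'transport-file';

/* Destination library for the restored members. */
libname newlib 'SAS-data-library';

/* PROC COPY, not PROC CIMPORT, restores XPORT-engine files. */
proc copy in=xptin out=newlib;
run;
```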

Example

For this example, a SAS data library that contains multiple SAS data sets was exported to a file (called transport-file) using the CPORT procedure on a foreign host. The transport file is then moved by a binary transfer to the receiving host.

The following code extracts all of the SAS data sets and catalogs stored within the transport file and restores them to their original state in the new library, called SAS-data-library.

   libname newlib 'SAS-data-library';
   filename tranfile 'transport-file';
   proc cimport lib=newlib infile=tranfile;
   run;

See Also

  • "CPORT Procedure" on page 275

  • "Migrating 32-Bit SAS Files to 64-Bit in UNIX Environments" on page 106

  • The MIGRATE Procedure at support.sas.com/rnd/migration

  • Moving and Accessing SAS Files

CONTENTS Procedure

Prints descriptions of one or more files from a SAS data library

UNIX specifics: information displayed in the SAS output

See: CONTENTS Procedure in Base SAS Procedures Guide

Syntax

PROC CONTENTS < option(s) >;

Details

The CONTENTS procedure produces the same information as the CONTENTS statement in the DATASETS procedure. See "DATASETS Procedure" on page 276 for sample output.
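
As a brief sketch, the following step requests that information directly from PROC CONTENTS. It assumes a libref named Classes that contains a data set named Majors, as in the DATASETS procedure example; the DIRECTORY option also prints the library directory.

```sas
/* Assumes the libref Classes is already assigned. */
proc contents data=classes.majors directory;
run;
```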

CONVERT Procedure

Converts BMDP and OSIRIS system files, and SPSS export files to SAS data sets

UNIX specifics: all

Syntax

PROC CONVERT product-specification < option-list >;

Details

The CONVERT procedure converts BMDP and OSIRIS system files, and SPSS export files to SAS data sets. The procedure is supplied for compatibility. The procedure invokes the appropriate engine to convert files.

PROC CONVERT produces one output data set, but no printed output. The new data set contains the same information as the input system file; exceptions are noted in "How Missing Values Are Handled" on page 273.

The procedure converts system files from these three products:

  • BMDP: save files up to and including the most recent release of BMDP (available for AIX, HP-UX, and Solaris only).

  • SPSS: files saved in the portable file format. An SPSS portable file can have any file extension. Two common extensions are .por and .exp.

  • OSIRIS: files through and including OSIRIS IV. (Hierarchical file structures are not supported.)

Because the BMDP, OSIRIS, and SPSS products are maintained by other organizations, changes might be made that make new files incompatible with the current version of PROC CONVERT. SAS upgrades PROC CONVERT to support changes to these products only when a new version of SAS is released.

In the PROC CONVERT statement, product-specification is required and can be one of the following:

BMDP=fileref <(CODE=code CONTENT=content-type)>

  • converts the first member of a BMDP save file created under UNIX (AIX) into a SAS data set. Here is an example:

       filename save '/usr/mydir/bmdp.dat';
       proc convert bmdp=save;
       run;

    If you have more than one save file in the BMDP file referenced by the fileref argument, you can use two options in parentheses after fileref. The CODE= option specifies the code of the save file that you want, and the CONTENT= option specifies the content of the save file. For example, if a file with CODE=JUDGES has a content of DATA, you can use the following statements:

       filename save '/usr/mydir/bmdp.dat';
       proc convert bmdp=save(code=judges
                              content=data);
       run;

OSIRIS=fileref | libref

  • specifies a fileref or libref for the OSIRIS file to be converted into a SAS data set. You must also include the DICT= option.

SPSS=fileref | libref

  • specifies a fileref or libref for the SPSS export file that is to be converted into a SAS data set. The SPSS file must be created by using the SPSS EXPORT command, but it can be from any operating system.

The option-list can be one or more of the following:

DICT=fileref | libref

  • specifies a fileref or libref of the dictionary file for the OSIRIS file. DICT= is valid only when used with the OSIRIS product specification.

FIRSTOBS= n

  • gives the number of the observation where the conversion is to begin, so that you can skip observations at the beginning of the BMDP, OSIRIS, or SPSS file.

OBS= n

  • specifies the number of the last observation to be converted. This enables you to exclude observations at the end of the file.

OUT= SAS-data-set

  • names the SAS data set that will hold the converted data. If OUT= is omitted, SAS still creates a Work data set and automatically names it DATA n , just as if you had omitted a data set name in a DATA statement. See Chapter 4, "Using SAS Files," on page 101 for more information.
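
The following sketch combines these options to convert only part of an input file. The fileref SpssFile, the file name spssfile.por, and the output data set name Subset are assumptions for illustration. Because OBS= gives the number of the last observation rather than a count, this step would convert observations 11 through 110.

```sas
/* Hypothetical SPSS portable file in the current directory. */
filename spssfile 'spssfile.por';

/* Skip the first 10 observations and stop after number 110. */
proc convert spss=spssfile out=work.subset firstobs=11 obs=110;
run;
```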

How Missing Values are Handled

If a numeric variable in the input data set has no value or a system missing value, CONVERT assigns it a missing value.

How Variable Names are Assigned

The following sections explain how names are assigned to the SAS variables created by the CONVERT procedure.

Caution  

Be sure that the translated names will be unique. Variable names are translated as indicated in the following sections.

Variable Names in BMDP Output

Variable names from the BMDP save file are used in the SAS data set, but nontrailing blanks and all special characters are converted to underscores in the SAS variable names. The subscript in BMDP variable names, such as x(1), becomes part of the SAS variable name with the parentheses omitted: X1. Alphabetic BMDP variables become SAS character variables of corresponding length. Category records from BMDP are not accepted.

Variable Names in OSIRIS Output

For single-response variables, the V1 through V9999 name becomes the SAS variable name. For multiple-response variables, the suffix R n is added to the variable name where n is the response. For example, V25R1 would be the first response of the multiple-response V25. If the variable after V1000 has 100 or more responses, responses above 99 are eliminated. Numeric variables that OSIRIS stores in character, fixed-point binary, or floating-point binary mode become SAS numeric variables. Alphabetic variables become SAS character variables; any alphabetic variable of length greater than 200 is truncated to 200. The OSIRIS variable description becomes a SAS variable label, and OSIRIS print format information becomes a SAS format.

Variable Names in SPSS Output

SPSS variable names and variable labels become variable names and labels without change. SPSS alphabetic variables become SAS character variables of the same length. SPSS blank values are converted to SAS missing values. SPSS print formats become SAS formats, and the SPSS default precision of no decimal places becomes part of the variables' formats. The SPSS DOCUMENT data is copied so that the CONTENTS procedure can display it. SPSS value labels are not copied.

File Conversion Examples

These three examples show how to convert BMDP, OSIRIS, and SPSS files to SAS data sets.

Converting a BMDP save file

  • The following statements convert a BMDP save file and produce the temporary SAS data set Temp, which contains the converted data:

       filename bmdpfile 'bmdp.savefile';
       proc convert bmdp=bmdpfile out=temp;
       run;

Converting an OSIRIS file

  • The following statements convert an OSIRIS file and produce the temporary SAS data set Temp, which contains the converted data:

       filename osirfile 'osirdata';
       filename dictfile 'osirdict';
       proc convert osiris=osirfile dict=dictfile
                    out=temp;
       run;

Converting an SPSS file

  • The following statements convert an SPSS Release 9 file and produce the temporary SAS data set Temp, which contains the converted data:

       filename spssfile 'spssfile.num1';
       proc convert spss=spssfile out=temp;
       run;

Comparison with Interface Library Engines

The CONVERT procedure is closely related to the interface library engines BMDP, OSIRIS, and SPSS. (In fact, the CONVERT procedure uses these engines.) For example, the following two sections of code provide identical results:

   filename myfile 'mybmdp.dat';
   proc convert bmdp=myfile out=temp;
   run;

   libname myfile bmdp 'mybmdp.dat';
   data temp;
      set myfile._first_;
   run;

However, the BMDP, OSIRIS, and SPSS engines provide more extensive capability than PROC CONVERT. For example, PROC CONVERT converts only the first BMDP member in a save file. The BMDP engine, in conjunction with the COPY procedure, copies all members.
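
A minimal sketch of the engine-based approach described above. The librefs BmdpLib and OutLib are illustrative names, and SAS-data-library is a placeholder for the directory that is to receive the copied members.

```sas
/* Access the BMDP save file through the BMDP engine. */
libname bmdplib bmdp 'mybmdp.dat';

/* Destination SAS data library (placeholder path). */
libname outlib 'SAS-data-library';

/* Copy every member of the save file, not only the first. */
proc copy in=bmdplib out=outlib;
run;
```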

See Also

  • "Accessing BMDP, OSIRIS, or SPSS Files in UNIX Environments" on page 125

CPORT Procedure

Writes SAS data sets and catalogs into a special format in a transport file that can be moved between different hosts

UNIX specifics: name and location of transport file

See: CPORT Procedure in Base SAS Procedures Guide

Syntax

PROC CPORT source-type=libref | <libref.>member-name <option(s)>;

Note  

This is a simplified version of the CPORT procedure syntax. For the complete syntax and its explanation, see the "CPORT Procedure" in Base SAS Procedures Guide .

source-type

  • identifies the file(s) to export as either a single SAS data set, single SAS catalog, or multiple members of a SAS data library.

libref | <libref.>member-name

  • specifies the name of the SAS data set, catalog, or library to be exported.

Details

Note  

Starting in SAS 9.1, you can use the MIGRATE procedure to convert your SAS files. For more information, see "Migrating 32-Bit SAS Files to 64-Bit in UNIX Environments" on page 106.

The CPORT procedure creates a transport file that can later be restored (imported) by the CIMPORT procedure. The transport file can contain a SAS data set, a SAS catalog, or an entire SAS library.

Typically the FILE= option is used to specify the path of the transport file. The value of the FILE= option can be a fileref defined in a FILENAME statement or an environment variable. If this option is omitted, CPORT creates the default file Sascat.dat in the current directory as the transport file.

Examples

In this example, a SAS data library (called Oldlib) that contains multiple SAS data sets is being exported to the file, called transport-file.

   libname oldlib 'SAS-data-library';
   filename tranfile 'transport-file';
   proc cport lib=oldlib file=tranfile;
   run;

This transport file is then typically moved by binary transfer to a different host where the CIMPORT procedure will be used to restore the SAS data library.

See Also

  • "CIMPORT Procedure" on page 270

  • "Migrating 32-Bit SAS Files to 64-Bit in UNIX Environments" on page 106

  • The MIGRATE Procedure at support.sas.com/rnd/migration

  • Moving and Accessing SAS Files

DATASETS Procedure

Lists, copies, renames, and deletes SAS files, and also manages indexes for and appends SAS data sets in a SAS data library

UNIX specifics: directory information, CONTENTS statement output

See: DATASETS Procedure in Base SAS Procedures Guide

Syntax

PROC DATASETS <option(s)>;
   CONTENTS <option(s)>;

Note  

This is a simplified version of the DATASETS procedure syntax. For the complete syntax and its explanation, see the DATASETS procedure in Base SAS Procedures Guide .

CONTENTS option(s)

  • the value for option(s) can be the following:

  • DIRECTORY

    • prints a list of information specific to the UNIX operating environment.

Details

The output from the DATASETS procedure shows you the libref, engine, and physical name that are associated with the library, as well as the names and other properties of the SAS files that are contained in the library. Some of the SAS data library information, such as the filenames and access permissions, that is displayed in the SAS log by the DATASETS procedure depends on the operating environment and the engine. The information generated by the CONTENTS statement also varies according to the device type or access method associated with the data set.

If you specify the DIRECTORY option in the CONTENTS statement, the directory information is displayed in both the log and output windows.

The CONTENTS statement in the DATASETS procedure generates the same Engine/Host Dependent information as the CONTENTS procedure.

Example

The following SAS code creates two data sets, Grades.sas7bdat and Majors.sas7bdat, and runs PROC DATASETS on Majors.sas7bdat.

   options nodate pageno=1;
   libname classes '.';

   data classes.grades (label='First Data Set');
      input student year state $ grade1 grade2;
      label year='Year of Birth';
      format grade1 4.1;
      datalines;
   1000 1980 NC 85 87
   1042 1981 MD 92 92
   1095 1979 PA 78 72
   1187 1980 MA 87 94
   ;

   data classes.majors(label='Second Data Set');
      input student $ year state $ grade1 grade2 major $;
      label state='Home State';
      format grade1 5.2;
      datalines;
   1000 1980 NC 84 87 Math
   1042 1981 MD 92 92 History
   1095 1979 PA 79 73 Physics
   1187 1980 MA 87 74 Dance
   1204 1981 NC 82 96 French
   ;

   proc datasets library=classes;
      contents data=majors directory;
   run;

The output of this example is shown in Output 15.1. The first page of output is produced by the DIRECTORY option in the CONTENTS statement. This information also appears in the SAS log. Pages 2 and 3 of the output describe the data set Classes.Majors.sas7bdat and appear only in the SAS output.

Output 15.1: PROC DATASETS Example
                               The SAS System

                           The DATASETS Procedure

                                 Directory

                 Libref             CLASSES
                 Engine             V9
                 Physical Name      /remote/u/yourid
                 File Name          /remote/u/yourid
                 Inode Number       1058605
                 Access Permission  rwxrwxrwx
                 Owner Name         yourid
                 File Size (bytes)  1024

                            Member    File
                 #  Name    Type      Size     Last modified
                 1  GRADES  DATA      16384    12MAY2003:14:30:19
                 2  MAJORS  DATA      16384    12MAY2003:14:31:20


                               The SAS System

                           The DATASETS Procedure

 Data Set Name       CLASSES.MAJORS                   Observations          5
 Member Type         DATA                             Variables             6
 Engine              V9                               Indexes               0
 Created             Monday, May 12, 2003 14:31:20    Observation Length    48
 Last Modified       Monday, May 12, 2003 14:31:20    Deleted Observations  0
 Protection                                           Compressed            NO
 Data Set Type                                        Sorted                NO
 Label               Second Data Set
 Data Representation HP_UX_64, RS_6000_AIX_64, SOLARIS_64, HP_IA64
 Encoding            latin1  Western (ISO)

                     Engine/Host Dependent Information

      Data Set Page Size          8192
      Number of Data Set Pages    1
      First Data Page             1
      Max Obs per Page            169
      Obs in First Data Page      5
      Number of Data Set Repairs  0
      File Name                   /remote/u/yourid/majors.sas7bdat
      Release Created             9.0101B0
      Host Created                SunOS
      Inode Number                1059264
      Access Permission           rw-r--r--
      Owner Name                  yourid
      File Size (bytes)           16384


                               The SAS System

                           The DATASETS Procedure

                Alphabetic List of Variables and Attributes

           #     Variable    Type    Len    Format    Label
           4     grade1      Num       8     5.2
           5     grade2      Num       8
           6     major       Char      8
           3     state       Char      8              Home State
           1     student     Char      8
           2     year        Num       8
 

See Also

  • "CONTENTS Procedure" in Base SAS Procedures Guide

OPTIONS Procedure

Lists the current values of all SAS system options

UNIX specifics: options available only under UNIX

See: OPTIONS Procedure in Base SAS Procedures Guide

Syntax

PROC OPTIONS <option(s)>;

Note  

This is a simplified version of the OPTIONS procedure syntax. For the complete syntax and its explanation, see the OPTIONS procedure in Base SAS Procedures Guide .

option(s)

  • HOST | NOHOST

    • displays only host options (HOST) or only portable options (NOHOST). PORTABLE is an alias for NOHOST.

  • RESTRICT

    • displays the system options that have been restricted by your site administrator. These options cannot be changed by the user. For each option that is restricted, the RESTRICT option displays the option's value, scope, and how it was set.

      If your site administrator has not restricted any options, then the following message will appear in the SAS log:

       Your Site Administrator has not restricted any options. 
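
The options described above can be tried one at a time, as in the following sketch. Each step writes its listing to the SAS log; what is listed depends on your installation.

```sas
/* List only the host (UNIX-specific) system options. */
proc options host;
run;

/* List only the portable system options. */
proc options nohost;
run;

/* List the options restricted by the site administrator. */
proc options restrict;
run;
```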

Details

PROC OPTIONS lists the current values of the system options that are available in all operating environments and, if you specify the HOST option in the PROC OPTIONS statement, it lists those options that are available only under UNIX (host options). The option values displayed by PROC OPTIONS depend on the default values shipped with SAS, the default values specified by your site administrator, the default values in your own configuration file, any changes made in your current session through the System Options window or OPTIONS statement, and possibly, the device on which you are running SAS.

For more information about a specific option, refer to Chapter 17, "System Options under UNIX," on page 311.

See Also

  • For more information about restricted options, see "Order of Precedence for SAS Configuration Files" on page 17.

PMENU Procedure

Defines PMENU facilities for windows created with SAS software

UNIX specifics: ATTR= and COLOR= options in the TEXT statement have no effect; ACCELERATE= and MNEMONIC= options in the ITEM statement are ignored

See: PMENU Procedure in Base SAS Procedures Guide

Syntax

PROC PMENU <CATALOG=<libref.>catalog> <DESC 'entry-description'>;

Note  

This is a simplified version of the PMENU procedure syntax. For the complete syntax and its explanation, see the PMENU procedure in Base SAS Procedures Guide .

CATALOG= <libref.>catalog

  • specifies the catalog in which you want to store PMENU entries. If you omit libref , the PMENU entries are stored in a catalog in the Sasuser data library. If you omit CATALOG=, the entries are stored in the Sasuser.Profile catalog.

DESC 'entry-description'

  • provides a description of the PMENU catalog entries created in the step.

Details

The PMENU procedure defines PMENU facilities for windows created by using the WINDOW statement in Base SAS software, the %WINDOW macro statement, the BUILD procedure of SAS/AF software, or the SAS Component Language (SCL) PMENU function with SAS/AF, SAS/CALC, and SAS/FSP software.

Under UNIX, the following options are ignored:

  • ATTR= and COLOR= options in the TEXT statement. The colors and attributes for text and input fields are controlled by the CPARMS colors specified in the SASCOLOR window. See "Customizing Colors in UNIX Environments" on page 84 for more information.

  • ACCELERATE= and the MNEMONIC= options in the ITEM statement.

PRINTTO Procedure

Defines destinations for SAS procedure output and the SAS log

UNIX specifics: Valid values of file specification

See: PRINTTO Procedure in Base SAS Procedures Guide

Syntax

PROC PRINTTO <option(s)>;

Note  

This is a simplified version of the PRINTTO procedure syntax. For the complete syntax and its explanation, see the PRINTTO procedure in Base SAS Procedures Guide .

LOG= file-specification

  • specifies a fully qualified pathname (in quotation marks), an environment variable, a fileref, or a file in the current directory (without extension).

PRINT= file-specification

  • specifies a fully qualified pathname (in quotation marks), an environment variable, a fileref, or a file in the current directory (without extension). If you specify a fileref that is defined with the PRINTER device-type keyword, output is sent directly to the printer.

Examples

The following statements send any SAS log entries that are generated after the RUN statement to the external file that is associated with the fileref MyFile:

   filename myfile '/users/myid/mydir/mylog';
   proc printto log=myfile;
   run;

If MyFile has not been defined as a fileref, PROC PRINTTO will create the file MyFile.log in the current directory.

The following statements send any procedure output that is generated after the RUN statement to the file /users/myid/mydir/myout :

   proc printto print='/users/myid/mydir/myout';
   run;

The following statements send the procedure output from the CONTENTS procedure directly to the system printer:

   filename myfile printer;
   proc printto print=myfile;
   run;
   proc contents data=oranges;
   run;

To redirect the SAS log and procedure output to their original default destinations, run PROC PRINTTO without any options:

   proc printto;
   run;

If MYPRINT and MYLOG have not been defined as filerefs, then the following statements send any SAS procedure output to MyPrint.lst and any log output to MyLog.log in the current directory:

   proc printto print=myprint log=mylog;
   run;

If filerefs MyPrint and MyLog had been defined, the output would have gone to the files associated with these filerefs.

See Also

  • Chapter 6, "Printing and Routing Output," on page 153

SORT Procedure

Sorts observations in a SAS data set by one or more variables, then stores the resulting sorted observations in a new SAS data set or replaces the original data set

UNIX specifics: sort utilities available

See: SORT Procedure in Base SAS Procedures Guide

Syntax

PROC SORT <option(s)> <collating-sequence-option>;

Note  

This is a simplified version of the SORT procedure syntax. For the complete syntax and its explanation, see the SORT procedure in Base SAS Procedures Guide .

option(s)

  • SORTSIZE=memory-specification

    • specifies the maximum amount of memory available to the SORT procedure. For further explanation of the SORTSIZE= option, see the following Details section.

  • TAGSORT

    • stores only the BY variables and the observation number in temporary files. For further explanation of the TAGSORT option, see the following Details section.

Note  

The TAGSORT option is ignored when used with a host sort.

Details

The SORT procedure sorts observations in a SAS data set by one or more character or numeric variables, either replacing the original data set or creating a new, sorted data set. By default under UNIX, the SORT procedure uses the ASCII collating sequence.

The SORT procedure uses the sort utility specified by the SORTPGM system option. Sorting can be done by SAS or the syncsort utility. You can use all of the options available to the SAS sort utility, such as the SORTSEQ and NODUPKEY options. In some situations, you can improve your performance by using the NOEQUALS option. If you specify an option that is not supported by the host sort, then the SAS sort will be used instead. For more information about all of the options that are available, see the SORT procedure in Base SAS Procedures Guide .

SORTSIZE= Option

Limiting the Amount of Memory Available to PROC SORT

You can use the SORTSIZE= option in the PROC SORT statement to limit the amount of memory available to the SORT procedure. This option can reduce the amount of swapping SAS must do to sort the data set.

Note  

If you do not specify the SORTSIZE= option, PROC SORT uses the value of the SORTSIZE system option. The SORTSIZE system option can be defined on the command line or in the SAS configuration file.

Syntax of the SORTSIZE= Option

The syntax of the SORTSIZE= option is as follows:

SORTSIZE= memory-specification

  • where memory-specification can be one of the following:

    n

    specifies the amount of memory in bytes.

    nK

    specifies the amount of memory in 1-kilobyte multiples.

    nM

    specifies the amount of memory in 1-megabyte multiples.

    nG

    specifies the amount of memory in 1-gigabyte multiples.
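
As a short sketch of the syntax above, the following step limits a single sort to 64 megabytes. The data set names and the BY variable are assumptions borrowed from the DATASETS procedure example, and 64M is an arbitrary illustrative value rather than a recommendation.

```sas
/* Assumes the libref Classes is assigned and contains Majors. */
proc sort data=classes.majors out=work.majors_sorted sortsize=64M;
   by state;
run;
```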

Default Value of the SORTSIZE= Option

The default SAS configuration file sets this option based on the value of the SORTSIZE system option. The default for the SORTSIZE system option is MAX; however, the value of MAX depends on your operating system. To view the value of MAX for your operating environment, run the following code:

   proc options option=sortsize;
   run;

You can override the default value of the SORTSIZE system option by

  • specifying a different SORTSIZE= value in the PROC SORT statement

  • submitting an OPTIONS statement that sets the SORTSIZE system option to a new value

  • setting the SORTSIZE system option on the command line during the invocation of SAS.
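
The second method in the list above can be sketched as follows. The value 128M is an arbitrary illustrative choice; the PROC OPTIONS step simply writes the resulting setting to the SAS log so that you can confirm it.

```sas
/* Set the SORTSIZE system option for the rest of the session. */
options sortsize=128M;

/* Confirm the new value in the SAS log. */
proc options option=sortsize;
run;
```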

Improving Performance with the SORTSIZE= Option

In general, you should set the SORTSIZE= option no larger than the amount of physical memory available to the SAS process. If the SORTSIZE= value is larger than the amount of available memory, then the operating system will be forced to page excessively. If the SORTSIZE= value is too small, then not all of the sorting can be done in memory, which also results in more disk I/O.

When the SORTSIZE= value is large enough to sort the entire data set in memory, you can achieve optimal sort performance. If the entire data set to be sorted will not fit in memory, SAS creates a temporary utility file to store the data. In this case, SAS uses a sort algorithm that is tuned to sort using disk space instead of memory.

Note  

You can also use the SORTSIZE system option, which has the same effect as the SORTSIZE= option in the PROC SORT statement.

TAGSORT Option

The TAGSORT option in the PROC SORT statement is useful when there might not be enough disk space to sort a large SAS data set. When you specify the TAGSORT option, only the sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary utility files. The sort keys, together with the observation number, are referred to as tags . At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably.

You must have enough disk space to hold an additional copy of the data set (the output data set) and the utility file that contains the tags. By default, this utility file is stored in the Work library. If this directory is too small, you can change this directory using the WORK system option. For more information, see "WORK System Option" on page 381.

Note  

If you are using a host sort utility, then you can use the SORTDEV system option to change the location of your temporary files. For more information, see "SORTDEV System Option" on page 366.

Note that while using the TAGSORT option may reduce temporary disk use, the processing time could be higher. However, on computers with limited available disk space, the TAGSORT option might enable sorts to be performed in situations where they would otherwise not be possible.
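
A minimal sketch of a tag sort, assuming a libref Classes that contains the data set Majors, with Student as the sort key. Only the BY variable and the observation number would be written to the temporary utility file.

```sas
/* TAGSORT keeps only the sort keys and observation numbers */
/* in the temporary utility file.                           */
proc sort data=classes.majors out=work.majors_by_student tagsort;
   by student;
run;
```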

Disk Space Considerations for PROC SORT

You need to consider the following items when determining the amount of the disk space needed to run PROC SORT:

input SAS data set

  • PROC SORT uses the input data set specified by the DATA= option.

output SAS data set

  • PROC SORT stores the output data set in the location specified by the OUT= option. If the OUT= option is not specified, the sorted data set replaces the input data set in the directory of the input data set.

utility file stored in the Work library

  • This utility file is approximately the size of the input SAS data set.

temporary output SAS data set

  • During the sort, PROC SORT creates its output in the directory specified in the OUT= option (or directory of the input data set if the OUT= option is not specified). The temporary data set has the same filename as the original data set, except it has an extension of .lck. After the sort completes successfully, the original data set is deleted, and the temporary data set is renamed to match the original data set. Therefore, you need to have enough available space in the target directory to hold two copies of the data set.

You can reduce the amount of disk space needed by specifying the OVERWRITE option on the PROC SORT statement. When you specify this option, SAS replaces the input data set with the sorted data set. This option should only be used with a data set that is backed up or with a data set that you can reconstruct. For more information, see the SORT procedure in Base SAS Procedures Guide .
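
A sketch of an in-place sort with the OVERWRITE option, assuming a libref Classes that contains the data set Grades and that a backup copy exists. No OUT= option is given, so the sorted data set replaces the input data set.

```sas
/* Use only on data that is backed up or can be reconstructed: */
/* with OVERWRITE, a failed sort can leave no usable copy.     */
proc sort data=classes.grades overwrite;
   by student;
run;
```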

Performance Tuning for PROC SORT

How SAS Determines the Amount of Memory to Use

Generally, SAS uses the memory value specified in the REALMEMSIZE system option. However, this value is limited by the SORTSIZE value (which is limited by the value of the MEMSIZE system option). If SORTSIZE is set to the default value of MAX, then PROC SORT uses the REALMEMSIZE value to determine the amount of memory to use. For information about setting the REALMEMSIZE system option, see "Guidelines for Setting the REALMEMSIZE System Option" on page 285.

Note  

If you receive an out-of-memory error, then increase the value of MEMSIZE. For more information, see "MEMSIZE System Option" on page 344.
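One way to check the values that feed this calculation is to write them to the SAS log. The following is a minimal sketch that assumes the GETOPTION function is available through %SYSFUNC in your session. The SORTSIZE value of 128M is an arbitrary illustration, not a recommendation.

```sas
/* Write the current memory-related option values to the SAS log. */
%put REALMEMSIZE=%sysfunc(getoption(realmemsize));
%put SORTSIZE=%sysfunc(getoption(sortsize));
%put MEMSIZE=%sysfunc(getoption(memsize));

/* Illustrative value only: cap the memory that PROC SORT can use. */
options sortsize=128M;
```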

Guidelines for Setting the REALMEMSIZE System Option

Since PROC SORT uses the REALMEMSIZE system option to determine how much memory to use, it is important that the REALMEMSIZE value reflects the amount of memory that is available on your system. If REALMEMSIZE is set too high, then PROC SORT might use more memory than is actually available. Using too much memory will cause excessive paging and adversely impact system performance.

In general, REALMEMSIZE should be set to the amount of physical memory (not including swap space) that you expect to be available to SAS at run time. A good starting value is the amount of physical memory installed on the computer less the amount that is being used by running applications and the operating system. You can experiment with the REALMEMSIZE value until you reach optimum performance for your environment. In some cases, optimum performance can be achieved with a very low REALMEMSIZE value. A low value could cause SAS to use less memory and leave more memory for the operating system to perform I/O caching.

For more information, see "REALMEMSIZE System Option" on page 353.

Creating Your Own Collating Sequences

If you want to provide your own collating sequences or change a collating sequence provided for you, use the TRANTAB procedure to create or modify translation tables. For more information about the TRANTAB procedure, see SAS National Language Support (NLS): User's Guide. When you create your own translation tables, they are stored in your Sasuser.Profile catalog, and they override any translation tables by the same name that are stored in the Host catalog.

Note  

System managers can modify the Host catalog by copying newly created tables from the Profile catalog to the Host catalog. Then all users can access the new or modified translation table.

If you are using the SAS windowing environment and want to see the names of the collating sequences that are stored in the Host catalog, issue the following command from any window:

 catalog sashelp.host 

If you are not using the SAS windowing environment, then issue the following statements to generate a list of the contents of the Host catalog:

 proc catalog catalog=sashelp.host;
    contents;
 run;

Entries of type TRANTAB are the collating sequences.
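To list only the collating sequences, one possible approach is to write the catalog contents to a data set and filter on the entry type. This sketch assumes that the OUT= data set contains variables named NAME, TYPE, and DESC. The data set name work.hostcat is hypothetical.

```sas
/* Write the contents of the Host catalog to a data set.          */
/* work.hostcat is a hypothetical name chosen for this example.   */
proc catalog catalog=sashelp.host;
   contents out=work.hostcat;
run;
quit;

/* Keep only the TRANTAB entries. The variable names NAME, TYPE,  */
/* and DESC are assumed to exist in the OUT= data set.            */
proc print data=work.hostcat;
   where type = 'TRANTAB';
   var name type desc;
run;
```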

To see the contents of a particular translation table, use the following statements:

 proc trantab table=table-name;
    list;
 run;

The contents of collating sequences are displayed in the SAS log.

Specifying the Host Sort Utility

Introduction to Using the Host Sort

UNIX has one host sort utility, syncsort. You can use this sorting application as an alternative sorting algorithm to the SAS sort. SAS determines which sort to use by the values that are set for the SORTNAME, SORTPGM, SORTCUT, and SORTCUTP system options.

Setting the Host Sort Utility as the Sort Algorithm

To specify a host sort utility as the sort algorithm, complete the following steps:

  1. Specify the name of the host utility (syncsort) in the SORTNAME system option.

  2. Set the SORTPGM system option to tell SAS when to use the host sort utility.

    • If SORTPGM=HOST, then SAS will always use the host sort utility.

    • If SORTPGM=BEST, then SAS chooses the best sorting method (either the SAS sort or the host sort) for the situation. For more information, see "Sorting Based on Size or Observations" on page 286.
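The two steps above can be sketched in a single OPTIONS statement. This example assumes that both options can be set in an OPTIONS statement at your site and that syncsort is installed and available to SAS.

```sas
/* Step 1: name the host sort utility.             */
/* Step 2: tell SAS to always use the host sort.   */
options sortname=syncsort sortpgm=host;

/* Confirm the settings by writing them to the SAS log. */
proc options option=sortname;
run;
proc options option=sortpgm;
run;
```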

Sorting Based on Size or Observations

The sort routine that SAS uses can be based either on the number of observations in a data set or on the size of the data set. When the SORTPGM system option is set to BEST, SAS uses the first available and pertinent sorting algorithm based on this order of precedence:

  • host sort utility

  • SAS sort utility

SAS looks at the values for the SORTCUT and SORTCUTP system options to determine which sort to use.

The SORTCUT option specifies the number of observations above which the host sort utility is used instead of the SAS sort. The SORTCUTP option specifies the number of bytes in the data set above which the host sort utility is used.

If SORTCUT and SORTCUTP are set to zero, SAS uses the SAS sort routine. If you specify both options and either condition is met, SAS uses the host sort utility.

When the following OPTIONS statement is in effect, the host sort utility is used when the number of observations is 501 or greater:

 options sortpgm=best sortcut=500; 

In this example, the host sort utility is used when the size of the data set is greater than 40M:

 options sortpgm=best sortcutp=40M; 

For more information about these sort options, see "SORTCUT System Option" on page 364, "SORTCUTP System Option" on page 365, and "SORTPGM System Option" on page 368.
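The two cutoffs can also be combined. In the following sketch, which reuses the threshold values from the examples above, the host sort utility is used when either condition is met: more than 500 observations or more than 40M bytes. The data set work.big and the BY variable id are hypothetical.

```sas
/* Either threshold triggers the host sort when SORTPGM=BEST. */
options sortpgm=best sortcut=500 sortcutp=40M;

/* work.big and id are hypothetical names for illustration. */
proc sort data=work.big;
   by id;
run;
```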

Changing the Location of Temporary Files Used by the Host Sort Utility

By default, the host sort utility uses the location that is specified in the -WORK option for temporary files. To change the location of these temporary files, specify a location by using the SORTDEV system option. Here is an example:

 options sortdev="/tmp/host";

For more information, see "SORTDEV System Option" on page 366.

Passing Options to the Host Sort Utility

To specify options for the sort utility, use the SORTANOM system option. For a list of valid options, see "SORTANOM System Option" on page 363.

Passing Parameters to the Host Sort Utility

To pass parameters to the sort utility, use the SORTPARM system option. The parameters that you can specify depend on the host sort utility. For more information, see "SORTPARM System Option" on page 367.

Specifying the SORTSEQ= Option with a Host Sort Utility

Caution  

If you are using a host sort utility to sort your data, then specifying the SORTSEQ= option might corrupt the character BY variables if the sort sequence translation table and its inverse are not one-to-one mappings. In other words, for the sort to work, the translation table must map each character to a unique weight, and the inverse table must map each weight to a unique character.

If your translation tables do not map one-to-one, then you can use one of the following methods to perform your sort:

  • create a translation table that maps one-to-one. Once you create a translation table that maps one-to-one, you can easily create a corresponding inverse table using the TRANTAB procedure. If your table is not mapped one-to-one, then you will receive the following note in the SAS log when you try to create an inverse table:

     NOTE:  This table cannot be mapped one to one. 

    For more information, see TRANTAB Procedure in SAS National Language Support (NLS): User's Guide.

  • use the SAS sort. You can specify the SAS sort using the SORTPGM system option. For more information, see "SORTPGM System Option" on page 368.

  • specify the collation order options of your host sort utility. See the documentation for your host sort utility for more information.

  • create a view with a dummy BY variable. For an example, see "Example: Creating a View with a Dummy BY Variable" on page 287.

Note  

After using one of these methods, you might need to perform subsequent BY processing using either the NOTSORTED option or the NOBYSORTED system option. For more information about the NOTSORTED option, see BY Statement in SAS Language Reference: Dictionary. For more information about the NOBYSORTED system option, see BYSORTED System Option in SAS Language Reference: Dictionary.
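As one possible illustration of the second method, the following sketch switches to the SAS sort, applies an alternate collating sequence, and then uses NOTSORTED in later BY processing. The data set one and the variable name are taken from the example in this section. The output data set sorted is a hypothetical name, and EBCDIC is used only as a sample SORTSEQ= value.

```sas
/* Use the SAS sort instead of the host sort utility. */
options sortpgm=sas;

/* Sort with an alternate collating sequence.              */
/* work.sorted is a hypothetical output data set name.     */
proc sort data=one out=sorted sortseq=ebcdic;
   by name;
run;

/* NOTSORTED tells SAS not to expect the BY groups in the  */
/* native sort order.                                      */
proc print data=sorted;
   by name notsorted;
run;
```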

Example: Creating a View with a Dummy BY Variable

The following code is an example of creating a view using a dummy BY variable:

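 options nodate nostimer ls=78 ps=60;
 options sortpgm=host msglevel=i;

 /* Create a sample data set with mixed-case names. */
 data one;
    input name $ age;
    datalines;
 anne 35
 ALBERT 10
 JUAN 90
 janet 5
 bridget 23
 BRIAN 45
 ;

 /* Define a view that adds an uppercase copy of NAME as the dummy BY variable. */
 data oneview / view=oneview;
    set one;
    name1=upcase(name);
 run;

 /* Sort by the dummy variable and drop it from the output. */
 proc sort data=oneview out=final(drop=name1);
    by name1;
 run;

 proc print data=final;
 run;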

The output is the following:

Output 15.2: Creating a View with a Dummy BY Variable

      The SAS System

 Obs    name       age
  1     ALBERT      10
  2     anne        35
  3     BRIAN       45
  4     bridget     23
  5     janet        5
  6     JUAN        90
 

See Also

  • "TRANTAB Procedure" in SAS National Language Support (NLS): User's Guide

  • "MEMSIZE System Option" on page 344

  • "REALMEMSIZE System Option" on page 353

  • "SORTANOM System Option" on page 363

  • "SORTCUT System Option" on page 364

  • "SORTCUTP System Option" on page 365

  • "SORTDEV System Option" on page 366

  • "SORTNAME System Option" on page 367

  • "SORTPARM System Option" on page 367

  • "SORTPGM System Option" on page 368

  • "SORTSIZE System Option" on page 368

  • "UTILLOC System Option" in SAS Language Reference: Dictionary




SAS 9.1 Companion for UNIX Environments
ISBN: 1590472101
Year: 2004
Pages: 185
Authors: SAS Institute
