Requirements: BY statement
Reminder: You can use the ATTRIB, FORMAT, LABEL, and WHERE statements. See Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for details. You can also use any global statements. See Global Statements on page 18 for a list.
See: SORT Procedure in the documentation for your operating environment.
PROC SORT < collating-sequence-option >< other option(s) >;
BY <DESCENDING> variable-1 < <DESCENDING> variable-n >;
PROC SORT < collating-sequence-option >< other option(s) >;
To do this | Use this option | |
---|---|---|
Specify the collating sequence | ||
Specify ASCII | ASCII | |
Specify EBCDIC | EBCDIC | |
Specify Danish | DANISH | |
Specify Finnish | FINNISH | |
Specify Norwegian | NORWEGIAN | |
Specify Swedish | SWEDISH | |
Specify a customized sequence | NATIONAL | |
Specify any of these collating sequences: ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, SPANISH, SWEDISH | SORTSEQ= | |
Specify the input data set | DATA= | |
Sort a SAS data set without changing the created and modified dates | DATECOPY | |
Create output data sets | ||
Specifies the output data set | OUT= | |
Specifies the output data set to which duplicate observations are written | DUPOUT= | |
Specify the output order | ||
Reverse the collation order for character variables | REVERSE | |
Maintain relative order within BY groups | EQUALS | |
Do not maintain relative order within BY groups | NOEQUALS | |
Eliminate duplicate observations | ||
Delete observations with duplicate BY values | NODUPKEY | |
Delete duplicate observations | NODUPRECS | |
Delete the input data set before the replacement output data set is populated | OVERWRITE | |
Specify the available memory | SORTSIZE= | |
Force redundant sorting | FORCE | |
Reduce temporary disk usage | TAGSORT | |
Override SAS system option THREADS | ||
Enable multi-threaded sorting | THREADS | |
Prevent multi-threaded sorting | NOTHREADS |
Options can include one collating-sequence-option and multiple other options . The order of the two types of options does not matter and both types are not necessary in the same PROC SORT step.
Collating-Sequence-Options
Operating Environment Information: For information about behavior specific to your operating environment for the DANISH, FINNISH, NORWEGIAN, or SWEDISH collating-sequence-option , see the SAS documentation for your operating environment.
Restriction: You can specify only one collating-sequence-option in a PROC SORT step.
ASCII
sorts character variables using the ASCII collating sequence. You need this option only when you sort by ASCII on a system where EBCDIC is the native collating sequence.
See also: Sorting Orders for Character Variables on page 1028
DANISH
NORWEGIAN
sorts characters according to the Danish and Norwegian national standard.
The Danish and Norwegian collating sequence is shown in Figure 44.1 on page 1022.
Figure 44.1: National Collating Sequences of Alphanumeric Characters
EBCDIC
sorts character variables using the EBCDIC collating sequence. You need this option only when you sort by EBCDIC on a system where ASCII is the native collating sequence.
See also: Sorting Orders for Character Variables on page 1028
FINNISH
SWEDISH
sorts characters according to the Finnish and Swedish national standard. The Finnish and Swedish collating sequence is shown in Figure 44.1 on page 1022.
NATIONAL
sorts character variables using an alternate collating sequence, as defined by your installation, to reflect a country s National Use Differences. To use this option, your site must have a customized national sort sequence defined. Check with the SAS Installation Representative at your site to determine if a customized national sort sequence is available.
NORWEGIAN
See DANISH.
SORTSEQ= collating-sequence
specifies the collating sequence. The value of collating-sequence can be any one of the collating-sequence-options in the PROC SORT statement, or the value can be the name of a translation table, either a default translation table or one that you have created in the TRANTAB procedure. For an example of using PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting in SAS National Language Support (NLS): User s Guide . The available translation tables are
Danish
Finnish
Italian
Norwegian
Spanish
Swedish
The following figure shows how the alphanumeric characters in each language will sort.
CAUTION:
If you use a host sort utility to sort your data, then specifying the SORTSEQ= option might corrupt the character BY variables. For more information, see the PROC SORT documentation for your operating environment.
SWEDISH
See FINNISH.
Other Options
DATA= SAS-data-set
identifies the input SAS data set.
Main discussion: Input Data Sets on page 19
DATECOPY
copies the SAS internal date and time when the SAS data set was created and the date and time when it was last modified prior to the sort to the resulting sorted data set. Note that the operating environment date and time are not preserved.
Restriction: DATECOPY can be used only when the resulting data set uses the V8 or V9 engine.
Tip: You can alter the file creation date and time with the DTC= option in the MODIFY statement in PROC DATASETS. For more information, see MODIFY Statement on page 344.
DUPOUT= SAS-data-set
specifies the output data set to which duplicate observations are written.
EQUALS NOEQUALS
specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the relative order of the observations within the input data set in the output data set. NOEQUALS does not necessarily preserve this order in the output data set.
Default: EQUALS
Interaction: When you use NODUPRECS or NODUPKEY to remove observations in the output data set, the choice of EQUALS or NOEQUALS can affect which observations are removed.
Interaction: The EQUALS NOEQUALS procedure option overrides the default sort stability behavior that is established with the SORTEQUALS NOSORTEQUALS system option.
Interaction: The EQUALS option is supported by the multi-threaded sort. However, I/O performance may be reduced when using the EQUALS option with the multi-threaded sort because partitioned data sets will be processed as if they are non-partitioned data sets.
Interaction: The NOEQUALS option is supported by the multi-threaded sort. The order of observations within BY groups that are returned by the multi-threaded sort might not be consistent between runs. Therefore, using the NOEQUALS option can produce inconsistent results in your output data sets.
Tip: Using NOEQUALS can save CPU time and memory.
FORCE
sorts and replaces an indexed data set when the OUT= option is not specified. Without the FORCE option, PROC SORT does not sort and replace an indexed data set because sorting destroys user-created indexes for the data set. When you specify FORCE, PROC SORT sorts and replaces the data set and destroys all user-created indexes for the data set. Indexes that were created or required by integrity constraints are preserved.
Tip: PROC SORT checks for the sort information before it sorts a data set so that data is not re-sorted unnecessarily. By default, PROC SORT does not sort a data set if the sort information matches the requested sort. You can use FORCE to override this behavior. You might need to use FORCE if SAS cannot verify the sort specification in the data set option SORTEDBY=. For more information about SORTEDBY=, see the chapter on SAS data set options in SAS Language Reference: Dictionary .
Restriction: If you use PROC SORT with the FORCE option on data sets that were created with the Version 5 compatibility engine or with a sequential engine such as a tape format engine, you must also specify the OUT= option.
NODUPKEY
checks for and eliminates observations with duplicate BY values. If you specify this option, then PROC SORT compares all BY values for each observation to those for the previous observation that is written to the output data set. If an exact match is found, then the observation is not written to the output data set.
Operating Environment Information: If you use the VMS operating environment sort, then the observation that is written to the output data set is not always the first observation of the BY group .
Note: See NODUPRECS for information about eliminating duplicate observations.
Interaction: When you are removing observations with duplicate BY values with NODUPKEY, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Tip: Use the EQUALS option with the NODUPKEY option for consistent results in your output data sets.
Featured in: Example 4 on page 1037
NODUPRECS
checks for and eliminates duplicate observations. If you specify this option, then PROC SORT compares all variable values for each observation to those for the previous observation that was written to the output data set. If an exact match is found, then the observation is not written to the output data set.
Note: See NODUPKEY for information about eliminating observations with duplicate BY values.
Alias : NODUP
Interaction: When you are removing consecutive duplicate observations in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed.
Tip: Use the EQUALS option with the NODUPRECS option for consistent results in your output data sets.
Interaction: The action of NODUPRECS is directly related to the setting of the SORTDUP= system option. When SORTDUP= is set to LOGICAL, NODUPRECS removes duplicate observations based on the examination of the variables that remain after a DROP or KEEP operation on the input data set. Setting SORTDUP=LOGICAL increases the number of duplicate observations that are removed, because it eliminates variables before observation comparisons take place. Also, setting SORTDUP=LOGICAL can improve performance, because dropping variables before sorting reduces the amount of memory required to perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS examines all variables in the data set, regardless of whether they have been kept or dropped. For more information about SORTDUP=, see the chapter on SAS system options in SAS Language Reference: Dictionary .
Tip: Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations might remain in the output data set. You can remove all duplicates with this option by sorting on all variables.
NOEQUALS
See EQUALS NOEQUALS.
NOTHREADS
See THREADSNOTHREADS.
OUT= SAS-data-set
names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.
CAUTION:
Use care when you use PROC SORT without OUT=. Without OUT=, data could be lost if your system failed during execution of PROC SORT.
Default: Without OUT=, PROC SORT overwrites the original data set.
Tip : You can use data set options with OUT=.
Featured in: Example 1 on page 1031
OVERWRITE
enables the input data set to be deleted before the replacement output data set is populated with observations.
Restriction: The OVERWRITE option has no effect if you also specify the TAGSORT option. You cannot overwrite the input data set because TAGSORT must reread the input data set while populating the output data set.
Restriction: The OVERWRITE option is supported by the SAS sort and SAS multi-threaded sort only. The option has no effect if you are using a host sort.
Tip: Using the OVERWRITE option can reduce disk space requirements.
CAUTION:
Use the OVERWRITE option only with a data set that is backed up or with a data set that you can reconstruct. Because the input data set is deleted, data will be lost if a failure occurs while the output data set is being written.
REVERSE
sorts character variables using a collating sequence that is reversed from the normal collating sequence.
Operating Environment Information: For information about the normal collating sequence for your operating environment, see EBCDIC Order on page 1028, ASCII Order on page 1028, and the SAS documentation for your operating environment.
Interaction: Using REVERSE with the DESCENDING option in the BY statement restores the sequence to the normal order.
Restriction: The REVERSE option cannot be used with a collating-sequence-option . You can specify either a collating-sequence-option or the REVERSE option in a PROC SORT, but you cannot specify both.
See also: The DESCENDING option in the BY statement. The difference is that the DESCENDING option can be used with both character and numeric variables.
SORTSIZE= memory-specification
specifies the maximum amount of memory that is available to PROC SORT. Valid values for memory-specification are as follows :
MAX
specifies that all available memory can be used.
n
specifies the amount of memory in bytes, where n is a real number.
n K
specifies the amount of memory in kilobytes, where n is a real number.
n M
specifies the amount of memory in megabytes, where n is a real number.
n G
specifies the amount of memory in gigabytes, where n is a real number.
Specifying the SORTSIZE= option in the PROC SORT statement temporarily overrides the SAS system option SORTSIZE=. For more information about SORTSIZE=, see the chapter on SAS system options in SAS Language Reference: Dictionary .
Operating Environment Information: Some system sort utilities may treat this option differently. Refer to the SAS documentation for your operating environment.
Default: the value of the SAS system option SORTSIZE=
Tip: Setting the SORTSIZE= option in the PROC SORT statement to MAX or 0, or not setting the SORTSIZE= option, limits the PROC SORT to the available physical memory based on the settings of the SAS system options that relate to memory and information regarding available memory that is gathered from the operating environment.
Operating Environment Information: For information about the SAS system options that relate to memory, see the SAS documentation for your operating environment.
TAGSORT
stores only the BY variables and the observation numbers in temporary files. The BY variables and the observation numbers are called tags . At the completion of the sorting process, PROC SORT uses the tags to retrieve records from the input data set in sorted order.
Restriction: The TAGSORT option is not compatible with the OVERWRITE option.
Interaction: The TAGSORT option is not supported by the multi-threaded sort.
Tip: When the total length of BY variables is small compared with the record length, TAGSORT reduces temporary disk usage considerably. However, processing time may be much higher.
THREADS NOTHREADS
enables or prevents the activation of multi-threaded sorting.
Default: the value of the SAS system option THREADS
Interaction: THREADSNOTHREADS overrides the value of the SAS system option THREADS. For more information about THREADS, see the chapter on SAS system options in SAS Language Reference: Dictionary .
Interaction: The THREADS option is honored if the value of the SAS system option CPUCOUNT is greater than 1.
Interaction: The TAGSORT option is not supported by the multi-threaded sort.
Note: If THREADS is specified either as a SAS system option or in PROC SORT, and another program has the input SAS data set open for reading, writing, or updating using the SPDE engine, then the procedure might fail. In this case, PROC SORT stops processing and writes a message to the SAS log.
See also: Multi-threaded Sorting on page 1027
Specifies the sorting variables
Featured in: Example 1 on page 1031, Example 2 on page 1033, and Example 4 on page 1037
BY <DESCENDING> variable-1 < <DESCENDING> variable-n >;
variable
specifies the variable by which PROC SORT sorts the observations. PROC SORT first arranges the data set by the values in ascending order, by default, of the first BY variable. PROC SORT then arranges any observations that have the same value of the first BY variable by the values of the second BY variable in ascending order. This sorting continues for every specified BY variable.
DESCENDING
reverses the sort order for the variable that immediately follows in the statement so that observations are sorted from the largest value to the smallest value.
Featured in: Example 2 on page 1033