This section
Two options for the PROC SORT statement are available under
The PROC SORT statement supports the SORTSIZE= option, which limits the amount of memory available for PROC SORT to use.
If you do not use the SORTSIZE option in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE system option. If the SORTSIZE system option is not set, PROC SORT uses the amount of memory specified by the REALMEMSIZE system option. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your Saswork directory to complete the sort.
The default value of the SORTSIZE option is 64 megabytes (MB).
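As a minimal sketch of the two ways to set the limit described above, the following statements first display the current settings, then cap a single sort with the SORTSIZE= option, and finally set the SORTSIZE system option for later sorts. The 128M value is only an illustration, and the MYLIB libref and its path are borrowed from the example later in this section; substitute your own.

```sas
/* Display the current values that PROC SORT would fall back on. */
proc options option=sortsize;
run;
proc options option=realmemsize;
run;

libname mylib 'c:\sas\mydata';   /* path taken from this section's example */

/* Limit this one sort to 128 MB of memory (illustrative value). */
proc sort data=mylib.report sortsize=128M;
   by name;
run;

/* Alternatively, set the system option so that later sorts inherit it. */
options sortsize=128M;
proc sort data=mylib.report;
   by name;
run;
```

If the sort needs more memory than the limit allows, expect a utility file to appear in the Saswork directory, as described above.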
The TAGSORT option is useful in single-threaded situations where there may not be enough disk space to sort a large SAS data set. The TAGSORT option is not supported for multi-threaded sorts.
When you specify the TAGSORT option, only sort keys (that is, the
Where the physical sort occurs for a given data set depends on how you reference the data set
When you sort a SAS data set, SAS creates a temporary utility file. If the sort uses multiple threads, you can specify the location of the utility file by using the UTILLOC system option. The default location for utility files is the Work data library. If two or more locations are specified for the UTILLOC option, the second location is used as the location for the utility file. For sorts that use a single thread, the temporary utility file is opened in the Work data library if there is not enough memory to hold the data set during the sort. The utility file has a .sas7butl file extension. Before you sort, be sure that your Work data library has room for this temporary utility file.
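Before a large sort, it can help to confirm where the utility file will be written. The following sketch, which assumes nothing beyond a running SAS session, writes the physical path of the Work data library to the log and then displays the current UTILLOC setting.

```sas
/* Write the physical location of the Work data library to the log. */
%put NOTE: Work library path: %sysfunc(pathname(work));

/* Display the current UTILLOC setting (the utility-file location
   used by multi-threaded sorts). */
proc options option=utilloc;
run;
```

You can then check the free space on the reported drive with your usual operating system tools.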
If you specify the OVERWRITE option on the PROC SORT statement, SAS
If you do not specify the OVERWRITE option on the PROC SORT statement, a second file that has a .sas7butl file extension is created. If the sort completes successfully, this file is
Use the following rules to determine where the .sas7butl file and the resulting sorted data set are created:
If you omit the OUT= option in the PROC SORT statement, the data set is sorted on the drive and in the directory or subdirectory where it is located. For example, if you submit the following statements (note the
libname mylib 'c:\sas\mydata';
proc sort data=mylib.report;
   by name;
run;
Similarly, if you specify a
If you use the OUT= option in the PROC SORT statement, the .sas7butl file is created in the directory associated with the libref used in the OUT= option. If you use a one-level name (that is, no libref), the .sas7butl file is created in the Work data library. For example, in the following SAS program, the first sort occurs in the Saswork subdirectory, while the second occurs on the F: drive in the JANDATA directory:
proc sort data=report out=newrpt;
   by name;
run;

libname january 'f:\jandata';
proc sort data=report out=january.newrpt;
   by name;
run;
In single-threaded environments, you always need free disk space equal to three to four times the data set size. For example, if your data set takes up 1 MB of disk space, you need 3 to 4 MB of free disk space to complete the sort.
In multi-threaded environments, if you use the OVERWRITE option on the PROC SORT statement, you need space equal to the data set size. If you do not specify the OVERWRITE option, the space you need is equal to two times the data set size. For more information about the OVERWRITE option, see the SORT procedure in Base SAS Procedures Guide.
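The space rules above reduce to simple multiplication. The following DATA step is a sketch that applies them to the 1 MB data set from the example; the variable names are made up for illustration. The PROC SORT step that follows shows where the OVERWRITE option goes, again using the MYLIB libref from this section's example. Because OVERWRITE replaces the input data set, try it only on data that you have backed up or can re-create.

```sas
data _null_;
   ds_mb = 1;                        /* data set size in MB (from the example) */

   single_min      = 3 * ds_mb;      /* single-threaded, lower bound */
   single_max      = 4 * ds_mb;      /* single-threaded, upper bound */
   multi_overwrite = 1 * ds_mb;      /* multi-threaded with OVERWRITE */
   multi_default   = 2 * ds_mb;      /* multi-threaded without OVERWRITE */

   /* Writes the four estimates to the SAS log. */
   put single_min= single_max= multi_overwrite= multi_default=;
run;

libname mylib 'c:\sas\mydata';       /* path taken from this section's example */

/* In-place sort with OVERWRITE (no OUT= option). */
proc sort data=mylib.report overwrite;
   by name;
run;
```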
To estimate the amount of disk space that is needed for a SAS data set:
create a
run the CONTENTS procedure using the dummy data set
determine the data set size by performing simple math using information from the CONTENTS procedure output.
For example, for a data set that has one character variable and four numeric variables, you would submit the following statements:
data oranges;
   input variety $ flavor texture looks;
   total=flavor+texture+looks;
   datalines;
navel 9 8 6
;
proc contents data=oranges;
   title 'Example for Calculating Data Set Size';
run;
These statements generate the following output:
Output 7.1: Example for Calculating Data Set Size with PROC CONTENTS
Example for Calculating Data Set Size 1
19:39 Wednesday, February 12, 2003
The CONTENTS Procedure
Data Set Name        WORK.ORANGES                           Observations          1
Member Type          DATA                                   Variables             5
Engine               V9                                     Indexes               0
Created              Wednesday, February 12, 2003 07:41:04  Observation Length    40
Last Modified        Wednesday, February 12, 2003 07:41:04  Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)
Engine/Host Dependent Information
Data Set Page Size 4096
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 101
Obs in First Data Page 1
Number of Data Set Repairs 0
File Name C:\TEMP\SAS Temporary Files\_TD246\oranges.sas7bdat
Release Created 9.0101B0
Host Created WIN_NT
Alphabetic List of Variables and Attributes
#    Variable    Type    Len
2    flavor      Num       8
4    looks       Num       8
3    texture     Num       8
5    total       Num       8
1    variety     Char      8
The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:
number of data pages = 1 + (floor(number of obs / Max Obs per Page))
size = 1024 + ( Data Set Page Size * number of data pages)
(floor represents a function that rounds the value down to the nearest integer.)
Taking the information shown in Output 7.1, you can calculate the size of the example data set:
number of data pages = 1 + (floor(1/101)) = 1 + 0 = 1
size = 1024 + (4096 * 1) = 5120
Thus, the example data set uses 5,120 bytes of storage space.
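The same arithmetic can be done in a short DATA step. This is only a sketch: the variable names are made up for illustration, and the three input values are copied by hand from the PROC CONTENTS output above (Observations, Max Obs per Page, and Data Set Page Size).

```sas
data _null_;
   nobs         = 1;       /* Observations */
   max_obs_page = 101;     /* Max Obs per Page */
   page_size    = 4096;    /* Data Set Page Size, in bytes */

   /* Apply the two formulas given above. */
   data_pages = 1 + floor(nobs / max_obs_page);
   size_bytes = 1024 + (page_size * data_pages);

   /* Writes data_pages=1 size_bytes=5120 to the SAS log. */
   put data_pages= size_bytes=;
run;
```

To estimate a larger data set with the same variables, change only nobs to the expected number of observations; the page size and the maximum observations per page stay the same as long as the data set structure does not change.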
If you are running a SAS job using SAS interactively and the job generates
Minimizing the LOG window also might make your job run faster,