Advanced Performance Tuning Methods


This section presents some advanced performance topics, such as improving the performance of the SORT procedure and calculating data set size. Use these methods only if you are an experienced SAS user and you are familiar with the way SAS is configured on your machine.

Improving Performance of the SORT Procedure

Two options for the PROC SORT statement are available under Windows: the SORTSIZE= and TAGSORT options. These two options control the amount of memory the SORT procedure uses during a sort and are discussed in the next two sections. Also included is a discussion of determining where the sorting process occurs for a given data set and determining how much disk space you need for the sort. For more information about the SORT procedure, see Base SAS Procedures Guide.

SORTSIZE Option

The PROC SORT statement supports the SORTSIZE= option, which limits the amount of memory available for PROC SORT to use.

If you do not use the SORTSIZE= option in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE system option. If the SORTSIZE system option is not set, PROC SORT uses the amount of memory specified by the REALMEMSIZE system option. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your Saswork directory to complete the sort.

The default value of this option is 64 megabytes (MB).
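As a minimal sketch of the two ways to set the limit, the following statements sort a small data set with a per-procedure SORTSIZE= value and then set the SORTSIZE system option for later sorts. The data set REPORT, its variables, and the memory values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical input data set in the Work library */
data report;
   input name $ amount;
   datalines;
Smith 10
Adams 25
Jones 5
;

/* Limit this one sort to 32 MB of memory (illustrative value) */
proc sort data=report sortsize=32M;
   by name;
run;

/* Alternatively, set the system option so that later sorts
   that omit SORTSIZE= use this value (illustrative value) */
options sortsize=64M;
```

A value on the PROC SORT statement applies only to that sort, so it is a convenient way to experiment before changing the system option.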

TAGSORT Option

The TAGSORT option is useful in single-threaded situations where there may not be enough disk space to sort a large SAS data set. The TAGSORT option is not supported for multi-threaded sorts.

When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time may be much higher.
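The following sketch shows one way to request a tag sort. The data set REPORT and the BY variable NAME are hypothetical. The NOTHREADS option is added here on the assumption that you want to make the single-threaded requirement explicit; it is not required by the text above.

```sas
/* Hypothetical data set with a short sort key (name) */
data report;
   input name $ amount;
   datalines;
Smith 10
Adams 25
Jones 5
;

/* Store only the BY variable and observation number in the
   temporary files; NOTHREADS requests a single-threaded sort */
proc sort data=report tagsort nothreads;
   by name;
run;
```

TAGSORT is most likely to help when the BY variables are short relative to the full observation length, at the cost of longer run time.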

Choosing a Location for the Sorted File

Where the physical sort occurs for a given data set depends on how you reference the data set name and whether you use the OUT= option in the PROC SORT statement. You might want to know where the sort occurs if you think there might not be enough disk space available for the sort.

When you sort a SAS data set, SAS creates a temporary utility file. If the sort uses multiple threads, you can specify the location of the utility file by using the UTILLOC system option. The default location for utility files is the Work data library. If two or more locations are specified for the UTILLOC option, the second location is used as the location for the utility file. For sorts that use a single thread, the temporary utility file is opened in the Work data library if there is not enough memory to hold the data set during the sort. The utility file has a .sas7butl file extension. Before you sort, be sure that your Work data library has room for this temporary utility file.
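Before a large sort, it can help to confirm where utility files will be written. The following is a small sketch, not a required step, that displays the current UTILLOC and WORK settings and writes the physical path of the Work library to the SAS log.

```sas
/* Show the current values of the UTILLOC and WORK system options */
proc options option=utilloc;
run;

proc options option=work;
run;

/* Write the physical path of the Work data library to the log */
%put Work library path: %sysfunc(pathname(work));
```

You can then check the free space on the drive that holds the reported location.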

If you specify the OVERWRITE option on the PROC SORT statement, SAS replaces the input data set with the sorted data set.

If you do not specify the OVERWRITE option on the PROC SORT statement, a second file that has a .sas7butl file extension is created. If the sort completes successfully, this file is renamed to the data set name of the file being sorted (with a .sas7bdat file extension). The original data set is deleted after the sort is complete. Before you sort a data set, be sure that you have space for this .sas7butl file.
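As a brief illustration, the following sketch sorts a hypothetical data set in place with the OVERWRITE option. Because the input data set is replaced during the sort, you might prefer to try this only on data that you can re-create.

```sas
/* Hypothetical data set that can easily be re-created */
data report;
   input name $ amount;
   datalines;
Smith 10
Adams 25
Jones 5
;

/* Replace the input data set with the sorted data set
   instead of writing a second .sas7butl copy first */
proc sort data=report overwrite;
   by name;
run;
```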

Use the following rules to determine where the .sas7butl file and the resulting sorted data set are created:

  • If you omit the OUT= option in the PROC SORT statement, the data set is sorted on the drive and in the directory or subdirectory where it is located. For example, if you submit the following statements (note the two-level data set name), the .sas7butl file is created on the C: drive in the MYDATA subdirectory:

     libname mylib 'c:\sas\mydata';

     proc sort data=mylib.report;
        by name;
     run;

    Similarly, if you specify a one-level data set name, the .sas7butl file is created in your Work data library.

  • If you use the OUT= option in the PROC SORT statement, the .sas7butl file is created in the directory associated with the libref used in the OUT= option. If you use a one-level name (that is, no libref), the .sas7butl file is created in the Work data library. For example, in the following SAS program, the first sort occurs in the Saswork subdirectory, while the second occurs on the F: drive in the JANDATA directory:

     proc sort data=report out=newrpt;
        by name;
     run;

     libname january 'f:\jandata';

     proc sort data=report out=january.newrpt;
        by name;
     run;

Calculating Data Set Size

In single-threaded environments, you always need free disk space that equals three to four times the data set size. For example, if your data set takes up 1MB of disk space, you need 3 to 4MB of disk space to complete the sort.

In multi-threaded environments, if you use the OVERWRITE option on the PROC SORT statement, you need space equal to the data set size. If you do not specify the OVERWRITE option, the space you need is equal to two times the data set size. For more information about the OVERWRITE option, see the SORT procedure in Base SAS Procedures Guide.
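The rules of thumb above can be turned into a quick estimate. The following DATA step is a sketch only; the data set size of 250 MB is a hypothetical value that you would replace with the size of your own data set.

```sas
/* Rough estimate of free disk space needed for a sort */
data _null_;
   dataset_mb = 250;   /* hypothetical data set size in MB */

   /* Single-threaded: three to four times the data set size */
   single_low  = 3 * dataset_mb;
   single_high = 4 * dataset_mb;

   /* Multi-threaded: one times the size with OVERWRITE,
      two times the size without it */
   multi_overwrite    = 1 * dataset_mb;
   multi_no_overwrite = 2 * dataset_mb;

   /* Write the estimates (in MB) to the SAS log */
   put single_low= single_high=;
   put multi_overwrite= multi_no_overwrite=;
run;
```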

To estimate the amount of disk space that is needed for a SAS data set:

  1. create a dummy SAS data set that contains one observation and the variables you need

  2. run the CONTENTS procedure using the dummy data set

  3. determine the data set size by performing simple math using information from the CONTENTS procedure output.

For example, for a data set that has one character variable and four numeric variables, you would submit the following statements:

 data oranges;
    input variety $ flavor texture looks;
    total=flavor+texture+looks;
    datalines;
 navel 9 8 6
 ;

 proc contents data=oranges;
    title 'Example for Calculating Data Set Size';
 run;

These statements generate the following output:

Output 7.1: Example for Calculating Data Set Size with PROC CONTENTS
                  Example for Calculating Data Set Size                    1
                                              19:39 Wednesday, February 12, 2003

                              The CONTENTS Procedure

 Data Set Name         WORK.ORANGES                     Observations          1
 Member Type           DATA                             Variables             5
 Engine                V9                               Indexes               0
 Created               Wednesday, February              Observation Length    40
                       12, 2003 07:41:04
 Last Modified         Wednesday, February              Deleted Observations  0
                       12, 2003 07:41:04
 Protection                                             Compressed            NO
 Data Set Type                                          Sorted                NO
 Label
 Data Representation   WINDOWS_32
 Encoding              wlatin1   Western (Windows)

                       Engine/Host Dependent Information

 Data Set Page Size          4096
 Number of Data Set Pages    1
 First Data Page             1
 Max Obs per Page            101
 Obs in First Data Page      1
 Number of Data Set Repairs  0
 File Name                   C:\TEMP\SAS Temporary Files\_TD246\oranges.sas7bdat
 Release Created             9.0101B0
 Host Created                WIN_NT

                   Alphabetic List of Variables and Attributes

                          #    Variable    Type    Len
                          2    flavor      Num       8
                          4    looks       Num       8
                          3    texture     Num       8
                          5    total       Num       8
                          1    variety     Char      8
 

The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:

  • number of data pages = 1 + (floor(number of obs / Max Obs per Page))

  • size = 1024 + (Data Set Page Size * number of data pages)

(floor represents a function that rounds the value down to the nearest integer.)

Taking the information shown in Output 7.1, you can calculate the size of the example data set:

  • number of data pages = 1 + (floor(1/101)) = 1 + 0 = 1

  • size = 1024 + (4096 * 1) = 5120

Thus, the example data set uses 5,120 bytes of storage space.
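The same arithmetic can be scripted for other observation counts. The following DATA step is a sketch that applies the formula above with the page size and Max Obs per Page values from Output 7.1. The observation count of 1000 is a hypothetical second case added for comparison.

```sas
/* Estimate data set size from PROC CONTENTS values */
data _null_;
   page_size        = 4096;   /* Data Set Page Size from Output 7.1 */
   max_obs_per_page = 101;    /* Max Obs per Page from Output 7.1   */

   /* 1 is the example data set; 1000 is a hypothetical count */
   do nobs = 1, 1000;
      data_pages = 1 + floor(nobs / max_obs_per_page);
      size_bytes = 1024 + page_size * data_pages;
      put nobs= data_pages= size_bytes=;
   end;
run;
```

For one observation the log shows 1 data page and 5120 bytes, which matches the hand calculation. For 1000 observations the formula gives 10 data pages and 41984 bytes.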

Increasing the Efficiency of Interactive Processing

If you are running a SAS job using SAS interactively and the job generates numerous log messages or extensive output, consider using the AUTOSCROLL command to suppress the scrolling of windows. This makes your job run faster because SAS does not have to use resources to update the display of the LOG and OUTPUT windows during the job. For example, issuing autoscroll 0 in the LOG window causes the LOG window not to scroll until your job is finished. (For the OUTPUT window, AUTOSCROLL is set to 0 by default.)
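If you prefer to issue the command from a program rather than typing it on the command line, one possible approach is the DM statement. The following is a sketch that assumes an interactive windowing session; it has no effect on batch jobs.

```sas
/* Send the AUTOSCROLL command to the LOG window so that it
   does not scroll until the job is finished */
dm log 'autoscroll 0';
```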

Minimizing the LOG window also might make your job run faster, especially if SAS is generating numerous log messages.




SAS 9.1 Companion for Windows
ISBN: 1590472004
Year: 2004
Pages: 187
