Results: COMPARE Procedure


Results Reporting

PROC COMPARE reports the results of its comparisons in the following ways:

  • the SAS log

  • return codes stored in the automatic macro variable SYSINFO

  • procedure output

  • output data sets.

SAS Log

When you use the WARNING, PRINTALL, or ERROR option, PROC COMPARE writes a description of the differences to the SAS log.

Macro Return Codes (SYSINFO)

PROC COMPARE stores a return code in the automatic macro variable SYSINFO. The value of the return code provides information about the result of the comparison. By checking the value of SYSINFO after PROC COMPARE has run and before any other step begins, SAS macros can use the results of a PROC COMPARE step to determine what action to take or what parts of a SAS program to execute.

Table 9.1 is a key for interpreting the SYSINFO return code from PROC COMPARE. For each of the conditions listed, the associated value is added to the return code if the condition is true. Thus, the SYSINFO return code is the sum of the codes listed in Table 9.1 for the applicable conditions:

Table 9.1: Macro Return Codes

Bit  Condition  Code   Hex    Description
---  ---------  -----  -----  -----------------------------------------------
1    DSLABEL    1      0001X  Data set labels differ
2    DSTYPE     2      0002X  Data set types differ
3    INFORMAT   4      0004X  Variable has different informat
4    FORMAT     8      0008X  Variable has different format
5    LENGTH     16     0010X  Variable has different length
6    LABEL      32     0020X  Variable has different label
7    BASEOBS    64     0040X  Base data set has observation not in comparison
8    COMPOBS    128    0080X  Comparison data set has observation not in base
9    BASEBY     256    0100X  Base data set has BY group not in comparison
10   COMPBY     512    0200X  Comparison data set has BY group not in base
11   BASEVAR    1024   0400X  Base data set has variable not in comparison
12   COMPVAR    2048   0800X  Comparison data set has variable not in base
13   VALUE      4096   1000X  A value comparison was unequal
14   TYPE       8192   2000X  Conflicting variable types
15   BYVAR      16384  4000X  BY variables do not match
16   ERROR      32768  8000X  Fatal error: comparison not done

These codes are ordered and scaled to enable a simple check of the degree to which the data sets differ. For example, if you want to check that two data sets contain the same variables, observations, and values, but you do not care about differences in labels, formats, and so forth, then use the following statements:

 proc compare base=SAS-data-set
              compare=SAS-data-set;
 run;

 %if &sysinfo >= 64 %then
    %do;
       handle error;
    %end;

You can examine individual bits in the SYSINFO value by using DATA step bit-testing features to check for specific conditions. For example, to check for the presence of observations in the base data set that are not in the comparison data set, use the following statements:

 proc compare base=SAS-data-set
              compare=SAS-data-set;
 run;

 %let rc=&sysinfo;

 data _null_;
    if &rc='1......'b then
       put 'Observations in Base but not in Comparison Data Set';
 run;

PROC COMPARE must run before you check SYSINFO, and you must obtain the SYSINFO value before another SAS step starts, because every SAS step resets SYSINFO.
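The bit tests described in this section can also be written entirely in the macro language. The following is a minimal sketch rather than part of the original guide: the macro name CHECKCOMPARE and the data set names WORK.OLD and WORK.NEW are hypothetical, and the sketch assumes that the BAND (bitwise AND) function is called through %SYSFUNC. The %IF logic is wrapped in a macro so that it does not depend on open-code %IF support.

```sas
/* WORK.OLD and WORK.NEW are hypothetical data set names. */
proc compare base=work.old compare=work.new noprint;
run;

/* Capture the return code before another step resets SYSINFO. */
%let rc = &sysinfo;

/* CHECKCOMPARE is a hypothetical macro name. */
%macro checkcompare(rc);
   %if &rc = 0 %then %put NOTE: No differences were detected.;
   %else %do;
      /* BAND isolates a single bit; each mask is a Code value from Table 9.1. */
      %if %sysfunc(band(&rc, 64)) ne 0 %then
         %put NOTE: Base data set has an observation not in comparison.;
      %if %sysfunc(band(&rc, 128)) ne 0 %then
         %put NOTE: Comparison data set has an observation not in base.;
      %if %sysfunc(band(&rc, 4096)) ne 0 %then
         %put NOTE: A value comparison was unequal.;
      %if %sysfunc(band(&rc, 32768)) ne 0 %then
         %put NOTE: Fatal error, comparison not done.;
   %end;
%mend checkcompare;

%checkcompare(&rc)
```

For instance, a return code of 4232 is the sum 4096 + 128 + 8, so this sketch would report the COMPOBS and VALUE conditions. The FORMAT condition (8) is also set in that value, but the sketch does not test for it.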

Procedure Output

Procedure Output Overview

The following sections show and describe the default output for the two data sets shown in Overview: COMPARE Procedure. Because PROC COMPARE produces lengthy output, the output is presented in seven pieces.

Data Set Summary

This report lists the attributes of the data sets that are being compared. These attributes include the following:

  • the data set names

  • the data set types, if any

  • the data set labels, if any

  • the dates created and last modified

  • the number of variables in each data set

  • the number of observations in each data set.

Output 9.2 shows the Data Set Summary.

Output 9.2: Partial Output
start example
                              COMPARE Procedure
                  Comparison of PROCLIB.ONE with PROCLIB.TWO
                                (Method=EXACT)

                               Data Set Summary

Dataset               Created           Modified   NVar    NObs  Label

PROCLIB.ONE  11SEP97:15:11:07   11SEP97:15:11:09      5       4  First Data Set
PROCLIB.TWO  11SEP97:15:11:10   11SEP97:15:11:10      6       5  Second Data Set
end example
 

Variables Summary

This report compares the variables in the two data sets. The first part of the report lists the following:

  • the number of variables the data sets have in common

  • the number of variables in the base data set that are not in the comparison data set and vice versa

  • the number of variables in both data sets that have different types

  • the number of variables that differ on other attributes (length, label, format, or informat)

  • the number of BY, ID, VAR, and WITH variables specified for the comparison.

The second part of the report lists matching variables with different attributes and shows how the attributes differ. (The COMPARE procedure omits variable labels if the line size is too small for them.)

Output 9.3 shows the Variables Summary.

Output 9.3: Partial Output
start example
                         Variables Summary

     Number of Variables in Common: 5.
     Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
     Number of Variables with Conflicting Types: 1.
     Number of Variables with Differing Attributes: 3.

        Listing of Common Variables with Conflicting Types

            Variable  Dataset      Type  Length

            student   PROCLIB.ONE  Num        8
                      PROCLIB.TWO  Char       8

      Listing of Common Variables with Differing Attributes

 Variable  Dataset      Type  Length  Format  Label

 year      PROCLIB.ONE  Char       8          Year of Birth
           PROCLIB.TWO  Char       8
 state     PROCLIB.ONE  Char       8
           PROCLIB.TWO  Char       8          Home State
 gr1       PROCLIB.ONE  Num        8  4.1
           PROCLIB.TWO  Num        8  5.2
end example
 

Observation Summary

This report provides information about observations in the base and comparison data sets. First, the report identifies the first and last observation in each data set, the first and last matching observations, and the first and last differing observations. Then, the report lists the following:

  • the number of observations that the data sets have in common

  • the number of observations in the base data set that are not in the comparison data set and vice versa

  • the total number of observations in each data set

  • the number of matching observations for which PROC COMPARE judged some variables unequal

  • the number of matching observations for which PROC COMPARE judged all variables equal.

Output 9.4 shows the Observation Summary.

Output 9.4: Partial Output
start example
                        Observation Summary

                  Observation      Base Compare

                  First Obs           1       1
                  First Unequal       1       1
                  Last  Unequal       4       4
                  Last  Match         4       4
                  Last  Obs           .       5

 Number of Observations in Common: 4.
 Number of Observations in PROCLIB.TWO but not in PROCLIB.ONE: 1.
 Total Number of Observations Read from PROCLIB.ONE: 4.
 Total Number of Observations Read from PROCLIB.TWO: 5.

 Number of Observations with Some Compared Variables Unequal: 4.
 Number of Observations with All Compared Variables Equal: 0.
end example
 

Values Comparison Summary

This report first lists the following:

  • the number of variables compared with all observations equal

  • the number of variables compared with some observations unequal

  • the number of variables with differences involving missing values, if any

  • the total number of values judged unequal

  • the maximum difference measure between unequal values for all pairs of matching variables (for differences not involving missing values).

In addition, for the variables for which some matching observations have unequal values, the report lists the following:

  • the name of the variable

  • other variable attributes

  • the number of times PROC COMPARE judged the variable unequal

  • the maximum difference measure found between values (for differences not involving missing values)

  • the number of differences caused by comparison with missing values, if any.

Output 9.5 shows the Values Comparison Summary.

Output 9.5: Partial Output
start example
                     Values Comparison Summary

 Number of Variables Compared with All Observations Equal: 1.
 Number of Variables Compared with Some Observations Unequal: 3.
 Total Number of Values which Compare Unequal: 6.
 Maximum Difference: 20.

                   Variables with Unequal Values

       Variable  Type  Len   Compare Label  Ndif   MaxDif

       state     CHAR    8   Home State        2
       gr1       NUM     8                     2    1.000
       gr2       NUM     8                     2   20.000
end example
 

Value Comparison Results

This report consists of a table for each pair of matching variables judged unequal at one or more observations. When comparing character values, PROC COMPARE displays only the first 20 characters. When you use the TRANSPOSE option, it displays only the first 12 characters. Each table shows the following:

  • the number of the observation or, if you use the ID statement, the values of the ID variables

  • the value of the variable in the base data set

  • the value of the variable in the comparison data set

  • the difference between these two values (numeric variables only)

  • the percent difference between these two values (numeric variables only).

Output 9.6 shows the Value Comparison Results for Variables.

Output 9.6: Partial Output
start example
          Value Comparison Results for Variables

__________________________________________________________
               Home State
               Base Value           Compare Value
        Obs    state                 state
   ________    ________              ________

          2    MD                    MA
          4    MA                    MD
__________________________________________________________

__________________________________________________________
                    Base    Compare
        Obs          gr1        gr1      Diff.     % Diff
   ________    _________  _________  _________  _________

          1         85.0      84.00    -1.0000    -1.1765
          3         78.0      79.00     1.0000     1.2821
__________________________________________________________

__________________________________________________________
                    Base    Compare
        Obs          gr2        gr2      Diff.     % Diff
   ________    _________  _________  _________  _________

          3      72.0000    73.0000     1.0000     1.3889
          4      94.0000    74.0000   -20.0000   -21.2766
__________________________________________________________
end example
 

You can suppress the value comparison results with the NOVALUES option. If you use both the NOVALUES and TRANSPOSE options, then PROC COMPARE lists for each observation the names of the variables with values judged unequal but does not display the values and differences.

Table of Summary Statistics

If you use the STATS, ALLSTATS, or PRINTALL option, then the Value Comparison Results for Variables section contains summary statistics for the numeric variables that are being compared. The STATS option generates these statistics for only the numeric variables whose values are judged unequal. The ALLSTATS and PRINTALL options generate these statistics for all numeric variables, even if all values are judged equal.

Note: In all cases PROC COMPARE calculates the summary statistics based on all matching observations that do not contain missing values, not just on those containing unequal values.

Output 9.7 shows the following summary statistics for base data set values, comparison data set values, differences, and percent differences:

N

  • the number of nonmissing values

MEAN

  • the mean, or average, of the values

STD

  • the standard deviation

MAX

  • the maximum value

MIN

  • the minimum value

STDERR

  • the standard error of the mean

T

  • the T ratio (MEAN/STDERR)

PROB>|T|

  • the probability of a greater absolute T value if the true population mean is 0.

NDIF

  • the number of matching observations judged unequal, and the percent of the matching observations that were judged unequal.

DIFMEANS

  • the difference between the mean of the base values and the mean of the comparison values. This line contains three numbers. The first is the difference in the means expressed as a percentage of the mean of the base values. The second is the difference in the means expressed as a percentage of the mean of the comparison values. The third is the difference in the two means (the comparison mean minus the base mean).

R

  • the correlation of the base and comparison values for matching observations that are nonmissing in both data sets.

RSQ

  • the square of the correlation of the base and comparison values for matching observations that are nonmissing in both data sets.

Output 9.7 is from the ALLSTATS option using the two data sets shown in Overview: COMPARE Procedure:

Output 9.7: Partial Output
start example
          Value Comparison Results for Variables

__________________________________________________________
                    Base    Compare
        Obs          gr1        gr1      Diff.     % Diff
   ________    _________  _________  _________  _________

          1         85.0      84.00    -1.0000    -1.1765
          3         78.0      79.00     1.0000     1.2821
   ________    _________  _________  _________  _________

          N            4          4          4          4
       Mean      85.5000    85.5000          0     0.0264
        Std       5.8023     5.4467     0.8165     1.0042
        Max      92.0000    92.0000     1.0000     1.2821
        Min      78.0000    79.0000    -1.0000    -1.1765
     StdErr       2.9011     2.7234     0.4082     0.5021
          t      29.4711    31.3951     0.0000     0.0526
     Prob>t       <.0001     <.0001     1.0000     0.9614

       Ndif            2    50.000%
   DifMeans       0.000%     0.000%          0
     r, rsq        0.991      0.983
__________________________________________________________

__________________________________________________________
                    Base    Compare
        Obs          gr2        gr2      Diff.     % Diff
   ________    _________  _________  _________  _________

          3      72.0000    73.0000     1.0000     1.3889
          4      94.0000    74.0000   -20.0000   -21.2766
   ________    _________  _________  _________  _________

          N            4          4          4          4
       Mean      86.2500    81.5000    -4.7500    -4.9719
        Std       9.9457     9.4692    10.1776    10.8895
        Max      94.0000    92.0000     1.0000     1.3889
        Min      72.0000    73.0000   -20.0000   -21.2766
     StdErr       4.9728     4.7346     5.0888     5.4447
          t      17.3442    17.2136    -0.9334    -0.9132
     Prob>t       0.0004     0.0004     0.4195     0.4285

       Ndif            2    50.000%
   DifMeans      -5.507%    -5.828%    -4.7500
     r, rsq        0.451      0.204
__________________________________________________________
end example
 

Note: If you use a wide line size with PRINTALL, then PROC COMPARE prints the value comparison result for character variables next to the result for numeric variables. In that case, PROC COMPARE calculates only NDIF for the character variables.
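As a brief illustration of the options described in this section, the following sketch requests the summary statistics in both ways. It assumes that the PROCLIB library that holds the Overview data sets is already assigned.

```sas
/* STATS: summary statistics only for the numeric variables
   whose values are judged unequal. */
proc compare base=proclib.one compare=proclib.two stats;
run;

/* ALLSTATS: summary statistics for all numeric variables compared,
   even if all values are judged equal. */
proc compare base=proclib.one compare=proclib.two allstats;
run;
```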

Comparison Results for Observations (Using the TRANSPOSE Option)

The TRANSPOSE option prints the comparison results by observation instead of by variable. The comparison results precede the observation summary report. By default, the source of the values for each row of the table is indicated by the following label:

 _OBS_1=number-1 _OBS_2=number-2

where number-1 is the number of the observation in the base data set for which the value of the variable is shown, and number-2 is the number of the observation in the comparison data set.

Output 9.8 shows the differences in PROCLIB.ONE and PROCLIB.TWO by observation instead of by variable.

Output 9.8: Partial Output
start example
               Comparison Results for Observations

_OBS_1=1 _OBS_2=1:
 Variable    Base Value       Compare         Diff.        % Diff
      gr1          85.0         84.00     -1.000000     -1.176471

_OBS_1=2 _OBS_2=2:
 Variable    Base Value       Compare
    state            MD            MA

_OBS_1=3 _OBS_2=3:
 Variable    Base Value       Compare         Diff.        % Diff
      gr1          78.0         79.00      1.000000      1.282051
      gr2     72.000000     73.000000      1.000000      1.388889

_OBS_1=4 _OBS_2=4:
 Variable    Base Value       Compare         Diff.        % Diff
      gr2     94.000000     74.000000    -20.000000    -21.276596
    state            MA            MD
end example
 

If you use an ID statement, then the identifying label has the following form:

 ID-1=ID-value-1 ... ID-n=ID-value-n

where ID is the name of an ID variable and ID-value is the value of the ID variable.

Note: When you use the TRANSPOSE option, PROC COMPARE prints only the first 12 characters of the value.
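The following sketch shows how a by-observation report might be requested. It assumes that the PROCLIB library that holds the Overview data sets is already assigned; the second step combines TRANSPOSE with the NOVALUES option described under Value Comparison Results.

```sas
/* Report differences by observation instead of by variable. */
proc compare base=proclib.one compare=proclib.two transpose;
run;

/* With NOVALUES as well, only the names of the variables judged
   unequal are listed for each observation. */
proc compare base=proclib.one compare=proclib.two transpose novalues;
run;
```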

ODS Table Names

The COMPARE procedure assigns a name to each table that it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. For more information, see The Complete Guide to the SAS Output Delivery System.

Table 9.2: ODS Tables Produced by the COMPARE Procedure

Table Name               Description                         Generated...
-----------------------  ----------------------------------  ----------------------------------
CompareDatasets          Information about the data set or   by default, unless NOSUMMARY or
                         data sets                           NOVALUES option is specified

CompareDetails           A listing of observations that the  if PRINTALL option is specified
(Comparison Results for  base data set and the compare data
Observations)            set do not have in common

CompareDetails           A listing of notes and warnings     if ID statement is specified and
(ID variable notes and   concerning duplicate ID variable    duplicate ID variable values exist
warnings)                values                              in either data set

CompareDifferences       A report of variable value          by default unless NOVALUES option
                         differences                         is specified

CompareSummary           Summary report of observations,     by default
                         values, and variables with unequal
                         values

CompareVariables         A listing of differences in         by default, unless the variables
                         variable types or attributes        are identical or the NOSUMMARY
                         between the base data set and the   option is specified
                         compare data set
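The table names in Table 9.2 can be used with the ODS SELECT and ODS OUTPUT statements. The following is a minimal sketch: it assumes that the PROCLIB library is already assigned, and WORK.DIFFS is a hypothetical name for the output data set. The layout of the data set that ODS creates is not described here, so inspect it (for example, with PROC CONTENTS) before relying on particular variables.

```sas
/* Print only the summary report for this step. */
ods select CompareSummary;
proc compare base=proclib.one compare=proclib.two;
run;

/* Save the value-differences report as a data set.
   WORK.DIFFS is a hypothetical name. */
ods output CompareDifferences=work.diffs;
proc compare base=proclib.one compare=proclib.two;
run;
```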

Output Data Set (OUT=)

By default, the OUT= data set contains an observation for each pair of matching observations. The OUT= data set contains the following variables from the data sets you are comparing:

  • all variables named in the BY statement

  • all variables named in the ID statement

  • all matching variables or, if you use the VAR statement, all variables listed in the VAR statement.

In addition, the data set contains two variables created by PROC COMPARE to identify the source of the values for the matching variables: _TYPE_ and _OBS_.

_TYPE_

  • is a character variable of length 8. Its value indicates the source of the values for the matching (or VAR) variables in that observation. (For ID and BY variables, which are not compared, the values are the values from the original data sets.) _TYPE_ has the label Type of Observation. The four possible values of this variable are as follows:

    • BASE

      • The values in this observation are from an observation in the base data set. PROC COMPARE writes this type of observation to the OUT= data set when you specify the OUTBASE option.

    • COMPARE

      • The values in this observation are from an observation in the comparison data set. PROC COMPARE writes this type of observation to the OUT= data set when you specify the OUTCOMP option.

    • DIF

      • The values in this observation are the differences between the values in the base and comparison data sets. For character variables, PROC COMPARE uses a period (.) to represent equal characters and an X to represent unequal characters. PROC COMPARE writes this type of observation to the OUT= data set by default. However, if you request any other type of observation with the OUTBASE, OUTCOMP, or OUTPERCENT option, then you must specify the OUTDIF option to generate observations of this type in the OUT= data set.

    • PERCENT

      • The values in this observation are the percent differences between the values in the base and comparison data sets. For character variables the values in observations of type PERCENT are the same as the values in observations of type DIF.

_OBS_

  • is a numeric variable that contains a number further identifying the source of the OUT= observations.

    For observations with _TYPE_ equal to BASE, _OBS_ is the number of the observation in the base data set from which the values of the VAR variables were copied. Similarly, for observations with _TYPE_ equal to COMPARE, _OBS_ is the number of the observation in the comparison data set from which the values of the VAR variables were copied.

    For observations with _TYPE_ equal to DIF or PERCENT, _OBS_ is a sequence number that counts the matching observations in the BY group.

    _OBS_ has the label Observation Number.

The COMPARE procedure takes variable names and attributes for the OUT= data set from the base data set except for the lengths of ID and VAR variables, for which it uses the longer length regardless of which data set that length is from. This behavior has two important repercussions:

  • If you use the VAR and WITH statements, then the names of the variables in the OUT= data set come from the VAR statement. Thus, observations with _TYPE_ equal to BASE contain the values of the VAR variables, while observations with _TYPE_ equal to COMPARE contain the values of the WITH variables.

  • If you include a variable more than once in the VAR statement in order to compare it with more than one variable, then PROC COMPARE can include only the first comparison in the OUT= data set because each variable must have a unique name. Other comparisons produce warning messages.

For an example of the OUT= option, see Example 6.
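The following sketch shows one way that the OUT= options described in this section might be combined. It assumes that the PROCLIB library is already assigned; WORK.RESULT is a hypothetical data set name. OUTNOEQUAL, which is not discussed in this section, suppresses observations in which all values are judged equal.

```sas
/* Write BASE, COMPARE, and DIF observations to the OUT= data set.
   OUTDIF is stated explicitly because OUTBASE and OUTCOMP are also
   requested. WORK.RESULT is a hypothetical name. */
proc compare base=proclib.one compare=proclib.two
             out=work.result outnoequal outbase outcomp outdif
             noprint;
run;

/* _TYPE_ and _OBS_ identify the source of each observation. */
proc print data=work.result;
run;
```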

Output Statistics Data Set (OUTSTATS=)

When you use the OUTSTATS= option, PROC COMPARE calculates the same summary statistics as the ALLSTATS option for each pair of numeric variables compared (see Table of Summary Statistics). The OUTSTATS= data set contains an observation for each summary statistic for each pair of variables. The data set also contains the BY variables used in the comparison and several variables created by PROC COMPARE:

_VAR_

  • is a character variable that contains the name of the variable from the base data set for which the statistic in the observation was calculated.

_WITH_

  • is a character variable that contains the name of the variable from the comparison data set for which the statistic in the observation was calculated. The _WITH_ variable is not included in the OUTSTATS= data set unless you use the WITH statement.

_TYPE_

  • is a character variable that contains the name of the statistic contained in the observation. Values of the _TYPE_ variable are N, MEAN, STD, MIN, MAX, STDERR, T, PROBT, NDIF, DIFMEANS, and R, RSQ.

_BASE_

  • is a numeric variable that contains the value of the statistic calculated from the values of the variable named by _VAR_ in the observations in the base data set with matching observations in the comparison data set.

_COMP_

  • is a numeric variable that contains the value of the statistic calculated from the values of the variable named by the _VAR_ variable (or by the _WITH_ variable if you use the WITH statement) in the observations in the comparison data set with matching observations in the base data set.

_DIF_

  • is a numeric variable that contains the value of the statistic calculated from the differences of the values of the variable named by the _VAR_ variable in the base data set and the matching variable (named by the _VAR_ or _WITH_ variable) in the comparison data set.

_PCTDIF_

  • is a numeric variable that contains the value of the statistic calculated from the percent differences of the values of the variable named by the _VAR_ variable in the base data set and the matching variable (named by the _VAR_ or _WITH_ variable) in the comparison data set.

Note: For both types of output data sets, PROC COMPARE assigns one of the following data set labels:

 Comparison of base-SAS-data-set with comparison-SAS-data-set
 Comparison of variables in base-SAS-data-set

Labels are limited to 40 characters.

See Example 7 for an example of an OUTSTATS= data set.
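As a minimal sketch of how the OUTSTATS= data set might be examined, the following steps write the statistics to a data set and print only the NDIF observations. The sketch assumes that the PROCLIB library is already assigned; WORK.DIFFSTAT is a hypothetical data set name.

```sas
/* Write the summary statistics to a data set instead of printing them.
   WORK.DIFFSTAT is a hypothetical name. */
proc compare base=proclib.one compare=proclib.two
             outstats=work.diffstat noprint;
run;

/* Each observation holds one statistic for one pair of variables;
   keep only the NDIF observations. */
proc print data=work.diffstat noobs;
   where _type_ = 'NDIF';
   var _var_ _type_ _base_ _comp_ _dif_ _pctdif_;
run;
```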




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
Year: 2004
Pages: 260
