Concepts: COMPARE Procedure


Comparisons Using PROC COMPARE

PROC COMPARE first compares the following:

  • data set attributes (set by the data set options TYPE= and LABEL=).

  • variables. PROC COMPARE checks each variable in one data set to determine whether it matches a variable in the other data set.

  • attributes (type, length, labels, formats, and informats) of matching variables.

  • observations. PROC COMPARE checks each observation in one data set to determine whether it matches an observation in the other data set. PROC COMPARE matches observations either by their position in the data sets or by the values of the ID variable.

After making these comparisons, PROC COMPARE compares the values in the parts of the data sets that match. PROC COMPARE compares the data either by the position of observations or by the values of an ID variable.

A Comparison by Position of Observations

Figure 9.1 on page 239 shows two data sets. The procedure compares only the part that the two data sets have in common: the variables IDNUM, NAME, GENDER, and GPA in the first five observations. Assume that variables with the same names have the same type.

Data Set ONE

    IDNUM   NAME       GENDER   GPA
    2998    Bagwell    f        3.722
    9866    Metcalf    m        3.342
    2118    Gray       f        3.177
    3847    Baglione   f        4.000
    2342    Hall       m        3.574

Data Set TWO

    IDNUM   NAME       GENDER   GPA     YEAR
    2998    Bagwell    f        3.722   2
    9866    Metcalf    m        3.342   2
    2118    Gray       f        3.177   3
    3847    Baglione   f        4.000   4
    2342    Hall       m        3.574   4
    7565    Gold       f        3.609   2
    1755    Syme       f        3.883   3


Figure 9.1: Comparison by the Positions of Observations

When you use PROC COMPARE to compare data set TWO with data set ONE, the procedure compares the first observation in data set ONE with the first observation in data set TWO, the second observation in data set ONE with the second observation in data set TWO, and so on. In each observation that it compares, the procedure compares the values of the variables IDNUM, NAME, GENDER, and GPA.

The procedure does not report on the values of the last two observations or the variable YEAR in data set TWO because there is nothing to compare them with in data set ONE.
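The positional matching described above can be modeled in a few lines of Python. This is only an illustrative sketch, not SAS code and not how PROC COMPARE is implemented: the function name `compare_by_position` and the list-of-dicts representation are assumptions made for the example. It pairs variables by name and observations by position, then reports what is left without a partner.

```python
# Illustrative model only: a data set is a list of rows, each row a dict.
COLS_ONE = ("IDNUM", "NAME", "GENDER", "GPA")
COLS_TWO = COLS_ONE + ("YEAR",)

ONE = [dict(zip(COLS_ONE, row)) for row in [
    (2998, "Bagwell", "f", 3.722),
    (9866, "Metcalf", "m", 3.342),
    (2118, "Gray", "f", 3.177),
    (3847, "Baglione", "f", 4.000),
    (2342, "Hall", "m", 3.574),
]]

TWO = [dict(zip(COLS_TWO, row)) for row in [
    (2998, "Bagwell", "f", 3.722, 2),
    (9866, "Metcalf", "m", 3.342, 2),
    (2118, "Gray", "f", 3.177, 3),
    (3847, "Baglione", "f", 4.000, 4),
    (2342, "Hall", "m", 3.574, 4),
    (7565, "Gold", "f", 3.609, 2),
    (1755, "Syme", "f", 3.883, 3),
]]


def compare_by_position(base, comp):
    """Pair variables by name and observations by position (hypothetical helper)."""
    common_vars = [v for v in base[0] if v in comp[0]]
    extra_vars = [v for v in comp[0] if v not in base[0]]
    n_compared = min(len(base), len(comp))
    extra_obs = max(len(base), len(comp)) - n_compared
    # Collect (observation number, variable, base value, comparison value).
    diffs = [
        (i + 1, v, base[i][v], comp[i][v])
        for i in range(n_compared)
        for v in common_vars
        if base[i][v] != comp[i][v]
    ]
    return common_vars, extra_vars, n_compared, extra_obs, diffs


common_vars, extra_vars, n_compared, extra_obs, diffs = compare_by_position(ONE, TWO)
print(common_vars)   # variables compared
print(extra_vars)    # variables with no partner in ONE
print(n_compared, extra_obs, diffs)
```

For the two data sets in Figure 9.1, the model compares four variables in five observations, finds no unequal values, and leaves YEAR and the last two observations of TWO unmatched.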

A Comparison with an ID Variable

In a simple comparison, PROC COMPARE uses the observation number to determine which observations to compare. When you use an ID variable, PROC COMPARE uses the values of the ID variable to determine which observations to compare. ID variables should have unique values and must have the same type.

For the two data sets shown in Figure 9.2 on page 240, assume that IDNUM is an ID variable and that IDNUM has the same type in both data sets. The procedure compares the observations that have the same value for IDNUM. The part of the data sets that the procedure compares consists of the variables NAME, GENDER, and GPA in the observations whose IDNUM values occur in both data sets.

Data Set ONE

    IDNUM   NAME       GENDER   GPA
    2998    Bagwell    f        3.722
    9866    Metcalf    m        3.342
    2118    Gray       f        3.177
    3847    Baglione   f        4.000
    2342    Hall       m        3.574

Data Set TWO

    IDNUM   NAME       GENDER   GPA     YEAR
    2998    Bagwell    f        3.722   2
    9866    Metcalf    m        3.342   2
    2118    Gray       f        3.177   3
    3847    Baglione   f        4.000   4
    2342    Hall       m        3.574   4
    7565    Gold       f        3.609   2
    1755    Syme       f        3.883   3


Figure 9.2: Comparison by the Value of the ID Variable

The data sets contain three matching variables: NAME, GENDER, and GPA. They also contain five matching observations: the observations with values of 2998, 9866, 2118, 3847, and 2342 for IDNUM.

Data set TWO contains two observations (IDNUM=7565 and IDNUM=1755) for which data set ONE contains no matching observations. Similarly, no variable in data set ONE matches the variable YEAR in data set TWO.

See Example 5 on page 264 for an example that uses an ID variable.
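Matching by ID value rather than by position can be sketched in Python as follows. Again, this is a hedged illustration and not SAS code: the helper `match_by_id` and the dict-keyed-by-IDNUM layout are assumptions chosen for the example, and keying by IDNUM presumes that the ID values are unique, as recommended above.

```python
# Illustrative model only: each data set is a dict keyed by the ID variable IDNUM.
# ONE rows hold (NAME, GENDER, GPA); TWO rows hold (NAME, GENDER, GPA, YEAR).
ONE = {
    2998: ("Bagwell", "f", 3.722),
    9866: ("Metcalf", "m", 3.342),
    2118: ("Gray", "f", 3.177),
    3847: ("Baglione", "f", 4.000),
    2342: ("Hall", "m", 3.574),
}

TWO = {
    2998: ("Bagwell", "f", 3.722, 2),
    9866: ("Metcalf", "m", 3.342, 2),
    2118: ("Gray", "f", 3.177, 3),
    3847: ("Baglione", "f", 4.000, 4),
    2342: ("Hall", "m", 3.574, 4),
    7565: ("Gold", "f", 3.609, 2),
    1755: ("Syme", "f", 3.883, 3),
}


def match_by_id(base, comp):
    """Split ID values into matched, base-only, and comparison-only (hypothetical helper)."""
    matched = [k for k in base if k in comp]
    base_only = [k for k in base if k not in comp]
    comp_only = [k for k in comp if k not in base]
    return matched, base_only, comp_only


matched, base_only, comp_only = match_by_id(ONE, TWO)

# Values are compared only for matched IDs and only for the shared
# variables NAME, GENDER, and GPA (the first three fields of each row).
unequal_ids = [k for k in matched if ONE[k][:3] != TWO[k][:3]]

print(matched)
print(base_only, comp_only)
print(unequal_ids)
```

The result mirrors the description of Figure 9.2: five IDNUM values match, 7565 and 1755 occur only in TWO, and none of the matched observations differ.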

The Equality Criterion

Using the CRITERION= Option

The COMPARE procedure judges numeric values unequal if the magnitude of their difference, as measured according to the METHOD= option, is greater than the value of the CRITERION= option. PROC COMPARE provides four methods for applying CRITERION=:

  • The EXACT method tests for exact equality.

  • The ABSOLUTE method compares the absolute difference to the value specified by CRITERION=.

  • The RELATIVE method compares the absolute relative difference to the value specified by CRITERION=.

  • The PERCENT method compares the absolute percent difference to the value specified by CRITERION=.

For a numeric variable compared, let x be its value in the base data set and let y be its value in the comparison data set. If both x and y are nonmissing, then the values are judged unequal according to the value of METHOD= and the value of CRITERION= (γ) as follows:

  • If METHOD=EXACT, then the values are unequal if y does not equal x.

  • If METHOD=ABSOLUTE, then the values are unequal if

    $|y - x| > \gamma$

  • If METHOD=RELATIVE, then the values are unequal if

    $\frac{|y - x|}{(|x| + |y|)/2 + \delta} > \gamma$

    The values are equal if x = y = 0.

  • If METHOD=PERCENT, then the values are unequal if

    $100 \cdot \frac{|y - x|}{|x|} > \gamma$ for $x \ne 0$

    or

    $y \ne 0$ for $x = 0$

If x or y is missing, then the comparison depends on the NOMISSING option. If the NOMISSING option is in effect, then a missing value will always be judged equal to anything. Otherwise, a missing value is judged equal only to a missing value of the same type (that is, .=., .^=.A, .A=.A, .A^=.B, and so on).
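For nonmissing values, the four tests can be written out as a small Python function. This is a sketch of the rules as stated in this section, not SAS code; the function name `is_unequal`, its parameter names, and its default values are assumptions made for the illustration and do not describe the defaults of PROC COMPARE. Here `criterion` plays the role of γ and `delta` the role of δ.

```python
def is_unequal(x, y, method="EXACT", criterion=0.0, delta=0.0):
    """Judge two nonmissing numbers unequal under the given method (illustrative)."""
    diff = abs(y - x)
    if method == "EXACT":
        return y != x
    if method == "ABSOLUTE":
        return diff > criterion
    if method == "RELATIVE":
        denom = (abs(x) + abs(y)) / 2 + delta
        if denom == 0:
            # Only possible when x = y = 0 and delta = 0: the values are equal.
            return False
        return diff / denom > criterion
    if method == "PERCENT":
        if x == 0:
            return y != 0
        return 100 * diff / abs(x) > criterion
    raise ValueError(f"unknown method: {method}")


x, y = 3.722, 3.725   # base value and comparison value
print(is_unequal(x, y, "EXACT"))                      # True
print(is_unequal(x, y, "ABSOLUTE", criterion=0.01))   # False
print(is_unequal(x, y, "RELATIVE", criterion=0.001))  # False
print(is_unequal(x, y, "PERCENT", criterion=0.05))    # True
```

The same pair of values is judged differently depending on the method: a difference of about 0.003 is within an absolute tolerance of 0.01 but exceeds a tolerance of 0.05 percent of the base value.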

If the value that is specified for CRITERION= is negative, then the actual criterion that is used is made equal to the absolute value of γ times a very small number ε (epsilon) that depends on the numerical precision of the computer. This number ε is defined as the smallest positive floating-point value such that, using machine arithmetic, 1 − ε < 1 < 1 + ε. Round-off or truncation error in floating-point computations is typically a few orders of magnitude larger than ε. This means that CRITERION=-1000 often provides a reasonable test of the equality of computed results at the machine level of precision.
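The effect of a negative criterion can be illustrated in Python. This is a sketch under stated assumptions, not SAS code: the helper `effective_criterion` is hypothetical, and Python's `sys.float_info.epsilon` (the gap between 1.0 and the next larger double) stands in for ε, although it is defined slightly differently from the ε described above. The tolerance is applied here with an absolute-difference test.

```python
import sys


def effective_criterion(gamma, eps=sys.float_info.epsilon):
    """Turn a negative criterion into |gamma| * eps; leave others unchanged (illustrative)."""
    return abs(gamma) * eps if gamma < 0 else gamma


computed = 0.1 + 0.2      # carries a round-off error
expected = 0.3

tolerance = effective_criterion(-1000)
exactly_equal = computed == expected
judged_unequal = abs(computed - expected) > tolerance

print(exactly_equal)      # False: the round-off error defeats an exact test
print(judged_unequal)     # False: the error is far below 1000 * eps
```

The round-off error in the sum is on the order of ε itself, so a tolerance of 1000 times ε absorbs it while still rejecting any difference that matters at ordinary precision.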

The value δ added to the denominator in the RELATIVE method is specified in parentheses after the method name: METHOD=RELATIVE(δ). If not specified in METHOD=, then δ defaults to 0. The value of δ can be used to control the behavior of the error measure when both x and y are very close to 0. If δ is not given and x and y are very close to 0, then any error produces a large relative error (in the limit, 2).

Specifying a value for δ avoids this extreme sensitivity of the RELATIVE method for small values. If you specify METHOD=RELATIVE(δ) CRITERION=γ when both x and y are much smaller than δ in absolute value, then the comparison is as if you had specified METHOD=ABSOLUTE CRITERION=δγ. However, when either x or y is much larger than δ in absolute value, the comparison is like METHOD=RELATIVE CRITERION=γ. For moderate values of x and y, METHOD=RELATIVE(δ) CRITERION=γ is, in effect, a compromise between METHOD=ABSOLUTE CRITERION=δγ and METHOD=RELATIVE CRITERION=γ.
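The two regimes of the RELATIVE method can be checked numerically. The Python sketch below is illustrative only and is not SAS code; it assumes that the relative error measure is the absolute difference divided by the mean of the absolute values plus the value added to the denominator (the `delta` argument), and the function name `relative_error` is hypothetical.

```python
import math


def relative_error(x, y, delta=0.0):
    """Relative error measure with an optional value added to the denominator."""
    # Note: with delta = 0 and x = y = 0 the denominator is zero; that case
    # is treated as "equal" by the comparison rule and is not handled here.
    return abs(y - x) / ((abs(x) + abs(y)) / 2 + delta)


# Near zero with no added value: any error gives the limiting value 2.
tiny_plain = relative_error(0.0, 1e-12)

# Near zero with delta = 1: the measure behaves like |y - x| / delta,
# that is, like an absolute test with criterion delta * criterion.
tiny_damped = relative_error(0.0, 1e-12, delta=1.0)

# Far from zero, delta = 1 hardly changes the plain relative measure.
large_plain = relative_error(1e6, 1.01e6)
large_damped = relative_error(1e6, 1.01e6, delta=1.0)

print(tiny_plain)    # 2.0
print(math.isclose(tiny_damped, 1e-12, rel_tol=1e-9))
print(math.isclose(large_plain, large_damped, rel_tol=1e-5))
```

With a criterion of 0.001, for example, the pair (0, 1e-12) would be judged unequal without the added value but equal with it, while the verdict for the large pair is the same either way.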

For character variables, if one value is longer than the other, then the shorter value is padded with blanks for the comparison. Nonblank character values are judged equal only if they agree at each character. If the NOMISSING option is in effect, then blank character values are judged equal to anything.
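The character rule can be modeled in Python as a short function. This is a hedged sketch rather than SAS code: `char_equal` and its `nomissing` flag are hypothetical names that mirror the padding rule and the NOMISSING behavior described in the paragraph above.

```python
def char_equal(a, b, nomissing=False):
    """Compare two character values after blank-padding the shorter one (illustrative)."""
    width = max(len(a), len(b))
    a, b = a.ljust(width), b.ljust(width)
    if nomissing and (a.strip() == "" or b.strip() == ""):
        # With NOMISSING in effect, a blank value is judged equal to anything.
        return True
    return a == b


print(char_equal("Gray", "Gray   "))             # True: trailing blanks are padding
print(char_equal("Gray", "gray"))                # False: characters must agree
print(char_equal("", "Gray"))                    # False
print(char_equal("", "Gray", nomissing=True))    # True
```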

Definition of Difference and Percent Difference

In the reports of value comparisons and in the OUT= data set, PROC COMPARE displays difference and percent difference values for the numbers compared. These quantities are defined using the value from the base data set as the reference value. For a numeric variable compared, let x be its value in the base data set and let y be its value in the comparison data set. If x and y are both nonmissing, then the difference and percent difference are defined as follows:

Difference = $y - x$

Percent Difference = $\frac{y - x}{x} \times 100$ for $x \ne 0$

Percent Difference = missing for $x = 0$
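These definitions take the base value x as the reference, so the sign of each quantity shows whether the comparison value is larger or smaller than the base value. A minimal Python sketch follows; it is not SAS code, the function names are hypothetical, and `None` stands in for a missing value.

```python
def difference(x, y):
    """Difference of the comparison value y from the base value x."""
    return y - x


def percent_difference(x, y):
    """Percent difference relative to the base value x; None models 'missing'."""
    if x == 0:
        return None
    return (y - x) / x * 100


print(difference(4.0, 5.0), percent_difference(4.0, 5.0))   # 1.0 25.0
print(difference(4.0, 3.0), percent_difference(4.0, 3.0))   # -1.0 -25.0
print(difference(0.0, 5.0), percent_difference(0.0, 5.0))   # 5.0 None
```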

How PROC COMPARE Handles Variable Formats

PROC COMPARE compares unformatted values. If you have two matching variables that are formatted differently, then PROC COMPARE lists the formats of the variables.




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
Year: 2004
Pages: 260
