Syntax: COMPARE Procedure


Restriction: You must use the VAR statement when you use the WITH statement.

Tip: Supports the Output Delivery System. See Output Delivery System on page 32 for details.

ODS Table Names: See ODS Table Names on page 251

Reminder: You can use the LABEL, ATTRIB, FORMAT, and WHERE statements. See Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for details. You can also use any global statements. See Global Statements on page 18 for a list.

PROC COMPARE <option(s)>;

    BY <DESCENDING> variable-1
       <... <DESCENDING> variable-n>
       <NOTSORTED>;

    ID <DESCENDING> variable-1
       <... <DESCENDING> variable-n>
       <NOTSORTED>;

    VAR variable(s);

    WITH variable(s);

To do this | Use this statement
Produce a separate comparison for each BY group | BY
Identify variables to use to match observations | ID
Restrict the comparison to values of specific variables | VAR
Compare variables of different names | WITH and VAR
Compare two variables in the same data set | WITH and VAR
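
As a minimal sketch of how these statements fit together, the step below compares two small data sets on one variable and matches observations with an ID variable. The data set names (work.old, work.new) and the variable names (student, score) are hypothetical and chosen only for illustration.

```sas
/* Hypothetical base data set (names are illustrative only) */
data work.old;
   input student $ score;
   datalines;
Ann 90
Bob 85
Cy 70
;
run;

/* Hypothetical comparison data set; Bob's score differs */
data work.new;
   input student $ score;
   datalines;
Ann 90
Bob 88
Cy 70
;
run;

/* Match observations on STUDENT and compare only SCORE */
proc compare base=work.old compare=work.new;
   id student;
   var score;
run;
```

Both data sets are already in ascending order of student, so this sketch does not need a PROC SORT step before the comparison.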

PROC COMPARE Statement

Restriction: If you omit COMPARE=, then you must use the WITH and VAR statements.

Restriction: PROC COMPARE reports errors differently if one or both of the compared data sets are not RADIX addressable. Version 6 compressed files are not RADIX addressable, while, beginning with Version 7, compressed files are RADIX addressable. (The integrity of the data is not compromised; the procedure simply numbers the observations differently.)

Reminder: You can use data set options with the BASE= and COMPARE= options.

PROC COMPARE < option(s) >;

To do this | Use this option

Specify the data sets to compare
  Specify the base data set | BASE=
  Specify the comparison data set | COMPARE=

Control the output data set
  Create an output data set | OUT=
  Write an observation for each observation in the BASE= and COMPARE= data sets | OUTALL
  Write an observation for each observation in the BASE= data set | OUTBASE
  Write an observation for each observation in the COMPARE= data set | OUTCOMP
  Write an observation that contains the differences for each pair of matching observations | OUTDIF
  Suppress the writing of observations when all values are equal | OUTNOEQUAL
  Write an observation that contains the percent differences for each pair of matching observations | OUTPERCENT

Create an output data set that contains summary statistics | OUTSTATS=

Specify how the values are compared
  Specify the criterion for judging the equality of numeric values | CRITERION=
  Specify the method for judging the equality of numeric values | METHOD=
  Judge missing values equal to any value | NOMISSBASE and NOMISSCOMP

Control the details in the default report
  Include the values for all matching observations | ALLOBS
  Print a table of summary statistics for all pairs of matching variables | ALLSTATS and STATS
  Include in the report the values and differences for all matching variables | ALLVARS
  Print only a short comparison summary | BRIEFSUMMARY
  Change the report for numbers between 0 and 1 | FUZZ=
  Restrict the number of differences to print | MAXPRINT=
  Suppress the printing of creation and last-modified dates | NODATE
  Suppress all printed output | NOPRINT
  Suppress the summary reports | NOSUMMARY
  Suppress the value comparison results | NOVALUES
  Produce a complete listing of values and differences | PRINTALL
  Print the value differences by observation, not by variable | TRANSPOSE

Control the listing of variables and observations
  List all variables and observations found in only one data set | LISTALL
  List all variables and observations found only in the base data set | LISTBASE
  List all observations found only in the base data set | LISTBASEOBS
  List all variables found only in the base data set | LISTBASEVAR
  List all variables and observations found only in the comparison data set | LISTCOMP
  List all observations found only in the comparison data set | LISTCOMPOBS
  List all variables found only in the comparison data set | LISTCOMPVAR
  List variables whose values are judged equal | LISTEQUALVAR
  List all observations found in only one data set | LISTOBS
  List all variables found in only one data set | LISTVAR

Options

ALLOBS

  • includes in the report of value comparison results the values and, for numeric variables, the differences for all matching observations, even if they are judged equal.

  • Default: If you omit ALLOBS, then PROC COMPARE prints values only for observations that are judged unequal.

  • Interaction: When used with the TRANSPOSE option, ALLOBS invokes the ALLVARS option and displays the values for all matching observations and variables.

ALLSTATS

  • prints a table of summary statistics for all pairs of matching variables.

  • See also: Table of Summary Statistics on page 248 for information on the statistics produced

ALLVARS

  • includes in the report of value comparison results the values and, for numeric variables, the differences for all pairs of matching variables, even if they are judged equal.

  • Default: If you omit ALLVARS, then PROC COMPARE prints values only for variables that are judged unequal.

  • Interaction: When used with the TRANSPOSE option, ALLVARS displays unequal values in context with the values for other matching variables. If you omit the TRANSPOSE option, then ALLVARS invokes the ALLOBS option and displays the values for all matching observations and variables.

BASE= SAS-data-set

  • specifies the data set to use as the base data set.

  • Alias: DATA=

  • Default: the most recently created SAS data set

  • Tip: You can use the WHERE= data set option with the BASE= option to limit the observations that are available for comparison.

BRIEFSUMMARY

  • produces a short comparison summary and suppresses the four default summary reports (data set summary report, variables summary report, observation summary report, and values comparison summary report).

  • Alias: BRIEF

  • Tip: By default, a listing of value differences accompanies the summary reports. To suppress this listing, use the NOVALUES option.

  • Featured in: Example 4 on page 262

COMPARE= SAS-data-set

  • specifies the data set to use as the comparison data set.

  • Aliases: COMP=, C=

  • Default: If you omit COMPARE=, then the comparison data set is the same as the base data set, and PROC COMPARE compares variables within the data set.

  • Restriction: If you omit COMPARE=, then you must use the WITH statement.

  • Tip: You can use the WHERE= data set option with COMPARE= to limit the observations that are available for comparison.

CRITERION=γ

  • specifies the criterion for judging the equality of numeric values. Normally, the value of γ (gamma) is positive, in which case the number itself becomes the equality criterion. If you use a negative value for γ, then PROC COMPARE uses an equality criterion proportional to the precision of the computer on which SAS is running.

  • Default: 0.00001

  • See also: The Equality Criterion on page 240 for more information

ERROR

  • displays an error message in the SAS log when differences are found.

  • Interaction: This option overrides the WARNING option.

FUZZ= number

  • alters the values comparison results for numbers less than number. PROC COMPARE prints

    • 0 for any variable value that is less than number

    • a blank for difference or percent difference if it is less than number

    • 0 for any summary statistic that is less than number.

  • Default: 0

  • Range: 0 - 1

  • Tip: A report that contains many trivial differences is easier to read in this form.

LISTALL

  • lists all variables and observations that are found in only one data set.

  • Alias: LIST

  • Interaction: Using LISTALL is equivalent to using the following four options: LISTBASEOBS, LISTCOMPOBS, LISTBASEVAR, and LISTCOMPVAR.

LISTBASE

  • lists all observations and variables that are found in the base data set but not in the comparison data set.

  • Interaction: Using LISTBASE is equivalent to using the LISTBASEOBS and LISTBASEVAR options.

LISTBASEOBS

  • lists all observations that are found in the base data set but not in the comparison data set.

LISTBASEVAR

  • lists all variables that are found in the base data set but not in the comparison data set.

LISTCOMP

  • lists all observations and variables that are found in the comparison data set but not in the base data set.

  • Interaction: Using LISTCOMP is equivalent to using the LISTCOMPOBS and LISTCOMPVAR options.

LISTCOMPOBS

  • lists all observations that are found in the comparison data set but not in the base data set.

LISTCOMPVAR

  • lists all variables that are found in the comparison data set but not in the base data set.

LISTEQUALVAR

  • prints a list of variables whose values are judged equal at all observations in addition to the default list of variables whose values are judged unequal.

LISTOBS

  • lists all observations that are found in only one data set.

  • Interaction: Using LISTOBS is equivalent to using the LISTBASEOBS and LISTCOMPOBS options.

LISTVAR

  • lists all variables that are found in only one data set.

  • Interaction: Using LISTVAR is equivalent to using both the LISTBASEVAR and LISTCOMPVAR options.

MAXPRINT=total | (per-variable, total)

  • specifies the maximum number of differences to print, where

  • total

    • is the maximum total number of differences to print. The default value is 500 unless you use the ALLOBS option (or both the ALLVARS and TRANSPOSE options), in which case the default is 32000.

  • per-variable

    • is the maximum number of differences to print for each variable within a BY group. The default value is 50 unless you use the ALLOBS option (or both the ALLVARS and TRANSPOSE options), in which case the default is 1000. The MAXPRINT= option prevents the output from becoming extremely large when data sets differ greatly.

METHOD=ABSOLUTE | EXACT | PERCENT | RELATIVE<(δ)>

  • specifies the method for judging the equality of numeric values. The constant δ (delta) is a number between 0 and 1 that specifies a value to add to the denominator when calculating the equality measure. By default, δ is 0.

    Unless you use the CRITERION= option, the default method is EXACT. If you use the CRITERION= option, then the default method is RELATIVE(φ), where φ (phi) is a small number that depends on the numerical precision of the computer on which SAS is running and on the value of CRITERION=.

  • See also: The Equality Criterion on page 240
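
The following sketch shows METHOD= and CRITERION= used together. It assumes that METHOD=ABSOLUTE judges two numeric values equal when their absolute difference does not exceed the CRITERION= value; the data sets (work.base_m, work.comp_m) and variables (item, weight) are hypothetical.

```sas
/* Hypothetical base measurements */
data work.base_m;
   input item $ weight;
   datalines;
A 10.000
B 20.000
;
run;

/* Hypothetical comparison measurements with small and large deviations */
data work.comp_m;
   input item $ weight;
   datalines;
A 10.004
B 20.200
;
run;

/* Assumed behavior: values within 0.01 of each other are judged equal */
proc compare base=work.base_m compare=work.comp_m
             method=absolute criterion=0.01;
   id item;
   var weight;
run;
```

Under that assumption, the 0.004 difference for item A should be judged equal and the 0.2 difference for item B unequal. See The Equality Criterion for the exact definitions of each method.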

NODATE

  • suppresses the display in the data set summary report of the creation dates and the last modified dates of the base and comparison data sets.

NOMISSBASE

  • judges a missing value in the base data set equal to any value. (By default, a missing value is equal only to a missing value of the same kind, that is, .=., .^=.A, .A=.A, .A^=.B, and so on.)

    You can use this option to determine the changes that would be made to the observations in the comparison data set if it were used as the master data set and the base data set were used as the transaction data set in a DATA step UPDATE statement. For information on the UPDATE statement, see the chapter on SAS language statements in SAS Language Reference: Dictionary.

NOMISSCOMP

  • judges a missing value in the comparison data set equal to any value. (By default, a missing value is equal only to a missing value of the same kind, that is, .=., .^=.A, .A=.A, .A^=.B, and so on.)

    You can use this option to determine the changes that would be made to the observations in the base data set if it were used as the master data set and the comparison data set were used as the transaction data set in a DATA step UPDATE statement. For information on the UPDATE statement, see the chapter on SAS language statements in SAS Language Reference: Dictionary.

NOMISSING

  • judges missing values in both the base and comparison data sets equal to any value. (By default, a missing value is equal only to a missing value of the same kind, that is, .=., .^=.A, .A=.A, .A^=.B, and so on.)

  • Alias: NOMISS

  • Interaction: Using NOMISSING is equivalent to using both NOMISSBASE and NOMISSCOMP.
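
As a hedged illustration of the missing-value options, the sketch below treats the base data set as a master file and the comparison data set as a transaction file, as described for NOMISSCOMP. The data set names (work.master, work.trans) and variables (acct, amount) are hypothetical.

```sas
/* Hypothetical master data set */
data work.master;
   input acct amount;
   datalines;
1 100
2 200
3 300
;
run;

/* Hypothetical transaction data set; account 2 has a missing amount */
data work.trans;
   input acct amount;
   datalines;
1 100
2 .
3 350
;
run;

/* NOMISSCOMP: a missing value in the comparison data set is judged
   equal to any value, so only real changes are flagged as unequal */
proc compare base=work.master compare=work.trans nomisscomp;
   id acct;
   var amount;
run;
```

With NOMISSCOMP in effect, the missing amount for account 2 is judged equal to the base value, which leaves account 3 as the only pair with differing values.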

NOPRINT

  • suppresses all printed output.

  • Tip: You may want to use this option when you are creating one or more output data sets.

  • Featured in: Example 6 on page 268

NOSUMMARY

  • suppresses the data set, variable, observation, and values comparison summary reports.

  • Tip: NOSUMMARY produces no output if there are no differences in the matching values.

  • Featured in: Example 2 on page 259

NOTE

  • displays notes in the SAS log that describe the results of the comparison, whether or not differences were found.

NOVALUES

  • suppresses the report of the value comparison results.

  • Featured in: Overview: COMPARE Procedure on page 224

OUT= SAS-data-set

  • names the output data set. If SAS-data-set does not exist, then PROC COMPARE creates it. SAS-data-set contains the differences between matching variables.

  • See also: Output Data Set (OUT=) on page 252

  • Featured in: Example 6 on page 268

OUTALL

  • writes an observation to the output data set for each observation in the base data set and for each observation in the comparison data set. The option also writes observations to the output data set that contain the differences and percent differences between the values in matching observations.

  • Tip: Using OUTALL is equivalent to using the following four options: OUTBASE, OUTCOMP, OUTDIF, and OUTPERCENT.

  • See also: Output Data Set (OUT=) on page 252

OUTBASE

  • writes an observation to the output data set for each observation in the base data set, creating observations in which _TYPE_=BASE.

  • See also: Output Data Set (OUT=) on page 252

  • Featured in: Example 6 on page 268

OUTCOMP

  • writes an observation to the output data set for each observation in the comparison data set, creating observations in which _TYPE_=COMP.

  • See also: Output Data Set (OUT=) on page 252

  • Featured in: Example 6 on page 268

OUTDIF

  • writes an observation to the output data set for each pair of matching observations. The values in the observation include values for the differences between the values in the pair of observations. The value of _TYPE_ in each observation is DIF.

  • Default: The OUTDIF option is the default unless you specify the OUTBASE, OUTCOMP, or OUTPERCENT option. If you use any of these options, then you must explicitly specify the OUTDIF option to create _TYPE_=DIF observations in the output data set.

  • See also: Output Data Set (OUT=) on page 252

  • Featured in: Example 6 on page 268

OUTNOEQUAL

  • suppresses the writing of an observation to the output data set when all values in the observation are judged equal. In addition, in observations containing values for some variables judged equal and others judged unequal, the OUTNOEQUAL option uses the special missing value ".E" to represent differences and percent differences for variables judged equal.

  • See also: Output Data Set (OUT=) on page 252

  • Featured in: Example 6 on page 268

OUTPERCENT

  • writes an observation to the output data set for each pair of matching observations. The values in the observation include values for the percent differences between the values in the pair of observations. The value of _TYPE_ in each observation is PERCENT.

  • See also: Output Data Set (OUT=) on page 252

OUTSTATS= SAS-data-set

  • writes summary statistics for all pairs of matching variables to the specified SAS-data-set.

  • Tip: If you want to print a table of statistics in the procedure output, then use the STATS, ALLSTATS, or PRINTALL option.

  • See also: Output Statistics Data Set (OUTSTATS=) on page 253 and Table of Summary Statistics on page 248.

  • Featured in: Example 7 on page 271
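
The output options above can be combined in one step. The sketch below is illustrative only: the data set names (work.old_o, work.new_o, work.diffs, work.stats) and variables (student, score) are hypothetical. Because OUTBASE and OUTCOMP are specified, OUTDIF is listed explicitly so that _TYPE_=DIF observations are still written.

```sas
/* Hypothetical base data set */
data work.old_o;
   input student $ score;
   datalines;
Ann 90
Bob 85
Cy 70
;
run;

/* Hypothetical comparison data set; Bob's score differs */
data work.new_o;
   input student $ score;
   datalines;
Ann 90
Bob 88
Cy 70
;
run;

/* Write base, comparison, and difference observations to WORK.DIFFS,
   skip observations whose values are all judged equal, write summary
   statistics to WORK.STATS, and suppress the printed report */
proc compare base=work.old_o compare=work.new_o
             out=work.diffs outbase outcomp outdif outnoequal
             outstats=work.stats noprint;
   id student;
run;

/* Inspect the two output data sets */
proc print data=work.diffs;
run;

proc print data=work.stats;
run;
```

NOPRINT is used here because the results are read from the output data sets rather than from the default report.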

PRINTALL

  • invokes the following options: ALLVARS, ALLOBS, ALLSTATS, LISTALL, and WARNING.

  • Featured in: Example 1 on page 255

STATS

  • prints a table of summary statistics for all pairs of matching numeric variables that are judged unequal.

  • See also: Table of Summary Statistics on page 248 for information on the statistics produced.

TRANSPOSE

  • prints the reports of value differences by observation instead of by variable.

  • Interaction: If you also use the NOVALUES option, then the TRANSPOSE option lists only the names of the variables whose values are judged unequal for each observation, not the values and differences.

  • See also: Comparison Results for Observations (Using the TRANSPOSE Option) on page 251.
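
A short sketch of the TRANSPOSE option follows; the data sets (work.base_t, work.comp_t) and variables (student, test1, test2) are hypothetical. Both tests differ for one student, so the report of value differences is organized around that observation instead of around each variable.

```sas
/* Hypothetical base data set */
data work.base_t;
   input student $ test1 test2;
   datalines;
Ann 90 92
Bob 85 80
;
run;

/* Hypothetical comparison data set; both of Bob's values differ */
data work.comp_t;
   input student $ test1 test2;
   datalines;
Ann 90 92
Bob 88 75
;
run;

/* Print value differences by observation instead of by variable */
proc compare base=work.base_t compare=work.comp_t transpose;
   id student;
run;
```

Adding NOVALUES to the same statement would, per the interaction described above, list only the names of the unequal variables for each observation.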

WARNING

  • displays a warning message in the SAS log when differences are found.

  • Interaction: The ERROR option overrides the WARNING option.

BY Statement

Produces a separate comparison for each BY group.

Main discussion: BY on page 58

BY <DESCENDING> variable-1
   <... <DESCENDING> variable-n>
   <NOTSORTED>;

Required Arguments

variable

  • specifies the variable that the procedure uses to form BY groups. You can specify more than one variable. If you do not use the NOTSORTED option in the BY statement, then the observations in the data set must be sorted by all the variables that you specify. Variables in a BY statement are called BY variables.

Options

DESCENDING

  • specifies that the observations are sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.

NOTSORTED

  • specifies that observations are not necessarily sorted in alphabetic or numeric order. The observations are grouped in another way, for example, chronological order.

    The requirement for ordering observations according to the values of BY variables is suspended for BY-group processing when you use the NOTSORTED option. The procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same values for the BY variables are not contiguous, then the procedure treats each contiguous set as a separate BY group.

BY Processing with PROC COMPARE

To use a BY statement with PROC COMPARE, you must sort both the base and comparison data sets by the BY variables. The nature of the comparison depends on whether all BY variables are in the comparison data set and, if they are, whether their attributes match those of the BY variables in the base data set. The following table shows how PROC COMPARE behaves under different circumstances:

Condition | Behavior of PROC COMPARE
All BY variables are in the comparison data set and all attributes match exactly | Compares corresponding BY groups
None of the BY variables are in the comparison data set | Compares each BY group in the base data set with the entire comparison data set
Some BY variables are not in the comparison data set | Writes an error message to the SAS log and terminates
Some BY variables have different types in the two data sets | Writes an error message to the SAS log and terminates
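
The first case in the table (all BY variables present with matching attributes) can be sketched as follows. The data set names (work.base_sales, work.comp_sales) and variables (region, month, sales) are hypothetical; both data sets are sorted by the BY variable, and by the ID variable within it, before the comparison.

```sas
/* Hypothetical base data set */
data work.base_sales;
   input region $ month sales;
   datalines;
West 1 200
West 2 210
East 1 100
East 2 110
;
run;

/* Hypothetical comparison data set; one West value differs */
data work.comp_sales;
   input region $ month sales;
   datalines;
West 1 200
West 2 215
East 1 100
East 2 110
;
run;

/* Sort both data sets by the BY variable, then by the ID variable */
proc sort data=work.base_sales;
   by region month;
run;

proc sort data=work.comp_sales;
   by region month;
run;

/* One comparison per REGION, matching observations on MONTH */
proc compare base=work.base_sales compare=work.comp_sales;
   by region;
   id month;
run;
```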

ID Statement

Lists variables to use to match observations.

See also: A Comparison with an ID Variable on page 239

Featured in: Example 5 on page 264

ID <DESCENDING> variable-1
   <... <DESCENDING> variable-n>
   <NOTSORTED>;

Required Arguments

variable

  • specifies the variable that the procedure uses to match observations. You can specify more than one variable, but the data set must be sorted by the variable or variables you specify. These variables are ID variables. ID variables also identify observations on the printed reports and in the output data set.

Options

DESCENDING

  • specifies that the data set is sorted in descending order by the variable that immediately follows the word DESCENDING in the ID statement.

    If you use the DESCENDING option, then you must sort the data sets. SAS does not use an index to process an ID statement with the DESCENDING option. Further, the use of DESCENDING for ID variables must correspond to the use of the DESCENDING option in the BY statement in the PROC SORT step that was used to sort the data sets.

NOTSORTED

  • specifies that observations are not necessarily sorted in alphabetic or numeric order. The data are grouped in another way, for example, chronological order.

  • See also: Comparing Unsorted Data on page 236

Requirements for ID Variables

  • ID variables must be in the BASE= data set or PROC COMPARE stops processing.

  • If an ID variable is not in the COMPARE= data set, then PROC COMPARE writes a warning message to the SAS log and does not use that variable to match observations in the comparison data set (but does write it to the OUT= data set).

  • ID variables must be of the same type in both data sets.

  • You should sort both data sets by the common ID variables (within the BY variables, if any) unless you specify the NOTSORTED option.

Comparing Unsorted Data

If you do not want to sort the data set by the ID variables, then you can use the NOTSORTED option. When you specify the NOTSORTED option, or if the ID statement is omitted, PROC COMPARE matches the observations one-to-one. That is, PROC COMPARE matches the first observation in the base data set with the first observation in the comparison data set, the second with the second, and so on. If you use NOTSORTED, and the ID values of corresponding observations are not the same, then PROC COMPARE prints an error message and stops processing.

If the data sets are not sorted by the common ID variables and if you do not specify the NOTSORTED option, then PROC COMPARE writes a warning message to the SAS log and continues to process the data sets as if you had specified NOTSORTED.
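
A minimal sketch of the NOTSORTED case follows. The data sets (work.base_log, work.comp_log) and variables (event, value) are hypothetical; the observations are in chronological rather than alphabetic order, and the ID values agree position by position, which the one-to-one matching described above requires.

```sas
/* Hypothetical base data set in chronological order */
data work.base_log;
   input event $ value;
   datalines;
start 1
load 5
stop 9
;
run;

/* Hypothetical comparison data set in the same order; one value differs */
data work.comp_log;
   input event $ value;
   datalines;
start 1
load 6
stop 9
;
run;

/* Match the first observation with the first, the second with the
   second, and so on; EVENT still labels the observations in the report */
proc compare base=work.base_log compare=work.comp_log;
   id event notsorted;
run;
```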

Avoiding Duplicate ID Values

The observations in each data set should be uniquely labeled by the values of the ID variables. If PROC COMPARE finds two successive observations with the same ID values in a data set, then it

  • prints the warning Duplicate Observations for the first occurrence for that data set

  • prints the total number of duplicate observations found in the data set in the observation summary report

  • uses the first observation with the duplicate value for the comparison.

When the data sets are not sorted, PROC COMPARE detects only those duplicate observations that occur in succession.

VAR Statement

Restricts the comparison of the values of variables to those named in the VAR statement.

Featured in: Example 2 on page 259, Example 3 on page 261, and Example 4 on page 262

VAR variable(s) ;

Required Arguments

variable(s)

  • one or more variables that appear in the BASE= and COMPARE= data sets or only in the BASE= data set.

Details

  • If you do not use the VAR statement, then PROC COMPARE compares the values of all matching variables except those appearing in BY and ID statements.

  • If a variable in the VAR statement does not exist in the COMPARE= data set, then PROC COMPARE writes a warning message to the SAS log and ignores the variable.

  • If a variable in the VAR statement does not exist in the BASE= data set, then PROC COMPARE stops processing and writes an error message to the SAS log.

  • The VAR statement restricts only the comparison of values of matching variables. PROC COMPARE still reports on the total number of matching variables and compares their attributes. However, it produces neither error nor warning messages about these variables.

WITH Statement

Compares variables in the base data set with variables that have different names in the comparison data set, and compares different variables that are in the same data set.

Restriction: You must use the VAR statement when you use the WITH statement.

Featured in: Example 2 on page 259, Example 3 on page 261, and Example 4 on page 262

WITH variable(s) ;

Required Arguments

variable(s)

  • one or more variables to compare with variables in the VAR statement.

Comparing Selected Variables

If you want to compare variables in the base data set with variables that have different names in the comparison data set, then specify the names of the variables in the base data set in the VAR statement and specify the names of the matching variables in the WITH statement. The first variable that you list in the WITH statement corresponds to the first variable that you list in the VAR statement, the second with the second, and so on. If the WITH statement list is shorter than the VAR statement list, then PROC COMPARE assumes that the extra variables in the VAR statement have the same names in the comparison data set as they do in the base data set. If the WITH statement list is longer than the VAR statement list, then PROC COMPARE ignores the extra variables.

A variable name can appear any number of times in the VAR statement or the WITH statement. By selecting VAR and WITH statement lists, you can compare the variables in any permutation.

If you omit the COMPARE= option in the PROC COMPARE statement, then you must use the WITH statement. In this case, PROC COMPARE compares the values of variables with different names in the BASE= data set.
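
Both uses of the WITH statement can be sketched as follows. All data set names (work.term1, work.term2, work.tests) and variable names (student, score, result, test1, test2) are hypothetical. The first step pairs a base variable with a differently named comparison variable; the second omits COMPARE= and compares two variables within one data set.

```sas
/* Hypothetical base data set with variable SCORE */
data work.term1;
   input student $ score;
   datalines;
Ann 90
Bob 85
;
run;

/* Hypothetical comparison data set; the same measure is named RESULT */
data work.term2;
   input student $ result;
   datalines;
Ann 90
Bob 88
;
run;

/* Compare SCORE in the base data set with RESULT in the comparison data set */
proc compare base=work.term1 compare=work.term2;
   id student;
   var score;
   with result;
run;

/* Hypothetical data set that holds two variables to compare with each other */
data work.tests;
   input student $ test1 test2;
   datalines;
Ann 90 92
Bob 85 85
;
run;

/* COMPARE= is omitted, so TEST1 is compared with TEST2 in WORK.TESTS */
proc compare base=work.tests;
   var test1;
   with test2;
run;
```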




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
Year: 2004
Pages: 260