Procedure features:
PROC COMPARE statement options
BASE=
PRINTALL
COMPARE=
Data sets:
PROCLIB.ONE, PROCLIB.TWO on page 224
This example shows the most complete report that PROC COMPARE produces as procedure output.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a complete report of the differences between two data sets. BASE= and COMPARE= specify the data sets to compare. PRINTALL prints a full report of the differences.
proc compare base=proclib.one compare=proclib.two printall;
   title 'Comparing Two Data Sets: Full Report';
run;
A > in the output marks information that is in the full report but not in the default report. The additional information includes a listing of variables found in one data set but not the other, a listing of observations found in one data set but not the other, a listing of variables with all equal values, and summary statistics. For an explanation of the statistics, see Table of Summary Statistics on page 248.
Comparing Two Data Sets: Full Report                                   1

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Data Set Summary

  Dataset      Created           Modified          NVar  NObs  Label

  PROCLIB.ONE  11SEP97:16:19:59  11SEP97:16:20:01     5     4  First Data Set
  PROCLIB.TWO  11SEP97:16:20:01  11SEP97:16:20:01     6     5  Second Data Set

Variables Summary

  Number of Variables in Common: 5.
  Number of Variables in PROCLIB.TWO but not in PROCLIB.ONE: 1.
  Number of Variables with Conflicting Types: 1.
  Number of Variables with Differing Attributes: 3.

Listing of Variables in PROCLIB.TWO but not in PROCLIB.ONE

  Variable  Type  Length

> major     Char       8

Listing of Common Variables with Conflicting Types

  Variable  Dataset      Type  Length

  student   PROCLIB.ONE  Num        8
            PROCLIB.TWO  Char       8
Comparing Two Data Sets: Full Report                                   2

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Listing of Common Variables with Differing Attributes

  Variable  Dataset      Type  Length  Format  Label

  year      PROCLIB.ONE  Char       8          Year of Birth
            PROCLIB.TWO  Char       8
  state     PROCLIB.ONE  Char       8
            PROCLIB.TWO  Char       8          Home State
  gr1       PROCLIB.ONE  Num        8  4.1
            PROCLIB.TWO  Num        8  5.2

Comparison Results for Observations

> Observation 5 in PROCLIB.TWO not found in PROCLIB.ONE.

Observation Summary

  Observation      Base  Compare

  First Obs           1        1
  First Unequal       1        1
  Last Unequal        4        4
  Last Match          4        4
  Last Obs            .        5

  Number of Observations in Common: 4.
  Number of Observations in PROCLIB.TWO but not in PROCLIB.ONE: 1.
  Total Number of Observations Read from PROCLIB.ONE: 4.
  Total Number of Observations Read from PROCLIB.TWO: 5.

  Number of Observations with Some Compared Variables Unequal: 4.
  Number of Observations with All Compared Variables Equal: 0.
Comparing Two Data Sets: Full Report                                   3

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Values Comparison Summary

  Number of Variables Compared with All Observations Equal: 1.
  Number of Variables Compared with Some Observations Unequal: 3.
  Total Number of Values which Compare Unequal: 6.
  Maximum Difference: 20.

Variables with All Equal Values

> Variable  Type  Len  Label

  year      CHAR    8  Year of Birth

Variables with Unequal Values

  Variable  Type  Len  Compare Label  Ndif   MaxDif

  state     CHAR    8  Home State        2
  gr1       NUM     8                    2    1.000
  gr2       NUM     8                    2   20.000
Comparing Two Data Sets: Full Report                                   4

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Value Comparison Results for Variables

__________________________________________________________
          Year of Birth
          Base Value     Compare Value
     Obs  year           year
________  ________       ________

       1  1970           1970
       2  1971           1971
       3  1969           1969
       4  1970           1970
__________________________________________________________

__________________________________________________________
                         Home State
          Base Value     Compare Value
     Obs  state          state
________  ________       ________

       1  NC             NC
       2  MD             MA
       3  PA             PA
       4  MA             MD
__________________________________________________________
Comparing Two Data Sets: Full Report                                   5

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Value Comparison Results for Variables

__________________________________________________________
                 Base    Compare
       Obs        gr1        gr1      Diff.     % Diff
  ________  _________  _________  _________  _________

         1       85.0      84.00    -1.0000    -1.1765
         2       92.0      92.00          0          0
         3       78.0      79.00     1.0000     1.2821
         4       87.0      87.00          0          0
  ________  _________  _________  _________  _________

> N                 4          4          4          4
  Mean        85.5000    85.5000          0     0.0264
  Std          5.8023     5.4467     0.8165     1.0042
  Max         92.0000    92.0000     1.0000     1.2821
  Min         78.0000    79.0000    -1.0000    -1.1765
  StdErr       2.9011     2.7234     0.4082     0.5021
  t           29.4711    31.3951     0.0000     0.0526
  Prob>t       <.0001     <.0001     1.0000     0.9614

  Ndif              2    50.000%
  DifMeans     0.000%     0.000%          0
  r, rsq        0.991      0.983
__________________________________________________________
Comparing Two Data Sets: Full Report                                   6

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

Value Comparison Results for Variables

__________________________________________________________
                 Base    Compare
       Obs        gr2        gr2      Diff.     % Diff
  ________  _________  _________  _________  _________

         1    87.0000    87.0000          0          0
         2    92.0000    92.0000          0          0
         3    72.0000    73.0000     1.0000     1.3889
         4    94.0000    74.0000   -20.0000   -21.2766
  ________  _________  _________  _________  _________

> N                 4          4          4          4
  Mean        86.2500    81.5000    -4.7500    -4.9719
  Std          9.9457     9.4692    10.1776    10.8895
  Max         94.0000    92.0000     1.0000     1.3889
  Min         72.0000    73.0000   -20.0000   -21.2766
  StdErr       4.9728     4.7346     5.0888     5.4447
  t           17.3442    17.2136    -0.9334    -0.9132
  Prob>t       0.0004     0.0004     0.4195     0.4285

  Ndif              2    50.000%
  DifMeans    -5.507%    -5.828%    -4.7500
  r, rsq        0.451      0.204
__________________________________________________________
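The Diff. and % Diff columns in the listings above are consistent with Diff. = compare value minus base value, and % Diff = Diff. expressed as a percentage of the base value. The following DATA step is a minimal sketch under that assumption. It is not part of the original example, and the variable names VARNAME, OBS, BASE, COMPARE, DIF, and PCTDIF are illustrative only. The three input rows are copied from the GR1 and GR2 listings.

```sas
/* Recompute Diff. and % Diff for three base/compare pairs taken   */
/* from the listings. The formulas are inferred from the printed   */
/* values, not quoted from the procedure documentation.            */
data _null_;
   input varname $ obs base compare;
   dif    = compare - base;       /* Diff.  = compare - base        */
   pctdif = 100 * dif / base;     /* % Diff = Diff. as % of base    */
   put varname= obs= base= compare= dif= pctdif= 8.4;
   datalines;
gr1 1 85 84
gr1 3 78 79
gr2 4 94 74
;
run;
```

Rounded to four decimal places, the computed percent differences are -1.1765, 1.2821, and -21.2766, which agree with the values printed for those observations.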
Procedure features:
PROC COMPARE statement option
NOSUMMARY
VAR statement
WITH statement
Data sets:
PROCLIB.ONE, PROCLIB.TWO on page 224.
This example compares a variable from the base data set with a variable in the comparison data set. All summary reports are suppressed.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE= specifies the base data set and COMPARE= specifies the comparison data set. NOSUMMARY suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with one variable from the comparison data set. The VAR and WITH statements specify the variables to compare. This example compares GR1 from the base data set with GR2 from the comparison data set.
   var gr1;
   with gr2;
   title 'Comparison of Variables in Different Data Sets';
run;
Comparison of Variables in Different Data Sets                         1

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

NOTE: Data set PROCLIB.TWO contains 1 observations not in PROCLIB.ONE.

NOTE: Values of the following 1 variables compare unequal: gr1^=gr2

Value Comparison Results for Variables

__________________________________________________________
               Base    Compare
     Obs        gr1        gr2      Diff.     % Diff
________  _________  _________  _________  _________

       1       85.0    87.0000     2.0000     2.3529
       3       78.0    73.0000    -5.0000    -6.4103
       4       87.0    74.0000   -13.0000   -14.9425
__________________________________________________________
Procedure features:
VAR statement
WITH statement
Data sets:
PROCLIB.ONE, PROCLIB.TWO on page 224.
This example compares one variable from the base data set with two variables in the comparison data set.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Suppress all summary reports of the differences between two data sets. BASE= specifies the base data set and COMPARE= specifies the comparison data set. NOSUMMARY suppresses all summary reports.
proc compare base=proclib.one compare=proclib.two nosummary;
Specify one variable from the base data set to compare with two variables from the comparison data set. The VAR and WITH statements specify the variables to compare. This example compares GR1 from the base data set with GR1 and GR2 from the comparison data set.
   var gr1 gr1;
   with gr1 gr2;
   title 'Comparison of One Variable with Two Variables';
run;
The Value Comparison Results section shows the result of the comparison.
Comparison of One Variable with Two Variables                          1

COMPARE Procedure
Comparison of PROCLIB.ONE with PROCLIB.TWO
(Method=EXACT)

NOTE: Data set PROCLIB.TWO contains 1 observations not in PROCLIB.ONE.

NOTE: Values of the following 2 variables compare unequal: gr1^=gr1 gr1^=gr2

Value Comparison Results for Variables

__________________________________________________________
               Base    Compare
     Obs        gr1        gr1      Diff.     % Diff
________  _________  _________  _________  _________

       1       85.0      84.00    -1.0000    -1.1765
       3       78.0      79.00     1.0000     1.2821
__________________________________________________________

__________________________________________________________
               Base    Compare
     Obs        gr1        gr2      Diff.     % Diff
________  _________  _________  _________  _________

       1       85.0    87.0000     2.0000     2.3529
       3       78.0    73.0000    -5.0000    -6.4103
       4       87.0    74.0000   -13.0000   -14.9425
__________________________________________________________
Procedure features:
PROC COMPARE statement options
ALLSTATS
BRIEFSUMMARY
VAR statement
WITH statement
Data set:
PROCLIB.ONE on page 224.
This example shows that PROC COMPARE can compare two variables that are in the same data set.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create a short summary report of the differences within one data set. ALLSTATS prints summary statistics. BRIEFSUMMARY prints only a short comparison summary.
proc compare base=proclib.one allstats briefsummary;
Specify two variables from the base data set to compare. The VAR and WITH statements specify the variables in the base data set to compare. This example compares GR1 with GR2. Because there is no comparison data set, the variables GR1 and GR2 must be in the base data set.
   var gr1;
   with gr2;
   title 'Comparison of Variables in the Same Data Set';
run;
Comparison of Variables in the Same Data Set                           1

COMPARE Procedure
Comparisons of variables in PROCLIB.ONE
(Method=EXACT)

NOTE: Values of the following 1 variables compare unequal: gr1^=gr2

Value Comparison Results for Variables

__________________________________________________________
               Base    Compare
     Obs        gr1        gr2      Diff.     % Diff
________  _________  _________  _________  _________

       1       85.0    87.0000     2.0000     2.3529
       3       78.0    72.0000    -6.0000    -7.6923
       4       87.0    94.0000     7.0000     8.0460
________  _________  _________  _________  _________

N                 4          4          4          4
Mean        85.5000    86.2500     0.7500     0.6767
Std          5.8023     9.9457     5.3774     6.5221
Max         92.0000    94.0000     7.0000     8.0460
Min         78.0000    72.0000    -6.0000    -7.6923
StdErr       2.9011     4.9728     2.6887     3.2611
t           29.4711    17.3442     0.2789     0.2075
Prob>t       <.0001     0.0004     0.7984     0.8489

Ndif              3    75.000%
DifMeans     0.877%     0.870%     0.7500
r, rsq        0.898      0.807
__________________________________________________________
Procedure features:
ID statement
In this example, PROC COMPARE compares only the observations that have matching values for the ID variable.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Create the PROCLIB.EMP95 and PROCLIB.EMP96 data sets. PROCLIB.EMP95 and PROCLIB.EMP96 contain employee data. IDNUM works well as an ID variable because it has unique values. A DATA step on page 1405 creates PROCLIB.EMP95. A DATA step on page 1406 creates PROCLIB.EMP96.
data proclib.emp95;
   input #1 idnum $4. @6 name $15.
         #2 address $42.
         #3 salary 6.;
   datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
... more data lines ...
3888 Kim Siu
5662 Magnolia Blvd Southeast Cary NC 27513
77558
;

data proclib.emp96;
   input #1 idnum $4. @6 name $15.
         #2 address $42.
         #3 salary 6.;
   datalines;
2388 James Schmidt
100 Apt. C Blount St. SW Raleigh NC 27693
92100
2457 Fred Williams
99 West Lane Garner NC 27509
33190
... more data lines ...
6544 Roger Monday
3004 Crepe Myrtle Court Raleigh NC 27604
47007
;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
   by idnum;
run;

proc sort data=proclib.emp96 out=emp96_byidnum;
   by idnum;
run;
Create a summary report that compares observations with matching values for the ID variable. The ID statement specifies IDNUM as the ID variable.
proc compare base=emp95_byidnum compare=emp96_byidnum;
   id idnum;
   title 'Comparing Observations that Have Matching IDNUMs';
run;
PROC COMPARE identifies specific observations by the value of IDNUM. In the Value Comparison Results for Variables section, PROC COMPARE prints the nonmatching addresses and nonmatching salaries. For salaries, PROC COMPARE computes the numerical difference and the percent difference. Because ADDRESS is a character variable, PROC COMPARE displays only the first 20 characters. For the observations with an IDNUM of 0987, 2776, or 3888, the addresses differ only after the 20th character, so the differences do not appear in the output. The plus sign in the output indicates that the full value is not shown. To see the entire value, create an output data set. See Example 6 on page 268.
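The effect of the 20-character display limit can be checked directly. The following DATA step is a minimal sketch, not part of the original example. It uses the two addresses for IDNUM 0987 as they appear in the output data set of Example 6; the exact spacing inside the stored values may differ from what is typed here, and the variable names SHOWN_EQUAL and FULL_EQUAL are illustrative only.

```sas
/* Compare the first 20 characters and the full values of two      */
/* addresses. The literals are retyped from the printed output,    */
/* so internal spacing is an assumption.                           */
data _null_;
   length base compare $42;
   base    = '2344 Persimmons Branch Apex NC 27505';
   compare = '2344 Persimmons Branch Trail Apex NC 27505';
   shown_equal = (substr(base, 1, 20) = substr(compare, 1, 20));
   full_equal  = (base = compare);
   put shown_equal= full_equal=;
run;
```

SHOWN_EQUAL is 1 because both values begin with `2344 Persimmons Bran`, and FULL_EQUAL is 0 because the values differ later in the string. This is why the report lists the pair as unequal even though the displayed portions look identical.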
Comparing Observations that Have Matching IDNUMs                       1

COMPARE Procedure
Comparison of WORK.EMP95_BYIDNUM with WORK.EMP96_BYIDNUM
(Method=EXACT)

Data Set Summary

  Dataset             Created           Modified          NVar  NObs

  WORK.EMP95_BYIDNUM  13MAY98:16:03:36  13MAY98:16:03:36     4    10
  WORK.EMP96_BYIDNUM  13MAY98:16:03:36  13MAY98:16:03:36     4    12

Variables Summary

  Number of Variables in Common: 4.
  Number of ID Variables: 1.

Observation Summary

  Observation      Base  Compare  ID

  First Obs           1        1  idnum=0987
  First Unequal       1        1  idnum=0987
  Last Unequal       10       12  idnum=9857
  Last Obs           10       12  idnum=9857

  Number of Observations in Common: 10.
  Number of Observations in WORK.EMP96_BYIDNUM but not in WORK.EMP95_BYIDNUM: 2.
  Total Number of Observations Read from WORK.EMP95_BYIDNUM: 10.
  Total Number of Observations Read from WORK.EMP96_BYIDNUM: 12.

  Number of Observations with Some Compared Variables Unequal: 5.
  Number of Observations with All Compared Variables Equal: 5.

Comparing Observations that Have Matching IDNUMs                       2

COMPARE Procedure
Comparison of WORK.EMP95_BYIDNUM with WORK.EMP96_BYIDNUM
(Method=EXACT)

Values Comparison Summary

  Number of Variables Compared with All Observations Equal: 1.
  Number of Variables Compared with Some Observations Unequal: 2.
  Total Number of Values which Compare Unequal: 8.
  Maximum Difference: 2400.

Variables with Unequal Values

  Variable  Type  Len  Ndif  MaxDif

  address   CHAR   42     4
  salary    NUM     8     4    2400

Value Comparison Results for Variables

_______________________________________________________
       Base Value            Compare Value
idnum  address               address
_____  ___________________+  ___________________+

0987   2344 Persimmons Bran  2344 Persimmons Bran
2776   12988 Wellington Far  12988 Wellington Far
3888   5662 Magnolia Blvd S  5662 Magnolia Blvd S
9857   1000 Taft Ave. Morri  100 Taft Ave. Morris
_______________________________________________________

Comparing Observations that Have Matching IDNUMs                       3

COMPARE Procedure
Comparison of WORK.EMP95_BYIDNUM with WORK.EMP96_BYIDNUM
(Method=EXACT)

Value Comparison Results for Variables

_______________________________________________________
            Base    Compare
idnum     salary     salary      Diff.     % Diff
_____  _________  _________  _________  _________

0987       44010      45110       1100     2.4994
3286       87734      89834       2100     2.3936
3888       77558      79958       2400     3.0945
9857       38756      40456       1700     4.3864
_______________________________________________________
Procedure features:
PROC COMPARE statement options:
NOPRINT
OUT=
OUTBASE
OUTCOMP
OUTDIF
OUTNOEQUAL
Other features: PRINT procedure
Data sets: PROCLIB.EMP95 and PROCLIB.EMP96 on page 265
This example creates and prints an output data set that shows the differences between matching observations.
In Example 5 on page 264, the output does not show the differences past the 20th character. The output data set in this example shows the full values. Further, it shows the observations that occur in only one of the data sets.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=120 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
   by idnum;
run;

proc sort data=proclib.emp96 out=emp96_byidnum;
   by idnum;
run;
Specify the data sets to compare. BASE= and COMPARE= specify the data sets to compare.
proc compare base=emp95_byidnum compare=emp96_byidnum
Create the output data set RESULT and include all unequal observations and their differences. OUT= names and creates the output data set. NOPRINT suppresses the printing of the procedure output. OUTNOEQUAL includes only observations that are judged unequal. OUTBASE writes an observation to the output data set for each observation in the base data set. OUTCOMP writes an observation to the output data set for each observation in the comparison data set. OUTDIF writes an observation to the output data set that contains the differences between the two observations.
out=result outnoequal outbase outcomp outdif noprint;
Specify the ID variable. The ID statement specifies IDNUM as the ID variable.
   id idnum;
run;
Print the output data set RESULT and use the BY and ID statements with the ID variable. PROC PRINT prints the output data set. Using the BY and ID statements with the same variable makes the output easy to read. See Chapter 35, The PRINT Procedure, on page 719 for more information on this technique.
proc print data=result noobs;
   by idnum;
   id idnum;
   title 'The Output Data Set RESULT';
run;
The differences for character variables are noted with an X or a period (.). An X shows that the characters do not match. A period shows that the characters do match. For numeric variables, an E means that there is no difference. Otherwise, the numeric difference is shown. By default, the output data set shows that two observations in the comparison data set have no matching observation in the base data set. You do not have to use an option to make those observations appear in the output data set.
The Output Data Set RESULT                                                                          1

idnum  _TYPE_   _OBS_  name             address                                     salary

0987   BASE         1  Dolly Lunford    2344 Persimmons Branch Apex NC 27505         44010
       COMPARE      1  Dolly Lunford    2344 Persimmons Branch Trail Apex NC 27505   45110
       DIF          1  ...............  .......................XXXXX.XXXXXXXXXXXXX    1100

2776   BASE         5  Robert Jones     12988 Wellington Farms Ave. Cary NC 27512    29025
       COMPARE      5  Robert Jones     12988 Wellington Farms Ave. Cary NC 27511    29025
       DIF          5  ...............  ........................................X.       E

3278   COMPARE      6  Mary Cravens     211 N. Cypress St. Cary NC 27512             35362

3286   BASE         6  Hoa Nguyen       2818 Long St. Cary NC 27513                  87734
       COMPARE      7  Hoa Nguyen       2818 Long St. Cary NC 27513                  89834
       DIF          6  ...............  ..........................................    2100

3888   BASE         7  Kim Siu          5662 Magnolia Blvd Southeast Cary NC 27513   77558
       COMPARE      8  Kim Siu          5662 Magnolia Blvd Southwest Cary NC 27513   79958
       DIF          7  ...............  ........................XX................    2400

6544   COMPARE      9  Roger Monday     3004 Crepe Myrtle Court Raleigh NC 27604     47007

9857   BASE        10  Kathy Krupski    1000 Taft Ave. Morrisville NC 27508          38756
       COMPARE     12  Kathy Krupski    100 Taft Ave. Morrisville NC 27508           40456
       DIF         10  ...............  ...XXXXXXXXXXXXXX.XXXXX.XXXXXXXXXXX.......    1700
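The X and period marks in the DIF observations correspond to a position-by-position comparison of the two character values. The following DATA step is a minimal sketch of that idea, not the procedure's own implementation. It rebuilds the mask for the two ADDRESS values of IDNUM 3888, retyped from the listing above; the variable names MASK and I are illustrative only.

```sas
/* Build an X/period mask by comparing two 42-character values     */
/* one position at a time: '.' where the characters match,         */
/* 'X' where they differ.                                          */
data _null_;
   length base compare mask $42;
   base    = '5662 Magnolia Blvd Southeast Cary NC 27513';
   compare = '5662 Magnolia Blvd Southwest Cary NC 27513';
   do i = 1 to 42;
      if substr(base, i, 1) = substr(compare, i, 1)
         then substr(mask, i, 1) = '.';
         else substr(mask, i, 1) = 'X';
   end;
   put mask=;
run;
```

The two values differ only at positions 25 and 26 (`ea` versus `we`), so the mask has an X in those two positions and a period everywhere else, as in the DIF observation for IDNUM 3888.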
Procedure features:
PROC COMPARE statement options:
NOPRINT
OUTSTATS=
Data sets: PROCLIB.EMP95, PROCLIB.EMP96 on page 265
This example creates an output data set that contains summary statistics for the numeric variables that are compared.
Declare the PROCLIB SAS data library.
libname proclib ' SAS-data-library ';
Set the SAS system options. The NODATE option suppresses the display of the date and time in the output. PAGENO= specifies the starting page number. LINESIZE= specifies the output line length, and PAGESIZE= specifies the number of lines on an output page.
options nodate pageno=1 linesize=80 pagesize=40;
Sort the data sets by the ID variable. Both data sets must be sorted by the variable that will be used as the ID variable in the PROC COMPARE step. OUT= specifies the location of the sorted data.
proc sort data=proclib.emp95 out=emp95_byidnum;
   by idnum;
run;

proc sort data=proclib.emp96 out=emp96_byidnum;
   by idnum;
run;
Create the output data set of statistics and compare observations that have matching values for the ID variable. BASE= and COMPARE= specify the data sets to compare. OUTSTATS= creates the output data set DIFFSTAT. NOPRINT suppresses the procedure output. The ID statement specifies IDNUM as the ID variable. PROC COMPARE uses the values of IDNUM to match observations.
proc compare base=emp95_byidnum compare=emp96_byidnum
             outstats=diffstat noprint;
   id idnum;
run;
Print the output data set DIFFSTAT. PROC PRINT prints the output data set DIFFSTAT.
proc print data=diffstat noobs;
   title 'The DIFFSTAT Data Set';
run;
The variables are described in Output Statistics Data Set (OUTSTATS=) on page 253.
The DIFFSTAT Data Set                                                  1

_VAR_     _TYPE_      _BASE_      _COMP_     _DIF_    _PCTDIF_

salary    N            10.00       10.00     10.00     10.0000
salary    MEAN      52359.00    53089.00    730.00      1.2374
salary    STD       24143.84    24631.01    996.72      1.6826
salary    MAX       92100.00    92100.00   2400.00      4.3864
salary    MIN       29025.00    29025.00      0.00      0.0000
salary    STDERR     7634.95     7789.01    315.19      0.5321
salary    T             6.86        6.82      2.32      2.3255
salary    PROBT         0.00        0.00      0.05      0.0451
salary    NDIF          4.00       40.00         .           .
salary    DIFMEANS      1.39        1.38    730.00           .
salary    R,RSQ         1.00        1.00         .           .
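Several rows of DIFFSTAT can be cross-checked against one another. The following DATA step is a minimal sketch, not part of the original example. The relationships it uses are inferred from the printed values and should be confirmed against the OUTSTATS= documentation: NDIF in the _COMP_ column as a percentage of N, DIFMEANS as the mean difference relative to the base mean and to the comparison mean, STDERR as STD divided by the square root of N, and T as the mean divided by its standard error. All variable names are illustrative only.

```sas
/* Recompute selected DIFFSTAT values from other rows of the same  */
/* listing. Inputs are copied from the printed table.              */
data _null_;
   n         = 10;         /* N row                                */
   ndif      = 4;          /* NDIF row, _BASE_ column              */
   base_mean = 52359;      /* MEAN row, _BASE_ column              */
   comp_mean = 53089;      /* MEAN row, _COMP_ column              */
   dif_mean  = 730;        /* MEAN row, _DIF_ column               */
   dif_std   = 996.72;     /* STD row, _DIF_ column                */

   pct_ndif      = 100 * ndif / n;
   difmeans_base = 100 * dif_mean / base_mean;
   difmeans_comp = 100 * dif_mean / comp_mean;
   dif_stderr    = dif_std / sqrt(n);
   dif_t         = dif_mean / dif_stderr;

   put pct_ndif= 8.2 difmeans_base= 8.2 difmeans_comp= 8.2;
   put dif_stderr= 8.2 dif_t= 8.2;
run;
```

Rounded to two decimal places, the results are 40.00, 1.39, 1.38, 315.19, and 2.32, which agree with the NDIF, DIFMEANS, STDERR, and T rows of the listing.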