Examples | SAS/STAT 9.1 User's Guide (Vol. 4)

The following statements generate five imputed data sets to be used in this section. The data set FitMiss was created in the section 'Getting Started' on page 2610. See 'The MI Procedure' chapter for details concerning the MI procedure.

  proc mi data=FitMiss seed=3237851 noprint out=outmi;
     var Oxygen RunTime RunPulse;
  run;

The Fish data described in the STEPDISC procedure are measurements of 159 fish of seven species caught in Finland's Lake Laengelmavesi. For each fish, the length, height, and width are measured. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail (Length1), from the nose to the notch of its tail (Length2), and from the nose to the end of its tail (Length3). See Chapter 67, 'The STEPDISC Procedure,' for more information.

The Fish2 data set is constructed from the Fish data set and contains two species of fish. Some values have been set to missing, and the resulting data set has a monotone missing pattern in the variables Length3, Height, Width, and Species. Note that some values of the variable Species have also been altered in the data set.

The following statements create the Fish2 data set. It contains the first two species of fish in the Fish data set.

  /*-------- Fishes of Species Bream and Parkki Pike --------*/
  data Fish2 (drop=HtPct WidthPct);
     title 'Fish Measurement Data';
     input Species $ Length3 HtPct WidthPct @@;
     Height= HtPct*Length3/100;
     Width= WidthPct*Length3/100;
     datalines;
  Gp1  30.0 38.4 13.4   Gp1  31.2 40.0 13.8   Gp1  31.1 39.8 15.1
    .  33.5 38.0    .     .  34.0 36.6 15.1   Gp1  34.7 39.2 14.2
  Gp1  34.5 41.1 15.3   Gp1  35.0 36.2 13.4   Gp1  35.1 39.9 13.8
    .  36.2 39.3 13.7   Gp1  36.2 39.4 14.1     .  36.2 39.7 13.3
  Gp1  36.4 37.8 12.0     .  37.3 37.3 13.6   Gp1  37.2 40.2 13.9
  Gp1  37.2 41.5 15.0   Gp1  38.3 38.8 13.8   Gp1  38.5 38.8 13.5
  Gp1  38.6 40.5 13.3   Gp1  38.7 37.4 14.8   Gp1  39.5 38.3 14.1
  Gp1  39.2 40.8 13.7     .  39.7 39.1    .   Gp1  40.6 38.1 15.1
  Gp1  40.5 40.1 13.8   Gp1  40.9 40.0 14.8   Gp1  40.6 40.3 15.0
  Gp1  41.5 39.8 14.1   Gp2  41.6 40.6 14.9   Gp1  42.6 44.5 15.5
  Gp1  44.1 40.9 14.3   Gp1  44.0 41.1 14.3   Gp1  45.3 41.4 14.9
  Gp1  45.9 40.6 14.7   Gp1  46.5 37.9 13.7
  Gp2  16.2 25.6 14.0   Gp2  20.3 26.1 13.9   Gp2  21.2 26.3 13.7
  Gp2  22.2 25.3 14.3   Gp2  22.2 28.0 16.1   Gp2  22.8 28.4 14.7
  Gp2  23.1 26.7 14.7     .  23.7 25.8 13.9   Gp2  24.7 23.5 15.2
  Gp1  24.3 27.3 14.6   Gp2  25.3 27.8 15.1   Gp2  25.0 26.2 13.3
  Gp2  25.0 25.6 15.2   Gp2  27.2 27.7 14.1   Gp2  26.7 25.9 13.6
    .  26.8 27.6 15.4   Gp2  27.9 25.4 14.0   Gp2  29.2 30.4 15.4
  Gp2  30.6 28.0 15.6   Gp2  35.0 27.1 15.3
  ;

The following statements generate five imputed data sets to be used in this section. The regression method is used to impute missing values in the variable Width, and the discriminant function method is used to impute the variable Species.

  proc mi data=Fish2 seed=1305417 out=outfish;
     class Species;
     monotone reg (Width)
              discrim(Species= Length3 Height Width);
     var Length3 Height Width Species;
  run;

Examples 1-6 use different input option combinations to combine parameter estimates computed from different procedures, Examples 7-8 combine parameter estimates with CLASS variables, Example 9 shows the use of a TEST statement, and Example 10 combines statistics that are not directly derived from procedures.

Example 45.1. Reading Means and Standard Errors from Variables in a DATA= Data Set

This example creates an ordinary SAS data set that contains sample means and standard errors computed from imputed data sets. These estimates are then combined to generate valid univariate inferences about the population means.

The following statements use the UNIVARIATE procedure to generate sample means and standard errors for the variables in each imputed data set.

  proc univariate data=outmi noprint;
     var Oxygen RunTime RunPulse;
     output out=outuni mean=Oxygen RunTime RunPulse
                       stderr=SOxygen SRunTime SRunPulse;
     by _Imputation_;
  run;

The following statements display the output data set from PROC UNIVARIATE in Output 45.1.1:

  proc print data=outuni;
     title 'UNIVARIATE Means and Standard Errors';
  run;

Output 45.1.1: UNIVARIATE Output Data Set

  UNIVARIATE Means and Standard Errors

  Obs   _Imputation_    Oxygen    RunTime   RunPulse   SOxygen   SRunTime   SRunPulse
    1         1        47.0120    10.4441    171.216   0.95984    0.28520     1.59910
    2         2        47.2407    10.5040    171.244   0.93540    0.26661     1.75638
    3         3        47.4995    10.5922    171.909   1.00766    0.26302     1.85795
    4         4        47.1485    10.5279    171.146   0.95439    0.26405     1.75011
    5         5        47.0042    10.4913    172.072   0.96528    0.27275     1.84807

The following statements combine the means and standard errors from the imputed data sets. The EDF= option requests that the adjusted degrees of freedom be used in the analysis. For sample means based on 31 observations, the complete-data error degrees of freedom is 30.

  proc mianalyze data=outuni edf=30;
     modeleffects Oxygen RunTime RunPulse;
     stderr SOxygen SRunTime SRunPulse;
  run;

Output 45.1.2: Multiple Imputation Variance Information

  The MIANALYZE Procedure

  Model Information
  Data Set                 WORK.OUTUNI
  Number of Imputations    5

  Multiple Imputation Variance Information
                -----------------Variance-----------------
  Parameter         Between         Within          Total       DF
  Oxygen           0.041478       0.930853       0.980626   26.298
  RunTime          0.002948       0.073142       0.076679   26.503
  RunPulse         0.191086       3.114442       3.343744   25.463

  Multiple Imputation Variance Information
                   Relative       Fraction
                   Increase        Missing       Relative
  Parameter     in Variance    Information     Efficiency
  Oxygen           0.053471       0.051977       0.989712
  RunTime          0.048365       0.047147       0.990659
  RunPulse         0.073626       0.070759       0.986046

The 'Model Information' table shown in Output 45.1.2 lists the input data set(s) and the number of imputations.

The 'Multiple Imputation Variance Information' table shown in Output 45.1.2 displays the between-imputation variance, within-imputation variance, and total variance for each univariate inference. It also displays the degrees of freedom for the total variance. The relative increase in variance due to missing values, the fraction of missing information, and the relative efficiency for each imputed variable are also displayed. A detailed description of these statistics is provided in the 'Combining Inferences from Imputed Data Sets' section on page 2624 and the 'Multiple Imputation Efficiency' section on page 2626.
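As a rough cross-check of the Oxygen row in Output 45.1.2, the combining arithmetic can be worked through by hand. The sketch below is written in Python rather than SAS only so that it can be run anywhere. The helper name `combine` and its dictionary keys are illustrative, and the formulas are the standard multiple-imputation combining rules with the small-sample degrees-of-freedom adjustment, assumed here to correspond to what the EDF= option requests. The inputs are the printed (rounded) values from Output 45.1.1, so the results agree with the table only to about four decimal places.

```python
from statistics import mean, variance

# Oxygen means and standard errors as printed in Output 45.1.1
estimates = [47.0120, 47.2407, 47.4995, 47.1485, 47.0042]
std_errors = [0.95984, 0.93540, 1.00766, 0.95439, 0.96528]


def combine(estimates, std_errors, complete_df):
    """Combine m estimates and standard errors (illustrative helper)."""
    m = len(estimates)
    within = mean([se ** 2 for se in std_errors])   # average squared SE
    between = variance(estimates)                   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    rel_increase = (1 + 1 / m) * between / within

    # Large-sample degrees of freedom, then the small-sample adjustment
    df_m = (m - 1) * (1 + 1 / rel_increase) ** 2
    gamma = (1 + 1 / m) * between / total
    df_obs = (1 - gamma) * complete_df * (complete_df + 1) / (complete_df + 3)
    df_adj = 1 / (1 / df_m + 1 / df_obs)

    frac_missing = (rel_increase + 2 / (df_m + 3)) / (rel_increase + 1)
    rel_efficiency = 1 / (1 + frac_missing / m)
    return {
        "between": between,
        "within": within,
        "total": total,
        "df": df_adj,
        "rel_increase": rel_increase,
        "frac_missing": frac_missing,
        "rel_efficiency": rel_efficiency,
    }


oxygen = combine(estimates, std_errors, complete_df=30)
for name, value in oxygen.items():
    print(f"{name:15s} {value:.6f}")
```

Passing the RunTime or RunPulse columns of Output 45.1.1 to the same helper should reproduce the other two rows to similar accuracy.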

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.1.3 displays the estimated mean and corresponding standard error for each variable. The table also displays a 95% confidence interval for the mean and a t statistic with the associated p-value for testing the hypothesis that the mean is equal to the value specified. You can use the THETA0= option to specify the value for the null hypothesis, which is zero by default. The table also displays the minimum and maximum parameter estimates from the imputed data sets.
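The standard errors and t statistics in Output 45.1.3 follow from the total variances in Output 45.1.2. The sketch below recomputes them in Python purely as an arithmetic check. The function name `t_statistic` and its `theta0` argument are illustrative stand-ins for the calculation, not part of any SAS interface. The confidence limits are omitted because they need a t quantile at fractional degrees of freedom.

```python
import math

# Combined estimate (Output 45.1.3) and total variance (Output 45.1.2)
combined = {
    "Oxygen": (47.180993, 0.980626),
    "RunTime": (10.511906, 0.076679),
    "RunPulse": (171.517500, 3.343744),
}


def t_statistic(estimate, total_variance, theta0=0.0):
    """Return (standard error, t statistic) for H0: parameter = theta0."""
    std_error = math.sqrt(total_variance)
    return std_error, (estimate - theta0) / std_error


results = {name: t_statistic(est, tvar) for name, (est, tvar) in combined.items()}
for name, (std_error, t_value) in results.items():
    print(f"{name:9s} std error = {std_error:.6f}   t = {t_value:.2f}")
```

Supplying a nonzero `theta0` shifts only the numerator of the t statistic, which mirrors the role of the THETA0= option.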

Note that the results in this example could also have been obtained with the MI procedure.

Example 45.2. Reading Means and Covariance Matrices from a DATA= COV Data Set

This example creates a COV type data set that contains sample means and covariance matrices computed from imputed data sets. These estimates are then combined to generate valid statistical inferences about the population means.

The following statements use the CORR procedure to generate sample means and a covariance matrix for the variables in each imputed data set.

  proc corr data=outmi cov nocorr noprint out=outcov(type=cov);
     var Oxygen RunTime RunPulse;
     by _Imputation_;
  run;

The following statements display sample means and covariance matrices from the first two imputed data sets in Output 45.2.1.

  proc print data=outcov(obs=12);
     title 'CORR Means and Covariance Matrices'
           ' (First Two Imputations)';
  run;

Output 45.2.1: COV Data Set

  CORR Means and Covariance Matrices (First Two Imputations)

  Obs   _Imputation_   _TYPE_   _NAME_       Oxygen    RunTime   RunPulse
    1         1        COV      Oxygen      28.5603    -7.2652    -11.812
    2         1        COV      RunTime     -7.2652     2.5214      2.536
    3         1        COV      RunPulse   -11.8121     2.5357     79.271
    4         1        MEAN                 47.0120    10.4441    171.216
    5         1        STD                   5.3442     1.5879      8.903
    6         1        N                    31.0000    31.0000     31.000
    7         2        COV      Oxygen      27.1240    -6.6761    -10.217
    8         2        COV      RunTime     -6.6761     2.2035      2.611
    9         2        COV      RunPulse   -10.2170     2.6114     95.631
   10         2        MEAN                 47.2407    10.5040    171.244
   11         2        STD                   5.2081     1.4844      9.779
   12         2        N                    31.0000    31.0000     31.000

Note that the covariance matrices in the data set outcov are estimated covariance matrices of the variables, $V(y)$. The estimated covariance matrix of the sample means is $V(\bar{y}) = V(y)/n$, where $n$ is the sample size, and is not the same as the estimated covariance matrix of the variables.
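The relation between the two covariance matrices can be checked against the numbers already shown. The Python sketch below (an arithmetic illustration only, with variable names chosen for this example) divides the first imputation's covariance matrix from Output 45.2.1 by n = 31 and takes square roots of the diagonal. The results should match the standard errors SOxygen, SRunTime, and SRunPulse for the first imputation in Output 45.1.1, up to rounding of the printed inputs.

```python
import math

n = 31  # observations in each imputed data set

# Covariance matrix of the variables for imputation 1 (Output 45.2.1),
# in the order Oxygen, RunTime, RunPulse
cov_y = [
    [28.5603, -7.2652, -11.8121],
    [-7.2652, 2.5214, 2.5357],
    [-11.8121, 2.5357, 79.271],
]

# Covariance matrix of the sample means: V(ybar) = V(y) / n
cov_mean = [[value / n for value in row] for row in cov_y]

# Standard errors of the means are the square roots of the diagonal
std_errors = [math.sqrt(cov_mean[i][i]) for i in range(len(cov_mean))]
print([round(se, 5) for se in std_errors])
```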

The following statements combine the results for the imputed data sets, and derive both univariate and multivariate inferences about the means. The EDF= option is specified to request that the adjusted degrees of freedom be used in the analysis. For sample means based on 31 observations, the complete-data error degrees of freedom is 30.

  proc mianalyze data=outcov edf=30 wcov bcov tcov mult;
     modeleffects Oxygen RunTime RunPulse;
  run;

The 'Multiple Imputation Variance Information' and 'Multiple Imputation Parameter Estimates' tables display the same results as in Output 45.1.2 and Output 45.1.3 in Example 45.1.

Output 45.1.3: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates
  Parameter        Estimate      Std Error    95% Confidence Limits        DF
  Oxygen          47.180993       0.990266     45.1466      49.2154    26.298
  RunTime         10.511906       0.276910      9.9432      11.0806    26.503
  RunPulse       171.517500       1.828591    167.7549     175.2801    25.463

  Multiple Imputation Parameter Estimates
  Parameter         Minimum        Maximum
  Oxygen          47.004201      47.499541
  RunTime         10.444149      10.592244
  RunPulse       171.146171     172.071730

  Multiple Imputation Parameter Estimates
                                 t for H0:
  Parameter       Theta0   Parameter=Theta0   Pr > |t|
  Oxygen               0              47.64     <.0001
  RunTime              0              37.96     <.0001
  RunPulse             0              93.80     <.0001

With the WCOV, BCOV, and TCOV options, the procedure displays the within-imputation covariance matrix, the between-imputation covariance matrix, and the total covariance matrix in Output 45.2.2. The total covariance matrix is computed under the assumption that the between-imputation covariance matrix is proportional to the within-imputation covariance matrix.

Output 45.2.2: Covariance Matrices

  The MIANALYZE Procedure

  Within-Imputation Covariance Matrix
                    Oxygen         RunTime        RunPulse
  Oxygen       0.930852655    -0.226506411    -0.461022083
  RunTime     -0.226506411     0.073141598     0.080316017
  RunPulse    -0.461022083     0.080316017     3.114441784

  Between-Imputation Covariance Matrix
                    Oxygen         RunTime        RunPulse
  Oxygen      0.0414778123    0.0099248946    0.0183701754
  RunTime     0.0099248946    0.0029478891    0.0091684769
  RunPulse    0.0183701754    0.0091684769    0.1910855259

  Total Covariance Matrix
                    Oxygen         RunTime        RunPulse
  Oxygen       1.202882661    -0.292700068    -0.595750001
  RunTime     -0.292700068     0.094516313     0.103787365
  RunPulse    -0.595750001     0.103787365     4.024598310
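Under the proportionality assumption, the total covariance matrix is the within-imputation matrix scaled by one plus the average relative increase in variance. The Python sketch below is a numerical check, not SAS code. It takes the within-imputation matrix from Output 45.2.2 and the average relative increase 0.292237 from Output 45.2.3, and compares the scaled matrix with the printed total. The Oxygen covariances are entered as negative, consistent with the signs in the COV data set of Output 45.2.1.

```python
# Order of rows and columns: Oxygen, RunTime, RunPulse
within = [
    [0.930852655, -0.226506411, -0.461022083],
    [-0.226506411, 0.073141598, 0.080316017],
    [-0.461022083, 0.080316017, 3.114441784],
]
avg_rel_increase = 0.292237  # "Avg Relative Increase in Variance", Output 45.2.3

# Total covariance under proportionality: (1 + r) * W
total = [[(1 + avg_rel_increase) * value for value in row] for row in within]

printed_total = [
    [1.202882661, -0.292700068, -0.595750001],
    [-0.292700068, 0.094516313, 0.103787365],
    [-0.595750001, 0.103787365, 4.024598310],
]

max_abs_diff = max(
    abs(computed - printed)
    for computed_row, printed_row in zip(total, printed_total)
    for computed, printed in zip(computed_row, printed_row)
)
print(f"largest discrepancy: {max_abs_diff:.1e}")
```

The small remaining discrepancy comes from the six-digit rounding of the printed average relative increase.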

With the MULT option, the procedure assumes that the between-imputation covariance matrix is proportional to the within-imputation covariance matrix and displays a multivariate inference for all the parameters taken jointly.

The 'Multiple Imputation Multivariate Inference' table displayed in Output 45.2.3 shows a significant p-value for the null hypothesis that the population means are all equal to zero.

Output 45.2.3: Multiple Imputation Multivariate Inference

  The MIANALYZE Procedure

  Multiple Imputation Multivariate Inference
  Assuming Proportionality of Between/Within Covariance Matrices

  Avg Relative
      Increase                            F for H0:
   in Variance   Num DF   Den DF   Parameter=Theta0     Pr > F
      0.292237        3   122.68            12519.7     <.0001

Example 45.3. Reading Regression Results from a DATA= EST Data Set

This example creates an EST type data set that contains regression coefficients and their corresponding covariance matrices computed from imputed data sets. These estimates are then combined to generate valid statistical inferences about the regression model.

The following statements use the REG procedure to generate regression coefficients:

  proc reg data=outmi outest=outreg covout noprint;
     model Oxygen= RunTime RunPulse;
     by _Imputation_;
  run;

The following statements display regression coefficients and their covariance matrices from the first two imputed data sets in Output 45.3.1.

  proc print data=outreg(obs=8);
     var _Imputation_ _Type_ _Name_
         Intercept RunTime RunPulse;
     title 'REG Model Coefficients and Covariance matrices'
           ' (First Two Imputations)';
  run;

Output 45.3.1: EST Type Data Set

  REG Model Coefficients and Covariance matrices (First Two Imputations)

  Obs   _Imputation_   _TYPE_   _NAME_       Intercept     RunTime    RunPulse
    1         1        PARMS                    86.544    -2.82231    -0.05873
    2         1        COV      Intercept      100.145    -0.53519    -0.55077
    3         1        COV      RunTime         -0.535     0.10774    -0.00345
    4         1        COV      RunPulse        -0.551    -0.00345     0.00343
    5         2        PARMS                    83.021    -3.00023    -0.02491
    6         2        COV      Intercept       79.032    -0.66765    -0.41918
    7         2        COV      RunTime         -0.668     0.11456    -0.00313
    8         2        COV      RunPulse        -0.419    -0.00313     0.00264

The following statements combine the results for the imputed data sets. The EDF= option is specified to request that the adjusted degrees of freedom be used in the analysis. For a regression model with three independent variables (including the Intercept) and 31 observations, the complete-data error degrees of freedom is 28.

  proc mianalyze data=outreg edf=28;
     modeleffects Intercept RunTime RunPulse;
  run;

Output 45.3.2: Multiple Imputation Variance Information

  The MIANALYZE Procedure

  Multiple Imputation Variance Information
                -----------------Variance-----------------
  Parameter         Between         Within          Total       DF
  Intercept       45.529229      76.543614     131.178689   9.1917
  RunTime          0.019390       0.106220       0.129487   18.311
  RunPulse         0.001007       0.002537       0.003746   12.137

  Multiple Imputation Variance Information
                   Relative       Fraction
                   Increase        Missing       Relative
  Parameter     in Variance    Information     Efficiency
  Intercept        0.713777       0.461277       0.915537
  RunTime          0.219051       0.192620       0.962905
  RunPulse         0.476384       0.355376       0.933641

The 'Multiple Imputation Variance Information' table shown in Output 45.3.2 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.3.3: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates
  Parameter        Estimate      Std Error    95% Confidence Limits        DF
  Intercept       90.837440      11.453327    65.01034     116.6645    9.1917
  RunTime         -3.032870       0.359844    -3.78795      -2.2778    18.311
  RunPulse        -0.068578       0.061204    -0.20176       0.0646    12.137

  Multiple Imputation Parameter Estimates
  Parameter         Minimum        Maximum
  Intercept       83.020730     100.839807
  RunTime         -3.204426      -2.822311
  RunPulse        -0.112840      -0.024910

  Multiple Imputation Parameter Estimates
                                 t for H0:
  Parameter       Theta0   Parameter=Theta0   Pr > |t|
  Intercept            0               7.93     <.0001
  RunTime              0              -8.43     <.0001
  RunPulse             0              -1.12     0.2842

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.3.3 displays the combined estimate and standard error for each regression coefficient. The inferences are based on the t distribution. The table also displays a 95% confidence interval and a t test with the associated p-value for the hypothesis that the regression coefficient is equal to zero. Since the p-value for RunPulse is 0.2842, this coefficient is not significantly different from zero at the 0.05 level, and the variable can be removed from the regression model.
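The RunPulse p-value can be approximated from the estimate, standard error, and degrees of freedom in Output 45.3.3. The Python sketch below is an independent numerical check and makes no claim about how the procedure computes the value. It integrates the Student's t density with Simpson's rule, which handles the fractional degrees of freedom without any statistical library. The function names `t_pdf` and `two_sided_p` are illustrative.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # probability between -|t| and |t|
    return 1 - central_area


# RunPulse row of Output 45.3.3 (the estimate is negative)
estimate, std_error, df = -0.068578, 0.061204, 12.137
t_value = (estimate - 0) / std_error
p_value = two_sided_p(t_value, df)
print(f"t = {t_value:.2f}, p = {p_value:.4f}")
```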

Example 45.4. Reading Mixed Model Results from PARMS= and COVB= Data Sets

This example creates data sets containing parameter estimates and covariance matrices computed by a mixed model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the parameters.

The following PROC MIXED statements generate the fixed-effect parameter estimates and covariance matrix for each imputed data set:

  proc mixed data=outmi;
     model Oxygen= RunTime RunPulse RunTime*RunPulse/solution covb;
     by _Imputation_;
     ods output SolutionF=mixparms CovB=mixcovb;
  run;

The following statements display parameter estimates from the first two imputed data sets in Output 45.4.1.

  proc print data=mixparms (obs=8);
     var _Imputation_ Effect Estimate StdErr;
     title 'MIXED Model Coefficients (First Two Imputations)';
  run;

Output 45.4.1: PROC MIXED Model Coefficients

  MIXED Model Coefficients (First Two Imputations)

  Obs   _Imputation_   Effect              Estimate      StdErr
    1         1        Intercept             148.09     81.5231
    2         1        RunTime              -8.8115      7.8794
    3         1        RunPulse             -0.4123      0.4684
    4         1        RunTime*RunPulse     0.03437     0.04517
    5         2        Intercept            64.3607     64.6034
    6         2        RunTime              -1.1270      6.4307
    7         2        RunPulse             0.08160      0.3688
    8         2        RunTime*RunPulse    -0.01069     0.03664

The following statements display the covariance matrices associated with the parameter estimates from the first two imputed data sets in Output 45.4.2. Note that the variables Col1, Col2, Col3, and Col4 are used to identify the effects Intercept, RunTime, RunPulse, and RunTime*RunPulse through the variable Row.

  proc print data=mixcovb (obs=8);
     var _Imputation_ Row Effect Col1 Col2 Col3 Col4;
     title 'Covariance Matrices (First Two Imputations)';
  run;

Output 45.4.2: PROC MIXED Covariance Matrices

  Covariance Matrices (First Two Imputations)

  Obs   _Imputation_   Row   Effect                  Col1       Col2       Col3       Col4
    1         1          1   Intercept            6646.01    -637.40   -38.1515     3.6542
    2         1          2   RunTime              -637.40    62.0842     3.6548    -0.3556
    3         1          3   RunPulse            -38.1515     3.6548     0.2194   -0.02099
    4         1          4   RunTime*RunPulse      3.6542    -0.3556   -0.02099   0.002040
    5         2          1   Intercept            4173.59    -411.46   -23.7889     2.3441
    6         2          2   RunTime              -411.46    41.3545     2.3414    -0.2353
    7         2          3   RunPulse            -23.7889     2.3414     0.1360   -0.01338
    8         2          4   RunTime*RunPulse      2.3441    -0.2353   -0.01338   0.001343

For univariate inference, only parameter estimates and their associated standard errors are needed. The following statements use the MIANALYZE procedure with the input PARMS= data set to produce univariate results.

  proc mianalyze parms=mixparms edf=28;
     modeleffects Intercept RunTime RunPulse RunTime*RunPulse;
  run;

Output 45.4.3: Multiple Imputation Variance Information

  The MIANALYZE Procedure

  Multiple Imputation Variance Information
                       -----------------Variance-----------------
  Parameter                Between         Within          Total       DF
  Intercept            1972.654530    4771.948777    7139.134213    11.82
  RunTime                14.712602      45.549686      63.204808   13.797
  RunPulse                0.062941       0.156717       0.232247   12.046
  RunTime*RunPulse        0.000470       0.001490       0.002055   13.983

  Multiple Imputation Variance Information
                          Relative       Fraction
                          Increase        Missing       Relative
  Parameter            in Variance    Information     Efficiency
  Intercept               0.496063       0.365524       0.931875
  RunTime                 0.387601       0.305893       0.942348
  RunPulse                0.481948       0.358274       0.933136
  RunTime*RunPulse        0.378863       0.300674       0.943276

The 'Multiple Imputation Variance Information' table shown in Output 45.4.3 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.4.4: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates
  Parameter             Estimate      Std Error    95% Confidence Limits        DF
  Intercept           136.071356      84.493397    -48.3352     320.4779     11.82
  RunTime              -7.457186       7.950145    -24.5322       9.6178    13.797
  RunPulse             -0.328104       0.481920     -1.3777       0.7215    12.046
  RunTime*RunPulse      0.025364       0.045328     -0.0719       0.1226    13.983

  Multiple Imputation Parameter Estimates
  Parameter              Minimum        Maximum
  Intercept            64.360719     186.549814
  RunTime             -11.514341      -1.127010
  RunPulse             -0.602162       0.081597
  RunTime*RunPulse     -0.010690       0.047429

  Multiple Imputation Parameter Estimates
                                      t for H0:
  Parameter            Theta0   Parameter=Theta0   Pr > |t|
  Intercept                 0               1.61     0.1337
  RunTime                   0              -0.94     0.3644
  RunPulse                  0              -0.68     0.5089
  RunTime*RunPulse          0               0.56     0.5846

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.4.4 displays the estimated mean and standard error of the regression coefficients.

Since each covariance matrix contains the variables Row, Col1, Col2, Col3, and Col4 for parameters, the EFFECTVAR=ROWCOL option is needed when specifying the COVB= option. The following statements illustrate the use of the MIANALYZE procedure with input PARMS= and COVB(EFFECTVAR=ROWCOL)= data sets:

  proc mianalyze parms=mixparms edf=28
                 covb(effectvar=rowcol)=mixcovb;
     modeleffects Intercept RunTime RunPulse RunTime*RunPulse;
  run;

Example 45.5. Reading Generalized Linear Model Results from PARMS=, PARMINFO=, and COVB= Data Sets

This example creates data sets containing parameter estimates and corresponding covariance matrices computed by a generalized linear model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC GENMOD to generate the parameter estimates and covariance matrix for each imputed data set:

  proc genmod data=outmi;
     model Oxygen= RunTime RunPulse/covb;
     by _Imputation_;
     ods output ParameterEstimates=gmparms
                ParmInfo=gmpinfo
                CovB=gmcovb;
  run;

The following statements display the parameter estimates from the first two imputed data sets in Output 45.5.1.

  proc print data=gmparms (obs=8);
     var _Imputation_ Parameter Estimate StdErr;
     title 'GENMOD Model Coefficients (First Two Imputations)';
  run;

Output 45.5.1: PROC GENMOD Model Coefficients

  GENMOD Model Coefficients (First Two Imputations)

  Obs   _Imputation_   Parameter    Estimate      StdErr
    1         1        Intercept     86.5440      9.5107
    2         1        RunTime       -2.8223      0.3120
    3         1        RunPulse      -0.0587      0.0556
    4         1        Scale          2.6692      0.3390
    5         2        Intercept     83.0207      8.4489
    6         2        RunTime       -3.0002      0.3217
    7         2        RunPulse      -0.0249      0.0488
    8         2        Scale          2.5727      0.3267

The following statements display the parameter information table in Output 45.5.2. The table identifies parameter names used in the covariance matrices. The parameters Prm1, Prm2, and Prm3 are used for the effects Intercept, RunTime, and RunPulse in each covariance matrix.

  proc print data=gmpinfo (obs=6);
     title 'GENMOD Parameter Information (First Two Imputations)';
  run;

Output 45.5.2: PROC GENMOD Model Information

  GENMOD Parameter Information (First Two Imputations)

  Obs   _Imputation_   Parameter   Effect
    1         1        Prm1        Intercept
    2         1        Prm2        RunTime
    3         1        Prm3        RunPulse
    4         2        Prm1        Intercept
    5         2        Prm2        RunTime
    6         2        Prm3        RunPulse

The following statements display the covariance matrices from the first two imputed data sets in Output 45.5.3. Note that the GENMOD procedure computes maximum likelihood estimates for each covariance matrix.

  proc print data=gmcovb (obs=8);
     var _Imputation_ RowName Prm1 Prm2 Prm3;
     title 'GENMOD Covariance Matrices (First Two Imputations)';
  run;

Output 45.5.3: PROC GENMOD Covariance Matrices

  GENMOD Covariance Matrices (First Two Imputations)

  Obs   _Imputation_   RowName         Prm1         Prm2         Prm3
    1         1        Prm1       90.453923    -0.483394    -0.497473
    2         1        Prm2       -0.483394    0.0973159    -0.003113
    3         1        Prm3       -0.497473    -0.003113    0.0030954
    4         1        Scale      2.765E-17    -3.05E-17    2.759E-18
    5         2        Prm1       71.383332    -0.603037    -0.378616
    6         2        Prm2       -0.603037    0.1034766    -0.002826
    7         2        Prm3       -0.378616    -0.002826    0.0023843
    8         2        Scale      1.132E-14    2.181E-16    -7.62E-17

The following statements use the MIANALYZE procedure with input PARMS=, PARMINFO=, and COVB= data sets:

  proc mianalyze parms=gmparms covb=gmcovb parminfo=gmpinfo;
     modeleffects Intercept RunTime RunPulse;
  run;

Since the GENMOD procedure computes maximum likelihood estimates for the covariance matrix, the EDF= option is not used. The resulting model coefficients are identical to the estimates in Output 45.3.3 in Example 45.3, but the standard errors are slightly different. In this example, maximum likelihood estimates of the standard errors are combined without the EDF= option, whereas in Example 45.3, unbiased estimates of the standard errors are combined with the EDF= option.
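The size of the difference between the two kinds of standard error can be illustrated for a single imputation. The Python sketch below assumes the usual relation for a normal linear model: the maximum likelihood variance estimate divides the error sum of squares by n, the unbiased estimate divides by n minus the number of regression parameters, so the standard errors differ by a factor of sqrt((n - p)/n). It rescales the unbiased standard errors for the first imputation (as reported by PROC GLM in Output 45.6.1 of Example 45.6) and compares them with the PROC GENMOD values in Output 45.5.1. Variable names are illustrative.

```python
import math

n, p = 31, 3  # observations; regression parameters including the Intercept

# Imputation 1 standard errors
unbiased_se = {            # PROC GLM, Output 45.6.1
    "Intercept": 10.00726811,
    "RunTime": 0.32824165,
    "RunPulse": 0.05854109,
}
genmod_se = {              # PROC GENMOD, Output 45.5.1
    "Intercept": 9.5107,
    "RunTime": 0.3120,
    "RunPulse": 0.0556,
}

shrink = math.sqrt((n - p) / n)  # ML standard error / unbiased standard error
ml_se = {name: se * shrink for name, se in unbiased_se.items()}

for name in unbiased_se:
    print(f"{name:9s} rescaled = {ml_se[name]:.4f}   GENMOD = {genmod_se[name]:.4f}")
```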

Example 45.6. Reading GLM Results from PARMS= and XPXI= Data Sets

This example creates data sets containing parameter estimates and corresponding $(X'X)^{-1}$ matrices computed by a general linear model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC GLM to generate the parameter estimates and $(X'X)^{-1}$ matrix for each imputed data set:

  proc glm data=outmi;
     model Oxygen= RunTime RunPulse/inverse;
     by _Imputation_;
     ods output ParameterEstimates=glmparms
                InvXPX=glmxpxi;
  quit;

The following statements display parameter estimates and standard errors from the first two imputed data sets in Output 45.6.1.

  proc print data=glmparms (obs=6);
     var _Imputation_ Parameter Estimate StdErr;
     title 'GLM Model Coefficients (First Two Imputations)';
  run;

Output 45.6.1: PROC GLM Model Coefficients

  GLM Model Coefficients (First Two Imputations)

  Obs   _Imputation_   Parameter       Estimate         StdErr
    1         1        Intercept     86.5440339    10.00726811
    2         1        RunTime       -2.8223108     0.32824165
    3         1        RunPulse      -0.0587292     0.05854109
    4         2        Intercept     83.0207303     8.88996885
    5         2        RunTime       -3.0002288     0.33847204
    6         2        RunPulse      -0.0249103     0.05137859

The following statements display $(X'X)^{-1}$ matrices from the first two imputed data sets in Output 45.6.2.

  proc print data=glmxpxi (obs=8);
     var _Imputation_ Parameter Intercept RunTime RunPulse;
     title 'GLM X''X Inverse Matrices (First Two Imputations)';
  run;

Output 45.6.2: PROC GLM $(X'X)^{-1}$ Matrices

  GLM X'X Inverse Matrices (First Two Imputations)

  Obs   _Imputation_   Parameter       Intercept         RunTime        RunPulse
    1         1        Intercept    12.696250656    -0.067849956    -0.069826009
    2         1        RunTime      -0.067849956    0.0136594055    -0.000436938
    3         1        RunPulse     -0.069826009    -0.000436938    0.0004344762
    4         1        Oxygen       86.544033929    -2.822310769    -0.058729234
    5         2        Intercept    10.784620785    -0.091107072    -0.057201387
    6         2        RunTime      -0.091107072    0.0156332765    -0.000426902
    7         2        RunPulse     -0.057201387    -0.000426902    0.0003602208
    8         2        Oxygen       83.020730343    -3.000228818    -0.024910305

The standard errors for the estimates in the glmparms data set are needed to create the covariance matrix from the $(X'X)^{-1}$ matrix. The following statements use the MIANALYZE procedure with input PARMS= and XPXI= data sets to produce the same results as displayed in Example 45.3 in Output 45.3.2 and Output 45.3.3:
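One way to see why the standard errors are needed: for a linear model, the covariance matrix of the estimates is the error variance times $(X'X)^{-1}$, and each squared standard error equals that variance times the matching diagonal element. The Python sketch below is an arithmetic illustration of this relation and is not a description of the procedure's internals. It recovers the error variance for the first imputation from Output 45.6.1 and Output 45.6.2 and rebuilds the covariance matrix, which should agree with the COV rows for the first imputation in Output 45.3.1. The off-diagonal elements of the inverse are entered as negative.

```python
# Imputation 1, order: Intercept, RunTime, RunPulse
std_errors = [10.00726811, 0.32824165, 0.05854109]   # Output 45.6.1
xpx_inverse = [                                      # Output 45.6.2
    [12.696250656, -0.067849956, -0.069826009],
    [-0.067849956, 0.0136594055, -0.000436938],
    [-0.069826009, -0.000436938, 0.0004344762],
]

# Each squared standard error is s^2 times a diagonal element of the inverse,
# so every parameter gives an estimate of the error variance s^2.
s2_candidates = [se ** 2 / xpx_inverse[i][i] for i, se in enumerate(std_errors)]
s2 = sum(s2_candidates) / len(s2_candidates)

# Covariance matrix of the estimates: s^2 * (X'X)^{-1}
covariance = [[s2 * value for value in row] for row in xpx_inverse]

print(f"s^2 = {s2:.4f}")
for row in covariance:
    print("  ".join(f"{value:10.5f}" for value in row))
```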

  proc mianalyze parms=glmparms xpxi=glmxpxi edf=28;
     modeleffects Intercept RunTime RunPulse;
  run;

Example 45.7. Reading Logistic Model Results from PARMS= and COVB= Data Sets

This example creates data sets containing parameter estimates and corresponding covariance matrices computed by a logistic regression analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC LOGISTIC to generate the parameter estimates and covariance matrix for each imputed data set.

  proc logistic data=outfish;
     class Species;
     model Species= Height Width Height*Width/ covb;
     by _Imputation_;
     ods output ParameterEstimates=lgsparms
                CovB=lgscovb;
  run;

The following statements display the logistic regression coefficients from the first two imputations in Output 45.7.1.

  proc print data=lgsparms (obs=8);
     title 'LOGISTIC Model Coefficients (First Two Imputations)';
  run;

Output 45.7.1: PROC LOGISTIC Model Coefficients

  LOGISTIC Model Coefficients (First Two Imputations)

  Obs   _Imputation_   Variable        DF   Estimate    StdErr   WaldChiSq   ProbChiSq
    1         1        Intercept        1    -4.2188    7.8679      0.2875      0.5918
    2         1        Height           1     2.4568    1.0579      5.3929      0.0202
    3         1        Width            1    -3.3480    2.8541      1.3761      0.2408
    4         1        Height*Width     1    -0.1331    0.1441      0.8527      0.3558
    5         2        Intercept        1   -10.9235    9.1880      1.4135      0.2345
    6         2        Height           1     3.1578    1.5208      4.3116      0.0379
    7         2        Width            1    -1.7683    2.9749      0.3533      0.5522
    8         2        Height*Width     1    -0.2714    0.1892      2.0575      0.1515

The following statements displays the covariance matrices associated with parameter estimates from the first two imputations in Output 45.7.2.

  proc print data=lgscovb (obs=8);   title 'LOGISTIC Model Covariance Matrices (First Two Imputations)';   run;

Output 45.7.2: PROC LOGISTIC Covariance Matrices

  LOGISTIC Model Covariance Matrices (First Two Imputations)

  Obs  _Imputation_  Parameter    Intercept     Height      Width  HeightWidth
    1        1       Intercept     61.90439   -2.39611   -18.8182     0.923732
    2        1       Height        -2.39611   1.119218   -0.76837     -0.11322
    3        1       Width         -18.8182   -0.76837   8.145619     -0.18386
    4        1       HeightWidth   0.923732   -0.11322   -0.18386     0.020762
    5        2       Intercept     84.41847   -5.94636   -20.9352     1.389396
    6        2       Height        -5.94636   2.312748   -1.08263     -0.24839
    7        2       Width         -20.9352   -1.08263   8.849757      -0.1547
    8        2       HeightWidth   1.389396   -0.24839    -0.1547     0.035796

The following statements use the MIANALYZE procedure with input PARMS= and COVB= data sets.

  proc mianalyze parms=lgsparms
                 covb(effectvar=stacking)=lgscovb;
     modeleffects Intercept Height Width Height*Width;
  run;

Output 45.7.3: Multiple Imputation Variance Information

  The MIANALYZE Procedure

  Multiple Imputation Variance Information

                   -----------------Variance-----------------
  Parameter           Between        Within         Total        DF
  Intercept         15.218807     70.592292     88.854861    94.689
  Height             0.181361      1.626663      1.844296    287.26
  Width              0.804258      8.428402      9.393511    378.93
  Height*Width       0.006765      0.026888      0.035006     74.37

  Multiple Imputation Variance Information

                       Relative       Fraction
                       Increase        Missing      Relative
  Parameter         in Variance    Information    Efficiency
  Intercept            0.258705       0.221798      0.957525
  Height               0.133791       0.124081      0.975785
  Width                0.114507       0.107441      0.978964
  Height*Width         0.301942       0.251772      0.952060

The 'Multiple Imputation Variance Information' table shown in Output 45.7.3 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.
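As a rough check on Output 45.7.3, the Intercept row can be reproduced by hand from the standard combining rules. In this sketch, B is the between-imputation variance, W is the within-imputation variance, r is the relative increase in variance, and lambda is the fraction of missing information. The number of imputations, m = 5, is an assumption: it is the value for which the tabulated total variance is reproduced.

```latex
\begin{align*}
T &= W + \left(1 + \tfrac{1}{m}\right) B
   = 70.592292 + 1.2 \times 15.218807 \approx 88.85486 \\
r &= \frac{(1 + 1/m)\,B}{W}
   = \frac{18.262568}{70.592292} \approx 0.258705 \\
\nu_m &= (m - 1)\left(1 + \tfrac{1}{r}\right)^{2}
   = 4 \times (4.865407)^{2} \approx 94.69 \\
\lambda &= \frac{r + 2/(\nu_m + 3)}{r + 1}
   = \frac{0.258705 + 0.020473}{1.258705} \approx 0.221798 \\
RE &= \left(1 + \frac{\lambda}{m}\right)^{-1}
   = \frac{1}{1.044360} \approx 0.957525
\end{align*}
```

These values agree with the Total, Relative Increase in Variance, DF, Fraction Missing Information, and Relative Efficiency entries for Intercept.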

Output 45.7.4: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates

  Parameter        Estimate    Std Error    95% Confidence Limits        DF
  Intercept       -7.085702     9.426286    -25.8000     11.62863    94.689
  Height           2.757779     1.358049      0.0848      5.43077    287.26
  Width           -2.678006     3.064884     -8.7043      3.34830    378.93
  Height*Width    -0.191947     0.187099     -0.5647      0.18083     74.37

  Multiple Imputation Parameter Estimates

  Parameter         Minimum      Maximum
  Intercept      -11.769173    -4.203658
  Height           2.439954     3.285454
  Width           -3.349258    -1.626538
  Height*Width    -0.291998    -0.131535

  Multiple Imputation Parameter Estimates

                              t for H0:
  Parameter       Theta0   Parameter=Theta0   Pr > |t|
  Intercept            0              -0.75     0.4541
  Height               0               2.03     0.0432
  Width                0              -0.87     0.3828
  Height*Width         0              -1.03     0.3083

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.7.4 displays the combined parameter estimates with associated standard errors.
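The t statistic and confidence limits in Output 45.7.4 follow directly from the combined estimate and its standard error. The sketch below uses the Height row. The quantile 1.96826 is an approximate 97.5th percentile of the t distribution with 287.26 degrees of freedom; it is back-calculated from the printed limits rather than taken from the source.

```latex
\begin{align*}
t &= \frac{\hat{Q} - \theta_0}{\sqrt{T}}
   = \frac{2.757779 - 0}{1.358049} \approx 2.03 \\
\hat{Q} \pm t_{287.26,\,.975}\,\sqrt{T}
  &= 2.757779 \pm 1.96826 \times 1.358049
   \approx (0.0848,\; 5.4308)
\end{align*}
```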

Example 45.8. Reading Mixed Model Results with CLASS Variables

This example creates data sets containing parameter estimates and corresponding covariance matrices with CLASS variables computed by a mixed regression model analysis for a set of imputed data sets. These estimates are then combined to generate valid statistical inferences about the model parameters.

The following statements use PROC MIXED to generate the parameter estimates and covariance matrix for each imputed data set:

  proc mixed data=outfish;
     class Species;
     model Length3= Species Height Width / solution covb;
     by _Imputation_;
     ods output SolutionF=mxparms CovB=mxcovb;
  run;

The following statements display the mixed model coefficients from the first two imputations in Output 45.8.1.

  proc print data=mxparms (obs=10);
     var _Imputation_ Effect Species Estimate StdErr;
     title 'MIXED Model Coefficients (First Two Imputations)';
  run;

Output 45.8.1: PROC MIXED Model Coefficients

  MIXED Model Coefficients (First Two Imputations)

  Obs  _Imputation_  Effect     Species  Estimate   StdErr
    1        1       Intercept             6.8381   1.0290
    2        1       Species    Gp1      -0.05924   0.7253
    3        1       Species    Gp2             0        .
    4        1       Height                0.9185   0.1732
    5        1       Width                 3.2526   0.5321
    6        2       Intercept             6.9417   0.9868
    7        2       Species    Gp1       -0.3178   0.7290
    8        2       Species    Gp2             0        .
    9        2       Height                0.9544   0.1683
   10        2       Width                 3.1697   0.5079

The following statements use the MIANALYZE procedure with an input PARMS= data set.

  proc mianalyze parms(classvar=full)=mxparms;
     class Species;
     modeleffects Intercept Species Height Width;
  run;

Output 45.8.2: Multiple Imputation Variance Information

  The MIANALYZE Procedure

  Multiple Imputation Variance Information

                          -----------------Variance-----------------
  Parameter   Species        Between        Within         Total        DF
  Intercept                 0.013257      1.017462      1.033370     16879
  Species     Gp1           0.068045      0.519627      0.601281     216.9
  Species     Gp2                  0             .             .         .
  Height                    0.002691      0.028993      0.032222    398.26
  Width                     0.014947      0.270396      0.288332    1033.6

  Multiple Imputation Variance Information

                              Relative       Fraction
                              Increase        Missing      Relative
  Parameter   Species      in Variance    Information    Efficiency
  Intercept                   0.015635       0.015511      0.996907
  Species     Gp1             0.157139       0.143659      0.972071
  Species     Gp2                    .              .             .
  Height                      0.111380       0.104703      0.979489
  Width                       0.066334       0.064017      0.987358

The 'Multiple Imputation Variance Information' table shown in Output 45.8.2 displays the between-imputation, within-imputation, and total variances for combining complete-data inferences.

Output 45.8.3: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates

  Parameter   Species    Estimate   Std Error   95% Confidence Limits       DF
  Intercept              6.844098    1.016548    4.85156     8.836638    16879
  Species     Gp1       -0.184298    0.775423   -1.71263     1.344030    216.9
  Species     Gp2               0           .          .            .        .
  Height                 0.928624    0.179506    0.57573     1.281522   398.26
  Width                  3.237105    0.536966    2.18344     4.290772   1033.6

  Multiple Imputation Parameter Estimates

  Parameter   Species     Minimum     Maximum
  Intercept              6.713049    6.976758
  Species     Gp1       -0.580012    0.033160
  Species     Gp2               0           0
  Height                 0.879314    1.004623
  Width                  3.064954    3.360809

  Multiple Imputation Parameter Estimates

                                   t for H0:
  Parameter   Species   Theta0   Parameter=Theta0   Pr > |t|
  Intercept                  0               6.73     <.0001
  Species     Gp1            0              -0.24     0.8124
  Species     Gp2            0                  .          .
  Height                     0               5.17     <.0001
  Width                      0               6.03     <.0001

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.8.3 displays the combined parameter estimates with associated standard errors.

Example 45.9. Using a TEST statement

The following statements use the REG procedure to generate regression coefficients:

  proc reg data=outmi outest=outreg covout noprint;
     model Oxygen= RunTime RunPulse;
     by _Imputation_;
  run;

The following statements combine the results for the imputed data sets. A TEST statement is used to test linear hypotheses of Intercept=0 and RunTime=RunPulse.

  proc mianalyze data=outreg edf=28;
     modeleffects Intercept RunTime RunPulse;
     test Intercept, RunTime=RunPulse / mult;
  run;

Output 45.9.1: Test Specification

  The MIANALYZE Procedure
  Test: Test 1

  Test Specification

               ------------------L Matrix------------------
  Parameter      Intercept       RunTime      RunPulse           C
  TestPrm1        1.000000             0             0           0
  TestPrm2               0      1.000000     -1.000000           0

The 'Test Specification' table shown in Output 45.9.1 displays the L matrix and the c vector in a TEST statement. Since there is no label specified for the TEST statement, 'Test 1' is used as the label.
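In matrix form, the two hypotheses in the TEST statement can be written as a single linear system. The symbols beta_0, beta_1, and beta_2 are introduced here only for illustration; they stand for the Intercept, RunTime, and RunPulse coefficients in the order given in the MODELEFFECTS statement.

```latex
\[
L\beta =
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
=
\begin{pmatrix} \beta_0 \\ \beta_1 - \beta_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}
= c
\]
```

The first row (TestPrm1) tests Intercept=0 and the second row (TestPrm2) tests RunTime=RunPulse.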

Output 45.9.2: Multiple Imputation Variance Information

  The MIANALYZE Procedure
  Test: Test 1

  Multiple Imputation Variance Information

               -----------------Variance-----------------
  Parameter       Between        Within         Total        DF
  TestPrm1      45.529229     76.543614    131.178689    9.1917
  TestPrm2       0.014715      0.114324      0.131983    20.598

  Multiple Imputation Variance Information

                   Relative       Fraction
                   Increase        Missing      Relative
  Parameter     in Variance    Information    Efficiency
  TestPrm1         0.713777       0.461277      0.915537
  TestPrm2         0.154459       0.141444      0.972490

The 'Multiple Imputation Variance Information' table shown in Output 45.9.2 displays the between-imputation variance, within-imputation variance, and total variance for each univariate inference. A detailed description of these statistics is provided in the 'Combining Inferences from Imputed Data Sets' section on page 2624 and the 'Multiple Imputation Efficiency' section on page 2626.
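Because EDF=28 is specified, the degrees of freedom in Output 45.9.2 are much smaller than the large-sample value. The following sketch reproduces the TestPrm1 row, assuming m = 5 imputations and the usual small-sample adjustment with complete-data degrees of freedom nu_0 = 28. The symbols gamma, nu_obs, and nu* are notation chosen for this illustration.

```latex
\begin{align*}
T &= 76.543614 + 1.2 \times 45.529229 \approx 131.178689 \\
r &= \frac{1.2 \times 45.529229}{76.543614} \approx 0.713777 \\
\nu_m &= (m - 1)\left(1 + \tfrac{1}{r}\right)^{2}
   = 4 \times (2.400998)^{2} \approx 23.059 \\
\gamma &= \frac{(1 + 1/m)\,B}{T}
   = \frac{54.635075}{131.178689} \approx 0.416494 \\
\hat{\nu}_{obs} &= (1 - \gamma)\,\frac{\nu_0(\nu_0 + 1)}{\nu_0 + 3}
   = 0.583506 \times \frac{28 \times 29}{31} \approx 15.284 \\
\nu^{*} &= \left(\frac{1}{\nu_m} + \frac{1}{\hat{\nu}_{obs}}\right)^{-1}
   = \left(\frac{1}{23.059} + \frac{1}{15.284}\right)^{-1} \approx 9.19
\end{align*}
```

The result matches the tabulated DF of 9.1917 for TestPrm1.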

Output 45.9.3: Multiple Imputation Parameter Estimates

  The MIANALYZE Procedure
  Test: Test 1

  Multiple Imputation Parameter Estimates

  Parameter     Estimate   Std Error   95% Confidence Limits       DF
  TestPrm1     90.837440   11.453327   65.01034     116.6645   9.1917
  TestPrm2     -2.964292    0.363294   -3.72070      -2.2079   20.598

  Multiple Imputation Parameter Estimates

                                                t for H0:
  Parameter      Minimum      Maximum     C   Parameter=C   Pr > |t|
  TestPrm1     83.020730   100.839807     0          7.93     <.0001
  TestPrm2     -3.091586    -2.763582     0         -8.16     <.0001

The 'Multiple Imputation Parameter Estimates' table shown in Output 45.9.3 displays the estimated mean and standard error of the linear components. The inferences are based on the t distribution. The table also displays a 95% mean confidence interval and a t test with the associated p-value for the hypothesis that each linear component of $L\beta$ is equal to zero.

Output 45.9.4: Multiple Imputation Multivariate Inference

  The MIANALYZE Procedure
  Test: Test 1

  Multiple Imputation Multivariate Inference
  Assuming Proportionality of Between/Within Covariance Matrices

  Avg Relative
      Increase                           F for H0:
   in Variance   Num DF   Den DF   Parameter=Theta0   Pr > F
      0.419868        2   35.053              60.34   <.0001

Example 45.10. Combining Correlation Coefficients

This example combines sample correlation coefficients computed from a set of imputed data sets using Fisher's z transformation.

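Fisher's z transformation of the sample correlation r is

$$ z = \frac{1}{2} \log\left( \frac{1+r}{1-r} \right) $$

The statistic z is approximately normally distributed with mean

$$ \frac{1}{2} \log\left( \frac{1+\rho}{1-\rho} \right) $$

and variance $1/(n-3)$, where $\rho$ is the population correlation coefficient and n is the number of observations.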

The following statements use the CORR procedure to compute the correlation r and its associated Fisher's z statistic between variables Oxygen and RunTime for each imputed data set. The ODS statement is used to save Fisher's z statistic in an output data set.

  proc corr data=outmi fisher(biasadj=no);
     var Oxygen RunTime;
     by _Imputation_;
     ods output FisherPearsonCorr= outz;
  run;

The following statements display the number of observations and Fisher's z statistic for each imputed data set in Output 45.10.1.

  proc print data=outz;
     title 'Fisher''s Correlation Statistics';
     var _Imputation_ NObs ZVal;
  run;

Output 45.10.1: Output z Statistics

  Fisher's Correlation Statistics

  Obs  _Imputation_  NObs      ZVal
    1        1         31  -1.27869
    2        2         31  -1.30715
    3        3         31  -1.27922
    4        4         31  -1.39243
    5        5         31  -1.40146

The following statements generate the standard error associated with the z statistic, $1/\sqrt{n-3}$:

  data outz;
     set outz;
     StdZ= 1. / sqrt(NObs-3);
  run;

The following statements use the MIANALYZE procedure to generate a combined parameter estimate and its variance, as shown in Output 45.10.2. The ODS statement is used to save the parameter estimates in an output data set.

  proc mianalyze data=outz;
     ods output ParameterEstimates=parms;
     modeleffects ZVal;
     stderr StdZ;
  run;

Output 45.10.2: Combining Fisher's z statistics

  The MIANALYZE Procedure

  Multiple Imputation Parameter Estimates

  Parameter    Estimate   Std Error   95% Confidence Limits       DF
  ZVal        -1.331787    0.200327   -1.72587     -0.93771   330.23

  Multiple Imputation Parameter Estimates

  Parameter     Minimum     Maximum
  ZVal        -1.401459   -1.278686

  Multiple Imputation Parameter Estimates

                        t for H0:
  Parameter   Theta0   Parameter=Theta0   Pr > |t|
  ZVal             0              -6.65     <.0001
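The standard error in Output 45.10.2 can be reproduced approximately by hand. In this sketch, W is the common within-imputation variance implied by StdZ with n = 31, and B is the sample variance of the five printed ZVal values. Because the inputs are rounded to five decimals, the last digits differ slightly from the printed output.

```latex
\begin{align*}
W &= \frac{1}{n - 3} = \frac{1}{28} \approx 0.035714 \\
B &= \frac{1}{m - 1} \sum_{i=1}^{5} \left(z_i - \bar{z}\right)^{2}
   \approx \frac{0.014721}{4} \approx 0.003680 \\
T &= W + \left(1 + \tfrac{1}{m}\right) B
   \approx 0.035714 + 1.2 \times 0.003680 \approx 0.040131 \\
\sqrt{T} &\approx 0.2003 \\
\nu_m &= (m - 1)\left(1 + \frac{W}{(1 + 1/m)\,B}\right)^{2}
   \approx 4 \times (9.0867)^{2} \approx 330.3
\end{align*}
```

These values agree with the printed standard error of 0.200327 and DF of 330.23 up to rounding of the inputs. No EDF= option is specified in this example, so the degrees of freedom are not adjusted.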

In addition to the estimate for z, PROC MIANALYZE also generates 95% confidence limits for z, $\hat{z}_{.025}$ and $\hat{z}_{.975}$. The following statements print the estimate and 95% confidence limits for z in Output 45.10.3.

  proc print data=parms;
     title 'Parameter Estimates with 95% Confidence Limits';
     var Estimate LCLMean UCLMean;
  run;

Output 45.10.3: Parameter Estimates with 95% Confidence Limits

  Parameter Estimates with 95% Confidence Limits

  Obs    Estimate    LCLMean    UCLMean
    1   -1.331787   -1.72587   -0.93771

An estimate of the correlation coefficient and its 95% confidence limits are then generated from the following inverse transformation, as described in the 'Correlation Coefficients' section on page 2630:

$$ r = \tanh(z) = \frac{e^{2z} - 1}{e^{2z} + 1} $$

for $z = \hat{z}$, $\hat{z}_{.025}$, and $\hat{z}_{.975}$.

The following statements generate and display an estimate of the correlation coefficient and its 95% confidence limits.

  data corr_ci;
     set parms;
     r= tanh(Estimate);
     r_lower= tanh(LCLMean);
     r_upper= tanh(UCLMean);
  run;

  proc print data=corr_ci;
     title 'Estimated Correlation Coefficient'
           ' with 95% Confidence Limits';
     var r r_lower r_upper;
  run;

Output 45.10.4: Estimated Correlation Coefficient

  Estimated Correlation Coefficient with 95% Confidence Limits

  Obs          r    r_lower    r_upper
    1   -0.86969   -0.93857   -0.73417
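The magnitude of the estimated correlation can be checked by hand from the combined z estimate. This is a sketch of the arithmetic only; the intermediate value of the exponential is rounded.

```latex
\begin{align*}
\tanh(1.331787)
  &= \frac{e^{2 \times 1.331787} - 1}{e^{2 \times 1.331787} + 1}
   \approx \frac{14.3475 - 1}{14.3475 + 1}
   = \frac{13.3475}{15.3475} \approx 0.86969
\end{align*}
```

Because tanh is an odd function, $\tanh(-z) = -\tanh(z)$, so the sign of the z estimate carries over to r. The same calculation applied to the two confidence limits for z gives the limits for r.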