Examples


The Fish data described in the STEPDISC procedure are measurements of 159 fish of seven species caught in Finland's Lake Laengelmavesi. For each fish, the length, height, and width are measured. Three different length measurements are recorded: from the nose of the fish to the beginning of its tail (Length1), from the nose to the notch of its tail (Length2), and from the nose to the end of its tail (Length3). See Chapter 67, 'The STEPDISC Procedure,' for more information.

The Fish1 data set is constructed from the Fish data set and contains only one species of fish and the three length measurements. Some values have been set to missing, and the resulting data set has a monotone missing pattern in the variables Length1, Length2, and Length3. The Fish1 data set is used in Example 44.2 with the propensity score method and in Example 44.3 with the regression method.

The Fish2 data set is also constructed from the Fish data set and contains two species of fish. Some values have been set to missing, and the resulting data set has a monotone missing pattern in the variables Length3, Height, Width, and Species. The Fish2 data set is used in Example 44.4 with the logistic regression method and in Example 44.5 with the discriminant function method. Note that some values of the variable Species have also been altered in the data set.

The FitMiss data set created in the 'Getting Started' section is used in other examples. The following statements create the Fish1 data set.

  /*----------- Fishes of Species Bream ----------*/
  data Fish1;
     title 'Fish Measurement Data';
     input Length1 Length2 Length3 @@;
     datalines;
  23.2 25.4 30.0    24.0 26.3 31.2    23.9 26.5 31.1
  26.3 29.0 33.5    26.5 29.0   .     26.8 29.7 34.7
  26.8   .    .     27.6 30.0 35.0    27.6 30.0 35.1
  28.5 30.7 36.2    28.4 31.0 36.2    28.7   .    .
  29.1 31.5   .     29.5 32.0 37.3    29.4 32.0 37.2
  29.4 32.0 37.2    30.4 33.0 38.3    30.4 33.0 38.5
  30.9 33.5 38.6    31.0 33.5 38.7    31.3 34.0 39.5
  31.4 34.0 39.2    31.5 34.5   .     31.8 35.0 40.6
  31.9 35.0 40.5    31.8 35.0 40.9    32.0 35.0 40.6
  32.7 36.0 41.5    32.8 36.0 41.6    33.5 37.0 42.6
  35.0 38.5 44.1    35.0 38.5 44.0    36.2 39.5 45.3
  37.4 41.0 45.9    38.0 41.0 46.5
  ;
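
Before running a monotone method, it can help to confirm that the data really follow the pattern described above. The following DATA step is a minimal sketch of such a check, assuming the Fish1 data set has already been created; the data set name MonoCheck and the counter nViolate are illustrative names, not part of the original example.

```sas
/* Count observations that break the monotone pattern            */
/* Length1 -> Length2 -> Length3: once a variable is missing,    */
/* every variable after it in this order must be missing too.    */
/* MonoCheck and nViolate are illustrative names.                */
data MonoCheck(keep=nViolate);
   set Fish1 end=last;
   /* a missing Length1 requires Length2 and Length3 to be missing */
   if missing(Length1) and (not missing(Length2) or not missing(Length3))
      then nViolate + 1;
   /* a missing Length2 requires Length3 to be missing */
   else if missing(Length2) and not missing(Length3)
      then nViolate + 1;
   if last then do;
      put 'Observations breaking the monotone pattern: ' nViolate;
      output;
   end;
run;
```

A count of zero in the log indicates that the variables can be used in this order in the VAR statement of a monotone method.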

The Fish2 data set contains two of the seven species in the Fish data set. For each of the two species (Bream and Parkki), the length from the nose of the fish to the end of its tail, the height, and the width of each fish are measured. The height and width are recorded as percentages of the length variable.

The following statements create the Fish2 data set.

  /*-------- Fishes of Species Bream and Parkki --------*/
  data Fish2 (drop=HtPct WidthPct);
     title 'Fish Measurement Data';
     input Species $ Length3 HtPct WidthPct @@;
     /* convert the percentages of Length3 to actual measurements */
     Height= HtPct*Length3/100;
     Width= WidthPct*Length3/100;
     datalines;
  Gp1  30.0 38.4 13.4   Gp1  31.2 40.0 13.8   Gp1  31.1 39.8 15.1
    .  33.5 38.0   .      .  34.0 36.6 15.1   Gp1  34.7 39.2 14.2
  Gp1  34.5 41.1 15.3   Gp1  35.0 36.2 13.4   Gp1  35.1 39.9 13.8
    .  36.2 39.3 13.7   Gp1  36.2 39.4 14.1     .  36.2 39.7 13.3
  Gp1  36.4 37.8 12.0     .  37.3 37.3 13.6   Gp1  37.2 40.2 13.9
  Gp1  37.2 41.5 15.0   Gp1  38.3 38.8 13.8   Gp1  38.5 38.8 13.5
  Gp1  38.6 40.5 13.3   Gp1  38.7 37.4 14.8   Gp1  39.5 38.3 14.1
  Gp1  39.2 40.8 13.7     .  39.7 39.1   .    Gp1  40.6 38.1 15.1
  Gp1  40.5 40.1 13.8   Gp1  40.9 40.0 14.8   Gp1  40.6 40.3 15.0
  Gp1  41.5 39.8 14.1   Gp2  41.6 40.6 14.9   Gp1  42.6 44.5 15.5
  Gp1  44.1 40.9 14.3   Gp1  44.0 41.1 14.3   Gp1  45.3 41.4 14.9
  Gp1  45.9 40.6 14.7   Gp1  46.5 37.9 13.7   Gp2  16.2 25.6 14.0
  Gp2  20.3 26.1 13.9   Gp2  21.2 26.3 13.7   Gp2  22.2 25.3 14.3
  Gp2  22.2 28.0 16.1   Gp2  22.8 28.4 14.7   Gp2  23.1 26.7 14.7
    .  23.7 25.8 13.9   Gp2  24.7 23.5 15.2   Gp1  24.3 27.3 14.6
  Gp2  25.3 27.8 15.1   Gp2  25.0 26.2 13.3   Gp2  25.0 25.6 15.2
  Gp2  27.2 27.7 14.1   Gp2  26.7 25.9 13.6     .  26.8 27.6 15.4
  Gp2  27.9 25.4 14.0   Gp2  29.2 30.4 15.4   Gp2  30.6 28.0 15.6
  Gp2  35.0 27.1 15.3
  ;

Example 44.1. EM Algorithm for MLE

This example uses the EM algorithm to compute the maximum likelihood estimates for parameters of a multivariate normal distribution using data with missing values. The following statements invoke the MI procedure and request the EM algorithm to compute the MLE for (μ, Σ) of a multivariate normal distribution from the input data set FitMiss.

  proc mi data=FitMiss seed=1518971 simple nimpute=0;
     em itprint outem=outem;
     var Oxygen RunTime RunPulse;
  run;

Note that when you specify the NIMPUTE=0 option, the missing values will not be imputed. The procedure generates the following output:

The ' Model Information ' table shown in Output 44.1.1 describes the method and options used in the procedure if a positive number is specified in the NIMPUTE= option.

Output 44.1.1: Model Information
  Fish Measurement Data

  The MI Procedure

  Model Information

  Data Set                             WORK.FITMISS
  Method                               MCMC
  Multiple Imputation Chain            Single Chain
  Initial Estimates for MCMC           EM Posterior Mode
  Start                                Starting Value
  Prior                                Jeffreys
  Number of Imputations                0
  Number of Burn-in Iterations         200
  Number of Iterations                 100
  Seed for random number generator     1518971
 

The 'Missing Data Patterns' table shown in Output 44.1.2 lists distinct missing data patterns with corresponding frequencies and percents. Here, a value of 'X' means that the variable is observed in the corresponding group and a value of '.' means that the variable is missing. The table also displays group-specific variable means.

Output 44.1.2: Missing Data Patterns
  The MI Procedure

  Missing Data Patterns

                       Run      Run
  Group    Oxygen      Time     Pulse        Freq     Percent
      1    X           X        X              21       67.74
      2    X           X        .               4       12.90
      3    X           .        .               3        9.68
      4    .           X        X               1        3.23
      5    .           X        .               2        6.45

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse
      1       46.353810       10.809524      171.666667
      2       47.109500       10.137500               .
      3       52.461667               .               .
      4               .       11.950000      176.000000
      5               .        9.885000               .
 

With the SIMPLE option, the procedure displays simple descriptive univariate statistics for available cases in the 'Univariate Statistics' table shown in Output 44.1.3 and correlations from pairwise available cases in the 'Pairwise Correlations' table shown in Output 44.1.4.

Output 44.1.3: Univariate Statistics
  The MI Procedure

  Univariate Statistics

  Variable           N          Mean       Std Dev       Minimum       Maximum
  Oxygen            28      47.11618       5.41305      37.38800      60.05500
  RunTime           28      10.68821       1.37988       8.63000      14.03000
  RunPulse          22     171.86364      10.14324     148.00000     186.00000

  Univariate Statistics

              ---Missing Values---
  Variable        Count    Percent
  Oxygen              3       9.68
  RunTime             3       9.68
  RunPulse            9      29.03
 
Output 44.1.4: Pairwise Correlations
  The MI Procedure

  Pairwise Correlations

                     Oxygen          RunTime         RunPulse
  Oxygen        1.000000000     -0.849118562     -0.343961742
  RunTime      -0.849118562      1.000000000      0.247258191
  RunPulse     -0.343961742      0.247258191      1.000000000
 

With the EM statement, the procedure displays the initial parameter estimates for EM in the 'Initial Parameter Estimates for EM' table shown in Output 44.1.5.

Output 44.1.5: Initial Parameter Estimates for EM
  The MI Procedure

  Initial Parameter Estimates for EM

  _TYPE_    _NAME_            Oxygen         RunTime        RunPulse
  MEAN                     47.116179       10.688214      171.863636
  COV       Oxygen         29.301078               0               0
  COV       RunTime                0        1.904067               0
  COV       RunPulse               0               0      102.885281
 

With the ITPRINT option in the EM statement, the 'EM (MLE) Iteration History' table shown in Output 44.1.6 displays the iteration history for the EM algorithm.

Output 44.1.6: EM (MLE) Iteration History
  The MI Procedure

  EM (MLE) Iteration History

  _Iteration_      -2 Log L         Oxygen         RunTime        RunPulse
            0    289.544782      47.116179       10.688214      171.863636
            1    263.549489      47.116179       10.688214      171.863636
            2    255.851312      47.139089       10.603506      171.538203
            3    254.616428      47.122353       10.571685      171.426790
            4    254.494971      47.111080       10.560585      171.398296
            5    254.483973      47.106523       10.556768      171.389208
            6    254.482920      47.104899       10.555485      171.385257
            7    254.482813      47.104348       10.555062      171.383345
            8    254.482801      47.104165       10.554923      171.382424
            9    254.482800      47.104105       10.554878      171.381992
           10    254.482800      47.104086       10.554864      171.381796
           11    254.482800      47.104079       10.554859      171.381708
           12    254.482800      47.104077       10.554858      171.381669
 

The 'EM (MLE) Parameter Estimates' table shown in Output 44.1.7 displays the maximum likelihood estimates for μ and Σ of a multivariate normal distribution from the data set FitMiss.

Output 44.1.7: EM (MLE) Parameter Estimates
  The MI Procedure

  EM (MLE) Parameter Estimates

  _TYPE_    _NAME_            Oxygen         RunTime        RunPulse
  MEAN                     47.104077       10.554858      171.381669
  COV       Oxygen         27.797931       -6.457975      -18.031298
  COV       RunTime        -6.457975        2.015514        3.516287
  COV       RunPulse      -18.031298        3.516287       97.766857
 

You can also output the EM (MLE) parameter estimates into an output data set with the OUTEM= option. The following statements list the observations in the output data set outem .

  proc print data=outem;
     title 'EM Estimates';
  run;

The output data set outem shown in Output 44.1.8 is a TYPE=COV data set. The observation with _TYPE_='MEAN' contains the MLE for the parameter μ, and the observations with _TYPE_='COV' contain the MLE for the parameter Σ of a multivariate normal distribution from the data set FitMiss.

Output 44.1.8: EM Estimates
  EM Estimates

  Obs    _TYPE_    _NAME_        Oxygen     RunTime    RunPulse
    1    MEAN                   47.1041     10.5549     171.382
    2    COV       Oxygen       27.7979     -6.4580     -18.031
    3    COV       RunTime      -6.4580      2.0155       3.516
    4    COV       RunPulse    -18.0313      3.5163      97.767
 

Example 44.2. Propensity Score Method

This example uses the propensity score method to impute missing values for variables in a data set with a monotone missing pattern. The following statements invoke the MI procedure and request the propensity score method. The resulting data set is named outex2 .

  proc mi data=Fish1 seed=899603 out=outex2;
     monotone propensity;
     var Length1 Length2 Length3;
  run;

Note that the VAR statement is required and the data set must have a monotone missing pattern with variables as ordered in the VAR statement. The procedure generates the following output:

The 'Model Information' table shown in Output 44.2.1 describes the method and options used in the multiple imputation process. By default, five imputations are created for the missing data.

Output 44.2.1: Model Information
  The MI Procedure

  Model Information

  Data Set                             WORK.FISH1
  Method                               Monotone
  Number of Imputations                5
  Seed for random number generator     899603
 

When monotone methods are used in the imputation, MONOTONE is displayed as the method. The 'Monotone Model Specification' table shown in Output 44.2.2 displays the detailed model specification. By default, the observations are sorted into five groups based on their propensity scores.

Output 44.2.2: Monotone Model Specification
  The MI Procedure

  Monotone Model Specification

                             Imputed
  Method                     Variables
  Propensity(Groups= 5)      Length2 Length3
 
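The default of five propensity score groups can also be written out explicitly. The following statements are a sketch that assumes the NGROUPS= option of the PROPENSITY method and the Fish1 data set created earlier; with NGROUPS=5 the model specification should match Output 44.2.2, and a different value changes how finely observations are grouped by propensity score.

```sas
/* Same request as above, with the imputed variables and the     */
/* number of propensity score groups stated explicitly.          */
/* NGROUPS=5 is assumed here to be equivalent to the default.    */
proc mi data=Fish1 seed=899603 out=outex2;
   monotone propensity(Length2 Length3 / ngroups=5);
   var Length1 Length2 Length3;
run;
```

With few observations per missing pattern, as in Fish1, a large number of groups can leave some groups with very few observed values to draw from, so the value is a judgment call rather than a fixed rule.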

Without covariates specified for imputed variables Length2 and Length3 , the variable Length1 is used as the covariate for Length2 , and variables Length1 and Length2 are used as covariates for Length3 .

The 'Missing Data Patterns' table shown in Output 44.2.3 lists distinct missing data patterns with corresponding frequencies and percents. Here, values of 'X' and '.' indicate that the variable is observed or missing in the corresponding group. The table confirms a monotone missing pattern for these three variables.

Output 44.2.3: Missing Data Patterns
  The MI Procedure

  Missing Data Patterns

  Group    Length1    Length2    Length3        Freq     Percent
      1    X          X          X                30       85.71
      2    X          X          .                 3        8.57
      3    X          .          .                 2        5.71

  Missing Data Patterns

           -----------------Group Means----------------
  Group         Length1         Length2         Length3
      1       30.603333       33.436667       38.720000
      2       29.033333       31.666667               .
      3       27.750000               .               .
 

For the imputation process, the missing values of Length2 in Group 3 are first imputed using observed values of Length1. Then the missing values of Length3 in Group 2 are imputed using observed values of Length1 and Length2. Finally, the missing values of Length3 in Group 3 are imputed using observed values of Length1 and imputed values of Length2.

After the completion of m imputations, the 'Multiple Imputation Variance Information' table shown in Output 44.2.4 displays the between-imputation variance, within-imputation variance, and total variance for combining complete-data inferences. It also displays the degrees of freedom for the total variance. The relative increase in variance due to missingness, the fraction of missing information, and the relative efficiency for each variable are also displayed. A detailed description of these statistics is provided in the 'Combining Inferences from Multiply Imputed Data Sets' section on page 2561.

Output 44.2.4: Variance Information
  The MI Procedure

  Multiple Imputation Variance Information

               -----------------Variance-----------------
  Variable         Between         Within          Total       DF
  Length2         0.001500       0.465422       0.467223   32.034
  Length3         0.049725       0.547434       0.607104   27.103

  Multiple Imputation Variance Information

                  Relative       Fraction
                  Increase        Missing       Relative
  Variable     in Variance    Information     Efficiency
  Length2         0.003869       0.003861       0.999228
  Length3         0.108999       0.102610       0.979891
 
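The statistics in Output 44.2.4 can be recomputed by hand from the between- and within-imputation variances. The following DATA step is a sketch that assumes the standard combining rules for m imputations (total variance T = W + (1 + 1/m)B, relative increase r = (1 + 1/m)B/W); the data set name VarInfo and its variable names are illustrative, and the inputs are the rounded values printed in the table, so results agree with the output only up to rounding.

```sas
/* Recompute the combined variance statistics from the printed   */
/* Between and Within values, with m = 5 imputations.            */
/* VarInfo and all variable names here are illustrative.         */
data VarInfo;
   input Variable $ Between Within;
   m        = 5;                                   /* number of imputations        */
   Total    = Within + (1 + 1/m) * Between;        /* total variance               */
   RelInc   = (1 + 1/m) * Between / Within;        /* relative increase in variance */
   DFm      = (m - 1) * (1 + 1/RelInc)**2;         /* unadjusted degrees of freedom */
   FracMiss = (RelInc + 2/(DFm + 3)) / (RelInc + 1); /* fraction of missing info    */
   RelEff   = 1 / (1 + FracMiss/m);                /* relative efficiency           */
   datalines;
Length2 0.001500 0.465422
Length3 0.049725 0.547434
;

proc print data=VarInfo noobs;
   var Variable Total RelInc FracMiss RelEff;
   format Total RelInc FracMiss RelEff 10.6;
run;
```

The DF column in the output is not reproduced by DFm; the printed values appear to use an adjustment for the finite number of observations, which this sketch leaves out.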

The 'Multiple Imputation Parameter Estimates' table shown in Output 44.2.5 displays the estimated mean and standard error of the mean for each variable. The inferences are based on the t -distributions. For each variable, the table also displays a 95% mean confidence interval and a t -statistic with the associated p -value for the hypothesis that the population mean is equal to the value specified in the MU0= option, which is zero by default.

Output 44.2.5: Parameter Estimates
  The MI Procedure

  Multiple Imputation Parameter Estimates

  Variable            Mean      Std Error    95% Confidence Limits        DF
  Length2        33.006857       0.683537     31.61460     34.39912   32.034
  Length3        38.361714       0.779169     36.76328     39.96015   27.103

  Multiple Imputation Parameter Estimates

                                                          t for H0:
  Variable         Minimum        Maximum            Mu0    Mean=Mu0   Pr > |t|
  Length2        32.957143      33.060000              0       48.29     <.0001
  Length3        38.080000      38.545714              0       49.23     <.0001
 

The following statements list the first ten observations of the data set outex2, as shown in Output 44.2.6. The missing values are imputed from observed values with similar propensity scores.

  proc print data=outex2(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.2.6: Imputed Data Set
  First 10 Observations of the Imputed Data Set

  Obs   _Imputation_    Length1    Length2    Length3
    1          1           23.2       25.4       30.0
    2          1           24.0       26.3       31.2
    3          1           23.9       26.5       31.1
    4          1           26.3       29.0       33.5
    5          1           26.5       29.0       38.6
    6          1           26.8       29.7       34.7
    7          1           26.8       29.0       35.0
    8          1           27.6       30.0       35.0
    9          1           27.6       30.0       35.1
   10          1           28.5       30.7       36.2

Example 44.3. Regression Method

This example uses the regression method to impute missing values for all variables in a data set with a monotone missing pattern. The following statements invoke the MI procedure and request the regression method for variable Length2 and the predictive mean matching method for variable Length3 . The resulting data set is named outex3 .

  proc mi data=Fish1 round=.1 mu0= 0 35 45
          seed=13951639 out=outex3;
     monotone reg(Length2/ details)
              regpmm(Length3= Length1 Length2 Length1*Length2/ details);
     var Length1 Length2 Length3;
  run;

The ROUND= option is used to round the imputed values to the same precision as observed values. The values specified with the ROUND= option are matched with the variables Length1, Length2, and Length3 in the order listed in the VAR statement. The MU0= values are matched with the variables in the same order (0 for Length1, 35 for Length2, and 45 for Length3), so the option requests t tests for the hypotheses that the population means of the imputed variables are Length2=35 and Length3=45.

Note that an optimal K= value is currently not available for the REGPMM option in the literature on multiple imputation. The default K=5 is experimental and may change in future releases.

The 'Missing Data Patterns' table lists distinct missing data patterns with corresponding frequencies and percents. It is identical to the table displayed in Output 44.2.3 in the previous example.

The 'Monotone Model Specification' table shown in Output 44.3.1 displays the model specification.

Output 44.3.1: Monotone Model Specification
  Fish Measurement Data

  The MI Procedure

  Monotone Model Specification

                            Imputed
  Method                    Variables
  Regression                Length2
  Regression-PMM(K= 5)      Length3
 

With the DETAILS option, the parameters estimated from the observed data and the parameters used in each imputation are displayed in Output 44.3.2 and Output 44.3.3.

Output 44.3.2: Regression Model
  The MI Procedure

  Regression Models for Monotone Method

  Imputed                             ----------------Imputation----------------
  Variable   Effect        Obs-Data              1              2              3
  Length2    Intercept      0.04249       0.049184       0.055470       0.051346
  Length2    Length1        0.98587       1.001934       0.995275       0.992294

  Regression Models for Monotone Method

  Imputed                ---------Imputation---------
  Variable   Effect                 4               5
  Length2    Intercept       0.064193        0.030719
  Length2    Length1         0.983122        0.995883
 
Output 44.3.3: Regression Predicted Mean Matching Model
  The MI Procedure

  Regression Models for Monotone Predicted Mean Matching Method

  Imputed                                  ---------------Imputation---------------
  Variable   Effect              Obs-Data             1             2             3
  Length3    Intercept            0.01304      0.004134      0.011417      0.034177
  Length3    Length1              0.01332      0.025320      0.037494      0.308765
  Length3    Length2              0.98918      0.955510      1.025741      0.673374
  Length3    Length1*Length2      0.02521      0.034964      0.022017      0.017919

  Regression Models for Monotone Predicted Mean Matching Method

  Imputed                      ---------Imputation---------
  Variable   Effect                       4               5
  Length3    Intercept             0.010532        0.004685
  Length3    Length1               0.156606        0.147118
  Length3    Length2               0.828384        1.146440
  Length3    Length1*Length2       0.029335        0.034671
 

After the five imputations (the default number) are completed, the 'Multiple Imputation Variance Information' table shown in Output 44.3.4 displays the between-imputation variance, within-imputation variance, and total variance for combining complete-data inferences. The relative increase in variance due to missingness, the fraction of missing information, and the relative efficiency for each variable are also displayed. These statistics are described in the 'Combining Inferences from Multiply Imputed Data Sets' section on page 2561.

Output 44.3.4: Variance Information
  The MI Procedure

  Multiple Imputation Variance Information

               -----------------Variance-----------------
  Variable         Between         Within          Total       DF
  Length2         0.000133       0.439512       0.439672    32.15
  Length3         0.000386       0.486913       0.487376   32.131

  Multiple Imputation Variance Information

                  Relative       Fraction
                  Increase        Missing       Relative
  Variable     in Variance    Information     Efficiency
  Length2         0.000363       0.000363       0.999927
  Length3         0.000952       0.000951       0.999810
 

The 'Multiple Imputation Parameter Estimates' table shown in Output 44.3.5 displays a 95% mean confidence interval and a t -statistic with its associated p -value for each of the hypotheses requested with the MU0= option.

Output 44.3.5: Parameter Estimates
  The MI Procedure

  Multiple Imputation Parameter Estimates

  Variable            Mean      Std Error    95% Confidence Limits        DF
  Length2        33.104571       0.663078     31.75417     34.45497    32.15
  Length3        38.424571       0.698123     37.00277     39.84637   32.131

  Multiple Imputation Parameter Estimates

                                                          t for H0:
  Variable         Minimum        Maximum            Mu0    Mean=Mu0   Pr > |t|
  Length2        33.088571      33.117143      35.000000       -2.86     0.0074
  Length3        38.397143      38.445714      45.000000       -9.42     <.0001
 
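The t statistics and confidence limits in Output 44.3.5 follow from the combined mean, its standard error, and the degrees of freedom. The following DATA step is a sketch that recomputes them from the printed values with the TINV and PROBT functions; the data set name TTest and its variable names are illustrative, and small differences from the output are expected because the inputs are rounded.

```sas
/* Recompute the t test of H0: Mean = Mu0 and the 95% limits     */
/* from the printed Mean, Std Error, and DF.                     */
/* TTest and all variable names here are illustrative.           */
data TTest;
   input Variable $ Mean StdErr DF Mu0;
   t     = (Mean - Mu0) / StdErr;            /* t statistic for H0          */
   Prob  = 2 * (1 - probt(abs(t), DF));      /* two-sided p-value           */
   Half  = tinv(0.975, DF) * StdErr;         /* half-width of the 95% CI    */
   Lower = Mean - Half;
   Upper = Mean + Half;
   datalines;
Length2 33.104571 0.663078 32.15  35
Length3 38.424571 0.698123 32.131 45
;

proc print data=TTest noobs;
   var Variable t Prob Lower Upper;
run;
```

Because the degrees of freedom are not integers, the sketch relies on TINV and PROBT accepting fractional values, which they do for the t distribution.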

The following statements list the first ten observations of the data set outex3 in Output 44.3.6. Note that the imputed values of Length2 are rounded to the same precision as the observed values.

  proc print data=outex3(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.3.6: Imputed Data Set
  First 10 Observations of the Imputed Data Set

  Obs   _Imputation_    Length1    Length2    Length3
    1          1           23.2       25.4       30.0
    2          1           24.0       26.3       31.2
    3          1           23.9       26.5       31.1
    4          1           26.3       29.0       33.5
    5          1           26.5       29.0       34.7
    6          1           26.8       29.7       34.7
    7          1           26.8       28.8       34.7
    8          1           27.6       30.0       35.0
    9          1           27.6       30.0       35.1
   10          1           28.5       30.7       36.2
 

Example 44.4. Logistic Regression Method for CLASS Variables

This example uses the logistic regression method to impute values for a binary variable in a data set with a monotone missing pattern.

The logistic regression method is used for the binary and ordinal CLASS variables. Since the variable Species is not an ordinal variable, only the first two species are used.

  proc mi data=Fish2 seed=1305417 out=outex4;
     class Species;
     monotone logistic(Species= Height Width Height*Width/ details);
     var Height Width Species;
  run;

The 'Model Information' table shown in Output 44.4.1 describes the method and options used in the multiple imputation process.

Output 44.4.1: Model Information
  The MI Procedure

  Model Information

  Data Set                             WORK.FISH2
  Method                               Monotone
  Number of Imputations                5
  Seed for random number generator     1305417
 

The 'Monotone Model Specification' table shown in Output 44.4.2 describes methods and imputed variables in the imputation model. The procedure uses the logistic regression method to impute variable Species in the model. Missing values in other variables are not imputed.

Output 44.4.2: Monotone Model Specification
  The MI Procedure

  Monotone Model Specification

                         Imputed
  Method                 Variables
  Logistic Regression    Species
 

The 'Missing Data Patterns' table shown in Output 44.4.3 lists distinct missing data patterns with corresponding frequencies and percents. The table confirms a monotone missing pattern for these variables.

Output 44.4.3: Missing Data Patterns
  The MI Procedure

  Missing Data Patterns

                                                     --------Group Means-------
  Group  Height  Width  Species      Freq  Percent        Height         Width
      1  X       X      X              47    85.45     12.097645      4.808204
      2  X       X      .               6    10.91     11.411050      4.567050
      3  X       .      .               2     3.64     14.126350             .
 

With the DETAILS option, parameters estimated from the observed data and the parameters used in each imputation are displayed in the 'Logistic Models for Monotone Method' table in Output 44.4.4.

Output 44.4.4: Logistic Regression Model
  The MI Procedure

  Logistic Models for Monotone Method

  Imputed                             ---------------Imputation---------------
  Variable   Effect          Obs-Data             1             2             3
  Species    Intercept        2.65234      1.794014      5.392323      5.859932
  Species    Height           7.73757      3.727095     11.790557     12.200408
  Species    Width            5.25709      1.209209      8.492849      8.696497
  Species    Height*Width     1.12990      1.593964      1.989302      3.087310

  Logistic Models for Monotone Method

  Imputed                   ---------Imputation---------
  Variable   Effect                    4               5
  Species    Intercept          0.649860        6.393629
  Species    Height             2.449332       13.644077
  Species    Width              0.629963       10.767135
  Species    Height*Width       0.979165        2.389491
 

The following statements list the first ten observations of the data set outex4 in Output 44.4.5.

  proc print data=outex4(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.4.5: Imputed Data Set
  First 10 Observations of the Imputed Data Set

  Obs   _Imputation_    Species    Length3     Height     Width
    1          1         Gp1          30.0    11.5200    4.0200
    2          1         Gp1          31.2    12.4800    4.3056
    3          1         Gp1          31.1    12.3778    4.6961
    4          1                      33.5    12.7300     .
    5          1         Gp1          34.0    12.4440    5.1340
    6          1         Gp1          34.7    13.6024    4.9274
    7          1         Gp1          34.5    14.1795    5.2785
    8          1         Gp1          35.0    12.6700    4.6900
    9          1         Gp1          35.1    14.0049    4.8438
   10          1         Gp1          36.2    14.2266    4.9594
 

Note that a missing value of the variable Species is not imputed if the corresponding covariates are missing and not imputed, as shown by observation 4 in the table.
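
One way to see how many values of Species remain missing in each imputed data set is to tabulate the variable with missing values included. The following statements are a sketch that assumes the outex4 data set created above; by the pattern frequencies in Output 44.4.3, two observations per imputation would be expected to remain missing, because they also lack Width.

```sas
/* Cross-tabulate the imputation number against Species and      */
/* treat missing Species as its own level, so that values left   */
/* unimputed show up as a separate column.                       */
proc freq data=outex4;
   tables _Imputation_*Species / missing norow nocol nopercent;
run;
```

The MISSING option matters here: without it, observations with a missing Species are reported only in a footnote count rather than as a column of the table.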

Example 44.5. Discriminant Function Method for CLASS Variables

This example uses the discriminant function method to impute values of a CLASS variable from the observed values in a data set with a monotone missing pattern.

The following statements impute the continuous variables Height and Width with the regression method and the CLASS variable Species with the discriminant function method.

  proc mi data=Fish2 seed=7545417 nimpute=3 out=outex5;
     class Species;
     monotone reg(Height Width)
              discrim(Species= Length3 Height Width/ details);
     var Length3 Height Width Species;
  run;

The 'Model Information' table shown in Output 44.5.1 describes the method and options used in the multiple imputation process.

Output 44.5.1: Model Information
  The MI Procedure

  Model Information

  Data Set                             WORK.FISH2
  Method                               Monotone
  Number of Imputations                3
  Seed for random number generator     7545417
 

The 'Monotone Model Specification' table shown in Output 44.5.2 describes methods and imputed variables in the imputation model. The procedure uses the regression method to impute the variables Height and Width, and uses the discriminant function method to impute the variable Species in the model.

Output 44.5.2: Monotone Model Specification
  The MI Procedure

  Monotone Model Specification

                           Imputed
  Method                   Variables
  Regression               Height Width
  Discriminant Function    Species
 

The 'Missing Data Patterns' table shown in Output 44.5.3 lists distinct missing data patterns with corresponding frequencies and percents. The table confirms a monotone missing pattern for these variables.

Output 44.5.3: Missing Data Patterns
  The MI Procedure

  Missing Data Patterns

  Group    Length3    Height    Width    Species        Freq     Percent
      1    X          X         X        X                47       85.45
      2    X          X         X        .                 6       10.91
      3    X          X         .        .                 2        3.64

  Missing Data Patterns

           -----------------Group Means----------------
  Group         Length3          Height           Width
      1       33.497872       12.097645        4.808204
      2       32.366667       11.411050        4.567050
      3       36.600000       14.126350               .
 

With the DETAILS option, parameters estimated from the observed data and parameters used in each imputation are displayed in Output 44.5.4.

Output 44.5.4: Discriminant Model
  The MI Procedure

  Group Means for Monotone Discriminant Method

                                      ----------------Imputation----------------
  Species    Variable     Obs-Data              1              2              3
  Gp1        Length3       0.61625       0.707861       0.662448       0.505410
  Gp1        Height        0.67244       0.750984       0.732151       0.594226
  Gp1        Width         0.57896       0.643334       0.665698       0.515014
  Gp2        Length3      -0.98925      -0.776131      -0.987989      -0.887032
  Gp2        Height       -1.08272      -0.934081      -1.081832      -1.004799
  Gp2        Width        -0.86963      -0.680065      -0.811745      -0.722943
 

The following statements list the first ten observations of the data set outex5 in Output 44.5.5. Note that all missing values of variables Width and Species are imputed.

  proc print data=outex5(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.5.5: Imputed Data Set
start example
  First 10 Observations of the Imputed Data Set

  Obs    _Imputation_    Species    Length3     Height      Width

    1         1          Gp1           30.0    11.5200    4.02000
    2         1          Gp1           31.2    12.4800    4.30560
    3         1          Gp1           31.1    12.3778    4.69610
    4         1          Gp1           33.5    12.7300    4.67966
    5         1          Gp2           34.0    12.4440    5.13400
    6         1          Gp1           34.7    13.6024    4.92740
    7         1          Gp1           34.5    14.1795    5.27850
    8         1          Gp1           35.0    12.6700    4.69000
    9         1          Gp1           35.1    14.0049    4.84380
   10         1          Gp1           36.2    14.2266    4.95940
end example
 

Example 44.6. MCMC Method

This example uses the MCMC method to impute missing values for a data set with an arbitrary missing pattern. The following statements invoke the MI procedure and specify the MCMC method with six imputations.

  proc mi data=FitMiss seed=21355417 nimpute=6 mu0=50 10 180;
     mcmc chain=multiple displayinit initial=em(itprint);
     var Oxygen RunTime RunPulse;
  run;

Output 44.6.1: Model Information
start example
  The MI Procedure

  Model Information

  Data Set                             WORK.FITMISS
  Method                               MCMC
  Multiple Imputation Chain            Multiple Chains
  Initial Estimates for MCMC           EM Posterior Mode
  Start                                Starting Value
  Prior                                Jeffreys
  Number of Imputations                6
  Number of Burn-in Iterations         200
  Seed for random number generator     21355417
end example
 

The 'Model Information' table shown in Output 44.6.1 describes the method used in the multiple imputation process. With CHAIN=MULTIPLE, the procedure uses multiple chains and completes the default 200 burn-in iterations before each imputation. The 200 burn-in iterations are used to make the iterations converge to the stationary distribution before the imputation.
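
If 200 burn-in iterations seem too few for a particular data set, the burn-in length can be changed. The following is a minimal sketch, assuming the NBITER= option of the MCMC statement; the value 500 is an arbitrary illustration, not a recommendation.

```sas
proc mi data=FitMiss seed=21355417 nimpute=6 mu0=50 10 180;
   /* NBITER=500 is an illustrative value; the default shown */
   /* in the Model Information table is 200.                 */
   mcmc chain=multiple nbiter=500 initial=em;
   var Oxygen RunTime RunPulse;
run;
```

With CHAIN=MULTIPLE, the burn-in is repeated before every imputation, so a larger NBITER= value increases run time roughly in proportion to the number of imputations.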

By default, the procedure uses a noninformative Jeffreys prior to derive the posterior mode from the EM algorithm as the starting values for the MCMC process.

The 'Missing Data Patterns' table shown in Output 44.6.2 lists distinct missing data patterns with corresponding statistics.

Output 44.6.2: Missing Data Patterns
start example
  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X          21      67.74
      2    X           X       .           4      12.90
      3    X           .       .           3       9.68
      4    .           X       X           1       3.23
      5    .           X       .           2       6.45

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1       46.353810       10.809524      171.666667
      2       47.109500       10.137500               .
      3       52.461667               .               .
      4               .       11.950000      176.000000
      5               .        9.885000               .
end example
 

With the ITPRINT option in INITIAL=EM, the procedure displays the 'EM (Posterior Mode) Iteration History' table in Output 44.6.3.

Output 44.6.3: EM (Posterior Mode) Iteration History
start example
  The MI Procedure

  EM (Posterior Mode) Iteration History

  _Iteration_      -2 Log L    -2 Log Posterior       Oxygen      RunTime

            0    254.482800          282.909549    47.104077    10.554858
            1    255.081168          282.051584    47.104077    10.554857
            2    255.271408          282.017488    47.104077    10.554857
            3    255.318622          282.015372    47.104002    10.554523
            4    255.330259          282.015232    47.103861    10.554388
            5    255.333161          282.015222    47.103797    10.554341
            6    255.333896          282.015222    47.103774    10.554325
            7    255.334085          282.015222    47.103766    10.554320

  EM (Posterior Mode) Iteration History

  _Iteration_      RunPulse

            0    171.381669
            1    171.381652
            2    171.381644
            3    171.381842
            4    171.382053
            5    171.382150
            6    171.382185
            7    171.382196
end example
 

With the DISPLAYINIT option in the MCMC statement, the 'Initial Parameter Estimates for MCMC' table shown in Output 44.6.4 displays the starting mean and covariance estimates used in MCMC. The same starting estimates are used for the MCMC process for multiple chains because the EM algorithm is applied to the same data set in each chain. You can explicitly specify different initial estimates for different imputations, or you can use the bootstrap to generate different parameter estimates from the EM algorithm for the MCMC process.
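
As a sketch of the bootstrap alternative mentioned above, the following statements assume the BOOTSTRAP= suboption of INITIAL=EM. The proportion 0.75 is an illustrative choice; it requests that the EM starting values for each chain be computed from a random sample, drawn with replacement, of about 75% of the observations.

```sas
proc mi data=FitMiss seed=21355417 nimpute=6 mu0=50 10 180;
   /* BOOTSTRAP=0.75 is illustrative: EM is run on a resample  */
   /* of the input data, so starting values can differ by chain */
   mcmc chain=multiple displayinit initial=em(bootstrap=0.75);
   var Oxygen RunTime RunPulse;
run;
```

Different starting values per chain make it easier to see whether the imputations depend on where each chain begins.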

Output 44.6.4: Initial Parameter Estimates
start example
  The MI Procedure

  Initial Parameter Estimates for MCMC

  _TYPE_    _NAME_            Oxygen         RunTime        RunPulse

  MEAN                     47.103766       10.554320      171.382196
  COV       Oxygen         24.549967       -5.726112      -15.926036
  COV       RunTime        -5.726112        1.781407        3.124798
  COV       RunPulse      -15.926036        3.124798       83.164045
end example
 

Output 44.6.5 and Output 44.6.6 display variance information and parameter estimates from the multiple imputation.

Output 44.6.5: Variance Information
start example
  The MI Procedure

  Multiple Imputation Variance Information

                  -----------------Variance-----------------
  Variable         Between         Within          Total        DF

  Oxygen          0.051560       0.928170       0.988323    25.958
  RunTime         0.003979       0.070057       0.074699    25.902
  RunPulse        4.118578       4.260631       9.065638    7.5938

  Multiple Imputation Variance Information

                  Relative       Fraction
                  Increase        Missing       Relative
  Variable     in Variance    Information     Efficiency

  Oxygen          0.064809       0.062253       0.989731
  RunTime         0.066262       0.063589       0.989513
  RunPulse        1.127769       0.575218       0.912517
end example
 
Output 44.6.6: Parameter Estimates
start example
  The MI Procedure

  Multiple Imputation Parameter Estimates

  Variable            Mean      Std Error    95% Confidence Limits        DF

  Oxygen         47.164819       0.994145      45.1212      49.2085    25.958
  RunTime        10.549936       0.273312       9.9880      11.1118    25.902
  RunPulse      170.969836       3.010920     163.9615     177.9782    7.5938

  Multiple Imputation Parameter Estimates

                                                           t for H0:
  Variable         Minimum        Maximum            Mu0    Mean=Mu0    Pr > |t|

  Oxygen         46.858020      47.363540      50.000000       -2.85      0.0084
  RunTime        10.476886      10.659412      10.000000        2.01      0.0547
  RunPulse      168.252615     172.894991     180.000000       -3.00      0.0182
end example
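
The columns of Output 44.6.5 and Output 44.6.6 are tied together by the standard combining rules for multiple imputation. The following DATA step is a small sketch that recomputes the Oxygen row from its between-imputation and within-imputation variances, assuming total variance = within + (1 + 1/m) x between with m = 6 imputations. The variable names are chosen here for illustration only.

```sas
data _null_;
   m       = 6;           /* number of imputations (NIMPUTE=6)       */
   between = 0.051560;    /* between-imputation variance, Oxygen     */
   within  = 0.928170;    /* within-imputation variance, Oxygen      */
   est     = 47.164819;   /* combined mean of Oxygen                 */
   mu0     = 50;          /* null value from the MU0= option         */

   total  = within + (1 + 1/m)*between;    /* total variance               */
   relinc = (1 + 1/m)*between / within;    /* relative increase in variance */
   se     = sqrt(total);                   /* standard error of the mean   */
   tstat  = (est - mu0) / se;              /* t statistic for H0: Mean=Mu0 */

   put total= relinc= se= tstat=;          /* results go to the SAS log    */
run;
```

The values written to the log should agree with the tabled Total, Relative Increase in Variance, Std Error, and t statistic for Oxygen up to rounding of the inputs.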
 

Example 44.7. Producing Monotone Missingness with MCMC

This example uses the MCMC method to impute just enough missing values for a data set with an arbitrary missing pattern so that each imputed data set has a monotone missing pattern based on the order of variables in the VAR statement.

The following statements invoke the MI procedure and specify the IMPUTE=MONOTONE option to create the imputed data set with a monotone missing pattern. You must specify a VAR statement to provide the order of variables for the imputed data to achieve a monotone missing pattern.

  proc mi data=FitMiss seed=17655417 out=outex7;
     mcmc impute=monotone;
     var Oxygen RunTime RunPulse;
  run;

Output 44.7.1: Model Information
start example
  The MI Procedure

  Model Information

  Data Set                             WORK.FITMISS
  Method                               Monotone-data MCMC
  Multiple Imputation Chain            Single Chain
  Initial Estimates for MCMC           EM Posterior Mode
  Start                                Starting Value
  Prior                                Jeffreys
  Number of Imputations                5
  Number of Burn-in Iterations         200
  Number of Iterations                 100
  Seed for random number generator     17655417
end example
 

The 'Model Information' table shown in Output 44.7.1 describes the method used in the multiple imputation process.

The 'Missing Data Patterns' table shown in Output 44.7.2 lists distinct missing data patterns with corresponding statistics. Here, an 'X' means that the variable is observed in the corresponding group, a '.' means that the variable is missing and will be imputed to achieve the monotone missingness for the imputed data set, and an 'O' means that the variable is missing and will not be imputed. The table also displays group-specific variable means.

Output 44.7.2: Missing Data Pattern
start example
  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X          21      67.74
      2    X           X       O           4      12.90
      3    X           O       O           3       9.68
      4    .           X       X           1       3.23
      5    .           X       O           2       6.45

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1       46.353810       10.809524      171.666667
      2       47.109500       10.137500               .
      3       52.461667               .               .
      4               .       11.950000      176.000000
      5               .        9.885000               .
end example
 

As shown in the table, the MI procedure only needs to impute three missing values from Group 4 and Group 5 to achieve a monotone missing pattern for the imputed data set.

When using the MCMC method to produce an imputed data set with a monotone missing pattern, tables of variance information and parameter estimates are not created.

The following statements are used only to show the monotone missingness of the output data set outex7.

  proc mi data=outex7 nimpute=0;
     var Oxygen RunTime RunPulse;
  run;

Output 44.7.3: Monotone Missing Data Pattern
start example
  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X         110      70.97
      2    X           X       .          30      19.35
      3    X           .       .          15       9.68

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1       46.152428       10.861364      171.863636
      2       47.796038       10.053333               .
      3       52.461667               .               .
end example
 

The 'Missing Data Patterns' table shown in Output 44.7.3 displays a monotone missing data pattern.

The following statements impute one value for each missing value in the monotone missingness data set outex7.

  proc mi data=outex7 nimpute=1 seed=51343672 out=outds;
     monotone reg;
     var Oxygen RunTime RunPulse;
     by _Imputation_;
  run;

You can then analyze these data sets by using other SAS procedures and combine these results by using the MIANALYZE procedure. Note that the VAR statement is required with a MONOTONE statement to provide the variable order for the monotone missing pattern.
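
The analyze-then-combine step might look like the following sketch. The regression model of Oxygen on RunTime and RunPulse is a hypothetical analysis chosen for illustration, and outreg is an assumed data set name; neither comes from this example.

```sas
/* Fit the same (hypothetical) regression in each imputed data set */
proc reg data=outds outest=outreg covout noprint;
   model Oxygen = RunTime RunPulse;
   by _Imputation_;
run;

/* Combine the per-imputation estimates and covariance matrices */
proc mianalyze data=outreg;
   modeleffects Intercept RunTime RunPulse;
run;
```

The COVOUT option is what makes the covariance matrices available to PROC MIANALYZE; without them, the within-imputation variance cannot be computed.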

Example 44.8. Checking Convergence in MCMC

This example uses the MCMC method with a single chain. It also displays time-series and autocorrelation plots to check convergence for the single chain.

The following statements use the MCMC method to create an iteration plot for the successive estimates of the mean of Oxygen. Note that iterations during the burn-in period are indicated with negative iteration numbers. These statements also create an autocorrelation function plot for the variable Oxygen.

  proc mi data=FitMiss seed=42037921 noprint nimpute=2;
     mcmc timeplot(mean(Oxygen)) acfplot(mean(Oxygen));
     var Oxygen RunTime RunPulse;
  run;

Output 44.8.1: Time-Series Plot for Oxygen
 

With the TIMEPLOT(MEAN(Oxygen)) option, the procedure displays a time-series plot for the mean of Oxygen in Output 44.8.1.

By default, the MI procedure displays solid line segments that connect data points in the time-series plot. The plot shows no apparent trends for the variable Oxygen.

Output 44.8.2: Autocorrelation Function Plot for Oxygen
 

With the ACFPLOT(MEAN(Oxygen)) option, the procedure displays an autocorrelation plot for the mean of Oxygen in Output 44.8.2.

By default, the MI procedure uses the star sign (*) as the plot symbol to display the points in the plot, a solid line to display the reference line of zero autocorrelation, and a pair of dashed lines to display approximately 95% confidence limits for the autocorrelations. The autocorrelation function plot shows no significant positive or negative autocorrelation.

The following statements use display options to modify the autocorrelation function plot for Oxygen in Output 44.8.3.

  proc mi data=FitMiss seed=42037921 noprint nimpute=2;
     mcmc acfplot(mean(Oxygen) / symbol=dot lref=2);
     var Oxygen RunTime RunPulse;
  run;

Output 44.8.3: Autocorrelation Function Plot for Oxygen
 

You can also create plots for the worst linear function, the means of other variables, the variances of variables, and covariances between variables. Alternatively, you can use the OUTITER option to save statistics such as the means, standard deviations, covariances, -2 log LR statistic, -2 log LR statistic of the posterior mode, and worst linear function from each iteration in an output data set. Then you can do a more in-depth time-series analysis of the iterations with other procedures, such as PROC AUTOREG and PROC ARIMA in the SAS/ETS User's Guide.
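
A minimal sketch of the OUTITER approach follows, assuming the form OUTITER(options)=data-set with the MEAN and WLF options. The data set name iterstats is hypothetical, and the exact layout of the saved data set should be checked with PROC PRINT or PROC CONTENTS before it is passed to a time-series procedure.

```sas
proc mi data=FitMiss seed=42037921 noprint nimpute=2;
   /* Save the mean estimates and the worst linear function */
   /* from each iteration; iterstats is an assumed name.    */
   mcmc outiter(mean wlf)=iterstats;
   var Oxygen RunTime RunPulse;
run;

/* Inspect the first few saved iterations */
proc print data=iterstats(obs=10);
run;
```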

With the experimental ODS GRAPHICS statement specified in the following statements

  ods html;
  ods graphics on;

  proc mi data=FitMiss seed=42037921 noprint nimpute=2;
     mcmc timeplot(mean(Oxygen)) acfplot(mean(Oxygen));
     var Oxygen RunTime RunPulse;
  run;

  ods graphics off;
  ods html close;

the MI procedure produces the experimental graphs, as shown in Output 44.8.4 and Output 44.8.5.

Output 44.8.4: Time-Series Plot for Oxygen (Experimental)
 
Output 44.8.5: Autocorrelation Function Plot for Oxygen (Experimental)
 

For general information about ODS graphics, see Chapter 15, 'Statistical Graphics Using ODS.' For specific information about the graphics available in the MI procedure, see the 'ODS Graphics' section.

Example 44.9. Saving and Using Parameters for MCMC

This example uses the MCMC method with multiple chains as specified in Example 44.6. It saves the parameter values used for each imputation in an output data set of type EST called miest. This output data set can then be used to impute missing values in other similar input data sets. The following statements invoke the MI procedure and specify the MCMC method with multiple chains to create six imputations.

  proc mi data=FitMiss seed=21355417 nimpute=6 mu0=50 10 180;
     mcmc chain=multiple initial=em outest=miest;
     var Oxygen RunTime RunPulse;
  run;

The following statements list the parameters used for the imputations in Output 44.9.1. Note that the data set includes observations with _TYPE_='SEED' containing the seed to start the next random number generator.

  proc print data=miest(obs=15);
     title 'Parameters for the Imputations';
  run;

Output 44.9.1: OUTEST Data Set
start example
  Parameters for the Imputations

  Obs  _Imputation_  _TYPE_  _NAME_             Oxygen         RunTime        RunPulse

    1       1        SEED                 825240167.00    825240167.00    825240167.00
    2       1        PARM                        46.77           10.47          169.41
    3       1        COV     Oxygen              30.59           -8.32          -50.99
    4       1        COV     RunTime             -8.32            2.90           17.03
    5       1        COV     RunPulse           -50.99           17.03          200.09
    6       2        SEED                1895925872.00   1895925872.00   1895925872.00
    7       2        PARM                        47.41           10.37          173.34
    8       2        COV     Oxygen              22.35           -4.44          -21.18
    9       2        COV     RunTime             -4.44            1.76            1.25
   10       2        COV     RunPulse           -21.18            1.25          125.67
   11       3        SEED                 137653011.00    137653011.00    137653011.00
   12       3        PARM                        48.21           10.36          170.52
   13       3        COV     Oxygen              23.59           -5.25          -19.76
   14       3        COV     RunTime             -5.25            1.66            5.00
   15       3        COV     RunPulse           -19.76            5.00          110.99
end example
 

The following statements invoke the MI procedure and use the INEST= option in the MCMC statement.

  proc mi data=FitMiss;
     mcmc inest=miest;
     var Oxygen RunTime RunPulse;
  run;

Output 44.9.2: Model Information
start example
  The MI Procedure

  Model Information

  Data Set                             WORK.FITMISS
  Method                               MCMC
  INEST Data Set                       WORK.MIEST
  Number of Imputations                6
end example
 

The 'Model Information' table shown in Output 44.9.2 describes the method used in the multiple imputation process. The remaining tables for the example are identical to the tables in Output 44.6.2, Output 44.6.4, Output 44.6.5, and Output 44.6.6 in Example 44.6.
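
This example notes that miest can also be used to impute missing values in other similar input data sets. A minimal sketch of that use follows; FitMiss2 and outfit2 are hypothetical names, and FitMiss2 is assumed to contain the same three variables measured on the same scales as FitMiss.

```sas
/* FitMiss2 is a hypothetical data set with the same variables */
/* as FitMiss; the saved parameters in miest drive imputation. */
proc mi data=FitMiss2 out=outfit2;
   mcmc inest=miest;
   var Oxygen RunTime RunPulse;
run;
```

Reusing saved parameters is reasonable only when the new data can be assumed to follow the same distribution as the data used to estimate them.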

Example 44.10. Transforming to Normality

This example applies the MCMC method to the FitMiss data set in which the variable Oxygen is transformed. Assume that Oxygen is skewed and can be transformed to normality with a logarithmic transformation. The following statements invoke the MI procedure and specify the transformation. The TRANSFORM statement specifies the log transformation for Oxygen. Note that the values displayed for Oxygen in all of the results correspond to transformed values.

  proc mi data=FitMiss seed=32937921 mu0=50 10 180 out=outex10;
     transform log(Oxygen);
     mcmc chain=multiple displayinit;
     var Oxygen RunTime RunPulse;
  run;

The 'Missing Data Patterns' table shown in Output 44.10.1 lists distinct missing data patterns with corresponding statistics for the FitMiss data. Note that the values of Oxygen shown in the tables are transformed values.

Output 44.10.1: Missing Data Pattern
start example
  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X          21      67.74
      2    X           X       .           4      12.90
      3    X           .       .           3       9.68
      4    .           X       X           1       3.23
      5    .           X       .           2       6.45

  Transformed Variables: Oxygen

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1        3.829760       10.809524      171.666667
      2        3.851813       10.137500               .
      3        3.955298               .               .
      4               .       11.950000      176.000000
      5               .        9.885000               .

  Transformed Variables: Oxygen
end example
 

The 'Variable Transformations' table shown in Output 44.10.2 lists the variables that have been transformed.

Output 44.10.2: Variable Transformations
start example
  The MI Procedure

  Variable Transformations

  Variable    _Transform_

  Oxygen      LOG
end example
 

The 'Initial Parameter Estimates for MCMC' table shown in Output 44.10.3 displays the starting mean and covariance estimates used in the MCMC process.

Output 44.10.3: Initial Parameter Estimates
start example
  The MI Procedure

  Initial Parameter Estimates for MCMC

  _TYPE_    _NAME_            Oxygen         RunTime        RunPulse

  MEAN                      3.846122       10.557605      171.382949
  COV       Oxygen          0.010827       -0.120891       -0.328772
  COV       RunTime        -0.120891        1.744580        3.011180
  COV       RunPulse       -0.328772        3.011180       82.747609

  Transformed Variables: Oxygen
end example
 

Output 44.10.4 displays variance information from the multiple imputation.

Output 44.10.4: Variance Information
start example
  The MI Procedure

  Multiple Imputation Variance Information

                  -----------------Variance-----------------
  Variable         Between         Within          Total        DF

  * Oxygen     0.000016175       0.000401       0.000420    26.499
  RunTime         0.001762       0.065421       0.067536    27.118
  RunPulse        0.205979       3.116830       3.364004    25.222

  * Transformed Variables

  Multiple Imputation Variance Information

                  Relative       Fraction
                  Increase        Missing       Relative
  Variable     in Variance    Information     Efficiency

  * Oxygen        0.048454       0.047232       0.990642
  RunTime         0.032318       0.031780       0.993684
  RunPulse        0.079303       0.075967       0.985034

  * Transformed Variables
end example
 

Output 44.10.5 displays parameter estimates from the multiple imputation. Note that the parameter value of μ0 has also been transformed using the logarithmic transformation.

Output 44.10.5: Parameter Estimates
start example
  The MI Procedure

  Multiple Imputation Parameter Estimates

  Variable            Mean      Std Error    95% Confidence Limits        DF

  * Oxygen        3.845175       0.020494       3.8031       3.8873    26.499
  RunTime        10.560131       0.259876      10.0270      11.0932    27.118
  RunPulse      171.802181       1.834122     168.0264     175.5779    25.222

  * Transformed Variables

  Multiple Imputation Parameter Estimates

                                                           t for H0:
  Variable         Minimum        Maximum            Mu0    Mean=Mu0    Pr > |t|

  * Oxygen        3.838599       3.848456       3.912023       -3.26      0.0030
  RunTime        10.493031      10.600498      10.000000        2.16      0.0402
  RunPulse      171.251777     172.498626     180.000000       -4.47      0.0001

  * Transformed Variables
end example
 

The following statements list the first ten observations of the data set outex10 in Output 44.10.6. Note that the values for Oxygen are in the original scale.

  proc print data=outex10(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.10.6: Imputed Data Set in Original Scale
start example
  First 10 Observations of the Imputed Data Set

                                                    Run
  Obs    _Imputation_     Oxygen    RunTime       Pulse

    1         1          44.6090    11.3700     178.000
    2         1          45.3130    10.0700     185.000
    3         1          54.2970     8.6500     156.000
    4         1          59.5710     7.1440     167.012
    5         1          49.8740     9.2200     170.092
    6         1          44.8110    11.6300     176.000
    7         1          38.5834    11.9500     176.000
    8         1          43.7376    10.8500     158.851
    9         1          39.4420    13.0800     174.000
   10         1          60.0550     8.6300     170.000
end example
 

Note that the preceding results can also be produced from the following statements without using a TRANSFORM statement. A transformed value of log(50)=3.91202 is used in the MU0= option.

  data temp;
     set FitMiss;
     LogOxygen= log(Oxygen);
  run;

  proc mi data=temp seed=14337921 mu0=3.91202 10 180 out=outtemp;
     mcmc chain=multiple displayinit;
     var LogOxygen RunTime RunPulse;
  run;

  data outex10;
     set outtemp;
     Oxygen= exp(LogOxygen);
  run;

Example 44.11. Multistage Imputation

This example uses two separate imputation procedures to complete the imputation process. The first MI procedure uses the MCMC method to impute just enough missing values for a data set with an arbitrary missing pattern so that each imputed data set has a monotone missing pattern. The second MI procedure uses a MONOTONE statement to impute missing values for data sets with monotone missing patterns.

The following statements are the same as in Example 44.7, except for the name of the output data set. The statements invoke the MI procedure and specify the IMPUTE=MONOTONE option to create the imputed data set with a monotone missing pattern.

  proc mi data=FitMiss seed=17655417 out=outex11;
     mcmc impute=monotone;
     var Oxygen RunTime RunPulse;
  run;

The 'Missing Data Patterns' table shown in Output 44.11.1 lists distinct missing data patterns with corresponding statistics. Here, an 'X' means that the variable is observed in the corresponding group, a '.' means that the variable is missing and will be imputed to achieve the monotone missingness for the imputed data set, and an 'O' means that the variable is missing and will not be imputed. The table also displays group-specific variable means.

Output 44.11.1: Missing Data Pattern
start example
  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X          21      67.74
      2    X           X       O           4      12.90
      3    X           O       O           3       9.68
      4    .           X       X           1       3.23
      5    .           X       O           2       6.45

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1       46.353810       10.809524      171.666667
      2       47.109500       10.137500               .
      3       52.461667               .               .
      4               .       11.950000      176.000000
      5               .        9.885000               .
end example
 

As shown in the table, the MI procedure only needs to impute three missing values from Group 4 and Group 5 to achieve a monotone missing pattern for the imputed data set. When the MCMC method is used to produce an imputed data set with a monotone missing pattern, tables of variance information and parameter estimates are not created.

The following statements impute one value for each missing value in the monotone missingness data set outex11.

  proc mi data=outex11
          nimpute=1 seed=51343672
          out=outex11a;
     monotone reg;
     var Oxygen RunTime RunPulse;
     by _Imputation_;
  run;

You can then analyze these data sets by using other SAS procedures and combine these results by using the MIANALYZE procedure. Note that the VAR statement is required with a MONOTONE statement to provide the variable order for the monotone missing pattern.

The 'Model Information' table displayed in Output 44.11.2 shows that a monotone method is used to generate imputed values in the first BY group.

Output 44.11.2: Model Information
start example
  ----------------------------- Imputation Number=1 ------------------------------

  The MI Procedure

  Model Information

  Data Set                             WORK.OUTEX11
  Method                               Monotone
  Number of Imputations                1
  Seed for random number generator     51343672
end example
 

The 'Monotone Model Specification' table shown in Output 44.11.3 describes methods and imputed variables in the imputation model. The procedure uses the regression method to impute variables RunTime and RunPulse in the model.

Output 44.11.3: Monotone Model Specification
start example
  ----------------------------- Imputation Number=1 ------------------------------

  The MI Procedure

  Monotone Model Specification

                  Imputed
  Method          Variables

  Regression      RunTime RunPulse
end example
 

The 'Missing Data Patterns' table shown in Output 44.11.4 lists distinct missing data patterns with corresponding statistics. It shows a monotone missing pattern for the imputed data set.

Output 44.11.4: Missing Data Pattern
start example
  ----------------------------- Imputation Number=1 ------------------------------

  The MI Procedure

  Missing Data Patterns

                       Run     Run
  Group    Oxygen      Time    Pulse    Freq    Percent

      1    X           X       X          22      70.97
      2    X           X       .           6      19.35
      3    X           .       .           3       9.68

  Missing Data Patterns

           -----------------Group Means----------------
  Group          Oxygen         RunTime        RunPulse

      1       46.057479       10.861364      171.863636
      2       46.745227       10.053333               .
      3       52.461667               .               .
end example
 

The following statements list the first ten observations of the data set outex11a in Output 44.11.5.

  proc print data=outex11a(obs=10);
     title 'First 10 Observations of the Imputed Data Set';
  run;

Output 44.11.5: Imputed Data Set
start example
  First 10 Observations of the Imputed Data Set

                                                    Run
  Obs    _Imputation_     Oxygen    RunTime       Pulse

    1         1          44.6090    11.3700     178.000
    2         1          45.3130    10.0700     185.000
    3         1          54.2970     8.6500     156.000
    4         1          59.5710     7.1569     169.914
    5         1          49.8740     9.2200     159.315
    6         1          44.8110    11.6300     176.000
    7         1          39.8345    11.9500     176.000
    8         1          45.3196    10.8500     151.252
    9         1          39.4420    13.0800     174.000
   10         1          60.0550     8.6300     170.000
end example
 

This example presents an alternative to the full-data MCMC imputation for situations in which only a few missing values need to be imputed to achieve a monotone missing pattern for the imputed data set. The example uses a monotone MCMC method that imputes fewer missing values in each iteration and achieves approximate stationarity in fewer iterations (Schafer 1997, p. 227). The example also demonstrates how to combine the monotone MCMC method with a method for monotone missing data, which does not rely on iterations of steps.




SAS.STAT 9.1 Users Guide (Vol. 4)
Year: 2004
Pages: 91
