Details


Missing Values

Observations containing missing values are omitted from the analysis.

Input Data Sets

The input data set can be an ordinary SAS data set or one of several specially structured data sets created by statistical procedures available with SAS/STAT software. For more information on these data sets, see Appendix A, 'Special SAS Data Sets.' The BY variable in these data sets becomes the CLASS variable in PROC STEPDISC. These specially structured data sets include

  • TYPE=CORR data sets created by PROC CORR using a BY statement

  • TYPE=COV data sets created by PROC PRINCOMP using both the COV option and a BY statement

  • TYPE=CSSCP data sets created by PROC CORR using the CSSCP option and a BY statement, where the OUT= data set is assigned TYPE=CSSCP with the TYPE= data set option

  • TYPE=SSCP data sets created by PROC REG using both the OUTSSCP= option and a BY statement

When the input data set is TYPE=CORR, TYPE=COV, or TYPE=CSSCP, the STEPDISC procedure reads the number of observations for each class from the observations with _TYPE_='N' and the variable means in each class from the observations with _TYPE_='MEAN'. The procedure then reads the within-class correlations from the observations with _TYPE_='CORR', the standard deviations from the observations with _TYPE_='STD' (data set TYPE=CORR), the within-class covariances from the observations with _TYPE_='COV' (data set TYPE=COV), or the within-class corrected sums of squares and crossproducts from the observations with _TYPE_='CSSCP' (data set TYPE=CSSCP).

When the data set does not include any observations with _TYPE_='CORR' (data set TYPE=CORR), _TYPE_='COV' (data set TYPE=COV), or _TYPE_='CSSCP' (data set TYPE=CSSCP) for each class, PROC STEPDISC reads the pooled within-class information from the data set. In this case, the STEPDISC procedure reads the pooled within-class correlations from the observations with _TYPE_='PCORR', the pooled within-class standard deviations from the observations with _TYPE_='PSTD' (data set TYPE=CORR), the pooled within-class covariances from the observations with _TYPE_='PCOV' (data set TYPE=COV), or the pooled within-class corrected SSCP matrix from the observations with _TYPE_='PSSCP' (data set TYPE=CSSCP).

When the input data set is TYPE=SSCP, the STEPDISC procedure reads the number of observations for each class from the observations with _TYPE_='N', the sum of weights of observations from the variable INTERCEPT in observations with _TYPE_='SSCP' and _NAME_='INTERCEPT', the variable sums from the variable=variablenames in observations with _TYPE_='SSCP' and _NAME_='INTERCEPT', and the uncorrected sums of squares and crossproducts from the variable=variablenames in observations with _TYPE_='SSCP' and _NAME_=variablenames.
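The row layout that PROC STEPDISC expects from a TYPE=CORR data set with a BY variable can be sketched outside SAS. The following Python fragment is only an illustration of that layout, not the code SAS uses: the function name corr_rows and the dictionary-based row format are assumptions made for this example, and each BY group is assumed to have at least two observations and nonzero variances.

```python
import math

def corr_rows(data, by, variables):
    """Build rows that mimic a TYPE=CORR data set produced with a BY variable.

    data is a list of dicts; by names the grouping key (hypothetical layout).
    For each BY level the rows are MEAN, STD, N, then one CORR row per variable.
    """
    rows = []
    for level in sorted({rec[by] for rec in data}):
        cols = {v: [rec[v] for rec in data if rec[by] == level] for v in variables}
        n = len(cols[variables[0]])
        mean = {v: sum(cols[v]) / n for v in variables}
        std = {
            v: math.sqrt(sum((x - mean[v]) ** 2 for x in cols[v]) / (n - 1))
            for v in variables
        }

        def corr(a, b):
            # Sample correlation between variables a and b within this BY level
            s = sum((x - mean[a]) * (y - mean[b]) for x, y in zip(cols[a], cols[b]))
            return s / ((n - 1) * std[a] * std[b])

        rows.append({by: level, "_TYPE_": "MEAN", "_NAME_": "", **mean})
        rows.append({by: level, "_TYPE_": "STD", "_NAME_": "", **std})
        rows.append({by: level, "_TYPE_": "N", "_NAME_": "", **{v: n for v in variables}})
        for a in variables:
            rows.append(
                {by: level, "_TYPE_": "CORR", "_NAME_": a,
                 **{b: corr(a, b) for b in variables}}
            )
    return rows

if __name__ == "__main__":
    sample = [
        {"grp": "a", "x": 1, "y": 2}, {"grp": "a", "x": 2, "y": 4},
        {"grp": "a", "x": 3, "y": 6}, {"grp": "b", "x": 1, "y": 3},
        {"grp": "b", "x": 2, "y": 2}, {"grp": "b", "x": 3, "y": 1},
    ]
    for row in corr_rows(sample, "grp", ["x", "y"]):
        print(row)
```

Each BY level contributes one block of rows, which is why the BY variable can serve as the CLASS variable: the _TYPE_='N' and _TYPE_='MEAN' rows supply the class counts and means, and the _TYPE_='CORR' and _TYPE_='STD' rows supply the within-class correlations and standard deviations.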

Computational Resources

In the following discussion, let

  • n = number of observations

  • c = number of class levels

  • v = number of variables in the VAR list

  • l = length of the CLASS variable

  • t = v + c − 1

Memory Requirements

The amount of memory in bytes for temporary storage needed to process the data is

[The memory formula appeared as an image in the source and is not reproduced here.]

Additional temporary storage of 72 bytes at each step is also required to store the results.

Time Requirements

The following factors determine the time requirements of a stepwise discriminant analysis.

  • The time needed for reading the data and computing covariance matrices is proportional to nv². The STEPDISC procedure must also look up each class level in the list. This is faster if the data are sorted by the CLASS variable. The time for looking up class levels is proportional to a value ranging from n to n ln(c).

  • The time needed for stepwise discriminant analysis is proportional to the number of steps required to select the set of variables in the discrimination model. The number of steps required depends on the data set itself and the selection method and criterion used in the procedure. Each forward or backward step takes time proportional to (v + c)².

Displayed Output

The STEPDISC procedure displays the following output:

  • Class Level Information, including the values of the classification variable, the Frequency of each value, the Weight of each value, and the Proportion of each value in the total sample

Optional output includes

  • Within-Class SSCP Matrices for each group

  • Pooled Within-Class SSCP Matrix

  • Between-Class SSCP Matrix

  • Total-Sample SSCP Matrix

  • Within-Class Covariance Matrices for each group

  • Pooled Within-Class Covariance Matrix

  • Between-Class Covariance Matrix, equal to the between-class SSCP matrix divided by n(c − 1)/c, where n is the number of observations and c is the number of classes

  • Total-Sample Covariance Matrix

  • Within-Class Correlation Coefficients and Pr > |r| to test the hypothesis that the within-class population correlation coefficients are zero

  • Pooled Within-Class Correlation Coefficients and Pr > |r| to test the hypothesis that the partial population correlation coefficients are zero

  • Between-Class Correlation Coefficients and Pr > |r| to test the hypothesis that the between-class population correlation coefficients are zero

  • Total-Sample Correlation Coefficients and Pr > |r| to test the hypothesis that the total population correlation coefficients are zero

  • descriptive Simple Statistics including N (the number of observations), Sum, Mean, Variance, and Standard Deviation for the total sample and within each class

  • Total-Sample Standardized Class Means, obtained by subtracting the grand mean from each class mean and dividing by the total-sample standard deviation

  • Pooled Within-Class Standardized Class Means, obtained by subtracting the grand mean from each class mean and dividing by the pooled within-class standard deviation
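The between-class covariance divisor and the two kinds of standardized class means listed above can be checked by hand for a single variable. The following Python sketch is illustrative only: the function name class_summaries and its return keys are made up for this example, observations are assumed to be unweighted, and the usual n − 1 and n − c divisors are assumed for the total-sample and pooled within-class variances.

```python
import math

def class_summaries(groups):
    """Summaries for ONE variable; groups maps a class label to its values."""
    n = sum(len(vals) for vals in groups.values())
    c = len(groups)
    grand_mean = sum(sum(vals) for vals in groups.values()) / n
    means = {k: sum(vals) / len(vals) for k, vals in groups.items()}

    # Sums of squares: pooled within-class, between-class, and total
    within_ss = sum((x - means[k]) ** 2 for k, vals in groups.items() for x in vals)
    between_ss = sum(len(vals) * (means[k] - grand_mean) ** 2
                     for k, vals in groups.items())
    total_ss = within_ss + between_ss

    total_sd = math.sqrt(total_ss / (n - 1))    # total-sample standard deviation
    pooled_sd = math.sqrt(within_ss / (n - c))  # pooled within-class standard deviation

    return {
        # Between-class SSCP divided by n(c - 1)/c, as described in the list above
        "between_variance": between_ss / (n * (c - 1) / c),
        "total_std_means": {k: (m - grand_mean) / total_sd for k, m in means.items()},
        "pooled_std_means": {k: (m - grand_mean) / pooled_sd for k, m in means.items()},
    }

if __name__ == "__main__":
    print(class_summaries({"a": [1, 2, 3], "b": [5, 6, 7]}))
```

Because the pooled within-class standard deviation excludes the spread between classes, the pooled standardized means are at least as large in magnitude as the total-sample standardized means whenever the class means differ.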

At each step, the following statistics are displayed:

  • for each variable considered for entry or removal: Partial R-Square, the squared (partial) correlation, the F statistic, and Pr > F, the probability level, from a one-way analysis of covariance

  • the minimum Tolerance for entering each variable. A variable is entered only if its tolerance and the tolerances for all variables already in the model are greater than the value specified in the SINGULAR= option. The tolerance for the entering variable is 1 − R² from regressing the entering variable on the other variables already in the model. The tolerance for a variable already in the model is 1 − R² from regressing that variable on the entering variable and the other variables already in the model. With m variables already in the model, for each entering variable, m + 1 multiple regressions are performed using the entering variable and each of the m variables already in the model as a dependent variable. These m + 1 tolerances are computed for each entering variable, and the minimum tolerance is displayed for each.

    The tolerance is computed using the total-sample correlation matrix. It is customary to compute tolerance using the pooled within-class correlation matrix (Jennrich 1977), but it is possible for a variable with excellent discriminatory power to have a high total-sample tolerance and a low pooled within-class tolerance. For example, PROC STEPDISC enters a variable that yields perfect discrimination (that is, produces a canonical correlation of one), but a program using pooled within-class tolerance does not.

  • the variable Label, if any

  • the name of the variable chosen

  • the variables already selected or removed

  • Wilks' Lambda and the associated F approximation with degrees of freedom and Pr < F, the associated probability level after the selected variable has been entered or removed. Wilks' lambda is the likelihood ratio statistic for testing the hypothesis that the means of the classes on the selected variables are equal in the population (see the 'Multivariate Tests' section in Chapter 2, 'Introduction to Regression Procedures'). Lambda is close to zero if any two groups are well separated.

  • Pillai's Trace and the associated F approximation with degrees of freedom and Pr > F, the associated probability level after the selected variable has been entered or removed. Pillai's trace is a multivariate statistic for testing the hypothesis that the means of the classes on the selected variables are equal in the population (see the 'Multivariate Tests' section in Chapter 2).

  • Average Squared Canonical Correlation (ASCC). The ASCC is Pillai's trace divided by the number of groups minus 1. The ASCC is close to 1 if all groups are well separated and if all or most directions in the discriminant space show good separation for at least two groups.

  • Summary to give statistics associated with the variable chosen at each step. The summary includes the following:

    • Step number

    • Variable Entered or Removed

    • Number In, the number of variables in the model

    • Partial R-Square

    • the F Value for entering or removing the variable

    • Pr > F, the probability level for the F statistic

    • Wilks' Lambda

    • Pr < Lambda based on the F approximation to Wilks' Lambda

    • Average Squared Canonical Correlation

    • Pr > ASCC based on the F approximation to Pillai's trace

    • the variable Label, if any
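Two of the per-step quantities described above lend themselves to a small numerical sketch. The Python fragment below is a minimal illustration, not the STEPDISC implementation. It relies on standard identities that this section does not state explicitly: the tolerance 1 − R² of a variable regressed on the others equals the reciprocal of the matching diagonal element of the inverse correlation matrix, Wilks' lambda equals det(W)/det(T), and Pillai's trace equals trace((T − W)T⁻¹), where W and T denote the pooled within-class and total-sample SSCP matrices of the selected variables. The function names and the W and T notation are assumptions made for this example.

```python
def inverse_and_det(a):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(a)
    # Augment the matrix with the identity
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        p = m[col][col]
        det *= p
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m], det

def min_tolerance(corr, in_model, entering):
    """Minimum of the m + 1 tolerances for one candidate variable.

    corr is the total-sample correlation matrix; in_model and entering are indices.
    """
    idx = list(in_model) + [entering]
    sub = [[corr[i][j] for j in idx] for i in idx]
    inv, _ = inverse_and_det(sub)
    # 1 - R^2 for variable k regressed on the rest = 1 / (k-th diagonal of the inverse)
    return min(1.0 / inv[k][k] for k in range(len(idx)))

def multivariate_stats(w, t, n_classes):
    """Return (Wilks' lambda, Pillai's trace, ASCC) from SSCP matrices W and T."""
    t_inv, det_t = inverse_and_det(t)
    _, det_w = inverse_and_det(w)
    p = len(t)
    wilks = det_w / det_t
    pillai = sum((t[i][k] - w[i][k]) * t_inv[k][i] for i in range(p) for k in range(p))
    ascc = pillai / (n_classes - 1)  # Pillai's trace divided by (number of groups - 1)
    return wilks, pillai, ascc

if __name__ == "__main__":
    corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(min_tolerance(corr, in_model=[0, 1], entering=2))
    print(multivariate_stats([[2.0, 0.0], [0.0, 3.0]], [[4.0, 0.0], [0.0, 3.0]], 2))
```

A candidate would pass the entry check only if the value returned by min_tolerance exceeds the SINGULAR= threshold. Note that inverse_and_det raises an error when W is singular, which this sketch does not attempt to handle.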

ODS Table Names

PROC STEPDISC assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, 'Using the Output Delivery System.'

Table 67.2: ODS Tables Produced in PROC STEPDISC

ODS Table Name    Description                                     PROC STEPDISC Option
----------------  ----------------------------------------------  --------------------
BCorr             Between-class correlations                      BCORR
BCov              Between-class covariances                       BCOV
BSSCP             Between-class SSCP matrix                       BSSCP
Counts            Number of observations, variables, classes, df  default
CovDF             DF for covariance matrices, not printed         any *COV option
Levels            Class level information                         default
Messages          Entry/removal messages                          default
Multivariate      Multivariate statistics                         default
PCorr             Pooled within-class correlations                PCORR
PCov              Pooled within-class covariances                 PCOV
PSSCP             Pooled within-class SSCP matrix                 PSSCP
PStdMeans         Pooled standardized class means                 STDMEAN
SimpleStatistics  Simple statistics                               SIMPLE
Steps             Stepwise selection entry/removal                default
Summary           Stepwise selection summary                      default
TCorr             Total-sample correlations                       TCORR
TCov              Total-sample covariances                        TCOV
TSSCP             Total-sample SSCP matrix                        TSSCP
TStdMeans         Total standardized class means                  STDMEAN
Variables         Variable lists                                  default
WCorr             Within-class correlations                       WCORR
WCov              Within-class covariances                        WCOV
WSSCP             Within-class SSCP matrices                      WSSCP




SAS/STAT 9.1 User's Guide (Vol. 6)
Year: 2004
Pages: 127