WHAT IS CLUSTER ANALYSIS?


Cluster analysis is the name for a group of multivariate techniques whose primary purpose is to group objects based on the characteristics they possess. Cluster analysis classifies objects (e.g., respondents, products, or other entities) so that each object is very similar to others in the cluster with respect to some predetermined selection criterion. The resulting clusters of objects should then exhibit high internal (within-cluster) homogeneity and high external (between-cluster) heterogeneity. Thus, if the classification is successful, the objects within clusters will be close together when plotted geometrically, and different clusters will be far apart.
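
The geometric idea can be illustrated with a minimal sketch. The observations, the cluster labels "A" and "B", and the two helper functions below are hypothetical and chosen only for illustration; Euclidean distance is assumed as the measure of closeness. The sketch compares the average distance between objects in the same cluster with the average distance between objects in different clusters.

```python
from itertools import combinations
from math import dist

# Hypothetical two-variable observations, already assigned to two clusters.
clusters = {
    "A": [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9)],
    "B": [(5.0, 5.1), (5.3, 4.8), (4.9, 5.4)],
}

def mean_within_distance(clusters):
    """Average Euclidean distance between pairs of objects in the same cluster."""
    pairs = [dist(p, q) for pts in clusters.values() for p, q in combinations(pts, 2)]
    return sum(pairs) / len(pairs)

def mean_between_distance(clusters):
    """Average Euclidean distance between pairs of objects in different clusters."""
    pairs = [
        dist(p, q)
        for (_, pts_a), (_, pts_b) in combinations(clusters.items(), 2)
        for p in pts_a
        for q in pts_b
    ]
    return sum(pairs) / len(pairs)

within = mean_within_distance(clusters)
between = mean_between_distance(clusters)
print(f"mean within-cluster distance:  {within:.2f}")
print(f"mean between-cluster distance: {between:.2f}")
```

For a successful classification, the within-cluster figure should be small relative to the between-cluster figure, as it is for these made-up points.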

In cluster analysis, the concept of the variate is again a central issue, but in a quite different way from other multivariate techniques. The cluster variate is the set of variables representing the characteristics used to compare objects in the cluster analysis. Because the cluster variate includes only the variables used to compare objects, it determines the "character" of the objects. Cluster analysis is the only multivariate technique that does not estimate the variate empirically but instead uses the variate as specified by the experimenter. The focus of cluster analysis is on the comparison of objects based on the variate, not on the estimation of the variate itself. This makes the experimenter's definition of the variate a critical step in cluster analysis.
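
A small sketch may help show why the definition of the variate matters. The respondents, variable names, and the `nearest` helper are all hypothetical, and Euclidean distance on a common 1-10 scale is assumed. The same respondent ends up most similar to a different person depending on which variables are included in the variate.

```python
from math import dist

# Hypothetical respondents measured on three variables (all on a 1-10 scale).
respondents = {
    "r1": {"price_sensitivity": 2, "brand_loyalty": 9, "usage_rate": 3},
    "r2": {"price_sensitivity": 3, "brand_loyalty": 8, "usage_rate": 9},
    "r3": {"price_sensitivity": 8, "brand_loyalty": 2, "usage_rate": 3},
}

def nearest(name, variate):
    """Return the respondent closest to `name` using only the variables in `variate`."""
    def profile(r):
        return [respondents[r][v] for v in variate]
    others = [r for r in respondents if r != name]
    return min(others, key=lambda r: dist(profile(name), profile(r)))

# Variate built from attitudes: r1 is most similar to r2.
print(nearest("r1", ["price_sensitivity", "brand_loyalty"]))
# Variate built from behavior only: r1 is most similar to r3.
print(nearest("r1", ["usage_rate"]))
```

Nothing in the procedure estimates which variate is "right"; the comparison simply follows whatever variables are specified.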

Cluster analysis has been referred to as Q analysis, typology construction, classification analysis, and numerical taxonomy. This variety of names is due in part to the usage of clustering methods in such diverse disciplines as psychology, biology, sociology, economics, engineering, and business. Although the names differ across disciplines, the methods all have a common dimension: classification according to natural relationships (Aldenderfer and Blashfield, 1984; Anderberg, 1973; Bailey, 1994; Sneath and Sokal, 1973; Everitt, 1980). This common dimension represents the essence of all clustering approaches. As such, the primary value of cluster analysis lies in the classification of data, as suggested by "natural" groupings of the data themselves. Cluster analysis is comparable to factor analysis in its objective of assessing structure. But cluster analysis differs from factor analysis in that cluster analysis groups objects, whereas factor analysis is primarily concerned with grouping variables.

Cluster analysis is a useful data analysis tool in many different situations. For example, a researcher who has collected data by means of a questionnaire may be faced with a large number of observations that are meaningless unless classified into manageable groups. Cluster analysis can perform this data reduction procedure objectively by reducing the information from an entire population or sample to information about specific, smaller subgroups. If we can understand the attitudes of a population by identifying the major groups within it, then we have reduced the data for the entire population into profiles of a number of groups. In this fashion, the researcher has a more concise, understandable description of the observations, with minimal loss of information.
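
The data reduction step can be sketched as follows. The responses, the variable names, and the `cluster_profiles` helper are hypothetical, and the cluster labels are assumed to come from an earlier clustering step that is not shown. Each group is reduced to its size and the mean of every variable.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical questionnaire responses (1-10 scales); the "cluster" labels are
# assumed to come from an earlier clustering step that is not shown here.
responses = [
    {"cluster": "A", "satisfaction": 8, "price_concern": 3},
    {"cluster": "A", "satisfaction": 9, "price_concern": 2},
    {"cluster": "A", "satisfaction": 7, "price_concern": 4},
    {"cluster": "B", "satisfaction": 3, "price_concern": 9},
    {"cluster": "B", "satisfaction": 4, "price_concern": 8},
]

def cluster_profiles(rows, variables):
    """Summarize each cluster by its size and the mean of every variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return {
        label: {"n": len(members),
                **{v: mean(m[v] for m in members) for v in variables}}
        for label, members in groups.items()
    }

profiles = cluster_profiles(responses, ["satisfaction", "price_concern"])
for label, profile in sorted(profiles.items()):
    print(label, profile)
```

Five observations are described here by two profiles; with a real questionnaire the reduction would be from hundreds or thousands of rows to a handful of group descriptions.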

Cluster analysis is also useful when a researcher wishes to develop hypotheses concerning the nature of the data or to examine previously stated hypotheses. For example, an engineer may believe that attitudes toward the performance of a car versus a comfortable ride could be used to separate consumers into logical segments or groups. Cluster analysis can classify consumers by their attitudes, separating those who favor performance from those who prefer comfort, and the resulting clusters, if any, can be profiled for demographic similarities and differences.

These examples are just a small fraction of the types of applications of cluster analysis. From the derivation of taxonomies in biology for grouping all living organisms, to psychological classifications based on personality and other personal traits, to the segmentation analyses of marketers, cluster analysis has always had a strong tradition of grouping individuals. This tradition has been extended to classifying objects, including market structure, analyses of the similarities and differences among new products, and performance evaluations of firms to identify groupings based on the firms' strategies or strategic orientations. The result has been an explosion of applications in almost every area of inquiry, creating not only a wealth of knowledge on the use of cluster analysis but also the need for a better understanding of the technique to minimize its misuse.

Yet, along with the benefits of cluster analysis come some caveats. Cluster analysis can be characterized as descriptive, atheoretical, and noninferential. Cluster analysis has no statistical basis upon which to draw statistical inferences from a sample to a population, and it is used primarily as an exploratory technique. The solutions are not unique, as the cluster membership for any number of solutions is dependent upon many elements of the procedure, and many different solutions can be obtained by varying one or more elements. Moreover, cluster analysis will always create clusters, regardless of the "true" existence of any structure in the data. Finally, the cluster solution is totally dependent upon the variables used as the basis for the similarity measure. The addition or deletion of relevant variables can have a substantial impact on the resulting solution. Thus, the experimenter must take particular care in assessing the impact of each decision involved in performing a cluster analysis.
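
Two of these caveats can be illustrated with a minimal sketch. The text does not name a particular algorithm, so a plain k-means procedure is assumed here purely as an example; the `kmeans` function, the seeds, and the choice of three clusters are hypothetical. The data are drawn uniformly at random, so they contain no real group structure, yet every point still receives a cluster label, and changing the starting centroids may change the solution.

```python
import random
from math import dist

def kmeans(points, k, seed, iterations=20):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # starting centroids are an arbitrary choice
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

# Structureless data: 200 points drawn uniformly from the unit square.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

for seed in (1, 2):
    labels = kmeans(points, k=3, seed=seed)
    sizes = [labels.count(j) for j in range(3)]
    print(f"start seed {seed}: cluster sizes {sizes}")
```

The procedure reports a partition whether or not any grouping exists in the data, which is why the result has to be judged by the analyst rather than taken as evidence of structure.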




Six Sigma and Beyond: Statistics and Probability, Volume III
ISBN: 1574443127
Year: 2003
Pages: 252
