Chapter VIII - Mining Text Documents for Thematic Hierarchies Using Self-Organizing Maps
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003
In this chapter, we presented a method to automatically generate category hierarchies and identify category themes. The documents were first transformed into a set of feature vectors, which were used as input to train a self-organizing map. Two maps, the word cluster map and the document cluster map, were obtained by labeling the neurons of the trained map with words and documents, respectively. An automatic category generation process was then applied to the document cluster map to find dominating neurons, each of which is the centroid of a super-cluster. The category terms of the super-clusters were also determined. The same process was applied recursively to each super-cluster to reveal the structure of the categories. Our method used neither human-provided terms nor a predefined category structure. Text categorization can easily be achieved with our method.
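The pipeline summarized above can be illustrated with a minimal sketch in Python with NumPy. It is not the chapter's exact procedure: the decay schedules, the rule that labels each word with the neuron holding its largest weight component, the choice of dominating neurons by document count, and the theme selection by summed weights are all simplifying assumptions, and every function name is hypothetical. The toy vocabulary and document vectors are invented for illustration only.

```python
import numpy as np

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM. The linear decay schedules are assumptions."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, vectors.shape[1]))
    # Grid coordinates of every neuron, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr, sigma = lr0 * decay, sigma0 * decay + 0.5
        for i in rng.permutation(len(vectors)):
            x = vectors[i]
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            grid_d2 = np.sum((grid - grid[winner]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
            weights += lr * h[:, None] * (x - weights)
    return weights, grid

def document_cluster_map(vectors, weights):
    """Label each document with the index of its nearest neuron."""
    d = np.linalg.norm(vectors[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def word_cluster_map(weights):
    """Assumed rule: a word labels the neuron where its weight component is largest."""
    return np.argmax(weights, axis=0)

def super_clusters(doc_neuron, grid, k):
    """Stand-in for category generation: the k neurons holding the most
    documents are treated as dominating; every neuron joins the nearest one."""
    counts = np.bincount(doc_neuron, minlength=len(grid))
    dominating = np.argsort(-counts, kind="stable")[:k]
    d = np.linalg.norm(grid[:, None, :] - grid[dominating][None, :, :], axis=2)
    return dominating, dominating[np.argmin(d, axis=1)]

def category_theme(weights, neuron_to_super, dominating_neuron, vocabulary):
    """Assumed rule: the theme is the word with the largest summed weight
    over the neurons of the super-cluster."""
    members = weights[neuron_to_super == dominating_neuron]
    return vocabulary[int(np.argmax(members.sum(axis=0)))]

# Toy data (invented): rows are documents, columns are vocabulary words.
vocabulary = ["neural", "network", "stock", "market"]
docs = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1],
                 [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)

weights, grid = train_som(docs, rows=3, cols=3)
doc_neuron = document_cluster_map(docs, weights)
word_neuron = word_cluster_map(weights)
dominating, neuron_to_super = super_clusters(doc_neuron, grid, k=2)
themes = {int(d): category_theme(weights, neuron_to_super, d, vocabulary)
          for d in dominating}
print(doc_neuron, word_neuron, themes)
```

To sketch the recursive step, the same functions could be called again on the subset of documents whose neurons fall in one super-cluster, yielding the next level of the hierarchy. Categorizing a new document would then amount to finding its nearest neuron and reading off the super-cluster that neuron belongs to.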

