This chapter has emphasized the statistical approach to data mining, in which the meaning of data is linked to its properties relative to statistical models. It has also emphasized a Bayesian approach, mainly because of its intellectual appeal. The techniques proposed for analytical models and EM classification can be applied to very large data sets, possibly after minor tuning of the "obvious" algorithm. Markov Chain Monte Carlo methods, on the other hand, are not used today on gigantic data sets. The methods described here are easy to apply using general-purpose languages and environments such as C, Matlab, Octave, and R. An ongoing development is the adaptation of the simple family of models described here to large and complex applications, which leads to increased complexity. In the Bayesian paradigm, increasing complexity means using more detailed models and more application-specific assumptions in the prior distributions. This is also a complete description of what can be done, apart from improving performance through careful implementation and integration and through carefully selected approximate computational methods.