GLOBAL GRAPHICAL MODEL CHOICE | Data Mining: Opportunities and Challenges

data mining: opportunities and challenges

Chapter I - A Survey of Bayesian Data Mining
Data Mining: Opportunities and Challenges
by John Wang (ed)
Idea Group Publishing 2003


	Brought to you by Team-Fly

If we have many variables, their interdependencies can be modeled as a graph with vertices corresponding to the variables. The example in Figure 3 is from Madigan and Raftery(1994) and shows the dependencies in a data matrix related to heart disease. Of course, a graph of this kind can give a data probability to the data matrix in a way analogous to the calculations in the previous section, although the formulas become rather involved and the number of possible graphs increases dramatically with the number of variables. It is completely infeasible to list and evaluate all graphs if there is more than a handful of variables. An interesting possibility to simplify the calculations would use some kind of separation, so that an edge in the model could be given a score independent of the inclusion or exclusion of most other potential edges. Indeed, the derivations of last section show how this works. Let C in that example be a compound variable, obtained by merging columns c1, cd. If two models G and G' differ only by the presence and absence of the edge (A, B), and if there is no path between A and B except through vertex set C, then the expressions for and above will become factors of the expressions for and , respectively, and the other factors will be the same in the two expressions. Thus, the Bayes factor relating the probabilities of G and G' is the same as that relating M4 and M3. This result is independent of the choice of distributions and priors of the model, since the structure of the derivation follows the structure of the graph of the model it is equally valid for Gaussian or other data models, as long as the parameters of the participating distributions are assumed independent in the prior assumptions.

click to expand
Figure 3: Symptoms and causes relevant to heart problem.

We can now think of various "greedy" methods for building high probability interaction graphs relating the variables (columns in the data matrix). It is convenient and customary to restrict attention to either decomposable (chordal) graphs or directed acyclic graphs. Chordal graphs are fundamental in many applications of describing relationships between variables (typically variables in systems of equations or inequalities). They can be characterized in many different but equivalent ways (see Rose, 1970). One simple way is to consider a decomposable graph as consisting of the union of a number of maximally connected complete graphs (cliques, or maximally connected subgraphs), in such a way that (i) there is at least one vertex that appears in only one clique (a simplicial vertex), and (ii) if an edge to a simplicial vertex is removed, another decomposable graph remains, and (iii) the graph without any edges is decomposable. A characteristic feature of a simplicial vertex is that its neighbors are completely connected by edges. If the graph G' obtained by adding an edge between s and n to G is also decomposable, we will call such an edge a permissible edge of G. This concept implies a generation structure (a directed acyclic graph whose vertices are decomposable graphs on the set of vertices) containing all decomposable graphs on the variable set. An interesting feature of this generation process is that it is easy to compute the Bayes factor comparing the posterior probabilities of the graphs G and G' as graphical models of the data: let s correspond to A, n to B, and the compound variable obtained by fusing the neighbors of s to C in the analysis of Section 5. Without explicit prior model probabilities we have:

A search for high probability graphs can now be organized as follows:

Start from the graph G0 without edges.
Repeat: find a number of permissible edges that give the highest Bayes factor, and add the edge if the factor is greater than 1. Keep a set of highest probability graphs encountered.
Then repeat: For the high probability graphs found in the previous step, find simplicial edges whose removal increases the Bayes factor the most.

For each graph kept in this process, its Bayes factor relative to G0 can be found by multiplying the Bayes factors in the generation sequence. A procedure similar to this one is reported by Madigan and Raftery (1994), and its results on small variable sets was found good, in that it found the best graphs reported in other approaches. For directed graphical models, a similar method of obtaining high probability graphs is known as the K2 algorithm (Berthold & Hand, 1999).


	Brought to you by Team-Fly