20.5 Boruvka s Algorithm | Algorithms in Java, Part 5: Graph Algorithms (3rd Edition) (Pt.5)

20.5 Boruvka's Algorithm

The next MST algorithm that we consider is also the oldest. Like Kruskal's algorithm, we build the MST by adding edges to a spreading forest of MST subtrees; but we do so in stages, adding several MST edges at each stage. At each stage, we find the shortest edge that connects each MST subtree with a different one, then add all such edges to the MST.

Our union-find ADT from Chapter 1 again leads to an efficient implementation. For this problem, it is convenient to extend the interface to make the find operation available to clients . We use this method to associate an index with each subtree so that we can tell quickly to which subtree a given vertex belongs. With this capability, we can implement efficiently each of the operations for Boruvka's algorithm.

First, we maintain a vertex-indexed array that identifies, for each MST subtree, the nearest neighbor. Then, we perform the following operations on each edge in the graph:

If it connects two vertices in the same tree, discard it.
Otherwise, check the nearest-neighbor distances between the two trees the edge connects and update them if appropriate.

After this scan of all the graph edges, the nearest-neighbor array has the information that we need to connect the subtrees. For each vertex index, we perform a union operation to connect it with its nearest neighbor. In the next stage, we discard all the longer edges that connect other pairs of vertices in the now-connected MST subtrees. Figures 20.14 and 20.15 illustrate this process on our sample algorithm

Figure 20.14. Boruvka's MST algorithm

The diagram at the top shows a directed edge from each vertex to its closest neighbor. These edges show that 0-2 , 1-7 , and 3-5 are each the shortest edge incident on both their vertices , 6-7 is 6 's shortest edge, and 4-3 is 4 's shortest edge. These edges all belong to the MST and comprise a forest of MST subtrees (center), as computed by the first phase of Boruvka's algorithm. In the second phase, the algorithm completes the MST computation (bottom) by adding the edge 0-7 , which is the shortest edge incident on any of the vertices in the subtrees it connects, and the edge 4-7 , which is the shortest edge incident on any of the vertices in the bottom subtree .

Figure 20.15. Union-find array in Boruvka's algorithm

This figure depicts the contents of the union-find array corresponding to the example depicted in Figure 20.14. Initially, each entry contains its own index, indicating a forest of singleton vertices. After the first stage, we have three components , represented by the indices , 1 , and 3 (the union-find trees are all flat for this tiny example). After the second stage, we have just one component, represented by 1 .

Program 20.9 is a direct implementation of Boruvka's algorithm. There are three major factors that make this implementation efficient:

The cost of each find operation is essentially constant.
Each stage decreases the number of MST subtrees in the forest by at least a factor of 2.
A substantial number of edges is discarded during each stage.

It is difficult to quantify precisely all these factors, but the following bound is easy to establish:

Property 20.11

The running time of Boruvka's algorithm for computing the MST of a graph is O ( E lg V lg ^* E ).

Proof : Since the number of trees in the forest is halved at each stage, the number of stages is no larger than lg V . The time for each stage is at most proportional to the cost of E find operations, which is less than E lg ^* E , or linear for practical purposes.

The running time given in Property 20.11 is a conservative upper bound since it does not take into account the substantial reduction in the number of edges during each stage. The find operations take constant time in the early passes, and there are very few edges in the later passes . Indeed, for many graphs, the number of edges decreases exponentially with the number of vertices, and the total running time is proportional to E . For example, as illustrated in Figure 20.16, the algorithm finds the MST of our larger sample graph in just four stages.

Figure 20.16. Boruvka's MST algorithm

The MST evolves in just four stages for this example (top to bottom).

Program 20.9 Boruvka's MST algorithm

This implementation of Boruvka's MST algorithm uses a version of the union-find ADT from Chapter 4 (with the single-parameter find added to the interface) to associate indices with MST subtrees as they are built. Each phase checks all the remaining edges; those that connect disjoint subtrees are kept for the next phase. The array a has the edges not yet discarded and not yet in the MST. The index N is used to store those being saved for the next phase (the code resets E from N at the end of each phase) and the index h is used to access the next edge to be checked. Each component's nearest neighbor is kept in the array b with find component numbers as indices. At the end of each phase, each component is united with its nearest neighbor and the nearest-neighbor edges added to the MST.

 class GraphMST { private UF uf; private Edge[] a, b, mst; GraphMST(Graph G) { Edge z = new Edge(0, 0, maxWT); uf = new UF(G.V()); a = GraphUtilities.edges(G); b = new Edge[G.V()]; mst = new Edge[G.V()+1]; int N, k = 1; for (int E = G.E(); E != 0; E = N) { int h, i, j; for (int t = 0; t < G.V(); t++) b[t] = z; for (h = 0, N = 0; h < E; h++) { Edge e = a[h]; i = uf.find(e.v()); j = uf.find(e.w()); if (i == j) continue; if (e.wt() < b[i].wt()) b[i] = e; if (e.wt() < b[j].wt()) b[j] = e; a[N++] = e; } for (h = 0; h < G.V(); h++) if (b[h] != z) if (!uf.find(i = b[h].v(), j = b[h].w())) { uf.unite(i, j); mst[k++] = b[h]; } } } }

It is possible to remove the lg ^* E factor to lower the theoretical bound on the running time of Boruvka's algorithm to be proportional to E lg V , by representing MST subtrees with doubly linked lists instead of using the union and find operations. This improvement is sufficiently more complicated to implement and the potential performance improvement sufficiently marginal that it may not be worth considering for use in practice (see Exercises 20.67 and 20.68).

Boruvka's is the oldest of the algorithms that we consider: It was originally conceived in 1926 for a power-distribution application. The method was rediscovered by Sollin in 1961; it later attracted attention as the basis for MST algorithms with efficient asymptotic performance and as the basis for parallel MST algorithms.

Exercises

20.62 Show, in the style of Figure 20.14, the result of computing the MST of the network defined in Exercise 20.27 with Boruvka's algorithm.

20.63 Why does Program 20.9 do a find test before doing the union operation? Hint : Consider equal-length edges.

20.64 Explain why b(h) could be null in the test in Program 20.9 that guards the union operation.

20.65 Describe a family of graphs with V vertices and E edges for which the number of edges that survive each stage of Boruvka's algorithm is sufficiently large that the worst-case running time is achieved.

20.66 Develop an implementation of Boruvka's algorithm that is based on a presorting of the edges.

20.67 Develop an implementation of Boruvka's algorithm that uses doubly linked circular lists to represent MST subtrees so that subtrees can be merged and renamed in time proportional to E during each stage (and the equivalence-relations ADT is therefore not needed).

20.68 Do empirical studies to compare your implementation of Boruvka's algorithm in Exercise 20.67 with the implementation in the text (Program 20.9) for various weighted graphs (see Exercises 20.9 “14).

20.69 Do empirical studies to tabulate the number of stages and the number of edges processed per stage in Boruvka's algorithm for various weighted graphs (see Exercises 20.9 “14).

20.70 Develop an implementation of Boruvka's algorithm that constructs a new graph (one vertex for each tree in the forest) at each stage.

20.71 Write a client program that does dynamic graphical animations of Boruvka's algorithm (see Exercises 20.53 and 20.61). Test your program on random Euclidean neighbor graphs and on grid graphs (see Exercises 20.18 and 20.20), using as many points as you can process in a reasonable amount of time.