18.3 Graph-Search ADT Methods

DFS and the other graph-search methods that we consider later in this chapter all involve following graph edges from vertex to vertex, with the goal of systematically visiting every vertex and every edge in the graph. But following graph edges from vertex to vertex can lead us to all the vertices in only the same connected component as the starting vertex. In general, of course, graphs might not be connected, so we need one call on a search method for each connected component. We will typically use graph-search methods that perform the following steps until all of the vertices of the graph have been marked as having been visited:

Find an unmarked vertex (a start vertex).
Visit (and mark as visited) all the vertices in the connected component that contains the start vertex.

The method for marking vertices is not specified in this description, but we most often use the same method that we used for the DFS implementations in Section 18.2: We initialize all entries in a private vertex-indexed array to a negative integer, and mark vertices by setting their corresponding entry to a nonnegative value. Using this procedure amounts to using a single bit (the sign bit) for the mark; most implementations are also concerned with keeping other information associated with marked vertices in the array (such as, for the DFS implementation in Section 18.2, the order in which vertices are marked). The method for looking for a vertex in the next connected component is also not specified, but we most often use a scan through the array in order of increasing index.

We pass an edge to the search method (using a dummy self-loop in the first call for each connected component), instead of passing its destination vertex, because the edge tells us how we reached the vertex. Knowing the edge corresponds to knowing which passage led to a particular intersection in a maze. This information is useful in many DFS classes. When we are simply keeping track of which vertices we have visited, this information is of little consequence; but more interesting problems require that we always know from whence we came.

Program 18.2 is an implementation that illustrates these choices. It is a DFS-based class for computing a spanning forest, but first we just note that it uses the ord array to keep track of which vertices have been visited, allowing it to visit each vertex precisely once, as illustrated in Figure 18.8. Typically, the classes that we consider also examine all edges incident upon each vertex visited. In such cases, knowing that we visit all vertices tells us that we visit all edges as well, as in Tr maux traversal.

Figure 18.8. Graph search

The table at the bottom shows vertex marks (contents of the ord vector) during a typical search of the graph at the top. Initially, the constructor unmarks all vertices by setting the marks all to -1 (indicated by an asterisk). Then it calls searchC for the dummy edge 0-0 , which marks all of the vertices in the same connected component as (second row) by setting them to nonnegative values (indicated by s). In this example, it marks , 1 , 4 , and 9 with the values through 3 in that order. Next, it scans from left to right to find the unmarked vertex 2 and calls search for the dummy edge 2-2 (third row), which marks the seven vertices in the same connected component as 2 . Continuing the left-to-right scan, it calls search for 8-8 to mark 8 and 11 (bottom row). Finally, the constructor completes the search by discovering that 9 through 12 are all marked .

To compute a spanning forest, Program 18.2 adds a private array st to hold a parent-link representation of the tree ( initialized in the constructor); defines a searchC that is similar to searchC from Program 18.1, except that it takes an edge v-w as parameter and to set st[w] to v ; and adds a query method that allows clients to learn the parent of any vertex. Spanning forests are of interest in many applications, but our primary interest in them in this chapter is their relevance in understanding the dynamic behavior of the DFS, the topic of Section 18.4.

In a connected graph, the constructor in Program 18.2 calls searchC once for 0-0 and then finds that all the other vertices are marked. In a graph with more than one connected component, the constructor checks all the connected components in a straightforward manner. DFS is the first of several methods that we consider for searching a connected component. No matter which method (and no matter what graph representation) we use, we visit all the graph vertices.

Property 18.2

A graph-search method checks each edge and marks each vertex in a graph if and only if the search method that it uses marks each vertex and checks each edge in the connected component that contains the start vertex .

Proof : By induction on the number of connected components.

Thus, graph-search methods provide a systematic way of processing each vertex and each edge in a graph. Generally , our implementations are designed to run in linear or near-linear time, by doing a fixed amount of processing per edge. We prove this fact now for DFS, noting that the same proof technique works for several other search strategies.

Property 18.3

DFS of a graph represented with an adjacency matrix requires time proportional to V ² .

Proof : An argument similar to the proof of Property 18.1 shows that searchC not only marks all vertices connected to the start vertex but also calls itself exactly once for each such vertex (to mark that vertex). An argument similar to the proof of Property 18.2 shows that invoking the constructor leads to exactly one invocation of searchC for each graph vertex. In searchC , the iterator checks every entry in the vertex's row in the adjacency matrix. In other words, the search checks each entry in the adjacency matrix precisely once.

Program 18.2 Depth-first search

This class illustrates the manner in which we search in graphs that may not be connected. It is a DFS class that builds a spanning forest. The constructor builds a parent-link representation of the forest in st and of a preorder walk of the forest in ord . Clients can use a GraphDFS object to find any given vertex's parent in the forest ( ST ), or any given vertex's position in the walk ( order ). Properties of these forests and representations are the topic of Section 18.4.

All of the graph-processing classes that we consider have this same basic structure: They define a component-processing method like the recursive method searchC that, when called with a self-loop to v as its second parameter, sets a vertex-indexed array to some value of interest for each vertex t in the same connected component as v . They use the vertex-indexed array to know which vertices have been visited and call their component-processing method once for each connected component in the graph.

 class GraphDFS { private Graph G; private int cnt; private int[] ord, st; private void searchC(Edge e) { int w = e.w; ord[w] = cnt++; st[e.w] = e.v; AdjList A = G.getAdjList(w); for (int t = A.beg(); !A.end(); t = A.nxt()) if (ord[t] == -1) searchC(new Edge(w, t)); } GraphDFS(Graph G, int v) { this.G = G; cnt = 0; ord = new int[G.V()]; st = new int[G.V()]; for (int t = 0; t < G.V(); t++) { ord[t] = -1; st[t] = -1; } for (int t = 0; t < G.V(); t++) if (ord[t] == -1) searchC(new Edge(t, t)); } int order(int v) { return ord[v]; } int ST(int v) { return st[v]; } }

Property 18.4

DFS of a graph represented with adjacency lists requires time proportional to V + E .

Proof : From the argument just outlined, it follows that we call the recursive method precisely V times (hence the V term ), and we examine each entry on each adjacency list (hence the E term).

The primary implication of Properties 18.3 and 18.4 is that they establish the running time of DFS to be linear in the size of the data structure used to represent the graph. In most situations, we are also justified in thinking of the running time of DFS as being linear in the size of the graph, as well: If we have a dense graph (with the number of edges proportional to V ² ) then either representation gives this result; if we have a sparse graph, then we assume use of an adjacency-lists representation. Indeed, we normally think of the running time of DFS as being linear in E . That statement is technically not true if we are using adjacency matrices for sparse graphs or for extremely sparse graphs with E << V and most vertices isolated, but we can usually avoid the former situation, and we can remove isolated vertices (see Exercise 17.34) in the latter situation.

As we shall see, these arguments all apply to any algorithm that has a few of the same essential features of DFS. If the algorithm marks each vertex and examines all the latter's incident vertices (and does any other work that takes time per vertex bounded by a constant), then these properties apply. More generally, if the time per vertex is bounded by some function f ( V,E ), then the time for the search is guaranteed to be proportional to E + V f ( V,E ). In Section 18.8, we see that DFS is one of a family of algorithms that has just these characteristics; in Chapters 19 through 22, we see that algorithms from this family serve as the basis for a substantial fraction of the code that we consider in this book.

Much of the graph-processing code that we examine is ADT-implementation code for some particular task, where we develop a class that does a basic search to compute structural information in other vertex-indexed arrays. Many of our graph-processing classes are of this nature because we typically can uncover a graph's structure by searching it. We normally add code to the search method that is executed when each vertex is marked, instead of working with a more generic search (for example, one that calls a specified method each time a vertex is visited), solely to keep the code compact and self-contained. Providing a more general ADT mechanism for clients to process all the vertices with a client-supplied method is a worthwhile exercise (see Exercises 18.13 and 18.14).

In Sections 18.5 and 18.6, we examine numerous graph-processing methods that are based on DFS. In Sections 18.7 and 18.8, we look at other constructor implementations and at some graph-processing methods that are based on them. Although we do not build this layer of abstraction into our code, we take care to identify the basic graph-search strategy underlying each algorithm that we develop. For example, we use the term DFS class to refer to any implementation that is based on the recursive DFS scheme. The simple- path “search class Program 17.11 and the spanning-forest class Program 18.2 are examples of DFS classes.

Many graph-processing methods are based on the use of vertex-indexed arrays. We typically include such arrays as private data fields in class implementations to hold information about the structure of graphs (which is discovered during the search) that helps us solve the problem at hand. Examples of such arrays are the deg array in Program 17.11 and the ord array in Program 18.1. Some implementations that we will examine use multiple arrays to learn complicated structural properties.

Our convention in graph-search methods is to initialize all entries in vertex-indexed arrays to -1 and to set the entries corresponding to each vertex visited to nonnegative values in the search method. Any such array can play the role of the ord array (marking vertices as visited) in Programs 18.2 and 18.3. When a graph-search method is based on using or computing a vertex-indexed array, we often just implement the search and use that array to mark vertices, rather than maintaining the ord array.

The specific outcome of a graph search depends not just on the nature of the search method but also on the graph representation and even the order in which the constructor examines the vertices. For specificity in the examples and exercises in this book, we use the term standard adjacency-lists DFS to refer to the process of inserting a sequence of edges into a graph ADT implemented with an adjacency-lists representation (Program 17.9), then doing a DFS with, for example, Program 18.2. For the adjacency-matrix representation, the order of edge insertion does not affect search dynamics, but we use the parallel term standard adjacency-matrix DFS to refer to the process of inserting a sequence of edges into a graph ADT implemented with an adjacency-matrix representation (Program 17.7), then doing a DFS with, for example, Program 18.2.

Exercises

18.8 Show, in the style of Figure 18.5, a trace of the recursive method calls made for a standard adjacency-matrix DFS of the graph

18.9 Show, in the style of Figure 18.7, a trace of the recursive method calls made for a standard adjacency-lists DFS of the graph

18.10 Modify the adjacency-matrix graph ADT implementation in Program 17.7 to use a dummy vertex that is connected to all the other vertices. Then, provide a simplified DFS implementation that takes advantage of this change.

18.11 Do Exercise 18.10 for the adjacency-lists ADT implementation in Program 17.9.

18.12 There are 13! different permutations of the vertices in the graph depicted in Figure 18.8. How many of these permutations could specify the order in which vertices are visited by a standard (adjacency-matrix or adjacency lists) DFS?

18.13 Implement a graph ADT client method that calls a client-supplied method for each vertex in the graph.

18.14 Implement a graph ADT client that calls a client-supplied method for each edge in the graph. Such a method might be a reasonable alternative to extracting the edges (see Program 17.2).