# GRAPHICAL QUERY SETS

If the set Q of previously answered queries is graphical (that is, if the answer map of Q is a graph), then the procedure for testing the membership of the target-query Q in the closure of Q can be efficiently implemented in the two cases and moreover, if Q is an elementary query, then the tightest lower and upper bounds on the value of Q can be efficiently computed with flow-network methods in the case . Before showing that, in the next subsection we introduce some basic definitions from the theory of graphs (Bondy & Murty, 1976) and of flow networks (Ahuja, Magnanti & Orlin, 1993).

### Graphs and Networks

An edge e is incident to vertex v if v belongs to e. An edge incident to exactly one vertex is called a self-loop, and an edge that is not a self-loop is called a link. Two vertices are adjacent if they belong to some edge, of which they are the endpoints. Let H = (V, E) be a graph. Given a subset E of E, the subgraph of H induced by E is the graph (V, E); moreover, the subgraph of H induced by EE is denoted by HE. Given a subset V of V, the subgraph of H induced by V is the graph (V, E) where E = {{u, v} E: u and v are both in V}; moreover, the subgraph of H induced by VV is denoted by HV. A path in H is a sequence (v1,, vk) of distinct vertices such that, for all h < k, (vh, vh+1) is an edge of H; then, the vertices v1 and vk are called the endpoints of the path. If (v1,, vk) is a path in H, and the pair {v1, vk) is an edge of H, then the sequence (v1,, vk, v1) is called a cycle, which is said to be even (or odd) if k is even (respectively, odd). H is bipartite if it contains no odd cycles. Two vertices of H are connected if they are the endpoints of some path in H, and H is connected if every two vertices of H are connected. The subgraphs of H induced by its maximal sets of pairwise connected vertices are called the connected components of H. The bipartite components of H are the connected components of H that are bipartite. Most of the results stated in this section rely on the following fact (Dantzig, 1963; Conforti & Rao, 1987):

The rank of the incidence matrix of H is equal to |V|r where r is the number of bipartite components of H.

For subsets V and V′′ of V, we denote by [V, V′′]H the (possibly empty) set of edges of H with one end in V and the other in V′′. An edge cut of H is a nonempty subset of E of the form [V, V V]H where V is a proper subset of V. If H is a bipartite connected graph with at least one edge, then there exists a bipartition {V1, V2} of V such that the edge set E of H coincides with the edge cut [V1, V2]H; then the sets V1 and V2 are called the sides of H. A bond of H is a minimal (with respect to set-inclusion) edge cut of H. (Note that every edge cut of H is the disjoint union of bonds of H.) An edge e of H is a cut edge (or a "bridge") if {e} is a bond of H. A bond F of a bipartite graph H is simple (Malvestuto, 1993) or "bipartite" (Kao, 1997) if, for each connected component H of HF, F is incident to at most one side of H (see Figure 3).

Figure 3: A Simple Bond of a Bipartite Graph

A mixed graph is a generalization of an ordinary graph. A mixed graph is a pair H = (V, E E) where E is a collection of subsets of V of cardinality 1 or 2, and E is a set of ordered pairs of elements of V. The elements of E are called undirected edges of H, and the elements of are called directed edges of H. If the set of undirected edges of H is empty, then H is called a digraph. A directed path in H from vertex u to vertex u is a sequence (v1,, vk) of distinct vertices such that u = v1, u = vk and for all h < k, {vh, vh+1} is in E or <vh, vh+1> is in. Two vertices u and u are strongly connected if there exist a directed path from u to u and a directed path from u to u. A mixed graph is strongly connected if every two vertices are strongly connected. The strongly connected components of H are the subgraphs of H induced by its maximal sets of pairwise strongly connected vertices. The cross-component edges of H are the directed edges joining two strongly connected components of H.

A network N is a digraph (V,E ) with two distinguished vertices, called the source and the sink of N respectively, where each directed edge e has associated to it a nonnegative quantity (possibly, +), which is called the capacity of e and denoted by c(e). Let s and t be the source and the sink of N, respectively. A flow in N is a real-valued function φ defined on such that

and

the summations being extended over all v such that <v, u> E and <u, v> E, respectively. A maximum flow in N is a flow in N that maximizes the quantity

the summation being extended over all v such that <s, v> E. Maximum flows in N can be computed in O(|V|3) time.

Example 4: Consider the network N shown in Figure 4.

Figure 4

A maximum flow in N is shown in Figure 5.

Figure 5

### Testing Invariance

Let Q be a graphical query set, and let (H, q) be the answer map of Q. Without loss of generality, in what follows we assume that H is a connected graph with m edges and n vertices. We separately discuss the problem of testing the sum expression Σe F x(e) for D-invariance in the cases

The Case

We saw that deciding if the sum expression Σe F x(e) is an -invariant of (H, q) is equivalent to testing the consistency of equation system (3). We now show that this test can be carried out in linear time (Malvestuto & Moscarini, 1999). Since the rank of the incidence matrix of H is equal to n1 if H is bipartite, and is equal to n otherwise, equation system (3) has at most 1 solutions and can be transformed to an equivalent system Λ of mn+1 linear equations in one unknown, denoted by λ. It consists of choosing an arbitrary vertex r of H, and performing a depth-first search (DFS) traversal of H with start-vertex r. During the traversal of H, each vertex v, when v is first visited, will be labeled by a linear expression v(λ) in λ and, for each edge not in the traversal tree of H, a linear equation in λ is added to Λ as is stated by the following algorithm whose output variable inv is True if and only if the sum expression Σe F x(e) is an -invariant of (H, q).

Algorithm -INVARIANCE

Input: A connected graph H, the incidence vector f of F, and a vertex r of H.

Step 1. inv := False; r(λ) := λ; Λ := Ø.

Step 2. Start a DFS traversal of H at vertex r. During the traversal of H, when vertex v is reached using edge e, then

Step 3. Test Λ for consistency. If Λ turns out to be consistent, then set inv := True.

It is clear that equation system (3) is equivalent to equation system Λ. Moreover, an equation in Λ of the form u(λ) + v(λ) = f(e) can have no solutions (i.e., the equation is impossible), one solution (the equation is determined), or solutions (the equation is an identity), and an equation in of the form v(λ) = f(e) has always one solution. Note that an equation in Λ is determined if and only if it corresponds to a non-tree edge of H whose addition to the traversal tree creates an odd cycle. Since Λ is consistent if and only if no equation in Λ is impossible and the determined equations in Λ (if any) have all the same solution, it is easy during the traversal of H to check the consistency of Λ so that one can decide in linear time whether equation system Λ and, hence, equation system (3) is consistent and, if this is the case and λo is a solution of equation system Λ, one can determine a solution y of equation system (3) by taking yv = v(λo).

Example 5: Consider again the query set Q = {Q1, Q2, Q3} of Example 3, whose answer map is shown in Figure 2, and assume that D =. Suppose that F = {e1, e3}. Using algorithm -INVARIANCE, we can label the vertices as shown in Figure 6 (we are assuming that the DFS traversal starts at vertex 1 and reaches vertices 2 and 3 using edges e2 and e3).

Figure 6

Accordingly, the equation system Λ is the pair of the following two equations corresponding to e1 and e3 respectively:

and turns out to be inconsistent. Therefore, we can conclude that the sum expression Σe F x(e) is not an -invariant of (H, q).

Using algorithm -INVARIANCE, one can prove that, if H is bipartite, then the sum expression Σe F x(e) is an -invariant of (H, q) if and only if F is the disjoint union of simple bonds (Malvestuto, 1993; Kao, 1997); as a consequence, if the target-query is an elementary query, that is, F = {e} for some edge e of H, then Q belongs to the closure of Q if and only if e is a cut edge of H.

The Case

We now prove that the procedure for deciding if the sum expression Σe F x(e) is an -invariant of (H, q) can be implemented in O(n3) time using network computation. We distinguish two cases depending on whether H is or is not bipartite.

Assume that H = (V, E) is a bipartite graph with sides V1 and V2. Let Net(H, q) be the network (V {s, t}) with

where the capacities of the directed edges of Net(H, q) are defined as follows

Gusfield (1988) proved that:

1. The solutions of the constraint system associated with Q coincide with the restrictions to E of maximum flows in Net(H, q).

2. Given a maximum flow φ in Net(H, q), let H(φ) be the mixed graph obtained from H by directing each edge {u, v}, u V1, and v V2, with φ(<u, v>) = 0 from V1 to V2; then the set of cross-component edges of H(φ) equals kernel(H).

Therefore, the following algorithm correctly decides if the sum expression Σe F x(e) is an -invariant of (H, q).

Step 1. Construct the network Net(H, q) and find a maximum flow φ in Net(H, q).

Step 2. Let w be the restriction of φ to E. Construct the mixed graph H(φ) obtained from H by directing from V1 to V2 each edge {u, v}, u V1, and v V2, with j(<u, v>) = 0.

Step 3. Set kernel(H) to the set of cross-component edges of H(w).

Step 4. For each connected component H = (V, E) of the graph Hkernel(H) with F E Ø, apply algorithm -INVARIANCE with input H and the incidence vector of F E. If each time the algorithm terminates with inv = True, then and only then conclude that the sum expression Σe F x(e) is an -invariant of (H, q).

Since a maximum flow in Net(H, q) can be computed in O(n3) time, algorithm -INVARIANCE can be implemented in O(n3) time.

Example 6: Let Q = {Q1, Q2, Q3, Q4, Q5, Q6} be a set of additive queries with values in , whose answer map is shown in Figure 7.

Figure 7

Then, Net(H, q) is the network of Figure 4 and a maximum flow in in Net(H, q) is shown in Figure 5. The mixed graph H(φ) is shown in Figure 8.

Figure 8

The cross-component edges of H(φ) are <1, 4>, <1, 6>, and <3, 6>. Therefore, kernel(H) is formed by the three edges e1, e3, and e7. Of course, the elementary query corresponding to each edge in kernel(H) belongs to the closure of Q. The graph Hkernel(H) is shown in Figure 9.

Figure 9

Here, each edge is a cut edge of a bipartite graph and, hence, the corresponding elementary query belongs to the closure of Q. To sum up, every elementary query (and, hence, the target-query) belongs to the closure of Q.

We now discuss the case that H is not bipartite. We associate with H a bipartite, connected graph H* = (V*, E*) which contains 2n vertices and 2ml edges, where l is the number of self-loops of H. The graph H* is constructed as follows. Let G be a bipartite partial graph of H obtained from a spanning tree of H by adding the nontree edges that create even cycles. Let V be a "copy" of V, that is, V V = Ø and |V| = |V|. If v is a vertex of H, then by v we denote the copy of v; moreover, if e = {u, v} is a link of H, then by e we denote the set {u, v}. The vertex set V* of H* is taken to be the union of V and V, that is, V* = V V. The edge set E* of H* is taken to be

where β[e] is defined as follows:

• if e is an edge of G, then b[e] = {e, e};

• if e = {u, v} is a link of H but not an edge of G, then b[e] = {{u, v}, {u, v}};

• if e is a self-loop of H, say {v}, then β[e] = {{v, v}}.

Note that since H is a nonbipartite, connected graph, H* is bipartite and connected. Furthermore, if V1 and V2 are the sides of G, then the sides of H* are V*1 = V1 V2 and V*2 = V1 V2, where Vi = {v: v Vi}, i = 1, 2. For each v in V set q*v := qv, and for each v in V, set q*v := qv.

Example 7: Consider again the query set Q = {Q1, Q2, Q3} of Example 3, whose answer map was shown in Figure 2, and assume that . By choosing as G the graph shown in Figure 10, we obtain the bipartite vertex-weighted graph H* shown in Figure 11.

Figure 10

Figure 11

Let Γ* be the system of linear constraints given by the equation system

where H* is the incidence matrix of H*, with the addition of the nonnegativity constraints

Malvestuto & Mezzini (2000) proved that:

1. If w is a solution of G, then the vector w* with

is a solution of Γ*, and if w* is a solution of Γ*, then the vector w with

is a solution of Γ.

2. kernel(H) = {e E: β[e] kernel(H*)}.

By combining these results with Gusfield's results, one has also in the case that H is not bipartite; the procedure for testing invariance can be implemented in O(n3) time.

Example 7 (continued): The constraint system Γ* reads

The general solution of Γ* is

where ρ an σ are bounded as shown in Figure 12:

Figure 12

We now show how kernel(H) can be graphically obtained. Figure 13 shows Net(H*, q*)and Figure 14 shows a maximum flow φ* in Net(H*, q*).

Figure 13

Figure 14

The mixed graph H*(φ*) coincides with H* which is strongly connected. So, kernel(H*) is empty, and kernel(H) is empty too.

#### Computing the Feasibility Range

Assume that the target-query Q is an elementary query and that e is the edge corresponding to Q. Let q and q′′ be the tightest lower and upper bounds on the value of Q. Here we show how q and q′′ can be efficiently computed in the case .

We distinguish two cases depending on whether H is or is not bipartite, and we shall solve the problem of computing q and q′′ with the method given by Gusfield (1988) in the former case, and by Malvestuto & Mezzini (2002) in the latter case.

Assume that H is a bipartite graph with sides V1 and V2, and let e = {uo, vo} with uo V1 and vo V2. We first find a maximum flow φ in the network Net(H, q). Given φ, let us construct the digraph N with vertex set V and directed-edge set

For each directed edge <u, v> of N, take its capacity to be

where M is a finite number larger than the largest qv for v in V1. Let N1 be the network with underlying digraph N having source uo and sink vo, and let φ1 be a maximum flow in N1. Analogously, let N2 be the network with underlying digraph N having source vo and sink uo, and let φ2 be a maximum flow in N2. If Φ1(uo vo) and Φ2(vo uo) are respectively the values of φ1 and φ2, then q and q′′ are given by

and

Moreover, if H is a complete bipartite graph, then α1 and α2 are simply given by

Therefore, the tightest lower and upper bounds on the value of Q can be found in O(n3) time.

Example 8: Let Q = {Q1, Q2, Q3, Q4, Q5, Q6} be a set of additive queries with values in, whose answer map is shown in Figure 15.

Figure 15

Figure 16 shows a maximum flow φ in Net(H, q).

Figure 16

Suppose that the target-query Q is the query corresponding to the edge {1, 6}. In order to compute the tightest lower bound q and the tightest upper bound q′′ on the value of Q, we construct the digraph N, which is shown in Figure 17.

Figure 17

Next, consider the network N1 on N with source vertex 1 and sink vertex 6, and with capacities defined as follows:

A maximum flow φ1 in N1 is shown in Figure 18.

Figure 18

The value of φ1 is Φ1(1 6) = 11/4 so that, by formula (5), one obtains q = max (0, φ(<1, 6>) Φ1(1 6) + 2) = max (0, 1/4 11/4 + 2) = 0.

Consider now the network N2 which differs from N1 only in that the source and the sink are vertices 6 and 1, respectively. A maximum flow φ2 in N2 is shown in Figure 19.

Figure 19

The value of φ2 is Φ2(6 1) = 1 so that, by formula (6), one obtains

Consider now the case that H is not bipartite. Let H* be a bipartite transform of H and Γ* the constraint system as stated in the "Graphs and Networks" subsection above. Let β[e] be the image of e in H*. If e is a self-loop and β[e] = {e*}, then the tightest lower and upper bounds on the value of Q are given by

so that q and q′′ can be determined in O(n3) time using Gusfield's method applied to H*. Otherwise, that is, if e is a link and β[e] = {e*1, e*2}, let

Then, the tightest lower and upper bounds on the value of Q are given by

With an example we show that α1 and α2 (and hence q and q′′) can be computed in O(n3) time.

Example 7 (continued): Suppose that the target-query Q is the elementary query corresponding to the self-loop {1}. By formula (5), the tightest lower and upper bounds on the value of Q are given by

To compute them, we can proceed as in Example 8 when min x({1, 6}) and max x({1, 6}) were computed. So, one has

Suppose now that the target-query Q is the elementary query corresponding to the edge {2, 3}. The tightest lower and upper bounds on the value of Q are respectively given by formula (7) where

We show how to compute α1 and α2 in cubic time. For both of them, we make use of a maximum flow in Net(H*, q*). Let ϕ be the maximum flow in Net(H*, q*) shown in Figure 20.

Figure 20

Figure 21

Let us begin with the computation of α1. First of all, we compute min x*({2, 3'}) subject to Γ* using the above procedure for bipartite graphs. More precisely, we construct the network N1 on the digraph with source vertex 3' and sink vertex 2, and with capacities defined as follows:

A maximum flow φ1 in N1 is shown in Figure 22.

Figure 22

The value of φ1 is Φ1(3 2) = 5/2 so that, by formula (5), one has min x*({2, 3}) = max (0, φ(<3, 2>) Φ1(3 2) + 2) = max (0, 1/2 5/ 2 + 2) = 0.

After computing min x*({2, 3}) (= 0), we calculate α1 as min x*({2, 3}) plus the quantity

To achieve this, let (H*{2, 3}, q*1) be the vertex-weighted graph obtained from (H*, q*) by deleting the edge {2, 3} and subtracting min x*({2, 3}) (= 0) from the weights of the endpoints of the edge {2, 3} (see Figure 23).

Figure 23

Then, the quantity

equals min x*({2, 3}) in (H*–{2, 3}, q*1), which is found with the same technique employed to find min x*({2, 3}) in (H*, q*). That is, we first find a maximum flow χ in Net(H*{2, 3}, q*1). Let χ be the maximum flow in Net(H*{2, 3}) shown in Figure 24.

Figure 24

Next, let D be the digraph (see Figure 25) with vertex set V* and directed-edge set

Figure 25

Then, we construct the network M1 on D with source vertex 2 and sink vertex 3, and capacities defined as follows:

and compute a maximum flow in M1. Let χ1 be the maximum flow in M1 shown in Figure 26.

Figure 26

The value of χ1 is X1(2 3) = 2 so that, by formula (5), one has that min x*({2, 3}) in (H*{2, 3}, q*1) is equal to

Finally, we obtain

and, by formula (7),

As to α2, we proceed as follows. First of all, we compute max x*({2, 3}) subject to Γ*. To this end, we construct the network N2 which differs from N1 only in the source and the sink which are not set to vertex 3 and vertex 2, respectively. A maximum flow φ2 in N2 is shown in Figure 27.

Figure 27

The value of φ2 is Φ2(2 3) = 1 so that, by formula (6), one has

After computing max x*({2, 3}), we calculate α2 as max x*({2, 3}) plus the quantity

To achieve this, let (H{2, 3}, q*2) be the vertex-weighted graph obtained from H* by deleting edge {2, 3} and subtracting max x*({2, 3}) (= 1) from the weights of the endpoints of edge {2, 3} (see Figure 28).

Figure 28

Then, the quantity

equals max x*({2, 3}) in (H*{2, 3}, q*2), which is found with the same technique employed to find max x*({2, 3}) in (H*, q*). That is, we first find a maximum flow in Net(H*{2, 3}, q*2). Let Ψ the maximum flow in Net(H*{2, 3}, q*2) shown in Figure 29.

Figure 29

Then, we construct the network M2 on D with source vertex 3 and sink vertex 2, and capacities defined as follows:

and compute a maximum flow in M2. Let ψ2 be the maximum flow in M2 shown in Figure 30.

Figure 30

The value of ψ2 is Ψ2(3 2) = 1 so that, by formula (6), one has max x*({3, 2}) = Ψ2(3 2) = 1.

Finally, we obtain

and

Before closing this section, we note that the existence of an efficient membership test, and of an efficient procedure for computing the tightest lower and upper bounds on the value of an elementary query in the case D = Z0+, is an open problem; however, if H is bipartite, then the total unimodularity of its incidence matrix allows us to apply the procedures above stated for the case .

Multidimensional Databases: Problems and Solutions
ISBN: 1591400538
EAN: 2147483647
Year: 2003
Pages: 150