4.10 Preparing Multilevel Operations

4.10.1 Problem

Your system performs many multilevel, hierarchical operations, and the performance of those operations has not been satisfactory. You need to improve that poor performance.

4.10.2 Solution

One solution is to store additional accessibility information in a service table, and then use that information when querying your hierarchical table. For example, the following ProjectPaths table records the path to, and the depth of, each vertex in the Projects table:

 CREATE TABLE ProjectPaths(
    VertexId INTEGER,
    Depth INTEGER,
    Path VARCHAR(300)
 )

After creating the ProjectPaths table, you can use the following procedure to fill the table with the depth and path information for each vertex in the Projects table:

 CREATE PROCEDURE BuildProjectPathsRecursive @VertexId INTEGER
 AS
 SET NOCOUNT ON
    DECLARE @Path VARCHAR(300)
    DECLARE @Depth INTEGER
    SELECT @Depth=a.Depth,@Path=a.Path
    FROM ProjectPaths a JOIN Projects p ON p.Parent=a.VertexId
    WHERE @VertexId=p.VertexId

    DELETE FROM ProjectPaths WHERE VertexId=@VertexId
    INSERT INTO ProjectPaths VALUES(
       @VertexId,
       isnull(@Depth,0)+1,
       isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

    DECLARE subprojects CURSOR LOCAL FOR
       SELECT VertexId FROM Projects p WHERE Parent=@VertexId

    OPEN subprojects
       FETCH NEXT FROM subprojects INTO @VertexId
       WHILE @@FETCH_STATUS=0 BEGIN
          EXEC BuildProjectPathsRecursive @VertexId
          FETCH NEXT FROM subprojects INTO @VertexId
       END
    CLOSE subprojects
    DEALLOCATE subprojects
 SET NOCOUNT OFF

This procedure takes one parameter, which tells the procedure which node to start with. The procedure then works its way down the hierarchy from that node. To process all nodes in the Projects table, invoke the procedure and pass a value of 1, the ID of the root vertex, as follows:

 EXEC BuildProjectPathsRecursive 1

The procedure fills the ProjectPaths table with additional information for every vertex. The Depth column records the depth of each vertex. The Path column records the path to each vertex. In the path, the vertex numbers are separated by dots. The ProjectPaths table that will be built contains the following rows:

 VertexId    Depth       Path
 ----------- ----------- -----------
 1           1           .1.
 2           2           .1.2.
 3           3           .1.2.3.
 4           3           .1.2.4.
 5           3           .1.2.5.
 6           3           .1.2.6.
 7           4           .1.2.6.7.
 8           2           .1.8.
 9           3           .1.8.9.
 10          3           .1.8.10.
 11          4           .1.8.10.11.
 12          3           .1.8.12.
 13          2           .1.13.
 14          3           .1.13.14.
 15          3           .1.13.15.
 16          3           .1.13.16.
 17          2           .1.17.
 18          3           .1.17.18.
 19          2           .1.19.
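
As a sketch of how the stored paths can stand in for a traversal, the following query lists vertex 2 and all of its descendants with a single LIKE predicate. It assumes that ProjectPaths has been filled as shown above; the aliases a and d are only illustrative. The trailing dot in each path matters here: it keeps the prefix .1.2. from matching a hypothetical sibling path such as .1.20.

```sql
-- Sketch: vertex 2 and everything below it, without walking the hierarchy.
-- Assumes ProjectPaths was filled by BuildProjectPathsRecursive.
SELECT d.VertexId, d.Depth, d.Path
FROM ProjectPaths a
   JOIN ProjectPaths d ON d.Path LIKE a.Path + '%'
WHERE a.VertexId = 2
```

With the rows listed above, this query returns vertices 2, 3, 4, 5, 6, and 7, in no guaranteed order.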

4.10.3 Discussion

The idea for this recipe has been taken from an article published by Itzik Ben-Gan (SQL Server Magazine, June 2000). His development of this technique is a recent achievement resulting from his search for an ultimate support structure to improve the efficiency of the classical hierarchical model. Although it was originally promoted as an add-on to an existing hierarchy table, we see no reason why you shouldn't normalize properly and separate the hierarchy and its data from the support structure.

The path leading to every vertex is stored in the ProjectPaths table. This represents the work of traversing the hierarchy, and because it is stored in the ProjectPaths table, it only needs to be done once. Please note that the length of the Path field can be changed according to your needs. It does, however, make sense to keep it reasonably small, especially if you want to index it.
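
If lookups by path prefix are frequent, an index on the Path column is one option to consider. The following is a minimal sketch: the index name is hypothetical, and whether the optimizer actually uses the index depends on your data and queries.

```sql
-- Hypothetical index name. With Path declared as VARCHAR(300),
-- the index key is at most 300 bytes per row.
CREATE INDEX ProjectPaths_Path_IX ON ProjectPaths(Path)
```

A search such as Path LIKE '.1.2.%' is a candidate for this index because the pattern begins with fixed characters rather than a wildcard.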

The stored procedure named BuildProjectPathsRecursive fills the ProjectPaths table with the paths to each vertex in the subtree. It uses the recursive traversal algorithm introduced earlier in this chapter and runs the following code for each vertex:

 SELECT @Depth=a.Depth,@Path=a.Path
 FROM ProjectPaths a JOIN Projects p ON p.Parent=a.VertexId
 WHERE @VertexId=p.VertexId

 DELETE FROM ProjectPaths WHERE VertexId=@VertexId
 INSERT INTO ProjectPaths VALUES(
    @VertexId,
    isnull(@Depth,0)+1,
    isnull(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.')

The SELECT statement reads the depth and path data from the parent. Next, any old information for the vertex is deleted from the ProjectPaths table, and new data is inserted. If the @Depth or @Path variables are null, indicating that no access path for the parent exists, then an initial value of 0 is set for the depth, and an initial value of a dot (.) is set for the path. Regardless of how the depth gets set, it is increased by one. That's because the @Depth variable represents the depth of the current node's parent. You have to increment that depth by 1 to get the current node's depth. Similarly, the @Path variable contains the path to the parent. The current vertex ID is appended onto that path to yield the path to the current node. These new depth and path values are then inserted into the ProjectPaths table.
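
To make the two cases concrete, the following sketch evaluates the same expressions with literal values instead of reading them from the tables. The values are taken from the sample rows shown in the solution; nothing is written to ProjectPaths.

```sql
-- Worked example with literal values; no table is read or modified.
DECLARE @VertexId INTEGER
DECLARE @Depth INTEGER
DECLARE @Path VARCHAR(300)

-- Case 1: vertex 7, whose parent (vertex 6) has depth 3 and path .1.2.6.
SELECT @VertexId = 7, @Depth = 3, @Path = '.1.2.6.'
SELECT ISNULL(@Depth,0)+1 AS Depth,
       ISNULL(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.' AS Path
-- Depth 4, Path .1.2.6.7.

-- Case 2: the root, vertex 1, for which no parent row exists.
SELECT @VertexId = 1, @Depth = NULL, @Path = NULL
SELECT ISNULL(@Depth,0)+1 AS Depth,
       ISNULL(@Path,'.')+CAST(@VertexId AS VARCHAR(15))+'.' AS Path
-- Depth 1, Path .1.
```

Both results agree with the rows for vertices 7 and 1 in the ProjectPaths listing.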

If you prefer nonrecursive algorithms, you can rewrite the recursive BuildProjectPathsRecursive procedure as a nonrecursive procedure. The following version uses the stack-based technique shown earlier in Section 4.9:

 CREATE PROCEDURE BuildProjectsPaths @VertexId INTEGER
 AS
 SET NOCOUNT ON
    DECLARE @Lvl INTEGER
    CREATE TABLE #stack (
       VertexId INTEGER,
       Lvl INTEGER
    )
    SELECT @Lvl = 1
    -- Push the starting vertex onto the stack
    INSERT INTO #stack
       SELECT VertexId, 1 FROM Projects WHERE VertexId = @VertexId

    WHILE @Lvl > 0 BEGIN
       IF EXISTS (SELECT * FROM #stack WHERE Lvl = @Lvl) BEGIN
          SELECT TOP 1 @VertexId = VertexId FROM #stack
             WHERE Lvl = @Lvl
             ORDER BY VertexId

          -- Replace any old path row for the current vertex
          DELETE FROM ProjectPaths WHERE VertexId = @VertexId
          -- The outer join keeps a vertex whose parent has no path row
          INSERT INTO ProjectPaths
             SELECT p.VertexId,
                isnull(a.Depth,0)+1,
                isnull(a.Path,'.')+CAST(p.VertexId AS VARCHAR(15))+'.'
             FROM Projects p
                LEFT JOIN ProjectPaths a ON p.Parent = a.VertexId
             WHERE p.VertexId = @VertexId

          -- Pop the current vertex and push its children one level down
          DELETE FROM #stack WHERE VertexId = @VertexId
          INSERT #stack
             SELECT VertexId, @Lvl + 1 FROM Projects
             WHERE Parent = @VertexId
          IF @@ROWCOUNT > 0
             SELECT @Lvl = @Lvl + 1
       END ELSE
          SELECT @Lvl = @Lvl - 1
    END
 SET NOCOUNT OFF

Maintaining the ProjectPaths Table

Once you create a table like the ProjectPaths table, how do you maintain it? Some authors recommend that the mechanism to recalculate paths be included in a trigger that is fired whenever a new row is inserted into the hierarchy table. This is a useful recommendation and, depending on your needs, may even be necessary. However, the overhead such a trigger would entail in times of heavy insertion activity might be significant. If you have a table that is updated infrequently and on a batch basis, you may get better overall performance from invoking a procedure such as BuildProjectsPaths following each batch load.
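
For readers who want to try the trigger approach, here is one way such a trigger might look. This is a sketch under stated assumptions, not a tested implementation: the trigger name is hypothetical, and the code assumes that a parent's row already exists in ProjectPaths before any of its children are inserted, and that a parent and its children are never inserted in the same statement.

```sql
-- Hypothetical trigger: compute the path of each newly inserted vertex
-- from the path already stored for its parent.
CREATE TRIGGER Projects_InsertPath ON Projects
AFTER INSERT
AS
BEGIN
   SET NOCOUNT ON
   INSERT INTO ProjectPaths (VertexId, Depth, Path)
      SELECT i.VertexId,
         ISNULL(a.Depth,0)+1,
         ISNULL(a.Path,'.')+CAST(i.VertexId AS VARCHAR(15))+'.'
      FROM inserted i
         LEFT JOIN ProjectPaths a ON a.VertexId = i.Parent
END
```

A vertex whose parent has no row in ProjectPaths is treated as a root, exactly as in the build procedures, so rows loaded out of order would still need a later rebuild.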

The advantage of the proposed procedure is that it reinserts the new paths only for the new node that you pass to it and for its possible subtrees. It does not reprocess nodes outside of that hierarchy.
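
The same property can help when an existing subtree changes position. As an illustration, assuming the sample data shown earlier, the following sketch moves vertex 8 under vertex 13 and then refreshes only the affected paths:

```sql
-- Sketch: move vertex 8 and its subtree under vertex 13,
-- then rebuild paths for that subtree only.
UPDATE Projects SET Parent = 13 WHERE VertexId = 8
EXEC BuildProjectsPaths 8
```

With the sample rows, the path of vertex 8 changes from .1.8. to .1.13.8., and the paths of vertices 9 through 12 are rebuilt beneath it. No other rows in ProjectPaths are touched.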

The only thing you need to worry about when deleting nodes is that whenever you delete a node from the hierarchy table (Projects), you must also delete the corresponding support row from the service table (ProjectPaths). You can easily achieve this by adding a foreign key with a cascading delete to the ProjectPaths table, referencing the Projects table:

 ALTER TABLE ProjectPaths ADD
    CONSTRAINT ProjectPaths_FK FOREIGN KEY(VertexId)
    REFERENCES Projects(VertexId) ON DELETE CASCADE