Section 4.9. Aggregating Hierarchies

   

4.9 Aggregating Hierarchies

4.9.1 Problem

You want to aggregate hierarchical data. For example, you wish to sum the cost of a project or task, beginning from a specific vertex and working your way down through all levels of the hierarchy underneath that vertex.

4.9.2 Solution

With respect to the projects example that we have been using for these general hierarchy recipes, you can use the following stored procedure to summarize the costs of all subprojects for a specific project. You specify the project by passing its vertex ID as a parameter to the procedure.

 CREATE PROCEDURE AggregateProjects @VertexId INTEGER AS SET NOCOUNT ON    DECLARE @lvl INTEGER    DECLARE @Name VARCHAR(20)     DECLARE @Cost INTEGER    DECLARE @Sum INTEGER    CREATE TABLE #stack (       VertexId INTEGER,        Name VARCHAR(20),       Cost INTEGER,       Lvl INTEGER    )        SELECT @Sum=0    SELECT @Lvl = 1        INSERT INTO #stack        SELECT VertexId, Name, Cost, 1 FROM Projects        WHERE VertexID=@VertexId          WHILE @Lvl > 0 BEGIN       IF EXISTS (SELECT * FROM #stack WHERE lvl = @lvl) BEGIN          SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost              FROM #stack WHERE lvl = @lvl             ORDER BY VertexId          /* to change action each vertex, change this line */          SELECT @Sum=@Sum+@Cost          /* ******* */          DELETE FROM #stack WHERE vertexId = @VertexId          INSERT #stack             SELECT VertexId, Name, Cost, @lvl + 1              FROM Projects WHERE parent = @VertexId          IF @@ROWCOUNT > 0             SELECT @lvl = @lvl + 1       END ELSE           SELECT @lvl = @lvl - 1           END    PRINT 'Sum of project costs is '+STR(@Sum)+'.'    DROP TABLE #stack SET NOCOUNT OFF 

The following examples show the results of executing this procedure for all projects (VertexId=1) and for the Specifications project (VertexId=2).

 AggregateProjects 1 Sum of project costs is         231. AggregateProjects 2 Sum of project costs is         33. 

4.9.3 Discussion

The algorithm shown here is adapted from the SQL Server Books Online recommendation for expanding hierarchies. Those familiar with algorithms and data structures will notice that it is a nonrecursive implementation of a recursive traversing algorithm. The code uses an internal stack implemented as a temporary table. The stack holds interesting information from the hierarchical Projects table, plus an additional column named lvl. The lvl column records the level that each entry holds in the hierarchy. The stack definition is as follows :

 CREATE TABLE #stack (       VertexId INTEGER,        Name VARCHAR(20),       Cost INTEGER,       Lvl INTEGER    ) 

After the #stack table is created, two variables are initialized . The @lvl variable tracks the current level on which the code is operating, while the @sum variable accumulates the sum of all costs. Next, an INSERT statement is used to store the root onto the stack. Note that the root in our case is the vertex identified by @VertexId.

The code then loops through each element on the stack. The loop begins by popping one element from the stack. The retrieved data contains a cost, which is added to the total cost being accumulated in @sum:

 SELECT TOP 1 @VertexId = VertexId, @Name=Name, @Cost=Cost     FROM #stack WHERE lvl = @lvl    ORDER BY VertexId SELECT @Sum=@Sum+@Cost 

After accumulating the cost for the vertex just pulled from the stack, the code deletes that read vertex and adds all of its children to the stack:

 DELETE FROM #stack WHERE vertexId = @VertexId INSERT #stack    SELECT VertexId, Name, Cost, @lvl + 1     FROM Projects WHERE parent = @VertexId IF @@ROWCOUNT > 0    SELECT @lvl = @lvl + 1 

The IF statement that you see in this code ensures that if the vertex just deleted from the stack has any children, the @lvl value is increased to indicate that the code is moving one level down in the hierarchy.

The IF EXISTS clause at the beginning of the loop ensures that so long as there are some candidates available on the current level, the loop repeats and browses through them all. The SET NOCOUNT ON directive at the beginning of the procedure just limits the number of messages displayed from the procedure. It does not affect the logic of the algorithm. Without SET NOCOUNT ON, you'll see a steady stream of "(1 row(s) affected)" messages as the code executes.

This algorithm is general enough that it can be used for any kind of operation for which you prefer traversing the hierarchy in a nonrecursive manner. If you want to change the operation that is performed on each node, change the code where current cost is added to the total sum.

The code demonstrates why the general model for hierarchies has a limited application in SQL. When you need to traverse over more than one level efficiently , the code starts to expand to an almost unreadable size .

   


Transact-SQL Cookbook
Transact-SQL Cookbook
ISBN: 1565927567
EAN: 2147483647
Year: 2005
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net