4.6 Set partitioning in hierarchical trees (SPIHT)

The compression efficiency of EZW is to some extent due to the use of arithmetic coding. Said and Pearlman [16] have introduced a variant of coding of wavelet coefficients by successive approximation, that even without arithmetic coding outperforms EZW (see Figure 4.20). They call it set partitioning in hierarchical trees (SPIHT). Both EZW and SPIHT are spatial tree-based encoding techniques that exploit magnitude correlation across bands of the decomposition. Each generates a fidelity progressive bit stream by encoding, in turn, each bit plane of a quantised dyadic subband decomposition. Both use a significance test on sets of coefficients to efficiently isolate and encode high magnitude coefficients. However, the crucial parts in the SPIHT coding process is the way the subsets of the wavelet coefficients are partitioned and the significant information is conveyed.

One of the main features of this scheme in transmitting the ordering data is that it is based on the fact that the execution path of an algorithm is defined by the results of the comparisons of its branching points. So, if the encoder and decoder have the same sorting algorithm, then the decoder can duplicate the encoder's execution path if it receives the results of the magnitude comparisons. The ordering information can be recovered from the execution path.

The sorting algorithm divides the set of wavelet coefficients, {C_{i, j}}, into partitioning subsets T_m and performs the magnitude test:

(4.26)

If the decoder receives a no to that answer (the subset is insignificant), then it knows that all coefficients in T_m are insignificant. If the answer is yes (the subset is significant), then a certain rule shared by the encoder and decoder is used to partition T_m into new subset T_m,l and the significant test is then applied to the new subsets. This set division process continues until the magnitude test is done to all single coordinate significant subsets in order to identify each significant coefficient.

To reduce the number of magnitude comparisons (message bits) a set partitioning rule that uses an expected ordering in the hierarchy defined by the subband pyramid is defined (similar to Figure 4.12, used in zero tree coding). In section 4.4.2 we saw how the similarities among the subimages of the same orientation can be exploited to create a spatial orientation tree (SOT). The objective is to create new partitions such that subsets expected to be insignificant contain a huge number of elements and subsets expected to be significant contain only one element.

To make clear the relationship between magnitude comparisons and message bits, the following function is used:

(4.27)

to indicate the significance of a set of coordinates T. To simplify the notation of single pixel sets, S_n({(i, j)}) is represented by S_n(i, j).

To see how SPHIT can be implemented, let us assume O(i, j) to represent a set of coordinates of all offsprings of node (i, j). For instance, except for the highest and lowest pyramid levels, O(i, j) is defined in terms of its offsprings as:

(4.28)

We also define D(i, j) as a set of coordinates of all descendants of the node (i, j), and H, a set of coordinates of all spatial orientation tree roots (nodes in the highest pyramid level). Finally, L(i, j) is defined as:

(4.29)

The O(i, j), D(i, j) and L(i, j) in a spatial orientation tree are shown in Figure 4.13.

click to expand
Figure 4.13: Spatial orientation tree and the set partitioning in SPIHT

With the use of parts of the spatial orientation trees as the partitioning subsets in the sorting algorithm, the set partitioning rules are defined as follows:

The initial partition is formed with the sets {(i, j)} and D(i, j), for all (i, j) ∈ H.
If D(i, j) is significant, then it is partitioned into L(i, j) plus the four single-element sets with (k, l) ∈ O(i, j).
If L(i, j) is significant, then it is partitioned into the four sets D(k, l), with (k, l) ∈ O(i, j).
Each of the four sets now has the format of the original set and the same partitioning can be used recursively.

4.6.1 Coding algorithm

Since the order in which the subsets are tested for significance is important, in a practical implementation the significance information is stored in three ordered lists, called the list of insignificant sets (LIS), list of insignificant pixels (LIP) and list of significant pixels (LSP). In all lists each entry is identified by a coordinate (i, j), which in the LIP and LSP represents individual pixels, and in the LIS represents either the set D(i, j) or L(i, j). To differentiate between them, it is said that an LIS entry is of type A if it represents D(i, j), and of type B if it represents L(i, j) [16].

During the sorting pass, the pixels in the LIP, which were insignificant in the previous pass, are tested, and those that become significant are moved to the LSP. Similarly, sets are sequentially evaluated following the LIS order, and when a set is found to be significant it is removed from the list and partitioned. The new subsets with more than one element are added back to the LIS and the single coordinate sets are added to the end of the LIP or the LSP depending whether they are insignificant or significant, respectively. The LSP contains the coordinates of the pixels that are visited in the refinement pass.

Thus the algorithm can be summarised as:

Initialisation: let the initial yardstick, n, be n = ⌊log₂(max₍_{i, j}₎{|C_{i, j}|})⌋. Set the LSP as an empty set list, and add the coordinates (i, j) ∈ H to the LIP, and only those with descendants also to the LIS, as the type A entries.

Sorting pass:

2.1	for each entry (i, j) in the LIP do:
	2.1.1	output S_n(i, j);
	2.1.2	if S_n(i, j) = 1 then move (i, j) to the LSP and output the sign of C_{i, j};
2.2	for each entry (i, j) in the LIS do:
	2.2.1	if the entry is of type A then
		2.2.1.1	output S_n(D(i, j));
		2.2.1.2	if S_n(D(i, j)) = 1 then
			2.2.1.2.1	for each (k, l) ∈ O(i, j) do:
				2.2.1.2.1.1	output S_n(k, l);
				2.2.1.2.1.2	if S_n(k, l) = 1, then add (k, l) to the LSP and output the sign of C_k,l;
				2.2.1.2.1.3	if S_n(k, l) = 0 then add (k, l) to the end of the LIP;
			2.2.1.2.2	if L(i, j) ≠ Φ then move (i, j) to the end of the LIS, as an entry of type B, and go to step 2.2.2; otherwise, remove entry (i, j) from the LIS;
	2.2.2	if the entry is of type B then
		2.2.2.1	output S_n(L(i, j));
		2.2.2.2	if S_n(L(i, j)) = 1 then
			2.2.2.2.1	add each (k, l) ∈ O((i, j) to the end of the LIS as an entry of type A;
			2.2.2.2.2	remove (i, j) from the LIS.

Refinement pass: for each entry (i, j) in the LSP, except those included in the last sorting pass (i.e. with the same n), output the nth most significant bit of |C_i,j|.
Quantisation step update: decrement n by 1 and go to step 2.

One important characteristic of the algorithm is that the entries added to the end of the LIS above are evaluated before the same sorting pass ends. So, when it is said for each entry in the LIS, it means those that are being added to its end. Also similar to EZW, the rate can be precisely controlled because the transmitted information is formed of single bits. The encoder can estimate the progressive distortion reduction and stop at a desired distortion value.

Note that, in this algorithm, the encoder outputs all branching conditions based on the outcome of the wavelet coefficients. Thus, to obtain the desired decoder's algorithm, which duplicates the encoder's execution path as it sorts the significant coefficients, we simply replace the word output with input. The ordering information is recovered when the coordinate of the significant coefficients is added to the end of the LSP. But note that whenever the decoder inputs data, its three control lists (LIS, LIP and LSP) are identical to the ones used by the encoder at the moment it outputs that data, which means that the decoder indeed recovers the ordering from the execution path.

An additional task done by the decoder is to update the reconstructed image. For the value of n when a coordinate is moved to the LSP, it is known that 2ⁿ ≤ |C_{i, j}| < 2ⁿ⁺¹. So, the decoder uses that information, plus the sign bit that is input just after the insertion in the LSP, to set the reconstructed coefficients Ĉ_{i, j} = ±1.5 ×2ⁿ. Similarly, during the refinement pass, the decoder adds or subtracts 2ⁿ^-1 to Ĉ_{i, j} when it inputs the bits of the binary representation of |C_i,j|. In this manner, the distortion gradually decreases during both the sorting and refinement passes.

Finally it is worth mentioning some of the differences between EZW and SPIHT. The first difference is that they use a slightly different spatial orientation tree (SOT). In EZW, each root node in the top LL band has three offspring, one in each high frequency subband at the same decomposition level and all other coefficients have four children in the lower decomposition subband of the same orientation. However, in SPIHT, in a group of 2 x 2 root nodes in the top LL band, the top left node has no descendant and the other three have four offspring each in the high frequency band of the corresponding orientation. Thus SPIHT uses fewer trees with more elements per tree than in EZW. Another important difference is in their set partitioning rules. SPIHT has an additional partitioning step in which a descendant (type A) set is split into four individual child coefficients and a grand descendant (type B) set. EZW explicitly performs a breadth first search of the hierarchical trees, moving from coarser to finer subbands. Although it is not explicit, SPIHT does a roughly breadth first search as well. After partitioning a grand descendant set, SPIHT places the four new descendant sets at the end of the LIS. It is the appending to the LIS that results in the approximate breadth first traversal.