List of Figures | Standard Codecs: Image Compression to Advanced Video Coding (IET Telecommunications Series)

Chapter 1: History of Video Coding

Figure 1.1: Evolution of video coding standards by the ITU-T and ISO/IEC committees

Chapter 2: Video Basics

Figure 2.1: Progressive and interlaced frames
Figure 2.2: Positioning of luminance and chrominance samples (dotted lines indicate macroblock boundaries)
Figure 2.3: Sampling pattern for 4:2:2 (CCIR 601) and 4:2:0 SIF
Figure 2.4: Conversion of CCIR-601 to SIF
Figure 2.5: Sampling pattern of 4:1:1 image format
Figure 2.6: Upsampling and filtering from SIF to CCIR-601 format (a) luminance signals (b) chrominance signals
Figure 2.7: Spatio-temporal relation in CIF format
Figure 2.8: Macroblock structures (a) 4:2:0 (b) 4:2:2 (c) 4:4:4

Chapter 3: Principles of Video Compression

Figure 3.1: Block diagram of a DPCM codec
Figure 3.2: Joint occurrences of a pair of pixels
Figure 3.3: A fast DCT flow chart
Figure 3.4: Quantisation characteristics
Figure 3.5: Uniform quantisers (a) with dead zone (b) without dead zone
Figure 3.6: (a) interframe (b) motion compensated interframe
Figure 3.7: The current and previous frames in a search window
Figure 3.8: An example of the CSA search for w = 8 pixels/frame
Figure 3.9: A three-level image pyramid
Figure 3.10: An example of Huffman code for seven symbols
Figure 3.11: Representation of arithmetic coding process with the interval scaled up at each stage for the message eaii!
Figure 3.12: Both lower and upper values in the first half
Figure 3.13: Both lower and upper values in the second half
Figure 3.14: Lower and upper levels in the second and third quarter, respectively
Figure 3.15: A flow chart for binary arithmetic coding
Figure 3.16: A C program of binary arithmetic coding
Figure 3.17: Derivation of the binary bits for the given example
Figure 3.18: Three immediate neighbouring symbols to x
Figure 3.19: A generic interframe predictive coder
Figure 3.20: Block diagram of a decoder
Figure 3.21: A block of 2 × 2 pixels in the current frame and its corresponding block in the previous frame shown in the shaded area

Chapter 4: Subband and Wavelet

Figure 4.1: A bank of bandpass filters
Figure 4.2: A two-band analysis filter
Figure 4.3: A two-band subband encoder/decoder
Figure 4.4: (a) lowpass subband generation and recovery (b) highpass subband generation and recovery
Figure 4.5: Effect of time dilation and translation on the mother wavelet (a) mother wavelet Ψ(t) = Ψ_1,0(t), a = 1, b = 0 (b) wavelet Ψ_1,b(t), a = 1, b ≠ 0 (c) wavelet Ψ_2,0(t) at scale a = 2, b = 0 (d) wavelet Ψ_0.5,0(t) at scale a = 1/2, b = 0
Figure 4.6: Multiresolution spaces
Figure 4.7: (a) Haar scaling function (b) Haar wavelet (c) approximation of a continuous function, x(t), at coarser resolution A₀x(t) (d) higher resolution approximation A₁x(t)
Figure 4.8: One stage wavelet transform (a) analysis (b) synthesis
Figure 4.9: Multiband wavelet transform coding using repeated two-band splits
Figure 4.10: (a) the seven subimages generated by the encoder of Figure 4.9 (b) layout of individual bands
Figure 4.11: Principles of successive approximation
Figure 4.12: Quad tree representation of the bands of the same orientation
Figure 4.13: Spatial orientation tree and the set partitioning in SPIHT
Figure 4.14: Uniform dead zone quantiser with step size ∆_b
Figure 4.15: Eight immediate neighbouring symbols
Figure 4.16: Stripe scanned order in a code block
Figure 4.17: The impact of order of fractional bit plane coding in distortion reduction
Figure 4.18: Rate distortion with optimum truncation
Figure 4.19: An illustration of fractional bit plane encoding
Figure 4.20: Compression performance of various wavelet coding algorithms
Figure 4.21

Chapter 5: Coding of Still Pictures (JPEG and JPEG2000)

Figure 5.1: Lossless encoder
Figure 5.2: Three-sample prediction neighbourhood
Figure 5.3: Block diagram of a baseline JPEG encoder
Figure 5.4: Preparing of the DCT coefficients for entropy coding
Figure 5.5: Quantised DCT coefficients of a luminance block
Figure 5.6: Reconstructed images in sequential mode
Figure 5.7: Image reconstruction in progressive mode
Figure 5.8: Hierarchical multiresolution encoding
Figure 5.9: A three-level hierarchical encoder
Figure 5.10: A general block diagram of the JPEG2000 encoder
Figure 5.11: The encoding elements of JPEG2000
Figure 5.12: Correspondence between the spatial data and bit stream
Figure 5.13: Region of interest with better quality
Figure 5.14: Scaling of the ROI coefficients
Figure 5.15: Spatial scalable decoding
Figure 5.16: SNR scalable decoding
Figure 5.17: Effect of single bit error on the reconstructed image, encoded by SPIHT and JPEG2000

Chapter 6: Coding for Videoconferencing (H.261)

Figure 6.1: A block diagram of an H.261 audio-visual encoder
Figure 6.2: Block, macroblock and GOB structure of CIF and QCIF formatted pictures
Figure 6.3: A block diagram of H.261 video encoder
Figure 6.4: Characteristics of MC/NO_MC
Figure 6.5: Characteristics of inter/intra
Figure 6.6: Decision tree for macroblock type
Figure 6.7: Relative addressing of coded MB
Figure 6.8: Examples of bit pattern for indicating the coded/not coded blocks in an MB (black coded, white not coded)
Figure 6.9: A uniform quantiser with threshold
Figure 6.10: Zigzag scanning of 8 × 8 transform coefficients
Figure 6.11: Zigzag scanning and run-index generation
Figure 6.12: An example of run and index frequency and the resulting 2D-VLC table
Figure 6.13: Picture of Claire (a) original (b) H.261 coded at 256 kbit/s
Figure 6.14: H.261 coded at (a) 128 kbit/s (b) 64 kbit/s
Figure 6.15: Coded pictures with loop filter (a) 128 kbit/s (b) 64 kbit/s
Figure 6.16: Loop filter impulse response in various parts of the image
Figure 6.17: Hypothetical reference buffer occupancy

Chapter 7: Coding of Moving Pictures for Digital Storage Media (MPEG-1)

Figure 7.1: Structure of an ISO 11172 stream
Figure 7.2: MPEG-1's prototypical encoder and decoder illustrating end-to-end synchronisation (STC: systems time clock; SCR: systems clock reference; PTS: presentation time stamp; DSM: digital storage media)
Figure 7.3: An example of MPEG-1 GOP
Figure 7.4: An example of slice structure for SIF-625 pictures
Figure 7.5: Possible arrangement of slices in SIF-625
Figure 7.6: MPEG-1 coded video structure
Figure 7.7: A simplified MPEG-1 video encoder
Figure 7.8: Default intra and inter quantisation weighting matrices
Figure 7.9: Telescopic motion search
Figure 7.10: Subpixel search positions, around pixel coordinate A
Figure 7.11: Motion compensated prediction error (a) with half pixel precision (b) without half pixel precision
Figure 7.12: Motion estimation in B-pictures
Figure 7.13: Positions of luminance and chrominance blocks within a macroblock in 4:2:0 format
Figure 7.14: Selection of macroblock types in B-pictures
Figure 7.15: Model decoder
Figure 7.16: A block diagram of an MPEG-1 decoder
Figure 7.17: Example of group of pictures, in the display, decoding and new orders
Figure 7.18: Edited sequences

Chapter 8: Coding of High Quality Moving Pictures (MPEG-2)

Figure 8.1: MPEG-2 systems multiplex of program and transport streams
Figure 8.2: MPEG-2 systems demultiplexing of program and transport streams
Figure 8.3: Two types of scanning method (a) zigzag scan (b) alternate scan
Figure 8.4: Field prediction of field pictures for P-picture MBs
Figure 8.5: Field prediction of field pictures for B-picture MBs
Figure 8.6: A target macroblock is split into two 16 × 8 field blocks
Figure 8.7: Dual prime motion compensated prediction for P-pictures
Figure 8.8: Block diagram of a data partitioning encoder
Figure 8.9: Position of the priority break point in a block of DCT coefficients
Figure 8.10: Data partitioning (a) enhanced (b) base picture
Figure 8.11: Block diagram of a two-layer SNR scalable coder
Figure 8.12: A DCT based base layer encoder
Figure 8.13: A two-layer SNR scalable encoder with drift at the base layer
Figure 8.14: A three-layer drift-free SNR scalable encoder
Figure 8.15: A block diagram of a three-layer SNR decoder
Figure 8.16: Picture quality of the base layer of SNR encoder at 2 Mbit/s
Figure 8.17: Block diagram of a two-layer spatial scalable encoder
Figure 8.18: Principle of spatio-temporal prediction in the spatial scalable encoder
Figure 8.19: Details of spatial scalability encoder
Figure 8.20: (a) base layer picture of a spatial scalable encoder at 2 Mbit/s (b) its enlarged version
Figure 8.21: A block diagram of a two-layer temporal scalable encoder
Figure 8.22: Spatial and temporal hybrid scalability encoder
Figure 8.23: SNR and spatial hybrid scalability encoder
Figure 8.24: SNR and temporal hybrid scalability encoder
Figure 8.25: SNR, spatial and temporal hybrid scalability encoder
Figure 8.26: Increase in bit rate due to scalability
Figure 8.27: Structure of AAL1 and AALx cells
Figure 8.28: PSNR of MPEG-2 coded video sequence GOP (IPPPPPP...) (a) AALx with error rate of 10^-2 (b) AAL1 with error rate of 10^-3 (c) AAL1 with error rate of 10^-4
Figure 8.29: PSNR of MPEG-2 coded video sequence with 12 frames per GOP (IPP... IPPP ...IP...) (a) AALx with error rate of 10^-2 (b) AAL1 with error rate of 10^-3 (c) AAL1 with error rate of 10^-4

Chapter 9: Video Coding for Low Bit Rate Communications (H.263)

Figure 9.1: Motion vector prediction
Figure 9.2: Motion vector prediction for the border macroblocks
Figure 9.3: Redefinition of the candidate predictors MV1, MV2 and MV3 for each luminance block in a macroblock
Figure 9.4: Weighting values for prediction with motion vectors of the luminance blocks on top or bottom of the current luminance block, H₁(i, j)
Figure 9.5: Weighting values for prediction with motion vectors of luminance blocks to the left or right of current luminance block, H₂(i, j)
Figure 9.6: Weighting values for prediction with motion vector of current block, H₀(i, j)
Figure 9.7: PSNR of Claire sequence coded at 256 kbit/s, with MPEG-1, H.261 and H.263
Figure 9.8: Filtering of pixels at the block boundaries
Figure 9.9: d₁ as a function of d
Figure 9.10: Mapping of a block to a quadrilateral
Figure 9.11: Intensity interpolation of a nongrid pixel
Figure 9.12: Reconstructed pictures with the BMST and BMA motion vectors operating individually
Figure 9.13: Frame by frame reconstruction of the pictures by BMST
Figure 9.14: Mesh-based motion compensation (a) mesh (b) motion compensated picture
Figure 9.15: Performance of spatial transform motion compensation
Figure 9.16: Prediction in PB frames mode
Figure 9.17: Forward and bidirectional prediction for a B-block
Figure 9.18: A reversible VLC
Figure 9.19: Alternate scans (a) horizontal (b) vertical
Figure 9.20: Three neighbouring blocks in the DCT domain
Figure 9.21: An encoder with multiple reference pictures
Figure 9.22: Use of multiple reference pictures with and without back channel
Figure 9.23: Effects of errors (a) with data partitioning (b) without data partitioning
Figure 9.24: Error in a bit stream
Figure 9.25: Pixels at the boundary of (a) a macroblock (b) four blocks
Figure 9.26: Step-by-step decoding and skipping of bits in the bit stream
Figure 9.27: An example of intraframe error concealment
Figure 9.28: A grid of 3 × 3 macroblocks in the current and previous frame
Figure 9.29: An erroneous picture along with its error concealed version
Figure 9.30: A group of alternate P and B-pictures
Figure 9.31: Quality of decoded video with and without loss concealment with a bit error ratio of 10^-2
Figure 9.32: B-picture prediction dependency in the temporal scalability
Figure 9.33: Prediction flow in SNR scalability
Figure 9.34: Prediction flow in spatial scalability
Figure 9.35: Positions of the base and enhancement layer pictures in a multilayer scalable bit stream
Figure 9.36: Example of picture transmission order
Figure 9.37: A 4 × 4 luminance pixel block and its eight directional prediction modes
Figure 9.38: Various motion compensation modes
Figure 9.39: Use of S frame in bit stream switching
Figure 9.40: Creation of the switching pictures from the bit streams
Figure 9.41: The position of the secondary picture in random accessing of the bit stream
Figure 9.42
Figure 9.43
Figure 9.44

Chapter 10: Content-Based Video Coding (MPEG-4)

Figure 10.1: (a) a video frame composed of (b) balloon VOP₁, (c) aeroplane VOP₂ and (d) the background VOP₀
Figure 10.2: Shape of objects (a) balloon (b) aeroplane
Figure 10.3: An object-based video encoder/decoder
Figure 10.4: VOP encoder structure
Figure 10.5: Intelligent VOP formation
Figure 10.6: Weighting function (left), adapted Gaussian kernel (right)
Figure 10.7: Immersion-based watershed flooding
Figure 10.8: A pictorial representation of video segmentation (a) original (b) gradient (c) watershed transformed (d) colour merged (e) segmented image
Figure 10.9: (a) gradient image (b) object mask (c) segmented object
Figure 10.10: Object boundaries for chain coding and the eight directions of the chain around the object
Figure 10.11: Quad tree representation of a shape block
Figure 10.12: A subblock of 2 × 2 pixels
Figure 10.13: Upper and left level indices of a subblock
Figure 10.14: Changing pixels (a) intra alpha block (b) inter alpha block
Figure 10.15: Reference area for detecting reference changing pixel (a) b₁ (b) c₁
Figure 10.16: CR determination algorithm
Figure 10.17: Template for the construction of the pixels of (a) the intra and (b) inter BABs. The pixel to be coded is marked with '?'
Figure 10.18: An example of an intra BAB template
Figure 10.19: Greyscale shape coding
Figure 10.20: Priority of boundary macroblocks surrounding an exterior macroblock
Figure 10.21: An example of SA-DCT (a) original segment (b) ordering of pixels and horizontal SA-DCT (c) location of 1D coefficients (d) location of samples prior to vertical SA-DCT (e) ordering of 1D samples and vertical SA-DCT (f) location of 2D SA-DCT coefficients
Figure 10.22: Static sprite of Stefan (courtesy of MPEG-4)
Figure 10.23: (a) a two-dimensional mesh (b) the mapped texture
Figure 10.24: Reconstructed model-based image with the affine transform
Figure 10.25: Block diagram of a wavelet-based still image encoder
Figure 10.26: Prediction for coding the lowest band coefficients
Figure 10.27: A multiscale encoder of higher bands
Figure 10.28: Block diagram of a wavelet video codec
Figure 10.29: A hybrid H.263/wavelet video coding scheme
Figure 10.30: A virtual zero tree
Figure 10.31: Quality of QCIF size Akio sequence coded at various bit rates
Figure 10.32: Quality of QCIF size Carphone sequence coded at various bit rates
Figure 10.33: A snap shot of Akio coded at 20 kbit/s, 10 Hz (a) H.263 (b) SNR-scalable H.263 (c) wavelet SPIHT
Figure 10.34: A snap shot of Carphone coded at 40 kbit/s, 10 Hz (a) H.263 (b) SNR-scalable H.263 (c) wavelet SPIHT
Figure 10.35: OTS enhancement structure of type-1, with predictive coding of VOL
Figure 10.36: OTS enhancement structure of type-1, with bidirectional coding of VOL
Figure 10.37: OTS enhancement structure of type-2
Figure 10.38: Comparison between MPEG-4 and H.263
Figure 10.39
Figure 10.40
Figure 10.41

Chapter 11: Content Description, Search and Delivery (MPEG-7 and MPEG-21)

Figure 11.1: Scope of MPEG-7
Figure 11.2: Index generation for a video clip
Figure 11.3: Gabor filter spectrum; the contours indicate the half-peak magnitude of the filter responses in the Gabor filter dictionary. The filter parameters used are u_h = 0.4, u_l = 0.05, M = 4 and L = 6
Figure 11.4: An example of texture-based image retrieval; query texture: D5
Figure 11.5: A contour and the positions of its curvature extremes at four different scales
Figure 11.6: New polygons of Figure 11.5 with some added vertices
Figure 11.7: (a) the query shape, (b), (c) and (d) the three closest shapes in order
Figure 11.8: Sketch-based shape retrieval

Appendix E: Channel Error/Packet Loss Model

Figure E.1: An Elliott-Gilbert two-level error model