5. The Major Trends in Video Watermarking


Digital watermarking for video is a fairly new area of research which largely builds on the results obtained for still images. Many algorithms have been proposed in the scientific literature and three major trends can be isolated. The simplest and most straightforward approach is to consider a video as a succession of still images and to reuse an existing watermarking scheme for still images. Another point of view exploits the additional temporal dimension in order to design new robust video watermarking algorithms. The last trend considers a video stream as data compressed according to a specific video compression standard, whose characteristics can be used to obtain an efficient watermarking scheme. Each of those approaches has its pros and cons, as detailed in Table 42.4.

Table 42.4: Pros and cons of the different approaches for video watermarking.

| Approach | Pros | Cons |
|---|---|---|
| Adaptation image → video | Inherits all the results for still images | Computationally intensive |
| Temporal dimension | Video-driven algorithms which often permit higher robustness | Can be computationally intensive |
| Compression standard | Simple algorithms which make real-time achievable | Watermark may be inherently tied to the video format |

5.1 From Still Image to Video Watermarking

In its very first years, digital watermarking was extensively investigated for still images. Many interesting results and algorithms were found, and when new areas such as video were researched, the basic concern was to reuse those results. As a result, the watermarking community first considered video as a succession of still images and adapted existing still-image watermarking schemes to video. Exactly the same phenomenon occurred when the coding community switched from image coding to video coding. The first proposed algorithm for video coding was indeed Motion JPEG (M-JPEG), which simply compresses each frame of the video with the image compression standard JPEG.

The simplest way of extending a watermarking scheme for still images is to embed the same watermark in the frames of the video at a regular rate. On the detector side, the presence of the watermark is checked in every frame. If the video has been watermarked, a regular pulse should be observed in the response of the detector [2]. However, such a scheme has no payload: the detector only tells whether a given watermark is present or not, and it does not extract any hidden message. On the other hand, the host data is much larger than a single still image. Since one should be able to hide more bits in a larger host signal, high-payload watermarks for video could be expected. This can easily be done by embedding an independent multi-bit watermark in each frame of the video [15]. However, one should be aware that this gain in payload is counterbalanced by a loss of robustness.
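The regular pulse in the detector response can be illustrated with a small toy sketch. It is not the scheme of [2]: the flat test frames, the balanced ±1 pattern and the names `balanced_pattern`, `embed_periodic` and `detector_response` are all assumptions made for illustration, and the detector is a plain linear correlation.

```python
import random

def balanced_pattern(n_pixels, seed):
    """Pseudo-random +1/-1 pattern with as many +1 as -1 (zero mean), keyed by `seed`."""
    pattern = [1] * (n_pixels // 2) + [-1] * (n_pixels - n_pixels // 2)
    random.Random(seed).shuffle(pattern)
    return pattern

def embed_periodic(frames, pattern, strength, period):
    """Add the same pattern to every `period`-th frame (frames are flat pixel lists)."""
    marked = []
    for t, frame in enumerate(frames):
        if t % period == 0:
            marked.append([px + strength * w for px, w in zip(frame, pattern)])
        else:
            marked.append(list(frame))
    return marked

def detector_response(frame, pattern):
    """Linear correlation between one frame and the reference pattern."""
    return sum(px * w for px, w in zip(frame, pattern)) / len(frame)

# Toy host (assumption): 12 flat frames of 64 pixels whose brightness drifts over time.
frames = [[100 + t] * 64 for t in range(12)]
pattern = balanced_pattern(64, seed=1234)
marked = embed_periodic(frames, pattern, strength=2, period=4)

# One response per frame: a peak every 4th frame, zero elsewhere.
responses = [detector_response(frame, pattern) for frame in marked]
```

Because the toy frames are flat and the pattern has zero mean, the host contributes nothing to the correlation. Real frames would add a noise-like term to every response, so the pulse would have to be separated from that noise with a threshold.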

Differential Energy Watermarks (DEW)

The DEW method was initially designed for still images and has been extended to video by watermarking the I-frames of an MPEG stream [31]. It is based on selectively discarding high-frequency DCT coefficients in the compressed data stream. The embedding process is depicted in Figure 42.10. The 8x8 pixel blocks of the video frame are first pseudo-randomly shuffled. This operation forms the secret key of the algorithm and spatially randomizes the statistics of the pixel blocks, i.e. it breaks the correlation between neighbouring blocks. The shuffled frame is then split into groups of n 8x8 blocks. In Figure 42.10, n is equal to 16. One bit is embedded into each of those groups by introducing an energy difference between the high-frequency DCT coefficients of the top half of the group (region A) and the bottom half (region B). This is why the technique is called a differential energy watermark.

Figure 42.10: Description of DEW embedding

In order to introduce an energy difference, the block DCT is computed for each of the n 8x8 blocks of a group and the DCT coefficients are prequantized with quality factor $Q_{jpeg}$ using the standard JPEG quantization procedure. The group is then separated into two halves and the high-frequency energy of each region is computed according to the following equation:

$$E_A = \sum_{b \in A} \; \sum_{i > c} \left( [\theta_{i,b}]_{Q_{jpeg}} \right)^2 \qquad (42.4)$$

with $E_B$ defined in the same way over the blocks of region B,

where $\theta_{i,b}$ is the DCT coefficient with index i in the zig-zag order in the b-th DCT block, $[\cdot]$ indicates the prequantization with quality factor $Q_{jpeg}$ and c is a given cut-off index, which was fixed to 27 in Figure 42.10. The value of the embedded bit is encoded as the sign of the energy difference $D = E_A - E_B$ between the two regions A and B. To obtain the appropriate sign for D, all the energy after the cut-off index c is eliminated in either region A or region B by setting the corresponding DCT coefficients to zero. It should be noted that this can easily be done directly in the compressed domain by shifting the End Of Block (EOB) marker of the corresponding 8x8 DCT blocks toward the DC coefficient, up to the cut-off index. Finally, the inverse block DCT is computed and the shuffling is inverted in order to obtain the watermarked frame. On the detector side, the energy difference is computed and the embedded bit is determined according to the sign of D. This algorithm has been further improved so that the cut-off index c adapts to the frequency content of the considered group of n 8x8 blocks and so that the energy difference D is greater than a given threshold $D_{target}$ [30].
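The embedding and detection rules can be sketched on already prequantized coefficients. This is a minimal illustration, not the implementation of [31]: it skips the shuffling, DCT and quantization steps, represents each 8x8 block as a list of 64 coefficients in zig-zag order, and assumes the convention that bit 0 means D > 0 and bit 1 means D < 0 (the text only states that the bit is carried by the sign of D). All function names are hypothetical.

```python
def high_freq_energy(blocks, cutoff):
    """Sum of squared coefficients with zig-zag index > cutoff over the given blocks."""
    return sum(coef * coef for block in blocks for coef in block[cutoff + 1:])

def dew_embed_bit(group, bit, cutoff):
    """Embed one bit in a group of n blocks (first half = region A, second half = region B).

    Assumed convention: bit 0 -> D = E_A - E_B > 0 (high frequencies removed from B),
                        bit 1 -> D < 0             (high frequencies removed from A).
    """
    half = len(group) // 2
    marked = [list(block) for block in group]          # work on a copy
    region = marked[half:] if bit == 0 else marked[:half]
    for block in region:
        # Equivalent to moving the EOB marker up to the cut-off index.
        block[cutoff + 1:] = [0] * (len(block) - cutoff - 1)
    return marked

def dew_detect_bit(group, cutoff):
    """Recover the bit from the sign of the energy difference D."""
    half = len(group) // 2
    d = high_freq_energy(group[:half], cutoff) - high_freq_energy(group[half:], cutoff)
    return 0 if d > 0 else 1

# Toy group of n = 4 blocks; every block has some energy after the cut-off.
group = [[8, 3, -2, 1] + [1] * 60 for _ in range(4)]
recovered = [dew_detect_bit(dew_embed_bit(group, bit, cutoff=27), cutoff=27)
             for bit in (0, 1)]
```

If neither region holds any energy after the cut-off index, D is zero and the bit cannot be recovered. This is the situation that the adaptive choice of c and the threshold on D are meant to avoid.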

5.2 Integration of the Temporal Dimension

The main drawback of considering a video as a succession of independent still images is that it does not satisfactorily take into account the new temporal dimension. The coding community made a big step forward when it decided to incorporate the temporal dimension in its coding schemes, and it is very likely to be to the advantage of the watermarking community to investigate the same path. Many researchers have investigated how to reduce the visual impact of the watermark for still images by considering properties of the Human Visual System (HVS) such as frequency masking, luminance masking and contrast masking. Such studies can easily be exported to video with a straightforward frame-by-frame adaptation. However, the obtained watermark is not optimal in terms of visibility since it does not consider the temporal sensitivity of the human eye. Motion is indeed a very specific feature of video, and new video-driven perceptual measures need to be designed in order to be exploited in digital watermarking [28]. This simple example shows that the temporal dimension is a crucial point in video and that it should be taken into account to design efficient algorithms.

Spread-Spectrum (SS)

One of the pioneering works in video watermarking considers the video signal as a one-dimensional signal [22]. Such a signal is acquired by a simple line scan, as shown in Figure 42.11. Let the sequence $a(j) \in \{-1, 1\}$ represent the watermark bits to be embedded. This sequence is spread by a chip-rate cr according to the following equation:

$$b(i) = a(j), \qquad j \cdot cr \le i < (j+1) \cdot cr \qquad (42.5)$$

Figure 42.11: Line scan of a video stream

The spreading operation adds redundancy by embedding one bit of information into cr samples of the video signal. The obtained sequence b(i) is then amplified locally by an adjustable factor $\lambda(i) \ge 0$ and modulated by a pseudo-random binary sequence $p(i) \in \{-1, 1\}$. Finally, the spread-spectrum watermark w(i) is added to the line-scanned video signal v(i), which gives the watermarked video signal $v_w(i)$. The overall embedding process is consequently described by the following equation:

$$v_w(i) = v(i) + w(i) = v(i) + \lambda(i)\, b(i)\, p(i) \qquad (42.6)$$

The adjustable factor λ(i) may be tuned according to local properties of the video signal, e.g. spatial and temporal masking of the HVS, or kept constant depending on the targeted application.
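The embedding chain described above (line scan, spreading, amplification, modulation, addition) can be sketched as follows. This is only an illustration under stated assumptions: frames are nested Python lists, the pseudo-random sequence comes from a seeded `random.Random` that stands in for the secret key, and the gain defaults to a constant 2.0 rather than an HVS-driven value. The function names are hypothetical.

```python
import random

def line_scan(frames):
    """Flatten a list of frames (each a list of pixel rows) into one 1-D signal v(i)."""
    return [px for frame in frames for row in frame for px in row]

def spread(bits, cr):
    """b(i) = a(j) for j*cr <= i < (j+1)*cr: repeat every bit cr times."""
    return [a for a in bits for _ in range(cr)]

def pn_sequence(length, seed):
    """Pseudo-random +1/-1 sequence p(i); the seed plays the role of the key."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def ss_embed(v, bits, cr, seed, gain=lambda i: 2.0):
    """v_w(i) = v(i) + gain(i) * b(i) * p(i); samples beyond the payload are unchanged."""
    b = spread(bits, cr)
    if len(b) > len(v):
        raise ValueError("video signal too short for this payload and chip-rate")
    p = pn_sequence(len(b), seed)
    marked = [v[i] + gain(i) * b[i] * p[i] for i in range(len(b))]
    return marked + list(v[len(b):])

# Two tiny 2x3 frames give 12 samples; 2 bits at cr = 4 occupy the first 8.
frames = [[[10, 20, 30], [40, 50, 60]], [[70, 80, 90], [100, 110, 120]]]
v = line_scan(frames)
vw = ss_embed(v, bits=[1, -1], cr=4, seed=42)
```

Passing a different `gain` function is where a local masking model could be plugged in. A constant gain corresponds to the case where the factor is kept fixed.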

On the detector side, recovery is easily accomplished with a simple correlation. However, in order to reduce cross-talk between the watermark and the video signal, the watermarked video sequence is first high-pass filtered, yielding a filtered watermarked video signal $\bar{v}_w(i)$, so that the major components of the video signal itself are isolated and removed. The second step is demodulation. The filtered watermarked video signal is multiplied by the pseudo-random noise p(i) used for embedding and summed over the window of each embedded bit. The correlation sum s(j) for the j-th bit is given by the following equation:

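$$s(j) = \sum_{i = j \cdot cr}^{(j+1) \cdot cr - 1} p(i)\, \bar{v}_w(i) = \underbrace{\sum_{i = j \cdot cr}^{(j+1) \cdot cr - 1} p(i)\, \bar{v}(i)}_{\Sigma_1} + \underbrace{\sum_{i = j \cdot cr}^{(j+1) \cdot cr - 1} p(i)^2\, \lambda(i)\, b(i)}_{\Sigma_2} \qquad (42.7)$$

The correlation consists of two terms, $\Sigma_1$ and $\Sigma_2$. The main purpose of the filtering is to leave $\Sigma_2$ untouched while reducing $\Sigma_1$ to 0. As a result, since $p(i)^2 = 1$ and $b(i) = a(j)$ over the window, the correlation sum becomes:

$$s(j) \approx \sum_{i = j \cdot cr}^{(j+1) \cdot cr - 1} p(i)^2\, \lambda(i)\, b(i) = a(j) \sum_{i = j \cdot cr}^{(j+1) \cdot cr - 1} \lambda(i) \qquad (42.8)$$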

The hidden bit is then directly given by the sign of s(j). This pioneering method offers a very flexible framework, which can be used as the basis for more elaborate video watermarking schemes.
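The detection steps (filtering, demodulation, sign decision) can be sketched as below. Several choices are assumptions made for illustration: the high-pass filter is replaced by a crude removal of the mean of each bit window, the host is a slowly varying ramp, the gain is constant, and a compact `embed` helper is included only so that the example runs on its own. The function names are hypothetical.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-random +1/-1 sequence p(i); embedder and detector share the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def embed(v, bits, cr, seed, lam):
    """Compact embedder: v_w(i) = v(i) + lam * a(i // cr) * p(i)."""
    p = pn_sequence(len(bits) * cr, seed)
    return [v[i] + lam * bits[i // cr] * p[i] for i in range(len(p))]

def remove_window_mean(x, cr):
    """Crude stand-in for the high-pass filter: subtract the mean of each bit window."""
    filtered = []
    for start in range(0, len(x), cr):
        window = x[start:start + cr]
        mean = sum(window) / len(window)
        filtered.extend(sample - mean for sample in window)
    return filtered

def ss_detect(vw, n_bits, cr, seed):
    """Correlate the filtered signal with p(i) per window; the sign of s(j) is the bit."""
    p = pn_sequence(n_bits * cr, seed)
    filtered = remove_window_mean(vw[:n_bits * cr], cr)
    bits = []
    for j in range(n_bits):
        s = sum(p[i] * filtered[i] for i in range(j * cr, (j + 1) * cr))
        bits.append(1 if s >= 0 else -1)
    return bits

cr, seed = 64, 7
message = [1, -1, -1, 1]
host = [100 + 0.02 * i for i in range(len(message) * cr)]   # slowly varying toy host
decoded = ss_detect(embed(host, message, cr, seed, lam=2.0), len(message), cr, seed)
```

Unlike the idealized filter of the text, mean removal also attenuates the watermark term slightly, so the second term is only approximately untouched. A larger chip-rate makes the decision more reliable at the cost of payload.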

Other approaches have been investigated to integrate the temporal dimension. A temporal wavelet decomposition can be used, for example, to separate the static and dynamic components of the video [49]. A watermark is then embedded in each component to protect them separately. The video signal can also be seen as a three-dimensional signal. This point of view has already been considered in the coding community and can be extended to video watermarking. The 3D DFT can be used as an alternative representation of the video signal [11]. On one hand, the HVS is considered in order to define an embedding area which will not result in a visible watermark. On the other hand, the obtained embedding area is modified so that it becomes immune to MPEG compression. Considering video as a three-dimensional signal may, however, be inaccurate. The three dimensions are indeed not homogeneous: there are two spatial dimensions and one temporal dimension. This consideration and the computational cost may have hampered further work in this direction. However, this approach remains pertinent in some specific cases. In medical imaging, for example, the different slices of a scan can be seen as the different frames of a video. In this case, the three dimensions are homogeneous and a 3D transform can be used.
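The way a temporal wavelet decomposition separates static from dynamic content can be illustrated with a single Haar step over a pair of frames. The wavelet actually used in [49] is not stated here, so the Haar filter, the flat pixel lists and the function names are assumptions made for illustration.

```python
def temporal_haar(frame_a, frame_b):
    """One temporal Haar step on two frames given as flat pixel lists.

    low  holds the per-pixel average (static content),
    high holds half the per-pixel difference (temporal change).
    """
    low = [(a + b) / 2 for a, b in zip(frame_a, frame_b)]
    high = [(a - b) / 2 for a, b in zip(frame_a, frame_b)]
    return low, high

def inverse_temporal_haar(low, high):
    """Rebuild the two frames from the low and high temporal sub-bands."""
    frame_a = [l + h for l, h in zip(low, high)]
    frame_b = [l - h for l, h in zip(low, high)]
    return frame_a, frame_b

# Only the third pixel changes between the two frames.
frame_a = [10, 10, 50, 200]
frame_b = [10, 10, 90, 200]
low, high = temporal_haar(frame_a, frame_b)
restored = inverse_temporal_haar(low, high)
```

Pixels that do not change over time contribute only to `low`, while moving content shows up in `high`, so a separate watermark could in principle be added to each sub-band before inverting the transform.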

5.3 Exploiting the Video Compression Formats

The last trend considers the video data as data compressed with a video-specific compression standard. Indeed, most of the time, a video is stored in compressed form in order to save storage space. As a result, watermarking methods have been designed which embed the watermark directly into the compressed video stream. The first algorithm presented in Section 4.3 is a very good example. It exploits a very specific part of the video compression standard (run-length coding) in order to hide some information.

Watermarking in the compressed stream can be seen as a form of video editing in the compressed domain [36]. Such editing is not trivial in practice and raises new issues. The SS algorithm seen previously has been adapted so that the watermark can be directly inserted in the nonzero DCT coefficients of an MPEG video stream [22]. The first concern was to ensure that the watermark embedding process would not increase the output bit-rate. Nothing ensures indeed that a watermarked DCT coefficient will be VLC-encoded with the same number of bits as when it was unwatermarked. A straightforward strategy is then to watermark only the DCT coefficients which do not require more bits to be VLC-encoded. The second issue was to prevent the distortion introduced by the watermark from propagating from one frame to another. The MPEG standard relies indeed on motion prediction, and any distortion is likely to be propagated to neighbouring frames. Since the accumulation of such propagating signals may result in a poor-quality video, a drift compensation signal can be added if necessary. In this case, motion compensation can be seen as a constraint. However, it could also be exploited so that the motion vectors of the MPEG stream carry the hidden watermark [24]. The components of the motion vector can be quantized according to a rule which depends on the bit to be hidden. For example, the horizontal component of a motion vector can be quantized to an even value if the bit to be hidden is equal to 0 and to an odd value otherwise.
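The parity rule on motion vectors can be sketched in a few lines. The sketch assumes that motion vectors are available as integer (dx, dy) pairs in the units of the codec, and it moves a component by one step toward zero when its parity has to change. That direction is an arbitrary choice made here, not part of the rule described in [24], and the function names are hypothetical.

```python
def embed_bit_in_mv(mv, bit):
    """Force the parity of the horizontal component: even for bit 0, odd for bit 1."""
    dx, dy = mv
    if dx % 2 != bit:
        # Change dx by one step, toward zero when possible (assumed choice).
        dx += -1 if dx > 0 else 1
    return (dx, dy)

def extract_bit_from_mv(mv):
    """Read the hidden bit back from the parity of the horizontal component."""
    return mv[0] % 2

vectors = [(4, -1), (3, 2), (0, 0), (-5, 7)]
payload = [1, 1, 1, 0]
marked = [embed_bit_in_mv(mv, bit) for mv, bit in zip(vectors, payload)]
recovered = [extract_bit_from_mv(mv) for mv in marked]
```

Each modified vector points to a slightly different reference block, so in a real encoder the prediction residual would have to be recomputed for the affected macroblocks.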

Not all the frames of an MPEG-coded video are encoded in the same way. The intra-coded (I) frames are basically compressed with the JPEG image compression standard, while the inter-coded (B and P) frames are predicted from other frames of the video. As a result, alternative watermarking strategies can be used depending on the type of the frame to be watermarked [23]. Embedding the watermark directly in the compressed video stream often allows real-time processing of the video. However, the drawback is that the watermark is inherently tied to a video compression standard and may not survive video format conversion.




Handbook of Video Databases: Design and Applications (Internet and Communications)
ISBN: 084937006X
Year: 2003
Pages: 393