Digital watermarking focused on still images for a long time, but this trend is now fading. More and more watermarking algorithms are being proposed for other multimedia data, and in particular for video content. However, although watermarking still images and watermarking video are similar problems, they are not identical. New problems and new challenges show up and have to be addressed. This section points out three major challenges for digital video watermarking. First, many nonhostile video processing operations are likely to alter the watermark signal. Second, resilience to collusion is much more critical in the context of video. Third, real-time operation is often a requirement for digital video watermarking.
Robustness of digital watermarking has always been evaluated via the survival of the embedded watermark after attacks. Benchmarking tools have even been developed in order to automate this process. In the context of video, the possibilities for attacking the content are multiplied, since many different nonhostile video processing operations are available. Nonhostile refers to the fact that even content providers are likely to process their digital data to some extent in order to manage their resources efficiently.
This category gathers all the attacks which modify the pixel values in the frames. Those modifications can be due to a wide range of video processing operations. Data transmission, for example, is likely to introduce some noise. Similarly, digital-to-analog and analog-to-digital conversions introduce some distortions in the video signal. Another common operation is gamma correction, performed in order to increase the contrast. In order to reduce storage needs, content owners often transcode their digital data, i.e. re-encode it with a different compression ratio. The induced loss of information is then liable to degrade the performance of the watermarking algorithm. In the same fashion, customers are likely to convert their videos from a standard video format such as MPEG-1, MPEG-2 or MPEG-4 to a popular format, e.g. DivX. Here again, the watermark signal is bound to undergo some kind of interference. Spatial filtering inside each frame is often used to restore a low-quality video. Inter-frame filtering, i.e. filtering between adjacent frames of the video, has to be considered too. Finally, chrominance resampling (4:4:4, 4:2:2, 4:2:0) is a commonly used operation to reduce storage needs.
Many watermarking algorithms rely on an implicit spatial synchronisation between the embedder and the detector. A pixel at a given location in the frame is assumed to be associated with a given bit of the watermark. However, many nonhostile video processing operations introduce spatial desynchronisation, which may result in a drastic loss of performance of a watermarking scheme. The most common examples are changes across display formats (4/3, 16/9 and 2.11/1) and changes of spatial resolution (NTSC, PAL, SECAM and usual movie standards). The pixel position is also subject to jitter. In particular, positional jitter occurs for video sent over poor analog links, e.g. broadcasting in a wireless environment. In the digital cinema context, distortions brought by a handheld camera can be considered as nonhostile since the purpose of the camera is not explicitly to remove the embedded watermark. It has been shown that the handheld camera attack can be separated into two geometrical distortions: a bilinear transform, due to the misalignment between the camera and the cinema screen, and a curved transform, due to lens deformations. This results in the curved-bilinear transform depicted in Figure 42.7, which can be modelled with twelve parameters.
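The sensitivity to spatial desynchronisation can be illustrated with a minimal sketch. It assumes an additive pseudo-random ±1 watermark and a linear correlation detector, neither of which is prescribed by the text; the helper names (`make_pattern`, `make_host`, `embed`, `correlate`, `shift`) are hypothetical, and a one-dimensional list of zero-mean Gaussian samples stands in for the mean-removed pixels of a frame.

```python
import random

def make_pattern(key, n):
    """Pseudo-random +/-1 watermark pattern derived from a secret key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def make_host(seed, n, std=10.0):
    """Stand-in for mean-removed pixel values (zero-mean Gaussian, assumed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n)]

def embed(host, pattern, strength=2.0):
    """Additive embedding: each sample carries one chip of the pattern."""
    return [h + strength * w for h, w in zip(host, pattern)]

def correlate(signal, pattern):
    """Linear correlation statistic used by the detector."""
    return sum(s * w for s, w in zip(signal, pattern)) / len(pattern)

def shift(signal, k):
    """Cyclic shift by k samples, a crude model of a spatial translation."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)

if __name__ == "__main__":
    n = 20000
    pattern = make_pattern(key=42, n=n)
    marked = embed(make_host(seed=7, n=n), pattern)
    print("aligned:", round(correlate(marked, pattern), 2))
    print("shifted:", round(correlate(shift(marked, 1), pattern), 2))
```

Under these assumptions the aligned statistic should stay close to the embedding strength, whereas a shift of a single sample should bring it close to zero, even though the shifted content is visually unchanged.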
Figure 42.7: Example of distortion created by a handheld camera (exaggerated)
Similarly, temporal desynchronisation may affect the watermark signal. For example, if the secret key for embedding is different for each frame, a simple frame rate modification would make the detection algorithm fail. Since changing the frame rate is quite a common operation, watermarks should be designed so that they survive it.
The last kind of nonhostile attack gathers all the operations that a video editor may perform. Cut-and-splice and cut-insert-splice are two very common operations used during video editing. Cut-insert-splice is basically what happens when a commercial is inserted in the middle of a movie. Moreover, transition effects, like fade-and-dissolve or wipe-and-matte, can be used in order to smooth the transition between two scenes of the video. This kind of editing can be seen as temporal editing, in contrast to spatial editing. Spatial editing refers to the addition of visual content in each frame of the video stream. This includes, for example, graphic overlay, e.g. insertion of logos or subtitles, and video stream superimposition, as in Picture-in-Picture technology. The detector sees such an operation as a cropping of some part of the watermark. Such a severe attack is liable to induce a strong degradation of the detection performance.
There are many different attacks to be considered, as shown in Table 42.3, and it may be useful to insert countermeasures in the video stream in order to cope with the distortions introduced by such video processing operations. Moreover, the reader should be aware that many other, hostile attacks are likely to occur in the real world. Indeed, it is relatively easy today to process a whole movie thanks to the powerful personal computers available. Virtually any transformation can be applied to a video stream. For example, for still images, Stirmark introduces random local geometric distortions which succeed in defeating the synchronisation of the detector. This software has been optimised for still images and, when it is used on each frame of the video stream, visible artefacts can be spotted when moving objects go through the fixed geometric distortion. However, future versions of Stirmark will surely address this visibility issue.
Collusion is a problem that was pointed out for still images some time ago. It refers to a set of malicious users who merge their knowledge, i.e. different watermarked data, in order to produce illegal content, i.e. unwatermarked data. Such collusion is successful in two distinct cases.
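The frame rate example can be sketched as follows. This is only an illustration under stated assumptions: the per-frame pattern is derived from a secret and the frame index, embedding is additive, the detector uses linear correlation with a fixed threshold, and the frame rate change is modelled by dropping every fifth frame (as in a naive 30 fps to 24 fps conversion). All function names are hypothetical.

```python
import random

def frame_pattern(secret, index, n):
    """+/-1 pattern keyed on both the secret and the frame index."""
    rng = random.Random(secret * 1_000_003 + index)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def make_video(secret, num_frames, n, strength=2.0, seed=0):
    """Watermark every frame with its own index-dependent pattern."""
    rng = random.Random(seed)
    video = []
    for i in range(num_frames):
        host = [rng.gauss(0.0, 10.0) for _ in range(n)]  # stand-in frame
        pattern = frame_pattern(secret, i, n)
        video.append([h + strength * w for h, w in zip(host, pattern)])
    return video

def count_detected(video, secret, threshold=1.0):
    """The detector assumes received frame j was embedded with pattern j."""
    hits = 0
    for j, frame in enumerate(video):
        pattern = frame_pattern(secret, j, len(frame))
        score = sum(s * w for s, w in zip(frame, pattern)) / len(frame)
        if score > threshold:
            hits += 1
    return hits

def drop_every_fifth(video):
    """Crude frame rate reduction: discard frames 4, 9, 14, ..."""
    return [f for i, f in enumerate(video) if i % 5 != 4]

if __name__ == "__main__":
    video = make_video(secret=1234, num_frames=40, n=4000)
    converted = drop_every_fifth(video)
    print(count_detected(video, 1234), "of", len(video), "frames detected")
    print(count_detected(converted, 1234), "of", len(converted), "after conversion")
```

After the first dropped frame, every received frame is tested against the pattern of the wrong index, so detection is expected to succeed only on the few frames that precede the first drop.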
Collusion type I: The same watermark is embedded into different copies of different data. The colluders can estimate the watermark from each watermarked datum and obtain a refined estimate of the watermark by linear combination, e.g. the average, of the individual estimates. A good estimate of the watermark then makes it possible to obtain unwatermarked data by simply subtracting it from the watermarked data.
Collusion type II: Different watermarks are embedded into different copies of the same data. The colluders only have to compute a linear combination of the different watermarked data, e.g. the average, to produce unwatermarked data. Indeed, the average of different watermarks generally converges toward zero.
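The effect of averaging in collusion type II can be made explicit with a short derivation. The notation is an assumption introduced here for illustration: $x$ is the host signal with $n$ samples, $\alpha$ the embedding strength of an additive scheme, and $w_1, \dots, w_N$ are zero-mean, unit-variance watermarks that are mutually uncorrelated and uncorrelated with $x$.

```latex
\begin{align}
y_k &= x + \alpha\, w_k, \qquad k = 1, \dots, N \\
\bar{y} &= \frac{1}{N} \sum_{k=1}^{N} y_k
         = x + \frac{\alpha}{N} \sum_{k=1}^{N} w_k \\
\mathbb{E}\!\left[ \tfrac{1}{n} \langle \bar{y}, w_j \rangle \right]
        &= \frac{\alpha}{N} \\
\operatorname{Var}\!\left[ \frac{\alpha}{N} \sum_{k=1}^{N} w_k[i] \right]
        &= \frac{\alpha^2}{N}
\end{align}
```

Under these assumptions, the correlation of the averaged copy with any individual watermark is divided by $N$, and the per-sample energy of the residual watermark drops from $\alpha^2$ to $\alpha^2 / N$, while the host $x$ is left untouched.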
Collusion is a very important issue in the context of digital video since there are twice as many opportunities to mount a collusion as with still images. When video is considered, the origin of the collusion can be twofold.
Inter-video collusion: This is the origin initially considered for still images. A set of users each have a watermarked version of a video, and they pool these versions in order to produce unwatermarked video content. In the context of copyright protection, the same watermark is embedded in different videos and collusion type I is possible. Alternatively, in a fingerprinting application, the watermark is different for each user and collusion type II can be considered. Inter-video collusion requires several different watermarked videos to produce unwatermarked video content.
Intra-video collusion: This is a video-specific origin. As will be detailed later, many watermarking algorithms consider a video as a succession of still images. Watermarking video then comes down to watermarking a series of still images. Unfortunately, this opens new opportunities for collusion. If the same watermark is inserted in each frame, collusion type I can be mounted since different images can be obtained from moving scenes. On the other hand, if different watermarks are embedded in each frame, collusion type II becomes a danger in static scenes since they produce similar images. As a result, the watermarked video alone is enough to remove the watermark from the video stream.
Although collusion is of little interest in some targeted applications, e.g. broadcast monitoring, it often raises much concern in digital video watermarking. Indeed, it opens opportunities for forgery if the watermarking algorithm is weak against intra-video collusion.
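The static-scene case can be sketched with a temporal averaging attack. The sketch assumes additive embedding, a different ±1 pattern per frame, a perfectly static scene and a linear correlation detector; these choices and all function names are illustrative assumptions rather than a method taken from the text.

```python
import random

def frame_pattern(secret, index, n):
    """+/-1 pattern keyed on both the secret and the frame index."""
    rng = random.Random(secret * 1_000_003 + index)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def make_static_video(secret, num_frames, n, strength=2.0, seed=0):
    """Static scene: one shared host, a different watermark in every frame."""
    rng = random.Random(seed)
    host = [rng.gauss(0.0, 10.0) for _ in range(n)]
    return [
        [h + strength * w for h, w in zip(host, frame_pattern(secret, t, n))]
        for t in range(num_frames)
    ]

def temporal_average(video, radius=2):
    """Replace each frame by the mean of its temporal neighbours."""
    out = []
    for t in range(len(video)):
        window = video[max(0, t - radius): t + radius + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out

def mean_score(video, secret):
    """Average correlation between frame t and the pattern meant for frame t."""
    total = 0.0
    for t, frame in enumerate(video):
        pattern = frame_pattern(secret, t, len(frame))
        total += sum(s * w for s, w in zip(frame, pattern)) / len(frame)
    return total / len(video)

if __name__ == "__main__":
    video = make_static_video(secret=99, num_frames=20, n=5000)
    attacked = temporal_average(video)
    print("before:", round(mean_score(video, 99), 2))
    print("after :", round(mean_score(attacked, 99), 2))
```

Because the scene does not change, the averaged frames still show the same content, while each per-frame watermark is attenuated roughly in proportion to the window length. No second video is needed for the attack.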
The reader will have understood that the main danger is intra-video collusion, i.e. when a watermarked video alone is enough to remove the watermark from the video. It has been shown that both strategies, always inserting the same watermark in each frame and always inserting a different watermark in each frame, make collusion attacks conceivable. As a result, an alternative strategy has to be found. A basic rule has been stated so that intra-video collusion is prevented. The watermarks inserted into two different frames of a video should be as similar, in terms of correlation, as the two frames are similar. In other words, if two frames look almost the same, the embedded watermarks should be highly correlated. On the contrary, if two frames are really different, the watermarks inserted into those frames should be dissimilar. This rule follows quite directly from a careful reading of the definitions of the two types of collusion. It can be seen as a form of informed watermarking since it implies a dependency between the watermark and the host frame content. A relatively simple implementation of this approach consists in embedding a spatially localised watermark according to the content of each frame of the video. A small watermark pattern can be embedded in some key locations of each frame, e.g. salient points. During the extraction process, the detector can easily detect the position of the salient points and look for the presence or the absence of a watermark.
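One way to read this rule is as a constraint on pairs of frames: the correlation between the two watermarks should track the correlation between the two frames. The sketch below is a hypothetical diagnostic built on that reading, using the Pearson correlation coefficient as the similarity measure (an assumption; the text only says "in terms of correlation"). It reports how far a pair of frames and watermarks is from satisfying the rule.

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def rule_gap(frame_a, frame_b, wm_a, wm_b):
    """|corr(watermarks) - corr(frames)|; 0 means the rule is met exactly."""
    return abs(pearson(wm_a, wm_b) - pearson(frame_a, frame_b))

def noise(seed, n):
    """Independent Gaussian samples, used as toy frames and watermarks."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

if __name__ == "__main__":
    n = 5000
    scene_a, scene_b = noise(1, n), noise(2, n)   # two unrelated frames
    wm_1, wm_2 = noise(3, n), noise(4, n)         # two unrelated watermarks
    # Same watermark everywhere, but the frames differ (moving scene).
    print("same watermark, different frames:",
          round(rule_gap(scene_a, scene_b, wm_1, wm_1), 2))
    # Different watermarks, but the frames are identical (static scene).
    print("different watermarks, same frame:",
          round(rule_gap(scene_a, scene_a, wm_1, wm_2), 2))
    # Identical frames carrying identical watermarks satisfy the rule.
    print("same watermark, same frame:",
          round(rule_gap(scene_a, scene_a, wm_1, wm_1), 2))
```

Both naive strategies show a large gap in exactly the situations where collusion types I and II apply, which is consistent with the argument made above.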
The problem of inter-video collusion remains. Concerning collusion type I, this issue can be prevented by introducing a Trusted Third Party (TTP) which supplies the message to be embedded. This message is often a function of the encrypted message that the copyright owner wants to hide and a hash of the host data. Different videos give different messages to be hidden and consequently different embedded watermarks. The TTP also acts as a repository. When an illegal copy is found, the copyright owner extracts the embedded message and transmits it to the TTP, which in turn returns the associated original encrypted message. If the copyright owner can successfully decrypt it, he can claim ownership. Regarding collusion type II, results obtained for still images can easily be extended to digital video. The problem arises when a coalition of malicious users, each having a copy of the same data but with a different embedded watermark, colludes in order to produce illegal unwatermarked data. They compare their watermarked data, spot the locations where the different versions differ and modify the data in those locations. A traditional countermeasure consists in designing the set of distributed watermarks so that a coalition gathering at most c users will not succeed in removing the whole watermark signal. It should be noted that c is generally very small in comparison with the total number n of users. Moreover, the set of watermarks is built in such a way that no coalition of users can produce a document which would frame an innocent user, i.e. one who is not in the illegal coalition. In other words, colluding still produces watermarked video content, and the remaining watermark clearly identifies the malicious colluding users without ever accusing any innocent customer. Implementations of such sets of watermarks have already been proposed for still images, based on projective geometry or the theory of combinatorial designs.
The watermark is often considered as additive noise. A simple estimate consequently consists in computing the difference between the watermarked data and a low-pass filtered version of it.
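The role of the TTP described at the start of this paragraph can be sketched as a small registry. The text does not specify the hash or the combining function, so the sketch assumes SHA-256 for both; the class `TrustedThirdParty` and its methods are hypothetical names introduced for illustration, and the encryption of the owner's message is assumed to happen elsewhere.

```python
import hashlib

class TrustedThirdParty:
    """Hypothetical TTP: issues host-dependent messages and keeps a registry."""

    def __init__(self):
        # Maps each issued message to the owner's original encrypted message.
        self._registry = {}

    def issue_message(self, encrypted_message: bytes, host_data: bytes) -> bytes:
        """Return the message to embed, bound to this particular host."""
        host_hash = hashlib.sha256(host_data).digest()
        # Assumed combining function: hash of (encrypted message || host hash).
        message = hashlib.sha256(encrypted_message + host_hash).digest()
        self._registry[message] = encrypted_message
        return message

    def lookup(self, extracted_message: bytes):
        """Return the registered encrypted message, or None if unknown."""
        return self._registry.get(extracted_message)

if __name__ == "__main__":
    ttp = TrustedThirdParty()
    owner_secret = b"encrypted-owner-message"   # produced by the owner's cipher
    m1 = ttp.issue_message(owner_secret, b"raw bytes of video 1")
    m2 = ttp.issue_message(owner_secret, b"raw bytes of video 2")
    print("messages differ:", m1 != m2)
    print("lookup recovers owner message:", ttp.lookup(m1) == owner_secret)
```

Since the embedded message depends on the host, two videos from the same owner no longer carry the same watermark, which removes the precondition of collusion type I across videos.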
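This estimator, combined with the averaging step of collusion type I, can be sketched as follows. The sketch makes several assumptions that are not in the text: one-dimensional signals, a five-tap moving average as the low-pass filter, a ±1 watermark of unit strength, and synthetic hosts made of a slow sinusoid plus mild texture noise. All function names are hypothetical.

```python
import math
import random

def low_pass(signal, radius=2):
    """Moving average with the window clipped at the borders."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def estimate_watermark(marked):
    """Estimate = watermarked signal minus its low-pass filtered version."""
    return [m - s for m, s in zip(marked, low_pass(marked))]

def normalised_correlation(a, b):
    """Cosine similarity between two equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def make_host(seed, n):
    """Smooth content (slow sinusoid) plus mild texture noise."""
    rng = random.Random(seed)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    period = rng.uniform(150.0, 400.0)
    return [50.0 * math.sin(2.0 * math.pi * i / period + phase)
            + rng.gauss(0.0, 2.0) for i in range(n)]

def collude(marked_copies):
    """Average the individual estimates (collusion type I)."""
    estimates = [estimate_watermark(m) for m in marked_copies]
    return [sum(col) / len(estimates) for col in zip(*estimates)]

if __name__ == "__main__":
    n = 5000
    rng = random.Random(2024)
    watermark = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # The same watermark embedded in 25 different hosts.
    copies = [[h + w for h, w in zip(make_host(k, n), watermark)]
              for k in range(25)]
    single = estimate_watermark(copies[0])
    print("one copy :", round(normalised_correlation(single, watermark), 2))
    print("25 copies:", round(normalised_correlation(collude(copies), watermark), 2))
```

The host residue left by the filter differs from one copy to the next, whereas the watermark component is common to all of them, so averaging the estimates should raise the correlation with the true watermark noticeably.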