Video Cut Detection Process | Handbook of Video Databases: Design and Applications (Internet and Communications)

Within video stream data, camera cuts are usually the most common transition between consecutive shots, occurring far more often than gradual transitions such as dissolves, fades and wipes. Thus, if the timing performance of video cut detection algorithms can be improved without reducing their accuracy, the overall performance of video segmentation and indexing procedures will be greatly improved.

For the video cut detection process, our system extracts the consecutive frames needed to detect camera cut events. The extracted frames can be configured in several ways: they can be color or gray frames, of different image qualities, and in different image formats (either JPG or BMP). In addition, a temporal skip parameter removes the need to extract and analyze every consecutive frame within the required segment. For cut detection, the system already implements a spatial skip parameter, which improves performance without sacrificing detection accuracy by capitalizing on the redundant information within the frames. We first describe the original algorithm, which we call the "six most significant RGB bits with the use of blocks intensity difference" algorithm. We then present the new recommended algorithm, which we call the "binary penetration" algorithm. This new approach further improves the performance of the cut detection procedure by taking into account the temporal heuristics of the video information.
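To make the extraction parameters concrete, the following Python sketch models them as a small configuration object. The class and function names (`ExtractionConfig`, `frames_to_extract`, `spatially_subsample`) are hypothetical, and two behaviours are assumptions rather than details from the text: the sampled frame range includes its end frame, and the spatial skip keeps every `spatial_skip`-th pixel row and column.

```python
from dataclasses import dataclass
from typing import List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class ExtractionConfig:
    """Hypothetical container for the frame-extraction settings named in the text."""
    color: bool = True          # color or gray frames
    image_format: str = "BMP"   # "JPG" or "BMP"
    temporal_skip: int = 1      # compare frame M with frame M + temporal_skip
    spatial_skip: int = 1       # assumed: keep every spatial_skip-th row and column

    def __post_init__(self) -> None:
        if self.image_format not in ("JPG", "BMP"):
            raise ValueError("image_format must be 'JPG' or 'BMP'")
        if self.temporal_skip < 1 or self.spatial_skip < 1:
            raise ValueError("skip parameters must be >= 1")


def frames_to_extract(start: int, end: int, cfg: ExtractionConfig) -> List[int]:
    """Indices of the frames sampled between start and end (end included, assumed)."""
    return list(range(start, end + 1, cfg.temporal_skip))


def spatially_subsample(frame: Sequence[Sequence[T]], cfg: ExtractionConfig) -> List[List[T]]:
    """Keep every spatial_skip-th pixel row and column (an assumed interpretation)."""
    s = cfg.spatial_skip
    return [list(row[::s]) for row in frame[::s]]
```

For example, a temporal skip of 5 over frames 0 to 20 selects frames 0, 5, 10, 15 and 20, and a spatial skip of 2 keeps one pixel in four.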

3.1 The "Six Most Significant RGB Bits with the Use of Blocks Intensity Difference" Algorithm

First, the system makes use of the 24-bit RGB color space components of each compared pixel (each component has an 8-bit representation). However, to speed up processing considerably, the system uses a masking operation that keeps only the two most significant bits (MSBs) of each color component, so that only 2⁶ (64) ranges of color are defined for the entire RGB color space. The system evaluates the histograms of the two corresponding frames, taking the temporal skip into account. The following formula is then used to represent the difference between two frames, f₁ and f₂:

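$$\mathrm{HistogramDifference}(f_1, f_2) = \frac{1}{2} \sum_{i=0}^{N-1} \left| H_1(i) - H_2(i) \right| \qquad (1)$$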

Where:
- H₁(i) is the normalized RGB histogram of frame M, i.e., the fraction of its pixels that fall in color bin i,
- H₂(i) is the normalized RGB histogram of frame (M + Temporal Skip),
- N = 64 is the number of possible 6-bit RGB values.

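If this histogram difference exceeds a predefined threshold, the system decides that the two frames represent an abrupt camera cut. The 1/2 factor is used for normalization: because each normalized histogram sums to one, the sum of the absolute bin differences is at most 2, so the histogram difference ranges from 0% (identical distributions) to 100% (no shared color bins).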
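A minimal pure-Python sketch of equation (1) follows, assuming a frame is given as a flat list of 8-bit `(R, G, B)` tuples. The mask `0xC0` keeps the two MSBs of each component, and the three 2-bit fields are packed into a 6-bit bin index; the packing order (R high, B low) is an arbitrary choice, and the function names are hypothetical. The result is reported as a percentage so it can be compared directly with a percentage threshold.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)
N_BINS = 64                   # 2 MSBs x 3 components = 6 bits


def bin_index(pixel: Pixel) -> int:
    """Map a 24-bit pixel to one of 64 bins using the two MSBs of each component."""
    r, g, b = pixel
    return ((r & 0xC0) >> 2) | ((g & 0xC0) >> 4) | ((b & 0xC0) >> 6)


def normalized_histogram(pixels: Iterable[Pixel]) -> List[float]:
    """Fraction of pixels in each of the 64 bins (the H(i) of equation (1))."""
    counts = [0] * N_BINS
    total = 0
    for p in pixels:
        counts[bin_index(p)] += 1
        total += 1
    if total == 0:
        raise ValueError("cannot build a histogram of zero pixels")
    return [c / total for c in counts]


def histogram_difference(pixels1: Iterable[Pixel], pixels2: Iterable[Pixel]) -> float:
    """Equation (1), returned as a percentage in [0, 100]."""
    h1 = normalized_histogram(pixels1)
    h2 = normalized_histogram(pixels2)
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2)) * 100.0


def is_cut(pixels1: Iterable[Pixel], pixels2: Iterable[Pixel],
           threshold_percent: float) -> bool:
    """Declare a cut when the difference exceeds the (hypothetical) threshold."""
    return histogram_difference(pixels1, pixels2) > threshold_percent
```

Because only the top two bits survive the mask, `(63, 63, 63)` and `(0, 0, 0)` fall into the same bin, which is the coarse quantization the algorithm relies on for speed.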

Initial results of this approach showed that the system performed poorly in a few cases. One problem was that the algorithm uses only global color distribution information, so the system sometimes failed to detect a video cut even when the compared frames did contain an actual cut. The two examples shown in Figure 14.3 illustrate this problem. Because the algorithm ignores the locality of the color distribution, the problem is especially pronounced for two frames from different shots within the same scene, where the background color information is normally similar.

Figure 14.3: Incorrect Frames - Change Decision.

We therefore extended the algorithm to handle these circumstances by partitioning each frame into a number of disjoint blocks of equal size. This allowed the algorithm to make better use of the locality information of the histogram distribution. Selecting the number of blocks therefore became a system design issue. Our testing results in [13] show that the algorithm behaves consistently across various numbers of blocks.

Nevertheless, using 25 blocks (5 horizontal × 5 vertical) has shown slightly better overall accuracy than other numbers. The number of blocks should not be increased too much, for two reasons. First, more blocks slow down the cut detection process. Second, the algorithm begins to simulate the pixel pair-wise histogram algorithm, which increases the likelihood of detecting false camera cuts, and reduces efficiency, when motion results from quick camera operations or large object optical flow. Conversely, the number of blocks should not be too small either, or the algorithm again suffers from the global distribution problem of missing true visual cuts, as shown in Figure 14.3.

Using equation (2), the system takes the mean of the histogram differences of all pairs of corresponding blocks in the compared frames as the overall histogram difference between frames f₁ and f₂. Figure 14.4 shows how this solves the previous problem.

Figure 14.4: Correct Frames - Unchanged Decision.

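$$\mathrm{HistogramDifference}(f_1, f_2) = \frac{1}{L^2} \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{HistogramDifference}(b_{1ij}, b_{2ij}) \qquad (2)$$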

Where:
- HistogramDifference(b_1ij, b_2ij) is the histogram difference between each pair of corresponding blocks (sub-images) b_1ij and b_2ij of the two frames f₁ and f₂, evaluated as in equation (1),
- L is the number of horizontal blocks, which equals the number of vertical blocks.
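Equation (2) can be sketched in the same pure-Python style. Here a frame is assumed to be a list of pixel rows, and block boundaries are computed with integer division, so blocks are exactly equal only when the frame height and width are multiples of L (an assumption of this sketch; the text specifies equal-sized blocks). The per-block difference repeats the 6-bit histogram of equation (1) so that the example is self-contained, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Sequence[Pixel]]  # frame[row][col]


def histogram_difference(p1: Sequence[Pixel], p2: Sequence[Pixel]) -> float:
    """Equation (1) on two pixel lists, as a percentage (two MSBs per component)."""
    def hist(pixels: Sequence[Pixel]) -> List[float]:
        counts = [0] * 64
        for r, g, b in pixels:
            counts[((r & 0xC0) >> 2) | ((g & 0xC0) >> 4) | ((b & 0xC0) >> 6)] += 1
        return [c / len(pixels) for c in counts]
    h1, h2 = hist(p1), hist(p2)
    return 50.0 * sum(abs(a - b) for a, b in zip(h1, h2))


def block_pixels(frame: Frame, L: int, i: int, j: int) -> List[Pixel]:
    """Pixels of block (i, j) when the frame is cut into an L x L grid."""
    height, width = len(frame), len(frame[0])
    rows = range(i * height // L, (i + 1) * height // L)
    cols = range(j * width // L, (j + 1) * width // L)
    return [frame[r][c] for r in rows for c in cols]


def block_histogram_difference(f1: Frame, f2: Frame, L: int = 5) -> float:
    """Equation (2): mean of the L*L corresponding block differences."""
    if len(f1) < L or len(f1[0]) < L:
        raise ValueError("frame must be at least L pixels high and wide")
    total = 0.0
    for i in range(L):
        for j in range(L):
            total += histogram_difference(block_pixels(f1, L, i, j),
                                          block_pixels(f2, L, i, j))
    return total / (L * L)
```

With L = 1 the function reduces to equation (1). For a frame whose left half is black and right half white, compared with its mirror image, equation (1) reports 0% because the global color distributions are identical, while the 2 × 2 block version reports 100%; this is the locality problem illustrated in Figure 14.3.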

Another problem to be handled is false detection, which occurs mainly because of the temporal skip used during processing. If the temporal skip is large, and the change within a single continuous shot is sufficiently quick (because of object motion or camera operations such as tilting, panning or zooming), the algorithm will mistakenly recognize the frames as significantly different. We therefore add a further step. After the first cut detection pass, provided the temporal skip is greater than one, we analyze the frames flagged as changed more comprehensively [1]. This compensates for the possible velocity of object and camera movements, whose effect is magnified by the use of block differences. Specifically, we re-analyze these frames with the temporal skip set to one, so that the system extracts all the frames in the region being analyzed. As a result, the algorithm became more accurate, retaining true camera cuts while rejecting the false cuts reported by the first phase.
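The two-phase procedure can be outlined as follows. This is a sketch rather than the system's implementation: the frame difference is passed in as a hypothetical function `diff(a, b)` (for instance, the block histogram difference of equation (2) applied to frames a and b), "exceeds" is taken to mean strictly greater than the threshold, and the last region is simply truncated at the end of the segment.

```python
from typing import Callable, List, Tuple

# diff(a, b) -> histogram difference between frames a and b (hypothetical hook)
DiffFn = Callable[[int, int], float]


def two_phase_cuts(diff: DiffFn, num_frames: int, temporal_skip: int,
                   threshold: float) -> List[Tuple[int, int]]:
    """Phase 1 compares frames m and m + temporal_skip; phase 2 re-checks every
    consecutive pair inside a flagged region (temporal skip = 1)."""
    cuts: List[Tuple[int, int]] = []
    m = 0
    while m < num_frames - 1:
        k = min(m + temporal_skip, num_frames - 1)  # end of this region
        if diff(m, k) > threshold:
            if k - m == 1:
                cuts.append((m, k))  # frames already consecutive: no phase 2
            else:
                # Phase 2: extract all frames in the region, compare neighbours.
                for a in range(m, k):
                    if diff(a, a + 1) > threshold:
                        cuts.append((a, a + 1))
        m = k  # regions share their boundary frame
    return cuts
```

During a steady pan, the comparison across the full temporal skip may exceed the threshold while no consecutive pair does, so phase 2 discards the candidate; with an abrupt change, the consecutive pair spanning the cut survives.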

3.2 The "Binary Penetration" Algorithm

Although the "six most significant RGB bits with the use of blocks intensity difference" algorithm gave us accurate and reliable results, it lacks the speed required in these kinds of systems, especially if it is to be used within distributed architectures. For this reason, we updated the algorithm with a new approach that exploits the temporal correlation within the visual information of the selected video segment. The result was the design and implementation of the "binary penetration" algorithm.

The idea behind this algorithm is to delay extracting and analyzing the consecutive frames of a region until the algorithm has indicated whether the region may contain a potential cut. By contrast, the previous algorithm extracts and analyzes all the frames of a region whenever the histogram difference exceeds the threshold.

In our new algorithm, we extract and analyze selected frames in a binary penetration manner, as shown in Figure 14.5. Initially, the algorithm (Figure 14.6) compares frames 'm' and 'k', which are separated by a temporal skip number of frames. If the difference between the two frames exceeds the threshold, we extract the middle frame and compare it with both ends. If both differences are below the threshold, we conclude that there is no video cut in this region, and processing continues with the following regions. However, if either difference exceeds the threshold, or both do, we take the half with the greater difference as the one that may contain a cut. We must stress that the temporal skip should be moderate enough that each region contains at most one camera cut operation.

We repeat the same procedure on the selected half until we reach two consecutive frames whose difference exceeds the threshold; this represents a true camera cut operation. The procedure may also stop when, at some level, the differences in both halves are below the threshold. This means that there is no camera cut operation in the whole region, only the movement of a large object or of the camera. In other words, with a large temporal skip, gradual motion can accumulate enough difference to pass the cut detection test at the upper levels, but at the lower levels, where the compared frames are closer together, the difference falls below the threshold and the false cut is recognized.
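The search described above can be sketched in Python, assuming the frame difference is supplied as a hypothetical function `diff(a, b)` (for example, the block histogram difference of equation (2)) and that "exceeds" means strictly greater than the threshold. Two further choices belong to this sketch rather than to the text: when both halves have equal differences the left half is searched, and the final region is truncated at the end of the segment. Each iteration extracts only the middle frame, so a flagged region of `temporal_skip` frames needs about log₂(temporal_skip) extra frames instead of all of them.

```python
from typing import Callable, List, Optional, Tuple

DiffFn = Callable[[int, int], float]  # diff(a, b) between frames a and b


def binary_penetration(diff: DiffFn, m: int, k: int,
                       threshold: float) -> Optional[Tuple[int, int]]:
    """Search region [m, k], assumed to hold at most one cut.

    Returns the consecutive frame pair (a, a + 1) of the cut, or None.
    """
    if diff(m, k) <= threshold:
        return None                      # no cut in this region
    lo, hi = m, k                        # invariant: diff(lo, hi) > threshold
    while hi - lo > 1:
        mid = (lo + hi) // 2             # extract only the middle frame
        d_left, d_right = diff(lo, mid), diff(mid, hi)
        if d_left <= threshold and d_right <= threshold:
            return None                  # object or camera motion, not a cut
        if d_left >= d_right:            # penetrate the half with the
            hi = mid                     # greater difference
        else:
            lo = mid
    return (lo, hi)


def detect_cuts(diff: DiffFn, num_frames: int, temporal_skip: int,
                threshold: float) -> List[Tuple[int, int]]:
    """Apply binary penetration to successive regions of temporal_skip frames."""
    cuts: List[Tuple[int, int]] = []
    m = 0
    while m < num_frames - 1:
        k = min(m + temporal_skip, num_frames - 1)
        found = binary_penetration(diff, m, k, threshold)
        if found is not None:
            cuts.append(found)
        m = k                            # regions share their boundary frame
    return cuts
```

For example, for a 20-frame segment with a single cut between frames 9 and 10, identical frames within each shot and a temporal skip of 8, the sketch extracts only frames 0, 8, 9, 10, 12, 16 and 19 before reporting the cut pair (9, 10).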

Figure 14.5: A Camera Cut Detection Scenario Using the Binary Penetration.

Figure 14.6: The Binary Penetration Algorithm.