A Generalized Binary Penetration Algorithm | Handbook of Video Databases: Design and Applications (Internet and Communications)

The previous algorithm description handles the most likely cases for detecting video cuts of consecutive shots. However, the use of the temporal skip parameter makes other scenarios possible. Figure 14.7 shows the most important cases of camera and video editing within a video segment. We could define these cases as follows:

Case 1: the entire temporal skip region of processing is included within one continuous shot.

Figure 14.7: Consecutive shots cut possibilities.

Case 2: one true camera cut exists within the temporal skip period.

Case 3: more than one camera cut exists within the temporal skip period. The editing operation includes the transition between at least three different camera angles.

Case 4: again, more than one camera cut is found within the temporal skip period. However, in this case, the transition returns to the first camera angle after one or more other camera shots. An example is the interview-like scenario in news or documentary videos.

Case 5: the start and/or end frame of a processing region coincides with the camera cut effect.

The described "binary penetration" algorithm will recognize cases 1 and 2 easily. In case 5, the algorithm still works smoothly, using any temporal skip value, because the last frame of the previous processed region is itself the starting frame of the following processing region. So, even if we cannot detect the shot cut effect from the previous region, the algorithm will still detect the effect in the following region. For cases 3 and 4, care is needed; as a result, we made a simple generalization to the algorithm.
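The shared-boundary behavior described for case 5 can be illustrated with a minimal sketch. The function name `skip_regions` and its parameters are hypothetical, chosen here for illustration; the only property taken from the text is that each processing region starts on the frame where the previous region ended.

```python
from typing import Iterator, Tuple


def skip_regions(n_frames: int, skip: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame-index pairs for consecutive processing regions.

    Assumption for illustration: regions are laid out back to back, and the
    end frame of one region is reused as the start frame of the next one.
    """
    if skip < 1:
        raise ValueError("skip must be at least 1")
    start = 0
    last = n_frames - 1
    while start < last:
        end = min(start + skip, last)  # the final region may be shorter
        yield start, end
        start = end  # shared boundary frame


regions = list(skip_regions(10, 4))
print(regions)
```

Because every boundary frame belongs to two regions, a cut that sits at a boundary is still compared against the frames of the following region.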

The generalization changes a single step. In the original algorithm, when the differences evaluated at a level exceed the defined threshold, only the higher of the two is followed. In the generalized version, the penetration continues into each half of the level whose difference exceeds the threshold, so both halves are explored when both differences exceed it. This modification allows us to discover all the cuts within the temporal skip frames, even if there is more than one such cut.
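The generalized step can be sketched as a short recursion. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `find_cuts` and `label_diff` are hypothetical, and each frame is reduced to an integer shot label so that the frame difference is 1 across shots and 0 within a shot.

```python
from typing import Callable, List, Sequence


def label_diff(a: int, b: int) -> float:
    """Toy frame difference (an assumption): 1.0 if shot labels differ, else 0.0."""
    return 1.0 if a != b else 0.0


def find_cuts(
    frames: Sequence[int],
    start: int,
    end: int,
    threshold: float,
    diff: Callable[[int, int], float] = label_diff,
) -> List[int]:
    """Return every index c in (start, end] where a cut lies between frames c-1 and c.

    Generalized penetration: each half whose end-point difference exceeds the
    threshold is explored, not only the half with the higher difference.
    """
    if diff(frames[start], frames[end]) <= threshold:
        return []  # no detectable change across this span
    if end - start == 1:
        return [end]  # adjacent frames differ: the cut is located
    mid = (start + end) // 2
    return (find_cuts(frames, start, mid, threshold, diff)
            + find_cuts(frames, mid, end, threshold, diff))


# Case 3: three different camera angles inside one temporal skip region.
case3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(find_cuts(case3, 0, len(case3) - 1, threshold=0.5))
```

In this toy run both halves of the first level exceed the threshold, so both are penetrated and each of the two cuts is located.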

In case 4, it is more likely that camera cuts will be missed, which reduces recall: the first and last frames of the region come from the same camera angle, so their difference may not exceed the threshold at the first level. We could lessen this effect by lowering the threshold value so that the region passes the first level and the actual cuts are discovered in the next levels. However, in both cases 3 and 4, the problem is avoided if the temporal skip parameter has a moderate value, not large enough to include more than one cut.
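The effect of the threshold in case 4 can be seen in a small numeric sketch. Everything here is assumed for illustration: each frame is reduced to a single made-up brightness value, the difference is the absolute difference of those values, and `cuts_in_region` is a hypothetical name for the generalized penetration step.

```python
from typing import List, Sequence


def cuts_in_region(values: Sequence[float], start: int, end: int,
                   threshold: float) -> List[int]:
    """Generalized penetration on scalar frame features (illustrative only).

    Returns every index c where a cut lies between frames c-1 and c.
    """
    if abs(values[start] - values[end]) <= threshold:
        return []  # end points look alike: stop penetrating this span
    if end - start == 1:
        return [end]
    mid = (start + end) // 2
    return (cuts_in_region(values, start, mid, threshold)
            + cuts_in_region(values, mid, end, threshold))


# Case 4: the region starts and ends on the same camera angle, with a slight
# drift in brightness, and a different shot sits in the middle.
case4 = [10.0, 10.5, 11.0, 50.0, 50.0, 50.0, 12.0, 12.5, 13.0]
last = len(case4) - 1

missed = cuts_in_region(case4, 0, last, threshold=5.0)  # first level not passed
found = cuts_in_region(case4, 0, last, threshold=2.0)   # lower threshold passes it
print(missed, found)
```

With the higher threshold the first-level difference (3.0) does not pass and both cuts are missed; with the lower one the region is penetrated and both cuts are located. A lower threshold may also admit false detections on real footage, which is why a moderate temporal skip remains the safer remedy.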