The individual algorithms for unimodal analysis have been benchmarked, and the results were published previously [12, 17, 18, 22]. For Scout, we benchmarked the Bayesian Engine to investigate how the system performs on financial and celebrity segments. We automatically segmented and indexed seven TV programs: Marketwatch, Wall Street Week (WSW), and Wall Street Journal Report (WSJR), as well as the one-hour talk shows hosted by Jay Leno and David Letterman. In total, about six hours of video were analyzed. Each of the seven TV programs was classified as either a financial news program or a talk show. As a first step, each segment was classified as either a program segment or a commercial segment.
Program segments were then divided into smaller, topic-based segments, based mainly on the transcript. The Bayesian Engine performed a bipartite inference between financial news and talk shows on these sub-program units, using visual and audio information from the mid-level layer in addition to the transcript. Next, the resulting inferences were post-processed by merging small segments into larger ones. In this inference process, the Bayesian Engine combined video content and context information, which proved to be of great importance for the final outcome of the inferences. Audio context information played an especially important role.
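The merging step in the post-processing can be illustrated with a short sketch. This is not the implementation used in Scout: the `Segment` type, the `merge_small_segments` function, the 30-second threshold, and the rule of absorbing a short segment into its predecessor are all assumptions made for illustration, since the merging criterion is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical sub-program unit with an inferred class label."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # e.g. "financial" or "talk_show"

    @property
    def duration(self) -> float:
        return self.end - self.start


def merge_small_segments(segments, min_duration=30.0):
    """Merge short or same-label segments into the preceding segment.

    Assumed rule (illustrative only): a segment shorter than
    `min_duration` is absorbed by its predecessor and takes the
    predecessor's label; adjacent segments that share a label are
    fused. A short first segment has no predecessor and is kept.
    """
    merged = []
    for seg in segments:
        if merged and (seg.duration < min_duration
                       or seg.label == merged[-1].label):
            prev = merged[-1]
            # Extend the previous segment to cover the current one.
            merged[-1] = Segment(prev.start, seg.end, prev.label)
        else:
            merged.append(Segment(seg.start, seg.end, seg.label))
    return merged


# Toy input: a 10-second "talk_show" blip inside a financial program.
toy_segments = [
    Segment(0.0, 120.0, "financial"),
    Segment(120.0, 130.0, "talk_show"),
    Segment(130.0, 300.0, "financial"),
    Segment(300.0, 500.0, "talk_show"),
]
merged_segments = merge_small_segments(toy_segments)
```

On this toy input, the short middle segment is absorbed and the two financial segments on either side of it are fused, leaving one financial segment followed by one talk-show segment.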
There were 84 program segments from the four financial news programs and 168 program segments from the talk shows, giving a total of 252 segments to classify. Using all multimedia cues (audio, visual, and transcript), we obtained the following results: (i) a total precision of 94.1% and a recall of 85.7% for celebrity segments, and (ii) a total precision of 81.1% and a recall of 86.9% for financial segments.
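For reference, the precision and recall figures follow the standard definitions: precision is the fraction of segments assigned to a class that truly belong to it, and recall is the fraction of segments truly in a class that were assigned to it. The sketch below computes both per class from ground-truth and predicted labels. The function name `precision_recall` and the toy label lists are hypothetical and are not the experimental data.

```python
def precision_recall(true_labels, predicted_labels, target):
    """Return (precision, recall) for the class `target`.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    If a denominator is zero, the corresponding value is reported as 0.0.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels for illustration only (not the segments from the experiment).
truth = ["talk", "talk", "talk", "talk", "fin", "fin"]
predicted = ["talk", "talk", "talk", "fin", "fin", "talk"]

talk_precision, talk_recall = precision_recall(truth, predicted, "talk")
fin_precision, fin_recall = precision_recall(truth, predicted, "fin")
```

In this toy case, three of the four segments predicted as "talk" are correct and three of the four true "talk" segments are recovered, so both values are 0.75 for that class; for "fin", both are 0.5.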