We have developed a two-level framework that can automatically segment an input news video into story units. Given an input video stream, the system performs the analysis at two levels. The first is shot classification, which classifies the video shots into one of 13 pre-defined categories using a combination of low-level, temporal, and high-level features. The second level builds on the results of the first and performs HMM analysis to locate story (or scene) boundaries. Our results demonstrate that the two-level framework is effective, achieving an F1 score of over 89% in scene/story boundary detection. Our detailed analysis also indicates that the HMM is effective in identifying dominant features that can be used to locate story transitions.
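The second-level analysis can be illustrated with a minimal sketch (not the actual model or parameters from our experiments): a two-state HMM over the per-shot category labels produced by the first level, decoded with the Viterbi algorithm. The states ("story" and "transition"), the three shot categories, and all probabilities below are hypothetical placeholders; the real system uses 13 shot categories and parameters estimated from training data.

```python
import math

# Toy HMM for illustration only: two hidden states, "story" (inside a
# story unit) and "transition" (a story boundary). Observations are the
# shot-category labels from the first-level classifier; the real system
# uses 13 categories, here we use three hypothetical ones.
STATES = ["story", "transition"]

# Hypothetical probabilities (a trained model would estimate these
# from labelled news video).
START = {"story": 0.9, "transition": 0.1}
TRANS = {
    "story":      {"story": 0.8, "transition": 0.2},
    "transition": {"story": 0.9, "transition": 0.1},
}
EMIT = {
    "story":      {"anchor": 0.15, "report": 0.65, "graphics": 0.2},
    "transition": {"anchor": 0.8,  "report": 0.1,  "graphics": 0.1},
}

def viterbi(shots):
    """Return the most likely hidden-state sequence for a shot sequence."""
    # Work in log-probabilities to avoid underflow on long videos.
    dp = [{s: math.log(START[s]) + math.log(EMIT[s][shots[0]])
           for s in STATES}]
    back = [{}]
    for t in range(1, len(shots)):
        dp.append({})
        back.append({})
        for s in STATES:
            best_prev = max(
                STATES,
                key=lambda p: dp[t - 1][p] + math.log(TRANS[p][s]))
            dp[t][s] = (dp[t - 1][best_prev]
                        + math.log(TRANS[best_prev][s])
                        + math.log(EMIT[s][shots[t]]))
            back[t][s] = best_prev
    # Trace the best path backwards from the most likely final state.
    state = max(STATES, key=lambda s: dp[-1][s])
    path = [state]
    for t in range(len(shots) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return list(reversed(path))

shots = ["anchor", "report", "report", "anchor", "report", "graphics"]
path = viterbi(shots)
boundaries = [i for i, s in enumerate(path) if s == "transition"]
print(path)
print("story boundaries at shot indices:", boundaries)
```

With these toy parameters the anchor shot at index 3 is decoded as a story transition, since "anchor" emits far more probably from the "transition" state.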
As our training data is rather sparse, our conclusion is preliminary. Although in theory a single-level analysis should yield better results, the two-level analysis, which requires less training data, proved superior in our experiments. Similar findings have been reported in NLP research. Nevertheless, we need to conduct further tests using a larger training set and news from different broadcast stations and countries. We will also incorporate speech-to-text data obtained from the audio track and use text segmentation techniques to help identify story boundaries. We hope to fuse information from multiple sources to develop a robust and reliable story boundary detection model for news and other types of video.
Our eventual goal is to convert an input news video into a set of news stories together with their classification. This will bring us a major step closer to supporting personalized news video for general users.
The authors would like to acknowledge the support of the National Science & Technology Board and the Ministry of Education of Singapore for the provision of research grant RP3960681, under which this research was carried out. The authors would also like to thank Rudy Setiono, Wee-Kheng Leow, and Gao Sheng for their comments and fruitful discussions on this research, and Chandrashekhara Anantharamu for his help with programming.