[1] G. Backfried,R. Rainoldi, and J. Riedler, Automatic Language Identification in Broadcast News, IJCNN-2002, Honolulu, HI, Vol. 2, pp. 1406-1410, 2002.
[2] J.S. Boreczky and L. D. Wilcox, A Hidden Markov Model Framework for Video Segmentation Using Audio and Image Features, ICASSP-1998, Vol. 6, pp. 3741–3744, May 12–15, 1998.
[3] Y. Chang,W. Zeng,I. Kamel, and R. Alonso, Integrated Image and Speech Analysis for Content-based Video Indexing, Proc. 3rd IEEE Int. Conf. Multimedia Computing and Systems, Hiroshima, Japan, pp. 306–313, June 17–23, 1996.
[4] S. Chen and P. Gopalakrishnan, Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, DARPA Speech Recognition Workshop, 1998.
[5] J. Choi,D. Hindle,J. Hirschberg,I. Chagnolleau,C. Nakatani,F. Pereira,A. Singhal, and S. Whittacker, An Overview of The AT&T Spoken Document Retrieval System, DARPA Broadcast News Transcription and Understanding Workshop, 1998.
[6] F. Dellaert,T. Polzin, and A. Waibel, Recognizing Emotion in Speech, ICSLP 1996, Oct. 1996.
[7] J.-L. Gauvain and L. Lamel, Large-vocabulary Continuous Speech Recognition: Advances and Applications, Proc. of the IEEE, Vol. 88, No. 8, pp. 1181–1200, Aug. 2000.
[8] A. Ghias,J. Logan,D. Chamberlin,B. Smith, Query by Humming, ACM Multimedia 95, 1995.
[9] J. Hirschberg,M. Bacchiani,D. Hindle,P. Isenhour,A. Rosenberg,L. Stark,L. Stead,S. Whittarker, and G. Zamchick, SCANMail: Browsing and Searching Speech Data by Content, Proc. European Conf. On Speech Communication and Technology, Aalborg, Denmark, Sept. 2001.
[10] J. Huang,Z. Liu, and Y. Wang, Joint Video Scene Segmentation and Classification Based on Hidden Markov Model, ICME-2000, New York, NY, Aug. 2000.
[11] Q. Huang,Z. Liu, and A. Rosenberg, Automated Semantic Structure Reconstruction and Representation Generation for Broadcast News, Proc. Of SPIE, Jan. 1999.
[12] N. Jayant,J. Johnson, and S. Safranek, Signal Compression Based on Models of Human Perception, Proc. of the IEEE, Vol. 81, pp. 1385–1422, Oct. 1993.
[13] S. Kullback, Information Theory and Statistics, Dover Publications, Inc. 1968.
[14] T. Lambrou,P. Kudumakis,R. Speller,M. Sandler, and A. Linney, Classification of Audio Signals Using Statistical Features on Time and Wavelet Transform Domains, ICASSP-1998, Vol. 6, pp. 3621–3624, 1998.
[15] R. Lienhart,S. Pfeiffer, and W. Effelsberg, Scene Determination Based on Video and Audio Features, Proc. IEEE Int. Conf. Multimedia Computing and Systems, Vol. 1, Florence, Italy, pp. 685–690, June 7–11, 1999.
[16] Z. Liu,J. Huang,Y. Wang, and T. Chen, Audio Feature Extraction & Analysis for Scene Classification, MMSP-1997, pp. 343–348, 1997.
[17] Z. Liu,J. Huang, and Y. Wang, Classification of TV Programs Based on Audio Information Using Hidden Markov Model, MMSP-1998, pp. 27–32, 1998.
[18] Z. Liu and Q. Huang, Content-based Indexing and Retrieval-by-example in Audio, ICME-2000, 2000.
[19] Z. Liu,Y. Wang, and T. Chen, Audio Feature Extraction and Analysis for Scene Segmentation and Classification, J. VLSI Signal Processing Sys. Signal, Image, Video Technol., Vol. 20, pp. 61–79, Oct. 1998.
[20] L. Lu,H. You,H.-J. Zhang, A New Approach to Query by Humming in Music Retrieval, ICME-2001, 2001.
[21] L. Lu,H. Jiang, and H. Zhang, A Robust Audio Classification and Segmentation Method, ACM MM-2001, 2001.
[22] J. Makhoul,F. Kubala,T. Leek,D. Liu,L. Nguyen,R. Schwartz, and A. Srivastava, Speech and Language Technologies for Audio Indexing and Retrieval, Proc. Of the IEEE, Vol. 88, No. 8, pp. 1338–1353, Aug. 2000.
[23] K. Minami,A. Akutsu,H. Hamada, and Y. Tonomura, Video Handling with Music and Speech Detection, IEEE Multimedia Magazine, Vol. 5, pp. 17–25, July–Sept. 1998.
[24] M. Mitra,A. Singhal, and C. Buckley, Improving Automatic Query Expansion, ACM SIGIR'98, pp. 206–214, 1998.
[25] Overview of the MPEG-7 Standard, ISO/IEC JTC1/SC29/WG11, N4509, Dec. 2001.
[26] J. Nam and A. H. Tewfik, Combined Audio and Visual Streams Analysis for Video Sequence Segmentation, ICASSP-1997, Vol. 3, pp. 2665–2668, 1997.
[27] E. Parris and M. J. Carey, Language Independent Gender Identification, ICASSP-1996, Vol. 2, pp. 685–688, 1996.
[28] S. Pfeiffer,S. Fischer, and W. Effelsberg, Automatic Audio Content Analsyis, Proc. 4th ACM Int. Conf. Multimedia, Boston, MA, Nov. 18–22, pp. 21–30, 1996.
[29] D. A. Reynolds, An Overview of Automatic Speaker Recognition Technology, ICASSP-2002, pp. 4072–4075, 2002.
[30] Y. Rui,T. S. Huang, and S. Mehrotra, Relevance Feedback Techniques in Interactive Content-based Image Retrieval, ICMCS-1999, 1999.
[31] C. Saraceno and R. Leonardi, Audio As a Support to Scene Change Detection and Characterization of Video Sequences, ICASSP-1997, Vol. 4, pp. 2597–2600, 1997.
[32] J. Saunders, Real-time Discrimination of Broadcast Speech/Music, in ICASSP-1996, Vol. 2, pp. 993–996, 1996.
[33] E. Scheirer and M. Slaney, Construction and Evaluation of a Robust Multifeatures Speech/Music Discrimination, ICASSP-1997, Vol. 2, pp. 1331–1334, Apr. 21–24, 1997.
[34] M. Siegler,U. Jain,B. Raj,R. Stern, Automatic Segmentation, Classification and Clustering of Broadcast News Audio, Proc. DARPA Speech Recognition Workshop, Chantilly, VA pp. 97–99, Feb. 1997.
[35] J. V. Thong,P. Moreno,B. Logan,B. Fidler,K. Maffey, and M. Moores, Speechbot: An Experimental Speech-Based Search Engine for Multimedia Content on the Web, IEEE Trans. on Multimedia, Vol. 4, No. 1, pp. 88 - 96, March 2002.
[36] G. Tzanetakis and P. Cook, Musical Genre Classification of Audio Signals, IEEE Trans. On Speech and Audio Processing, Vol. 10, Issue 5, pp. 293-302, July 2002.
[37] H. D. Wactlar,M. G. Christel,Y. Gong, and A. G. Haupmann, Lessons Learned from Building a Terabyte Digital Video Library, IEEE Computer Magazine, Vol. 32, pp. 66–73, Feb. 1999.
[38] E. Wold,T. Blum,D. Keislar, and J. Wheaton, Content-based Classification, Search, and Retrieval of Audio, IEEE Multimedia, Vol. 3, No. 2, pp. 27–36, 1996.
[39] Y. Wang,Z. Liu, and J. Huang, Multimedia Content Analysis, IEEE Signal Processing Magazine, Vol. 17, No. 6, pp. 12–36, Nov. 2000.
[40] K. Zechner, Summarization of Spoken Language - Challenges, Methods, and Prospects, Speech Technology Expert eZine, Issue 6, Jan. 2002.
[41] T. Zhang and C.-C.J. Kuo, Hierarchical Classification of Audio Data for Archiving and Retrieving, ICASSP-1999, Vol. 6, pp. 3001–3004, 1999.
[42] T. Zhang and C.-C.J. Kuo, Video Content Parsing Based on Combined Audio and Visual Information, SPIE's Conference on Multimedia Storage and Archiving Systems IV, Boston, MA, pp. 78–89, Sept., 1999.