Chapter VIII: Text Processing by Binary Neural Networks


T. Beran, Czech Technical University, Czech Republic
T. Macek, Czech Technical University, Czech Republic

This chapter describes a less traditional technique for text processing, based on a binary neural network, the Correlation Matrix Memory. We propose using this network for text searching tasks. Two methods of coding input words are described and tested. We also discuss the problems that arise when this approach is used for text processing.

INTRODUCTION

As more and more people become familiar with computers, the amount of information stored in electronic form is increasing quickly. As a consequence, we need to be able to search large amounts of data for particular information. Various techniques have been developed for the text searching task, and many of them are fast and sophisticated. Speed is one of the most important criteria, but it is not the only one. Another is the ability to deal with text that is corrupted in some way. The text, or the query, may be inexact, for example, when we do not know exactly what we are searching for, or when the text is the result of OCR or speech recognition.

In this chapter, we describe a less traditional technique for the text searching task. The technique is based on a binary neural network called Correlation Matrix Memory (CMM). We have tested CMM on the problem of finding a particular word in a single text document; a search returns all occurrences of the word. Although the technique is also capable of approximate searching, here we focus on exact searching.
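To make the idea concrete, the following is a minimal sketch of how a binary CMM could be used for exact word search in a single document. It is an illustration under stated assumptions, not the implementation evaluated in this chapter: the word coding (one input bit per character position and letter, plus an end-of-word marker `$`), the one-bit-per-word-position output coding, and the names `encode_word`, `BinaryCMM`, `build_memory`, and `search` are all hypothetical. Training sets a weight to 1 wherever an active input bit meets an active output bit; recall sums the rows selected by the query and keeps the outputs whose sum reaches the number of active query bits.

```python
import re

# Hypothetical word coding, not one of the chapter's coding methods.
ALPHABET = "abcdefghijklmnopqrstuvwxyz$"  # "$" marks the end of a word
MAX_LEN = 16                               # assumed limit on the coded length


def encode_word(word):
    """Return the indices of the input bits set for a lowercase word."""
    coded = word.lower()[:MAX_LEN - 1] + "$"
    return [pos * len(ALPHABET) + ALPHABET.index(ch)
            for pos, ch in enumerate(coded)]


class BinaryCMM:
    """Binary correlation matrix: rows are input bits, columns are outputs."""

    def __init__(self, n_inputs, n_outputs):
        self.n_outputs = n_outputs
        self.weights = [[0] * n_outputs for _ in range(n_inputs)]

    def train(self, input_bits, output_bits):
        # Binary Hebbian learning: OR the outer product into the matrix.
        for i in input_bits:
            for j in output_bits:
                self.weights[i][j] = 1

    def recall(self, input_bits):
        # Sum the rows selected by the active input bits ...
        sums = [0] * self.n_outputs
        for i in input_bits:
            for j, w in enumerate(self.weights[i]):
                sums[j] += w
        # ... and keep the outputs supported by every active input bit.
        threshold = len(input_bits)
        return [j for j, s in enumerate(sums) if s >= threshold]


def build_memory(text):
    """Associate each word of the text with its word position."""
    words = re.findall(r"[a-z]+", text.lower())
    cmm = BinaryCMM(MAX_LEN * len(ALPHABET), len(words))
    for position, word in enumerate(words):
        cmm.train(encode_word(word), [position])
    return cmm


def search(cmm, word):
    """Return the word positions of all exact occurrences of `word`."""
    return cmm.recall(encode_word(word))
```

For the text "the cat saw the cats and the cat ran", `search(build_memory(text), "cat")` returns the word positions `[1, 7]`. The end-of-word marker keeps the query "cat" from also matching "cats". Lowering the recall threshold below the number of active query bits would admit partial matches, which is one way such a memory can support approximate searching.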

This chapter is organized as follows: "Technique Description" explains our approach and describes CMM; "Text Coding and Experiments" explains the importance of input pattern coding, proposes two new methods, and shows initial experiments; "Discussion" examines some problems that arise when this technique is applied to real text.




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
