C++ Neural Networks and Fuzzy Logic
by Valluru B. Rao
M&T Books, IDG Books Worldwide, Inc.
ISBN: 1558515526 Pub Date: 06/01/95
The Hopfield memory, Bidirectional Associative Memory, and Fuzzy Associative Memory are all unsupervised networks that perform pattern completion, or pattern association. That is, given corrupted or missing information, these memories are able to recall or complete an expected output. Gallant calls the training method used in these networks one-shot learning, since the weight matrix is determined just once, as a function of the complete patterns you wish to recall. An example of this was shown in Chapter 4 with the determination of weights for the Hopfield memory.
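To make one-shot learning concrete, here is a minimal C++ sketch. It assumes the common outer-product rule for bipolar (+1/-1) patterns, which may differ in detail from the Chapter 4 example: each stored pattern adds its outer product to the weight matrix, and the diagonal is kept at zero. The names oneShotWeights and recallStep are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A bipolar pattern: every component is +1 or -1.
using Pattern = std::vector<int>;
using Matrix = std::vector<std::vector<int>>;

// One-shot weight determination (assumed outer-product rule):
// W[i][j] = sum over stored patterns p of p[i] * p[j], with W[i][i] = 0
// so that no neuron feeds back onto itself. Computed once, no iteration.
Matrix oneShotWeights(const std::vector<Pattern>& patterns) {
    assert(!patterns.empty());
    const std::size_t n = patterns.front().size();
    Matrix w(n, std::vector<int>(n, 0));
    for (const Pattern& p : patterns) {
        assert(p.size() == n);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (i != j) w[i][j] += p[i] * p[j];
    }
    return w;
}

// One synchronous recall step: each neuron outputs the sign of its net input
// (a net input of 0 is mapped to +1 in this sketch).
Pattern recallStep(const Matrix& w, const Pattern& x) {
    Pattern y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        int net = 0;
        for (std::size_t j = 0; j < x.size(); ++j) net += w[i][j] * x[j];
        y[i] = (net >= 0) ? 1 : -1;
    }
    return y;
}
```

For example, with the single stored pattern {1, -1, 1, -1}, one recall step maps the corrupted input {1, 1, 1, -1} back to the stored pattern, which is the pattern completion described above.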
ART1 is the first neural network model based on the adaptive resonance theory of Carpenter and Grossberg. Suppose you have a pair of patterns such that, when one of them is input to a neural network, the output is the other pattern in the pair. If this happens consistently in both directions, you may describe it as resonance. We discuss bidirectional associative memories and resonance in Chapter 8. By the time training is complete, many other pattern pairs will have been presented to the network as well. If changes in the short-term memory do not disturb the long-term memory, the network shows adaptive resonance. The ART1 model is designed to maintain this property. Note that this discussion relates largely to stability.
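Resonance in this bidirectional sense can be checked directly. The sketch below assumes a bipolar bidirectional associative memory whose weight matrix is the sum of the outer products x[i] * y[j] over the stored pairs; a pair (x, y) resonates if x recalls y in the forward direction and y recalls x in the backward direction. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<int>;                  // bipolar vector (+1/-1)
using Matrix = std::vector<std::vector<int>>;

// Assumed outer-product rule: W[i][j] = sum over stored pairs of x[i] * y[j].
Matrix bamWeights(const std::vector<std::pair<Vec, Vec>>& pairs) {
    assert(!pairs.empty());
    const std::size_t n = pairs.front().first.size();
    const std::size_t m = pairs.front().second.size();
    Matrix w(n, std::vector<int>(m, 0));
    for (const auto& pr : pairs) {
        const Vec& x = pr.first;
        const Vec& y = pr.second;
        assert(x.size() == n && y.size() == m);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < m; ++j) w[i][j] += x[i] * y[j];
    }
    return w;
}

// Threshold a net input to bipolar form (0 is mapped to +1 in this sketch).
int toBipolar(int net) { return net >= 0 ? 1 : -1; }

// Forward direction, x -> y: y[j] = sign(sum_i x[i] * W[i][j]).
Vec recallForward(const Matrix& w, const Vec& x) {
    Vec y(w.front().size());
    for (std::size_t j = 0; j < y.size(); ++j) {
        int net = 0;
        for (std::size_t i = 0; i < x.size(); ++i) net += x[i] * w[i][j];
        y[j] = toBipolar(net);
    }
    return y;
}

// Backward direction, y -> x: x[i] = sign(sum_j W[i][j] * y[j]).
Vec recallBackward(const Matrix& w, const Vec& y) {
    Vec x(w.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        int net = 0;
        for (std::size_t j = 0; j < y.size(); ++j) net += w[i][j] * y[j];
        x[i] = toBipolar(net);
    }
    return x;
}

// The pair (x, y) resonates if each pattern recalls the other.
bool resonates(const Matrix& w, const Vec& x, const Vec& y) {
    return recallForward(w, x) == y && recallBackward(w, y) == x;
}
```

With the two stored pairs ({1, -1, 1, -1}, {1, 1, -1}) and ({1, 1, -1, -1}, {1, -1, 1}), each pair resonates, while mixing the first x with the second y does not.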
Learning, convergence, and stability are matters of much interest. As learning takes place, you want to know whether the process will halt at some appropriate point; this is the question of convergence. Is what is learned stable, or will the network have to learn all over again as each new event occurs? The answers to these questions come from a mathematical model, built on differential equations, that describes the learning algorithm. Proofs of stability are part of the model inventor's task. One tool that helps in showing convergence is the idea of a state energy, or cost, whose changes indicate whether the direction the process is taking can lead to convergence.
The Lyapunov function, discussed later in this chapter, turns out to provide the right energy function, one that is minimized during the operation of the neural network. This function has the property that its value decreases with every change in the state of the system; since it is also bounded below, a minimum (possibly a local one) will eventually be reached. The Lyapunov function is discussed further because of its significant utility for neural network models, but only briefly because of the high level of mathematics involved. Fortunately, simple forms of it can be derived and built into learning algorithms for neural networks. The high-level mathematics is needed mainly for the proofs that show the models are viable.
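For the Hopfield memory, the standard energy E = -(1/2) * sum over i and j of w[i][j] * x[i] * x[j] plays the role of such a Lyapunov function when the weights are symmetric, the diagonal is zero, and neurons are updated one at a time. The sketch below assumes zero thresholds and uses hypothetical names; it records the energy after every single-neuron update so you can check that it never increases.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using State = std::vector<int>;                // bipolar state (+1/-1)
using Matrix = std::vector<std::vector<int>>;  // symmetric, zero diagonal

// Energy with zero thresholds: E = -1/2 * sum_i sum_j w[i][j] * x[i] * x[j].
double energy(const Matrix& w, const State& x) {
    double e = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j) e += w[i][j] * x[i] * x[j];
    return -0.5 * e;
}

// Asynchronous updates: neurons change one at a time, in index order, for the
// given number of sweeps. Returns the energy before any update and after each one.
std::vector<double> runAsync(const Matrix& w, State& x, int sweeps) {
    assert(sweeps >= 0);
    std::vector<double> trace{energy(w, x)};
    for (int s = 0; s < sweeps; ++s) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            int net = 0;
            for (std::size_t j = 0; j < x.size(); ++j) net += w[i][j] * x[j];
            if (net > 0) x[i] = 1;
            else if (net < 0) x[i] = -1;
            // net == 0: keep the current value, so the energy is unchanged.
            trace.push_back(energy(w, x));
        }
    }
    return trace;
}
```

Starting from {-1, -1, 1, 1} with the weights built from the stored pattern {1, -1, 1, -1}, the energies recorded during the first sweep are 2, 0, 0, 0, -6, and the state settles on the stored pattern.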
Alternatively, a temperature relationship can be used, as in the Boltzmann machine, or any other well-suited cost function can be employed, such as the function of distances used in formulating the Traveling Salesman Problem, where the total distance of the salesman's tour is to be minimized. The Traveling Salesman Problem is important and well known: the salesman must visit each city in a given set exactly once, and the aim is to devise a tour that minimizes the total distance traveled. The search for an efficient algorithm for this problem continues; some algorithms solve the problem in many, but not all, situations. A neural network formulation can also work for the Traveling Salesman Problem. You will see more about this in Chapter 15.
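The cost function for the Traveling Salesman Problem is just the tour length. This sketch assumes cities given by planar coordinates, Euclidean distances, and a tour that returns to its starting city. The exhaustive search is included only to show what minimizing this cost means; it examines (n-1)! visiting orders, so it is practical for a handful of cities at most. City, tourLength, and shortestTourLength are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct City { double x, y; };

// Cost of a closed tour: sum of Euclidean distances between consecutive cities
// in the visiting order, including the final leg back to the first city.
double tourLength(const std::vector<City>& cities, const std::vector<std::size_t>& order) {
    assert(order.size() == cities.size());
    double total = 0.0;
    for (std::size_t k = 0; k < order.size(); ++k) {
        const City& a = cities[order[k]];
        const City& b = cities[order[(k + 1) % order.size()]];
        total += std::hypot(b.x - a.x, b.y - a.y);
    }
    return total;
}

// Brute-force minimum: fix city 0 as the start and try every order of the rest.
double shortestTourLength(const std::vector<City>& cities) {
    assert(!cities.empty());
    std::vector<std::size_t> order(cities.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    double best = tourLength(cities, order);
    while (std::next_permutation(order.begin() + 1, order.end()))
        best = std::min(best, tourLength(cities, order));
    return best;
}
```

For four cities at the corners of a unit square, going around the edges costs 4, while a tour that crosses the diagonals costs 2 + 2 * sqrt(2), about 4.83; the exhaustive search reports 4.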
Suppose you have a criterion, such as an energy to be minimized or a cost to be decreased, and you know the optimum level for this criterion. If the network achieves the optimum value in a finite number of steps, then the operation of the network converges. Alternatively, if you are making pairwise associations of patterns, there is the prospect of convergence if the number of errors decreases after each cycle of network operation.
It is also possible that convergence is so slow that it seems to take forever to reach the converged state. In that case, you should specify a tolerance value and require only that the criterion be achieved within that tolerance, which avoids a lot of computing time. You may also introduce a momentum parameter to change the weights further and thereby speed up convergence. One such technique adds a portion of the previous weight change to the current one.
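The tolerance and momentum ideas can be seen together in a one-weight example. This sketch assumes a hypothetical cost E(w) = (w - 3)^2, whose known optimum value is 0, and plain gradient descent. Each weight change is -eta * dE/dw plus alpha times the previous change; eta (learning rate), alpha (momentum parameter), and the function names are illustrative, not taken from the text.

```cpp
#include <cassert>

// Hypothetical cost for illustration: minimum value 0 at w = 3.
double cost(double w) { return (w - 3.0) * (w - 3.0); }
double gradient(double w) { return 2.0 * (w - 3.0); }

struct Result { double w; int steps; };

// Gradient descent with momentum. Stops as soon as the cost is within
// `tolerance` of the known optimum (0), or after maxSteps updates.
Result descend(double w, double eta, double alpha, double tolerance, int maxSteps) {
    assert(eta > 0.0 && alpha >= 0.0 && alpha < 1.0);
    double previousChange = 0.0;
    for (int step = 0; step < maxSteps; ++step) {
        if (cost(w) < tolerance) return {w, step};
        // Current change = usual gradient step + a portion of the previous change.
        double change = -eta * gradient(w) + alpha * previousChange;
        w += change;
        previousChange = change;
    }
    return {w, maxSteps};
}
```

Starting from w = 0 with eta = 0.05 and a tolerance of 1e-6, setting alpha = 0.5 reaches the tolerance in far fewer steps than alpha = 0. Too large a momentum parameter can instead cause overshooting, so alpha is usually kept well below 1.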
Instead of converging, the operation may result in oscillations: the weight structure keeps changing back and forth, and learning never ceases. Learning algorithms therefore need to be analyzed for convergence, which is an essential property of any learning algorithm.
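A learning rate that is too large is one simple way to produce such oscillation. The sketch assumes a hypothetical one-weight cost E(w) = w^2 and plain gradient descent, so each update multiplies the weight by (1 - 2 * eta); weightTrace is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Gradient descent on the hypothetical cost E(w) = w^2 (gradient 2w):
// w <- w - eta * 2w, i.e. w is multiplied by (1 - 2 * eta) at every step.
// Returns the weight before the first step and after each step.
std::vector<double> weightTrace(double w, double eta, int steps) {
    assert(steps >= 0);
    std::vector<double> trace{w};
    for (int t = 0; t < steps; ++t) {
        w -= eta * 2.0 * w;
        trace.push_back(w);
    }
    return trace;
}
```

With eta = 1 the weight flips between +1 and -1 forever, so learning never ceases; with eta = 0.25 it halves at every step and converges to the optimum at 0.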
Copyright © IDG Books Worldwide, Inc.