6.3 How Do Neural Networks Work?
There is no universally accepted definition of a neural network. Most definitions, however, agree that neural networks are networks of many simple processors, or units, that are connected and process numeric values. Neural networks are models of biological learning systems; indeed, much of the inspiration across the fields of AI comes from researchers' desire to emulate in software the human capacities of recognition, learning, remembering, and evolving. Thus, neural networks were developed as analogs of the human brain. They were first proposed in theory some 50 years ago, motivated by scientists' desire to understand how the human brain works. Much as brain cells do, the units of a neural network learn through a process of excitation and connection, each responding to a weighted function of the inputs from the many other units to which it is wired.
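This excitation process can be sketched in a few lines of code. The following is an illustrative model of a single artificial unit (the input values, weights, and bias are arbitrary, not drawn from any real network): it sums its weighted inputs and passes the result through a sigmoid activation, the software analog of a cell firing.

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs -- the software analog of summed
    # synaptic excitation from connected cells.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Sigmoid activation squashes the excitation into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

# Arbitrary illustrative values: two inputs, two weights, one bias.
output = neuron([0.5, 0.8], [0.4, -0.2], 0.1)  # roughly 0.535
```

A real network is nothing more than many such units wired together, with learning consisting of adjustments to the weights.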
A neural network software system is an information-processing program inspired by the densely interconnected structure of the human brain. Such systems are collections of mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. Learning in biological systems involves adjustments of the synaptic connections that exist between the neurons. This is the basic process that neural networks attempt to replicate. Learning occurs by example, through exposure to a set of input-output data, where a training algorithm iteratively adjusts the connection weights (synapses); for the network to learn the patterns of different types of behavior, a balanced distribution of observations is required. These connection weights store the knowledge necessary to solve specific problems, which, for investigative data mining, involve recognizing the patterns of various types of cybercrimes, fraud, system intrusions, and other digital crimes (Fausett, 1994).
Knowledge, then, for a neural network is reduced to a set of weights on its internal connections. Learning comes down to what gets encoded in the wiring and in the weighting factors of the various neurons. For example, to construct a fraud-detection model with a neural network, samples of both fraudulent and non-fraudulent transactions are required so that the network can distinguish the features and behavior of each. For this reason, when working with neural networks, the selection of training samples is extremely important: adequate effort must be made to ensure that a balanced number of observations from each class is presented to the network.
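The balancing step can be sketched as follows, assuming a hypothetical data set in which the label 1 marks a fraudulent transaction and 0 a legitimate one; the larger class is simply downsampled to the size of the smaller:

```python
import random

def balance(samples, labels, seed=0):
    # Hypothetical labels: 1 = fraudulent, 0 = legitimate.
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(pos), len(neg))
    rng = random.Random(seed)
    # Downsample each class to the size of the smaller one so the
    # network is presented a balanced number of observations.
    chosen = rng.sample(pos, n) + rng.sample(neg, n)
    return chosen, [1] * n + [0] * n
```

In practice one might instead oversample the minority class; the point is only that the network should not be trained on a lopsided mix.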
6.4 Types of Network Architectures
There are literally hundreds of neural network architectures, which determine how internal connections are arranged and how learning occurs. However, the most dominant are the multi-layer perceptron (MLP), also known as the back-propagation network, for classification, and the Kohonen network, or self-organizing map (SOM), for clustering. Most networks are trained via a feedback mechanism that gradually adjusts to the data through the testing and correction of errors; this is by most estimates the most common approach in use for neural networks.
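This test-and-correct cycle is what back-propagation implements. Below is a from-scratch sketch of a tiny MLP trained by back-propagation on the classic XOR problem; the network size (2-2-1), learning rate, and iteration count are illustrative choices, not prescriptions from the text:

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative training set: the XOR function, the classic example of a
# problem that requires a hidden layer.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

# A 2-2-1 network: two inputs, two hidden units, one output unit.
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]
w2 = [random.uniform(-1, 1) for _ in range(2)]
b2 = 0.0

def forward(x):
    h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
    o = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2)
    return h, o

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

error_before = total_error()
lr = 0.3
for _ in range(3000):
    # Accumulate gradients over the whole training set (batch mode),
    # then correct the weights -- the "testing and correction of errors".
    g_w1 = [[0.0, 0.0], [0.0, 0.0]]
    g_b1 = [0.0, 0.0]
    g_w2 = [0.0, 0.0]
    g_b2 = 0.0
    for x, t in data:
        h, o = forward(x)
        d_out = (o - t) * o * (1 - o)  # output-layer delta
        for j in range(2):
            d_hid = d_out * w2[j] * h[j] * (1 - h[j])  # hidden delta
            g_w2[j] += d_out * h[j]
            g_b1[j] += d_hid
            for i in range(2):
                g_w1[j][i] += d_hid * x[i]
        g_b2 += d_out
    for j in range(2):
        w2[j] -= lr * g_w2[j]
        b1[j] -= lr * g_b1[j]
        for i in range(2):
            w1[j][i] -= lr * g_w1[j][i]
    b2 -= lr * g_b2
error_after = total_error()  # training should have reduced the error
```

Each pass computes the network's error on the examples, propagates that error backward to assign blame to individual weights, and nudges every weight in the direction that reduces the error.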
One way of classifying neural networks is by how data flows through them: the majority are feed-forward, while some are recurrent, implementing a feedback scheme. Yet another way of classifying them is by their method of learning or training. Most neural networks employ supervised training, while others, such as the SOM or Kohonen network, are referred to as unsupervised. A SOM needs no supervised training with known outputs; its task, instead, is discovering clusters of similarity in a database, which it does through a concept of distance measurement unique to its architecture.
Supervised training is similar to a student being guided by a teacher or mentor. This type of back-propagation neural network is used when a sample of cases, profiles, or crimes is available for training a network to recognize the patterns of criminal behavior. For example, an auction site such as ebay.com could use this type of network to estimate the probability of criminal activity, because its servers probably hold records of transactions where fraud was perpetrated.
The popular MLP networks have found their way into countless commercial and marketing applications requiring pattern recognition, classification, profiling, and prediction. Industry analysts estimate that 75% to 90% of today's neural network applications use the MLP schema. The main advantages of MLP networks are that they are easy to use and that they can approximate any input/output mapping. Their disadvantages are that they train slowly and require an adequate sample of training data.
To address some of the inadequacies of the MLP, other architectures exist, such as the generalized feed-forward network, a generalization of the MLP. These networks often solve problems much more efficiently and quickly. Yet another architecture is the modular feed-forward network, a special class of MLP. These networks process their input using several parallel MLPs and then combine the results, which tends to speed up training and reduce the number of observations required. In situations such as fraud detection or terrorist profiling, this architecture may be ideal.
A rather unique architecture is the principal component analysis (PCA) network, which combines unsupervised and supervised learning in the same topology. PCA is an unsupervised linear procedure that finds a set of uncorrelated features from the input. An MLP is then deployed in supervised mode to perform a nonlinear classification on the features discovered by the unsupervised procedure.
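The unsupervised first stage can be illustrated with a sketch of power iteration, one common way to extract the leading principal component from two-dimensional data (illustrative only; a real PCA network would extract several uncorrelated components and pass them to the MLP stage):

```python
import math

def principal_component(points, iters=200):
    # Center the 2-D data around its mean.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    # Power iteration: repeatedly multiply a vector by the covariance
    # matrix; it converges to the direction of maximum variance.
    vx, vy = 1.0, 0.3  # arbitrary non-degenerate starting direction
    for _ in range(iters):
        vx, vy = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(vx, vy)
        if norm == 0.0:  # degenerate data: no variance at all
            return (0.0, 0.0)
        vx, vy = vx / norm, vy / norm
    return (vx, vy)

# Data lying along the line y = x: the leading component should point
# along that diagonal.
component = principal_component([(0, 0), (1, 1), (2, 2), (3, 3), (-1, -1)])
```

Projecting each record onto such components yields the uncorrelated features on which the supervised MLP stage is then trained.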
As noted, most networks require supervision; that is, they need labeled samples of what the user is trying to classify in order to recognize, for example, a potentially fraudulent transaction. However, a Kohonen neural network, also known as a SOM, is different: this class of network does not need labeled samples. A SOM basically creates clusters of similar records in a database without the need for a training output. This type of network, as we shall see in some case studies, has been used by some ingenious police investigators to cluster and associate crimes and criminals based on their modus operandi. Other neural network architectures include learning vector quantization, radial basis function, and Hopfield networks, just to name a few.
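The clustering idea behind the SOM can be sketched as follows. This is an illustrative one-dimensional map, not a production implementation: each unit holds a weight vector, and for every record presented, the best-matching unit (smallest Euclidean distance) and, early in training, its immediate neighbors are pulled toward that record.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance -- the "distance measurement" the SOM
    # uses to find its best-matching unit.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_som(records, n_units=3, epochs=60, lr=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(records[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)          # decaying learning rate
        radius = 1 if epoch < epochs // 2 else 0  # shrinking neighborhood
        for rec in records:
            # Best-matching unit: the unit closest to this record.
            bmu = min(range(n_units), key=lambda u: dist2(units[u], rec))
            for u in range(n_units):
                if abs(u - bmu) <= radius:
                    for i in range(dim):
                        units[u][i] += rate * (rec[i] - units[u][i])
    return units

def assign(record, units):
    # Cluster assignment: the index of the closest unit.
    return min(range(len(units)), key=lambda u: dist2(units[u], record))

# Hypothetical records: two obvious groups of similar cases.
records = [[0.0, 0.0], [0.1, 0.05], [0.05, 0.1],
           [1.0, 1.0], [0.9, 0.95], [0.95, 0.9]]
units = train_som(records)
```

No labels are supplied anywhere: the map discovers on its own that the records fall into distinct groups, which is the property investigators exploit when clustering crimes by modus operandi.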