
Artificial neural networks (ANNs) and genetic algorithms (GAs) were invented to mimic some of the phenomena observed in biology. The biological metaphor for ANNs is the human brain; the biological metaphor for GAs is the evolution of a species. An ANN consists of different sets of neurons or nodes and the connections between one set of neurons and the other. Each connection between two nodes in different sets is assigned a weight which shows the strength of the connection. Connections with a positive weight are called excitatory connections, and connections with a negative weight are called inhibitory connections. The network of neurons and their connections is called the architecture of the ANN. Let A = {N_{1}, N_{2}, N_{3}}, B = {N_{4}, N_{5}, N_{6}}, and C = {N_{7}} be three sets of nodes for an ANN. The set A is called the set of input nodes, set B is called the set of hidden nodes, and set C is called the set of output nodes. The cardinality of set A is equal to the number of input variables; the cardinality of set C is equal to the number of output variables. Each connection can be viewed as a mapping from either an input node to a hidden node or from a hidden node to an output node. The general architecture of three sets of nodes is called a three-layer (of nodes) ANN. Figure 1 illustrates a three-layer ANN. Notice that the property A ∩ B ∩ C = ∅ always holds, and such a network is called a feedforward network. The connections in a feedforward network are in one direction, and the connections from A to B and from B to C are onto. The w_{ij} are the weights that denote the strength of the connection from node i to node j.
Figure 1: A three-layer (of nodes) artificial neural network
Information is processed at each node in an ANN. For example, at hidden node N_{4} the incoming signal vector (input) from the three nodes in the input set is multiplied by the strength of each connection and added up. The result is passed through an activation function, and the outcome is the activation for the node. If x represents the sum of the products of the incoming signal vector and the strengths of connection, then the activation, using the logistic sigmoid activation function, can be represented by:

f(x) = 1 / (1 + e^{−x})
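As a sketch of this computation (the signal and weight values below are hypothetical, not taken from the text), the activation of a hidden node such as N_{4} can be computed as:

```python
import math

def sigmoid(x):
    """Logistic sigmoid activation: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_activation(signals, weights):
    """Sum the products of incoming signals and connection strengths,
    then pass the result through the activation function."""
    x = sum(s * w for s, w in zip(signals, weights))
    return sigmoid(x)

# Hypothetical incoming signals from input nodes N1-N3 and weights
a4 = node_activation([0.5, -0.2, 0.8], [0.4, 0.7, -0.1])
print(a4)   # ≈ 0.495
```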
In backpropagation-based learning, the strengths of connections are initially chosen at random. Based on the initial set of randomly chosen weights, the algorithm tries to minimize the following root mean square (RMS) error:

E = √[ (1/N) Σ_{n=1}^{N} (t_{n} − o_{n})^{2} ]
where N is the number of patterns in the training set, t_{n} is the target output of the n^{th} pattern, and o_{n} is the actual output for the n^{th} pattern. In each subsequent training step, the set of connection weights (strengths of connections) is adjusted in the direction of maximum decrease of E, scaled by a learning rate λ. Mathematically, an old weight w_{old} is updated to its new value w_{new} using the following equation:

w_{new} = w_{old} − λ (∂E/∂w_{old})
One useful property of the sigmoid function is that:

f′(x) = f(x)(1 − f(x))
This means that the derivative (gradient) of the sigmoid function can be calculated by applying a simple multiplication and subtraction operator on the function itself. For any neuron n_{k}, its output is determined by one of the following formulas:

h_{k} = f( Σ_{i=0}^{A} w_{1ik} x_{i} )  for a hidden node
y_{k} = f( Σ_{i=0}^{B} w_{2ik} h_{i} )  for an output node
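This property can be checked numerically (the evaluation point below is arbitrary): the analytic gradient f(x)(1 − f(x)) agrees with a central-difference estimate of the derivative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Compare the analytic gradient f(x) * (1 - f(x)) with a
# central-difference estimate at an arbitrary point
x, eps = 0.7, 1e-6
analytic = sigmoid(x) * (1 - sigmoid(x))
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(abs(analytic - numeric))   # agreement to many decimal places
```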
where x_{i} is the ith input (A is the total number of inputs), w_{1ik} is the strength of connection from the ith input node to the kth hidden node, B is the number of hidden nodes, w_{2ik} is the strength of connection from the ith hidden node to the kth output node, and C is the number of output nodes. The weights w_{10k} and w_{20k} are thresholding weights, and x_{0} and h_{0} are both equal to one. The network is initialized with small random values for the weights, and the backpropagation learning procedure is used to update the weights as the data is iteratively presented to the input-layer neurons. At each iteration, the weights are updated by backpropagating the error as follows:
Δw1_{ik} = η δ_{k} x_{i} and Δw2_{ik} = η δ_{k} h_{i}
where, for an output node k,

δ_{k} = y_{k}(1 − y_{k})(t_{k} − y_{k})

and, for a hidden node k,

δ_{k} = h_{k}(1 − h_{k}) Σ_{j} δ_{j} w2_{kj}

Here, η is the learning rate and y_{k} is the actual output value.
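Putting the forward pass, the deltas, and the weight updates together, a minimal sketch of this procedure for a three-layer (3-3-1) network like the one in Figure 1 might look as follows. The learning rate, initial weight range, and training pattern are illustrative assumptions, not values from the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

A, B, C = 3, 3, 1   # numbers of input, hidden, and output nodes
eta = 0.5           # learning rate (an assumed value)
random.seed(0)      # small random initial weights
# w1[i][k]: input i -> hidden k; w2[i][k]: hidden i -> output k.
# Index 0 holds the thresholding (bias) weight, since x_0 = h_0 = 1.
w1 = [[random.uniform(-0.5, 0.5) for _ in range(B)] for _ in range(A + 1)]
w2 = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(B + 1)]

def train_step(x, t):
    """One backpropagation update on a single pattern (x, t); returns output."""
    xs = [1.0] + list(x)                                  # x_0 = 1
    h = [sigmoid(sum(xs[i] * w1[i][k] for i in range(A + 1)))
         for k in range(B)]
    hs = [1.0] + h                                        # h_0 = 1
    y = [sigmoid(sum(hs[i] * w2[i][k] for i in range(B + 1)))
         for k in range(C)]
    # Output deltas use the sigmoid-derivative property y(1 - y)
    d2 = [y[k] * (1 - y[k]) * (t[k] - y[k]) for k in range(C)]
    # Hidden deltas backpropagate the output deltas through w2
    d1 = [h[k] * (1 - h[k]) * sum(d2[j] * w2[k + 1][j] for j in range(C))
          for k in range(B)]
    for i in range(A + 1):                                # Δw1_ik = η δ_k x_i
        for k in range(B):
            w1[i][k] += eta * d1[k] * xs[i]
    for i in range(B + 1):                                # Δw2_ik = η δ_k h_i
        for k in range(C):
            w2[i][k] += eta * d2[k] * hs[i]
    return y

pattern, target = [0.1, 0.9, 0.3], [1.0]                  # hypothetical data
first = train_step(pattern, target)[0]
for _ in range(200):
    last = train_step(pattern, target)[0]
print(abs(target[0] - first), "->", abs(target[0] - last))
```

Repeated updates on the same pattern drive the output toward the target, illustrating the error-minimization behavior described above.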
Genetic algorithms (GAs) use a survival-of-the-fittest strategy to learn the solution vector for a forecasting model. GAs are a parallel search technique that starts with a set of random potential solutions and uses special search operators (evaluation, selection, crossover, and mutation) to bias the search towards promising solutions. At any given time, unlike traditional optimization approaches, a GA maintains several promising potential solutions (equal to the population size) as opposed to one optimal solution. Each population member in a GA is a potential solution. A population member (P_{1}) used to obtain the solution vector for a forecasting model will consist of the set of all unknown parameters. P_{1} can be represented as:
P_{1} = ⟨w_{14}, w_{15}, w_{16}, w_{24}, w_{25}, w_{26}, w_{34}, w_{35}, w_{36}, w_{47}, w_{57}, w_{67}⟩
where (w_{14}, w_{15}, w_{16}, w_{24}, w_{25}, w_{26}, w_{34}, w_{35}, w_{36}, w_{47}, w_{57}, w_{67}) ∈ ℜ are the unknown parameters in a forecasting model. Any w ∈ P_{1} is called a gene (a variable in the forecasting model) of a given population member P_{1}. A set of several population members is called the population Ω. The cardinality of the population Ω (the number of population members) is called the population size. The cardinality of a population member (the number of genes) is called the defining length ζ of the population member. The defining length (the number of unknown variables in the forecasting model) for population member P_{1} is ζ = 12. The defining length of all the population members in a given population is constant.
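A population of such members can be initialized with random genes; the gene range [−1, 1] and the population size below are illustrative assumptions, not values from the text.

```python
import random

ZETA = 12        # defining length: the 12 unknown weights in the model
POP_SIZE = 20    # population size (an assumed value)

rng = random.Random(42)
# Each population member is a vector of ZETA real-valued genes;
# the range [-1, 1] is an illustrative choice
population = [[rng.uniform(-1.0, 1.0) for _ in range(ZETA)]
              for _ in range(POP_SIZE)]
```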
A GA starts with a random population. An evaluation operator is then applied to evaluate the fitness of each individual. When obtaining a forecasting model, the evaluation function is the inverse of the forecasting error. A selection operator is then applied to select the population members with higher fitness (lower forecasting error). Under the selection operator, individual population members may be born, allowed to live, or die. Several selection operators are reported in the literature, among them proportionate reproduction, ranking selection, tournament selection, and steady-state selection. The most popular selection operators are ranking and tournament selection. Goldberg and Deb (1991) show that both ranking and tournament selection maintain strong population fitness growth potential under normal conditions. The tournament selection operator, however, requires lower computational overhead: the time complexity of ranking selection is O(n log n), whereas the time complexity of tournament selection is O(n), where n is the number of population members in a population. In tournament selection, a random pair of individuals is selected and the member with the better fitness of the two is admitted to the pool of individuals for further genetic processing. The process is repeated in such a way that the population size remains constant and the best individual in the population always survives. For our research, we use the tournament selection operator.
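A sketch of size-two tournament selection with elitism, as described above. The single-gene members and the fitness function (rewarding values near zero, standing in for the inverse of a forecasting error) are illustrative assumptions.

```python
import random

def tournament_select(population, fitness, rng=random):
    """Size-two tournament selection: repeatedly pit two random members
    against each other and admit the fitter one to the new pool, keeping
    the population size constant; the best member always survives."""
    best = max(population, key=fitness)
    pool = [best]                        # elitism: best individual survives
    while len(pool) < len(population):
        a, b = rng.choice(population), rng.choice(population)
        pool.append(a if fitness(a) >= fitness(b) else b)
    return pool

# Illustrative single-gene members; fitness stands in for the
# inverse of a forecasting error
rng = random.Random(1)
population = [[rng.uniform(-1.0, 1.0)] for _ in range(10)]
fitness = lambda m: -abs(m[0])
pool = tournament_select(population, fitness, rng=rng)
```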
After the selection operator is applied to form the new population, special operators called crossover and mutation are applied with certain probabilities. To apply the crossover operator, the status of each population member is determined: every population member is assigned a status of survivor or non-survivor. The number of population members with survivor status is approximately equal to population size × (1 − probability of crossover). The number of non-surviving members is approximately equal to population size × probability of crossover. The non-surviving members in a population are then replaced by applying the crossover operator to randomly selected surviving members. Several crossover operators exist; we use one-point crossover in our research.
In one-point crossover, two surviving parents and a crossover point are randomly selected. For each parent, the genes to the right-hand side of the crossover point are exchanged to produce two children. Let P_{1} and P_{2} be two parents (the genes of P_{2} are primed to distinguish them) and let the crossover point be denoted by "|". The two children C_{1} and C_{2} are produced as follows:
P_{1} = ⟨ w_{14}, w_{15}, w_{16}, w_{24}, w_{25}, w_{26} | w_{34}, w_{35}, w_{36}, w_{47}, w_{57}, w_{67} ⟩
P_{2} = ⟨ w′_{14}, w′_{15}, w′_{16}, w′_{24}, w′_{25}, w′_{26} | w′_{34}, w′_{35}, w′_{36}, w′_{47}, w′_{57}, w′_{67} ⟩
C_{1} = ⟨ w_{14}, w_{15}, w_{16}, w_{24}, w_{25}, w_{26}, w′_{34}, w′_{35}, w′_{36}, w′_{47}, w′_{57}, w′_{67} ⟩
C_{2} = ⟨ w′_{14}, w′_{15}, w′_{16}, w′_{24}, w′_{25}, w′_{26}, w_{34}, w_{35}, w_{36}, w_{47}, w_{57}, w_{67} ⟩
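A minimal sketch of one-point crossover; the letter genes below are arbitrary placeholders for the real-valued weights.

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """Exchange the genes to the right of a random crossover point,
    producing two children from two surviving parents."""
    point = rng.randrange(1, len(p1))   # crossover point strictly inside
    c1 = p1[:point] + p2[point:]
    c2 = p2[:point] + p1[point:]
    return c1, c2

# Placeholder 12-gene parents (letters stand in for real-valued genes)
p1 = list("ABCDEFGHIJKL")
p2 = list("abcdefghijkl")
c1, c2 = one_point_crossover(p1, p2, rng=random.Random(3))
```

Note that crossover only recombines existing genes: between them, the two children carry exactly the genes of the two parents.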
The mutation operator randomly picks a gene in a surviving population member (with probability equal to the probability of mutation) and replaces it with a random real number.
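A sketch of a per-gene variant of this operator; the replacement range [−1, 1] is an illustrative assumption.

```python
import random

def mutate(member, p_mutation=0.05, rng=random):
    """Replace each gene, independently with probability p_mutation,
    by a new random real number from an assumed range [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) if rng.random() < p_mutation else g
            for g in member]

member = [0.25] * 12                    # a placeholder population member
mutated = mutate(member, p_mutation=0.1, rng=random.Random(7))
```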