
Artificial Neural Networks (ANNs)

Sdhabhon Bhokha, Associate Professor and Dean, Faculty of Engineering, Ubonratchathani University


http://www.sdhabhon.com or sdhabhon@ubu.ac.th

1. Introduction
This chapter deals with a new, enhanced computer technology which is a branch of artificial
intelligence, called Artificial Neural Networks (ANNs). Different terms and definitions are given. Basic
concepts of ANNs are explained, i.e. neural cells or neurons, the perceptron, and the Hopfield net. Next,
various aspects of ANN modelling are described, i.e. the processes, selecting and representing the
variables, hidden layers and nodes, weights and biases, the summation and transformation functions,
and the learning rate and momentum. The training process is also presented, i.e. definitions, methods,
the back-propagation training algorithm and Generalized Delta Rule (GDR), and updating the network.
The samples used for network modelling are discussed, i.e. methods of sampling, the number of samples,
and how they are fed into the network during training and testing. After that, the testing process is
explained. The following part explains the outputs obtained from the ANNs. Then, advantages and
points of awareness in using ANNs are addressed, followed by research and developments in ANNs as
well as applications of ANNs. Finally, this chapter summarizes the possibility of using ANNs as a new
approach and tool for pre-design estimating of construction costs and duration.

2. Definitions
Artificial neural networks (ANNs) go by several different names: 1) connectionist models; 2)
parallel distributed processing models; 3) neuromorphic systems; and 4) neural computing. They can be
defined by any one, or a combination, of the following statements, which also describe their properties.
1) ANNs models are composed of many non-linear computational elements, operating in parallel
and arranged in patterns reminiscent of biological neural nets (Lippmann, 1987).
2) ANNs are parallel, distributed information processing structures consisting of processing
elements which can possess a local memory and can carry out localized information processing
operations. They are interconnected via unidirectional signal channels called connections. Each
processing element has a single output connection that branches or fans out into as many
collateral connections as desired (Nielsen, 1989).
3) ANNs are types of information processing systems whose architectures are inspired by the
structure of human biological neural systems (Caudill and Butler, 1990).
4) Neural networks concentrate on machine learning, which is based on the concept of self-
adjustment of internal control parameters. The artificial neural network environment consists of
five primary components: learning domain, neural nets, learning strategies, learning process,
and analysis process (Adeli, 1992).
5) An artificial neural net is a kind of machine learning. It is a computational procedure
composed of simple elementary functions such as summation and multiplication (Arciszewski
and Ziarko, 1992).
6) ANNs are an information processing technology inspired by studies of the brain and nervous
system. They are composed of a collection of neurons (or nodes, processing elements, or units)
which are grouped in layers. They accept several inputs, perform a series of operations on
them, and produce one or more outputs. They are similar to a subroutine that works best in
classifying, modelling and forecasting (Klimasauskas, 1993).
7) ANNs are collections of simple computational elements called neurons that are interconnected
(Berry and Trigueros, 1993).
8) ANNs are models that emulate a biological neural network. They are composed of artificial
neurons, which are the processing elements (PEs). They are information processing
technologies inspired by studies of the brain and nervous system. They are software
implementations that simulate a massively parallel process involving processing elements
interconnected in a network architecture (Medsker et al., 1993).
9) ANNs are composition of neurons or processing elements, and connections that are organized
in layers (Salchenberger, et al., 1993).
10) ANNs are connectionist systems that have the ability to learn and generalize from examples, and to
provide meaningful solutions to problems even when input data contain errors or are
incomplete. They can adapt solutions over time to compensate for changing circumstances. They
process information rapidly and also transfer readily between computing systems (Flood and
Kartam, 1994).
11) ANNs are computational devices constructed from a large number of parallel processing
devices. Individually, the neurons perform trivial functions, but collectively, they are capable
of solving very complicated problems. In other words, they are capable of learning from
example, can infer solutions to problems beyond those to which they are exposed during
training. They can provide meaningful answers even when the data to be processed include
errors or are incomplete. They can process information extremely rapidly (Gagarin et al.,
1994).
12) ANNs are AI software technology that represents objects or pieces of information as nodes and
expresses relationships between them as links to provide a powerful and flexible way of
representing knowledge (Paulson, 1995).
13) ANNs are computational models composed of many non-linear processing elements arranged
in patterns similar to biological neuron networks. Typically, they have an activation value
associated with each node and a weight value associated with each connection. An activation
function governs the firing of nodes and the propagation of data through networks connections
in massive parallelism. The networks can also be trained with examples through connection
weight adjustments (Tan et al., 1996).

This research uses the name "neural networks" for the models, and the term "node" rather than neuron
or neural cell. However, the neural networks and the related terms conform to all the definitions given
above.

3. Basic Concepts of ANNs


Neural cells or neurons have been defined in various ways. Lippmann (1987), Chester (1993) and
Kireetoh (1995) described the behavior of neural cells as follows: a living neuron receives multiple
inputs from other neurons via branching input (afferent) paths called "dendrites". The combined stimuli
from these input signals activate a region called the "axon hillock", where an outgoing (efferent) tendril
called the "axon" connects to the cell body. The axon then transmits the neuron's output to still other
neurons through their dendrites. In some cases, the output that the neuron transmits along its axon goes
directly to muscle or gland cells in order to activate or inhibit the functions those cells perform. The gap
between an output axon of one neuron and the input dendrites of another is the location of a synapse.
Information transfer across a synapse is controlled by bio-chemical agents, a process that is modeled in
electronic neurons by the changing of synaptic weights. It is estimated that the brain contains on the
order of 10^11 neurons, and 10^14 to 10^16 synaptic interconnections among them. Figure 1 illustrates two
biological neural cells, while Figure 2 illustrates a schematic multi-layered artificial neural network.



Figure 1 Two Biological Neural Cells (schematic: the axon of Neuron No. 1 meets the dendrites of Neuron No. 2 at a synapse).

Figure 2 Schematic Multi-layered ANN. (Inputs S[n][i] feed the nodes i of the input layer; weights U[i][j]
connect input node i to hidden node j, and weights V[j][k] connect hidden node j to output node k. Each
hidden node j computes the net input N_j = sum_i( s_i * U[i][j] ) + b_j, and each output node k computes
N_k = sum_j( s_j * V[j][k] ) + b_k, where b_j and b_k are biases. The net input is passed through a sigmoid
transformation function f(N), bounded between 0 and 1, and the network outputs O_mk are compared with
the targets T_mk.)
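
To make the schematic concrete, the following is a minimal sketch in Python with NumPy of the forward pass through the network of Figure 2. The names (s, U, V, b_hidden, b_output) follow the notation of the figure; the layer sizes and numeric values are illustrative assumptions only, not taken from the text.

```python
import numpy as np

def sigmoid(n):
    # Sigmoid transformation function, bounded between 0 and 1.
    return 1.0 / (1.0 + np.exp(-n))

def forward_pass(s, U, V, b_hidden, b_output):
    """Forward pass of the multi-layered ANN sketched in Figure 2.

    s        : input vector (one sample), shape (i,)
    U        : input-to-hidden weights U[i][j], shape (i, j)
    V        : hidden-to-output weights V[j][k], shape (j, k)
    b_hidden : biases of the hidden nodes, shape (j,)
    b_output : biases of the output nodes, shape (k,)
    """
    # N_j = sum_i s_i * U[i][j] + b_j, then hidden activation = f(N_j)
    hidden = sigmoid(s @ U + b_hidden)
    # N_k = sum_j hidden_j * V[j][k] + b_k, then output O_k = f(N_k)
    output = sigmoid(hidden @ V + b_output)
    return hidden, output

# Example: 3 inputs, 2 hidden nodes, 1 output node (illustrative values only).
rng = np.random.default_rng(0)
s = np.array([0.2, 0.7, 0.5])
U = rng.uniform(-1, 1, size=(3, 2))
V = rng.uniform(-1, 1, size=(2, 1))
hidden, output = forward_pass(s, U, V, np.zeros(2), np.zeros(1))
print(hidden, output)
```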

ANN technology is a branch of artificial intelligence (AI) that attempts to achieve human brain-like
capability (Lippmann, 1987; Caudill and Butler, 1990; Klimasauskas, 1993; Medsker et al., 1993).
Various kinds of ANN structure are based on the biological nervous system and can exhibit a
surprising number of the brain's characteristics, e.g. they learn from experience and generalize from
previous examples to new problems by inferring solutions to problems beyond those to which they are
exposed during training. They can provide meaningful answers even when the data to be processed
include errors or are incomplete (Karunasekera, 1992; Hawley et al., 1993; Medsker et al., 1993; Chao
and Skibniewski, 1994; Flood and Kartam, 1994a and 1994b; Gagarin et al., 1994). They can process
information extremely rapidly when applied to solve real world problems. ANNs have been mobilized
for building neuro-computing architectures in physical hardware that can think and act intelligently like
human beings. ANNs can be built either by developing a neuro-computer (a machine) or by writing
neuro-software languages (programs) (Forsyth, 1992; Medsker et al., 1993; Adeli, 1996).



A simple form of ANN, called the perceptron, was introduced by Rosenblatt in 1957. The perceptron
is a simple network with only an input and an output layer, i.e. a neural network which has no hidden
layer. Another view of the perceptron is that it groups together experiences that are similar and
differentiates them from dissimilar experiences (Smith, 1993). The perceptron is grounded in the theory
of statistical separability, which concerns the mathematical analysis of the behavior of a class of network
models (Rosenblatt, 1950s). On the other hand, a multi-layered perceptron is a feed-forward net with
one or more layers of nodes between the input and output nodes. The additional layers contain hidden
nodes that are not directly connected to either the input or the output nodes. The multi-layered perceptron
overcomes many limitations of the single-layer perceptron (Lippmann, 1987). Its capabilities stem from
the non-linear relationships among the nodes.

The Hopfield net is a kind of network used with binary inputs. It is most appropriate when exact binary
representations are possible, such as black-and-white images, yes-or-no answers, or on-and-off switches.
In 1982, Hopfield designed a neural network that revived the technology, bringing it out of the neural
dark age of the 1970s (Chester, 1993). He devised an array of neurons that were fully interconnected,
with each neuron feeding its output to all the others. The concept is that all the neurons transmit signals
back and forth to each other in a closed feedback loop until their states become stable. This concept
does not make use of the feed-forward mechanism of adjusting the synaptic input weights of nodes in
order to tune the outputs of those nodes, as in the perceptron. Instead, a Hopfield net makes feedback
the central feature of the network.

4. ANNs Modeling
The ANNs modeling can be explained as follows.

4.1 ANNs modelling processes


Neural networks concentrate on machine learning, which is based on the concept of
self-adjustment of internal control parameters. The artificial neural network environment
consists of five primary components: learning domain, neural nets, learning strategies, learning
process, and analysis process (Adeli, 1992). Accordingly, the neural-network-based modelling
process involves five main aspects: 1) data acquisition, analysis and problem
representation; 2) architecture determination; 3) learning process determination; 4) training of
the network; and 5) testing of the trained network for generalization evaluation (Wu and Lim,
1993). Elazouni et al. (1997) classified ANN modelling into three main phases: 1) design; 2)
implementation; and 3) recall, or use for problem solving. The design phase consists of two
tasks: problem analysis and problem structuring. The implementation consists of three main
aspects: 1) acquiring the knowledge (including data collection); 2) selecting the network
configuration; and 3) training and testing the network.

4.2 Selecting the variables


An ANN model consists of independent variables (or inputs) and dependent variables (or
outputs). Selecting variables to be used in the model involves two considerations (Smith,
1993). First, the information might be transformed to make it more useful to the network.
Second, selection among the transformed variables is based on predictiveness and
covariance. Basically, a selected independent variable is predictive if the dependent variable
is correlated with it. By contrast, if two independent variables are correlated with each other, the
correlated inputs make the model more sensitive to the statistical peculiarities of particular
samples. They accentuate the overfitting problem and limit generalization. For these reasons,
the model should include only the independent variables that are predictive of the dependent
variables but do not covary with one another, regardless of what modeling
technique is used. To minimize the number of training samples and the training time, only the major
factors that strongly influence the specific problem should be considered in setting up the
input nodes (Wu and Lim, 1993).

Yeh et al. (1993) outlined four criteria for selecting attributes and training examples. First,
availability of attributes: attributes should be clearly observable without sophisticated
experience, high cost, or a long period of time. Second, unnecessary or insufficient
attributes, which reduce classification reliability, must be avoided. Third, a
good training set should contain common, unusual and rare cases; such a training set
cannot be obtained by random sampling from the problem domain. Fourth, the more training
examples, the better the learning results that will be obtained.

4.3 Representing the variables


Smith (1993) explained that the way independent variables are represented by the input nodes
of the network has a major impact on the training of the network and on the performance of the
resulting model. The ability of the network mainly refers to its effectiveness in
generalizing. The amount of computation and the time required for learning are both greatly
influenced by the form of representation used. There are two types of independent and
dependent variables: 1) quantitative variables; and 2) class variables. A quantitative or
continuous-valued variable can be any number; it need not fall within the bounds of the applied
sigmoid function. It is also possible to scale or normalize the quantitative variables to some
standard range such as 0 to 1, -1 to 1, or not at all (Smith, 1993; Yeh et al., 1993; Elazouni et al.,
1997). Elazouni et al. (1997) opined that networks usually provide improved performance
when the data are normalized. It is necessary to avoid excessive generalization, in which the
network learns about examples at one extreme but applies the lessons to examples at the other
extreme. One solution is to cut the variable up and represent it with several nodes, as sketched
below. By doing this, the network can only generalize to examples that are reasonably close by;
the lessons learned from each example during training are localized or limited. The two nodes
between which the input value is located are then partially turned on. This attractive method is
called "interpolation representation" (Hutchison, 1989; Smith, 1993). It is appropriate since no
information about the precise value of the variable is lost. It also permits generalization
because nearby values have similar representations. On the other hand, class variables
are discrete, logical or symbolic states.
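
The following is a minimal sketch, in Python with NumPy, of two of the representations just described: min-max scaling of a quantitative variable to the standard range 0 to 1, and the "interpolation representation" in which the two nodes bracketing the value are partially turned on. The function names, the evenly spaced node grid, and the example values are illustrative assumptions, not taken from Smith (1993) or Hutchison (1989).

```python
import numpy as np

def scale_to_unit(x, x_min, x_max):
    # Normalize a quantitative variable to the standard range 0 to 1.
    return (x - x_min) / (x_max - x_min)

def interpolation_representation(x, x_min, x_max, n_nodes):
    """Represent one quantitative value across several input nodes.

    The variable's range is cut into (n_nodes - 1) equal intervals; the two
    nodes between which the value falls are partially turned on, in proportion
    to how close the value lies to each of them.
    """
    grid = np.linspace(x_min, x_max, n_nodes)     # node centre values
    codes = np.zeros(n_nodes)
    upper = np.searchsorted(grid, x)
    if upper == 0:
        codes[0] = 1.0
    elif upper >= n_nodes:
        codes[-1] = 1.0
    else:
        lower = upper - 1
        frac = (x - grid[lower]) / (grid[upper] - grid[lower])
        codes[lower] = 1.0 - frac
        codes[upper] = frac
    return codes

print(scale_to_unit(75.0, 0.0, 100.0))                    # 0.75
print(interpolation_representation(60.0, 0.0, 100.0, 5))  # nodes at 0, 25, 50, 75, 100
```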

Class variables use a binary representation. A single binary output can represent black-and-white
images, yes-or-no answers, or an on-and-off switch (Smith, 1993; Kireetoh, 1995).
Pezeshk et al. (1996) used a single binary output in a different way, e.g. zero and one to
represent clay and sand, respectively.

For multiple outputs, the value 1 indicates that the object or event belongs to the class
represented by that node, while the value 0 indicates that it does not. The number of nodes
need not be equal to the number of classes (Smith, 1993); there may be one less node than
there are classes. All the classes but one are represented by turning on the appropriate node,
and the remaining class is represented by not turning on any node. This can reduce
computational time. On the other hand, Chau et al. (1997) assigned the attributes of qualitative
class dependent variables (outputs) in a different way, e.g. the values -2 to 2 were used to identify
bad, slightly bad, average, slightly good, and good, respectively. Both binary codings are sketched below.
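
The following sketch illustrates the two binary codings of a class variable described above, using the five quality labels mentioned by Chau et al. (1997) purely as example class names (their own scheme used the integers -2 to 2 instead). The function names are illustrative assumptions.

```python
import numpy as np

CLASSES = ["bad", "slightly bad", "average", "slightly good", "good"]

def one_node_per_class(label):
    # Value 1 marks the class the example belongs to; all other nodes are 0.
    codes = np.zeros(len(CLASSES))
    codes[CLASSES.index(label)] = 1.0
    return codes

def one_less_node_than_classes(label):
    # One fewer node than classes: the last class is represented by
    # turning on no node at all, which saves one output node.
    codes = np.zeros(len(CLASSES) - 1)
    idx = CLASSES.index(label)
    if idx < len(CLASSES) - 1:
        codes[idx] = 1.0
    return codes

print(one_node_per_class("average"))          # [0. 0. 1. 0. 0.]
print(one_less_node_than_classes("good"))     # [0. 0. 0. 0.]  (no node turned on)
```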

It is possible to mix both quantitative and class variables among the inputs to a single network
(Smith, 1993). Such a mixture, however, does raise an issue for the algorithmic implementation
of the mathematics. Another alternative is to binarize the quantitative variable, i.e. represent it by
using binary input nodes. There are two major problems with this approach, i.e. discrimination
and generalization. This representation makes it impossible for the network to discriminate
between examples whose values are within the sub-range of the same node. No binary
representation of a quantity can completely resolve the problems of discrimination and
generalization. However, these problems can be reduced by using "ensemble coding", whereby
several nodes are turned on. Some are to represent broad ranges of values and help the network
generalize while the others represent narrow ranges of values and help it discriminate.

4.4 Hidden Layer and hidden Node


In a multi-layered perceptron, a hidden layer is a layer of processing elements or units
between the input and output layers that increases computational power. In principle, there can
be more than one hidden layer. The network can approximate a target function of any
complexity if it has enough hidden nodes. The hidden layers of nodes make the multi-layered
perceptron attractive as a statistical modeling tool (Lippmann, 1987; Karunasekera, 1992;
Hawley et al., 1993; Khoshgoftaar and Lanning, 1995). The output of a hidden node can be
considered as a new variable, i.e. an input to the nodes on the next layer or on the
output layer (the dependent variables). These new variables contain interesting information about the
relationship being modelled. The new variables produced by the nodes on the hidden layer,
together with the net topology, are known as internal representations and can make the modelling
process self-explanatory. Consequently, the neural network approach is attractive as a form of
machine learning (Berry and Trigueiros, 1993).

Too few hidden nodes (too small a network) for a given problem will cause back-
propagation not to converge to a solution (Karunasekera, 1992). However, too many hidden nodes
cause a much longer learning period. At some point, increasing the number of hidden nodes
does not greatly increase the ability of the neural network to classify (William, 1993). On the
other hand, too many units on a layer can make a network become over-specific, particularly
in the extreme case where the number of units on the first processing layer is equal to the
number of examples in the training set (Rumelhart, 1988). Too many hidden nodes can overfit,
such that the network models the accidental structure of the noise in the sample as well as
the inherent structure of the target function (Smith, 1993). Therefore, a minimum-sized network
which uses as few hidden units as possible is important for efficient classification and good
generalization (Khan et al., 1993). Berke and Hajela (1991) suggested that the number of
hidden nodes should be between the average and the sum of the numbers of nodes on the input and
output layers. Rogers and Ramarsh (1992) suggested that a good initial guess for the number of hidden
nodes is the sum of the numbers of nodes on the input and output layers. Soemardi (1996) suggested
that the number of hidden nodes should be 75% of the number of input nodes. Thus, experience shows
that the number of hidden nodes has a maximum limit of the sum of the input and output nodes, while
the minimum could be either 75% of the input nodes or the average of the input and output
nodes.
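
As a rough illustration of these heuristics, the sketch below computes the candidate hidden-node counts for a hypothetical network; the function name and example sizes are assumptions, and the heuristics themselves are only starting points for the trial-and-error search described in the text.

```python
def hidden_node_guesses(n_inputs, n_outputs):
    """Heuristic bounds on the number of hidden nodes cited in the text.

    Berke and Hajela (1991): between the average and the sum of the input
    and output nodes.  Rogers and Ramarsh (1992): start from the sum.
    Soemardi (1996): 75% of the number of input nodes.
    """
    average = (n_inputs + n_outputs) / 2.0
    total = n_inputs + n_outputs
    three_quarters = 0.75 * n_inputs
    return {"average": average,
            "sum": total,
            "75% of inputs": three_quarters,
            "suggested range": (min(average, three_quarters), total)}

print(hidden_node_guesses(n_inputs=8, n_outputs=2))
```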

4.5 Weights and biases


Weights are defined as the strengths of the input connections, expressed as real
numbers. The processing nodes receive inputs through links, and each link has a weight attached to
it. The weighted sum of the inputs makes up the value that updates a processing node, turning its
output excitation either on or off. The weights are the relative strengths (mathematical values) of
the initial entering data or of the various connections that transfer data from layer to layer
(Medsker et al., 1993). They represent the relative importance of each input to a processing element
(Medsker et al., 1993). In practice, the weights are initialized and assigned to the network
prior to the start of training. Weight initialization techniques are also important in order to
control and obtain convergence of training and to limit the training time. For each network, the number
of unknowns is equal to the sum of the numbers of weights and biases. For a given network, the number of
weights on each set of links is the product of the numbers of nodes on the two layers they join, and the
number of biases is the sum of the numbers of all the nodes.
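
The sketch below counts the unknowns of a fully connected network in this way. Note one assumption: biases are counted here only for the hidden and output nodes, a common convention, whereas the text counts all nodes; the layer sizes used are purely illustrative.

```python
def count_unknowns(layer_sizes):
    """Count the unknowns (weights + biases) of a fully connected network.

    layer_sizes, e.g. [8, 5, 1], lists the numbers of nodes on the input,
    hidden and output layers.  The number of weights on each set of links is
    the product of the node counts of the two layers it joins; biases are
    counted here for every node that receives connections (hidden and output).
    """
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights, biases, weights + biases

w, b, unknowns = count_unknowns([8, 5, 1])
print(w, b, unknowns)   # 45 weights, 6 biases, 51 unknowns
```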

4.6 Summation and transformation function


The summation function finds the weighted sum of all input elements (or nodes) feeding each
processing element (or node): it simply multiplies the input values by the weights and totals them.
The transformation function (or transfer function, or local memory) is the relationship between the
internal activation level ( N ) of the neuron (the activation function) and its output. The transformation
function is a kind of sigmoid function. A function f ( N ) is a sigmoid function if it has two
characteristics: 1) it is bounded; and 2) its value always increases as N
increases (Smith, 1993). A number of different functions have these characteristics and thus
qualify as sigmoid functions. Any of them may be used in the neural network.

Usually, a micro-computer with a single processor, called a Von Neumann computer, is used to
train and test networks. In fact, the biological neural system architecture is completely different
from the Von Neumann architecture. This difference significantly affects the type of functions
each computational model can best perform. Table 1 compares the Von Neumann computer with a
biological neural system, and Figure 3 shows some common forms of transformation functions.

Table 1 Comparison between Von Neumann Computer and Biological Neural System.
Von Neumann Computer Biological Neural System
Rule - based Rule-less or Example-based
Symbolic Distributed
Serial Parallel
Discrete Continuous
Problem solving Pattern recognition
Psychology Bio-physiology
Cognitive Behavioral
Structural Functional
Logical Intuitive
Domain specific Domain free
Need rules Find rules
Much programming need Little programming need
Difficult to maintain Easy to maintain
Not fault tolerant Fault tolerant
Need human expert Need only data base
Rigid logic Need only data base
Require re-programming Adaptive system



Figure 3 Some common forms of transformation functions: (a) bi-linear and continuous functions -
1) hard limiter, 2) threshold, and 3) sigmoid, each bounded between -1 (or 0) and 1; (b) continuous
functions - logistic f(N) = 1 / (1 + e^-(N - theta)), the so-called sine-wave form
g(N) = (1 - e^-N) / (1 + e^-N), and the hyperbolic tangent tanh(N) = (e^N - e^-N) / (e^N + e^-N).
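
For reference, the following sketch implements the transformation functions of Figure 3 in Python with NumPy; the threshold function is interpreted here as a ramp clipped to the range -1 to 1, which is an assumption about the figure rather than a statement from the text.

```python
import numpy as np

def hard_limiter(n):
    # Outputs -1 or +1 depending on the sign of the net input.
    return np.where(n >= 0, 1.0, -1.0)

def threshold(n):
    # Piecewise-linear ramp clipped to the range -1 to 1 (assumed form).
    return np.clip(n, -1.0, 1.0)

def logistic(n, theta=0.0):
    # Logistic sigmoid, bounded between 0 and 1.
    return 1.0 / (1.0 + np.exp(-(n - theta)))

def bipolar_form(n):
    # The "sine wave" form quoted in Figure 3, bounded between -1 and 1.
    return (1.0 - np.exp(-n)) / (1.0 + np.exp(-n))

def hyperbolic_tangent(n):
    # tanh(N) = (e^N - e^-N) / (e^N + e^-N), also bounded between -1 and 1.
    return np.tanh(n)

n = np.linspace(-5, 5, 11)
for f in (hard_limiter, threshold, logistic, bipolar_form, hyperbolic_tangent):
    print(f.__name__, np.round(f(n), 3))
```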

4.7 Learning rate and momentum


Back propagation is a time-consuming algorithm when either the size of the net is large or the
number of the training patterns is large (Khan et al., 1993). Back propagation has some
limitations. There is no guarantee that the network can be trained in a finite amount of time. It
employs gradient descent, i.e. follows the slope of the error surface downward and constantly
adjusts the weights towards the minimum. Therefore, it has the danger of getting trapped in a local
minimum before achieving the global minimum. It is important to select the correct learning
rate and momentum term when using back propagation. Unfortunately, there is little guidance,
other than experience, which is based on trial-and-error (Anderson et al., 1993; Khan et al.,
1993).

The learning rate is the constant of proportionality which governs the rate
at which the weights may be changed. A high learning rate corresponds to rapid learning, which
may push the training towards a local minimum or cause oscillation. In turn, when small learning
rates are applied, the time to reach the global minimum is considerably increased (Khan
et al., 1993). The learning rates for each layer of the same network can be different.

The remedy for the problem of choosing the learning rate is to apply a momentum factor, which is
multiplied by the previous weight change so that, while the learning rate is controlled, the
changes are still rapid (Khan et al., 1993). The role of the momentum term is to smooth out
the weight changes, which helps to protect network learning from oscillation (Anderson et al.,
1993). A rule of thumb is that the learning rate for the last hidden layer should be twice that of
the output layer. If there are no connections that jump layers, the learning rate for each earlier
hidden layer should be twice that of the layer that follows it (Klimasauskas, 1993).
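
The weight-change rule described above can be sketched as follows; the function name, learning rate, momentum value and numbers are illustrative assumptions only.

```python
import numpy as np

def update_weights(weights, gradient, previous_delta, learning_rate, momentum):
    """One gradient-descent weight update with a momentum term.

    The new change is the learning rate times the negative error gradient,
    plus the momentum factor times the previous weight change, which smooths
    the updates and helps protect learning from oscillation.
    """
    delta = -learning_rate * gradient + momentum * previous_delta
    return weights + delta, delta

# Illustrative use with made-up numbers.
w = np.array([0.2, -0.4])
grad = np.array([0.05, -0.10])
prev = np.zeros_like(w)
for _ in range(3):
    w, prev = update_weights(w, grad, prev, learning_rate=0.5, momentum=0.9)
print(w)
```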

5. Training

Training the ANNs can be described as follows.

5.1 Definitions
The term "training" or "learning" can be one of, or a combination of the following definitions:
1) Training means a process whereby error is used to modify the weights so that the
network gives a more correct answer the next time (Klimasauskas, 1993).
2) Learning is a mechanical process which may be decision trees, called explanation
trees. It is used for providing decision rules (Adeli and Yeh, 1990).
3) Learning is the process whereby the ANN learns from its mistake. It usually involves
three tasks: 1) computes outputs; 2) compare outputs with desired outputs; and 3)
adjusts the weights and repeats the process (Medsker et al., 1993).

In this research, the two terms "training" and "learning" are used interchangeably. Training (or
learning) begins with the weights and biases being initialized randomly. It also deals with how the
samples are split prior to feeding them to the networks, the algorithm used for minimizing the system
error, and the criteria for stopping training.

5.2 Methods of training


A network learns because the strengths of the connections between neurons change; the
efficiency of a neuron in exciting or inhibiting another is not constant. Network training is a
matter of adjusting the weights, either manually or automatically, such that the network is capable
of reproducing the target output within a specified error margin for the respective input pattern. In
training a network, error is used to modify the weights so that the network gives a more correct
answer the next time. A weight may increase or decrease over time, depending systematically on
experience. Several diagnostic tools are helpful for understanding how the network is
training, e.g. measuring the mean square error of the entire output layer (Klimasauskas, 1993).

There are two categories of training (Lippmann, 1987; Smith, 1993; Kireetoh, 1995). First,
supervised training requires pairs of an input vector and a target vector representing the desired
output, called training pairs. When the input vector is applied, the output is calculated
and compared to the corresponding target vector. The difference is fed back through the
network, and the weights are adjusted according to an algorithm that tends to minimize the error.
Second, unsupervised training was introduced by Kohonen (1984) and is a far more
plausible model of training in biological systems. Target vectors are not required for the outputs;
therefore, there is no comparison with a predetermined ideal response. The training process extracts the
statistical properties of the training set and groups similar vectors into classes; applying a
vector from a given class to the input will produce a specific output vector. There is no way to
determine, prior to training, which specific output pattern will be produced by a given input
vector class. Hence, the outputs of such a network must generally be transformed into a
comprehensible form subsequent to the training process.

5.3 Back-propagation training algorithm and generalized delta rule (GDR)


In the mid-1950s, Rosenblatt proposed a neural net model called the perceptron. Widrow and Hoff
established an algorithm that can be expressed in terms of the delta rule for learning the values
of the weights, known as the Widrow-Hoff rule (Lippmann, 1988; Chester, 1993). In 1986,
Rumelhart, Hinton and Williams extended this learning procedure to establish the back-
propagation learning rule, called the Generalized Delta Rule (GDR). Back-propagation, or backprop,
has been the most popular and most widely implemented of all neural network paradigms. It is based
on a multi-layered feed-forward topology with supervised learning. The propagation of error
operates in two modes: 1) mapping; and 2) learning. In the mapping mode, information flows
forward through the network, from the inputs to the outputs. In the learning mode, the information
flow alternates between forward and backward. A key element in the back-propagation
paradigm is the existence of a hidden layer of nodes. This frees the network from the linear
limitations of the perceptron.
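
As an illustration of the generalized delta rule, the following is a minimal batch back-propagation sketch for a one-hidden-layer network, written in Python with NumPy. The function names, initial weight range, learning rate, momentum, epoch count and the XOR example are all assumptions made for the sketch; as the text notes, such training is slow and may become trapped in a local minimum, so convergence is not guaranteed.

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_backprop(X, T, n_hidden, learning_rate=0.5, momentum=0.9, epochs=5000, seed=0):
    """Minimal batch back-propagation (generalized delta rule), one hidden layer.

    X : training inputs, shape (samples, inputs); T : targets, shape (samples, outputs).
    """
    rng = np.random.default_rng(seed)
    U = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))   # input-to-hidden weights
    V = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))   # hidden-to-output weights
    bh, bo = np.zeros(n_hidden), np.zeros(T.shape[1])
    dU = dV = dbh = dbo = 0.0                            # previous weight changes

    for _ in range(epochs):
        # Mapping mode: information flows forward from inputs to outputs.
        H = sigmoid(X @ U + bh)
        O = sigmoid(H @ V + bo)

        # Learning mode: the output error is propagated backward.
        delta_o = (T - O) * O * (1.0 - O)                # output-layer error signal
        delta_h = (delta_o @ V.T) * H * (1.0 - H)        # hidden-layer error signal

        dV = learning_rate * H.T @ delta_o + momentum * dV
        dbo = learning_rate * delta_o.sum(axis=0) + momentum * dbo
        dU = learning_rate * X.T @ delta_h + momentum * dU
        dbh = learning_rate * delta_h.sum(axis=0) + momentum * dbh
        V, bo, U, bh = V + dV, bo + dbo, U + dU, bh + dbh
    return U, V, bh, bo

# Example: learn the XOR mapping, a classic non-linear test problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0.1], [0.9], [0.9], [0.1]])               # targets kept away from 0 and 1
U, V, bh, bo = train_backprop(X, T, n_hidden=3)
print(np.round(sigmoid(sigmoid(X @ U + bh) @ V + bo), 2))
```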

Smith (1993) summarized five ways in which the power of back-propagation can deliver
significant benefits as follows.
1) Using back-propagation may reduce the cost of building the model as it allows the
user to substitute machine time or computer time for person time.
2) Back-propagation may produce a better model if two conditions are met. First, the
form of the relationship between the inputs and desired outputs must be more complex
than the form of the function that is imposed on the model by the conventional tool.
Second, the sample must be large enough to permit it to find the relationships
underlying the noise in the data.
3) Using back-propagation provides assurance that the model is as good as it can be since
the cost of finding out or time required for knowing the complexity of the problem
may be prohibitive.
4) Back-propagation provides an opportunity to build a single model with multiple
outputs, which is not possible with conventional techniques.
5) There are advanced forms of back-propagation with capabilities that are not found in
conventional methods.

Despite their universal approximation properties, back-propagation networks suffer from four
main problems. The first is that network structuring is an intuitive, highly
solution-dependent trial-and-error task. The second is that the algorithm is slow in training and its
convergence is very sensitive to the initial set of weights. The third is that training can be
trapped in local minima or become paralysed. The fourth is that the design of an optimum network
configuration for a given problem is a non-guided, trial-and-error process that does not
guarantee adequate generalization. Some simple techniques and heuristics to improve back-
propagation performance are: 1) divide the problem into several sub-problems and conquer
them with appropriate techniques; 2) use simple ANNs with sufficient flexibility; and 3) use an
Expert System to build an explanation facility for the ANNs, or vice versa (Hegazy and Moselhi,
1994).

5.4 Criteria for stopping training


There are two common criteria for stopping the training of a network: 1) training cycles (epochs); and 2)
desired errors. Carpenter (1993) suggested typically applying 20,000 to 100,000 training cycles
(epochs) when the steepest descent method is used. The other criterion is to
limit the difference between the desired output and the output calculated by the network (Khan et al.,
1993; Hegazy and Moselhi, 1994; Vaziri, 1996). The training process may be brought to a halt
using either the worst error difference after a complete presentation of all input-output patterns,
or the root mean square error summed over all patterns. In practice, it is sometimes necessary
to apply or compare both approaches to ensure the capability of the trained network in
generalizing on the tested samples and in application. The error on the test samples is generally
higher than the error on the training samples, as the network is trained to reduce the latter, not the
former.

However, an over-trained network will occasionally result in overfitting. Overfitting means that
the network converges and yields the minimum or desired error on the training samples but
cannot generalize well when validated with the testing samples. Smith (1993) suggested spotting
over-fitting by watching the error on the test sample; the weights that produce the lowest error
on the test sample would be used for the model.
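
The following toy sketch illustrates the two stopping criteria together with Smith's (1993) advice to keep the weights that give the lowest error on the test sample. To stay short it trains a simple linear model by gradient descent rather than a full ANN; the data, learning rate and limits are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data with noise, split into a training and a test sample.
X = rng.uniform(-1, 1, (40, 6))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(0, 0.3, 40)
X_train, y_train, X_test, y_test = X[:20], y[:20], X[20:], y[20:]

def rms_error(w, X, y):
    return np.sqrt(np.mean((X @ w - y) ** 2))

w = np.zeros(6)
best_w, best_test_error = w.copy(), np.inf
max_epochs, target_error = 20000, 0.05          # the two stopping criteria

for epoch in range(max_epochs):
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= 0.05 * grad                            # one gradient-descent step
    test_error = rms_error(w, X_test, y_test)
    if test_error < best_test_error:            # watch the error on the test sample
        best_test_error = test_error
        best_w = w.copy()                       # keep the weights with the lowest test error
    if rms_error(w, X_train, y_train) <= target_error:
        break                                   # desired-error criterion

print(epoch + 1, round(best_test_error, 3))
```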

5.5 Updating the networks


Medsker et al. (1993) suggested that a convenient procedure must be planned for updating the
training sets and initiating periodic re-training of the network. This includes the ability to
recognize new cases that are discovered when the system is used routinely. On-going
monitoring and feedback to the developer are recommended for system maintenance,
improvement and long-term success. In other words, feedback may be useful in the design of
future versions or of new products. Even with the help of ANN tools with several paradigms
available on the market, developing a network might not be so simple. Specifically, it would be
necessary to program the layout of the database, to partition the data for training and testing, and to
transfer the data to files suitable for input and output to the ANN tool.

6. Samples

Samples are the known or pre-determined input-output pairs which are used for training
and testing networks. Yeh et al. (1993) suggested three possible sources of samples: 1) experts'
questionnaires; 2) historical records; and 3) simulations. Samples are always divided into two sets:
training and testing. The training set is a portion of the samples extracted from the whole
sample set; it is used to derive the classification algorithm, while the test set comprises the remaining
patterns, which are used to test the classification algorithm. The training set should be chosen so that it
represents the likelihood of each outcome equally. Carpenter and Barthelemy (1992) stated that it is
not exactly true that neural networks can be trained with fewer training pairs than other types of
approximations. Even though the network approximation can match the exact function at the design
points, this does not guarantee that a unique approximation over the region of interest will be achieved.
The term "underdetermined neural networks" means networks having fewer training pairs
than the number of weights and biases associated with the network. This leads to inadequate
approximation. Such networks can be trained to exactly duplicate the exact function at the design
points; however, the approximations thus obtained are not unique. Klimasauskas (1993) recommended
that at least five examples should be provided for each weight when training the network. By contrast,
Bahrami (1994) suggested that the number of training samples should be approximately ten times the
number of weights.
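
A small sketch of these two rules of thumb, with an invented weight count for illustration:

```python
def required_training_samples(n_weights):
    """Rough sample-size guidance quoted in the text.

    Klimasauskas (1993): at least five examples per weight;
    Bahrami (1994): roughly ten times the number of weights.
    """
    return {"minimum (5 per weight)": 5 * n_weights,
            "preferred (10 per weight)": 10 * n_weights}

print(required_training_samples(45))   # e.g. for a network with 45 weights
```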

Smith (1993) noted that the samples for training and testing may be limited by the computer's memory or
storage. If all the data are held in memory simultaneously during training, the size of the computer's
memory may be a significant constraint. On the other hand, using disk storage permits a much larger
sample but is much slower, because the same data have to be read from disk epoch after epoch. Increasing
the sample size does not increase the training time proportionally, since larger samples require fewer
training epochs. The basic rule, therefore, is to use the largest sample available, so long as it fits in the
computer's memory or storage. In this study, the input and output matrices are embedded in the
computer's memory or program, as they do not significantly affect the computing time.



7. Testing

Testing or validation of the ANNs can be described as follows.

7.1 Definitions
Another process in neural network modelling is called testing or classifying. It can take one of, or
a combination of, the following definitions.
1) Testing is determining how well a network has captured the nature of a function. It
validates the network on additional samples, called the testing set, that were not used in
training the network. The network that yields the best performance on the validation
samples would be the most accurate model among the networks that are trained
all the way to convergence on the training samples (Smith, 1993).
2) Testing is a phase to examine the performance of the network by using the derived
weights. It is to measure the ability of the network to classify the testing samples
correctly (Medsker et al., 1993).
3) Testing is a process by which the testing samples are used to determine how well the
network performs on data it has not seen before during training. A properly built and
trained network will exhibit similar levels of performance on both the training and
testing sets. If performance differs widely, appropriate corrective action should be
taken to the architecture, composition, or size of the training and testing sets
(Klimasauskas, 1993).

The testing process used in this research conforms to all the definitions given above. It deals with
feeding the remaining samples to the trained networks to validate their generalization
capability. Among the successfully trained networks, the best network is the one that yields the
minimum error when validated with the test samples.

7.2 Test set


There are two methods of testing the effectiveness of a network (McKim et al., 1996). In the first, the
samples are split into training and testing sets. Following the completion of training, the
independent variables of the test set are given to the trained network at the input layer, and
the resulting output is obtained for each set of dependent variables associated with those independent
variables. The difference between the predicted and actual variables is calculated for all
samples of the test set, and the system error is then calculated for the test set. The smaller the
value of the system error, the better the network's predictive capabilities.

The second method uses all the samples as both the learning and testing sets. All the independent and
dependent variables are used in the training of the network. When the network error reaches
the threshold value, the network is considered trained. All the independent variables are then given
to the trained network again to test its predictive capabilities. The output produced by the
trained network is compared against the actual dependent variables. Similarly, the error of
the system is calculated, and the result is used to indicate the predictiveness of the network.
Both methods have advantages and disadvantages.

The former method needs a considerable number of training samples so that effective
learning can be achieved. Moreover, splitting the data into two sets may not be truly random
and can introduce biases into the data which distort the model. The latter method uses all the
samples to train the network, which significantly improves the learning process, but the
network may memorize relationships rather than truly developing an analytical model.



Klimasauskas (1993) said that the test set should be chosen to represent the entire population.
He therefore recommended extracting a test set first, and then selecting a training set from the remaining
examples.

This dissertation split the samples into two equal sets: one is used for training, while the
other is used for testing the networks. A sketch of such a split is given below.
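
A minimal sketch of such a split, extracting the test set first as Klimasauskas (1993) recommends and then taking the training set from the remainder; the half-and-half fraction and the random seed are illustrative assumptions.

```python
import numpy as np

def split_samples(samples, test_fraction=0.5, seed=0):
    """Extract the test set first, then take the training set from the
    remaining examples; here the split is random and half-and-half."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(samples))
    n_test = int(round(test_fraction * len(samples)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [samples[i] for i in train_idx], [samples[i] for i in test_idx]

samples = list(range(10))                 # stand-in for (input, output) pairs
train_set, test_set = split_samples(samples)
print(train_set, test_set)
```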

8. Outputs

Representing the dependent variable in a several-class problem as a single quantitative output is
straightforward and understandable. The computational time is reduced when there is only one
output node, and the single output is easier to interpret and apply. A dependent variable could also be
a class variable, indicating what category an example belongs to, i.e. a binary representation can also
be used for a dependent class variable. There is an intuitive appeal to using the values 1 and 0 to
represent on and off.

When the logistic sigmoid function is used, the actual outputs produced by a network are continuous-
valued quantities. Their ranges are limited by the bounds of the function, i.e. 0 and 1 are not values
the output nodes can actually produce; they can only approach these bounds. If the network is trained
with these target values, it will continue pushing its weights toward extreme values until training is
stopped, which makes over-fitting difficult to control.
In learning situations, the values 0.9 and 0.1 are therefore sometimes used for specifying binary target
output values, and hence the outputs of a data set should be scaled to within this range (Khan et al., 1993;
Smith, 1993), as sketched below. The outputs could also be class variables; they would represent the
dependent variables according to the way the target outputs represent them. Representing the dependent
variables is easier than representing the independent variables as inputs, since there are fewer choices
available, and the choice does not greatly affect the accuracy of the network or the training time (Smith,
1993). A single network can be designed to predict more than one output without any special problem;
the dependent variables can include both quantitative and class variables (Smith, 1993).
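
A minimal sketch of scaling target outputs into the range 0.1 to 0.9 and converting network outputs back to the original units; the function names and example targets are illustrative assumptions.

```python
import numpy as np

def scale_targets(t, t_min, t_max, low=0.1, high=0.9):
    # Map target outputs into the range 0.1 to 0.9 so that the logistic
    # output nodes are never asked to reach their unreachable bounds 0 and 1.
    return low + (high - low) * (t - t_min) / (t_max - t_min)

def unscale_outputs(o, t_min, t_max, low=0.1, high=0.9):
    # Convert a network output back to the original units of the variable.
    return t_min + (o - low) * (t_max - t_min) / (high - low)

targets = np.array([120.0, 250.0, 400.0])          # e.g. costs in arbitrary units
scaled = scale_targets(targets, targets.min(), targets.max())
print(scaled)                                      # [0.1  0.471...  0.9]
print(unscale_outputs(scaled, targets.min(), targets.max()))
```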

9. Advantages of ANNs

ANNs have interesting properties and characteristics. They can perform higher-level human tasks such
as decision making, planning, scheduling, natural language generation and understanding, visual pattern
recognition, diagnosis, classification, problem solving and designing (Adeli, 1981). They are based on
pattern recognition (Hawley et al., 1993; Kimoto et al., 1993), and are effectively applied for: 1)
classifying (Hawley et al., 1993; Klimasauskas, 1993); 2) associative memory; and 3) clustering.
Klimasauskas (1993) remarked that ANNs can also improve the performance of several existing
technologies in modelling, forecasting, and signal processing. ANNs offer certain advantages over
traditional rule-based systems, i.e. conventional programming and knowledge-based expert systems.
Figure 4.5 shows the characterization of neural networks and rule-based systems.

ANNs are attractive for at least four reasons.


1. They use weighted connections and massively parallel processing with fault tolerance, so that
they can automatically learn from experience. This is called internal representation (Kireetoh,
1995).
2. They have the generalization capability to learn complex patterns of inputs and provide
meaningful solutions to problems even when input data contain errors, or are incomplete, or
are not presented during training. In other words, they have the ability to integrate information
from multiple sources and incorporate new features without degrading prior learning
(Karunasekera, 1992; Hawley et al., 1993; Medsker et al., 1993; Chao and Skibniewski, 1994;
Flood and Kartam, 1994a and 1994b).
3. They are distribution free because no prior knowledge is needed about the statistical
distribution of the classes in the data sources in order to apply the method for classification.
This is an advantage over most statistical methods that require modelling of data
(Karunasekera, 1992; Hawley et al., 1993; Wu and Lim, 1993; Khoshgoftaar and Lanning,
1995). Neural networks could avoid some of the shortcomings of the currently used
statistically or empirically based techniques.
4. They take care of determining how much weight each data source should have in the
classification, which remains a problem for statistical methods (Karunasekera, 1992; Wu and
Lim, 1993). The non-linear learning and smooth interpolation capabilities give the neural
network an edge over standard computers and rule-based systems for solving certain problems
(Kimoto et al., 1993; Wu and Lim, 1993).

10. Awareness of using ANNs

The following are points of awareness when using ANNs.

10.1 ANNs as an alternative to regression models


An artificial neural network can be compared to a regression model by using the errors of the
system, e.g. the root mean squared error, mean absolute percentage error, and mean squared error
(Marquez et al., 1993; Vaziri, 1996); a sketch of these measures follows. Neural networks have
considerable potential as alternatives to regression models. Marquez et al. (1993) cautioned, however,
that the potential offered by a neural network is tempered by several basic problems. Guidelines are
needed to deal with the enormous number of choices and decisions the model builder has to make.
Without these guidelines, the procedure for selecting the structure of a neural network, as well
as the selection of the training parameters, will continue to be a trial-and-error process.
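
The following sketch computes the three error measures mentioned above; the function name and example values are illustrative assumptions, and the mean absolute percentage error assumes no actual value is zero.

```python
import numpy as np

def error_measures(actual, predicted):
    """Error measures used to compare a neural network with a regression model:
    mean squared error, root mean squared error, and mean absolute
    percentage error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    mape = 100.0 * np.mean(np.abs(errors / actual))
    return {"MSE": mse, "RMSE": rmse, "MAPE (%)": mape}

print(error_measures([100, 200, 300], [110, 190, 320]))
```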

Carpenter and Barthelemy (1994) summarized common misconceptions about using neural
networks as approximations, compared with using polynomials, in four areas. First, it is not
exactly true that neural network approximations are superior to other types of mathematical
approximation. In a general sense, the performance of an approximation depends upon the
number of undetermined parameters, i.e. the weights and biases, associated with the
approximation. Second, it is not exactly true that neural networks can be trained with fewer
training pairs than other types of approximations. Even though a network approximation can
match the exact function at the design points, this does not guarantee that a unique approximation
over the region of interest will be achieved. "Underdetermined neural networks" are
networks having fewer training pairs than the number of weights and biases
associated with the network. Such networks can be trained to exactly duplicate the exact
function at the design points; like underdetermined polynomial approximations, the
approximations thus obtained are not unique. A necessary condition for obtaining a unique
approximation is to have the number of design points used for training the approximation
equal to or greater than the number of parameters associated with the approximation. Third, it is
not exactly true that neural networks are less sensitive to training data than other types of
approximation. Fourth, it is not exactly true that neural network approximations are as easy
to obtain as other types of approximation, because the training process for a neural network is a
time-consuming task. Most neural network algorithms use the delta-error back-propagation
algorithm to adjust the weights and biases on each learning cycle so that the difference between the
predicted and desired outputs is reduced to an acceptable limit or lower. This is a
steepest descent method of minimizing the error, with the changes in the values of the weights and
biases defined by a learning parameter.

10.2 Possible errors in network modelling


Below are three main errors frequently found in network modelling (Khaing, 1992; Smith,
1993).
1) Local minima: the presence of more than one valley in the error surface poses a
potential problem, i.e. the error function settles at a local minimum rather than at the global
minimum. For a given data set, the probability that a local minimum exists
goes down rapidly as the number of weights in the network increases. It is very likely
that the existence of local minima can be eliminated for a network of a certain size by adding
more hidden nodes.
2) Noise: this includes inaccuracies due to the fact that the independent variables do not
contain all the information needed to determine the dependent variable; other factors
that are not included in the model may play an important role. In other words, noise is
not an inherent randomness or absence of causality in the world; rather, it is the effect
of missing (or inaccurate) information about the world.
3) Mapping: this error occurs when the mapping function does not have the same form as the
target function.

10.3 The problem of explanation


ANNs lack the capacity to explain their conclusions. They are unable to reason in a sequential
or stepwise manner that results in precise conclusions. These restrictions could be critical when
dealing with situations that demand exact answers and lucid justifications (Baker, 1993;
Hawley et al., 1993; Kahkonen and Pallas, 1993; Medsker et al., 1993; Tan et al., 1996). Due
to the difficulty in explaining, the only way to test the system for consistency and reliability is
to monitor the output (Hawley et al., 1993; Smith, 1993). However, the time and effort
required to train and test the ANNs is much less than that required to extract and translate an
expert's knowledge base for a rule-based system (Baker, 1993). In addition, Medsker et al.
(1993) said that properly configured and trained networks could often make consistently good
classifications, generalizations, or decisions, in a statistical sense.

11. Research and Developments in ANNs

Interesting research and developments in ANNs are summarized as follows.

11.1 Learning paradigm


Lippmann (1987) described the roles of the input, hidden and output layers. Each first-layer node
forms a hyper-plane in pattern space because the inputs to the node form a linear
combination: a line in two-dimensional space, a plane in three-dimensional
space, and a hyper-plane in N-dimensional space. At the next layer, a node carries out
an operation on the collection of hyper-planes; thus, whereas a node in the first layer forms
a hyper-plane, a node in a higher layer forms a hyper-region. He also concluded that the
generalized delta rule was superior to the statistical methods in terms of classification
accuracy.

Chu et al. (1990) proposed an improved pocket learning algorithm to increase the learning speed of
back-propagation learning and the pocket algorithm. They pointed out two main problems for which
solutions had not yet been determined: 1) how many hidden cells are needed in the neural network to
learn successfully; and 2) how to
shorten the time of learning.

Karunanithi et al. (1994) used an adaptive neural network with a constructive algorithm called
cascade-correlation. This dynamic expansion of the network continues until the problem is
successfully learned. Thus, the cascade-correlation algorithm automatically constructs a
suitable network architecture for a given problem. Further, it has more consistency in solving
problems and provides faster learning than the standard back-propagation learning algorithm.

Adeli (1995) stated that the convergence speed of neural network learning models is slow,
perhaps requiring several hours or even days of computer time on a conventional serial workstation.
The number of iterations for learning an example is often on the order of thousands. The
convergence rate is highly dependent on the choice of the values of the learning and momentum
ratios in the algorithm, and the proper values of these two parameters depend
on the type of problem.

Bahrami (1995) pointed out the different methods of reducing the number of weights in a
network while at the same time retaining the capability of solving the problem, e.g. pruning
and weight sharing. Weight sharing makes use of local connections with identical weights such
that individual nodes process only a local region of the input.

11.2 Neuro computers


Forsyth (1992) identified two computer architectures: 1) the Von Neumann machine; and 2) the
Connection Machine (a parallel computer). The former conducts its calculations with a single processor
using a separate memory. The latter consists of a large number of simple processors, each with its own
memory, such that all calculations are conducted concurrently by individual processors working
on various parts of the same problem. The architecture of the Connection Machine can be
simulated on a Von Neumann computer, which is still dominant. One concept of concurrent
computing is the "neural network". He identified five decades in the history of artificial
intelligence: 1) the dark ages of the neural network (1950s); 2) the age of reason in automated
logic (1960s); 3) the romantic movement, or knowledge engineering (1970s); 4) the
enlightenment, or machine learning (1980s); and 5) the gothic revival, or the revisiting of neural
networks (1990s). In addition, he gave reasons why the neuro-computing approach succeeded even
though its limitations had been exposed more than 25 years earlier. First, modern computing
machinery is many orders of magnitude more powerful than anything available to the
cyberneticians of the 1950s, i.e. it is possible to construct neural nets with 100,000 processing
elements. Second, the learning rules of multi-layered neural networks have been discovered.

Adeli (1996) commented on increasingly powerful computers combined with novel
computing paradigms in the next century. They will have a profound impact on architecture,
engineering, and construction through the automation of complex and large-scale projects. He
focused on several emerging computing paradigms, e.g. neuro-computing, parallel processing,
and distributed computing. As true intelligence is associated with learning, most neural
network research has been done in the area of machine learning. He stated that multi-
processing capabilities are presently limited to high-performance parallel machines and
supercomputers. He hoped that they would be widely available on desktop machines by the year
2000, which would be helpful for parallel processing and distributed computing.

11.4 Explanation facility


Kahkonen and Pallas (1993) suggested two possible ways to obtain satisfactory explanation
facilities and the means for the user to manage the performance of the resultant system. One is
to divide and conquer: the problem is split into sub-problems, each of which is solved using an
appropriate technique. This results in an "integrated system" which can include neural
networks, sequential algorithms and rule-based problem solving. The other is to embed neural
networks, e.g. in a rule-based system. In the latter case, the neural networks are called only
when they are needed during the problem-solving process, so that they can function and
return the required values of attributes to the system.

Li (1996) discussed research issues in developing explanatory functionality for ANN-based
cost models and described a prototype ANN-based cost estimation system. The
system can generate rules and inference tracks to form explanations. He explained the
possibility of developing a rule-extraction algorithm which could be added to the ANN-based
cost model. The rule-extraction algorithm is a machine learning technique which exploits and extracts
rules from the inter-relationships of a trained ANN-based cost model.

12. Applications of ANNs

ANNs can be applied in pattern recognition. Chen and Wang (1990) used a multi-layered perceptron to
identify and correct misspellings. Berry and Trigueiros (1993) applied ANNs to the extraction of
knowledge from accounting reports, which is a kind of classification study. Kireetoh (1995) presented
an application of a BP network with one hidden layer to recognize and identify hand-written Thai
numerals (0 to 9).

ANNs can be applied to diagnosing problems. Yeh et al. (1993) built a combined knowledge-based
expert system (KBES) with an artificial neural network (ANN) for diagnosing piles. Szewczyk and
Hajela (1994) used neural networks to detect damage in structural systems.

A number of studies have used ANNs in decision making and optimization. Tseng et al. (1990) used a
Hopfield network to solve constrained task allocation problems. Wang and Tsai (1990) used a Hopfield
network with a time-varying energy function to solve the travelling salesman problem. Kamarthi et al.
(1992) used a two-layered back-propagation network for selecting vertical formwork systems for a
given building site. Murtaza and Fisher (1994) used a neural network with the Kohonen algorithm, i.e.
unsupervised learning, for decision making on construction modularization. Soemardi (1996) used two
fuzzy neural networks for group decision making in selecting a wall system under multiple criteria,
in an attempt to minimize the effort needed to bring together all the parties involved.
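
These Hopfield-type applications rely on the network settling into a state that minimizes an energy function. The cited studies add problem-specific penalty terms (and, in Wang and Tsai's case, time variation), but in one commonly quoted general form, for node states $v_i$, symmetric weights $w_{ij}$, and external inputs $I_i$, the energy is

$$ E = -\frac{1}{2}\sum_{i}\sum_{j \neq i} w_{ij}\, v_i v_j - \sum_{i} I_i\, v_i , $$

and asynchronous updates of the network never increase this quantity, so the network converges towards a (local) minimum that is read off as the solution.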

ANNs can also be applied to predicting or forecasting. Karunasekera (1992) used a simulated-annealing-based
neural network to classify remotely sensed data. Siang (1992) applied the back-propagation method to
forecasting monthly water quality, i.e. temperature, pH, conductivity, and water flow. Anderson et al.
(1993) used a back-propagation network to predict bi-linear moment-rotation characteristics for minor-
axis connections. Hawley et al. (1993) applied ANNs to financial decision making. Kimoto et al.
(1993) used modular neural networks for stock market prediction. Raghupathi et al. (1993)
applied an ANN to predict bankruptcy. Salchenberger et al. (1993) applied a back-propagation network to
predict thrift failures. Surkan and Singleton (1993) used a multi-layered neural network to improve bond
rating. Williams (1993) developed back-propagation networks for predicting one-month and six-month
changes in a construction cost index. Wu and Lim (1993) applied multi-layered feed-forward neural
networks to correlate the maximum scour depth at a spur dike. Chao and Skibniewski (1994) used a
neural-network- and observation-data-based approach to estimate construction operation productivity.
Gagarin et al. (1994) applied neural networks to the problem of determining truck attributes, i.e.
velocity, axle spacings, and axle loads, purely from the strain response of the structure over which
the truck was travelling. Hegazy and Moselhi (1994) used back-propagation artificial neural networks
to develop an optimum markup estimation model that derived solutions for new bid situations. Li (1996)
used an artificial neural network to model construction cost estimation. McKim et al. (1996) used a neural
network to predict the effectiveness of construction firms. Pezeshk et al. (1996) used a neural network to
interpret borehole geophysical and formation well logs. Vaziri (1996) used a neural network model
to predict monthly carbon dioxide concentration. Elazouni et al. (1997) used ANNs to estimate the
required resources of concrete silo walls at the conceptual design stage.

ANNs can also be applied to design, planning, and management. Mo (1993) classified automated systems
in structural design into four basic types: 1) traditional computer-aided design; 2) database
management systems; 3) expert systems; and 4) neural networks. Mawdesley and Carr (1993)
investigated the possibility of using artificial neural networks to produce Project Planning Networks
(PPNs) in response to the shortage of skilled planners and the ever-increasing complexity of projects.
Chua et al. (1997) used ANNs to identify the key management factors affecting budget performance in
a project.

13. Summary

ANNs have been reviewed. A number of studies clearly show their potential, capability, advantages,
and applications, particularly in optimization, decision making, and forecasting. ANNs have a
structure similar to the cost models described in the last chapter, i.e. they consist of inputs, outputs, and
a transformation of data. Moreover, ANNs are distribution-free models that require no prior knowledge
of the underlying statistical distribution, and they can out-perform the empirically and statistically based
techniques currently in use. These characteristics conform to the need to develop an approach and tool
for pre-design construction cost and duration estimating, in the hope that ANNs can be used in parallel
with, or partially replace, the heuristics and experience of estimating experts. However, there are some
problems in ANN modelling of which we must be aware, so the modelling processes must be carried out
carefully. Good ANN models should not be time-consuming or costly to build; they should require only
the major inputs, a simple topology, and a simple learning paradigm, and their outputs should be easy to
understand or interpret, especially where an explanation facility cannot be provided. The methodology
and development of the ANNs will be discussed in the following chapters.

References

Adeli, H., 1988. Expert Systems in Construction and Structural Engineering, Chapman and Hall, London.
Adeli, H., 1992. Computer-aided Engineering in The 1990's. The International Journal of Construction
Information Technology, Proc. Paper, 1, 1: 1-10.
Adeli, H., 1995. Knowledge Engineering. Archives of Computational Methods in Engineering; State of the Art
Reviews, 2, 4: 51-68.
Adeli, H., 1996. Computing for the Year 2000. In Lye, H.K, Sang, C.Y., and Adeli, H., Editors. Computing &
Information Technology for Architecture, Engineering & Construction, Proc.: 1-5.
Adeli, H., and Yeh, C., 1990. Explanation-Based Machine Learning in Engineering Design, Engineering
Applications of Artificial Intelligence, 3, 2; 127-137.
Adeli, H., and Hung, S.L., 1995. Machine Learning; Neural Networks, Genetic Algorithms, and Fuzzy Systems,
John Wiley & Son Inc., New York.
Aleksander I., and Morton, H., 1990. An Introduction to Neural Computing, 1st Ed., Chapman and Hall, London.
Al-Tabtabai, H., Kartam, N., Flood, I, and Alex, A.P., 1997. Expert Judgment in Forecasting Construction Project
Completion, Engineering Construction and Architectural Management, 4, 4: 271-293.
Anderson, D., Hines, E.L., Arthur, S.J., and Eiap, E.L., 1993. Application of Artificial Neural Networks to
Prediction of Minor Axis Steel Connections. In Topping, B.H.V., and Khan, A.I., Editors, Neural Networks and
Combinatorial Optimization in Civil and Structural Engineering, Civil-Comp Press: 31-37.
Arciszewski, T., and Ziarko, W., 1992. Machine Learning in Knowledge Acquisition. In Arciszewski, T., and
Rossman, L.A., Editors. Knowledge Acquisition in Civil Engineering, American Society of Civil Engineers,
New York: 50-68.
Bahrami, M., 1995. Issues on Representational Capabilities of Artificial Neural Networks and Their
Implementation. International Journal of Intelligent Systems, 10, 6: 571-579.
Baker, D., 1993. Analyzing Financial Health: Integrating Neural Networks and Expert Systems. In Trippi, R.R.,
and Turban, E., Editors. Neural Networks in Finance and Investing, Probus Publishing: 85-72.
Berry, R.H., and Trigueiros, D., 1993. Applying Neural Networks to the Extraction of Knowledge from
Accounting Reports: a Classification Study. In Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance
and Investing, Probus Publishing: 103-124.
Bharath, R., and Drosen, J., 1994. Neural Network Computing, Windcrest/McGraw-Hill, New York.
Bhokha, S., and Ogunlana, S.O., 1994. Friendliness of Artificial Neural Network Applied in Civil Engineering.
Second National Convention on Civil Engineering, Engineering Institute of Thailand, Proc.: 387-390.
Bhokha, S., 1996. Neural Networks for Forecasting Construction Duration of Pre-design Buildings. Third
National Convention on Civil Engineering, Engineering Institute of Thailand, Proc.: CMT 4-1 - CMT 4-9.
Bode J., 1998. Neural Networks for Cost Estimation, Cost Engineering, Vol. 40, No.1: 25-30.
Carpenter, W.C., and Barthelemy, J.F., 1994. Common Misconceptions About Neural Networks as
Approximators. Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 5442, 8, 3: 345-358.
Chao, L.C., and Skibniewski, M.J., 1994. Estimating Construction Productivity: Neural Network-based
Approach. Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 5952, 8, 2: 234-251.
Chen, P.Y., and Wang, J.S., 1990. A Smart Spell Checking System-A Neural Net Approach. International
Computer Symposium, National Tsing Hua University, Taiwan, Proc.: 885-890.
Chester, M., 1993. Neural Networks; A Tutorial, Prentice-Hall Inc., New Jersey.
Chu, Y.P., Wang, S.T., and Hsieh, C.M., 1990. An Improved Pocket Learning Algorithm of Neural Network.
International Computer Symposium, National Tsing Hua University, Taiwan, Proc.: 897-902.
Chua, D.K.H., Kog, Y.C., Loh, P.K., Jaselskis, E.J., 1997. Model for Construction Budget Performance-Neural
Network Approach. Journal of Construction Engineering and Management, Proc. Paper 14320, 123, 3: 214-222.
Elazouni A.M., Nosair I.A., Mohieldin Y.A., and Mohamed A.G., 1997. Estimating Resource Requirements at
Conceptual Design Stage Using Neural Networks. Journal of Computing in Civil Engineering ASCE, Proc. Paper
11485, 11, 4: 217-223.
Forsyth, R., 1992. In Search of Knowledge. In Arciszewski, T., and Rossman, L.A., Editors. Knowledge
Acquisition in Civil Engineering, American Society of Civil Engineers: 1-10.
Gagarin, N., Flood, I., and Albrecht, P., 1994. Computing Truck Attributes with Artificial Neural Networks.
Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 5307, 8, 2: 179-200.
Hawley, D.D., Johnson, J.D., and Raina, D., 1993. Artificial Neural Systems; A New Tool for Financial Decision-
making. In Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance and Investing, Probus Publishing:
27-46.
Jain, A.K., Mao, J.C., and Mohiuddin, K.M., 1996. Artificial Neural Networks: A Tutorial, Computer, 29, 3: 31-
44.
Kahkonen, K., and Pallas J., 1993. Roles, Benefits and Objectives of Neural Networks in Building Construction
Project Practice. In Topping, B.H.V., and Khan, A.I., Editors. Neural Networks and Combinatorial Optimization
in Civil and Structural Engineering, Civil-Comp Press: 53-59.
Kamarthi, S., Sanvido, V.E., and Kumara, R.T., 1992. NEUROFORM-Neural Network System for Vertical
Formwork Selection. Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 416, 6, 2: 178-199.
Karunanithi, N., Grenney, W.J., Whitley, A., and Bovee, K., 1994. Neural Networks for River Flow Prediction.
Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 4230, 8, 2: 201-220.
Karunasekera, H.N.D., 1992. Neural Network Structure Generation for the Classification of Remotely Sensed
Data Using Simulated Annealing, M.Eng Thesis, Asian Institute of Technology, Bangkok.
Khaing, W.W., 1992. Concept Generalization with Neural Network, M.Eng Thesis, Asian Institute of
Technology, Bangkok.
Khan, A.I., Topping, B.H.V., and Bahreininejad, A., 1993. Parallel Training of Neural Networks for Finite
Element Mesh Generation. In Topping, B.H.V., and Khan, A.I., Editors. Neural Networks and Combinatorial
Optimization in Civil and Structural Engineering, Civil-Comp Press: 81-94.
Khoshgoftaar, T.M., and Lanning, D.L., 1995. A Neural Network Approach for Early Detection of Program
Modules Having High Risk in the Maintenance Phase. Journal of Systems Software, 9, 1: 85-91.
Kimoto, T., Asakawa, K., Yoda, M., and Takeoka, M., 1993. Stock Market Prediction System with Modular
Neural Networks. In Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance and Investing, Probus
Publishing: 343-356.
Kireetoh, S., 1995. Neural Networks Technology. Engineering Institute of Thailand, Proc.: EE371-EE384.
Klemic, G.G., 1993. The Use of Neural Computing Technology to Develop Profiles of Chapter 11 Debtors Who
Are Likely to Become Tax Delinquents. In Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance and
Investing, Probus Publishing: 125-137.
Klimasauskas, C.C., 1993. Applying Neural Networks. In Trippi, R.R., and Turban, E., Editors. Neural Networks
in Finance and Investing, Probus Publishing: 47-72.
Li, H., 1996. Towards Developing Artificial Neural Network Based Cost Model with Self-explanatory Abilities,
Urban Engineering in Asian Cities in the 21st Century, Proc., School of Civil Engineering, Asian Institute of
Technology: D.265-D.270.
Lippmann, R.P., 1988. An Introduction to Computing with Neural Nets, In Vemuri, V., Editor, Artificial Neural
Networks; Theoretical Concepts, The Computer Society: 36-54.
Maher, M.L., 1987. Expert Systems for Civil Engineers: Technology and Application, American Society of Civil
Engineers, New York.
Marquez, L., Hill, T., Worthley, R., and Remus, W., 1993. Neural Network Models as an Alternative to
Regression. In Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance and Investing, Probus
Publishing: 435-450.
Mawdesley, M.J., and Carr, V., 1993. Artificial Neural Networks for Construction Project Planning. In Topping,
B.H.V., and Khan, A.I., Editors. Neural Networks and Combinatorial Optimization in Civil and Structural
Engineering, Civil-Comp Press: 39-46.
McKim, R., Adas, A., and Handa, V.K., 1996. Construction Firm Organizational Effectiveness: A Neural
Network-based Prediction Methodology. In Langford D.A., and Retik A., Editors. The Organization and
Management of Construction Shaping and Practice, Vol.3, E & FN Spon: 247-256.
Medsker, L., Turban, E., and Trippi, R.R., 1993. Neural Network Fundamentals for Financial Analysis. In Trippi,
R.R., and Turban, E., Editors. Neural Networks in Finance and Investing, Probus Publishing: 3-26.
Mo, Y.L., 1993. Automation in Structural Design. In Topping, B.H.V., and Khan, A.I., Editors. Information
Technology for Civil and Structural Engineers, Civil-Comp Press: 11-17.
Murtaza, M., and Fisher, D., 1994. NEUROMODEX-Neural Network System for Modular Construction Decision
Making, Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 5708, 8, 2: 221-233.
Nielsen, R.H., 1989. Neurocomputing, Addison-Wesley Publishing, New York.
Ogunlana, S.O., and Bhokha, S., 1995. Application of Artificial Neural Networks-ANNs in Civil Engineering.
Engineering Institute of Thailand, Proc.: CE152-CE166.
Paulson, B.B., 1995. Computer Application in Construction, McGraw-Hill Inc., New York.
Pezeshk, S., Camp, C.V., and Karprapu S., 1996. Geophysical Log Interpretation Using Neural Network. Journal
of Computing in Civil Engineering ASCE, Proc. Paper No. 9984, 10, 2: 136-142.
Raghupathi, W., Schkade, L.L., and Raju, B.S., 1993. A Neural Network Approach to Bankruptcy Prediction. In
Trippi, R.R., and Turban, E., Editors. Neural Networks in Finance and Investing, Probus Publishing: 141-158.
Rogers, J.L., and Lamarsh, W.J., 1992. Application of a Neural Network to Simulate Analysis in An Optimization
Process. Artificial Intelligence in Design, Proc. Paper: Kluwer Academic Publishers: 739-754.
Salchenberger, L.M., Cinar, E.M., and Lash, N.A., 1993. Neural Networks: a New Tool for Predicting Thrift
Failures, In Trippi, R.R., and Turban, E., Editors, Neural Networks in Finance and Investing, Probus Publishing:
229-253.
Siang, J.J., 1992. Application of Back Propagation Method in Forecasting Problems, M.Eng Thesis, Asian
Institute of Technology, Bangkok.
Smith, M., 1993. Neural Networks for Statistical Modeling, Van Nostrand Reinhold, New York.
Soemardi, B.W., 1996. Fuzzy Neural Network Models for Construction Group Decision Making. In Lye, H.K,
Sang, C.Y., and Adeli, H., Editors. Computing & Information Technology for Architecture, Engineering &
Construction, Proc.: 333-340.
Surkan, A.J., and Singleton, J.C., 1993. Neural Networks for Bond Rating Improved by Multiple Hidden Layers.
In Trippi, R.R., and Turban, E., Editors, Neural Networks in Finance and Investing, Probus Publishing: 275-287.
Szewczyk, Z., and Hajela, P., 1994. Damage Detection in Structures Based on Feature-sensitive Neural
Networks. Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 3976, 8, 2: 163-178.
Tan, C.L., and Quah, T.S., and Teh H.H., 1996. An Artificial Neural Network that Models Human Decision
Making. Computer, 29, 3: 64-70.
Tseng, Y.S., Wu, J.L., and Huang, J.H., 1990. A Neural Network Approach to Constrained Task Allocation
Problems. International Computer Symposium, Proc., National Tsing Hua University, Taiwan: 767-771.
Vaziri, M., 1996. Predicting the Air Pollution for Tehran Using Artificial Neural Networks. In Lye, H.K, Sang,
C.Y., and Adeli, H., Editors. Computing & Information Technology for Architecture, Engineering &
Construction, Proc.: 385-390.
Williams, T.P., 1993. Neural Networks to Predict Construction Cost Indexes. In Topping, B.H.V., and Khan, A.I.,
Editors, Neural Networks and Combinatorial Optimization in Civil and Structural Engineering, Civil-Comp Press:
47-52.
Wu, X., and Lim, S.Y., 1993. Prediction of Maximum Scour Depth at Spur Dikes with Adaptive Neural
Networks. In Topping, B.H.V., and Khan, A.I., Editors. Neural Networks and Combinatorial Optimization in
Civil and Structural Engineering, Civil-Comp Press: 61-66.
Yeh, Y.C., Kuo, Y.H., and Hsu, D.S., 1993. Building KBES for Diagnosing PC Pile with Artificial Neural
Network. Journal of Computing in Civil Engineering ASCE, Proc. Paper No. 226, 7, 1: 71-93.

Recommended Readings

http://palaeo-electronica.org/2000_2/neural/neural.htm

http://www.stowa-nn.ihe.nl/ANN.htm

http://www.akri.org/cognition/machmemmod.htm

http://www.alanturing.net/turing_archive/pages/Reference%20Articles/what_is_AI/What%20is%20AI10.html

http://www.slais.ubc.ca/courses/libr500/02-03-wt1/www/K_MARTIN/1st_page.htm

http://www.aiexplained.com/apps/networks.html
