1. Introduction
This chapter deals with an enhanced computer technology, a branch of artificial intelligence, called Artificial Neural Networks (ANNs). Different terms and definitions are given. Basic concepts of ANNs are explained, i.e. neural cells or neurons, the perceptron, and the Hopfield net. Next, various aspects of ANN modelling are described, i.e. the processes, selecting and representing the variables, hidden layers and nodes, weights and biases, the summation and transformation functions, and the learning rate and momentum. The training process is also presented, i.e. definitions, methods, the back-propagation training algorithm and Generalized Delta Rule (GDR), and updating the network. The samples used for network modelling are discussed, i.e. methods of sampling, the number of samples, and how they are fed into the network during training and testing. After that, the testing process is explained. The following part explains the outputs obtained from the ANNs. Then, advantages of, and cautions in, using ANNs are addressed, followed by research and developments in ANNs as well as their applications. Finally, this chapter summarizes the possibility of using ANNs as a new approach and tool for pre-design estimating of construction costs and duration.
2. Definitions
Artificial neural networks (ANNs) are known by several different names: 1) connectionist models; 2) parallel distributed processing models; 3) neuromorphic systems; and 4) neural computing. They can be defined by any one, or a combination, of the following definitions, which also reflect their properties.
1) ANNs models are composed of many non-linear computational elements, operating in parallel
and arranged in patterns reminiscent of biological neural nets (Lippmann, 1987).
2) ANNs are parallel, distributed information processing structures consisting of processing elements which can possess a local memory and can carry out localized information processing operations. They are interconnected via unidirectional signal channels called connections. Each processing element has a single output connection that branches or fans out into as many collateral connections as desired (Nielsen, 1989).
3) ANNs are a type of information processing system whose architecture is inspired by the structure of human biological neural systems (Caudill and Butler, 1990).
4) Neural networks concentrate on machine learning, which is based on the concept of self-adjustment of internal control parameters. The artificial neural network environment consists of five primary components: the learning domain, neural nets, learning strategies, the learning process, and the analysis process (Adeli, 1992).
5) An artificial neural net is a kind of machine learning. It is a computational procedure composed of simple elementary functions such as summation and multiplication (Arciszewski and Ziarko, 1992).
6) ANNs are an information processing technology inspired by studies of the brain and nervous system. They are composed of a collection of neurons (or nodes, processing elements, or units) which are grouped in layers. They accept several inputs, perform a series of operations on them, and produce one or more outputs. They are similar to a subroutine that works best in classifying, modelling and forecasting (Klimasauskas, 1993).
7) ANNs are collections of simple computational elements called neurons that are interconnected (Berry and Trigueiros, 1993).
8) ANNs are models that emulate a biological neural network. They are composed of artificial neurons, which are the processing elements (PEs). They are information processing technologies inspired by studies of the brain and nervous system, implemented as software simulations of a massively parallel process involving processing elements interconnected in a network architecture (Medsker et al., 1993).
ANNs by Sdhabhon Bhokha, July 9, 2005 - 1
9) ANNs are compositions of neurons or processing elements, and connections, that are organized in layers (Salchenberger et al., 1993).
10) ANNs are connectionist systems that have an ability to learn and generalize from examples, to provide meaningful solutions to problems even when input data contain errors or are incomplete. They can adapt solutions over time to compensate for changing circumstances. They process information rapidly and also transfer readily between computing systems (Flood and Kartam, 1994).
11) ANNs are computational devices constructed from a large number of parallel processing
devices. Individually, the neurons perform trivial functions, but collectively, they are capable
of solving very complicated problems. In other words, they are capable of learning from
example, can infer solutions to problems beyond those to which they are exposed during
training. They can provide meaningful answers even when the data to be processed include
errors or are incomplete. They can process information extremely rapidly (Gagarin et al.,
1994).
12) ANNs are an AI software technology that represents objects or pieces of information as nodes and expresses relationships between them as links, providing a powerful and flexible way of representing knowledge (Paulson, 1995).
13) ANNs are computational models composed of many non-linear processing elements arranged in patterns similar to biological neural networks. Typically, they have an activation value associated with each node and a weight value associated with each connection. An activation function governs the firing of nodes and the propagation of data through the network's connections in massive parallelism. The networks can also be trained with examples through connection weight adjustments (Tan et al., 1996).
This research uses the name "neural networks" for the models, and the term "node" rather than neuron
or neural cell. However, the neural networks and the related terms conform to all the definitions given
above.
[Figure: a biological neuron (dendrites and axon) and a multi-layered artificial neural network. Input signals s_i reach hidden node j through connection weights U[i][j], and hidden-node signals s_j reach output node k through weights V[j][k]. Each node sums its weighted inputs, N_j = Σ_i s_i U[i][j] with bias b_j at node j, and N_k = Σ_j s_j V[j][k] with bias b_k at node k, and transforms the sum through the sigmoid transformation function f(N + b), which rises from 0 through 0.5 towards 1. The output O_mk is compared with the target T_mk.]
3. Basic Concepts
ANN technology is a branch of artificial intelligence (AI) that attempts to achieve human brain-like capability (Lippmann, 1987; Caudill and Butler, 1990; Klimasauskas, 1993; Medsker et al., 1993). Various kinds of ANN structure are based on the biological nervous system and can exhibit a surprising number of the brain's characteristics, e.g. learning from experience and generalizing from previous examples to new problems by inferring solutions to problems beyond those to which they are exposed during training. They can provide meaningful answers even when the data to be processed include errors or are incomplete (Karunasekera, 1992; Hawley et al., 1993; Medsker et al., 1993; Chao and Skibniewski, 1994; Flood and Kartam, 1994a, 1994b; Gagarin et al., 1994). They can process information extremely rapidly when applied to solve real-world problems. ANNs have been mobilized for building neuro-computing architectures in physical hardware that can think and act intelligently like human beings. ANNs can be built either in hardware, by developing a neuro-computer (a machine), or in software, through neuro-software languages (programs) (Forsyth, 1992; Medsker et al., 1993; Adeli, 1996).
The Hopfield net is a kind of network used with binary inputs. It is most appropriate when exact binary representations are possible, such as black-and-white images, yes-and-no answers, or on-and-off switches. In 1982, Hopfield designed a neural network that revived the technology, bringing it out of the neural dark age of the 1970s (Chester, 1993). He devised an array of neurons that were fully interconnected, with each neuron feeding its output to all the others. The concept is that all the neurons transmit signals back and forth to each other in a closed feedback loop until their states become stable. This concept does not make use of the feed-forward mechanism of adjusting the synaptic input weights of nodes in order to tune their outputs, as presented in the perceptron. Instead, a Hopfield net makes feedback the central feature of the network.
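The feedback recall described above can be sketched in a few lines. This is a minimal illustrative sketch, not the author's implementation: the Hebbian storage rule and the function names are standard choices assumed here, and states are taken as bipolar (+1/-1).

```python
def hopfield_train(patterns):
    """Build a Hebbian weight matrix from bipolar (+1/-1) patterns.
    No self-connections: the diagonal stays at zero."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / len(patterns)
    return W

def hopfield_recall(W, state, steps=10):
    """Feed every neuron's output back to all the others, updating the
    whole state repeatedly until it becomes stable (a closed feedback loop)."""
    s = list(state)
    for _ in range(steps):
        new = [1 if sum(W[i][j] * s[j] for j in range(len(s))) >= 0 else -1
               for i in range(len(s))]
        if new == s:
            return new
        s = new
    return s
```

Storing one pattern and presenting a copy with a flipped bit lets the feedback loop settle back onto the stored pattern, which is the associative-memory behaviour the text describes.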
4. ANNs Modeling
The ANNs modeling can be explained as follows.
Yeh et al. (1993) outlined four criteria for selecting attributes and training examples. First, availability of attributes: attributes should be clearly observable without sophisticated experience, expensive cost, or a long time period. Second, unnecessary or insufficient conditions of attributes, which reduce the classification reliability, must be avoided. Third, a good training set should contain common, unusual, and rare cases; such a training set cannot be obtained by random sampling from the problem domain. Fourth, the more training examples, the better the learning results obtained.
The class variables use binary representation. One binary output can represent the black-and-
white images, yes-and-no answers or on-and-off switch (Smith, 1993; Kireetoh, 1995).
Pezeshk et al. (1996) used single binary output in a different way, e.g. zero and one to
represent clay and sand.
For multiple outputs, the value 1 indicates that the object or event belongs to the class represented by that node, while the value 0 indicates that it does not. The number of nodes need not be equal to the number of classes (Smith, 1993): there may be one less node than there are classes, with all classes but one represented by turning on the appropriate node and the remaining class represented by not turning on any node. This can reduce computational time. On the other hand, Chau et al. (1997) assigned the attributes of qualitative class dependent variables (outputs) in a different way, e.g. the values -2 to 2 were used to identify bad, slightly bad, average, slightly good, and good, respectively.
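The two class-output codings discussed above (one node per class, and one node fewer than the number of classes) can be illustrated as follows; the function names are hypothetical, chosen only for this sketch.

```python
def encode_one_per_class(class_index, n_classes):
    """One output node per class: the node for the class is 1, all others 0."""
    out = [0] * n_classes
    out[class_index] = 1
    return out

def encode_one_less(class_index, n_classes):
    """n_classes - 1 nodes: every class but the last turns on its own node;
    the remaining class turns on no node at all (Smith, 1993)."""
    out = [0] * (n_classes - 1)
    if class_index < n_classes - 1:
        out[class_index] = 1
    return out
```

For a four-class problem the second coding needs only three output nodes, which is the computational saving the text mentions.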
It is possible to mix both quantitative and class variables among the inputs to a single network (Smith, 1993). Such a mixture, however, does raise an issue for the algorithmic implementation of the mathematics. Another alternative is to binarize the quantitative variable, representing it by using binary input nodes. There are two major problems with this approach, i.e. discrimination
and generalization. This representation makes it impossible for the network to discriminate
between examples whose values are within the sub-range of the same node. No binary
representation of a quantity can completely resolve the problems of discrimination and
generalization. However, these problems can be reduced by using "ensemble coding", whereby
several nodes are turned on. Some are to represent broad ranges of values and help the network
generalize while the others represent narrow ranges of values and help it discriminate.
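A minimal sketch of ensemble coding as described above, assuming the broad and narrow range edges are chosen by the modeller (the function name and the example edges are illustrative only):

```python
def ensemble_code(value, broad_edges, narrow_edges):
    """Turn on every node whose range contains the value.
    Broad, overlapping ranges help the network generalize;
    narrow ranges help it discriminate between nearby values."""
    def code(edges):
        return [1 if lo <= value < hi else 0 for lo, hi in edges]
    return code(broad_edges) + code(narrow_edges)
```

A value then turns on several nodes at once: typically two or more broad-range nodes plus exactly one narrow-range node.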
Too few hidden nodes (or too small a network) for a given problem will cause back-propagation not to converge to a solution (Karunasekera, 1992). However, too many hidden nodes cause a much longer learning period. At some point, increasing the number of hidden nodes does not greatly increase the ability of the neural network to classify (Williams, 1993). On the other hand, too many units on a layer can make a network become over-specific, particularly in the extreme case where the number of units on the first processing layer is equal to the number of examples in the training set (Rumelhart, 1988). Too many hidden nodes can also overfit, such that the network models the accidental structure of the noise in the sample as well as the inherent structure of the target function (Smith, 1993). Therefore, a minimum-sized network which uses as few hidden units as possible is important for efficient classification and good generalization (Khan et al., 1993). Berke and Hajela (1991) suggested that the number of hidden nodes should be between the average and the sum of the nodes on the input and output layers. Rogers and Ramarsh (1992) suggested that a good initial guess for hidden nodes is to take the sum of the nodes on the input and output layers. Soemardi (1996) suggested that the number of hidden nodes should be 75% of the number of input nodes. Thus, experience shows that the number of hidden nodes has a maximum limit of the sum of the input and output nodes, while the minimum could be either 75% of the input nodes or the average of the input and output nodes.
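The heuristic bounds summarized above can be collected into a small helper. This is only an illustration of the rules of thumb as restated in the text, not a prescription from the cited authors:

```python
def hidden_node_range(n_inputs, n_outputs):
    """Heuristic bounds on the number of hidden nodes:
    maximum = sum of input and output nodes (Rogers and Ramarsh, 1992);
    minimum = the smaller of 75% of the input nodes (Soemardi, 1996)
    and the average of input and output nodes (Berke and Hajela, 1991)."""
    upper = n_inputs + n_outputs
    lower = min(round(0.75 * n_inputs), round((n_inputs + n_outputs) / 2))
    return lower, upper
```

For a network with 8 inputs and 2 outputs this gives a range of roughly 5 to 10 hidden nodes, a starting point for the trial-and-error structuring discussed later.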
Usually, a micro-computer with a single processor, called a Von Neumann computer, is used to train and test networks. In fact, the biological neural system architecture is completely different from the Von Neumann architecture. This difference significantly affects the type of functions each computational model can best perform. Table 1 compares a Von Neumann computer with a biological neural system. Figure 3 shows some common forms of transformation functions.
Table 1 Comparison between Von Neumann Computer and Biological Neural System.
Von Neumann Computer Biological Neural System
Rule - based Rule-less or Example-based
Symbolic Distributed
Serial Parallel
Discrete Continuous
Problem solving Pattern recognition
Psychology Bio-physiology
Cognitive Behavioral
Structural Functional
Logical Intuitive
Domain specific Domain free
Need rules Find rules
Much programming needed Little programming needed
Difficult to maintain Easy to maintain
Not fault tolerant Fault tolerant
Need human expert Need only data base
Rigid logic Need only data base
Require re-programming Adaptive system
[Figure 3: common forms of transformation functions (b. continuous functions), plotted against N:
Logistic: f(N) = 1 / (1 + e^-(N - θ)), where θ is the threshold
Sine wave: g(N) = (1 - e^-N) / (1 + e^-N)
Hyperbolic tangent: tanh(N) = (e^N - e^-N) / (e^N + e^-N)]
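The three transformation functions of Figure 3 can be written out directly. This is an illustrative sketch (the function names are mine); note that the logistic output lies in (0, 1) while the other two lie in (-1, 1).

```python
import math

def logistic(N, theta=0.0):
    """Logistic sigmoid with threshold theta; output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(N - theta)))

def bipolar_sigmoid(N):
    """The 'sine wave' form of Figure 3; output in (-1, 1)."""
    return (1.0 - math.exp(-N)) / (1.0 + math.exp(-N))

def tanh_fn(N):
    """Hyperbolic tangent written from its exponential definition."""
    return (math.exp(N) - math.exp(-N)) / (math.exp(N) + math.exp(-N))
```

All three squash an unbounded weighted sum N into a fixed interval, which is what lets a node's output be fed on to the next layer.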
The learning rate is the constant of proportionality which governs the rate at which the weights may be changed. A high learning rate corresponds to rapid learning, which may push the training towards a local minimum or cause oscillation. In turn, when applying small learning rates, the time to reach the global minimum will be considerably increased (Khan et al., 1993). The learning rates for each layer of the same network can be different.
The remedy for the problem of choosing the learning rate is to apply a momentum factor, which is multiplied by the previous weight change so that, while the learning rate is controlled, the changes are still rapid (Khan et al., 1993). The role of the momentum term is to smooth out the weight changes, which helps to protect network learning from oscillation (Anderson et al., 1993). A rule of thumb is that the learning rate for the last hidden layer should be twice that of the output layer. If there are no connections that jump layers, the learning rate for each earlier hidden layer should be twice that of the hidden layer that follows it (Klimasauskas, 1993).
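The weight update with a learning rate and momentum factor can be sketched as follows, assuming the gradient of the error with respect to the weight is already available; the names eta and alpha for the learning rate and momentum factor are conventional, not taken from this chapter.

```python
def update_weight(w, gradient, prev_delta, eta=0.1, alpha=0.9):
    """One weight-update step with momentum:
    delta_w = -eta * dE/dw + alpha * (previous delta_w).
    Returns the new weight and the delta to carry into the next step."""
    delta = -eta * gradient + alpha * prev_delta
    return w + delta, delta
```

Because each step reuses a fraction of the previous change, consecutive gradients that point the same way accelerate learning, while alternating gradients partly cancel, which is the smoothing effect described above.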
5. Training
5.1 Definitions
The term "training" or "learning" can be one of, or a combination of the following definitions:
1) Training means a process whereby error is used to modify the weights so that the
network gives a more correct answer the next time (Klimasauskas, 1993).
2) Learning is a mechanical process which may use decision trees, called explanation trees, to provide decision rules (Adeli and Yeh, 1990).
3) Learning is the process whereby the ANN learns from its mistakes. It usually involves three tasks: 1) compute the outputs; 2) compare the outputs with the desired outputs; and 3) adjust the weights and repeat the process (Medsker et al., 1993).
In this research, the two terms "training" and "learning" are used interchangeably. Training (or learning) is the process that begins with randomly initialized weights and biases. It deals with splitting the samples prior to feeding them to the networks, the algorithm used for minimizing the system error, and the criteria for stopping training.
There are two categories of training (Lippmann, 1987; Smith, 1993; Kireetoh, 1995). First, supervised training requires pairs of an input vector with a target vector representing the desired output, called training pairs. When the input vector is applied, the output is calculated and compared to the corresponding target vector. The difference is fed back through the network, and the weights are adjusted according to an algorithm that tends to minimize the error. Second, unsupervised training was introduced by Kohonen (1984) and is a far more plausible model of training in biological systems. A target vector is not required for the outputs; therefore, no comparison to a predetermined ideal response is made. The training process extracts the statistical properties of the training set and groups similar vectors into classes; applying a vector from a given class to the input will produce a specific output vector. There is no way to determine, prior to training, which specific output pattern will be produced by a given input vector class. Hence, the outputs of such a network must generally be transformed into a comprehensible form subsequent to the training process.
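The supervised cycle described above (compute the output, compare it with the target, feed the difference back to adjust the weights) can be illustrated with a single-node, perceptron-style sketch. This simplification uses a hard threshold rather than the sigmoid and back-propagation discussed elsewhere in this chapter; the function name and parameters are illustrative.

```python
def train_supervised(pairs, n_inputs, eta=0.5, epochs=20):
    """Train a single threshold node on (input vector, target) training pairs.
    Each pass: compute the output, compare with the target, and adjust the
    weights in proportion to the error."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, target in pairs:
            output = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0.0
            error = target - output          # difference fed back
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
            b += eta * error
    return w, b
```

Trained on the training pairs for the logical AND function, the node learns weights that reproduce every target, showing how error feedback drives the weight adjustment.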
Smith (1993) summarized five ways in which the power of back-propagation can deliver
significant benefits as follows.
1) Using back-propagation may reduce the cost of building the model as it allows the
user to substitute machine time or computer time for person time.
2) Back-propagation may produce a better model if two conditions are met. First, the
form of the relationship between inputs and desired outputs shall be more complex
than the form of the function that is imposed on the model by the conventional tool.
Second, the sample is large enough in order to permit it to find the relationships
underlying the noise in the data.
3) Using back-propagation provides assurance that the model is as good as it can be since
the cost of finding out or time required for knowing the complexity of the problem
may be prohibitive.
4) Back-propagation provides an opportunity to build a single model with multiple outputs, which is not possible with conventional techniques.
5) There are advanced forms of back-propagation with capabilities that are not found in
conventional methods.
Despite the universal approximation properties, back-propagation networks suffer from four
main problems. The first problem is that network structuring is a versatile, intuitive, and highly
solution-dependent trial-and-error task. The second is that the algorithm is slow in training and
convergence is very sensitive to the initial set of weights. The third is that training can be
trapped in local minima, or become paralysed. The fourth is that the design of an optimum network configuration for a given problem is a non-guided, trial-and-error process that does not guarantee adequate generalization. Some simple techniques and heuristics to improve back-propagation performance are: 1) divide the problem into several sub-problems and conquer them with appropriate techniques; 2) use simple ANNs with sufficient flexibility; and 3) use an Expert System to build an explanation facility for the ANNs, or vice versa (Hegazy and Moselhi, 1994).
However, an over-trained network will occasionally result in overfitting. Overfitting means the network can converge and yield a minimum or desired error on the training samples but cannot generalize well when validated with the testing sample. Smith (1993) suggested spotting over-fitting by watching the error on the test sample. The weights that produce the lowest error on the test sample would be used for the model.
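Smith's suggestion of watching the test-sample error amounts to a simple early-stopping rule, sketched below; the function and its arguments are hypothetical, assuming the test error and a snapshot of the weights are recorded at each epoch.

```python
def best_epoch_weights(test_errors, weight_history):
    """Pick the epoch with the lowest test-sample error and return its
    weights: training past that epoch is treated as over-fitting."""
    best = min(range(len(test_errors)), key=test_errors.__getitem__)
    return best, weight_history[best]
```

Note that the weights kept are not those of the final epoch: once the test error starts rising while the training error keeps falling, the earlier snapshot is the better model.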
6. Samples
Samples mean the known or pre-determined inputs and outputs which are used for training and testing networks. Yeh et al. (1993) suggested three possible sources of samples: 1) experts' questionnaires; 2) historical records; and 3) simulations. Samples are always divided into two sets: training and testing. The training set is the portion of the samples extracted from the whole set; it is used to derive the classification algorithm, while the test set is the remaining patterns, which are used to test the classification algorithm. The training set should be chosen so that it represents the likelihood of each outcome equally. Carpenter and Barthelemy (1992) stated that it is not exactly true that neural networks can be trained with fewer training pairs than other types of approximations. Even though the network approximation can match the exact function at the design points, this does not guarantee that a unique approximation over the region of interest will be achieved. The term "underdetermined neural networks" means networks having fewer training pairs than the number of weights and biases associated with the network. This leads to inadequate approximation: such networks can be trained to duplicate the exact function at the design points, but the approximations thus obtained are not unique. Klimasauskas (1993) recommended that
at least five examples should be provided for each weight in training the network. By contrast, Bahrami
(1994) suggested that the number of training samples should be approximately ten times the number of
weights.
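The two rules of thumb (five or ten examples per weight) can be combined with a count of the weights and biases of a fully connected feed-forward network. This sketch assumes the architecture is given as a layer-size list such as [8, 5, 1]; the function names are illustrative.

```python
def n_weights(layers):
    """Count weights plus biases for a fully connected feed-forward net.
    Each layer of size b has (a + 1) * b parameters, where a is the size
    of the preceding layer and the +1 is the bias."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))

def required_samples(weight_count, examples_per_weight=5):
    """Minimum training examples: 5 per weight (Klimasauskas, 1993)
    or about 10 per weight (Bahrami, 1994)."""
    return weight_count * examples_per_weight
```

An 8-5-1 network has 51 parameters, so the two rules call for roughly 255 to 510 training examples, which shows how quickly sample requirements grow with network size.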
Smith (1993) said that samples for training and testing may be limited by the computer's memory or
storage. If all the data is held in memory simultaneously during training, the size of computer's memory
may be a significant constraint. On the other hand, using disk storage permits a much larger sample but
is much slower because the same data have to be read from disk, epoch after epoch. Increasing the sample size does not increase the training time proportionately, since larger samples require fewer training epochs. The basic rule, therefore, is to use the largest sample available, so long as it fits in the computer's memory or storage. In this study, the input and output matrices are embedded in the computer's memory or program, as they do not significantly affect the computer time.
7. Testing
7.1 Definitions
Another process in neural network modelling is called testing or classifying. It can be one of or
a combination of the following definitions.
1) Testing is to determine how well a network has captured the nature of a function. It is
to validate the network on additional samples that are not used in training the network,
called the testing set. The network which yields the best performance on the validation samples would be the most accurate model among the networks that are trained all the way to convergence on the training samples (Smith, 1993).
2) Testing is a phase to examine the performance of the network by using the derived
weights. It is to measure the ability of the network to classify the testing samples
correctly (Medsker et al., 1993).
3) Testing is a process by which the testing samples are used to determine how well the
network performs on data it has not seen before during training. A properly built and
trained network will exhibit similar levels of performance on both the training and
testing sets. If performance differs widely, appropriate corrective action should be
taken to the architecture, composition, or size of the training and testing sets
(Klimasauskas, 1993).
The testing process used in this research conforms to all the definitions given above. It deals with feeding the remaining samples to the trained networks to validate their generalization capability. Among the successfully trained networks, the best network is the one that yields the minimum error when validated by the test samples.
Another method uses all samples as both the learning and testing sets. All the independent and dependent variables are used in training the network. When the network error reaches the threshold value, the network is considered trained. All the independent variables are then given to the trained network again to test its predictive capabilities. The output produced by the trained network is compared against the actual dependent variables. Similarly, the error of the system is calculated, and the result is used to indicate the predictiveness of the network. Both methods have advantages and disadvantages.
The former method needs a considerable number of training samples so that effective learning can be achieved. Moreover, the split of the data into two sets may not be truly random and can introduce biases in the data which distort the model. The latter method uses all samples to train the network, which would significantly improve the learning process, but the network may memorize relationships rather than truly develop an analytical model.
This dissertation split the samples into two equal sets. One is used for training while the other is used for testing the networks.
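The equal split used in this dissertation can be sketched as follows; the seeded shuffle is my own addition, included only to make the split reproducible.

```python
import random

def split_samples(samples, seed=0):
    """Shuffle the samples and split them into two equal sets:
    one for training, one for testing."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]
```

Shuffling before splitting guards against the ordering bias mentioned above, where a non-random split distorts the model.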
8. Outputs
Representing the dependent variable in a several class problem as a single quantitative output is
straightforward and understandable. The computational time will be reduced when there is only one
output node. The single output is easier to interpret and apply. A dependent variable could be less
variable, indicating what category an example belongs to, i.e. a binary representation can also be used
for dependent class variable. There is an intuitive appeal to using the value 1 and 0 to represent on and
off.
When the logistic sigmoid function is used, the actual outputs produced by a network are continuous-valued quantities. Their ranges are limited by the bounds of the function, i.e. 0 and 1 are not values the output nodes can actually produce; they can only approach these bounds. If the network is trained with these target values, it will continue pushing its weights toward extreme values until training is stopped. This makes over-fitting difficult to control.
In learning situations, the values of 0.9 and 0.1 are sometimes used for specifying binary target output values, and hence the outputs of a set of data should be scaled to within this range (Khan et al., 1993; Smith, 1993). The outputs could also be class variables; they would represent the dependent variables according to the way the target outputs represent them. Representing the dependent variables is easier than representing the independent variables as inputs, as there are few choices available. They do not greatly affect the accuracy of the network or the training time (Smith, 1993). A single network can be designed to predict more than one output without any special problem. The dependent variables can include both quantitative and class variables (Smith, 1993).
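Scaling target outputs into the [0.1, 0.9] range suggested above is a simple linear mapping; the function name and default arguments are illustrative.

```python
def scale_target(value, lo, hi, new_lo=0.1, new_hi=0.9):
    """Linearly rescale a target from its original range [lo, hi] into
    [0.1, 0.9], so sigmoid output nodes can actually reach the targets
    instead of being pushed toward unreachable 0 and 1."""
    return new_lo + (value - lo) * (new_hi - new_lo) / (hi - lo)
```

The endpoints of the original range map to 0.1 and 0.9, keeping every target safely inside the open interval the sigmoid can produce.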
9. Advantages of ANNs
ANNs have interesting properties and characteristics. They can perform higher-level human tasks such as decision making, planning, scheduling, natural language generation and understanding, visual pattern recognition, diagnosis, classification, problem solving, and designing (Adeli, 1981). They are based on pattern recognition (Hawley et al., 1993; Kimoto et al., 1993), which is effectively applied for: 1) classifying (Hawley et al., 1993; Klimasauskas, 1993); 2) associative memory; and 3) clustering. Klimasauskas (1993) remarked that ANNs can also improve the performance of several existing technologies in modelling, forecasting, and signal processing. ANNs offer certain advantages over traditional rule-based systems, i.e. conventional programming and knowledge-based expert systems. Figure 4.5 shows the characterization of neural networks and rule-based systems.
Chu et al. (1990) proposed an improved pocket learning algorithm for neural networks to increase the learning speed of back-propagation learning and the pocket algorithm. They pointed out two main problems for which solutions have not yet been determined: 1) how many hidden cells are needed in the neural network to learn successfully; and 2) how to shorten the time of learning.
Karunanithi et al. (1994) used an adaptive neural network with a constructive algorithm called
cascade-correlation. This dynamic expansion of the network continues until the problem is
successfully learned. Thus, the cascade-correlation algorithm automatically constructs a
suitable network architecture for a given problem. Further, it has more consistency in solving
problems and provides faster learning than the standard back-propagation learning algorithm.
Adeli (1995) stated that the convergence speed of neural network learning models is slow, perhaps requiring several hours or even days of computer time on a conventional serial workstation. The number of iterations for learning an example is often of the order of thousands. The convergence rate is highly dependent on the choice of the values of the learning and momentum ratios in the algorithm, and the proper values of these two parameters depend on the type of problem.
Bahrami (1995) pointed out the different methods of reducing the number of weights in a
network while at the same time retaining the capability of solving the problem, e.g. pruning
and weight sharing. Weight sharing makes use of local connections with identical weights such
that individual nodes process only a local region of the input.
Adeli (1996) commented on increasingly powerful computers combined with novel computing paradigms in the next century. They will have a profound impact on architecture, engineering, and construction through the automation of complex and large-scale projects. He focused on several emerging computing paradigms, e.g. neuro-computing, parallel processing, and distributed computing. As true intelligence is associated with learning, most neural network research has been done in the area of machine learning. He stated that multi-processing capabilities were presently limited to high-performance parallel machines and supercomputers. He hoped that they would be widely available on desktop machines by the year 2000, which would be helpful for parallel processing and distributed computing.
Li (1996) discussed research issues in developing explanatory functionality for ANN-based cost models. He described a prototype model for an ANN-based cost estimation system. The system can generate rules and inference tracks to form explanations. He explained the possibility of developing a rule-extraction algorithm which could be added to the ANN-based cost model. The rule-extraction algorithm is a machine learning technique which exploits and extracts rules from the inter-relationships of a trained ANN-based cost model.
ANNs can be applied in pattern recognition. Chen and Wang (1990) used a multi-layered perceptron to identify and correct spelling errors. Berry and Trigueiros (1993) applied ANNs to the extraction of knowledge from accounting reports, which is a kind of classification study. Kireetoh (1995) presented an application of a BP network with one hidden layer to recognize and identify handwritten Thai numerals (0 to 9).
ANNs can be applied to diagnosing problems. Yeh et al. (1993) built a combined knowledge-based
expert system (KBES) with an artificial neural network (ANN) for diagnosing piles. Szewczyk and
Hajela (1994) used neural networks to detect damage in structural systems.
A number of studies use ANNs in decision making and optimization. Tseng et al. (1990) used a Hopfield network to solve constrained task allocation problems. Wang and Tsai (1990) used a Hopfield network with a time-varying energy function to solve the travelling salesman problem. Kamarthi et al. (1992) used a two-layered back-propagation network for selecting vertical formwork systems for a given building site. Murtaza and Deborah (1994) used a neural network with the Kohonen algorithm, i.e. unsupervised learning, for decision making on construction modularization. Soemardi (1996) used two fuzzy neural networks for solving group decision making in selecting a wall system under multiple criteria, in an attempt to minimize the effort of bringing together all the parties involved.
ANNs can also be applied to predicting or forecasting. Karunasekera (1992) used a simulated-annealing
neural network to classify remotely sensed data. Siang (1992) applied the back-propagation method to
forecasting monthly water quality, i.e. temperature, pH, conductivity and water flow. Anderson et al.
(1993) used a back-propagation network to predict bi-linear moment-rotation characteristics for minor-
axis connections. Hawley et al. (1993) applied ANNs to financial decision making. Kimoto et al.
(1993) used modular neural networks for stock market prediction. Raghupathi et al. (1993)
applied an ANN to predict bankruptcy. Salchenberger et al. (1993) applied a back-propagation network
to predict thrift failures. Surkan and Singleton (1993) used a multi-layered neural network to improve
bond rating. Williams (1993) developed back-propagation networks for predicting one-month and
six-month changes in a construction cost index. Wu and Lim (1993) applied multi-layered feed-forward
neural networks to correlate the maximum scour depth at a spur dike. Chao and Skibniewski (1994) used
a neural network and an observation-data-based approach to estimate construction operation productivity.
Gagarin et al. (1994) applied neural networks to the problem of determining truck attributes, i.e.
velocity, axle spacings and axle loads, purely from strain responses taken from the structure over which
the truck was travelling. Hegazy and Moselhi (1994) used back-propagation artificial neural networks
to develop an optimum markup estimation model that derived solutions to new bid situations. Li (1996)
used an artificial neural network to model construction cost estimation. McKim et al. (1996) used a
neural network to predict the effectiveness of construction firms. Pezeshk et al. (1996) used a neural
network to interpret borehole geophysical and formation well logs. Vaziri (1996) used a neural network
model to predict monthly carbon dioxide concentration. Elazouni et al. (1997) used ANNs to estimate
the required resources of concrete silo walls at the conceptual design stage.
ANNs can be applied to design, planning, and management. Mo (1993) classified automated systems
in structural design into four basic types: 1) traditional computer-aided design; 2) database
management systems; 3) expert systems; and 4) neural networks. Mawdesley and Carr (1993)
investigated the possibility of using artificial neural networks to produce Project Planning Networks
(PPNs) to compensate for the shortage of skilled planners and cope with the ever-increasing
complexity of projects. Chua et al. (1997) used ANNs to identify the key management factors
affecting budget performance in a project.
13. Summary
ANNs have been reviewed. A number of studies clearly show their potential, capabilities,
advantages, and applications, particularly in optimization, decision making, and forecasting. ANNs
have a structure similar to the cost models described in the last chapter, i.e. they consist of input,
output, and a transformation of data. Moreover, ANNs are distribution-free models that need no prior
knowledge of the statistical distribution of the data. They out-perform the currently used empirically
and statistically based techniques. These properties conform to the needs of developing an approach
and tool for pre-design construction cost and duration estimating, in the hope that ANNs can be used
in parallel with, or partially replace, the heuristics and experience of estimating experts. However,
there are some minor problems in ANN modelling of which we must be aware, so the modelling
process must be done carefully. Good ANN models should not be time-consuming or costly to build.
They should need only the major inputs, a simple topology, and a simple learning paradigm. The
outputs should be easy to understand or interpret, insofar as an explanation cannot be provided by the
network itself. Methodology and development of the ANNs will be discussed in the following chapters.
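The input-transformation-output structure summarized above can be sketched as the forward pass of a one-hidden-layer feed-forward network with a sigmoid transformation function. The code below is only an illustration of this structure; the weights, node counts, and Python formulation are assumptions for the example and do not come from any of the models cited in this chapter.

```python
import math

def sigmoid(x):
    # Logistic transformation function commonly used in back-propagation networks
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # Each hidden node: weighted summation of inputs plus a bias
    # (stored as the last weight), followed by the transformation function.
    hidden = [sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
              for ws in w_hidden]
    # Each output node applies the same summation-and-transformation
    # to the hidden-layer activations.
    return [sigmoid(sum(w * h for w, h in zip(ws[:-1], hidden)) + ws[-1])
            for ws in w_output]

# Illustrative topology: 2 inputs -> 2 hidden nodes -> 1 output.
# Weight values are arbitrary; in practice they would be set by training.
w_hidden = [[0.5, -0.4, 0.1], [0.3, 0.8, -0.2]]
w_output = [[1.0, -1.0, 0.05]]
print(forward([0.6, 0.9], w_hidden, w_output))
```

With a sigmoid output node, the prediction always lies between 0 and 1, so in a cost or duration model the target variable would first be scaled into that range.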
References
Adeli, H., 1988. Expert Systems in Construction and Structural Engineering, Chapman and Hall, London.
Adeli, H., 1992. Computer-aided Engineering in The 1990's. The International Journal of Construction
Information Technology, Proc. Paper, 1, 1: 1-10.
Adeli, H., 1995. Knowledge Engineering. Archives of Computational Methods in Engineering; State of the Art
Reviews, 2, 4: 51-68.
Adeli, H., 1996. Computing for the Year 2000. In Lye, H.K., Sang, C.Y., and Adeli, H., Editors. Computing &
Information Technology for Architecture, Engineering & Construction, Proc.: 1-5.
Adeli, H., and Yeh, C., 1990. Explanation-Based Machine Learning in Engineering Design, Engineering
Applications of Artificial Intelligence, 3, 2: 127-137.
Adeli, H., and Hung, S.L., 1995. Machine Learning; Neural Networks, Genetic Algorithms, and Fuzzy Systems,
John Wiley & Sons Inc., New York.
Aleksander I., and Morton, H., 1990. An Introduction to Neural Computing, 1st Ed., Chapman and Hall, London.
Al-Tabtabai, H., Kartam, N., Flood, I., and Alex, A.P., 1997. Expert Judgment in Forecasting Construction Project
Completion, Engineering Construction and Architectural Management, 4, 4: 271-293.
Anderson, D., Hines, E.L., Arthur, S.J., and Eiap, E.L., 1993. Application of Artificial Neural Networks to
Prediction of Minor Axis Steel Connections. In Topping, B.H.V., and Khan, A.I., Editors, Neural Networks and
Combinatorial Optimization in Civil and Structural Engineering, Civil-Comp Press: 31-37.
Arciszewski, T., and Ziarko, W., 1992. Machine Learning in Knowledge Acquisition. In Arciszewski, T., and
Recommended Readings
http://palaeo-electronica.org/2000_2/neural/neural.htm
http://www.stowa-nn.ihe.nl/ANN.htm
http://www.akri.org/cognition/machmemmod.htm
http://www.alanturing.net/turing_archive/pages/Reference%20Articles/what_is_AI/What%20is%20AI10.html
http://www.slais.ubc.ca/courses/libr500/02-03-wt1/www/K_MARTIN/1st_page.htm
http://www.aiexplained.com/apps/networks.html