
Artificial Neural Networks

Dr. Anupam Shukla, Professor


ABV-IIITM, Gwalior

Biological (MOTOR) Neuron

Neurons and Synapses

The basic computational unit in the nervous system is the nerve cell, or
neuron. A neuron has:
Dendrites (inputs)
Cell body
Axon (output)

Neurons and Synapses


A neuron receives input from other neurons (typically many thousands).
Inputs sum (approximately). Once input exceeds a critical level, the
neuron discharges a spike - an electrical pulse that travels from the body,
down the axon, to the next neuron(s) (or other receptors). This spiking
event is also called depolarization, and is followed by a refractory period,
during which the neuron is unable to fire.
The axon endings (Output Zone) almost touch the dendrites or cell body
of the next neuron. Transmission of an electrical signal from one neuron
to the next is effected by neurotransmitters, chemicals which are released
from the first neuron and which bind to receptors in the second. This link
is called a synapse. The extent to which the signal from one neuron is
passed on to the next depends on many factors, e.g. the amount of
neurotransmitter available, the number and arrangement of receptors,
amount of neurotransmitter reabsorbed, etc.

The Biological Neuron

The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information.
Each terminal button is connected to other neurons across a small gap called a synapse.


The Biological Neuron

A neuron's dendritic tree is connected to a thousand neighbouring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.

Neurons vs units

A real neuron is far more complex than our simplified model unit: its behaviour involves chemistry, biochemistry, and arguably even quantum effects.

Biological neuron

A neuron has:
- A branching input structure (the dendrites)
- A branching output structure (the axon)
The information circulates from the dendrites to the axon via the cell body.
The axon connects to the dendrites of other neurons via synapses.
- Synapses vary in strength
- Synapses may be excitatory or inhibitory

Biological inspiration

Figure: A biological neuron, showing the dendrites, the soma (cell body), and the axon.

Biological inspiration

Figure: Two connected neurons, labelling the axon, the dendrites, and the synapses.
The information transmission happens at the synapses.


Biological inspiration

Figure: A synapse between a presynaptic neuron and a postsynaptic neuron.

Interconnections in Brain

Brain Computation
The human brain contains about 10 billion nerve cells, or
neurons. On average, each neuron is connected to other
neurons through approximately 10,000 synapses.


Comparison between the brain and the computer (ANN)

Speed: the brain works on a time scale of a few milliseconds per neural event; an ANN on a computer works in a few nanoseconds, with massively parallel processing.
Size and complexity: the brain has about 10^11 neurons and 10^15 interconnections; the size and complexity of an ANN depend on the designer.
Storage capacity: the brain stores information in its interconnections (synapses), with no loss of memory; a computer stores information in contiguous memory locations, where loss of memory may sometimes happen.
Tolerance: the brain has fault tolerance; an ANN implementation has no fault tolerance - information gets disrupted when interconnections are disconnected.
Control mechanism: complicated in the brain, involving chemical processes in the biological neuron; simpler in an ANN.

NNs vs Computers

Digital Computers:
- Deductive reasoning: we apply known rules to input data to produce output.
- Computation is centralized, synchronous, and serial.
- Memory is packetted, literally stored, and location addressable.
- Not fault tolerant: one transistor goes and it no longer works.
- Exact.
- Static connectivity.
- Applicable if there are well-defined rules with precise input data.

Neural Networks:
- Inductive reasoning: given input and output data (training examples), we construct the rules.
- Computation is collective, asynchronous, and parallel.
- Memory is distributed, internalized, short term and content addressable.
- Fault tolerant, with redundancy and sharing of responsibilities.
- Inexact.
- Dynamic connectivity.
- Applicable if rules are unknown or complicated, or if data are noisy or partial.

Basic Models of ANN

- Interconnections
- Learning rules
- Activation function

Types of Learning
Supervised learning : In this kind of learning, both the inputs and the
outputs are well determined and supplied to the training algorithm.
Hence whenever an input is applied, we can calculate the error. We
try to adjust the weights in such a way that this error is reduced.

Types of Learning
Unsupervised learning: In this type of learning, the target outputs are unknown. The inputs are applied, and the system is adjusted based on these inputs only: either the weights supporting the observed patterns are strengthened, or the dissipative (non-contributing) nodes are weakened. In either case, the system changes according to the inputs.

Types of Learning
Reinforcement learning: This type of learning is based on the reinforcement
process. In this system, the input is applied. Based on the output, the system
either gives some reward to the network or punishes the network. In this
learning technique, the system tries to maximize the rewards and minimize
the punishment. The basic block diagram is given below.

Figure: The basic block diagram of reinforcement learning.

ASSOCIATION OF BIOLOGICAL NET WITH ARTIFICIAL NET

Models of Neuron
McCulloch-Pitts Model:
In the McCulloch-Pitts model the activation (x) is given by a weighted sum of its M input values (a_i) and a bias term (θ). The output signal (s) is typically a nonlinear function f(x) of the activation value x. The following equations describe the operation of an MP model:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x)

Figure: The MP neuron - inputs a_1 ... a_M with fixed weights w_1 ... w_M feed the summing part; the activation x passes through the output function f(.) to give s = f(x).

Models of Neuron (contd)

A linear threshold function was used as the output function in the original MP model. In this model a binary output function is used with the following logic:
f(x) = 1, x > 0
f(x) = 0, x ≤ 0
A single-input, single-output MP neuron with a proper weight and threshold gives an output one unit of time later.
In the MP model the weights are fixed. Hence a network using this model does not have the capability of learning.
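To make the MP unit concrete, here is a minimal sketch in Python (an illustration added here; the AND-gate weights and threshold are assumptions, not from the slides):

```python
# A minimal sketch of a McCulloch-Pitts neuron (illustrative; weights and theta are assumed).
def mp_neuron(inputs, weights, theta):
    """Binary threshold unit: fires (returns 1) if the weighted sum exceeds the bias theta."""
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # activation x = sum(w_i * a_i) - theta
    return 1 if x > 0 else 0                                 # output s = f(x)

# Example: a 2-input MP neuron with fixed weights acting as an AND gate.
if __name__ == "__main__":
    for a in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(a, mp_neuron(a, weights=(1, 1), theta=1.5))
```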


Models of Neuron (contd)

Perceptron:
Rosenblatt's perceptron model for an artificial neuron consists of outputs from sensory units to a fixed set of association units, the outputs of which are fed to an MP neuron.
The association units perform predetermined manipulations on their inputs.
The main deviation from the MP model is that learning is incorporated in the operation of the unit.

Figure: The perceptron - sensory units feed association units A_1 ... A_M; their outputs a_1 ... a_M are weighted by adjustable weights w_1 ... w_M, summed (with threshold θ) in the summing unit, and passed to the output unit to give s = f(x).

Models of Neuron (contd)


The desired or target output (b) is compared with the actual binary output (s), and the error (δ) is used to adjust the weights.
The following equations describe the operation of the perceptron model of a neuron:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x)
Error: δ = b - s
Weight change: Δw_i = η δ a_i

where η is the learning rate parameter.
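A hedged sketch of one perceptron training step using the equations above (illustrative Python; the step activation and the learning rate value are assumptions):

```python
# Sketch of the perceptron learning rule: delta_w_i = eta * (b - s) * a_i
def step(x):
    return 1 if x > 0 else 0

def perceptron_train_step(weights, theta, inputs, target, eta=0.1):
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # activation
    s = step(x)                                              # actual binary output
    delta = target - s                                       # error = b - s
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, theta                                    # theta kept fixed, as in the slide's rule
```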



Models of Neuron (contd)

Adaline:
ADAptive LINear Element (ADALINE) is a computing model proposed by Widrow and is shown below.
The main distinction between Rosenblatt's perceptron model and Widrow's Adaline model is that, in the Adaline, the analog activation value (x) is compared with the target output (b).

Figure: The Adaline - inputs a_1 ... a_M with adjustable weights w_1 ... w_M feed the summing part; the activation value x passes through the output function f(.) to give the output signal s = f(x).

Models of Neuron (contd)

In other words, the output is a linear function of the activation value (x).
The equations that describe the operation of an Adaline are as follows:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x) = x
Error: δ = b - s = b - x
Weight change: Δw_i = η δ a_i

where η is the learning rate parameter.

This weight update rule minimizes the mean squared error (δ²), averaged over all inputs. Hence it is called the Least Mean Squared (LMS) error learning law.
This law is derived using the negative gradient of the error surface in the weight space. Hence it is also known as a gradient descent algorithm.
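A minimal sketch of the Adaline / LMS update described above, with a linear output s = x (illustrative Python; the names and the learning rate are assumptions):

```python
def adaline_train_step(weights, theta, inputs, target, eta=0.01):
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # analog activation
    delta = target - x                                       # error uses x itself, not a thresholded output
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, theta, delta ** 2                        # squared error, for monitoring convergence
```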


Transfer functions (contd)


When the threshold T is 0, the step function is called signum.


Transfer functions (contd)

The sigmoid

The sigmoid transfer function produces a continuous value in the range 0 to 1:

output_i = 1 / (1 + e^{-gain * activation_i})

The parameter gain affects the slope of the function around zero.

Transfer functions (contd)

The hyperbolic tangent
A variant of the sigmoid transfer function:

output_i = (e^{activation_i} - e^{-activation_i}) / (e^{activation_i} + e^{-activation_i})

It has a shape similar to the sigmoid (like an S), with the difference being that the value of output_i ranges between -1 and 1.
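For comparison, a small sketch of the three transfer functions discussed so far (illustrative Python; treating the gain as an explicit parameter is an assumption):

```python
import math

def step(x, threshold=0.0):
    """Binary threshold; with threshold 0 this is the signum-style step."""
    return 1.0 if x > threshold else 0.0

def sigmoid(x, gain=1.0):
    """Logistic sigmoid in (0, 1); gain controls the slope around zero."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def tanh(x):
    """Hyperbolic tangent in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```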


The Neuron Model

Figure: The neuron model - input values x_1 ... x_m are multiplied by weights w_1 ... w_m and combined with a bias by the summing function to produce the induced field, which is passed through the activation function φ(.) to produce the output.

The Neuron Model

Definition: a nonlinear, parameterized function with restricted output range:

y = f( w_0 + Σ_{i=1}^{n-1} w_i x_i )

Figure: A single neuron with bias weight w_0 and inputs x_1, x_2, x_3.

Multilayer feed forward network


Feedback network
When outputs are directed back as inputs to same or preceding layer nodes it
results in the formation of feedback networks


Single layer Feedforward Network


Adaptive Linear Neuron (ADALINE)

In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models they called ADALINE (Adaptive Linear Neuron) and MADALINE (Multilayer ADALINE). These models were named for their use of Multiple ADAptive LINear Elements.
MADALINE was the first neural network to be applied to a real-world problem. It is an adaptive filter which eliminates echoes on telephone lines.

ADALINE Model


ADALINE Network

Initialize
- Assign random weights to all links
Training (see the sketch below)
- Feed in known inputs in random sequence
- Simulate the network
- Compute the error between the input and the output (error function)
- Adjust the weights (learning function)
- Repeat until the total error falls below ε (a chosen threshold)
Thinking
- Simulate the network
- The network will respond to any input
- It does not guarantee a correct solution, even for trained inputs
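A hedged sketch of the initialize / train / think cycle above for a single Adaline unit (illustrative Python; the data format, learning rate, and stopping threshold are assumptions):

```python
import random

def train_adaline(samples, n_inputs, eta=0.05, tol=0.01, max_epochs=1000):
    """samples: list of (inputs, target) pairs. Returns trained weights and bias."""
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # Initialize: random weights
    bias = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                                     # Training
        random.shuffle(samples)                                     # feed known inputs in random sequence
        total_error = 0.0
        for inputs, target in samples:
            x = sum(w * a for w, a in zip(weights, inputs)) + bias  # simulate the network
            delta = target - x                                      # error function
            weights = [w + eta * delta * a for w, a in zip(weights, inputs)]  # learning function
            bias += eta * delta
            total_error += delta ** 2
        if total_error < tol:                                       # repeat until total error below epsilon
            break
    return weights, bias

def think(weights, bias, inputs):                                   # Thinking: respond to any input
    return sum(w * a for w, a in zip(weights, inputs)) + bias
```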

Multilayer Perceptron

Figure: A multilayer perceptron - input signals (external stimuli) enter at the input layer, pass through layers of adjustable weights, and produce output values at the output layer.

Layers in Neural Network

The input layer:
- Introduces input values into the network.
- No activation function or other processing.
The hidden layer(s):
- Perform classification of features.
- In principle, two hidden layers are sufficient to solve any problem; richer feature structure implies more layers may work better.
The output layer:
- Functionally just like the hidden layers.
- Outputs are passed on to the world outside the neural network.

BACK PROPAGATION Algorithm model


Multi-layer Neural Network Employing Back-Propagation Algorithm

To illustrate this process, let us take the example of a three-layer neural network with two inputs and one output, which is shown in the picture below:

Example (contd)
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron activation function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.

Example (contd)
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z.
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.


Example (contd)
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.

Example (contd)
Propagation of signals through the output layer.

In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal δ of the output layer neuron.

Example (contd)
It is impossible to compute the error signal for internal neurons directly, because the output values of these neurons are unknown. The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.

Example (contd)
The weight coefficients wmn used to propagate the errors back are equal to those used when computing the output value. Only the direction of data flow is changed.

Example (contd)
When the error signal for each neuron has been computed, the weight coefficients of each neuron input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.


Example (contd)

Coefficient η affects the network teaching speed. There are a few techniques to select this parameter. The first method is to start the teaching process with a large value of the parameter; while the weight coefficients are being established, the parameter is gradually decreased. The second, more complicated, method starts teaching with a small parameter value: during the teaching process the parameter is increased as the teaching advances and then decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients.
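A rough sketch of the two schedules for the coefficient η described above (illustrative Python; the concrete rates and epoch counts are assumptions):

```python
def decreasing_eta(epoch, eta0=0.5, decay=0.01):
    """Method 1: start with a large eta and decrease it gradually as the weights settle."""
    return eta0 / (1.0 + decay * epoch)

def warmup_then_decay_eta(epoch, eta_small=0.01, eta_peak=0.5, warmup_epochs=50, total_epochs=500):
    """Method 2: start small (to settle the weight signs), increase while teaching advances,
    then decrease again in the final stage."""
    if epoch < warmup_epochs:
        return eta_small + (eta_peak - eta_small) * epoch / warmup_epochs
    remaining = max(total_epochs - warmup_epochs, 1)
    return eta_small + eta_peak * max(0.0, 1.0 - (epoch - warmup_epochs) / remaining)
```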

Back-Propagation Training Algorithm

The back-propagation training algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward perceptron and the desired output. It requires continuous, differentiable non-linearities. The following assumes a sigmoid logistic non-linearity is used, where the function f is

f(α) = 1 / (1 + e^{-(α - θ)})

with θ a threshold (offset).

Architecture of Back-Propagation Algorithm

Figure: A feed-forward network with input layer x_1 ... x_p, hidden layer h_1 ... h_m, and output layer y_1 ... y_n.

The BPA Algorithm

Step 1: Initialize Weights and Offsets
Set all weights and node offsets to small random values.
Step 2: Present Input and Desired Outputs
Present a continuous-valued input vector x_0, x_1, ..., x_{P-1} and specify the desired outputs d_0, d_1, ..., d_{N-1}. If the net is used as a classifier, then all desired outputs are typically set to zero except for the one corresponding to the class the input is from; that desired output is 1. The input could be new on each trial, or samples from a training set could be presented cyclically until the weights stabilize.

The BPA Algorithm (contd)

Step 3: Calculate Actual Outputs
Use the sigmoid nonlinearity from above and the formulas to calculate the outputs y_0, y_1, ..., y_{N-1}.
Step 4: Adapt Weights
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust weights by
w_ij(t+1) = w_ij(t) + η δ_j x_i
In this equation w_ij(t) is the weight from hidden node i or from an input to node j at time t, x_i is either the output of node i or an input, η is a gain term, and δ_j is an error term for node j. If node j is an output node, then
δ_j = y_j(1 - y_j)(d_j - y_j)
where d_j is the desired output and y_j the actual output of node j. If node j is an internal hidden node, then
δ_j = x_j(1 - x_j) Σ_k δ_k w_jk
where the sum is over all nodes k in the layer above node j.

The BPA Algorithm (contd)


Internal node thresholds are adapted in a similar manner by assuming they are connection weights on links from auxiliary constant-valued inputs.
Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by
w_ij(t+1) = w_ij(t) + η δ_j x_i + α (w_ij(t) - w_ij(t-1)),
where 0 < α < 1.

Step 5: Repeat steps 2 to 4.
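A minimal sketch of the Step 4 update with the momentum term, together with the two error-term formulas (illustrative Python; the values of η and α are assumptions):

```python
def update_weight(w_now, w_prev, delta_j, x_i, eta=0.3, alpha=0.9):
    """w_ij(t+1) = w_ij(t) + eta * delta_j * x_i + alpha * (w_ij(t) - w_ij(t-1))."""
    return w_now + eta * delta_j * x_i + alpha * (w_now - w_prev)

def delta_output(y_j, d_j):
    """Error term for an output node: delta_j = y_j * (1 - y_j) * (d_j - y_j)."""
    return y_j * (1.0 - y_j) * (d_j - y_j)

def delta_hidden(x_j, deltas_above, weights_above):
    """Error term for a hidden node: delta_j = x_j * (1 - x_j) * sum_k delta_k * w_jk."""
    return x_j * (1.0 - x_j) * sum(d * w for d, w in zip(deltas_above, weights_above))
```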


Derivation of Back-Propagation Algorithm

Notation:
x_k = kth component of the input vector presented to the input layer.
h_j = net input of hidden layer unit j.
v_j = output of hidden layer unit j.
g_i = net input of output layer unit i.
y_i = output of output layer unit i.
w_ij = weight connecting the jth neuron of the hidden layer to the ith neuron of the output layer.
w_jk = weight connecting the kth neuron of the input layer to the jth neuron of the hidden layer.

Derivation (contd)
Input of hidden layer:
h_j = Σ_{k=1}^{p} w_jk x_k    ...(1)
Output of hidden layer:
v_j = f(h_j) = 1 / (1 + e^{-h_j})    ...(2)

Derivation (contd)
Input of output layer:
g_i = Σ_{j=1}^{m} w_ij v_j    ...(3)
Output of output layer:
y_i = f(g_i) = 1 / (1 + e^{-g_i})    ...(4)

Derivation (contd)
Error function:
E(t) = 1/2 Σ_{i=1}^{n} (y_i^d - y_i)^2    ...(5)
Weight update:
w_ij(t+1) = w_ij(t) + Δw_ij(t)    ...(6)
Δw_ij(t) = -η ∂E(t)/∂w_ij(t)    ...(7)

Derivation (contd)
Updating weights between the output layer and the hidden layer:
∂E(t)/∂w_ij(t) = (∂E(t)/∂y_i) (∂y_i/∂w_ij(t))    ...(8)
From equations (4) and (5):
y_i = 1 / (1 + e^{-g_i}),  E(t) = 1/2 Σ (y_i^d - y_i)^2
∂E(t)/∂y_i = -(y_i^d - y_i)    ...(9)

Derivation (contd)
∂y_i/∂w_ij(t) = ∂/∂w_ij (1 + e^{-g_i})^{-1} = (∂y_i/∂g_i) (∂g_i/∂w_ij)
∂g_i/∂w_ij = ∂/∂w_ij ( Σ_{j=1}^{m} w_ij v_j ) = v_j

Derivation (contd)
∂y_i/∂g_i = -(-e^{-g_i}) / (1 + e^{-g_i})^2 = e^{-g_i} / (1 + e^{-g_i})^2
so that
∂y_i/∂w_ij(t) = [ e^{-g_i} / (1 + e^{-g_i})^2 ] v_j    ...(10)
Since y_i = 1 / (1 + e^{-g_i}) and (1 - y_i) = e^{-g_i} / (1 + e^{-g_i}),
y_i(1 - y_i) = e^{-g_i} / (1 + e^{-g_i})^2

Derivation (contd)
Substituting this value in eq (10):
∂y_i/∂w_ij(t) = y_i(1 - y_i) v_j    ...(11)
Substituting the values of eq (9) and (11) in eq (8):
∂E(t)/∂w_ij(t) = -(y_i^d - y_i) y_i(1 - y_i) v_j    ...(12)

Derivation (contd)
Define the output-layer error term
δ_i = y_i(1 - y_i)(y_i^d - y_i)    ...(13)
so that
∂E(t)/∂w_ij(t) = -δ_i v_j    ...(14)
Since Δw_ij(t) = -η ∂E(t)/∂w_ij(t),
Δw_ij(t) = η δ_i v_j    ...(15)
Hence the updated weight will be
w_ij(t+1) = w_ij(t) + η δ_i v_j    ...(16)

Derivation (contd)
Updating weights between the hidden layer and the input layer:
∂E(t)/∂w_jk(t) = Σ_{i=1}^{n} ∂E_i(t)/∂w_jk    ...(17)
             = Σ_{i=1}^{n} (∂E_i(t)/∂y_i) (∂y_i/∂w_jk(t))    ...(18)
                     (A)          (B)
A = ∂E_i(t)/∂y_i = -(y_i^d - y_i)    ...(19)
B = ∂y_i/∂w_jk(t) = (∂y_i/∂g_i)(∂g_i/∂v_j)(∂v_j/∂h_j)(∂h_j/∂w_jk)    ...(20)
                        (D)        (E)        (F)        (G)

Derivation (contd)
D = ∂y_i/∂g_i = y_i(1 - y_i)    ...(21)
E = ∂g_i/∂v_j = w_ij    ...(22)
F = ∂v_j/∂h_j = ∂/∂h_j (1 + e^{-h_j})^{-1} = v_j(1 - v_j)    ...(23)
G = ∂h_j/∂w_jk = x_k    ...(24)

Derivation (contd)
Substituting the values of equations (19), (20), (21), (22), (23), (24) into equation (18), we get
∂E(t)/∂w_jk(t) = Σ_{i=1}^{n} -(y_i^d - y_i) y_i(1 - y_i) w_ij v_j(1 - v_j) x_k
             = -v_j(1 - v_j) x_k Σ_{i=1}^{n} δ_i w_ij
where δ_i = y_i(y_i^d - y_i)(1 - y_i).
Defining the hidden-layer error term
δ_j = v_j(1 - v_j) Σ_{i=1}^{n} δ_i w_ij,
the change in weight between the hidden and input layers is:
Δw_jk(t) = η δ_j x_k
Hence the updated weight will be
w_jk(t+1) = w_jk(t) + η δ_j x_k
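Putting the derivation together, here is a hedged sketch of one back-propagation training step for a single-hidden-layer network, following equations (1)-(16) and the hidden-layer rule above (illustrative Python; the layer sizes, learning rate, and data are assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, y_desired, W_hid, W_out, eta=0.5):
    """One back-propagation step.
    W_hid[j][k] = w_jk (input k -> hidden j), W_out[i][j] = w_ij (hidden j -> output i)."""
    # Forward pass: equations (1)-(4)
    h = [sum(wjk * xk for wjk, xk in zip(row, x)) for row in W_hid]   # h_j
    v = [sigmoid(hj) for hj in h]                                     # v_j
    g = [sum(wij * vj for wij, vj in zip(row, v)) for row in W_out]   # g_i
    y = [sigmoid(gi) for gi in g]                                     # y_i

    # Output-layer error terms: delta_i = y_i (1 - y_i)(y_i^d - y_i), eq (13)
    delta_out = [yi * (1 - yi) * (yd - yi) for yi, yd in zip(y, y_desired)]

    # Hidden-layer error terms: delta_j = v_j (1 - v_j) * sum_i delta_i * w_ij
    delta_hid = [vj * (1 - vj) * sum(delta_out[i] * W_out[i][j] for i in range(len(W_out)))
                 for j, vj in enumerate(v)]

    # Weight updates: w_ij += eta * delta_i * v_j (eq 16) and w_jk += eta * delta_j * x_k
    for i in range(len(W_out)):
        for j in range(len(v)):
            W_out[i][j] += eta * delta_out[i] * v[j]
    for j in range(len(W_hid)):
        for k in range(len(x)):
            W_hid[j][k] += eta * delta_hid[j] * x[k]

    return 0.5 * sum((yd - yi) ** 2 for yd, yi in zip(y_desired, y))  # E(t), eq (5)
```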

The error landscape in a multilayer perceptron

For a given pattern p, the error Ep can be plotted against the weights to give the so-called error surface.
The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and maximum error found on peaks.
The generalised delta rule aims to minimise Ep by adjusting weights so that they correspond to points of lowest error.
It follows the method of gradient descent, where the changes are made in the steepest downward direction.
All possible solutions are depressions in the error surface, known as basins of attraction.

Variable Learning Rate

One solution for choosing the correct learning rate is to keep the learning rate as a variable.
A variable learning rate allows a system to be more flexible.
The concept of a variable learning rate is analogous to the frame rate while watching a movie. Suppose there is an action scene in your movie. You might consider watching the scene at a lower frame rate so as to get the complete details, or you may fast-forward through most of the other scenes because they do not appeal to you.

Momentum
The training algorithm is an iterative algorithm, which means that at each step, the algorithm tries to move in such a way that the total error is reduced.
In the error graph a deeper valley corresponds to a lower error, and hence to a better configuration of the ANN, than a shallower valley. The deepest valley is known as the global minimum, which has the least error in the entire surface. A shallower valley is called a local minimum.

Momentum
Because the training algorithm always tries to move in a direction that reduces the error, it can end up following a local minimum.
It can easily be seen that if it continued to move in the same direction, it would eventually attain the global minimum.
The momentum keeps pushing the training algorithm to continue moving in the previous direction, making it possible for the training algorithm to escape out of the local minimum.
The meaning of momentum in this case is analogous to the meaning of momentum in the physical world.
For example, a moving ball has momentum that keeps it moving in the same direction.

Stopping Condition
The algorithm stops according to the stopping condition. Normally one or more of the following criteria are used as stopping conditions:
Time: The algorithm may be stopped when the time taken to execute exceeds a threshold.
Epoch: The algorithm has a specified maximum number of epochs. Upon exceeding this number, the algorithm may be stopped.
Goal: The algorithm may be stopped if the error measured by the system reduces below a specific value. It may not be useful to continue training after this point.

Stopping Condition
Validating data: If the error on the validation data starts increasing, even while the error on the training data is still decreasing, it is better to stop further training.
Gradient: Gradient refers to the improvement of the performance, or the lowering of the error, between epochs. The algorithm may be stopped when this improvement becomes too small to be meaningful.
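A hedged sketch of how these stopping criteria might be combined in a training loop (illustrative Python; all thresholds are assumptions):

```python
import time

def should_stop(start_time, epoch, error, prev_error, val_error, prev_val_error,
                max_seconds=60.0, max_epochs=2500, goal=0.01, min_gradient=1e-6):
    if time.time() - start_time > max_seconds:     # Time
        return True
    if epoch >= max_epochs:                        # Epoch
        return True
    if error <= goal:                              # Goal
        return True
    if val_error > prev_val_error:                 # Validating data: validation error starts rising
        return True
    if abs(prev_error - error) < min_gradient:     # Gradient: improvement between epochs too small
        return True
    return False
```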

Learning difficulties in multilayer perceptrons - local minima

The MLP may fail to settle into the global minimum of the error surface and instead find itself in one of the local minima.
This is due to the gradient descent strategy followed.
A number of alternative approaches can be taken to reduce this possibility:
Lowering the gain term progressively
- Used to influence the rate at which weight changes are made during training
- Its value by default is 1, but it may be gradually reduced to slow the rate of change as training progresses

Learning difficulties in multilayer perceptrons (contd)

Addition of more nodes for better representation of patterns
- Too few nodes (and consequently not enough weights) can cause failure of the ANN to learn a pattern
Introduction of a momentum term
- Determines the effect of past weight changes on the current direction of movement in weight space
- The momentum term is also a small numerical value in the range 0-1
Addition of random noise to perturb the ANN out of local minima (see the sketch below)
- Usually done by adding small random values to the weights
- Takes the net to a different point in the error space, hopefully out of a local minimum
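A minimal sketch of the random-noise perturbation mentioned above (illustrative Python; the noise scale is an assumption):

```python
import random

def perturb_weights(weights, scale=0.01):
    """Add small random values to every weight to nudge the net out of a local minimum."""
    return [w + random.uniform(-scale, scale) for w in weights]
```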


Weight v/s Sum of Square Error


Flow Diagram Of Back-Propagation Algorithm.


Some advantages of ANNs

- Able to take incomplete or corrupt data and provide approximate results.
- Good at generalisation, that is, recognising patterns similar to those learned during training.
- Inherent parallelism makes them fault-tolerant: loss of a few interconnections or nodes leaves the system relatively unaffected.
- Parallelism also makes ANNs fast and efficient for handling large amounts of data.

ANN State-of-the-art overview

Currently neural network systems are available as:
- Software simulation on conventional computers (prevalent)
- Special-purpose hardware that models the parallelism of neurons
ANN-based systems are not likely to replace conventional computing systems, but they are an established alternative to the symbolic logic approach to information processing.
A new computing paradigm in the form of hybrid intelligent systems has emerged, often involving ANNs together with other intelligent system tools.

Network

The network was formed by trial and error. After numerous tries, we got the optimal results with the following network configuration, in which we varied only the number of neurons in the hidden layer:

- Number of hidden layers: 1
- Number of neurons in hidden layer: 20
- Activation functions of all neurons: tan-sigmoid
- Training method: gradient descent with momentum (traingdm)
- Learning rate: 0.1
- Momentum: 0.01
- Max epochs: 2,500
- Goal: 0.01

Note that the goal was not met, even upon reaching the maximum number of epochs during training.

Results
The results are further summarized in the table below.

Results of Time Forecasting Experiment

In this example, we used an ANN to solve a real-life problem and got good results.

Outputs
The system's output was the value of the series at the t-th time instant. We repeated the experiment for all values of t to predict/regenerate the complete sequence (starting from t = 15, because before that the required previous inputs were not available).

Figure: The time series forecasting experiment.

Conclusion

The ANN can be seen as an instrumental tool for solving virtually any
real-life problem.
ANNs are definitely the technology of the future.
In real-life systems, however, these play a great role in making a
robust system that gives a very high performance.
Designing a good ANN is more of an art than a technical concept.
Equally important is the selection of inputs and outputs.
Thus we need to think broadly about the feasibility of the selected
input and output parameters before even trying to train the system.
ANNs, if used cautiously, are an excellent means for solving
emerging industrial problems.

ANN applications
The main business application areas of ANNs are:
- Production (36%)
- Information systems (20%)
- Finance (18%)
- Marketing & distribution (14.5%)
- Accounting/Auditing (5%)
- Others (6.5%)

ANN applications in Finance

Application areas: Accounting, Finance, Human Resources, Management, Marketing
- Identify tax fraud
- Enhance auditing by finding irregularities

ANN applications in Finance

Expert System Resets Credit Limits Monthly
- Mimics the credit manager's decision making
- Scheduler prioritizes tasks: staff handles delinquent accounts
Benefits
- More specialized and personalized customer service
- Efficient credit department
- Increased sales
- Happy customers
- Happy customer service representatives

ANN applications in Finance

- Signature and bank note verification
- Mortgage underwriting
- Foreign exchange rate forecasting
- Country risk rating
- Bankruptcy prediction
- Customer credit scoring
- Credit card approval and fraud detection
- Corporate merger and takeover predictions
- Currency trading
- Stock and commodity selection and trading

Application (Continued.. )
- Credit card profitability
- Forecasting economic turning points
- Foreign exchange trading
- Bond rating and trading
- Pricing initial public offerings
- Loan approvals
- Economic and financial forecasting
- Risk management
- Signature validation

Application (Continued.. )

For Credit Approval
- Increases loan processor productivity by 25 to 35% over other computerized tools
- Also detects credit card fraud

Application (Continued.. )
The ANN Method
- Data from the application form are entered into a database
- Database definition
- Applications are preprocessed manually
- The neural network is trained in advance with many good and bad risk cases

Application (Continued.. )
Neural Network Credit Authorizer Construction Process
Step 1: Collect data
Step 2: Separate data into training and test sets
Step 3: Transform data into network inputs
Step 4: Select, train and test network
Step 5: Deploy developed network application


Application (Continued.. )
Using ANNs for Bankruptcy Prediction
Concept Phase
- Paradigm: three-layer network, back-propagation
- Training data: small set of well-known financial ratios
- Data available on bankruptcy outcomes
- Supervised network
- Training time is not expected to be a problem

Application (Continued.. )

Application Design
Five Input Nodes
- X1: Working capital/total assets
- X2: Retained earnings/total assets
- X3: Earnings before interest and taxes/total assets
- X4: Market value of equity/total debt
- X5: Sales/total assets
Single Output Node: final classification for each firm
- Bankruptcy or
- Nonbankruptcy
Development Tool: NeuroShell

Application (Continued.. )

Application Design
Development
- Three-layer network with backpropagation (Figure 18.5)
- Continuous valued input
- Single output node: 0 = bankrupt, 1 = not bankrupt
Training
- Data set: 129 firms
- Training set: 74 firms; 38 bankrupt, 36 not
- Ratios computed and stored in input files for:
  - The neural network
  - A conventional discriminant analysis program

Application (Continued.. )
Application Design
Parameters
- Learning threshold
- Learning rate
- Momentum
Testing
- Two ways:
  - Test data set: 27 bankrupt firms, 28 nonbankrupt firms
  - Comparison with discriminant analysis
The neural network correctly predicted:
- 81.5 percent of bankrupt cases
- 82.1 percent of nonbankrupt cases

Application (Continued.. )
Application Design
- The ANN did better, predicting 22 out of the 27 actual cases correctly
- Discriminant analysis predicted only 16 correctly
Error Analysis
- Five bankrupt firms were misclassified by both methods
- Similar for nonbankrupt firms
- The neural network was at least as good as the conventional approach
- Accuracy of about 80 percent is usually acceptable for neural network applications

Application (Continued.. )
Stock Market Prediction System with Modular Neural Networks
- Accurate stock market prediction: a complex problem
- Several mathematical models have given disappointing results
- Fujitsu and Nikko Securities: TOPIX buying and selling prediction system

Application (Continued.. )
Stock Market Prediction System with Modular Neural Networks
- Input: several technical and economic indexes
- Several modular neural networks relate past indexes and buy/sell timing
- Prediction system:
  - Modular neural networks
  - Very accurate

Application (Continued.. )
Architecture
Network Architecture
- Network model (Figure 18.5): 3 layers, standard sigmoid function, continuous output [0, 1]
- High-speed supplementary learning algorithm
Training Data
- Data selection
- Training data

Application (Continued.. )

- Preprocessing: input indexes converted into spatial patterns and preprocessed to regularize them
- Moving simulation prediction method (Figure 18.7)
- Result of simulations:
  - Simulation for buying and selling stocks
  - Example (Figure 18.8)
  - Excellent profit

Application (Continued.. )
Human Resources
- Predicting employees' performance and behavior
- Determining personnel resource requirements

Management
- Corporate merger prediction
- Takeover target prediction
- Country risk rating

Application (Continued.. )

Marketing
- Consumer spending pattern classification
- New product analysis
- Customer characteristics
- Sales forecasts
- Data mining
- Airline fare management
- Direct mail optimization
- Targeted marketing

Application (Continued.. )

Operations
- Airline crew scheduling
- Predicting airline seat demand
- Vehicle routing
- Assembly and packaged goods inspection
- Fruit and fish grading
- Matching jobs to candidates
- Production/job scheduling
- And many more

Examples of Integrated ANNs and Expert Systems

Resource Requirements Advisor
- Advises users on database systems resource requirements
- Predicts the time and effort to finish a database project
- Built with the ES shell AUBREY and the neural network tool NeuroShell
- The ES supported data collection
- The ANN was used for data evaluation
- The ES performed the final analysis
