
Artificial Neural Networks

Dr. Anupam Shukla, Professor


ABV-IIITM, Gwalior

Biological (MOTOR) Neuron

Neurons and Synapses

The basic computational unit in the nervous system is the nerve cell, or
neuron. A neuron has:
Dendrites (inputs)
Cell body
Axon (output)

Neurons and Synapses


A neuron receives input from other neurons (typically many thousands).
Inputs sum (approximately). Once input exceeds a critical level, the
neuron discharges a spike - an electrical pulse that travels from the body,
down the axon, to the next neuron(s) (or other receptors). This spiking
event is also called depolarization, and is followed by a refractory period,
during which the neuron is unable to fire.
The axon endings (Output Zone) almost touch the dendrites or cell body
of the next neuron. Transmission of an electrical signal from one neuron
to the next is effected by neurotransmitters, chemicals which are released
from the first neuron and which bind to receptors in the second. This link
is called a synapse. The extent to which the signal from one neuron is
passed on to the next depends on many factors, e.g. the amount of
neurotransmitter available, the number and arrangement of receptors,
amount of neurotransmitter reabsorbed, etc.

The Biological Neuron

The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information.
Each terminal button is connected to other neurons across a small gap called a synapse.


The Biological Neuron

A neuron's dendritic tree is connected to a thousand neighbouring neurons. When one of those neurons fires, a positive or negative charge is received by one of the dendrites. The strengths of all the received charges are added together through the processes of spatial and temporal summation.

Neurons vs units

A real neuron is far more complex than our simplified model unit: its behaviour involves chemistry, biochemistry, and arguably even quantum effects.

Biological neuron

A neuron has:
- A branching input structure (the dendrites)
- A branching output structure (the axon)
The information circulates from the dendrites to the axon via the cell body.
The axon connects to the dendrites of other neurons via synapses.
- Synapses vary in strength
- Synapses may be excitatory or inhibitory

Biological inspiration

Figure: A biological neuron, showing the dendrites, the soma (cell body), and the axon.

Biological inspiration

Figure: Two connected neurons, labelling the axon, the dendrites, and the synapses.
The information transmission happens at the synapses.


Biological inspiration

Figure: A synapse between a presynaptic neuron and a postsynaptic neuron.

Interconnections in Brain

Brain Computation
The human brain contains about 10 billion nerve cells, or
neurons. On average, each neuron is connected to other
neurons through approximately 10,000 synapses.


Comparison between the brain and the computer (ANN)

Speed: the brain works on a time scale of a few milliseconds per neural event; an ANN on a computer works in a few nanoseconds, with massively parallel processing.
Size and complexity: the brain has about 10^11 neurons and 10^15 interconnections; the size and complexity of an ANN depend on the designer.
Storage capacity: the brain stores information in its interconnections (synapses), with no loss of memory; a computer stores information in contiguous memory locations, where loss of memory may sometimes happen.
Tolerance: the brain has fault tolerance; an ANN implementation has no fault tolerance - information gets disrupted when interconnections are disconnected.
Control mechanism: complicated in the brain, involving chemical processes in the biological neuron; simpler in an ANN.

NNs vs Computers

Digital Computers:
- Deductive reasoning: we apply known rules to input data to produce output.
- Computation is centralized, synchronous, and serial.
- Memory is packetted, literally stored, and location addressable.
- Not fault tolerant: one transistor goes and it no longer works.
- Exact.
- Static connectivity.
- Applicable if there are well-defined rules with precise input data.

Neural Networks:
- Inductive reasoning: given input and output data (training examples), we construct the rules.
- Computation is collective, asynchronous, and parallel.
- Memory is distributed, internalized, short term and content addressable.
- Fault tolerant, with redundancy and sharing of responsibilities.
- Inexact.
- Dynamic connectivity.
- Applicable if rules are unknown or complicated, or if data are noisy or partial.

Basic Models of ANN

- Interconnections
- Learning rules
- Activation function

Types of Learning
Supervised learning : In this kind of learning, both the inputs and the
outputs are well determined and supplied to the training algorithm.
Hence whenever an input is applied, we can calculate the error. We
try to adjust the weights in such a way that this error is reduced.

Types of Learning
Unsupervised learning: In this type of learning, the target outputs are unknown. The inputs are applied, and the system is adjusted based on these inputs only: either the weights supporting the observed patterns are strengthened, or the dissipative (non-contributing) nodes are weakened. In either case, the system changes according to the inputs.

Types of Learning
Reinforcement learning: This type of learning is based on the reinforcement
process. In this system, the input is applied. Based on the output, the system
either gives some reward to the network or punishes the network. In this
learning technique, the system tries to maximize the rewards and minimize
the punishment. The basic block diagram is given below.

Figure: The basic block diagram of reinforcement learning.

ASSOCIATION OF BIOLOGICAL NET WITH ARTIFICIAL NET

Models of Neuron
McCulloch-Pitts Model:
In the McCulloch-Pitts model the activation (x) is given by a weighted sum of its M input values (a_i) and a bias term (θ). The output signal (s) is typically a nonlinear function f(x) of the activation value x. The following equations describe the operation of an MP model:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x)

Figure: The MP neuron - inputs a_1 ... a_M with fixed weights w_1 ... w_M feed the summing part; the activation x passes through the output function f(.) to give s = f(x).

Models of Neuron (contd)

A linear threshold function was used as the output function in the original MP model. In this model a binary output function is used with the following logic:
f(x) = 1, x > 0
f(x) = 0, x ≤ 0
A single-input, single-output MP neuron with a proper weight and threshold gives an output one unit of time later.
In the MP model the weights are fixed. Hence a network using this model does not have the capability of learning.
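To make the MP unit concrete, here is a minimal sketch in Python (an illustration added here; the AND-gate weights and threshold are assumptions, not from the slides):

```python
# A minimal sketch of a McCulloch-Pitts neuron (illustrative; weights and theta are assumed).
def mp_neuron(inputs, weights, theta):
    """Binary threshold unit: fires (returns 1) if the weighted sum exceeds the bias theta."""
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # activation x = sum(w_i * a_i) - theta
    return 1 if x > 0 else 0                                 # output s = f(x)

# Example: a 2-input MP neuron with fixed weights acting as an AND gate.
if __name__ == "__main__":
    for a in ((0, 0), (0, 1), (1, 0), (1, 1)):
        print(a, mp_neuron(a, weights=(1, 1), theta=1.5))
```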


Models of Neuron (contd)

Perceptron:
Rosenblatt's perceptron model for an artificial neuron consists of outputs from sensory units to a fixed set of association units, the outputs of which are fed to an MP neuron.
The association units perform predetermined manipulations on their inputs.
The main deviation from the MP model is that learning is incorporated in the operation of the unit.

Figure: The perceptron - sensory units feed association units A_1 ... A_M; their outputs a_1 ... a_M are weighted by adjustable weights w_1 ... w_M, summed (with threshold θ) in the summing unit, and passed to the output unit to give s = f(x).

Models of Neuron (contd)


The desired or target output (b) is compared with the actual binary output (s), and the error (δ) is used to adjust the weights.
The following equations describe the operation of the perceptron model of a neuron:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x)
Error: δ = b - s
Weight change: Δw_i = η δ a_i

where η is the learning rate parameter.
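A hedged sketch of one perceptron training step using the equations above (illustrative Python; the step activation and the learning rate value are assumptions):

```python
# Sketch of the perceptron learning rule: delta_w_i = eta * (b - s) * a_i
def step(x):
    return 1 if x > 0 else 0

def perceptron_train_step(weights, theta, inputs, target, eta=0.1):
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # activation
    s = step(x)                                              # actual binary output
    delta = target - s                                       # error = b - s
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, theta                                    # theta kept fixed, as in the slide's rule
```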



Models of Neuron (contd)

Adaline:
ADAptive LINear Element (ADALINE) is a computing model proposed by Widrow and is shown below.
The main distinction between Rosenblatt's perceptron model and Widrow's Adaline model is that, in the Adaline, the analog activation value (x) is compared with the target output (b).

Figure: The Adaline - inputs a_1 ... a_M with adjustable weights w_1 ... w_M feed the summing part; the activation value x passes through the output function f(.) to give the output signal s = f(x).

Models of Neuron (contd)

In other words, the output is a linear function of the activation value (x).
The equations that describe the operation of an Adaline are as follows:

Activation: x = Σ_{i=1}^{M} w_i a_i - θ
Output signal: s = f(x) = x
Error: δ = b - s = b - x
Weight change: Δw_i = η δ a_i

where η is the learning rate parameter.

This weight update rule minimizes the mean squared error (δ²), averaged over all inputs. Hence it is called the Least Mean Squared (LMS) error learning law.
This law is derived using the negative gradient of the error surface in the weight space. Hence it is also known as a gradient descent algorithm.
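A minimal sketch of the Adaline / LMS update described above, with a linear output s = x (illustrative Python; the names and the learning rate are assumptions):

```python
def adaline_train_step(weights, theta, inputs, target, eta=0.01):
    x = sum(w * a for w, a in zip(weights, inputs)) - theta  # analog activation
    delta = target - x                                       # error uses x itself, not a thresholded output
    weights = [w + eta * delta * a for w, a in zip(weights, inputs)]
    return weights, theta, delta ** 2                        # squared error, for monitoring convergence
```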


Transfer functions (contd)


When the threshold T is 0, the step function is called signum.


Transfer functions (contd)

The sigmoid

The sigmoid transfer function produces a continuous value in the range 0 to 1:

output_i = 1 / (1 + e^{-gain * activation_i})

The parameter gain affects the slope of the function around zero.

Transfer functions (contd)

The hyperbolic tangent
A variant of the sigmoid transfer function:

output_i = (e^{activation_i} - e^{-activation_i}) / (e^{activation_i} + e^{-activation_i})

It has a shape similar to the sigmoid (like an S), with the difference being that the value of output_i ranges between -1 and 1.
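For comparison, a small sketch of the three transfer functions discussed so far (illustrative Python; treating the gain as an explicit parameter is an assumption):

```python
import math

def step(x, threshold=0.0):
    """Binary threshold; with threshold 0 this is the signum-style step."""
    return 1.0 if x > threshold else 0.0

def sigmoid(x, gain=1.0):
    """Logistic sigmoid in (0, 1); gain controls the slope around zero."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def tanh(x):
    """Hyperbolic tangent in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))
```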


The Neuron Model

Figure: The neuron model - input values x_1 ... x_m are multiplied by weights w_1 ... w_m and combined with a bias by the summing function to produce the induced field, which is passed through the activation function φ(.) to produce the output.

The Neuron Model

Definition: a nonlinear, parameterized function with restricted output range:

y = f( w_0 + Σ_{i=1}^{n-1} w_i x_i )

Figure: A single neuron with bias weight w_0 and inputs x_1, x_2, x_3.

Multilayer feed forward network


Feedback network
When outputs are directed back as inputs to same or preceding layer nodes it
results in the formation of feedback networks


Single layer Feedforward Network


Adaptive Linear Neuron (ADALINE)

In 1959, Bernard Widrow and Marcian Hoff of Stanford developed models they called ADALINE (Adaptive Linear Neuron) and MADALINE (Multilayer ADALINE). These models were named for their use of Multiple ADAptive LINear Elements.
MADALINE was the first neural network to be applied to a real-world problem. It is an adaptive filter which eliminates echoes on telephone lines.

ADALINE Model


ADALINE Network

Initialize
- Assign random weights to all links
Training (see the sketch below)
- Feed in known inputs in random sequence
- Simulate the network
- Compute the error between the input and the output (error function)
- Adjust the weights (learning function)
- Repeat until the total error falls below ε (a chosen threshold)
Thinking
- Simulate the network
- The network will respond to any input
- It does not guarantee a correct solution, even for trained inputs
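A hedged sketch of the initialize / train / think cycle above for a single Adaline unit (illustrative Python; the data format, learning rate, and stopping threshold are assumptions):

```python
import random

def train_adaline(samples, n_inputs, eta=0.05, tol=0.01, max_epochs=1000):
    """samples: list of (inputs, target) pairs. Returns trained weights and bias."""
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # Initialize: random weights
    bias = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                                     # Training
        random.shuffle(samples)                                     # feed known inputs in random sequence
        total_error = 0.0
        for inputs, target in samples:
            x = sum(w * a for w, a in zip(weights, inputs)) + bias  # simulate the network
            delta = target - x                                      # error function
            weights = [w + eta * delta * a for w, a in zip(weights, inputs)]  # learning function
            bias += eta * delta
            total_error += delta ** 2
        if total_error < tol:                                       # repeat until total error below epsilon
            break
    return weights, bias

def think(weights, bias, inputs):                                   # Thinking: respond to any input
    return sum(w * a for w, a in zip(weights, inputs)) + bias
```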

Multilayer Perceptron

Figure: A multilayer perceptron - input signals (external stimuli) enter at the input layer, pass through layers of adjustable weights, and produce output values at the output layer.

Layers in Neural Network

The input layer:
- Introduces input values into the network.
- No activation function or other processing.
The hidden layer(s):
- Perform classification of features.
- In principle, two hidden layers are sufficient to solve any problem; richer feature structure implies more layers may work better.
The output layer:
- Functionally just like the hidden layers.
- Outputs are passed on to the world outside the neural network.

BACK PROPAGATION Algorithm model


Multi-layer Neural Network Employing Back-Propagation Algorithm

To illustrate this process, let us take the example of a three-layer neural network with two inputs and one output, which is shown in the picture below:

Example (contd)
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron activation function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.

Example (contd)
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z.
The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.


Example (contd)
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.

Example (contd)
Propagation of signals through the output layer.

In the next algorithm step, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal δ of the output layer neuron.

Example (contd)
It is impossible to compute the error signal for internal neurons directly, because the output values of these neurons are unknown. The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.

Example (contd)
The weight coefficients wmn used to propagate the errors back are equal to those used when computing the output value. Only the direction of data flow is changed.

Example (contd)
When the error signal for each neuron has been computed, the weight coefficients of each neuron input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.


Example (contd)

Coefficient η affects the network teaching speed. There are a few techniques to select this parameter. The first method is to start the teaching process with a large value of the parameter; while the weight coefficients are being established, the parameter is gradually decreased. The second, more complicated, method starts teaching with a small parameter value: during the teaching process the parameter is increased as the teaching advances and then decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients.
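A rough sketch of the two schedules for the coefficient η described above (illustrative Python; the concrete rates and epoch counts are assumptions):

```python
def decreasing_eta(epoch, eta0=0.5, decay=0.01):
    """Method 1: start with a large eta and decrease it gradually as the weights settle."""
    return eta0 / (1.0 + decay * epoch)

def warmup_then_decay_eta(epoch, eta_small=0.01, eta_peak=0.5, warmup_epochs=50, total_epochs=500):
    """Method 2: start small (to settle the weight signs), increase while teaching advances,
    then decrease again in the final stage."""
    if epoch < warmup_epochs:
        return eta_small + (eta_peak - eta_small) * epoch / warmup_epochs
    remaining = max(total_epochs - warmup_epochs, 1)
    return eta_small + eta_peak * max(0.0, 1.0 - (epoch - warmup_epochs) / remaining)
```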

Back-Propagation Training Algorithm

The back-propagation training algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward perceptron and the desired output. It requires continuous, differentiable non-linearities. The following assumes a sigmoid logistic non-linearity is used, where the function f is

f(α) = 1 / (1 + e^{-(α - θ)})

with θ a threshold (offset).

Architecture of Back-Propagation Algorithm

Figure: A feed-forward network with input layer x_1 ... x_p, hidden layer h_1 ... h_m, and output layer y_1 ... y_n.

The BPA Algorithm

Step 1: Initialize Weights and Offsets
Set all weights and node offsets to small random values.
Step 2: Present Input and Desired Outputs
Present a continuous-valued input vector x_0, x_1, ..., x_{P-1} and specify the desired outputs d_0, d_1, ..., d_{N-1}. If the net is used as a classifier, then all desired outputs are typically set to zero except for the one corresponding to the class the input is from; that desired output is 1. The input could be new on each trial, or samples from a training set could be presented cyclically until the weights stabilize.

The BPA Algorithm (contd)

Step 3: Calculate Actual Outputs
Use the sigmoid nonlinearity from above and the formulas to calculate the outputs y_0, y_1, ..., y_{N-1}.
Step 4: Adapt Weights
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust weights by
w_ij(t+1) = w_ij(t) + η δ_j x_i
In this equation w_ij(t) is the weight from hidden node i or from an input to node j at time t, x_i is either the output of node i or an input, η is a gain term, and δ_j is an error term for node j. If node j is an output node, then
δ_j = y_j(1 - y_j)(d_j - y_j)
where d_j is the desired output and y_j the actual output of node j. If node j is an internal hidden node, then
δ_j = x_j(1 - x_j) Σ_k δ_k w_jk
where the sum is over all nodes k in the layer above node j.

The BPA Algorithm (contd)


Internal node thresholds are adapted in a similar manner by assuming they are connection weights on links from auxiliary constant-valued inputs.
Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by
w_ij(t+1) = w_ij(t) + η δ_j x_i + α (w_ij(t) - w_ij(t-1)),
where 0 < α < 1.

Step 5: Repeat steps 2 to 4.
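A minimal sketch of the Step 4 update with the momentum term, together with the two error-term formulas (illustrative Python; the values of η and α are assumptions):

```python
def update_weight(w_now, w_prev, delta_j, x_i, eta=0.3, alpha=0.9):
    """w_ij(t+1) = w_ij(t) + eta * delta_j * x_i + alpha * (w_ij(t) - w_ij(t-1))."""
    return w_now + eta * delta_j * x_i + alpha * (w_now - w_prev)

def delta_output(y_j, d_j):
    """Error term for an output node: delta_j = y_j * (1 - y_j) * (d_j - y_j)."""
    return y_j * (1.0 - y_j) * (d_j - y_j)

def delta_hidden(x_j, deltas_above, weights_above):
    """Error term for a hidden node: delta_j = x_j * (1 - x_j) * sum_k delta_k * w_jk."""
    return x_j * (1.0 - x_j) * sum(d * w for d, w in zip(deltas_above, weights_above))
```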


Derivation of Back-Propagation Algorithm

Notation:
x_k = kth component of the input vector presented to the input layer.
h_j = net input of hidden layer unit j.
v_j = output of hidden layer unit j.
g_i = net input of output layer unit i.
y_i = output of output layer unit i.
w_ij = weight connecting the jth neuron of the hidden layer to the ith neuron of the output layer.
w_jk = weight connecting the kth neuron of the input layer to the jth neuron of the hidden layer.

Derivation (contd)
Input of hidden layer:
h_j = Σ_{k=1}^{p} w_jk x_k    ...(1)
Output of hidden layer:
v_j = f(h_j) = 1 / (1 + e^{-h_j})    ...(2)

Derivation (contd)
Input of output layer:
g_i = Σ_{j=1}^{m} w_ij v_j    ...(3)
Output of output layer:
y_i = f(g_i) = 1 / (1 + e^{-g_i})    ...(4)

Derivation (contd)
Error function:
E(t) = 1/2 Σ_{i=1}^{n} (y_i^d - y_i)^2    ...(5)
Weight update:
w_ij(t+1) = w_ij(t) + Δw_ij(t)    ...(6)
Δw_ij(t) = -η ∂E(t)/∂w_ij(t)    ...(7)

Derivation (contd)
Updating weights between the output layer and the hidden layer:
∂E(t)/∂w_ij(t) = (∂E(t)/∂y_i) (∂y_i/∂w_ij(t))    ...(8)
From equations (4) and (5):
y_i = 1 / (1 + e^{-g_i}),  E(t) = 1/2 Σ (y_i^d - y_i)^2
∂E(t)/∂y_i = -(y_i^d - y_i)    ...(9)

Derivation (contd)
∂y_i/∂w_ij(t) = ∂/∂w_ij (1 + e^{-g_i})^{-1} = (∂y_i/∂g_i) (∂g_i/∂w_ij)
∂g_i/∂w_ij = ∂/∂w_ij ( Σ_{j=1}^{m} w_ij v_j ) = v_j

Derivation (contd)
∂y_i/∂g_i = -(-e^{-g_i}) / (1 + e^{-g_i})^2 = e^{-g_i} / (1 + e^{-g_i})^2
so that
∂y_i/∂w_ij(t) = [ e^{-g_i} / (1 + e^{-g_i})^2 ] v_j    ...(10)
Since y_i = 1 / (1 + e^{-g_i}) and (1 - y_i) = e^{-g_i} / (1 + e^{-g_i}),
y_i(1 - y_i) = e^{-g_i} / (1 + e^{-g_i})^2

Derivation (contd)
Substituting this value in eq (10):
∂y_i/∂w_ij(t) = y_i(1 - y_i) v_j    ...(11)
Substituting the values of eq (9) and (11) in eq (8):
∂E(t)/∂w_ij(t) = -(y_i^d - y_i) y_i(1 - y_i) v_j    ...(12)

Derivation (contd)
Define the output-layer error term
δ_i = y_i(1 - y_i)(y_i^d - y_i)    ...(13)
so that
∂E(t)/∂w_ij(t) = -δ_i v_j    ...(14)
Since Δw_ij(t) = -η ∂E(t)/∂w_ij(t),
Δw_ij(t) = η δ_i v_j    ...(15)
Hence the updated weight will be
w_ij(t+1) = w_ij(t) + η δ_i v_j    ...(16)

Derivation (contd)
Updating weights between the hidden layer and the input layer:
∂E(t)/∂w_jk(t) = Σ_{i=1}^{n} ∂E_i(t)/∂w_jk    ...(17)
             = Σ_{i=1}^{n} (∂E_i(t)/∂y_i) (∂y_i/∂w_jk(t))    ...(18)
                     (A)          (B)
A = ∂E_i(t)/∂y_i = -(y_i^d - y_i)    ...(19)
B = ∂y_i/∂w_jk(t) = (∂y_i/∂g_i)(∂g_i/∂v_j)(∂v_j/∂h_j)(∂h_j/∂w_jk)    ...(20)
                        (D)        (E)        (F)        (G)

Derivation (contd)
D = ∂y_i/∂g_i = y_i(1 - y_i)    ...(21)
E = ∂g_i/∂v_j = w_ij    ...(22)
F = ∂v_j/∂h_j = ∂/∂h_j (1 + e^{-h_j})^{-1} = v_j(1 - v_j)    ...(23)
G = ∂h_j/∂w_jk = x_k    ...(24)

Derivation (contd)
Substituting the values of equations (19), (20), (21), (22), (23), (24) into equation (18), we get
∂E(t)/∂w_jk(t) = Σ_{i=1}^{n} -(y_i^d - y_i) y_i(1 - y_i) w_ij v_j(1 - v_j) x_k
             = -v_j(1 - v_j) x_k Σ_{i=1}^{n} δ_i w_ij
where δ_i = y_i(y_i^d - y_i)(1 - y_i).
Defining the hidden-layer error term
δ_j = v_j(1 - v_j) Σ_{i=1}^{n} δ_i w_ij,
the change in weight between the hidden and input layers is:
Δw_jk(t) = η δ_j x_k
Hence the updated weight will be
w_jk(t+1) = w_jk(t) + η δ_j x_k
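Putting the derivation together, here is a hedged sketch of one back-propagation training step for a single-hidden-layer network, following equations (1)-(16) and the hidden-layer rule above (illustrative Python; the layer sizes, learning rate, and data are assumptions):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, y_desired, W_hid, W_out, eta=0.5):
    """One back-propagation step.
    W_hid[j][k] = w_jk (input k -> hidden j), W_out[i][j] = w_ij (hidden j -> output i)."""
    # Forward pass: equations (1)-(4)
    h = [sum(wjk * xk for wjk, xk in zip(row, x)) for row in W_hid]   # h_j
    v = [sigmoid(hj) for hj in h]                                     # v_j
    g = [sum(wij * vj for wij, vj in zip(row, v)) for row in W_out]   # g_i
    y = [sigmoid(gi) for gi in g]                                     # y_i

    # Output-layer error terms: delta_i = y_i (1 - y_i)(y_i^d - y_i), eq (13)
    delta_out = [yi * (1 - yi) * (yd - yi) for yi, yd in zip(y, y_desired)]

    # Hidden-layer error terms: delta_j = v_j (1 - v_j) * sum_i delta_i * w_ij
    delta_hid = [vj * (1 - vj) * sum(delta_out[i] * W_out[i][j] for i in range(len(W_out)))
                 for j, vj in enumerate(v)]

    # Weight updates: w_ij += eta * delta_i * v_j (eq 16) and w_jk += eta * delta_j * x_k
    for i in range(len(W_out)):
        for j in range(len(v)):
            W_out[i][j] += eta * delta_out[i] * v[j]
    for j in range(len(W_hid)):
        for k in range(len(x)):
            W_hid[j][k] += eta * delta_hid[j] * x[k]

    return 0.5 * sum((yd - yi) ** 2 for yd, yi in zip(y_desired, y))  # E(t), eq (5)
```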

The error landscape in a multilayer perceptron

For a given pattern p, the error Ep can be plotted against the weights to give the so-called error surface.
The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and maximum error found on peaks.
The generalised delta rule aims to minimise Ep by adjusting weights so that they correspond to points of lowest error.
It follows the method of gradient descent, where the changes are made in the steepest downward direction.
All possible solutions are depressions in the error surface, known as basins of attraction.

Variable Learning Rate

One solution for choosing the correct learning rate is to keep the learning rate as a variable.
A variable learning rate allows a system to be more flexible.
The concept of a variable learning rate is analogous to the frame rate while watching a movie. Suppose there is an action scene in your movie. You might consider watching the scene at a lower frame rate so as to get the complete details, or you may fast-forward through most of the other scenes because they do not appeal to you.

Momentum
The training algorithm is an iterative algorithm, which means that at each step, the algorithm tries to move in such a way that the total error is reduced.
In the error graph a deeper valley corresponds to a lower error, and hence to a better configuration of the ANN, than a shallower valley. The deepest valley is known as the global minimum, which has the least error in the entire surface. A shallower valley is called a local minimum.

Momentum
Because the training algorithm always tries to move in a direction that reduces the error, it can end up following a local minimum.
It can easily be seen that if it continued to move in the same direction, it would eventually attain the global minimum.
The momentum keeps pushing the training algorithm to continue moving in the previous direction, making it possible for the training algorithm to escape out of the local minimum.
The meaning of momentum in this case is analogous to the meaning of momentum in the physical world.
For example, a moving ball has momentum that keeps it moving in the same direction.

Stopping Condition
The algorithm stops according to the stopping condition. Normally one or more of the following criteria are used as stopping conditions:
Time: The algorithm may be stopped when the time taken to execute exceeds a threshold.
Epoch: The algorithm has a specified maximum number of epochs. Upon exceeding this number, the algorithm may be stopped.
Goal: The algorithm may be stopped if the error measured by the system reduces below a specific value. It may not be useful to continue training after this point.

Stopping Condition
Validating data: If the error on the validation data starts increasing, even while the error on the training data is still decreasing, it is better to stop further training.
Gradient: Gradient refers to the improvement of the performance, or the lowering of the error, between epochs. The algorithm may be stopped when this improvement becomes too small to be meaningful.
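A hedged sketch of how these stopping criteria might be combined in a training loop (illustrative Python; all thresholds are assumptions):

```python
import time

def should_stop(start_time, epoch, error, prev_error, val_error, prev_val_error,
                max_seconds=60.0, max_epochs=2500, goal=0.01, min_gradient=1e-6):
    if time.time() - start_time > max_seconds:     # Time
        return True
    if epoch >= max_epochs:                        # Epoch
        return True
    if error <= goal:                              # Goal
        return True
    if val_error > prev_val_error:                 # Validating data: validation error starts rising
        return True
    if abs(prev_error - error) < min_gradient:     # Gradient: improvement between epochs too small
        return True
    return False
```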

Learning difficulties in multilayer perceptrons - local minima

The MLP may fail to settle into the global minimum of the error surface and instead find itself in one of the local minima.
This is due to the gradient descent strategy followed.
A number of alternative approaches can be taken to reduce this possibility:
Lowering the gain term progressively
- Used to influence the rate at which weight changes are made during training
- Its value by default is 1, but it may be gradually reduced to slow the rate of change as training progresses

Learning difficulties in multilayer perceptrons (contd)

Addition of more nodes for better representation of patterns
- Too few nodes (and consequently not enough weights) can cause failure of the ANN to learn a pattern
Introduction of a momentum term
- Determines the effect of past weight changes on the current direction of movement in weight space
- The momentum term is also a small numerical value in the range 0-1
Addition of random noise to perturb the ANN out of local minima (see the sketch below)
- Usually done by adding small random values to the weights
- Takes the net to a different point in the error space, hopefully out of a local minimum
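A minimal sketch of the random-noise perturbation mentioned above (illustrative Python; the noise scale is an assumption):

```python
import random

def perturb_weights(weights, scale=0.01):
    """Add small random values to every weight to nudge the net out of a local minimum."""
    return [w + random.uniform(-scale, scale) for w in weights]
```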


Weight v/s Sum of Square Error


Flow Diagram Of Back-Propagation Algorithm.


Some advantages of ANNs

- Able to take incomplete or corrupt data and provide approximate results.
- Good at generalisation, that is, recognising patterns similar to those learned during training.
- Inherent parallelism makes them fault-tolerant: loss of a few interconnections or nodes leaves the system relatively unaffected.
- Parallelism also makes ANNs fast and efficient for handling large amounts of data.

ANN State-of-the-art overview

Currently neural network systems are available as:
- Software simulation on conventional computers (prevalent)
- Special-purpose hardware that models the parallelism of neurons
ANN-based systems are not likely to replace conventional computing systems, but they are an established alternative to the symbolic logic approach to information processing.
A new computing paradigm in the form of hybrid intelligent systems has emerged, often involving ANNs together with other intelligent system tools.

Network

The network was formed by trial and error. After numerous tries, we got the optimal results with the following network configuration, in which we varied only the number of neurons in the hidden layer:

- Number of hidden layers: 1
- Number of neurons in hidden layer: 20
- Activation functions of all neurons: tan-sigmoid
- Training method: gradient descent with momentum (traingdm)
- Learning rate: 0.1
- Momentum: 0.01
- Max epochs: 2,500
- Goal: 0.01

Note that the goal was not met, even upon reaching the maximum number of epochs during training.

Results
The results are further summarized in the table below.

Results of Time Forecasting Experiment

In this example, we used an ANN to solve a real-life problem and got good results.

Outputs
The system's output was the value of the series at the t-th time instant. We repeated the experiment for all values of t to predict/regenerate the complete sequence (starting from t = 15, because before that the required previous inputs were not available).

Figure: The time series forecasting experiment.

Conclusion

The ANN can be seen as an instrumental tool for solving virtually any
real-life problem.
ANNs are definitely the technology of the future.
In real-life systems, however, these play a great role in making a
robust system that gives a very high performance.
Designing a good ANN is more of an art than a technical concept.
Equally important is the selection of inputs and outputs.
Thus we need to think broadly about the feasibility of the selected
input and output parameters before even trying to train the system.
ANNs, if used cautiously, are an excellent means for solving
emerging industrial problems.

ANN applications
The main business application areas of ANNs are:
- Production (36%)
- Information systems (20%)
- Finance (18%)
- Marketing & distribution (14.5%)
- Accounting/Auditing (5%)
- Others (6.5%)

ANN applications in Finance

Application areas: Accounting, Finance, Human Resources, Management, Marketing
- Identify tax fraud
- Enhance auditing by finding irregularities

ANN applications in Finance

Expert System Resets Credit Limits Monthly
- Mimics the credit manager's decision making
- Scheduler prioritizes tasks: staff handles delinquent accounts
Benefits
- More specialized and personalized customer service
- Efficient credit department
- Increased sales
- Happy customers
- Happy customer service representatives

ANN applications in Finance

- Signature and bank note verification
- Mortgage underwriting
- Foreign exchange rate forecasting
- Country risk rating
- Bankruptcy prediction
- Customer credit scoring
- Credit card approval and fraud detection
- Corporate merger and takeover predictions
- Currency trading
- Stock and commodity selection and trading

Application (Continued.. )
- Credit card profitability
- Forecasting economic turning points
- Foreign exchange trading
- Bond rating and trading
- Pricing initial public offerings
- Loan approvals
- Economic and financial forecasting
- Risk management
- Signature validation

Application (Continued.. )

For Credit Approval
- Increases loan processor productivity by 25 to 35% over other computerized tools
- Also detects credit card fraud

Application (Continued.. )
The ANN Method
- Data from the application form are entered into a database
- Database definition
- Applications are preprocessed manually
- The neural network is trained in advance with many good and bad risk cases

Application (Continued.. )
Neural Network Credit Authorizer Construction Process
Step 1: Collect data
Step 2: Separate data into training and test sets
Step 3: Transform data into network inputs
Step 4: Select, train and test network
Step 5: Deploy developed network application


Application (Continued.. )
Using ANNs for Bankruptcy Prediction
Concept Phase
- Paradigm: three-layer network, back-propagation
- Training data: small set of well-known financial ratios
- Data available on bankruptcy outcomes
- Supervised network
- Training time is not expected to be a problem

Application (Continued.. )

Application Design
Five Input Nodes
- X1: Working capital/total assets
- X2: Retained earnings/total assets
- X3: Earnings before interest and taxes/total assets
- X4: Market value of equity/total debt
- X5: Sales/total assets
Single Output Node: final classification for each firm
- Bankruptcy or
- Nonbankruptcy
Development Tool: NeuroShell

Application (Continued.. )

Application Design
Development
- Three-layer network with backpropagation (Figure 18.5)
- Continuous valued input
- Single output node: 0 = bankrupt, 1 = not bankrupt
Training
- Data set: 129 firms
- Training set: 74 firms; 38 bankrupt, 36 not
- Ratios computed and stored in input files for:
  - The neural network
  - A conventional discriminant analysis program

Application (Continued.. )
Application Design
Parameters
- Learning threshold
- Learning rate
- Momentum
Testing
- Two ways:
  - Test data set: 27 bankrupt firms, 28 nonbankrupt firms
  - Comparison with discriminant analysis
The neural network correctly predicted:
- 81.5 percent of bankrupt cases
- 82.1 percent of nonbankrupt cases

Application (Continued.. )
Application Design
- The ANN did better, predicting 22 out of the 27 actual cases correctly
- Discriminant analysis predicted only 16 correctly
Error Analysis
- Five bankrupt firms were misclassified by both methods
- Similar for nonbankrupt firms
- The neural network was at least as good as the conventional approach
- Accuracy of about 80 percent is usually acceptable for neural network applications

Application (Continued.. )
Stock Market Prediction System with Modular Neural Networks
- Accurate stock market prediction: a complex problem
- Several mathematical models have given disappointing results
- Fujitsu and Nikko Securities: TOPIX buying and selling prediction system

Application (Continued.. )
Stock Market Prediction System with Modular Neural Networks
- Input: several technical and economic indexes
- Several modular neural networks relate past indexes and buy/sell timing
- Prediction system:
  - Modular neural networks
  - Very accurate

Application (Continued.. )
Architecture
Network Architecture
- Network model (Figure 18.5): 3 layers, standard sigmoid function, continuous output [0, 1]
- High-speed supplementary learning algorithm
Training Data
- Data selection
- Training data

Application (Continued.. )

- Preprocessing: input indexes converted into spatial patterns and preprocessed to regularize them
- Moving simulation prediction method (Figure 18.7)
- Result of simulations:
  - Simulation for buying and selling stocks
  - Example (Figure 18.8)
  - Excellent profit

Application (Continued.. )
Human Resources
- Predicting employees' performance and behavior
- Determining personnel resource requirements

Management
- Corporate merger prediction
- Takeover target prediction
- Country risk rating

Application (Continued.. )

Marketing
- Consumer spending pattern classification
- New product analysis
- Customer characteristics
- Sales forecasts
- Data mining
- Airline fare management
- Direct mail optimization
- Targeted marketing

Application (Continued.. )

Operations
- Airline crew scheduling
- Predicting airline seat demand
- Vehicle routing
- Assembly and packaged goods inspection
- Fruit and fish grading
- Matching jobs to candidates
- Production/job scheduling
- And many more

Examples of Integrated ANNs and Expert Systems

Resource Requirements Advisor
- Advises users on database systems resource requirements
- Predicts the time and effort to finish a database project
- Built with the ES shell AUBREY and the neural network tool NeuroShell
- The ES supported data collection
- The ANN was used for data evaluation
- The ES performed the final analysis
