Artificial Neural Networks

Outline

  Artificial Neural Networks
    Properties
    Applications
    Classical Examples
    Biological Background

  Single Layer Networks
    Limitations
    Training Single Layer Networks

  Multi Layer Networks
    Possible Mappings
    The Backprop Algorithm
    Practical Problems

  Generalization

Properties

Artificial Neural Networks (ANN)


  Inspired by the nervous system
  Parallel processing
  We will focus on one class of ANNs: Feed-forward Layered Networks

Applications

Operates like a general Learning Box!

  Classification → Yes/No
  Function Approximation → a value in [−1, 1]
  Multidimensional Mapping

Classical Examples: ALVINN

  Autonomous driving
  Video image → Steering
  Trained to mimic the behavior of human drivers

Classical Examples: NetTalk

  Speech synthesis
  Written text → phonemes (coded pronunciation), e.g. "Hello"
  Trained using a large database of spoken text

Biological Background

How do real neurons (nerve cells) work?

  Dendrites: passive reception of (chemical) signals
  Soma (cell body): summing, thresholding
  Axon: active pulses are transmitted to other cells

  Nerve cells can vary in shape and other properties

ANN caricatures
(a simplified view of the neural information processing)

  Weighted input signals → summing → thresholded output

Single Layer Networks

Limitations

What do we mean by a Single Layer Network?

  Each cell operates independently of the others!
  It is sufficient to understand what one cell can compute

What can a single cell compute?

  o = sign(Σ_i w_i x_i)

Geometrical interpretation

  x⃗ : input in vector format
  w⃗ : weights in vector format
  o : output

  [Figure: the input space (x1, x2) with the weight vector w⃗ and the separating hyperplane]

  The cell implements a separating hyperplane → linear separability
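
A minimal sketch of this computation in Python (the function name and the example weights are illustrative choices, not from the lecture):

```python
import numpy as np

def threshold_unit(x, w):
    """One cell: weighted sum of the inputs, thresholded by sign."""
    return np.sign(w @ x)

# The weights define a separating hyperplane w . x = 0:
# inputs on one side give +1, on the other side -1.
w = np.array([1.0, -1.0])
print(threshold_unit(np.array([2.0, 0.5]), w))   # +1.0
print(threshold_unit(np.array([0.5, 2.0]), w))   # -1.0
```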

Training Single Layer Networks

Learning in ANNs

What does learning mean here?

  The network structure is normally fixed
  Learning means finding the best weights w_i

Two good algorithms for single layer networks:

  Perceptron Learning
  Delta Rule

Perceptron Learning

  Incremental learning
  Weights only change when the output is wrong
  Update rule: w_i ← w_i + η (t − o) x_i
  Always converges if the problem is solvable
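
A hedged sketch of perceptron learning in Python; the learning rate eta, the epoch limit, and the toy data are assumptions for illustration:

```python
import numpy as np

def perceptron_learning(X, t, eta=0.1, epochs=100):
    """Train one threshold unit with w_i <- w_i + eta * (t - o) * x_i."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = np.sign(w @ x)
            if o != target:                     # update only on wrong output
                w += eta * (target - o) * x
    return w

# Linearly separable toy data with targets in {-1, +1}; a bias term can
# be added as an extra, constant input component.
X = np.array([[2.0, 1.0], [1.5, 2.5], [-1.0, -2.0], [-2.0, 0.5]])
t = np.array([1, 1, -1, -1])
print(perceptron_learning(X, t))
```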


Delta Rule (LMS rule)

  Incremental learning
  Weights always change
  Update rule: w_i ← w_i + η (t − w⃗ᵀx⃗) x_i
  Converges only in the mean
  Will find an optimal solution even if the problem cannot be fully solved
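
The corresponding sketch for the delta rule; note that the update uses the raw weighted sum w⃗ᵀx⃗ rather than the thresholded output, and fires on every example (eta and the data are again assumptions):

```python
import numpy as np

def delta_rule(X, t, eta=0.01, epochs=200):
    """LMS training: w_i <- w_i + eta * (t - w.x) * x_i on every example."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            w += eta * (target - w @ x) * x    # weights always change
    return w

X = np.array([[2.0, 1.0], [1.5, 2.5], [-1.0, -2.0], [-2.0, 0.5]])
t = np.array([1.0, 1.0, -1.0, -1.0])
print(delta_rule(X, t))
```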

Multi Layer Networks

Possible Mappings

What is the point of having multiple layers?
Will it be even better with more layers?

  Two layers can describe any classification
  Two layers can approximate any continuous function
  Three layers can sometimes do the same thing more efficiently
  More than three layers are rarely used

A two layer network can implement arbitrary decision surfaces
  ...provided we have enough hidden units
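
As a concrete illustration of what the second layer buys, here is a hand-wired two-layer threshold network computing XOR, a classification no single cell can represent; the particular weights are one possible choice, not taken from the lecture:

```python
import numpy as np

def step(y):
    """Hard threshold at zero."""
    return 1.0 if y > 0 else 0.0

def xor_net(x1, x2):
    """Two hidden threshold units feeding one output unit."""
    x = np.array([x1, x2, 1.0])                   # constant 1.0 acts as a bias
    h1 = step(np.array([1.0, 1.0, -0.5]) @ x)     # fires for x1 OR x2
    h2 = step(np.array([1.0, 1.0, -1.5]) @ x)     # fires for x1 AND x2
    h = np.array([h1, h2, 1.0])
    return step(np.array([1.0, -1.0, -0.5]) @ h)  # OR but not AND = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))              # 0, 1, 1, 0
```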

The Backprop Algorithm

How can we train a multi layer network?

  Neither perceptron learning nor the delta rule can be used
  Fundamental problem: when the network gives the wrong answer,
    there is no information about the direction in which
    the weights need to change to improve the result

Basic idea:
  Minimize the error E as a function of all weights w⃗

Fundamental trick:
  Use threshold-like, but continuous functions

  Compute the direction in weight space where the error
  increases the most: ∇_w⃗ E
  Change the weights in the opposite direction:
  w_i ← w_i − η ∂E/∂w_i

Normally one can use the error from each example separately:

  E = ½ Σ_{k∈Out} (t_k − o_k)²

The gradient can be expressed as a function of a local generalized error δ:

  ∂E/∂w_ji = −δ_i x_j    ⇒    w_ji ← w_ji + η δ_i x_j

A common threshold-like function is the sigmoid:

  σ(y) = 1 / (1 + e^(−y))

  [Plot: σ(y) rising smoothly from 0 to 1 for y between −10 and 10]

Output layer:

  δ_k = o_k (1 − o_k) (t_k − o_k)

Hidden layers:

  δ_h = o_h (1 − o_h) Σ_{k∈Out} w_kh δ_k

The error propagates backwards through the layers
  → Error backpropagation (BackProp)
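
A compact sketch of these update rules for a network with one hidden layer; the layer sizes, learning rate, and variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def backprop_step(x, t, W_hid, W_out, eta=0.5):
    """One BackProp update on a single example (x, t)."""
    # Forward pass through both layers
    o_h = sigmoid(W_hid @ x)                          # hidden outputs
    o_k = sigmoid(W_out @ o_h)                        # network outputs

    # Backward pass: local generalized errors
    delta_k = o_k * (1 - o_k) * (t - o_k)             # output layer
    delta_h = o_h * (1 - o_h) * (W_out.T @ delta_k)   # hidden layer

    # Gradient step: w_ji <- w_ji + eta * delta_i * x_j
    W_out += eta * np.outer(delta_k, o_h)
    W_hid += eta * np.outer(delta_h, x)
    return W_hid, W_out

# Toy usage: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
W_hid = rng.normal(scale=0.5, size=(3, 2))
W_out = rng.normal(scale=0.5, size=(1, 3))
W_hid, W_out = backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W_hid, W_out)
```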

Practical Problems

Things to think about when using BackProp

  Sloooow
    Normal to require thousands of iterations through the dataset
  Gradient following
    Risk of getting stuck in local minima
  Many parameters
    Step size
    Number of layers
    Number of hidden units
    Input and output representation
    Initial weights

Generalization

Risk of overfitting!

  The net normally interpolates smoothly between the data points
  → results in good generalization

  [Figure: two fits to the same data points over x ∈ [0, 1]; one interpolates
   smoothly, the other oscillates between the points]

  If the network has too many degrees of freedom (weights), the risk
  increases that learning will find a strange solution

  Limiting the number of hidden units tends to improve generalization

  [Figure: the fit obtained with fewer hidden units, passing smoothly
   through the data points]
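
One way to see this effect is to train the same kind of network twice with different numbers of hidden units on a few noisy samples of a smooth function; everything below (data, sizes, learning rate) is an illustrative assumption rather than the lecture's experiment:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def train(n_hidden, X, T, eta=0.5, epochs=5000, seed=0):
    """Backprop-train a 1-input, 1-output network; bias via a constant input."""
    rng = np.random.default_rng(seed)
    W_hid = rng.normal(scale=0.5, size=(n_hidden, 2))
    W_out = rng.normal(scale=0.5, size=(1, n_hidden))
    for _ in range(epochs):
        for x, t in zip(X, T):
            xb = np.array([x, 1.0])
            o_h = sigmoid(W_hid @ xb)
            o_k = sigmoid(W_out @ o_h)
            d_k = o_k * (1 - o_k) * (t - o_k)
            d_h = o_h * (1 - o_h) * (W_out.T @ d_k)
            W_out += eta * np.outer(d_k, o_h)
            W_hid += eta * np.outer(d_h, xb)
    return W_hid, W_out

# A handful of noisy samples of a smooth target function
X = np.linspace(0.1, 0.9, 7)
T = 0.5 + 0.3 * np.sin(2 * np.pi * X) + np.random.default_rng(1).normal(0, 0.05, 7)
small = train(2, X, T)    # few degrees of freedom: smooth interpolation
large = train(20, X, T)   # many weights: higher risk of a "strange" fit
```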