
Data Forecasting using Artificial Intelligence
Course 3

Iulian Nastac
Polytechnic University of Bucharest
nastac@ieee.org

Lappeenranta, August 2016


Recap from previous course
Artificial Neural Networks
• computational models inspired by animal nervous systems (especially the brain) that are capable of learning and pattern recognition

• ANNs are usually presented as systems of interconnected "neurons" that can process (compute) large volumes of information



Recap:
Classic structure of a first order neuron

[Figure: classic first-order neuron – inputs x1 … xj … xn, weights wi1 … wij … win, bias bi, summation node ni, output yi]
Recap: Solving Problems with
Neural Networks
• As a user of neural networks, you must know which problems are well suited to neural networks.
• You must also be aware of which problems are not particularly well suited to neural networks.
• As with most computer technologies and techniques, what often really matters is knowing when to use the technology and when not to.
Recap: Training Neural Networks
Connectionist approach: Individual neurons in a NN
are interconnected through synapses.

• The connections allow the neurons to signal each other as information is processed.
• Each connection is assigned a connection weight.
• If there is no connection between two neurons, then their connection weight is zero.
• The weights determine the output of the neural network for a given input.
• The connection weights form the memory of the neural network.
Artificial Neural Network as a
Black Box

Recap: Problems Suited to
Artificial Neural Network

• Function identification

• Clustering and classification

• Data forecasting
How to design an ANN
application

• Analyze the database


• Find the proper ANN architecture
• Use an efficient training algorithm
• Test it on specific conditions
General Problem

• computing speed

• parallelism

• reliability

• programming
• generalization
THE RETRAINING
PROCEDURE OF AN ANN
(first version)
• Training an Artificial Neural Network in standard way
with validation stop

• Reduction of the first network's weights by a scaling factor γ (0 < γ < 1). Usually, γ = 0.1, 0.2, …, 0.9

• Retraining the network with the new initial weights

• Compare the training cycle number for each case
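All these steps can be condensed into a small loop. A minimal sketch in Python, assuming a hypothetical helper train_until_validation_stop(initial_weights) that performs standard training with validation stop and returns the trained weights together with the number of training cycles (names and signatures are illustrative, not the original implementation):

    def retraining_cycles(train_until_validation_stop):
        # reference run: standard training with validation stop
        weights_ref, cycles_ref = train_until_validation_stop(None)
        cycles_per_gamma = {}
        for gamma in [g / 10 for g in range(1, 10)]:      # gamma = 0.1 ... 0.9
            scaled = [gamma * w for w in weights_ref]     # shrink every weight
            _, cycles = train_until_validation_stop(scaled)
            cycles_per_gamma[gamma] = cycles              # compare with cycles_ref
        return cycles_ref, cycles_per_gamma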


[Figure: number of training cycles vs. scaling factor γ (0.0–1.0), with the reference cycle count marked by a horizontal line]

Retraining procedure through the BKP method (different functions g)
Notes:
• Regardless of the training procedure, as the scaling factor varies, the number of retraining cycles starts from a value higher than the reference cycle count (marked with a horizontal line) and then progressively decreases as γ increases, falling well below the reference value, especially for γ ≥ 0.3

• Sometimes, for values of γ higher than 0.6, we observe significant jumps in the number of learning cycles, associated with network paralysis or the over-learning phenomenon
Outcomes:

• opt[0.4, 0.6]

• An increase of the training cycles number for   0.7 in


35% of the analyzed cases

• Generalization of the conclusion regarding retraining


procedure for networks with any dimension

• Neural classification system for 2D- and 3D-image


recognition in tissue

13
Effect of hidden layer size on network
generalization

[Figure: effect of the number of hidden neurons on network generalization]


Validation-stop improvement

Training set:
• 85% for training
• 15% for validation



Can the validation set act as a kind of test set at the same time?

Testing the model
• The ANN model depends strongly on the training set (which includes the validation set).

• Sometimes the test set can come from a changing environment…

• How to deal with non-stationary systems?


Pyramidal
ANN
structure



THE RETRAINING
PROCEDURE OF AN ANN
(enhanced version)
• Training an Artificial Neural Network in standard way
with validation stop

• Reduction of the first network's weights by a scaling factor γ (0 < γ < 1). Usually, γ = 0.1, 0.2, …, 0.9

• Retraining the network with the new initial weights

• Compare the validation error in both cases


Adaptive Retraining Technique
(step by step)



Finding ANN Structure
• Each of the training sessions starts with the weights
initialized to small, uniformly distributed values.
• Test several pyramidal ANN architectures, with Nh1 and
Nh2 taking values in the vicinity of the geometric mean of
the neighboring layers.

• Choose the best model with respect to the smallest error between the desired and the simulated output.
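The geometric-mean rule can be turned into a small candidate generator. A sketch, where the joint solution of Nh1 = sqrt(Nin·Nh2) and Nh2 = sqrt(Nh1·Nout) is used as the centre of the search, and the size of the vicinity (delta) is an assumption:

    import itertools

    def candidate_architectures(n_in, n_out, delta=2):
        """Yield (Nh1, Nh2) pairs around the geometric means of the
        neighbouring layers, following the pyramidal heuristic."""
        nh1_c = round(n_in ** (2 / 3) * n_out ** (1 / 3))   # Nh1 = (Nin^2 * Nout)^(1/3)
        nh2_c = round(n_in ** (1 / 3) * n_out ** (2 / 3))   # Nh2 = (Nin * Nout^2)^(1/3)
        for d1, d2 in itertools.product(range(-delta, delta + 1), repeat=2):
            nh1, nh2 = nh1_c + d1, nh2_c + d2
            if n_out <= nh2 <= nh1 <= n_in:                 # keep the pyramid shape
                yield nh1, nh2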
Forecasting issues
Nonstationary Sequences Forecasting
Sequences: ● temporal sequences (time series)

● spatial sequences (chains of objects, features, DNA, etc.)

Tool: Artificial Intelligence Model – the Neuro-Adaptive Retraining Technique
Starting model

• The data vary dynamically, and large space-delays might occur.


• Solutions are ranked by the accuracy of the output forecast according to the following formula:

ERR = (100 / T) · Σ_{p=1..T} ( |OR_kp − OF_kp| / OR_kp ) · f(p)

where T is the number of time steps, OR_kp is the real output_k at space/time step p, OF_kp is the forecasted output_k at space/time step p, and f(p) = T / (T + p).
Research Approach
• The sequences under discussion are inherently nonstationary.

• Nonstationarity implies that the distribution of the sequences may change at different positions.

• Some gradual changes in the dependency between the input and output
variables may appear.

• The recent data points could provide more important information than
the distant data points.

• Use the adaptive retraining mechanism to take this characteristic into account.
Basic idea

Forecasting Neural Network
Training Process

y_k(t+1) = F( X(t+1 − In_Del(i)), Y(t − Out_Del(j)) )
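One way to realize this relation is to assemble, for every time step, the delayed inputs and outputs selected by In_Del and Out_Del into a single regression row; the function below is an illustrative sketch (names are assumptions, single output):

    import numpy as np

    def lagged_rows(X, y, in_del, out_del):
        """Build (features, target) pairs for y(t+1) = F(X(t+1-d_in), y(t-d_out))."""
        start = max(max(in_del) - 1, max(out_del))  # first t with all lags available
        feats, targets = [], []
        for t in range(start, len(y) - 1):
            x_lags = np.concatenate([X[t + 1 - d] for d in in_del])
            y_lags = np.array([y[t - d] for d in out_del])
            feats.append(np.concatenate([x_lags, y_lags]))
            targets.append(y[t + 1])
        return np.array(feats), np.array(targets)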
Principal Component Analysis
(PCA)
• Principal Component Analysis (PCA) can be used to reduce the feature vector dimension, while retaining most of the information it carries, by constructing a linear transformation matrix.

• The transformation matrix is made up of the most significant eigenvectors of the feature vector covariance matrix.

• The eigenvectors are orthonormal (orthogonal and normalised), so they transform the original data into independent feature information having maximal variance.
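A compact sketch of this construction with numpy (mean-centering is assumed; k is the number of retained components):

    import numpy as np

    def pca_matrix(F, k):
        """Return the k most significant eigenvectors of the covariance matrix
        of the feature rows in F, as a (k, n_features) transformation."""
        Fc = F - F.mean(axis=0)                 # centre the features
        eigval, eigvec = np.linalg.eigh(np.cov(Fc, rowvar=False))
        order = np.argsort(eigval)[::-1][:k]    # largest eigenvalues first
        W = eigvec[:, order].T                  # rows are orthonormal
        return W                                # reduced data: Fc @ W.T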
Adaptive Retraining

Extending the forecasts -
Iterative Simulations (IS)
Steps:
• Start: Construct at least one forecast using real data
at the input (selected by In_Del and Out_Del).
• Treat the previous forecasts (selected by Out_Del) as
observed values in order to produce other successive
forecasts (there is a combination of real and simulated
data at the input).

Using an iterative process, a forecast can be extended as many time steps as required.
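A sketch of this loop, assuming a trained one-step model step(x_t, y_lags) that returns the next output (a hypothetical name); the ARI alternative on the next slide differs only in feeding back observed instead of predicted outputs:

    def iterative_forecast(step, X_future, y_hist, out_del, horizon):
        """Extend the forecast `horizon` steps, reusing earlier forecasts
        (selected through out_del) as if they were observed outputs."""
        y = list(y_hist)                   # starts with real data only
        predictions = []
        for t in range(horizon):
            y_lags = [y[-1 - d] for d in out_del]
            y_next = step(X_future[t], y_lags)
            predictions.append(y_next)
            y.append(y_next)               # the forecast becomes a future input
        return predictions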

Alternative:
Always Real Inputs (ARI)

• The Always Real Inputs (ARI) approach employs the real previous outputs, not estimated ones.
Output Forecasting



Experiments
• Cases: I. Time series: Industrial & Financial Forecasting

II. Spatial Sequences: DNA Sequence Forecasting

• Scaled Conjugate Gradient (SCG) as the basic training algorithm

• Different delay vectors for In_Del and Out_Del:
  In_Del = [i_d1, i_d2, …, i_dn]
  Out_Del = [o_d1, o_d2, …, o_dm]

• ERR

Graphical Analysis
The quality of the predictions can also be analyzed graphically, by enforcing a tube around the real outputs, given by a function like the one below:

f(n) = A + q·n

where:
• A is an acceptable prediction error;
• q is an increasing factor;
• n is the number of predicted timesteps.

Then, the predicted output values should lie in the interval output_k(n) ± f(n).

For example: f(n) = 300 + 0.05·n
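The tube test is easy to automate; a small sketch using the example values A = 300 and q = 0.05:

    import numpy as np

    def inside_tube(real, predicted, A=300.0, q=0.05):
        """Fraction of predicted points lying within real +/- (A + q*n)."""
        real = np.asarray(real, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        n = np.arange(1, real.size + 1)
        return float(np.mean(np.abs(predicted - real) <= A + q * n))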


Time Series
A. Glass manufacturing process prediction
International Competition on Artificial Intelligence EUNITE 2003
(European Network of Excellence on Intelligent Technologies for Smart Adaptive Systems)

• The goal is to find a practical mathematical model that describes the relationship between 16 input variables and 4 output variables that model a process in glass manufacturing.

• The raw data consist of 16000 rows (timesteps) – one data row every 15 minutes over 6 months.
Premise
• For this industrial example, it seems that the outputs have different meanings and different time-delay behaviors. Consequently, the natural approach is to split the initial system into four subsystems. Good results were obtained when we started to work under the following assumptions:
– In_Del = [0 1 3] and Out_Del = [0 1] for output 1;
– In_Del = [0 1 2 3 5 8 12 18 28] and Out_Del = [0 1 3] for output 2;
– In_Del = [0 1 2 3 4 6 8 12 20] and Out_Del = [0 1] for outputs 3 and 4;
– V = 7500 timesteps are enough data for each training / retraining phase;
– T = 500 timesteps represent the prediction horizon;
– Shift = 500 timesteps is the shifting time for the next retraining.

• The prediction horizon can be, for example, enlarged to 1500 timesteps. The number
of samples for the training (or retraining) interval can be modified according to the
experience accumulated.
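Under these assumptions the retraining schedule is a simple sliding window over the data; a minimal sketch (the generator name is illustrative):

    def retraining_schedule(n_rows, V=7500, T=500, shift=500):
        """Yield (training, test) index ranges for the successive
        training / retraining phases over n_rows timesteps."""
        start = 0
        while start + V + T <= n_rows:
            yield (start, start + V), (start + V, start + V + T)
            start += shift

With the 16000 rows above, this yields 17 successive training/retraining phases.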
Unexpected effect of error's decreasing
for Iterative Simulations (IS)
[Figure: output 1 over timesteps 8500–9000; tube function f(n) = 3 + 0.003·n]
Extension of the forecasting
[Figure: output 1 forecast extended over 500 timesteps (top) and 1568 timesteps (bottom)]
What happens without retraining?

[Figures: output 4 forecasts over successive 500-timestep test windows (timesteps 8000–10500 and 14000–14500), obtained without retraining]
Effect of PCA transformation matrix adaptation



Data forecasting of outputs 3 and 4 for a unified model

B. Adaptive Monitoring
Iulian Nastac and Paul Cristea
Politehnica University of Bucharest

• Goal: model the catalyst activity of a multi-tube reactor used to oxidize a gaseous feed.

• The primary raw data consist of 5807 rows (timesteps) – one data row every hour over 8 months.

• Four other sets of 720 rows (one month each) are successively used to test and then update the model.
Inputs:
• Measured flow of air (kg/hr);
• Measured flow of combustible gas (kg/hr);
• Measured concentration of combustible component in
combustible gas feed in mass fraction;
• Total feed temperature;
• Cooling temperature;
• Seven temperatures (in Celsius) from different points of
reactor length;
• Product concentration of oxygen in mass fraction;
• Product concentration of combustible component in mass
fraction.

Output: catalyst activity of a multi-tube reactor


Premise

● V = 3407 timesteps are enough for the first training phase and then for each retraining phase;

● T = 720 (or 800 for the initial tests) timesteps represent the prediction horizon;

● Shift = 720 (or 800 for the initial tests) timesteps is the shifting time for the next retraining.

Training and retraining phases

Characteristics of the models
Model     Input delay vector            Output delay       Use time    Dimension of PCA   ANN
          (In_Del)                      vector (Out_Del)   as input    trans. matrix      structure
Model 1   0 1 2 3 4 5 6                 0 1 2              Yes         40 × 101           40:19:8:1
Model 2   0 1 2 3 4 5 6 8               0 1 2              Yes         40 × 115           40:17:3:1
Model 3   0 1 2 3 4 5 6 8               0 1 2              No          40 × 107           40:10:5:1
Model 4   0 1 2 3 4 5 6 8 12            0 1 2              Yes         40 × 129           40:18:8:1
Model 5   0 1 2 4 6 9 12 16 20 24 39    0 1 2 4            Yes         50 × 158           50:9:8:1
Evolution of ERR during First Training,
Retraining 1 and Retraining 2
ERR
Model     First training   Retraining 1   Retraining 2
Model 1   16.371            2.3826         7.2881
Model 2   12.922            1.333         10.562
Model 3   17.879            3.8248         7.5423
Model 4   21.93            22.307          8.113
Model 5   14.768            2.656          8.1371
Evolution of ERR during test phases
ERR
Model     Test 1           Test 2           Test 3
          (Retraining 3)   (Retraining 4)   (Retraining 5)
Model 1   53.408           19.726           24.108
Model 2   38.833           20.87            12.046
Model 3   46.409           20.49            24.805
Model 4   42.979           21.485           11.185
Model 5   44.369           28.099           12.829
Data forecasting of Model 1 for test intervals of first
training, retraining 1 and retraining 2
[Figure: iterative simulations of the output over the test intervals of the first training, retraining 1 and retraining 2]
Error (ERR) of Model 3

                        Test 1           Test 2           Test 3
                        (Retraining 3)   (Retraining 4)   (Retraining 5)
Naive model             64.2225          33.7336          40.6924
ARMA model              50.915           20.483           41.591
Neural adaptive model   46.409           20.49            24.805
Graphical results of ARMA and Neural-
Adaptive models
[Figure: real data vs. the naive, ARMA and neural-adaptive (Model 3) simulated outputs, timesteps 230–340]
C. Smart predictive model for air
pollutants

• We were particularly focused on NO2, because its concentration in the air appears to play a negative role in human health.
• The model uses 21 inputs that include different kinds of emissions,
particles and meteorological parameters.
• These parameters were collected every hour during the time period
starting from January 2007 to December 2010.
• For training purposes, we used V = 8808 lines of data that were randomly split into a training set (approx. 85%) and a validation set (approx. 15%).
Premise
• For this example we started to work under the following assumptions:

– In_Del = [3 4 5 6 7 8 9 ] and Out_Del = [2];


– V = 8808 timesteps are enough data for each training / retraining phase, since this covers a bit more than one year;
– T = 720 timesteps represent the prediction horizon of one month (24 × 30);
– Shift = 720 timesteps is the shifting time for the next retraining.

• The prediction horizon can be, for example, enlarged to 1500 timesteps. The number of samples for the training (or retraining) interval can be modified according to the experience accumulated.
ERR trend of test sets



Graphical Analysis
The quality of the predictions is again analyzed graphically, with the tube f(n) = A + q·n around the real outputs (A – acceptable prediction error, q – increasing factor, n – number of predicted timesteps); the predicted values should lie within output_k(n) ± f(n). For example: f(n) = 15 + 0.005·n
Data forecasting for test interval of retraining 8
[Figure: NO2 forecasts over timesteps 14500–15300 – iterative simulations (top) vs. always real inputs (bottom)]
DNA Sequence Forecasting
Iulian Nastac and Paul Cristea
Biomedical Engineering Centre, UPB

Training Process

y(t 1)  F ( y(t  Out _ Del( j))


The model tries to match the desired value of the output, by properly
adjusting the structure in function of the previous values in the DNA
sequence. 62
Premise
• We worked under the following assumptions:
- V = 5000 is the interval employed for training/validation purpose;
- T = 500 steps represent the prediction horizon (test set);
- Shift = 500 is the shifting interval for the next retraining.

• The prediction horizon can be easily enlarged to 5000 steps.

• The predicted output values should lie in the interval output(n) ± f(n), where f(n) = 1 + 0.05·n

Note: We use the cumulated phases of prokaryote and eukaryote DNA sequences.
Prokaryote DNA sequences
forecasting
Evolution of test error (ERR) for Escherichia coli DNA forecasting

Training/        Training        Test            Test Error   Test Error
retraining       interval        interval        (IS)         (ARI)
First training      1 – 5000     5001 – 5500     13.051       2.3181
Retraining 1      501 – 5500     5501 – 6000     22.842       0.5292
Retraining 2     1001 – 6000     6001 – 6500     24.528       0.8818
Retraining 3     1501 – 6500     6501 – 7000      3.9406      0.4988
Retraining 4     2001 – 7000     7001 – 7500      2.463       0.2971
…                …               …               …            …
Retraining 85    42501 – 47500   47501 – 48000    0.1772      0.0936
Retraining 86    43001 – 48000   48001 – 48500    0.6931      0.0459
Retraining 87    43501 – 48500   48501 – 49000    0.1590      0.0480
Note: Out_Del=[1 3 5 7 10 14 18 23 28 32 39 47 55 65 80 100 140]
ERR trend of test sets

Data forecasting for test interval of retraining 75
when test set is enlarged to 5000 steps

Note: ERR (ARI) = 1.4388
ERR trend of test sets
when T = 5000



Matrix of the initial PCA block
(the most important eigenvectors)

Matrix of the PCA block
(initial and after 87 retraining steps)
Matrix of the PCA block
(Out_Del = [1 2 3 4 5 … 35])

[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix, plotted against the delay index]
Matrix of the PCA block
(initial and after 40 retraining steps)
Eukaryote DNA sequences
forecasting
Evolution of test error (ERR) for human DNA sequence forecasting
(chromosome 22)
Training/        Training        Test            Test Error   Test Error
retraining       interval        interval        (IS)         (ARI)
First training      1 – 5000     5001 – 5500     27.331       0.5768
Retraining 1      501 – 5500     5501 – 6000      3.5955      1.1674
Retraining 2     1001 – 6000     6001 – 6500      5.5203      1.2491
Retraining 3     1501 – 6500     6501 – 7000      2.3656      0.4446
Retraining 4     2001 – 7000     7001 – 7500      6.9733      0.8124
…                …               …                …           …
Retraining 74    37001 – 42000   42001 – 47000     0.6712     0.0429
Retraining 75    37501 – 42500   42501 – 47500     0.0847     0.0223
Retraining 76    38001 – 43000   43001 – 48000     0.2895     0.0244
Note: Out_Del = [1, 2, 3, ...., 35]
ERR trend of test sets
(Homo Sapiens case)

Note: T = 500 steps and L = 76 successive retraining phases
Data forecasting for test interval of retraining 75
(Homo Sapiens case)

Note: T = 5000 steps, ERR (ARI) = 0.44718
Matrix of the PCA block
(initial and after 40 retraining steps )

An interesting result
• When training the PCA block on a set of sequences satisfying certain mild statistical regularities, the rows of the resulting matrix are related to the Discrete Fourier Transform operator.

• The resemblance to the elements of the Fourier transform operator is obvious when the delay vector consists of successive values.

• The evolution of this matrix seems to be very little affected by the changes of the shifted data sets during the retraining steps.

• The first line of the PCA matrix appears to have a slight convex curvature, which increases if the genomic signal has a general upward trend (as for the Escherichia coli sequence) and is more relaxed for the human chromosome 22 sequence.
One time series that is not related to DNA
[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix, plotted against the delay index 0–35]
Note: Out_Del = [1, 2, 3, ...., 35]
… when we involved
many time series
[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix when many time series are involved, plotted against indices 0–500]
Note: Out_Del = [1, 2, 3, ...., 35]
E. Stock Market Forecasting
• The HEX Forest Industry Index is an important index for the Finnish economy (HEX – Helsinki Stock Exchange) that illustrates the overall trend of the forest industry.

• The goal was to find a practical mathematical model that describes the relationship between 8 input variables (the indices of the four biggest Finnish companies in the forest / pulp & paper industry; the EUR/US Dollar exchange rate; the gold price; the NYSE composite index, considered to have a noticeable impact; and a month indicator) and one output variable that models this HEX Forest Industry Index.

• The raw data consist of 2960 rows (timesteps) – one data row every
working day during 11 years (February 1993 – June 2004).
Premise
• We worked under the following assumptions:
- V = 2920 is the interval employed for training/validation purpose;
- T = 20 steps represent the prediction horizon (test set);
- Shift = 1 is the shifting interval for the next retraining.

• The prediction horizon can be easily enlarged to 50 or 100 working days.

• Model:
In_Del = [5 6 7 8 9 10 12 16]
Out_Del = [0 1 2 4]
Evolution of test error (ERR)

Data forecasting for test interval of
retraining 20
[Figure: HEX forest index forecasts over timesteps 2940–2960 – iterative simulations (top) vs. always real inputs (bottom)]

f(n) = 100 + 0.05·n
F. Exchange Rate Forecasting
Iulian Nastac and Emilian Dobrescu
UPB and Romanian Academy

• The particular system, which resulted from using our approaches, describes the relationship between over 30 variables and one output variable that models the EUR/ROL exchange rate.

• The raw data consist of over 2500 rows (time steps) – one data row every day during 8 years (2000 – 2008).
Database indicators
Indicators                                                     Symbol   Frequency

I. Statistical information

A. General information
1. Real Gross Domestic Product growth                          GDP      Quarterly
2. Current Account deficit                                     CA       Monthly
3. Consolidated general budget deficit as percentage of GDP    CGD      Quarterly
4. Net foreign direct investment                               FDI      Monthly
5. Medium and long term external debt                          ExD      Monthly
6. NBR foreign exchange reserve                                ER       Monthly
7. Export of goods and services                                X        Monthly
8. Import of goods and services                                M        Monthly
9. Net monthly average wage in the economy                     Nw       Monthly
B. Specific information
10. Exchange rate Dollar/ROL                                   E$       Daily
11. Exchange rate EUR/ROL                                      Eeur     Daily
12. Consumer goods index                                       CPIR     Monthly
13. Monetary base M0                                           M0       Monthly
14. Reference rate of the NBR                                  rd       Monthly
15. Spread between the average lending and deposit interest
    rates of banks, for non-government, non-bank clients       Δr       Monthly
16. Total domestic credit                                      DC       Monthly
17. Portfolio investment, sold                                 PI       Monthly
18. Current transfers and incomes                              CTI      Monthly
19. Turnover                                                   T        Monthly
20. BET Index                                                  BET      Daily

C. External information
21. Ratio EUR/Dollar                                           Ra       Daily
22. Exchange rate EUR/ROL                                      ReurEU   Daily
23. Refinancing ECB interest rate                              Recb     Monthly
24. Brent oil price                                            op       Monthly
25. HIPC (EU 27)                                               HIPC     Monthly
II. Prospective information

A. General information
26. Real GDP growth                                            GDPf     Annual
27. Export of goods and services, FOB, growth rate             Xf       Annual
28. Import of goods and services, FOB, growth rate             Mf       Annual
29. Commercial trade deficit, mill. Euro                       Ctf      Annual
30. Growth of consumer prices, annual average                  CPIf     Annual
31. Growth of consumer prices, December/December               CPIdf    Annual

B. Specific forecasting information
32. Inflation target                                           ITf      Annual
33. Future exchange rate Dollar/ROL, 1 month                   Fe$      Daily
34. Future exchange rate EUR/ROL, 1 month                      FeEur    Daily
35. Ratio EUR/Dollar, 1 month                                  Fra      Daily
Input variables:
• 32 (or 35) statistical and prospective financial
variables;
• Month indicator L (the days of January are denoted by 1, the days of February by 2, and so on).

Output variable:
 Daily exchange rate EUR/ROL
Premise
• We worked under the following assumptions:
- V = 2200 is the interval employed for training/validation purpose;
- T = 1 (or 3, 7, 15, 30) days represent the prediction horizon (test
set);
- Shift = 1 is the shifting interval for the next retraining.
• The prediction horizon can be easily enlarged to 60 steps or even more.
• Models:
Case I: In_Del = [1 2 3 4 5 6 8 12] and
Out_Del = [0 1 2 4].
………………….

Case VII: In_Del = [7 8 9 10 11 13 16] and


Out_Del = [0 1 2 4].
• Scaled Conjugate Gradient (SCG) is the basic training algorithm.
Evolution of test error (ERR) for
Iterative Simulations

Evolution of test error (ERR) for Iterative
Simulations (IS) and Always Real Inputs
(ARI) when T=30



ERR trend (Case I – 33 inputs) of test
sets for the first training and L = 40
successive retraining phases
[Figure: ERR of the test sets vs. retraining index – Iterative Simulation (left) vs. Always Real Inputs (right)]
ERR trend (Case I – 36 inputs) of test
sets for the first training and L = 40
successive retraining phases
[Figure: ERR of the test sets vs. retraining index – Iterative Simulation (left) vs. Always Real Inputs (right)]
Remember: Graphical Analysis
The quality of the predictions is again analyzed graphically, with the tube f(n) = A + q·n around the real outputs (A – acceptable prediction error, q – increasing factor, n – number of predicted timesteps); for example: f(n) = 300 + 0.05·n
Data forecasting for the test interval
of retraining 29 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
Data forecasting for the test interval of
retraining 19 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
Data forecasting for the test interval of
retraining 14 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
A new approach
• a continuous set of 3364 data points (January 2000 – December 2012)

• we propose an intermediate form of prediction between the IS and ARI approaches

• a set of six predictors has been used to provide the evolution of the exchange rate over a forecasting horizon of the next 6 working days (i.e., more than a week)
Notes:
• The first prediction is the same in both the IS and ARI approaches (then IS progressively uses the model's past computed values).
• Intuitively, each of the predictors is a 'specialist' for a certain forecasting window, which corresponds to a specified number of time steps.

A collection of predictors that forecast a
number of z successive steps

Observation
• The predicted time horizon will always exceed the interval of
one week that includes 2 days of the weekend (Saturday and
Sunday).
• As a thought experiment, it can easily be noted that, given the system parameters (incoming and outgoing data) available on Friday, the exchange rate for the next week plus the Monday of the following week can be predicted. Therefore, in this situation, we can count a period of 10 calendar days.
• Moreover, if other days off fall within the projected interval, the forecasting calendar automatically spans a longer period, although essentially only the following six successive values are estimated.

The configuration of delay vectors

Predictive      Input vector   Output vector
system          (In_Del)       (Out_Del)
Predictor I     [1 2 3 4]      [0 1 2]
Predictor II    [2 3 4 5]      [1 2 3]
Predictor III   [3 4 5 6]      [2 3 4]
Predictor IV    [4 5 6 7]      [3 4 5]
Predictor V     [5 6 7 8]      [4 5 6]
Predictor VI    [6 7 8 9]      [5 6 7]
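The pattern in this table is a uniform shift of the base delay vectors, so predictor k never uses samples more recent than k steps in the past; a sketch generating the configuration (names are illustrative):

    def predictor_delays(n_predictors=6, base_in=(1, 2, 3, 4), base_out=(0, 1, 2)):
        """Predictor k (k = 1..n) uses the base vectors shifted by k-1 steps."""
        return [([d + k for d in base_in], [d + k for d in base_out])
                for k in range(n_predictors)]

For example, predictor_delays()[5] returns ([6, 7, 8, 9], [5, 6, 7]), matching Predictor VI.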
Training set
• The training set includes 1500 lines of data, representing a period of more than 5 years.
• This data volume, initially used to find an adequate neural network for each predictive system, was then successively employed with shifted data during many retraining phases.
• These retrainings were performed sequentially at intervals of one working day.

Evolution of the error ERR during 1844 retraining
phases (4 October 2005 – 7 December 2012)
Cumulative Simulation
[Figure: ERR over the 1844 retraining phases]
The mean values of ERR for
different intervals

mERR, whole interval               0.56369
mERR, second half                  0.43586
mERR, last quarter                 0.40761
mERR, last 1/8 of the interval     0.40423
Data forecasting of the test interval after
first training (ERR=0.91485)
- the confidence interval is 1% -
[Figure: exchange-rate forecast for timesteps 1500–1506]
From 2005/10/4 to 2005/10/12
Data forecasting of the test interval after
third retraining (ERR=0.26174)

[Figure: exchange-rate forecast for timesteps 1503–1509]
From 2005/10/7 to 2005/10/17
Data forecasting of the test interval after
retraining 168 (ERR= 0.14425)

[Figure: exchange-rate forecast for timesteps 1668–1674]
From 2006/5/29 to 2006/6/6
Data forecasting of the test interval after
retraining 696 (ERR= 0.09972)
[Figure: exchange-rate forecast for timesteps 2196–2202]
From 2008/6/17 to 2008/6/25
Data forecasting of the test interval after
retraining 770 (ERR= 3.36122)

[Figure: exchange-rate forecast for timesteps 2270–2276]
From 2008/9/29 to 2008/10/7
Data forecasting of the test interval after
retraining 773 (ERR= 1.74547)

[Figure: exchange-rate forecast for timesteps 2273–2279]
From 2008/10/2 to 2008/10/10
Data forecasting of the test interval after
retraining 774 (ERR= 0.80578)

[Figure: exchange-rate forecast for timesteps 2274–2280]
From 2008/10/3 to 2008/10/13
A part of the ERR evolution
(between retrainings 1654 and 1844
i.e. 3 March 2012 - 7 December 2012)

Data forecasting of the test interval after
retraining 1843 (ERR= 0.08754)

[Figure: exchange-rate forecast for timesteps 3343–3349]
From 2012/11/28 to 2012/12/6
H. An advanced model for electric
load forecasting
• I. Năstac, B. Păvăloiu, R. Tuduce and P. Cristea: "Adaptive retraining algorithm with shaken initialization", Revue roumaine des sciences techniques – Série Électrotechnique et Énergétique, Vol. 58, No. 1, 2013, pp. 101-111

• Use a shaking parameter (Qshake) on the initial weights during retrainings.
• A double loop controls this process.
• In our previous approaches, the reference network weights were reduced with a scaling factor γ (0 < γ < 1).

• Usually, we applied this technique successively for nine discrete values of γ (γ = 0.1, 0.2, …, 0.9), keeping the ANN weight distribution that achieved the minimum error as the reference network.

• Now, we repeated this step several times (by using the parameter Nrep), and enhanced this approach by using a shaking parameter (Qshake) on the initial weights:

  for i = 1:Nrep
      for j = 0.1:0.1:0.9
          % scale the reference weights and add a uniform random shake
          Net_new_weights = Net_previous_weights * (j + Qshake*(rand(1) - 0.5));
          % ... retrain and keep the best-performing network
      end
  end
• Furthermore, the shaking of the weights can be applied gradually, from no shaking at i = 1 to maximum shaking at i = Nrep, as follows:

  for i = 1:Nrep
      for j = 0.1:0.1:0.9
          % the shake amplitude grows linearly with the repetition index i
          Net_new_weights = Net_previous_weights * (j + Qshake*(i-1)*(rand(1) - 0.5));
          % ... retrain and keep the best-performing network
      end
  end
Enhanced model

We can further take into consideration other similar mean values calculated from two years ago, and so on…
Model
Delay vectors:
In_Del = [i_d1, i_d2, …, i_dn]
Out_Del = [o_d1, o_d2, …, o_dm]

Output:
y(t+1) = F( X(t+1 − In_Del), y(t − Out_Del), mean(past y) )

ANN
• Two hidden layers
• Number of training samples / number of weights ≥ 5
• Preprocessing (a sketch follows):
  – replacement of missing values,
  – peak-shaving of outliers,
  – normalization.
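A sketch of such a preprocessing pipeline; the concrete choices (mean imputation, 3-sigma peak-shaving, z-score normalization) are assumptions for illustration:

    import numpy as np

    def preprocess(series, clip_sigma=3.0):
        """Fill missing values, peak-shave outliers, then normalize."""
        x = np.asarray(series, dtype=float)
        x = np.where(np.isnan(x), np.nanmean(x), x)   # replace missing values
        mu, sd = x.mean(), x.std()
        x = np.clip(x, mu - clip_sigma * sd, mu + clip_sigma * sd)  # peak-shaving
        return (x - x.mean()) / x.std()               # normalization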
Experimental results
• Over 30,000 records (time steps) – taken every hour during 4 years
(from January 2008 till December 2011)

• Each record contains the supply of electricity (coal, hydro, oil, nuclear, wind, import) together with the time marks (hour, calendar month, year)

In_Del = [1, 2, 3, 4, 6]

Out_Del = [0, 1, 2]

- Training set involves 8800 data lines


- Seasonal interval of time from previous year = 480 hours (20 days)
ANN architecture

• After the iterative search process, we obtained an ANN with:
  – 23 neurons on the first hidden layer
  – 12 neurons on the second hidden layer

• The PCA matrix has 54 lines and 30 columns (a dimensional reduction from 54 dimensions to 30, preserving more than 99% of the initial information; see the sketch after this list)

• The forecasting time horizon is set to 120 hours (five days)
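The number of retained components can be chosen from the explained-variance ratio; a sketch of the 99% criterion:

    import numpy as np

    def n_components_for(F, keep=0.99):
        """Smallest number of principal components whose eigenvalues
        retain at least `keep` of the total variance."""
        cov = np.cov(F - F.mean(axis=0), rowvar=False)
        eigval = np.sort(np.linalg.eigvalsh(cov))[::-1]
        ratio = np.cumsum(eigval) / eigval.sum()
        return int(np.searchsorted(ratio, keep) + 1)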
Cases

• Case I: Shift = 1 hour

• Case II: Shift = 24 hours

• Case III: Shift = 120 hours (five days)
Case I
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for different
intervals (Case I)

                        ERR (IS)   ERR (ARI)
mERR, whole interval    1.5960     1.4334
mERR, second half       1.1431     0.9845
mERR, last quarter      1.1229     0.9861
mERR, last 1/8          1.1282     0.9835
Forecasted data after retraining 10
(Case I)
[Figure: model output (MW) – predicted values, real data and confidence-interval limits, from 2010/1/29 10 AM to 2010/2/3 10 AM; iterative simulations (top) vs. always real inputs (bottom)]
Case II
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for different
intervals (Case II)

                        ERR (IS)   ERR (ARI)
mERR, whole interval    1.5849     2.4482
mERR, second half       1.1540     1.8501
mERR, last quarter      1.1447     1.4092
mERR, last 1/8          1.2839     1.4324
Case III
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
Case I (without supplementary input)
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for the model with and
without supplementary input
(IS approach for Case I)
                        ERR (IS), model with    ERR (IS), model without
                        supplementary input     supplementary input
mERR, whole interval    1.5960                  2.5016
mERR, second half       1.1431                  1.1738
mERR, last quarter      1.1229                  1.1458
mERR, last 1/8          1.1282                  1.1405
New goal: Forecast the energy consumption six hours in advance by using six predictors in cascade

Remember
A collection of predictors that forecast a
number of z successive steps

The characteristics of predictive
systems
Predictive      First hidden   Second hidden   Input vector       Output vector
system          layer          layer           (In_Del)           (Out_Del)
Predictor I     25              5              [1 2 3 4 6 9]      [0 1 2]
Predictor II    21              9              [2 3 4 5 7 10]     [1 2 3]
Predictor III   24              9              [3 4 5 6 8 11]     [2 3 4]
Predictor IV    24              6              [4 5 6 7 9 12]     [3 4 5]
Predictor V     23             15              [5 6 7 8 10 13]    [4 5 6]
Predictor VI    25             13              [6 7 8 9 11 14]    [5 6 7]
The training
• Restart the last loop five times for each ANN architecture.
• Select the configuration that gives the minimum error and satisfies the condition E_val ≤ threshold · E_tr (as sketched below).
• The ANN is used to predict the first time horizon.
• The architecture remains unchanged and the ANN is successively retrained on the database shifted by one hour.
• Each retraining phase uses nine values of the scaling factor (γ = 0.1 … 0.9) to recreate the initial values of the weights for the learning process, and is repeated five times before predicting the corresponding time horizon.
• A complete retraining phase lasts about 20 minutes.
• L = 40 successive retraining phases.
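A sketch of the selection step referenced above, assuming a hypothetical train_once() that returns a trained network together with its training and validation errors (the threshold value is also an assumption):

    def select_network(train_once, n_restarts=5, threshold=1.2):
        """Keep the restart with the smallest validation error that also
        satisfies E_val <= threshold * E_tr."""
        best = None
        for _ in range(n_restarts):
            net, e_tr, e_val = train_once()
            if e_val <= threshold * e_tr and (best is None or e_val < best[2]):
                best = (net, e_tr, e_val)
        return best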
The mean values of ERR for successive
intervals

                        ERR
mERR, whole interval    3.9997
mERR, second half       3.1533
mERR, last quarter      2.6895
mERR, last 1/8          1.5968
Evolution of Test Error (ERR)
Cumulative Simulation
[Figure: ERR over the 40 retraining phases]
Data forecasting of the test interval after
the first training (ERR = 4.85096)
[Figure: power output (MW) forecast, from 2010/12/31 11 PM to 2011/1/1 5 AM]
Data forecasting of the test interval after
the retraining 24 (ERR = 1.44723)
[Figure: power output (MW) forecast, from 2011/1/1 11 PM to 2011/1/2 5 AM]
Data forecasting of the test interval after
the retraining 40 (ERR = 0.67518)
[Figure: power output (MW) forecast, from 2011/1/2 3 PM to 2011/1/2 9 PM]
Data forecasting when using Iterative
Simulations (IS)
[Figure: power output (MW) forecast using iterative simulations, from 2009/9/29 3 AM to 2009/10/6 4 AM]
Future work
• Research is ongoing to add other supplementary inputs and to analyze whether this extended approach is feasible.

• A short-term load forecasting system could be vital for the efficient operation of power systems at the national level.

Open Issues
• Searching for optimum configurations of In_Del and Out_Del to reduce the
number of inputs in the recurrent relation:
Y(t+1) = F( X(t+1 − In_Del(i)), Y(t − Out_Del(j)) )

• Managing the length of the data set.
• Preprocessing the data and removing outliers.
• Selecting the proper shifting interval.
• Computing the PCA, which depends on:
- the number of initial inputs and outputs;
- the number of the elements of In_Del and Out_Del;
- the number of training pairs;
- software/hardware limitations.
• Finding the best architecture is time consuming.
A predictive system that can be
implemented in a real application

Classifications
A. Predictive Performance of ANN Classifiers
Iulian Nastac and Adrian Costea

(TUCS Technical Report 679/2005)

• The influence of data distribution, preprocessing methods and training mechanisms

• A comparison between the retraining technique and genetic algorithms
B. A Cardio Vascular Predictor with
ANN-Classification-Based System
Oana Sandu¹, Iulian Nastac², Jaime Uribarri¹
¹Mount Sinai School of Medicine, NY, US
²Polytechnic University of Bucharest, Romania



Background and data
• Functional and structural vascular abnormalities are commonly
observed in patients with renal disease.
• Case study in chronic kidney disease (CKD) and end-stage renal disease (ESRD) patients: structural arterial disease was assessed by measuring the common carotid artery intima-media thickness (CIMT) and internal carotid artery (ICA) plaque and stenosis, and endothelial vascular function by assessing the brachial arterial flow-mediated dilatation (FMD) and the effect of hemodialysis treatment (HD) on it.
• Evaluate the prediction power of cardiovascular risk factors for yearly
end point events (cardiovascular events and death).
• A total of 93 subjects: 67 patients with renal disease: 24 CKD and 43
ESRD [23 peritoneal dialysis (PD) and 20 hemodialysis (HD)] and 26
healthy age-matched controls underwent carotid and brachial artery
Doppler examinations.
• Measurements have been performed with a Biosound Doppler machine and IMT software.
Data
• Inputs : age, diabetes, gender, race, bilateral
CIMT, IMT plaque/stenosis, Hct, diagnosis (renal
stage CKD, ESRD - PD, ESRD - HD, or control),
systolic and diastolic blood pressure, body weight,
brachial artery diameters, FMD (measured before
and as % differences after HD in the case of HD
group)

• Outputs:
  – yearly CV end point events (class 1),
  – previous events (class 2),
  – no observed cardiovascular events (class 3)
Experiment                     Inputs   Total     Distribution of samples    ANN            Success rate
                                        samples   Class 1  Class 2  Class 3  architecture   on test set
Experiment 1                   24       87        10       5        72       21:11:4:3      0.78571
Experiment 2                   24       93        10       5        78       21:14:4:3      0.8125
Experiment 3 (without ICA
stenosis/plaque and IMT)       22       87        10       5        72       20:14:3:3      0.57143
Experiment 4 (without ICA
stenosis/plaque and IMT)       22       93        10       5        78       20:12:4:3      0.6
Outcome of the application
• CIMT and internal carotid plaque/stenosis greatly
enhance prediction of yearly cardiovascular end
point events in patients with renal disease
• The database used can be enlarged, enabling an enhanced model to capture more of the patterns that separate the classes of markers
• The model can serve as a predictor/classifier for cardiovascular outcome and guide the initiation of treatment aimed at reducing cardiovascular risks in patients with renal disease.
C. Image identification
Iulian Nastac and David Stanescu
Polytechnic University of Bucharest, Romania

• The goal of the research was to find a tool able to recognize a person from a total of 40 individuals.

• The tool is a modified SOM (Self-Organizing Map), trained in an original manner.
Training set

Test set

Final map

The accuracy on the test set is 171 out of 200 (85.5%)

The accuracy on the test set is 174 out of 200 (87%)

The accuracy on the test set is 176 out of 200 (88%)

Outcome of the application

• Preliminary results from ongoing research were presented.
• The SOM model was modified in order to include supervised training.
• The pictures were used only to prove the robustness of the classification algorithm.
• Current research targets the implementation of this algorithm in a biomedical application.
Overall Conclusions
• The adaptive retraining technique can gradually improve, on
average, the achieved results.

• The first training always takes a relatively long time, but then the
system can be very easily retrained, since there are no changes in the
structure.

• The advantage of the retraining procedure is that some relevant aspects are preserved ("remembered") not only from the immediately previous training phase, but also from the one before it, and so on.

• The old information accumulated during earlier trainings is slowly forgotten, and the learning process adapts to the newest evolution of the process.
Conclusions (cont.)
• Improve the performance of training with validation
stop.

• Reduce the number of inputs with PCA.

• It is very easy to replace the SCG algorithm in our tool with another one, because at the basic level the architecture and the retraining procedure are independent of the training algorithm.
Final remarks
• There is a great potential for further
development in data mining.

• The tool/method can be easily adapted for other kinds of prediction or classification processes.
Selected References
• Dobrescu, E., Nastac, D.I., and Pelinescu, E.: ”Short-term financial forecasting using ANN
adaptive predictors in cascade”, International Journal of Process Management and
Benchmarking, Vol. 4, No. 4, 2014, pp. 376-405.
• D.I. Nastac, A.P. Ulmeanu, R. Tuduce, and P.D. Cristea: "A joint of adaptive predictors for electric load forecasting", in Proceedings of IWSSIP 2013, Bucharest, Romania, July 7-9, 2013, pp. 51-54.
• D.I. Nastac, I.B. Pavaloiu, R. Tuduce, and P.D. Cristea: "Adaptive retraining algorithm with shaken initialization", Revue roumaine des sciences techniques – Série Électrotechnique et Énergétique, Vol. 58, No. 1, 2013, pp. 101-111.
• I. Nastac: “An adaptive forecasting intelligent model for nonstationary time series”, Journal
of Applied Operational Research, Vol. 2, 2010, No. 2, pp. 117–129
• I. Nastac, A. Bacivarov, and A. Costea: “A Neuro-Classification Model for Socio-Technical
Systems”, Romanian Journal of Economic Forecasting, Vol. XI, No. 3/ 2009, pp. 100-109.
• P. Cristea, R. Tuduce, I. Nastac, J. Cornelis, R. Deklerck, and M. Andrei.: “Signal
Representation and Processing of Nucleotide Sequences”, International Journal of Functional
Informatics and Personalised Medicine (IJFIPM), Vol. 1, No. 3, 2008, pp. 253-268.
• I. Nastac, E. Dobrescu, and E. Pelinescu, “Neuro-Adaptive Model for Financial Forecasting”,
Romanian Journal of Economic Forecasting, Vol. 4, No. 3, 2007.
• A. Costea, and I. Nastac, “Assessing the Predictive Performance of ANN Classifiers Based on
Different Data Preprocessing Methods”, Internat. Journal of Intelligent Sys. Acc. Fin. Mgmt.
vol. 13, issue 4, December 2006, pp. 217-250.
• I. Nastac, F.Y. Collan, B. Back, M. Collan, T. Kuopio and P. Jalva: “Breast cancer prediction
using a neural network model”, World Automation Congress (WAC), Seville, June 2004.
