
Data Forecasting using Artificial Intelligence
Course 3

Iulian Nastac
Polytechnic University of Bucharest
nastac@ieee.org

Lappeenranta, August 2016


Recap from previous course
Artificial Neural Networks
• computational models inspired by animal nervous systems (especially the brain) that are capable of learning and pattern recognition

• ANNs are usually presented as systems of interconnected "neurons" that can process (compute) large volumes of information



Recap:
Classic structure of a first order neuron

[Figure: classic first-order neuron – inputs x1 … xj … xn, weights wi1 … wij … win, bias bi, summation node ni, output yi]
Recap: Solving Problems with
Neural Networks
• As a user of neural networks, you must know which problems are well suited to neural networks.
• You must also be aware of which problems are not particularly well suited to neural networks.
• As with most computer technologies and techniques, what often really matters is knowing when to use the technology and when not to.
Recap: Training Neural Networks
Connectionist approach: Individual neurons in a NN
are interconnected through synapses.

• The connections allow the neurons to signal each other as information is processed.
• Each connection is assigned a connection weight.
• If there is no connection between two neurons, then their connection weight is zero.
• The weights determine the output of the neural network for a given input.
• The connection weights form the memory of the neural network.
Artificial Neural Network as a
Black Box

Recap: Problems Suited to
Artificial Neural Network

• Function identification

• Clustering and classification

• Data forecasting
How to design an ANN
application

• Analyze the database


• Find the proper ANN architecture
• Use an efficient training algorithm
• Test it on specific conditions
General Problem

• computing speed

• parallelism

• reliability

• programming
• generalization
THE RETRAINING
PROCEDURE OF AN ANN
(first version)
• Training an Artificial Neural Network in standard way
with validation stop

• Reduction of the first network's weights by a scaling factor γ (0 < γ < 1). Usually, γ = 0.1, 0.2, …, 0.9

• Retraining the network with the new initial weights

• Compare the training cycle number for each case
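All these steps can be condensed into a small loop. A minimal sketch in Python, assuming a hypothetical helper train_until_validation_stop(initial_weights) that performs standard training with validation stop and returns the trained weights together with the number of training cycles (names and signatures are illustrative, not the original implementation):

    def retraining_cycles(train_until_validation_stop):
        # reference run: standard training with validation stop
        weights_ref, cycles_ref = train_until_validation_stop(None)
        cycles_per_gamma = {}
        for gamma in [g / 10 for g in range(1, 10)]:      # gamma = 0.1 ... 0.9
            scaled = [gamma * w for w in weights_ref]     # shrink every weight
            _, cycles = train_until_validation_stop(scaled)
            cycles_per_gamma[gamma] = cycles              # compare with cycles_ref
        return cycles_ref, cycles_per_gamma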


[Figure: number of training cycles vs. scaling factor γ (0.0–1.0), with the reference cycle count marked by a horizontal line]

Retraining procedure through the BKP method (different functions g)
Notes:
• Regardless of the training procedure, as the scaling factor varies, the number of retraining cycles starts from a value higher than the reference cycle count (marked with a horizontal line) and then progressively decreases as γ increases, falling well below the reference value, especially for γ ≥ 0.3

• Sometimes, for values of γ higher than 0.6, we observe significant jumps in the number of learning cycles, associated with network paralysis or the over-learning phenomenon
Outcomes:

• opt[0.4, 0.6]

• An increase of the training cycles number for   0.7 in


35% of the analyzed cases

• Generalization of the conclusion regarding retraining


procedure for networks with any dimension

• Neural classification system for 2D- and 3D-image


recognition in tissue

13
Effect of hidden layer size on network
generalization

[Figure: effect of the number of hidden neurons on network generalization]


Validation-stop improvement

Training set:
• 85% for training
• 15% for validation



Can the validation set act as a kind of test set at the same time?

Testing the model
• The ANN model depends strongly on the training set (which includes the validation set).

• Sometimes the test set can come from a changing environment…

• How to deal with non-stationary systems?


Pyramidal
ANN
structure



THE RETRAINING
PROCEDURE OF AN ANN
(enhanced version)
• Training an Artificial Neural Network in standard way
with validation stop

• Reduction of the first network's weights by a scaling factor γ (0 < γ < 1). Usually, γ = 0.1, 0.2, …, 0.9

• Retraining the network with the new initial weights

• Compare the validation error in both cases


Adaptive Retraining Technique
(step by step)



Finding ANN Structure
• Each of the training sessions starts with the weights
initialized to small, uniformly distributed values.
• Test several pyramidal ANN architectures, with Nh1 and
Nh2 taking values in the vicinity of the geometric mean of
the neighboring layers.

• Choose the best model with respect to the smallest error between the desired and the simulated output.
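The geometric-mean rule can be turned into a small candidate generator. A sketch, where the joint solution of Nh1 = sqrt(Nin·Nh2) and Nh2 = sqrt(Nh1·Nout) is used as the centre of the search, and the size of the vicinity (delta) is an assumption:

    import itertools

    def candidate_architectures(n_in, n_out, delta=2):
        """Yield (Nh1, Nh2) pairs around the geometric means of the
        neighbouring layers, following the pyramidal heuristic."""
        nh1_c = round(n_in ** (2 / 3) * n_out ** (1 / 3))   # Nh1 = (Nin^2 * Nout)^(1/3)
        nh2_c = round(n_in ** (1 / 3) * n_out ** (2 / 3))   # Nh2 = (Nin * Nout^2)^(1/3)
        for d1, d2 in itertools.product(range(-delta, delta + 1), repeat=2):
            nh1, nh2 = nh1_c + d1, nh2_c + d2
            if n_out <= nh2 <= nh1 <= n_in:                 # keep the pyramid shape
                yield nh1, nh2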
Forecasting issues
Nonstationary Sequences Forecasting
Sequences: ● temporal sequences (time series)

● spatial sequences (chains of objects, features, DNA, etc.)

Tool: Artificial Intelligence Model – the Neuro-Adaptive Retraining Technique
Starting model

• The data vary dynamically, and large space-delays might occur.


• Solutions are ranked by the accuracy of the output forecast according to the following formula:

ERR = (100 / T) · Σ_{p=1..T} ( |OR_kp − OF_kp| / OR_kp ) · f(p)

where T is the number of time steps, OR_kp is the real output_k at space/time step p, OF_kp is the forecasted output_k at space/time step p, and f(p) = T / (T + p).
Research Approach
• The sequences under discussion are inherently nonstationary.

• Nonstationarity implies that the distribution of the sequences may change at different positions.

• Some gradual changes in the dependency between the input and output
variables may appear.

• The recent data points could provide more important information than
the distant data points.

• Use the adaptive retraining mechanism to take this characteristic into account.
Basic idea

Forecasting Neural Network
Training Process

y_k(t+1) = F( X(t+1 − In_Del(i)), Y(t − Out_Del(j)) )
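One way to realize this relation is to assemble, for every time step, the delayed inputs and outputs selected by In_Del and Out_Del into a single regression row; the function below is an illustrative sketch (names are assumptions, single output):

    import numpy as np

    def lagged_rows(X, y, in_del, out_del):
        """Build (features, target) pairs for y(t+1) = F(X(t+1-d_in), y(t-d_out))."""
        start = max(max(in_del) - 1, max(out_del))  # first t with all lags available
        feats, targets = [], []
        for t in range(start, len(y) - 1):
            x_lags = np.concatenate([X[t + 1 - d] for d in in_del])
            y_lags = np.array([y[t - d] for d in out_del])
            feats.append(np.concatenate([x_lags, y_lags]))
            targets.append(y[t + 1])
        return np.array(feats), np.array(targets)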
Principal Component Analysis
(PCA)
• Principal Component Analysis (PCA) can be used to reduce the feature vector dimension, while retaining most of the information it carries, by constructing a linear transformation matrix.

• The transformation matrix is made up of the most significant eigenvectors of the feature vector covariance matrix.

• The eigenvectors are orthonormal (orthogonal and normalised), so they transform the original data into independent feature information having maximal variance.
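A compact sketch of this construction with numpy (mean-centering is assumed; k is the number of retained components):

    import numpy as np

    def pca_matrix(F, k):
        """Return the k most significant eigenvectors of the covariance matrix
        of the feature rows in F, as a (k, n_features) transformation."""
        Fc = F - F.mean(axis=0)                 # centre the features
        eigval, eigvec = np.linalg.eigh(np.cov(Fc, rowvar=False))
        order = np.argsort(eigval)[::-1][:k]    # largest eigenvalues first
        W = eigvec[:, order].T                  # rows are orthonormal
        return W                                # reduced data: Fc @ W.T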
Adaptive Retraining

Extending the forecasts -
Iterative Simulations (IS)
Steps:
• Start: Construct at least one forecast using real data
at the input (selected by In_Del and Out_Del).
• Treat the previous forecasts (selected by Out_Del) as
observed values in order to produce other successive
forecasts (there is a combination of real and simulated
data at the input).

Using an iterative process, a forecast can be extended as many time steps as required.
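A sketch of this loop, assuming a trained one-step model step(x_t, y_lags) that returns the next output (a hypothetical name); the ARI alternative on the next slide differs only in feeding back observed instead of predicted outputs:

    def iterative_forecast(step, X_future, y_hist, out_del, horizon):
        """Extend the forecast `horizon` steps, reusing earlier forecasts
        (selected through out_del) as if they were observed outputs."""
        y = list(y_hist)                   # starts with real data only
        predictions = []
        for t in range(horizon):
            y_lags = [y[-1 - d] for d in out_del]
            y_next = step(X_future[t], y_lags)
            predictions.append(y_next)
            y.append(y_next)               # the forecast becomes a future input
        return predictions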

Alternative:
Always Real Inputs (ARI)

• The Always Real Inputs (ARI) approach employs the real previous outputs, not estimated ones.
Output Forecasting



Experiments
• Cases: I. Time series: Industrial & Financial Forecasting

II. Spatial Sequences: DNA Sequence Forecasting

• Scaled Conjugate Gradient (SCG) as the basic training algorithm

• Different delay vectors for In_Del and Out_Del:
  In_Del = [i_d1, i_d2, …, i_dn]
  Out_Del = [o_d1, o_d2, …, o_dm]

• ERR

Graphical Analysis
The quality of the predictions can also be analyzed graphically, by enforcing a tube around the real outputs, given by a function like the one below:

f(n) = A + q·n

where:
• A is an acceptable prediction error;
• q is an increasing factor;
• n is the number of predicted timesteps.

Then, the predicted output values should lie in the interval output_k(n) ± f(n).

For example: f(n) = 300 + 0.05·n
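The tube test is easy to automate; a small sketch using the example values A = 300 and q = 0.05:

    import numpy as np

    def inside_tube(real, predicted, A=300.0, q=0.05):
        """Fraction of predicted points lying within real +/- (A + q*n)."""
        real = np.asarray(real, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        n = np.arange(1, real.size + 1)
        return float(np.mean(np.abs(predicted - real) <= A + q * n))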


Time Series
A. Glass manufacturing process prediction
International Competition on Artificial Intelligence EUNITE 2003
(European Network of Excellence on Intelligent Technologies for Smart Adaptive Systems)

• The goal is to find a practical mathematical model that describes the relationship between 16 input variables and 4 output variables that model a process in glass manufacturing.

• The raw data consist of 16000 rows (timesteps) – one data row every 15 minutes over 6 months.
Premise
• For this industrial example, it seems that the outputs have different meanings and different time-delay behaviors. Consequently, the natural approach is to split the initial system into four subsystems. Good results were obtained when we started to work under the following assumptions:
– In_Del = [0 1 3] and Out_Del = [0 1] for output 1;
– In_Del = [0 1 2 3 5 8 12 18 28] and Out_Del = [0 1 3] for output 2;
– In_Del = [0 1 2 3 4 6 8 12 20] and Out_Del = [0 1] for outputs 3 and 4;
– V = 7500 timesteps are enough data for each training / retraining phase;
– T = 500 timesteps represent the prediction horizon;
– Shift = 500 timesteps is the shifting time for the next retraining.

• The prediction horizon can be, for example, enlarged to 1500 timesteps. The number
of samples for the training (or retraining) interval can be modified according to the
experience accumulated.
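Under these assumptions the retraining schedule is a simple sliding window over the data; a minimal sketch (the generator name is illustrative):

    def retraining_schedule(n_rows, V=7500, T=500, shift=500):
        """Yield (training, test) index ranges for the successive
        training / retraining phases over n_rows timesteps."""
        start = 0
        while start + V + T <= n_rows:
            yield (start, start + V), (start + V, start + V + T)
            start += shift

With the 16000 rows above, this yields 17 successive training/retraining phases.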
Unexpected effect of error's decreasing
for Iterative Simulations (IS)
[Figure: output 1 over timesteps 8500–9000; tube function f(n) = 3 + 0.003·n]
Extension of the forecasting
[Figure: output 1 forecast extended over 500 timesteps (top) and 1568 timesteps (bottom)]
What happens without retraining?

[Figures: output 4 forecasts over successive 500-timestep test windows (timesteps 8000–10500 and 14000–14500), obtained without retraining]
Effect of PCA transformation matrix adaptation



Data forecasting of outputs 3 and 4 for a unified model

B. Adaptive Monitoring
Iulian Nastac and Paul Cristea
Politehnica University of Bucharest

• Goal: model the catalyst activity of a multi-tube reactor used to oxidize a gaseous feed.

• The primary raw data consist of 5807 rows (timesteps) – one data row every hour over 8 months.

• Four other sets of 720 rows (one month each) are successively used to test and then update the model.
Inputs:
• Measured flow of air (kg/hr);
• Measured flow of combustible gas (kg/hr);
• Measured concentration of combustible component in
combustible gas feed in mass fraction;
• Total feed temperature;
• Cooling temperature;
• Seven temperatures (in Celsius) from different points of
reactor length;
• Product concentration of oxygen in mass fraction;
• Product concentration of combustible component in mass
fraction.

Output: catalyst activity of a multi-tube reactor


Premise

● V = 3407 timesteps are enough for the first training phase and then for each retraining phase;

● T = 720 (or 800 for the initial tests) timesteps represent the prediction horizon;

● Shift = 720 (or 800 for the initial tests) timesteps is the shifting time for the next retraining.

Training and retraining phases

Characteristics of the models
Model     Input delay vector            Output delay       Use time    Dimension of PCA   ANN
          (In_Del)                      vector (Out_Del)   as input    trans. matrix      structure
Model 1   0 1 2 3 4 5 6                 0 1 2              Yes         40 × 101           40:19:8:1
Model 2   0 1 2 3 4 5 6 8               0 1 2              Yes         40 × 115           40:17:3:1
Model 3   0 1 2 3 4 5 6 8               0 1 2              No          40 × 107           40:10:5:1
Model 4   0 1 2 3 4 5 6 8 12            0 1 2              Yes         40 × 129           40:18:8:1
Model 5   0 1 2 4 6 9 12 16 20 24 39    0 1 2 4            Yes         50 × 158           50:9:8:1
Evolution of ERR during First Training,
Retraining 1 and Retraining 2
ERR
Model     First training   Retraining 1   Retraining 2
Model 1   16.371            2.3826         7.2881
Model 2   12.922            1.333         10.562
Model 3   17.879            3.8248         7.5423
Model 4   21.93            22.307          8.113
Model 5   14.768            2.656          8.1371
Evolution of ERR during test phases
ERR
Model     Test 1           Test 2           Test 3
          (Retraining 3)   (Retraining 4)   (Retraining 5)
Model 1   53.408           19.726           24.108
Model 2   38.833           20.87            12.046
Model 3   46.409           20.49            24.805
Model 4   42.979           21.485           11.185
Model 5   44.369           28.099           12.829
Data forecasting of Model 1 for test intervals of first
training, retraining 1 and retraining 2
[Figure: iterative simulations of the output over the test intervals of the first training, retraining 1 and retraining 2]
Error (ERR) of Model 3

                        Test 1           Test 2           Test 3
                        (Retraining 3)   (Retraining 4)   (Retraining 5)
Naive model             64.2225          33.7336          40.6924
ARMA model              50.915           20.483           41.591
Neural adaptive model   46.409           20.49            24.805
Graphical results of ARMA and Neural-
Adaptive models
[Figure: real data vs. the naive, ARMA and neural-adaptive (Model 3) simulated outputs, timesteps 230–340]
C. Smart predictive model for air
pollutants

• We were particularly focused on NO2, because its concentration in the air appears to play a negative role in human health.
• The model uses 21 inputs that include different kinds of emissions,
particles and meteorological parameters.
• These parameters were collected every hour during the time period
starting from January 2007 to December 2010.
• For training purposes, we used V = 8808 lines of data that were randomly split into a training set (approx. 85%) and a validation set (approx. 15%).
Premise
• For this example we started to work under the following assumptions:

– In_Del = [3 4 5 6 7 8 9 ] and Out_Del = [2];


– V = 8808 timesteps are enough data for each training / retraining phase, since this covers a bit more than one year;
– T = 720 timesteps represent the prediction horizon of one month (24 × 30);
– Shift = 720 timesteps is the shifting time for the next retraining.

• The prediction horizon can be, for example, enlarged to 1500 timesteps. The number of samples for the training (or retraining) interval can be modified according to the experience accumulated.
ERR trend of test sets



Graphical Analysis
The quality of the predictions is again analyzed graphically, with the tube f(n) = A + q·n around the real outputs (A – acceptable prediction error, q – increasing factor, n – number of predicted timesteps); the predicted values should lie within output_k(n) ± f(n). For example: f(n) = 15 + 0.005·n
Data forecasting for test interval of retraining 8
[Figure: NO2 forecasts over timesteps 14500–15300 – iterative simulations (top) vs. always real inputs (bottom)]
DNA Sequence Forecasting
Iulian Nastac and Paul Cristea
Biomedical Engineering Centre, UPB

Training Process

y(t 1)  F ( y(t  Out _ Del( j))


The model tries to match the desired value of the output, by properly
adjusting the structure in function of the previous values in the DNA
sequence. 62
Premise
• We worked under the following assumptions:
- V = 5000 is the interval employed for training/validation purpose;
- T = 500 steps represent the prediction horizon (test set);
- Shift = 500 is the shifting interval for the next retraining.

• The prediction horizon can be easily enlarged to 5000 steps.

• The predicted output values should lie in the interval output(n) ± f(n), where f(n) = 1 + 0.05·n

Note: We use the cumulated phases of prokaryote and eukaryote DNA sequences.
Prokaryote DNA sequences
forecasting
Evolution of test error (ERR) for Escherichia coli DNA forecasting

Training/        Training        Test            Test Error   Test Error
retraining       interval        interval        (IS)         (ARI)
First training      1 – 5000     5001 – 5500     13.051       2.3181
Retraining 1      501 – 5500     5501 – 6000     22.842       0.5292
Retraining 2     1001 – 6000     6001 – 6500     24.528       0.8818
Retraining 3     1501 – 6500     6501 – 7000      3.9406      0.4988
Retraining 4     2001 – 7000     7001 – 7500      2.463       0.2971
…                …               …               …            …
Retraining 85    42501 – 47500   47501 – 48000    0.1772      0.0936
Retraining 86    43001 – 48000   48001 – 48500    0.6931      0.0459
Retraining 87    43501 – 48500   48501 – 49000    0.1590      0.0480
Note: Out_Del=[1 3 5 7 10 14 18 23 28 32 39 47 55 65 80 100 140]
ERR trend of test sets

Data forecasting for test interval of retraining 75
when test set is enlarged to 5000 steps

Note: ERR (ARI) = 1.4388
ERR trend of test sets
when T = 5000



Matrix of the initial PCA block
(the most important eigenvectors)

Matrix of the PCA block
(initial and after 87 retraining steps)
Matrix of the PCA block
(Out_Del = [1 2 3 4 5 … 35])

[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix, plotted against the delay index]
Matrix of the PCA block
(initial and after 40 retraining steps)
Eukaryote DNA sequences
forecasting
Evolution of test error (ERR) for human DNA sequence forecasting
(chromosome 22)
Training/        Training        Test            Test Error   Test Error
retraining       interval        interval        (IS)         (ARI)
First training      1 – 5000     5001 – 5500     27.331       0.5768
Retraining 1      501 – 5500     5501 – 6000      3.5955      1.1674
Retraining 2     1001 – 6000     6001 – 6500      5.5203      1.2491
Retraining 3     1501 – 6500     6501 – 7000      2.3656      0.4446
Retraining 4     2001 – 7000     7001 – 7500      6.9733      0.8124
…                …               …                …           …
Retraining 74    37001 – 42000   42001 – 47000     0.6712     0.0429
Retraining 75    37501 – 42500   42501 – 47500     0.0847     0.0223
Retraining 76    38001 – 43000   43001 – 48000     0.2895     0.0244
Note: Out_Del = [1, 2, 3, ...., 35]
ERR trend of test sets
(Homo Sapiens case)

Note: T = 500 steps and L = 76 successive retraining phases
Data forecasting for test interval of retraining 75
(Homo Sapiens case)

Note: T = 5000 steps, ERR (ARI) = 0.44718
Matrix of the PCA block
(initial and after 40 retraining steps )

An interesting result
• When training the PCA block on a set of sequences satisfying certain mild statistical regularities, the rows of the resulting matrix are related to the Discrete Fourier Transform operator.

• The resemblance to the elements of the Fourier transform operator is obvious when the delay vector consists of successive values.

• The evolution of this matrix seems to be very little affected by the changes of the shifted data sets during the retraining steps.

• The first line of the PCA matrix appears to have a slight convex curvature, which increases if the genomic signal has a general upward trend (as for the Escherichia coli sequence) and is more relaxed for the human chromosome 22 sequence.
One time series that is not related to DNA
[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix, plotted against the delay index 0–35]
Note: Out_Del = [1, 2, 3, ...., 35]
… when we involved
many time series
[Figure: rows 1–8 (Line 1 – Line 8) of the PCA matrix when many time series are involved, plotted against indices 0–500]
Note: Out_Del = [1, 2, 3, ...., 35]
E. Stock Market Forecasting
• The HEX Forest Industry Index is an important index for the Finnish economy (HEX – Helsinki Stock Exchange) that illustrates the overall trend of the forest industry.

• The goal was to find a practical mathematical model that describes the relationship between 8 input variables (the indices of the four biggest Finnish companies in the forest / pulp & paper industry; the EUR/US Dollar exchange rate; the gold price; the NYSE composite index, considered to have a noticeable impact; and a month indicator) and one output variable that models this HEX Forest Industry Index.

• The raw data consist of 2960 rows (timesteps) – one data row every
working day during 11 years (February 1993 – June 2004).
Premise
• We worked under the following assumptions:
- V = 2920 is the interval employed for training/validation purpose;
- T = 20 steps represent the prediction horizon (test set);
- Shift = 1 is the shifting interval for the next retraining.

• The prediction horizon can be easily enlarged to 50 or 100 working days.

• Model:
In_Del = [5 6 7 8 9 10 12 16]
Out_Del = [0 1 2 4]
Evolution of test error (ERR)

Data forecasting for test interval of
retraining 20
[Figure: HEX forest index forecasts over timesteps 2940–2960 – iterative simulations (top) vs. always real inputs (bottom)]

f(n) = 100 + 0.05·n
F. Exchange Rate Forecasting
Iulian Nastac and Emilian Dobrescu
UPB and Romanian Academy

• The particular system, which resulted from using our approaches, describes the relationship between over 30 variables and one output variable that models the EUR/ROL exchange rate.

• The raw data consist of over 2500 rows (time steps) – one data row every day during 8 years (2000 – 2008).
Database indicators
Indicators                                                     Symbol   Frequency

I. Statistical information

A. General information
1. Real Gross Domestic Product growth                          GDP      Quarterly
2. Current Account deficit                                     CA       Monthly
3. Consolidated general budget deficit as percentage of GDP    CGD      Quarterly
4. Net foreign direct investment                               FDI      Monthly
5. Medium and long term external debt                          ExD      Monthly
6. NBR foreign exchange reserve                                ER       Monthly
7. Export of goods and services                                X        Monthly
8. Import of goods and services                                M        Monthly
9. Net monthly average wage in the economy                     Nw       Monthly
B. Specific information
10. Exchange rate Dollar/ROL                                   E$       Daily
11. Exchange rate EUR/ROL                                      Eeur     Daily
12. Consumer goods index                                       CPIR     Monthly
13. Monetary base M0                                           M0       Monthly
14. Reference rate of the NBR                                  rd       Monthly
15. Spread between the average lending and deposit interest
    rates of banks, for non-government, non-bank clients       Δr       Monthly
16. Total domestic credit                                      DC       Monthly
17. Portfolio investment, sold                                 PI       Monthly
18. Current transfers and incomes                              CTI      Monthly
19. Turnover                                                   T        Monthly
20. BET Index                                                  BET      Daily

C. External information
21. Ratio EUR/Dollar                                           Ra       Daily
22. Exchange rate EUR/ROL                                      ReurEU   Daily
23. Refinancing ECB interest rate                              Recb     Monthly
24. Brent oil price                                            op       Monthly
25. HIPC (EU 27)                                               HIPC     Monthly
II. Prospective information

A. General information
26. Real GDP growth                                            GDPf     Annual
27. Export of goods and services, FOB, growth rate             Xf       Annual
28. Import of goods and services, FOB, growth rate             Mf       Annual
29. Commercial trade deficit, mill. Euro                       Ctf      Annual
30. Growth of consumer prices, annual average                  CPIf     Annual
31. Growth of consumer prices, December/December               CPIdf    Annual

B. Specific forecasting information
32. Inflation target                                           ITf      Annual
33. Future exchange rate Dollar/ROL, 1 month                   Fe$      Daily
34. Future exchange rate EUR/ROL, 1 month                      FeEur    Daily
35. Ratio EUR/Dollar, 1 month                                  Fra      Daily
Input variables:
• 32 (or 35) statistical and prospective financial
variables;
• Month indicator L (the days of January are denoted by 1, the days of February by 2, and so on).

Output variable:
 Daily exchange rate EUR/ROL
Premise
• We worked under the following assumptions:
- V = 2200 is the interval employed for training/validation purpose;
- T = 1 (or 3, 7, 15, 30) days represent the prediction horizon (test
set);
- Shift = 1 is the shifting interval for the next retraining.
• The prediction horizon can be easily enlarged to 60 steps or even more.
• Models:
Case I: In_Del = [1 2 3 4 5 6 8 12] and
Out_Del = [0 1 2 4].
………………….

Case VII: In_Del = [7 8 9 10 11 13 16] and


Out_Del = [0 1 2 4].
• Scaled Conjugate Gradient (SCG) is the basic training algorithm.
Evolution of test error (ERR) for
Iterative Simulations

Evolution of test error (ERR) for Iterative
Simulations (IS) and Always Real Inputs
(ARI) when T=30



ERR trend (Case I – 33 inputs) of test
sets for the first training and L = 40
successive retraining phases
[Figure: ERR of the test sets vs. retraining index – Iterative Simulation (left) vs. Always Real Inputs (right)]
ERR trend (Case I – 36 inputs) of test
sets for the first training and L = 40
successive retraining phases
[Figure: ERR of the test sets vs. retraining index – Iterative Simulation (left) vs. Always Real Inputs (right)]
Remember: Graphical Analysis
The quality of the predictions is again analyzed graphically, with the tube f(n) = A + q·n around the real outputs (A – acceptable prediction error, q – increasing factor, n – number of predicted timesteps); for example: f(n) = 300 + 0.05·n
Data forecasting for the test interval
of retraining 29 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
Data forecasting for the test interval of
retraining 19 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
Data forecasting for the test interval of
retraining 14 (Case I – 33 inputs)
[Figure: exchange-rate forecasts (×10^4, year 2006) – iterative simulations (top) vs. always real inputs (bottom)]
A new approach
• a continuous set of 3364 data points (January 2000 – December 2012)

• we propose an intermediate form of prediction between the IS and ARI approaches

• a set of six predictors has been used to provide the evolution of the exchange rate over a forecasting horizon of the next 6 working days (i.e., more than a week)
Notes:
• The first prediction is the same in both the IS and ARI approaches (then IS progressively uses the model's past computed values).
• Intuitively, each of the predictors is a 'specialist' for a certain forecasting window, which corresponds to a specified number of time steps.

A collection of predictors that forecast a
number of z successive steps

Observation
• The predicted time horizon will always exceed the interval of
one week that includes 2 days of the weekend (Saturday and
Sunday).
• As a thought experiment, it can easily be noted that, given the system parameters (incoming and outgoing data) available on Friday, the exchange rate for the next week plus the Monday of the following week can be predicted. Therefore, in this situation, we can count a period of 10 calendar days.
• Moreover, if other days off fall within the projected interval, the forecasting calendar automatically spans a longer period, although essentially only the following six successive values are estimated.

The configuration of delay vectors

Predictive      Input vector   Output vector
system          (In_Del)       (Out_Del)
Predictor I     [1 2 3 4]      [0 1 2]
Predictor II    [2 3 4 5]      [1 2 3]
Predictor III   [3 4 5 6]      [2 3 4]
Predictor IV    [4 5 6 7]      [3 4 5]
Predictor V     [5 6 7 8]      [4 5 6]
Predictor VI    [6 7 8 9]      [5 6 7]
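The pattern in this table is a uniform shift of the base delay vectors, so predictor k never uses samples more recent than k steps in the past; a sketch generating the configuration (names are illustrative):

    def predictor_delays(n_predictors=6, base_in=(1, 2, 3, 4), base_out=(0, 1, 2)):
        """Predictor k (k = 1..n) uses the base vectors shifted by k-1 steps."""
        return [([d + k for d in base_in], [d + k for d in base_out])
                for k in range(n_predictors)]

For example, predictor_delays()[5] returns ([6, 7, 8, 9], [5, 6, 7]), matching Predictor VI.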
Training set
• The training set includes 1500 lines of data, representing a period of more than 5 years.
• This data volume, initially used to find an adequate neural network for each predictive system, was then successively employed with shifted data during many retraining phases.
• These retrainings were performed sequentially at intervals of one working day.

Evolution of the error ERR during 1844 retraining
phases (4 October 2005 – 7 December 2012)
Cumulative Simulation
[Figure: ERR over the 1844 retraining phases]
The mean values of ERR for
different intervals

mERR, whole interval               0.56369
mERR, second half                  0.43586
mERR, last quarter                 0.40761
mERR, last 1/8 of the interval     0.40423
Data forecasting of the test interval after
first training (ERR=0.91485)
- the confidence interval is 1% -
[Figure: exchange-rate forecast for timesteps 1500–1506]
From 2005/10/4 to 2005/10/12
Data forecasting of the test interval after
third retraining (ERR=0.26174)

[Figure: exchange-rate forecast for timesteps 1503–1509]
From 2005/10/7 to 2005/10/17
Data forecasting of the test interval after
retraining 168 (ERR= 0.14425)

[Figure: exchange-rate forecast for timesteps 1668–1674]
From 2006/5/29 to 2006/6/6
Data forecasting of the test interval after
retraining 696 (ERR= 0.09972)
[Figure: exchange-rate forecast for timesteps 2196–2202]
From 2008/6/17 to 2008/6/25
Data forecasting of the test interval after
retraining 770 (ERR= 3.36122)

[Figure: exchange-rate forecast for timesteps 2270–2276]
From 2008/9/29 to 2008/10/7
Data forecasting of the test interval after
retraining 773 (ERR= 1.74547)

[Figure: exchange-rate forecast for timesteps 2273–2279]
From 2008/10/2 to 2008/10/10
Data forecasting of the test interval after
retraining 774 (ERR= 0.80578)

[Figure: exchange-rate forecast for timesteps 2274–2280]
From 2008/10/3 to 2008/10/13
A part of the ERR evolution
(between retrainings 1654 and 1844
i.e. 3 March 2012 - 7 December 2012)

Data forecasting of the test interval after
retraining 1843 (ERR= 0.08754)

[Figure: exchange-rate forecast for timesteps 3343–3349]
From 2012/11/28 to 2012/12/6
H. An advanced model for electric
load forecasting
• I. Năstac, B. Păvăloiu, R. Tuduce and P. Cristea: "Adaptive retraining algorithm with shaken initialization", Revue roumaine des sciences techniques – Série Électrotechnique et Énergétique, Vol. 58, No. 1, 2013, pp. 101-111

• Use a shaking parameter (Qshake) on the initial weights during retrainings.
• A double loop controls this process.
• In our previous approaches, the reference network weights were reduced with a scaling factor γ (0 < γ < 1).

• Usually, we applied this technique successively for nine discrete values of γ (γ = 0.1, 0.2, …, 0.9), keeping the ANN weight distribution that achieved the minimum error as the reference network.

• Now, we repeated this step several times (by using the parameter Nrep), and enhanced this approach by using a shaking parameter (Qshake) on the initial weights:

  for i = 1:Nrep
      for j = 0.1:0.1:0.9
          % scale the reference weights and add a uniform random shake
          Net_new_weights = Net_previous_weights * (j + Qshake*(rand(1) - 0.5));
          % ... retrain and keep the best-performing network
      end
  end
• Furthermore, the shaking of the weights can be applied gradually, from no shaking at i = 1 to maximum shaking at i = Nrep, as follows:

  for i = 1:Nrep
      for j = 0.1:0.1:0.9
          % the shake amplitude grows linearly with the repetition index i
          Net_new_weights = Net_previous_weights * (j + Qshake*(i-1)*(rand(1) - 0.5));
          % ... retrain and keep the best-performing network
      end
  end
Enhanced model

We can further take into consideration other similar mean values calculated from two years ago, and so on…
Model
Delay vectors:
In_Del = [i_d1, i_d2, …, i_dn]
Out_Del = [o_d1, o_d2, …, o_dm]

Output:
y(t+1) = F( X(t+1 − In_Del), y(t − Out_Del), mean(past y) )

ANN
• Two hidden layers
• Number of training samples / number of weights ≥ 5
• Preprocessing (a sketch follows):
  – replacement of missing values,
  – peak-shaving of outliers,
  – normalization.
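A sketch of such a preprocessing pipeline; the concrete choices (mean imputation, 3-sigma peak-shaving, z-score normalization) are assumptions for illustration:

    import numpy as np

    def preprocess(series, clip_sigma=3.0):
        """Fill missing values, peak-shave outliers, then normalize."""
        x = np.asarray(series, dtype=float)
        x = np.where(np.isnan(x), np.nanmean(x), x)   # replace missing values
        mu, sd = x.mean(), x.std()
        x = np.clip(x, mu - clip_sigma * sd, mu + clip_sigma * sd)  # peak-shaving
        return (x - x.mean()) / x.std()               # normalization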
Experimental results
• Over 30,000 records (time steps) – taken every hour during 4 years
(from January 2008 till December 2011)

• Each record contains the supply of electricity (coal, hydro, oil, nuclear, wind, import) together with the time marks (hour, calendar month, year)

In_Del = [1, 2, 3, 4, 6]

Out_Del = [0, 1, 2]

- Training set involves 8800 data lines


- Seasonal interval of time from previous year = 480 hours (20 days)
ANN architecture

• After the iterative search process, we obtained an ANN with:
  – 23 neurons on the first hidden layer
  – 12 neurons on the second hidden layer

• The PCA matrix has 54 lines and 30 columns (a dimensional reduction from 54 dimensions to 30, preserving more than 99% of the initial information; see the sketch after this list)

• The forecasting time horizon is set to 120 hours (five days)
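The number of retained components can be chosen from the explained-variance ratio; a sketch of the 99% criterion:

    import numpy as np

    def n_components_for(F, keep=0.99):
        """Smallest number of principal components whose eigenvalues
        retain at least `keep` of the total variance."""
        cov = np.cov(F - F.mean(axis=0), rowvar=False)
        eigval = np.sort(np.linalg.eigvalsh(cov))[::-1]
        ratio = np.cumsum(eigval) / eigval.sum()
        return int(np.searchsorted(ratio, keep) + 1)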
Cases

• Case I: Shift = 1 hour

• Case II: Shift = 24 hours

• Case III: Shift = 120 hours (five days)
Case I
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for different
intervals (Case I)

                        ERR (IS)   ERR (ARI)
mERR, whole interval    1.5960     1.4334
mERR, second half       1.1431     0.9845
mERR, last quarter      1.1229     0.9861
mERR, last 1/8          1.1282     0.9835
Forecasted data after retraining 10
(Case I)
[Figure: model output (MW) – predicted values, real data and confidence-interval limits, from 2010/1/29 10 AM to 2010/2/3 10 AM; iterative simulations (top) vs. always real inputs (bottom)]
Case II
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for different
intervals (Case II)

                        ERR (IS)   ERR (ARI)
mERR, whole interval    1.5849     2.4482
mERR, second half       1.1540     1.8501
mERR, last quarter      1.1447     1.4092
mERR, last 1/8          1.2839     1.4324
Case III
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
Case I (without supplementary input)
[Figure: ERR vs. retraining number (0–50) – Iterative Simulation (left) vs. Always Real Inputs (right)]
The mean values of ERR for the model with and
without supplementary input
(IS approach for Case I)
                        ERR (IS), model with    ERR (IS), model without
                        supplementary input     supplementary input
mERR, whole interval    1.5960                  2.5016
mERR, second half       1.1431                  1.1738
mERR, last quarter      1.1229                  1.1458
mERR, last 1/8          1.1282                  1.1405
New goal: Forecast the energy consumption six hours in advance by using six predictors in cascade

Remember
A collection of predictors that forecast a
number of z successive steps

The characteristics of predictive
systems
Predictive      First hidden   Second hidden   Input vector       Output vector
system          layer          layer           (In_Del)           (Out_Del)
Predictor I     25              5              [1 2 3 4 6 9]      [0 1 2]
Predictor II    21              9              [2 3 4 5 7 10]     [1 2 3]
Predictor III   24              9              [3 4 5 6 8 11]     [2 3 4]
Predictor IV    24              6              [4 5 6 7 9 12]     [3 4 5]
Predictor V     23             15              [5 6 7 8 10 13]    [4 5 6]
Predictor VI    25             13              [6 7 8 9 11 14]    [5 6 7]
The training
• Restart the last loop five times for each ANN architecture.
• Select the configuration that gives the minimum error and satisfies the condition E_val ≤ threshold · E_tr (as sketched below).
• The ANN is used to predict the first time horizon.
• The architecture remains unchanged and the ANN is successively retrained on the database shifted by one hour.
• Each retraining phase uses nine values of the scaling factor (γ = 0.1 … 0.9) to recreate the initial values of the weights for the learning process, and is repeated five times before predicting the corresponding time horizon.
• A complete retraining phase lasts about 20 minutes.
• L = 40 successive retraining phases.
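A sketch of the selection step referenced above, assuming a hypothetical train_once() that returns a trained network together with its training and validation errors (the threshold value is also an assumption):

    def select_network(train_once, n_restarts=5, threshold=1.2):
        """Keep the restart with the smallest validation error that also
        satisfies E_val <= threshold * E_tr."""
        best = None
        for _ in range(n_restarts):
            net, e_tr, e_val = train_once()
            if e_val <= threshold * e_tr and (best is None or e_val < best[2]):
                best = (net, e_tr, e_val)
        return best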
The mean values of ERR for successive
intervals

                        ERR
mERR, whole interval    3.9997
mERR, second half       3.1533
mERR, last quarter      2.6895
mERR, last 1/8          1.5968
Evolution of Test Error (ERR)
Cumulative Simulation
[Figure: ERR over the 40 retraining phases]
Data forecasting of the test interval after
the first training (ERR = 4.85096)
[Figure: power output (MW) forecast, from 2010/12/31 11 PM to 2011/1/1 5 AM]
Data forecasting of the test interval after
the retraining 24 (ERR = 1.44723)
[Figure: power output (MW) forecast, from 2011/1/1 11 PM to 2011/1/2 5 AM]
Data forecasting of the test interval after
the retraining 40 (ERR = 0.67518)
[Figure: power output (MW) forecast, from 2011/1/2 3 PM to 2011/1/2 9 PM]
Data forecasting when using Iterative
Simulations (IS)
[Figure: power output (MW) forecast using iterative simulations, from 2009/9/29 3 AM to 2009/10/6 4 AM]
Future work
• Research is ongoing to add other supplementary inputs and to analyze whether this extended approach is feasible.

• A short-term load forecasting system could be vital for the efficient operation of power systems at the national level.

Open Issues
• Searching for optimum configurations of In_Del and Out_Del to reduce the
number of inputs in the recurrent relation:
Y(t+1) = F( X(t+1 − In_Del(i)), Y(t − Out_Del(j)) )

• Managing the length of the data set.
• Preprocessing the data and removing outliers.
• Selecting the proper shifting interval.
• Computing the PCA, which depends on:
- the number of initial inputs and outputs;
- the number of the elements of In_Del and Out_Del;
- the number of training pairs;
- software/hardware limitations.
• Finding the best architecture is time consuming.
A predictive system that can be
implemented in a real application

Classifications
A. Predictive Performance of ANN Classifiers
Iulian Nastac and Adrian Costea

(TUCS Technical Report 679/2005)

• The influence of data distribution, preprocessing methods and training mechanisms

• A comparison between the retraining technique and genetic algorithms
B. A Cardio Vascular Predictor with
ANN-Classification-Based System
Oana Sandu¹, Iulian Nastac², Jaime Uribarri¹
¹Mount Sinai School of Medicine, NY, US
²Polytechnic University of Bucharest, Romania



Background and data
• Functional and structural vascular abnormalities are commonly
observed in patients with renal disease.
• Case study in chronic kidney disease (CKD) and end-stage renal disease (ESRD) patients: structural arterial disease was assessed by measuring the common carotid artery intima-media thickness (CIMT) and internal carotid artery (ICA) plaque and stenosis, and endothelial vascular function by assessing the brachial arterial flow-mediated dilatation (FMD) and the effect of hemodialysis treatment (HD) on it.
• Evaluate the prediction power of cardiovascular risk factors for yearly
end point events (cardiovascular events and death).
• A total of 93 subjects: 67 patients with renal disease: 24 CKD and 43
ESRD [23 peritoneal dialysis (PD) and 20 hemodialysis (HD)] and 26
healthy age-matched controls underwent carotid and brachial artery
Doppler examinations.
• Measurements have been performed with a Biosound Doppler machine and IMT software.
Data
• Inputs : age, diabetes, gender, race, bilateral
CIMT, IMT plaque/stenosis, Hct, diagnosis (renal
stage CKD, ESRD - PD, ESRD - HD, or control),
systolic and diastolic blood pressure, body weight,
brachial artery diameters, FMD (measured before
and as % differences after HD in the case of HD
group)

• Outputs:
  – yearly CV end point events (class 1),
  – previous events (class 2),
  – no observed cardiovascular events (class 3)
Experiment                     Inputs   Total     Distribution of samples    ANN            Success rate
                                        samples   Class 1  Class 2  Class 3  architecture   on test set
Experiment 1                   24       87        10       5        72       21:11:4:3      0.78571
Experiment 2                   24       93        10       5        78       21:14:4:3      0.8125
Experiment 3 (without ICA
stenosis/plaque and IMT)       22       87        10       5        72       20:14:3:3      0.57143
Experiment 4 (without ICA
stenosis/plaque and IMT)       22       93        10       5        78       20:12:4:3      0.6
Outcome of the application
• CIMT and internal carotid plaque/stenosis greatly
enhance prediction of yearly cardiovascular end
point events in patients with renal disease
• The database used can be enlarged, enabling an enhanced model to capture more of the patterns that separate the classes of markers
• The model can serve as a predictor/classifier for cardiovascular outcome and guide the initiation of treatment aimed at reducing cardiovascular risks in patients with renal disease.
C. Image identification
Iulian Nastac and David Stanescu
Polytechnic University of Bucharest, Romania

• The goal of the research was to find a tool able to recognize a person from a total of 40 individuals.

• The tool is a modified SOM (Self-Organizing Map), trained in an original manner.
Training set

Test set

Final map

The accuracy on the test set is 171 out of 200 (85.5%)

The accuracy on the test set is 174 out of 200 (87%)

The accuracy on the test set is 176 out of 200 (88%)

Outcome of the application

• Preliminary results from ongoing research were presented.
• The SOM model was modified in order to include supervised training.
• The pictures were used only to prove the robustness of the classification algorithm.
• Current research targets the implementation of this algorithm in a biomedical application.
Overall Conclusions
• The adaptive retraining technique can gradually improve, on
average, the achieved results.

• The first training always takes a relatively long time, but then the
system can be very easily retrained, since there are no changes in the
structure.

• The advantage of the retraining procedure is that some relevant aspects are preserved ("remembered") not only from the immediately previous training phase, but also from the one before it, and so on.

• The old information accumulated during earlier trainings is slowly forgotten, and the learning process adapts to the newest evolution of the process.
Conclusions (cont.)
• Improve the performance of training with validation
stop.

• Reduce the number of inputs with PCA.

• It is very easy to replace the SCG algorithm in our tool with another one, because at the basic level the architecture and the retraining procedure are independent of the training algorithm.
Final remarks
• There is a great potential for further
development in data mining.

• The tool/method can be easily adapted for other kinds of prediction or classification processes.
Selected References
• Dobrescu, E., Nastac, D.I., and Pelinescu, E.: ”Short-term financial forecasting using ANN
adaptive predictors in cascade”, International Journal of Process Management and
Benchmarking, Vol. 4, No. 4, 2014, pp. 376-405.
• D.I. Nastac, A.P. Ulmeanu, R. Tuduce, and P.D. Cristea: "A joint of adaptive predictors for electric load forecasting", in Proceedings of IWSSIP 2013, Bucharest, Romania, July 7-9, 2013, pp. 51-54.
• D.I. Nastac, I.B. Pavaloiu, R. Tuduce, and P.D. Cristea: "Adaptive retraining algorithm with shaken initialization", Revue roumaine des sciences techniques – Série Électrotechnique et Énergétique, Vol. 58, No. 1, 2013, pp. 101-111.
• I. Nastac: “An adaptive forecasting intelligent model for nonstationary time series”, Journal
of Applied Operational Research, Vol. 2, 2010, No. 2, pp. 117–129
• I. Nastac, A. Bacivarov, and A. Costea: “A Neuro-Classification Model for Socio-Technical
Systems”, Romanian Journal of Economic Forecasting, Vol. XI, No. 3/ 2009, pp. 100-109.
• P. Cristea, R. Tuduce, I. Nastac, J. Cornelis, R. Deklerck, and M. Andrei.: “Signal
Representation and Processing of Nucleotide Sequences”, International Journal of Functional
Informatics and Personalised Medicine (IJFIPM), Vol. 1, No. 3, 2008, pp. 253-268.
• I. Nastac, E. Dobrescu, and E. Pelinescu, “Neuro-Adaptive Model for Financial Forecasting”,
Romanian Journal of Economic Forecasting, Vol. 4, No. 3, 2007.
• A. Costea, and I. Nastac, “Assessing the Predictive Performance of ANN Classifiers Based on
Different Data Preprocessing Methods”, Internat. Journal of Intelligent Sys. Acc. Fin. Mgmt.
vol. 13, issue 4, December 2006, pp. 217-250.
• I. Nastac, F.Y. Collan, B. Back, M. Collan, T. Kuopio and P. Jalva: “Breast cancer prediction
using a neural network model”, World Automation Congress (WAC), Seville, June 2004.
