
Business Analytics and Forecasting

DS 580
Farideh Dehkordi-Vakil

Introduction

Recall that extrapolative methods of forecasting focus on a single time series to identify past patterns in the historical data.
These patterns are then extrapolated to map out the likely future path of the series.

Introduction

Note that the past and present values are already observed, whereas the future values are unknown and represent random variables.
We do not know their values, but we can describe them in terms of a set of possible values and the associated probabilities.

Introduction

This figure shows a time series observed for time periods 1-12; we would like to make a forecast for periods 13-20.
Note the increase in uncertainty as the forecast horizon increases.

Introduction

It is important to know both the forecast origin and how many periods ahead the forecast is being made.

Extrapolation of the Mean Value

Averaging methods

If a time series is generated by a constant process subject to random error, then the mean of the past values is a useful statistic and can be used as a forecast for the next period.
Averaging methods are suitable for stationary time series data, where the series is in equilibrium around a constant value (the underlying mean) with a constant variance over time.

Averaging Methods

The Mean

Uses the average of all the historical data as the forecast:
$$F_{t+1} = \frac{1}{t}\sum_{i=1}^{t} y_i$$
When new data become available, the forecast for time t+2 is the new mean, which includes the previously observed data plus this new observation:
$$F_{t+2} = \frac{1}{t+1}\sum_{i=1}^{t+1} y_i$$
This method is appropriate when there is no noticeable trend or seasonality.

Averaging Methods
How would you describe these weekly sales?

Suppose we are at week 26 and want to forecast sales for the next few weeks. Should we use the average of all 26 weeks available?

Moving Average Method

The moving average for time period t is the mean of the k most recent observations.
A moving average of order k, MA(k), is the mean value of k consecutive observations:
$$F_{t+1} = \hat{y}_{t+1} = \frac{y_t + y_{t-1} + y_{t-2} + \cdots + y_{t-k+1}}{k}$$
$$F_{t+1} = \frac{1}{k}\sum_{i=t-k+1}^{t} y_i$$
k is the number of terms in the moving average.
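As a minimal sketch of the MA(k) formula above, the function below averages the last k values of a series. The function name and the toy series are hypothetical and are not taken from the slides.

```python
def moving_average_forecast(y, k):
    """Return the MA(k) forecast for the next period: the mean of the last k values."""
    if k < 1 or k > len(y):
        raise ValueError("k must be between 1 and len(y)")
    return sum(y[-k:]) / k


# Hypothetical toy series, for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 12.0]
ma3 = moving_average_forecast(series, 3)  # mean of 11, 13, 12
ma5 = moving_average_forecast(series, 5)  # k = len(series): the overall mean
print(ma3, ma5)
```

With k equal to the length of the series, the moving average reduces to the mean forecast of the previous slide.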

Moving Average Method

Some care should be taken in choosing the span k for a moving average forecast model.
As a general rule, large spans smooth the time series more than small spans, because many ups and downs are averaged in each calculation.
The smaller the number k, the more weight is given to recent periods.
The greater the number k, the less weight is given to more recent periods.

Moving Averages

A large k is desirable when there are wide, infrequent fluctuations in the series.
A small k is most desirable when there are sudden shifts in the level of the series.
For seasonal data, the length of the season is often used for the value of k.

Moving Average Method

For monthly data, a 12-month moving average, MA(12), eliminates or averages out the seasonal effect.
The moving average method:
Assigns equal weight to each observation used in the calculation.
As more information becomes available, the new data point is included in the calculation and the oldest data point is discarded.
The moving average model does not handle trend or seasonality very well, although it can do better than the total mean.

Moving Averages

The following figure shows that MA(3) adapts more quickly to movements in the series, while MA(7) produces a greater degree of smoothing.

Example: Weekly Department Store Sales

The weekly sales figures (in millions of dollars) presented in the following table are used by a major department store to determine the need for temporary sales personnel.

Period (t)   Sales (y)
 1           5.3
 2           4.4
 3           5.4
 4           5.8
 5           5.6
 6           4.8
 7           5.6
 8           5.6
 9           5.4
10           6.5
11           5.1
12           5.8
13           5.0
14           6.2
15           5.6
16           6.7
17           5.2
18           5.5
19           5.8
20           5.1
21           5.8
22           6.7
23           5.2
24           6.0
25           5.8

Example: Weekly Department Store Sales


[Figure: Weekly Sales. Sales (y) plotted against Weeks.]

Example: Weekly Department Store Sales

Use a three-week moving average (k = 3) of the department store sales to forecast weeks 24 and 26.
$$\hat{y}_{24} = \frac{y_{23} + y_{22} + y_{21}}{3} = \frac{5.2 + 6.7 + 5.8}{3} = 5.9$$
The forecast error is
$$e_{24} = y_{24} - \hat{y}_{24} = 6 - 5.9 = 0.1$$

Example: Weekly Department Store Sales

The forecast for week 26 is
$$\hat{y}_{26} = \frac{y_{25} + y_{24} + y_{23}}{3} = \frac{5.8 + 6 + 5.2}{3} \approx 5.7$$

Example: Weekly Department Store Sales

RMSE = 0.63
[Figure: Weekly Sales Forecasts. Sales (y) and forecast plotted against Weeks.]

Period (t)   Sales (y)   Forecast
 1           5.3
 2           4.4
 3           5.4
 4           5.8         5.033333
 5           5.6         5.2
 6           4.8         5.6
 7           5.6         5.4
 8           5.6         5.333333
 9           5.4         5.333333
10           6.5         5.533333
11           5.1         5.833333
12           5.8         5.666667
13           5.0         5.8
14           6.2         5.3
15           5.6         5.666667
16           6.7         5.6
17           5.2         6.166667
18           5.5         5.833333
19           5.8         5.8
20           5.1         5.5
21           5.8         5.466667
22           6.7         5.566667
23           5.2         5.866667
24           6.0         5.9
25           5.8         5.966667
26                       5.666667
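The three-week moving average forecasts and the reported RMSE can be reproduced with a short script. This is a sketch under one assumption: the RMSE is taken over periods 4 to 25, the periods for which both an observation and a forecast exist. The function and variable names are illustrative.

```python
import math

# Weekly sales (millions of dollars), periods 1-25, from the table above.
sales = [5.3, 4.4, 5.4, 5.8, 5.6, 4.8, 5.6, 5.6, 5.4, 6.5, 5.1, 5.8, 5.0,
         6.2, 5.6, 6.7, 5.2, 5.5, 5.8, 5.1, 5.8, 6.7, 5.2, 6.0, 5.8]


def rolling_ma_forecasts(y, k):
    """One-step-ahead MA(k) forecasts for periods k+1 .. len(y)+1 (1-based)."""
    return [sum(y[t - k:t]) / k for t in range(k, len(y) + 1)]


k = 3
forecasts = rolling_ma_forecasts(sales, k)               # periods 4..26
errors = [y - f for y, f in zip(sales[k:], forecasts)]   # periods 4..25
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"forecast for week 24: {forecasts[20]:.2f}")
print(f"forecast for week 26: {forecasts[-1]:.2f}")
print(f"RMSE over weeks 4-25: {rmse:.2f}")
```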

Exponential Smoothing Methods

This method provides an exponentially weighted moving average of all previously observed values.
It is appropriate for data with no predictable upward or downward trend.
The aim is to estimate the current level and use it as a forecast of future values.

Simple Exponential Smoothing Method

Formally, the exponential smoothing equation is
$$F_{t+1} = \alpha y_t + (1-\alpha) F_t$$
where
$F_{t+1}$ = forecast for the next period.
$\alpha$ = smoothing constant.
$y_t$ = observed value of the series in period t.
$F_t$ = old forecast for period t.

The forecast $F_{t+1}$ is based on weighting the most recent observation $y_t$ with a weight $\alpha$ and weighting the most recent forecast $F_t$ with a weight of $1-\alpha$.

Simple Exponential Smoothing Method

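The implication of exponential smoothing can be seen more clearly if the previous equation is expanded by replacing $F_t$ with its components as follows:
$$F_{t+1} = \alpha y_t + (1-\alpha) F_t$$
$$= \alpha y_t + (1-\alpha)[\alpha y_{t-1} + (1-\alpha) F_{t-1}]$$
$$= \alpha y_t + \alpha(1-\alpha) y_{t-1} + (1-\alpha)^2 F_{t-1}$$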

Simple Exponential Smoothing Method

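If this substitution process is repeated by replacing $F_{t-1}$ by its components, $F_{t-2}$ by its components, and so on, the result is:
$$F_{t+1} = \alpha y_t + \alpha(1-\alpha) y_{t-1} + \alpha(1-\alpha)^2 y_{t-2} + \alpha(1-\alpha)^3 y_{t-3} + \cdots + \alpha(1-\alpha)^{t-1} y_1 + (1-\alpha)^t F_1$$
The final term carries the starting value $F_1$; its weight shrinks toward zero as t grows.
Therefore, $F_{t+1}$ is a weighted moving average of all past observations.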

Simple Exponential Smoothing Method

The following table shows the weights assigned to past observations for $\alpha$ = 0.2, 0.4, 0.6, 0.8, 0.9.
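The weights follow directly from the expanded equation: the observation j periods back receives weight $\alpha(1-\alpha)^j$. The sketch below prints the first five weights for each value of $\alpha$; the function name is hypothetical.

```python
def ses_weights(alpha, n_lags):
    """Weight alpha * (1 - alpha)**j placed on the observation j periods back."""
    return [alpha * (1 - alpha) ** j for j in range(n_lags)]


for alpha in (0.2, 0.4, 0.6, 0.8, 0.9):
    row = ", ".join(f"{w:.4f}" for w in ses_weights(alpha, 5))
    print(f"alpha = {alpha}: {row}")
```

The weights decline geometrically, and faster for larger $\alpha$; over many lags they sum to almost 1.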

Simple Exponential Smoothing Method

Rewriting the exponential smoothing equation in the following form elucidates the role of the weighting factor $\alpha$:
$$F_{t+1} = F_t + \alpha (y_t - F_t)$$
The exponential smoothing forecast is the old forecast plus an adjustment for the error that occurred in the last forecast.

Effect of Different Weights


[Figure: Weight plotted against Lag.]

Simple Exponential Smoothing Method

Choosing the smoothing constant $\alpha$ in the exponential smoothing model is similar to choosing the span k in the moving average model.
Both are related to the smoothness of the model.

Smaller values of $\alpha$ correspond to greater smoothing of the ups and downs in the time series.
Larger values of $\alpha$ put most of the weight on the most recent observed values, so the forecasts tend to follow the ups and downs of the series more closely.

Simple Exponential Smoothing Method

The value of the smoothing constant $\alpha$ must be between 0 and 1.
$\alpha$ cannot be equal to 0 or 1.
If stable predictions with smoothed random variation are desired, then a small value of $\alpha$ is appropriate.
If a rapid response to a real change in the pattern of observations is desired, a large value of $\alpha$ is appropriate.

Simple Exponential Smoothing Method

To estimate $\alpha$, forecasts are computed for $\alpha$ equal to .1, .2, .3, ..., .9 and the sum of squared forecast errors is computed for each.
The value of $\alpha$ with the smallest RMSE is chosen for use in producing the future forecasts.
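The grid search described above can be sketched as follows. The series is a hypothetical toy example, the first forecast is initialized to the first observation (an assumption), and the function names are illustrative.

```python
import math


def ses_forecasts(y, alpha, f1=None):
    """One-step-ahead SES forecasts F_1 .. F_{n+1}; F_1 defaults to y[0]."""
    f = [y[0] if f1 is None else f1]
    for obs in y:
        f.append(alpha * obs + (1 - alpha) * f[-1])
    return f


def ses_rmse(y, alpha):
    """RMSE of the one-step-ahead errors for periods 2..n (period 1 is the start-up)."""
    f = ses_forecasts(y, alpha)
    errors = [obs - fc for obs, fc in zip(y[1:], f[1:])]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical toy series, for illustration only.
series = [20.0, 22.0, 19.0, 21.0, 24.0, 23.0, 22.0, 25.0, 24.0, 26.0]

grid = [round(0.1 * i, 1) for i in range(1, 10)]      # 0.1, 0.2, ..., 0.9
scores = {alpha: ses_rmse(series, alpha) for alpha in grid}
best_alpha = min(scores, key=scores.get)
print(best_alpha, round(scores[best_alpha], 3))
```

Minimizing RMSE and minimizing the sum of squared errors select the same $\alpha$, since one is a monotone function of the other.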

Simple Exponential Smoothing Method

To start the algorithm, we need $F_1$ because
$$F_2 = \alpha y_1 + (1-\alpha) F_1$$
Since $F_1$ is not known, we can:
Set the first estimate equal to the first observation.
Use the average of a number of initial observations: the first three or four, up to 12, or even the mean of the whole sample can be used.
When either the sample size or $\alpha$ is large, the choice of starting value is relatively unimportant.

Example: University of Michigan Index of Consumer Sentiment

University of Michigan Index of Consumer Sentiment for January 1995 to December 1996.
We want to forecast the University of Michigan Index of Consumer Sentiment using the Simple Exponential Smoothing Method.

Date     Observed
Jan-95   97.6
Feb-95   95.1
Mar-95   90.3
Apr-95   92.5
May-95   89.8
Jun-95   92.7
Jul-95   94.4
Aug-95   96.2
Sep-95   88.9
Oct-95   90.2
Nov-95   88.2
Dec-95   91.0
Jan-96   89.3
Feb-96   88.5
Mar-96   93.7
Apr-96   92.7
May-96   89.4
Jun-96   92.4
Jul-96   94.7
Aug-96   95.3
Sep-96   94.7
Oct-96   96.5
Nov-96   99.2
Dec-96   96.9
Jan-97

Example: University of Michigan Index of Consumer Sentiment

Since no forecast is available for the first period, we will set the first estimate equal to the first observation.
We try $\alpha$ = 0.3 and $\alpha$ = 0.6.

[Figure: University of Michigan Index of Consumer Sentiment. Consumer Sentiment Index plotted against Date.]

Example: University of Michigan Index of Consumer Sentiment

Note that the first forecast is the first observed value.
The forecasts for Feb. 95 (t = 2) and Mar. 95 (t = 3) with $\alpha$ = 0.6 are evaluated as follows:
$$\hat{y}_{t+1} = \hat{y}_t + \alpha (y_t - \hat{y}_t)$$
$$\hat{y}_2 = \hat{y}_1 + 0.6 (y_1 - \hat{y}_1) = 97.6 + 0.6 (97.6 - 97.6) = 97.6$$
$$\hat{y}_3 = \hat{y}_2 + 0.6 (y_2 - \hat{y}_2) = 97.6 + 0.6 (95.1 - 97.6) = 96.1$$
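The same recursion can be run for the first few months as a check on the hand calculation. This is a small sketch using only the January to April 1995 observations; the function name is hypothetical.

```python
def ses_update(forecast, observed, alpha):
    """Error-correction form: new forecast = old forecast + alpha * error."""
    return forecast + alpha * (observed - forecast)


observed = [97.6, 95.1, 90.3, 92.5]   # Jan-95 .. Apr-95
alpha = 0.6

forecasts = [observed[0]]             # first forecast = first observation
for y in observed:
    forecasts.append(ses_update(forecasts[-1], y, alpha))

rounded = [round(f, 2) for f in forecasts]   # forecasts for Jan-95 .. May-95
print(rounded)
```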

Date     Consumer Sentiment   Alpha = 0.3   Alpha = 0.6
Jan-95    97.6                #N/A          #N/A
Feb-95    95.1                 97.60         97.60
Mar-95    90.3                 96.85         96.10
Apr-95    92.5                 94.89         92.62
May-95    89.8                 94.17         92.55
Jun-95    92.7                 92.86         90.90
Jul-95    94.4                 92.81         91.98
Aug-95    96.2                 93.29         93.43
Sep-95    88.9                 94.16         95.09
Oct-95    90.2                 92.58         91.38
Nov-95    88.2                 91.87         90.67
Dec-95    91.0                 90.77         89.19
Jan-96    89.3                 90.84         90.28
Feb-96    88.5                 90.38         89.69
Mar-96    93.7                 89.81         88.98
Apr-96    92.7                 90.98         91.81
May-96    89.4                 91.50         92.34
Jun-96    92.4                 90.87         90.58
Jul-96    94.7                 91.33         91.67
Aug-96    95.3                 92.34         93.49
Sep-96    94.7                 93.23         94.58
Oct-96    96.5                 93.67         94.65
Nov-96    99.2                 94.52         95.76
Dec-96    96.9                 95.92         97.82
Jan-97    97.4                 96.22         97.27
Feb-97    99.7                 96.57         97.35
Mar-97   100.0                 97.51         98.76
Apr-97   101.4                 98.26         99.50
May-97   103.2                 99.20        100.64
Jun-97   104.5                100.40        102.18
Jul-97   107.1                101.63        103.57
Aug-97   104.4                103.27        105.69
Sep-97   106.0                103.61        104.92
Oct-97   105.6                104.33        105.57
Nov-97   107.2                104.71        105.59
Dec-97   102.1                105.46        106.55

Example: University of Michigan Index of Consumer Sentiment

RMSE = 2.66 for $\alpha$ = 0.6
RMSE = 2.96 for $\alpha$ = 0.3

[Figure: University of Michigan Index of Consumer Sentiment. Sentiment Index plotted against Months; series shown: Consumer Sentiment, SES (Alpha = 0.3), SES (Alpha = 0.6).]

Evaluating Forecasts

All quantitative forecasting models are developed on the basis of historical data.
When error measures such as RMSE are applied to the historical data, they are often considered measures of how well various models fit the data (how well they work in the sample).
To determine how accurate the models are in actual forecasting (out of sample), a hold-out period is often used for evaluation, and a measure of forecast accuracy based on the hold-out forecast errors (such as RMSE) can be computed.

Evaluating Forecasts

To evaluate the relative performance of alternative methods:

The data series is partitioned into two parts.

The first part, called the estimation sample or in-sample, is used to estimate the starting value and the smoothing parameter.
This sample typically contains the first 75-80 percent of the observations.
The second part, called the hold-out sample, validation sample, or out-sample, is used to assess forecasting performance. This sample contains the last 20-25 percent of the observations.
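A chronological split of this kind can be sketched as below. The function name and the default 80 percent share are assumptions chosen to match the range quoted above; the series is a hypothetical placeholder.

```python
def split_estimation_holdout(y, estimation_share=0.8):
    """Split a series chronologically: first part for estimation, last part held out.

    The order of observations is preserved; a time series must not be shuffled.
    """
    if not 0 < estimation_share < 1:
        raise ValueError("estimation_share must be strictly between 0 and 1")
    cut = int(round(len(y) * estimation_share))
    return y[:cut], y[cut:]


# Hypothetical series of 25 observations, for illustration only.
series = list(range(1, 26))
estimation, holdout = split_estimation_holdout(series, 0.8)
print(len(estimation), len(holdout))
```

The smoothing parameter and starting value would be chosen on `estimation` only, and the accuracy measure then computed on `holdout`.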

General Comments

On average, SES tends to outperform MA.
SES corresponds to an intuitively appealing underlying statistical model of the data (we shall see this in chapter 5).
Direct use of moving-average-based procedures is not recommended for forecasting.
Moving averages are useful in the area of seasonal adjustment (we will see this in chapter 4).

General Comments

For evaluation or fitting purposes, we could minimize RMSE, MAPE, or MAE. They generally produce similar results.
Out-of-sample error measures tend to be somewhat higher than those calculated for the estimation sample.
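The three error measures can be computed from the same set of forecast errors. The sketch below uses the usual textbook definitions, which the slides do not spell out, so the exact formulas (in particular MAPE as the mean absolute percentage error) are an assumption; the numbers are hypothetical.

```python
import math


def error_measures(actual, forecast):
    """Return (RMSE, MAE, MAPE in percent) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


# Hypothetical values, for illustration only.
rmse, mae, mape = error_measures([100.0, 200.0], [110.0, 190.0])
print(rmse, mae, mape)
```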

Linear Exponential Smoothing

When a time series has a long-term trend (e.g. increases in GDP or sales), the forecasting method must accommodate such features. There are two main approaches:

Convert the series to rates of change (growth rates, either absolute or percentage), then predict the rate of change, OR
Develop forecasting methods that account for trends.

Linear Exponential Smoothing

[Figure: Linear trend fitted to Quarterly Sales. Netflix Sales plotted against Period.]
Quarterly Sales = -6.157 + 4.567 Period
S = 5.47757, R-Sq = 93.7%, R-Sq(adj) = 93.3%

[Figure: Quadratic trend fitted to Quarterly Sales. Netflix Sales plotted against Period.]
Quarterly Sales = 6.914 - 0.0466 Period + 0.2883 Period**2
S = 1.98024, R-Sq = 99.2%, R-Sq(adj) = 99.1%

Holt's Exponential Smoothing

The previous two models assume a never-changing trend into the future.
The linear exponential smoothing model projects trends more locally.
Holt's two-parameter exponential smoothing method is an extension of simple exponential smoothing.
It adds a growth factor (or trend factor) to the smoothing equation as a way of adjusting for changes in the trend.

Holt's Exponential Smoothing

We start by defining the following variables:

$L_t$ = level of the series at time t.
$T_t$ = trend (slope) of the series at time t.

The forecast function for one step ahead is:
$$F_{t+1} = L_t + T_t$$
The forecast m steps ahead is:
$$F_{t+m} = L_t + m T_t$$

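Holt's Exponential Smoothing

To update the level and the trend, let $e_t = y_t - (L_{t-1} + T_{t-1})$ denote the one-step-ahead forecast error.

The new level is the old level (adjusted for the increase produced by the trend) plus a partial adjustment (weight $\alpha$) for the most recent error:
$$L_t = L_{t-1} + T_{t-1} + \alpha e_t$$
$$L_t = \alpha y_t + (1-\alpha)(L_{t-1} + T_{t-1})$$
The new trend is the old trend plus a partial adjustment for the error. With the trend smoothing weight $\beta$ used in the second form, the weight on the error is $\alpha\beta$:
$$T_t = T_{t-1} + \alpha\beta e_t$$
$$T_t = \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1}$$
Forecast m steps into the future:
$$F_{t+m} = L_t + m T_t$$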
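The error-correction and weighted-average forms of each update describe the same recursion. A short derivation, assuming $e_t = y_t - (L_{t-1} + T_{t-1})$ is the one-step-ahead forecast error, $\alpha$ is the level weight, and $\beta$ is the trend weight in the weighted-average form:

```latex
\begin{aligned}
L_t &= \alpha y_t + (1-\alpha)(L_{t-1} + T_{t-1}) \\
    &= (L_{t-1} + T_{t-1}) + \alpha\,[\,y_t - (L_{t-1} + T_{t-1})\,] \\
    &= L_{t-1} + T_{t-1} + \alpha e_t \\[4pt]
T_t &= \beta (L_t - L_{t-1}) + (1-\beta) T_{t-1} \\
    &= \beta (T_{t-1} + \alpha e_t) + (1-\beta) T_{t-1} \\
    &= T_{t-1} + \alpha\beta\, e_t
\end{aligned}
```

Some texts absorb the product $\alpha\beta$ into a single trend weight in the error-correction form, so the symbol used there need not equal the $\beta$ of the weighted-average form.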

Holt's Exponential Smoothing

The weights $\alpha$ and $\beta$ can be selected subjectively or by minimizing a measure of forecast error such as RMSE.
Large weights result in more rapid changes in the component.
Small weights result in less rapid changes.

Holt's Exponential Smoothing

The initialization process for Holt's linear exponential smoothing requires two estimates:

One to get the first smoothed value for $L_1$.
The other to get the trend $b_1$ (the initial value of the trend $T_t$).

One alternative is to set $L_1 = y_1$ and
$$b_1 = y_2 - y_1$$
or
$$b_1 = \frac{y_4 - y_1}{3}$$
or
$$b_1 = 0$$

Example: Quarterly sales of saws for Acme Tool Company

The following table shows the sales of saws for the Acme Tool Company.
These are quarterly sales from 1994 through 2000.

Sales by year and quarter (t = 1, ..., 28 numbers the quarters consecutively, from 1994 Q1 to 2000 Q4):

Year   Q1    Q2    Q3    Q4    t
1994   500   350   250   400   1-4
1995   450   350   200   300   5-8
1996   350   200   150   400   9-12
1997   550   350   250   550   13-16
1998   550   400   350   600   17-20
1999   750   500   400   650   21-24
2000   850   600   450   700   25-28

Example: Quarterly sales of saws for Acme Tool Company

Examination of the plot shows:

A non-stationary time series.
Seasonal variation seems to exist.
Sales for the first and fourth quarters are larger than for the other quarters.

[Figure: Sales of saws for the Acme Tool Company: 1994-2000. Saws plotted against Year.]

Example: Quarterly sales of saws for Acme Tool Company

The plot of the Acme data shows that there might be a trend in the data; therefore we will try Holt's model to produce forecasts.
We need two initial values:

The first smoothed value for $L_1$.
The initial trend value $b_1$.

We will use the first observation as the estimate of the smoothed value $L_1$, and the initial trend value $b_1 = 0$.
We will use $\alpha$ = .3 and $\beta$ = .1.

Example: Quarterly sales of saws for Acme Tool Company

Year   Quarter   t    Sales   Lt       bt       Ft+m
1994   1          1   500     500.00     0.00   500.00
       2          2   350     455.00    -4.50   500.00
       3          3   250     390.35   -10.52   450.50
       4          4   400     385.88    -9.91   379.84
1995   1          5   450     398.18    -7.69   375.97
       2          6   350     378.34    -8.90   390.49
       3          7   200     318.61   -13.99   369.44
       4          8   300     303.23   -14.13   304.62
1996   1          9   350     307.38   -12.30   289.11
       2         10   200     266.55   -15.15   295.08
       3         11   150     220.98   -18.19   251.40
       4         12   400     261.95   -12.28   202.79
1997   1         13   550     339.77    -3.27   249.67
       2         14   350     340.55    -2.86   336.50
       3         15   250     311.38    -5.49   337.69
       4         16   550     379.12     1.83   305.89
1998   1         17   550     431.67     6.90   380.95
       2         18   400     427.00     5.74   438.57
       3         19   350     407.92     3.26   432.74
       4         20   600     467.83     8.93   411.18
1999   1         21   750     558.73    17.12   476.75
       2         22   500     553.10    14.85   575.85
       3         23   400     517.56     9.81   567.94
       4         24   650     564.16    13.49   527.37
2000   1         25   850     659.35    21.66   577.65
       2         26   600     656.71    19.23   681.01
       3         27   450     608.16    12.45   675.94
       4         28   700     644.43    14.83   620.61

Example: Quarterly sales of saws for Acme Tool Company

RMSE for this application with $\alpha$ = .3 and $\beta$ = .1 is:
RMSE = 155.5
The plot also showed the possibility of seasonal variation that needs to be investigated.
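The level, trend, and forecast columns for the Acme data can be reproduced with a short implementation of Holt's recursions. This is a sketch, not the software used for the slides: it assumes $L_1 = y_1$, $b_1 = 0$, $\alpha$ = 0.3, $\beta$ = 0.1 as stated in the example, and it assumes the RMSE is taken over periods 2 to 28. The function name is hypothetical.

```python
import math

# Quarterly saw sales, 1994 Q1 .. 2000 Q4 (t = 1..28).
sales = [500, 350, 250, 400, 450, 350, 200, 300, 350, 200, 150, 400, 550, 350,
         250, 550, 550, 400, 350, 600, 750, 500, 400, 650, 850, 600, 450, 700]


def holt(y, alpha, beta, trend0=0.0):
    """Holt's linear exponential smoothing with L_1 = y_1 and b_1 = trend0.

    Returns (levels, trends, forecasts); forecasts[i] is the one-step-ahead
    forecast of y[i], and forecasts[0] is simply set to the initial level.
    """
    level, trend = float(y[0]), trend0
    levels, trends, forecasts = [level], [trend], [level]
    for obs in y[1:]:
        forecast = level + trend                      # F_t = L_{t-1} + b_{t-1}
        new_level = alpha * obs + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        levels.append(level)
        trends.append(trend)
        forecasts.append(forecast)
    return levels, trends, forecasts


levels, trends, forecasts = holt(sales, alpha=0.3, beta=0.1)
errors = [y - f for y, f in zip(sales[1:], forecasts[1:])]   # periods 2..28
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"L2 = {levels[1]:.2f}, b2 = {trends[1]:.2f}, F3 = {forecasts[2]:.2f}")
print(f"RMSE = {rmse:.1f}")
```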

[Figure: Quarterly Saw Sales Forecast, Holt's Method. Sales and Ht+m plotted against Quarters.]

Exponential Smoothing with a Damped Trend

One common feature of time series for sales is a decline in sales growth as a product line matures, unless the product is upgraded in some way.
A procedure that damps down the trend component as the forecast horizon is extended assumes that the series will level out over time.

Exponential Smoothing with a Damped Trend

This kind of life-cycle effect can be accommodated by introducing a damping factor $\phi$ into the updating equations for level and trend:
$$L_t = L_{t-1} + \phi T_{t-1} + \alpha e_t$$
$$T_t = \phi T_{t-1} + \alpha\beta e_t$$
The damping factor $0 < \phi < 1$ dampens the trend term.

Exponential Smoothing with a Damped Trend

The forecast function for m steps ahead is
$$F_{t+m} = L_t + (\phi + \phi^2 + \cdots + \phi^m) T_t$$
This forecast levels out over time, approaching the limiting value
$$L_t + \frac{\phi}{1-\phi} T_t$$
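The levelling-out behaviour is easy to see numerically. The sketch below evaluates the damped forecast function for increasing horizons and compares it with the limiting value; the level, trend, and damping factor are hypothetical numbers and the function names are illustrative.

```python
def damped_forecast(level, trend, phi, m):
    """m-step-ahead forecast: level + (phi + phi**2 + ... + phi**m) * trend."""
    return level + sum(phi ** j for j in range(1, m + 1)) * trend


def damped_limit(level, trend, phi):
    """Limiting forecast as m grows: level + trend * phi / (1 - phi)."""
    return level + trend * phi / (1 - phi)


# Hypothetical values, for illustration only.
level, trend, phi = 100.0, 10.0, 0.5
for m in (1, 2, 5, 20):
    print(m, round(damped_forecast(level, trend, phi, m), 4))
print("limit:", damped_limit(level, trend, phi))
```

With $\phi$ = 1 the sum equals m and the function reduces to Holt's undamped forecast $L_t + m T_t$.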

Use of Transformations

Use of LES methods requires that the series be locally linear.
In many cases this assumption is not realistic, and the forecasts either underestimate or overestimate the actual value. This becomes more serious as the forecasting horizon increases.

Use of Transformations

A series with a more complex nonlinear pattern can be forecast in two ways:

Transform the series so that the trend becomes linear.
Convert the series to growth over time, forecast the growth rate, and then convert back to the original series.

The Log Transform

When the log transform produces a linear trend, we can apply LES and then transform back to the original series to obtain the forecast of interest.
Typically the effect of the log transform is to improve forecasting performance for an exponential growth curve. For example, with 5 percent growth per period:
$$Y_t = 1.05\, Y_{t-1}$$
Log transform:
$$\ln Y_t = \ln 1.05 + \ln Y_{t-1}$$
The reverse transformation:
$$\exp(\ln Y_t) = \exp(\ln 1.05 + \ln Y_{t-1}) = 1.05\, Y_{t-1}$$
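The round trip through logarithms can be illustrated on a hypothetical series that grows by exactly 5 percent per period. For brevity the sketch extrapolates the average step on the log scale rather than running LES; that substitution is an assumption made only for this illustration.

```python
import math

# Hypothetical series with 5% growth per period, for illustration only.
y = [100 * 1.05 ** t for t in range(8)]

logs = [math.log(v) for v in y]
steps = [b - a for a, b in zip(logs, logs[1:])]   # constant on the log scale
mean_step = sum(steps) / len(steps)

log_forecast = logs[-1] + mean_step               # linear extrapolation of ln Y
forecast = math.exp(log_forecast)                 # back to the original scale

print(round(mean_step, 5), round(math.log(1.05), 5))
print(round(forecast, 4), round(1.05 * y[-1], 4))
```

On the log scale the series rises by the constant amount ln 1.05 each period, which is why a linear trend method becomes appropriate after the transform.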

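Use of Growth Rate

Define the growth rate (in percent):
$$G_t = 100 \cdot \frac{Y_t - Y_{t-1}}{Y_{t-1}}$$
Use SES to predict the growth rate for the next period.
The one-step forecast for the original series is given by
$$F_{t+1} = Y_t \left(1 + \frac{G_{t+1}}{100}\right)$$
where $G_{t+1}$ is the predicted growth rate.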
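The two formulas translate directly into code. In this sketch the function names are hypothetical; the first check uses a toy series with exactly 10 percent growth, and the second applies a growth forecast of 38.1 percent to sales of 7.15, figures that appear in the Netflix table.

```python
def growth_rates(y):
    """Percentage growth G_t = 100 * (Y_t - Y_{t-1}) / Y_{t-1} for t = 2..n."""
    return [100 * (b - a) / a for a, b in zip(y, y[1:])]


def sales_forecast(last_value, growth_forecast_pct):
    """One-step forecast F_{t+1} = Y_t * (1 + G_{t+1} / 100)."""
    return last_value * (1 + growth_forecast_pct / 100)


# Hypothetical series growing 10% per period, for illustration only.
rates = growth_rates([100.0, 110.0, 121.0])
next_value = sales_forecast(121.0, 10.0)

# Sales of 7.15 with a predicted growth rate of 38.1 percent.
example = sales_forecast(7.15, 38.1)
print(rates, next_value, round(example, 1))
```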

Growth Rate Analysis of Netflix Quarterly Sales

Year   Quarter   Quarterly Sales   Growth (percentage)   Growth Forecast   Sales Forecast
2000   1          5.17*
2000   2          7.15             38.1                  38.1
2000   3         10.18             42.5                  38.1               9.9
2000   4         13.39             31.5                  41.9              14.1
2001   1         17.06             27.4                  33.0              19.0
2001   2         18.36              7.6                  28.2              22.7
2001   3         18.88              2.8                  10.5              23.5
2001   4         21.62             14.5                   3.9              20.9
2002   1         30.53             41.2                  13.0              22.5
2002   2         36.36             19.1                  37.2              34.5
2002   3         40.73             12.0                  21.7              49.9
2002   4         45.19             10.9                  13.4              49.6
2003   1         55.67             23.2                  11.3              51.2
2003   2         63.19             13.5                  21.5              62.0
2003   3         72.20             14.3                  14.6              76.8
2003   4         81.19             12.4                  14.3              82.8

The Box-Cox Transformations

The logarithmic transformation is appealing because it reflects proportional rather than absolute change.
But proportional change may project future growth in excess of reasonable expectations.
A modified LES that allows for a damped trend was introduced earlier.
This modification can be applied after the log transform when appropriate.

The Box-Cox Transformations

A second possibility is to select a transformation that is more moderate than the logarithmic one.
Box and Cox suggested using a power transformation.
$$Z_t = Y_t^{c}, \qquad -1 \le c \le 1$$

When to Transform

Do not use complex transforms unless they are supported by both theory and data.
Always compare the transformed method with a benchmark by transforming the forecasts back to the scale of the data series of interest.
