Methods Summary

Based on the script written by Professor David Targett
Compiled by Jörg Stegemann
June 2012
Table of Contents

Measures
  Location
    Arithmetic mean
    Median
    Mode
  Scatter
    Range
    Interquartile range
    Mean Absolute Deviation (MAD)
    Variance
    Standard Deviation
    Coefficient of Variation
  Indices
    Simple Index
    Simple Aggregate Index
    Weighted Aggregate Index (Laspeyres, Paasche)
  Other Summary Measures
    Skew
    Kurtosis
Distributions
  ANOVA table
    One-Way Analysis of Variance
    Two-Way Analysis of Variance
    Regression Analysis
  Binomial Distribution
  Normal Distribution
  Poisson Distribution
  t-Distribution
  Chi-squared Distribution
  F-Distribution
  Holt-Winters method
  Decomposition Method
  Box-Jenkins Method
Forecasting
  Qualitative Methods
  Causal Modelling
  Time Series Methods
Regression
  Simple Linear Regression
    Testing randomness of residuals
      Runs Test
  Multiple Regression Analysis
    Stages in multiple regression analysis
    Discarding of variables
  Correlation
    Correlation coefficient
    R-bar-squared
    Collinearity
Exams
Measures

Location

Arithmetic mean

The arithmetic mean is calculated as

  Arithmetic mean = Sum of readings / Number of readings = Σx / n
Median

The median is the middle number of a set of values. In case of an even number of readings the arithmetic mean of the middle two numbers is used.
Mode

The most frequent number that occurs in a set of readings.
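As a quick illustration of the three location measures, using Python's standard statistics module (the sample readings are made up):

```python
import statistics

readings = [4, 7, 7, 2, 9, 7, 3, 5]

print(statistics.mean(readings))    # arithmetic mean: 5.5
print(statistics.median(readings))  # even count, so mean of the middle two: 6.0
print(statistics.mode(readings))    # most frequent value: 7
```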
Scatter

Range

  Range = Largest reading - smallest reading

Interquartile range

Range of the middle 50% of readings: Strip off the top and bottom 25% of readings, then use the Range calculation.
Mean Absolute Deviation (MAD)

The mean absolute deviation is the average distance of readings from their arithmetic mean:

  MAD = Σ|x - xmean| / Number of readings
Variance

The variance is the average squared difference of readings from the arithmetic mean:

  Variance = Σ(x - xmean)² / (Number of readings - 1)

Easier to calculate for large numbers of readings:

  Variance = (Σx² - n * xmean²) / (n - 1)

Standard Deviation

The standard deviation is the square root of the variance.

  Standard deviation = sqrt(Variance)
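A small check that both variance formulas above agree, on made-up readings:

```python
# Sample variance computed two ways, as in the formulas above.
readings = [4.0, 7.0, 7.0, 2.0, 9.0, 7.0, 3.0, 5.0]
n = len(readings)
mean = sum(readings) / n

# Definition: squared differences from the mean, with n - 1 denominator
var_def = sum((x - mean) ** 2 for x in readings) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) / (n - 1)
var_short = (sum(x * x for x in readings) - n * mean ** 2) / (n - 1)

std_dev = var_def ** 0.5
print(var_def, var_short, std_dev)
```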
Coefficient of Variation

To compare the scatter of two groups with different means, the measure of scatter must be standardised before a relative comparison can be made. The coefficient of variation is a possible way to achieve this:

  Coefficient of variation = Standard deviation / Arithmetic mean

Example:

             Mean     Standard deviation   Coefficient of variation
  Airport 1   4 200    1 050                0.25
  Airport 2  15 600    2 250                0.14
Indices

Simple Index

An index is the result of the conversion of one series of numbers into another based on 100. One value is picked as the base value and assigned the value 100. Previous and following values are calculated relative to this base value:

  Index value = value / base value * 100
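Applying the formula above to a made-up series, with the first period chosen as the base:

```python
# Converting a series into index form (base 100).
values = [50.0, 60.0, 80.0, 88.0]
base = values[0]  # first period assigned the value 100

index_series = [v / base * 100 for v in values]
print(index_series)  # the base period itself becomes 100.0
```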
Simple Aggregate Index

An aggregate index takes into account multiple factors. Example: The price index for meat consists of the prices of beef, pork and lamb. To do this, add together the values for each meat in a time period. The first time period gets assigned the value 100. Previous, following periods: See above.

Price relative index: If some inputs have very different levels the effect of a high-value column can overshadow a low-value column. To counter this a price relative index can be used. Here an index value is calculated per column and the aggregate index is then based on those index values.
Other Summary Measures

Skew

Skew measures the extent to which a distribution is non-symmetrical. Left (or negatively) skewed graphs show a peak to the right of the middle. Right (or positively) skewed graphs show the peak to the left of the middle. Zero-skewed graphs are symmetrical.
Kurtosis

Kurtosis measures the extent to which a distribution is broad, i.e. how flat or pointy the middle of the graph is. Low kurtosis means a flatter graph; high kurtosis indicates a pointier graph with fatter tails.
Distributions

ANOVA table

Conventionally analyses of variance are laid out in a systematic form called an ANOVA table (ANalysis Of VAriance table).
One-Way Analysis of Variance

  Variation                 Degrees of freedom   Sums of squares   Mean square   F
  Explained by treatments   c - 1                SST               MST           MST/MSE
  Error or unexplained      (r - 1) * c          SSE               MSE
  Total                     r * c - 1            SS
Two-Way Analysis of Variance

  Variation                 Degrees of freedom   Sums of squares   Mean square   F
  Explained by treatments   c - 1                SST               MST           MST/MSE
  Explained by blocks       r - 1                SSB               MSB           MSB/MSE
  Error or unexplained      (r - 1) * (c - 1)    SSE               MSE
  Total                     r * c - 1            SS
Regression Analysis

  Variation                 Degrees of freedom   Sums of squares   Mean square   F
  Explained by treatments   k                    SST               MST           MST/MSE
  Error or unexplained      n - k - 1            SSE               MSE
  Total                     n - 1                SS

Degrees of freedom:

  c = Number of columns
  r = Number of rows
  n = Number of observations
  k = Number of independent variables in the regression analysis
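A minimal one-way ANOVA laid out as in the table above, on made-up data with r = 4 observations in each of c = 3 treatment columns:

```python
columns = [
    [23.0, 25.0, 21.0, 27.0],
    [30.0, 32.0, 29.0, 33.0],
    [22.0, 24.0, 20.0, 26.0],
]
c = len(columns)
r = len(columns[0])
all_vals = [x for col in columns for x in col]
grand_mean = sum(all_vals) / len(all_vals)

# Treatment sum of squares (between columns) and error sum of squares (within)
sst = sum(r * (sum(col) / r - grand_mean) ** 2 for col in columns)
sse = sum((x - sum(col) / r) ** 2 for col in columns for x in col)

mst = sst / (c - 1)        # mean square for treatments
mse = sse / ((r - 1) * c)  # mean square for error
f_ratio = mst / mse
print(round(f_ratio, 2))
```

The F ratio is then compared against the critical F value for (c - 1, (r - 1) * c) degrees of freedom.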
Binomial Distribution

The binomial distribution is based on taking samples from a population whose elements are of two types. A random sample of size n is taken. Because of the randomness of the sample it could contain between 0 and n elements of type 1.

Example: 20 percent of chips produced are defective. A sample of 30 chips is inspected to see how many are actually defective.

Parameters:

  p = Proportion of the population of type 1 (1 - p is the proportion of type 2)
  n = The size of the sample being taken

Calculation:

  P(r of type 1 in sample) = nCr * p^r * (1 - p)^(n-r)

  with nCr = n! / (r! * (n - r)!)

How many combination possibilities are there for taking r elements out of a population of n? Think Lotto, pick 6 out of 49 numbers:

  49! / (6! * 43!) = 13 983 816 (chance of winning 1 in ~14 million)

Uses of the binomial distribution:

- Inspection schemes (does the observed defect rate differ from the agreed one?)
- Opinion polls (for/against)
- Selling (sale/no sale)

Can be approximated by both the Normal and the Poisson distributions!
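The formula above translates directly into code; this sketch uses the chip example (p = 0.2 defective, n = 30) and checks the Lotto count from the text:

```python
import math

def binomial_pmf(r, n, p):
    """P(exactly r of type 1 in a sample of n): nCr * p^r * (1-p)^(n-r)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

# Probabilities over all possible counts 0..n sum to 1
total = sum(binomial_pmf(r, 30, 0.2) for r in range(31))
print(round(total, 10))

# The Lotto combination count from the text: 49! / (6! * 43!)
print(math.comb(49, 6))  # 13983816
```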
Normal Distribution

The normal distribution is, when drawn, a bell-shaped, continuous and symmetrical curve. Unlike discrete distributions it is not the height of the line that defines the probability for the normal distribution. Instead the area between two values on the x-axis and the curve gives the probability of an event falling between both points. The area below the curve has the following attributes:
  68.26% of the readings lie within 1 standard deviation of the mean
  95.44% of the readings lie within 2 standard deviations of the mean
  99.74% of the readings lie within 3 standard deviations of the mean

Example: The IQ of children has a mean of 100 and a standard deviation of 17 points. This means that:

  68.26% of children have an IQ of 83 to 117
  95.44% of children have an IQ of 66 to 134
  99.74% of children have an IQ of 49 to 151
Parameters:

  Mean
  Standard deviation

For looking up values in the normal tables:

  zcalc = (Observed value - Mean) / Standard deviation
When to use a normal distribution

The normal distribution should be used when observations or measurements are taken from a population. Each observation is subject to multiple sources of disturbances. Each of those sources changes the value of the observation slightly. Some errors might cancel each other out (some being positive, some being negative) so most of the measurements will fall close to a mean value. Some observations however will experience the addition of errors and be further away from the mean.

A lot of real-world examples exist:

- IQs of children
- Heights of people of the same sex
- Dimensions of mechanically produced components
- Weights of machine-produced items
- Arithmetic means of large samples
Approximating a binomial distribution with the Normal:

The binomial distribution can be approximated by the normal if both n * p and n * (1 - p) exceed a value of 5.

  Mean = n * p
  Standard deviation = sqrt(n * p * (1 - p))

If the proportion of defectives (instead of the number of defectives) is looked at these values need to be used:

  Mean = p
  Standard deviation = sqrt(p * (1 - p) / n)

When a discrete distribution is approximated by the Normal care needs to be taken to use the correct values for the limits. For example if the probability of an event occurring less than 50 times is required this means we need to look for

  Binomial: P(r < 50)
  Normal:   P(r <= 49.5)
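The approximation and the continuity correction can be sketched with the standard library's NormalDist; n = 250 here is made up so that n * p and n * (1 - p) both exceed 5:

```python
from statistics import NormalDist

n, p = 250, 0.2
mean = n * p                           # n * p = 50
std_dev = (n * p * (1 - p)) ** 0.5     # sqrt(n * p * (1 - p))

# Binomial P(r < 50) is approximated by Normal P(r <= 49.5)
approx = NormalDist(mean, std_dev).cdf(49.5)
print(round(approx, 4))
```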
Poisson Distribution

Describes the occurrence of isolated events within a continuum. Like the binomial distribution it is based on taking a sample from a population of elements of two types, the types being the occurrence and the non-occurrence of an event. The Poisson distribution is discrete; its shape varies from right-skewed to almost symmetrical.
Example:

  Continuum: Time
  Events:
  - A telephone call arrives at a switchboard
  - No telephone call arrives at a switchboard

The total number of elements is infinite as there are an unlimited number of non-occurred events that are part of the sample. Other uses include flaws in cable (cable being the continuum, flaws being the events) or mechanical breakdown of machinery (time again as continuum, breakdown as event).
Parameter:

  λ = Average number of events per sample

Probability of r events occurring in a sample:

  P(r) = e^(-λ) * λ^r / r!

Example: λ = 2 (average number of calls arriving per minute)

  P(0 calls) = 0.135 * 1 / 1 = 0.135
  P(1 call)  = 0.135 * 2 / 1 = 0.27
Instead of deriving the full value for every r we can incrementally calculate it:

  P(r + 1) = P(r) * λ / (r + 1)

  P(1) = P(0) * λ
  P(2) = P(1) * λ / 2
  P(3) = P(2) * λ / 3

Approximate a binomial distribution with Poisson:

  λ = n * p
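The incremental recipe above keeps the arithmetic cheap; here it is for the switchboard example with λ = 2 calls per minute:

```python
import math

lam = 2.0
probs = [math.exp(-lam)]  # P(0) = e^(-lam)
for r in range(10):
    # P(r + 1) = P(r) * lam / (r + 1)
    probs.append(probs[-1] * lam / (r + 1))

print(round(probs[0], 3), round(probs[1], 3))  # ~0.135 and ~0.271
```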
t-Distribution

Similar to the normal distribution but with longer tails, it is also continuous with a symmetrical shape. For small sample sizes the tails are considerably longer than those of the normal distribution, while for sample sizes of 30+ the t-distribution and the normal distribution can be considered to behave the same.

Parameters:

- Arithmetic (sample) mean
- Standard deviation
- Degrees of freedom (sample size - 1)

  t = (Observed sample mean - Mean) / (Estimate of standard deviation / sqrt(Sample size))
When to use a t-distribution

- The population standard deviation is unknown and has to be estimated from the sample
- The sample size is less than 30 (for sample sizes > 30 the normal could be used)
- The underlying distribution of the population from which the sample was taken is normal

All these conditions need to be met for the t-distribution to be applicable!

Example:

  Test for length of life of 40 light bulbs -> normal distribution
  Test for length of life of 20 light bulbs -> t-distribution
Uses of t:

1) Calculate limits for observed sample means to be within a confidence limit

Look up the t-value for the given degrees of freedom and the confidence limit. With the t-value, the standard deviation and the sample size calculate the range of possible values. Apply to the mean to get lower and upper limits for the observed means to fall within the confidence limit.

2) Test an observed sample mean to be within a confidence limit

Calculate t with the formula given above and compare the value with the t-value from the statistical table for the given degrees of freedom and confidence limit. If the calculated value is lower than the limit the hypothesis is accepted; otherwise it is rejected.
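Use 1) can be sketched as a 95% confidence interval for a sample mean. The 20 bulb lifetimes are made up; the t-value 2.093 is the tabulated t0.025 for 19 degrees of freedom:

```python
import statistics

sample = [980.0, 1010.0, 995.0, 1030.0, 970.0, 1005.0, 990.0, 1020.0,
          1000.0, 985.0, 1015.0, 975.0, 1025.0, 998.0, 1002.0, 992.0,
          1008.0, 988.0, 1012.0, 1000.0]
n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)  # estimate of the standard deviation
t_value = 2.093               # t0.025 for n - 1 = 19 degrees of freedom

# Range of possible values around the mean
margin = t_value * s / n ** 0.5
print(round(mean - margin, 1), round(mean + margin, 1))
```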
Chi-squared Distribution

The chi-squared (χ²) distribution provides the method for comparing an observed sample variance with a hypothesised population variance. It can answer the question: Is the observed scatter of the sample in accord with what is thought to be the scatter of the population?

Calculation:

  χ² = (n - 1) * Observed sample variance / Population variance
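The statistic above in code, with a made-up sample tested against a hypothesised population variance of 4.0:

```python
import statistics

sample = [10.2, 9.8, 11.1, 8.9, 10.5, 9.4, 10.9, 9.2]
n = len(sample)
sample_variance = statistics.variance(sample)  # uses the n - 1 denominator
population_variance = 4.0

chi_squared = (n - 1) * sample_variance / population_variance
print(round(chi_squared, 3))
```

The result would then be compared with the χ² critical values for n - 1 degrees of freedom.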
F-Distribution

The F-distribution is used to compare the variance of one sample with that of a second. The variable of an F-distribution is the ratio between two variance estimates. Just as the location of two samples can be compared through the difference in their means (by applying a normal or t-test), so the scatter of two samples can be compared through the ratio of their variances (by applying an F-test).

Calculation:

  F = Variance of sample 1 / Variance of sample 2
Significance Tests

5 steps:

- State the null hypothesis H0
- Decide on the evidence to collect and the test statistic to calculate
- Choose the significance level (the usual 5 or 1%)
- Find the critical value(s) defining the acceptance region
- Compare the observed statistic with the critical value and accept or reject H0

Tests on two sample means

Two samples are taken from a population, their means and the difference between the means calculated. The mean of the distribution of those mean differences is 0 (differences between sample means cancel each other out).

Variance sum theorem:

  Variance(x + y) = Variance(x) + Variance(y)
  Variance(x - y) = Variance(x) + Variance(y)

With some dark math it follows that

  Variance(xmean - ymean) = 2 * V / n
  Standard deviation = sqrt(2) * s / sqrt(n)

Therefore z can be calculated as:

  z = (x1,mean - x2,mean) / (sqrt(2) * s / sqrt(n))
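The two-sample z calculation above as a sketch, with made-up sample means, a made-up pooled standard deviation s, and equal sample sizes n:

```python
import math

n = 50
x1_mean, x2_mean = 102.3, 99.1
s = 8.0

z = (x1_mean - x2_mean) / (math.sqrt(2) * s / math.sqrt(n))
print(round(z, 2))

# Compare |z| with the critical value 1.96 at the 5% significance level
significant = abs(z) > 1.96
print(significant)
```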
Tests on paired samples

Create a new sample with the differences between paired values. Treat the new sample like a basic single sample significance test with a mean of 0.

  z = (xmean - 0) / (s / sqrt(n))
Tests on proportions

  Arithmetic mean = p
  Standard deviation = sqrt(p * (1 - p) / n)
Type 1 errors

Occur when a hypothesis is rejected due to a sample in the reject tail of the distribution even though the hypothesis is in fact true. The probability of this is equal to the significance level (the usual 5 or 1%).

Type 2 errors

These occur when a hypothesis is accepted falsely. To determine the probability of this requires knowledge of the alternative hypothesis, which has to be precisely defined (not: "Average IQ is not 100"). The probability of correctly accepting the alternative hypothesis is the power of the significance test.
Smoothing methods

  Series type   Methods
  Stationary    Moving averages, Exponential smoothing
  Trend         Holt
  Seasonal      Holt-Winters
  Cyclical      Decomposition
  Other         Box-Jenkins
Stationary series

Moving average

Replace the original series with a smoothed series, replacing each value with the average of it and the neighbouring values. Examples are three-point moving average, five-point moving average, etc. The calculated value can be used as the forecast for the first period after the last value used for its calculation only!
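A three-point moving average as described above, on a made-up stationary series:

```python
series = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 18.0]

# Each smoothed value averages a reading with its two neighbours,
# so the first and last readings have no smoothed counterpart.
smoothed = [
    (series[i - 1] + series[i] + series[i + 1]) / 3
    for i in range(1, len(series) - 1)
]
print(smoothed)

# The last smoothed value serves as the forecast for the next period only
forecast = smoothed[-1]
```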
Exponential Smoothing

Gives more weight to recent values.

  St = (1 - α) * St-1 + α * xt

α is usually in the range of 0.1 to 0.4.

The forecast can be used for the next month as only past values are used for the calculation. As the first value in the series the first value from the original series is used.
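The smoothing recursion above in code, with α = 0.2 (made-up data) and the first smoothed value seeded from the first observation as the text describes:

```python
alpha = 0.2
series = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0]

smoothed = [series[0]]  # first value taken from the original series
for x in series[1:]:
    # St = (1 - alpha) * St-1 + alpha * xt
    smoothed.append((1 - alpha) * smoothed[-1] + alpha * x)

print([round(s, 3) for s in smoothed])
```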
Holt's Method

Two parameters:

  α: Smoothing parameter for series values
  β: Smoothing parameter for trend values

  St = (1 - α) * (St-1 + bt-1) + α * xt
  bt = (1 - β) * bt-1 + β * (St - St-1)
  Ft = St + m * bt
with:

  xt = actual observation at time t
  St = smoothed value at time t
  bt = smoothed trend at time t
  Ft = forecast for m periods in the future

The forecast can be used for the next month as only past values are used for the calculation. As the first two values in the series the first two values from the original series are used. The first value for the trend is the difference between the first and second value of the smoothed series (which are the same as the original series) and is in row 2. Only then do we have enough values to properly calculate the smoothed value and smoothed trend in the following rows.
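A sketch of Holt's method (α and β are made up), initialised exactly as the text describes: the first two smoothed values copy the series and the first trend value is their difference:

```python
alpha, beta = 0.3, 0.2
series = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]

s = [series[0], series[1]]      # first two smoothed values from the series
b = [None, series[1] - series[0]]  # first trend value sits in row 2
for t in range(2, len(series)):
    # St = (1 - alpha) * (St-1 + bt-1) + alpha * xt
    s.append((1 - alpha) * (s[-1] + b[-1]) + alpha * series[t])
    # bt = (1 - beta) * bt-1 + beta * (St - St-1)
    b.append((1 - beta) * b[-1] + beta * (s[-1] - s[-2]))

# One-step-ahead forecast: Ft = St + m * bt with m = 1
forecast = s[-1] + 1 * b[-1]
print(round(forecast, 3))
```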
Series with a trend and seasonality

Holt-Winters Method

The Holt-Winters Method adds a third smoothing equation for seasonality as compared to Holt's method described above. A new smoothing constant denoted γ is introduced. Seasonality is measured as the ratio between actual data and smoothed data:

  Seasonality = Actual data / Smoothed data
Decomposition Method

This method assumes that a time series can be decomposed into four distinct elements:

- Trend
- Cycle
- Seasonality
- Random

Trend

The trend is isolated by a regression analysis between data and time. The regression equation will look like this:

  xt = a + b*t + ut

where:

  xt = actual data
  a + b*t = Trend element
  ut = Residuals of seasonality, cycles and random part
Cycles

By choosing a suitable moving average (12 points for monthly data, 4 for quarterly) the random and seasonal elements can be smoothed away, leaving just trend and cycle. If St is such a moving average then the ratio between St and (a + b*t) must be the cycle. If the ratio is approx. 1 for all time periods there is no cycle.
Seasonality

Seasonality is isolated by a similar approach to that for cycles. The moving average (St) comprises trend and cycles. The actual value (xt) comprises trend, cycle, seasonality and random effects. The ratio

  Actual value / Moving average, i.e. xt / St

should reflect the seasonality and the random effect. If the data is quarterly then the seasonality for the first quarter can be calculated as

  Average of x1/S1, x5/S5, x9/S9, ...

The averaging helps in eliminating random effects.
Box-Jenkins Method

The Box-Jenkins Method allows for compensating of previous errors as time goes by. To do this past residuals (forecasting errors) are incorporated into the equation. Box-Jenkins is better described as a process:

(a) Pre-whiten
(b) Identify
(c) Estimate
(d) Diagnose
(e) Forecast
Forecasting

Three methods for forecasting:

- Qualitative
- Causal modelling
- Time series methods

Qualitative Methods are based on using judgement rather than (historical) data. They may be the only method when dealing with new products or new technologies.

In causal modelling the variable to be forecast is related statistically to one or more other variables. The assumption is that the relationship between the chosen variable and the modelled one will hold in the future!

Time series methods analyse historical data (cycles, trends, seasonal factors) and project the observed behaviour into the future. They are used for short-term forecasts in stable conditions.
Regression

Simple Linear Regression

Least squares method of regression: Minimize the sum of the squared residuals. The residuals of the regression should be random. The line is defined as:

  y = a + b * x

with

  b = Σ((x - xmean) * (y - ymean)) / Σ((x - xmean)²)
  a = ymean - b * xmean
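The least-squares formulas above applied to made-up paired data:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# b = sum of cross-deviations / sum of squared x-deviations
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
    sum((x - x_mean) ** 2 for x in xs)
a = y_mean - b * x_mean
print(round(a, 3), round(b, 3))
```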
Runs test

Group the residuals by their sign. Each change of sign starts a new run. Having too many runs is a sign of non-randomness, as is a very low number of runs. Check the upper and lower critical values from a statistical table with the number of negative and positive residuals as the parameters. If the number of observed runs is within the upper and lower critical values the residuals appear to be random.
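Counting runs is the mechanical part of the test; a sketch with made-up residuals:

```python
residuals = [0.5, 1.2, -0.3, -0.8, 0.4, -0.1, 0.9, 1.1, -0.6]
signs = [r > 0 for r in residuals]

# A new run starts at every change of sign
runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
n_positive = sum(signs)
n_negative = len(signs) - n_positive
print(runs, n_positive, n_negative)
```

The observed run count would then be checked against tabulated critical values for (n_positive, n_negative).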
Multiple Regression Analysis

The idea of the Simple Linear Regression is extended to multiple variables on the right-hand side of the equation:

  y = A + B * x + C * z + D * t

There are three independent x variables: x, z, t. Their coefficients are B, C and D while A is the constant.
Stages in multiple regression analysis

- Identify dependent and independent variables
- Examine scatter diagrams (multiple needed)
- Run regression analysis
- Calculate R² value to determine proportion of total variation explained
- Test significance using ANOVA table and F-test
- Test significance of each coefficient using t-test (see discarding of variables below)
- Check residuals
  o Plot residuals against fitted y-values
  o Use Runs-test to check for randomness
- Check for collinearity (see Collinearity below)
- Use model for prediction
Discarding of variables

In multiple regression analyses not all variables will have a statistically significant impact on the result. Each variable can be tested for its effect on y. For this a t-test is used with the usual 5 stages:

(a) H0: The population coefficient for this variable is 0.
(b) The coefficient and the standard error for the variable will have to be computed.
(c) Significance level is the usual 5 percent. This is a two-sided test, hence the 5% are split into a 2.5% upper and a 2.5% lower tail.
(d) Degrees of freedom are n - k - 1 with

  n = number of observations
  k = the number of x variables in the regression

The observed t value is

  tObs. = Coefficient estimate / Standard error of coefficient

(e) If tObs. exceeds t0.025 then the hypothesis is rejected and the variable does have a significant impact. If the t value is lower the variable may be eliminated from the regression equation.
Correlation

  Total variation        Σ(y - ymean)²
  Explained variation    Σ(Fitted y - ymean)²
  Unexplained variation  Σ(Residuals)²

Correlation coefficient

  r = Σ[(x - xmean) * (y - ymean)] / sqrt(Σ(x - xmean)² * Σ(y - ymean)²)
Correlation coefficient r can take values between -1 and +1. Values close to -1 or +1 indicate strong (negative or positive) correlation; values closer to 0 indicate a low (or no) correlation.

A more intuitive understanding can be gained from the squared correlation coefficient, r². It is written as R² (for no apparent reason, it seems). Logically it is calculated as:

  R² = Explained variation / Total variation
The closer R² is to 1, the more of the total variation can be explained. The closer it is to 0, the less of the total variation can be explained by the current model.
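The correlation coefficient formula above on made-up paired data, with R² as its square:

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Cross-deviations over the geometric mean of the squared deviations
num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
den = (sum((x - x_mean) ** 2 for x in xs)
       * sum((y - y_mean) ** 2 for y in ys)) ** 0.5
r = num / den
print(round(r, 4), round(r * r, 4))
```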
R-bar-squared

R-bar-squared is a more sensitive measure of closeness of fit. It is based on the same ratio as R² but with an adjusted formula that allows R-bar-squared to fall when a variable unconnected to the result is added.
Collinearity

Collinearity occurs when two (or more) of the x variables are highly correlated. In this case multiple variables contribute some of the same information to the end result. To avoid collinearity problems you can:

(a) Use only one of the variables (which to use is largely subjective)
(b) Combine the variables (if the aggregate has any meaning)
(c) Substitute another variable with a similar meaning and a low correlation

No hard and fast rules exist to deal with collinearity. Make sure you are aware of the problem and the restrictions it places on the interpretation of the results.
Exams

J06
D06
J07
  CS1: One-way analyses of variance, ANOVA
  CS2: Survey methodology
  CS3: Hypothesis test
  CS4: Time Series, Exponential smoothing, Forecasting techniques
D07
J08
D08
J09
D09
J10
D10
J11
D11