Summer Course: Data Mining


Regression Analysis

Presenter: Georgi Nalbantov

August 2009
Structure

- Regression analysis: definition and examples
- Classical Linear Regression
- LASSO and Ridge Regression (linear and nonlinear)
- Nonparametric (local) regression estimation: kNN for regression, Decision trees, Smoothers
- Support Vector Regression (linear and nonlinear)
- Variable/feature selection (AIC, BIC, R^2-adjusted)
Feature Selection, Dimensionality Reduction, and Clustering in the KDD Process

U.M. Fayyad, G. Piatetsky-Shapiro and P. Smyth (1995)
Common Data Mining tasks

[Figure: three scatter plots in the (X1, X2) plane illustrating clustering, classification, and regression]

Clustering: k-th Nearest Neighbour, Parzen Window, Unfolding, Conjoint Analysis, Cat-PCA
Classification: Linear Discriminant Analysis, QDA, Logistic Regression (Logit), Decision Trees, LSSVM, NN, VS
Regression: Classical Linear Regression, Ridge Regression, NN, CART
Linear regression analysis: examples


The Regression task

Given data on $n$ explanatory variables and 1 explained variable, where the explained variable can take real values in $\mathbb{R}$, find a function that gives the best fit:

Given: $(x_1, y_1), \ldots, (x_m, y_m) \in \mathbb{R}^n \times \mathbb{R}$

Find: $f: \mathbb{R}^n \to \mathbb{R}$

Best function = the one for which the expected error on unseen data $(x_{m+1}, y_{m+1}), \ldots, (x_{m+k}, y_{m+k})$ is minimal
Classical Linear Regression (OLS)

- Explanatory and response variables are numeric
- The relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line)
- Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma)$
  - $\beta_1 > 0$: positive association
  - $\beta_1 < 0$: negative association
  - $\beta_1 = 0$: no association
Classical Linear Regression (OLS)

- $\beta_0$: mean response when $x = 0$ (the y-intercept)
- $\beta_1$: change in mean response when $x$ increases by 1 unit (the slope)
- $\beta_0 + \beta_1 x$: mean response when the explanatory variable takes on the value $x$
- $\beta_0$, $\beta_1$ are unknown parameters (like $\sigma$)
- Fitted line: $\hat y = \hat\beta_0 + \hat\beta_1 x$
- Task: minimize the sum of squared errors (see the sketch below):

$\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat y_i \right)^2 = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2$
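A minimal numpy sketch of this least-squares task (the toy data are mine, not from the slides); it computes the closed-form estimates $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$:

```python
import numpy as np

# Toy data (hypothetical): x = explanatory variable, y = response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates:
# beta1_hat = S_xy / S_xx,  beta0_hat = y_bar - beta1_hat * x_bar
S_xx = np.sum((x - x.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
beta1_hat = S_xy / S_xx
beta0_hat = y.mean() - beta1_hat * x.mean()

# The fitted line and its sum of squared errors
y_hat = beta0_hat + beta1_hat * x
SSE = np.sum((y - y_hat) ** 2)
print(f"beta0_hat={beta0_hat:.3f}, beta1_hat={beta1_hat:.3f}, SSE={SSE:.3f}")
```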

Classical Linear Regression (OLS)


- Parameter: slope in the population model ($\beta_1$)
- Estimator: least squares estimate $\hat\beta_1$
- Estimated standard error: $\hat\sigma_{\hat\beta_1} = s / \sqrt{S_{xx}}$, where
  $s^2 = \dfrac{\mathrm{SSE}}{n-2} = \dfrac{\sum (y - \hat y)^2}{n-2}$ and $S_{xx} = \sum (x - \bar x)^2$
- Methods of making inference regarding the population:
  - Hypothesis tests (2-sided or 1-sided)
  - Confidence intervals

Example output (Coefficients; a. Dependent Variable: SCORE):

| Model      | B (unstand.) | Std. Error | Beta (stand.) | t      | Sig. |
|------------|--------------|------------|---------------|--------|------|
| (Constant) | 89.124       | 7.048      |               | 12.646 | .000 |
| LSD_CONC   | -9.009       | 1.503      | -.937         | -5.994 | .002 |
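This kind of output can be reproduced with scipy's linregress; a sketch with made-up data standing in for the slide's LSD_CONC/SCORE example:

```python
import numpy as np
from scipy import stats

# Made-up data standing in for the slide's LSD_CONC / SCORE example
x = np.array([1.2, 2.3, 3.1, 4.4, 5.6, 6.0, 6.8])          # hypothetical LSD_CONC
y = np.array([79.0, 67.2, 61.5, 48.9, 40.1, 35.8, 28.4])   # hypothetical SCORE

# linregress returns the estimates plus the inference quantities of the slide
res = stats.linregress(x, y)
print(f"intercept (B) = {res.intercept:.3f}, slope (B) = {res.slope:.3f}")
print(f"std. error of slope = {res.stderr:.3f}")
print(f"t = {res.slope / res.stderr:.3f}, two-sided p = {res.pvalue:.4f}")
```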
Classical Linear Regression (OLS)

Coefficient of determination ($r^2$): the proportion of variation in $y$ explained by the regression on $x$.

$r^2 = \dfrac{S_{yy} - \mathrm{SSE}}{S_{yy}}, \qquad 0 \le r^2 \le 1$

where $S_{yy} = \sum (y - \bar y)^2$ and $\mathrm{SSE} = \sum (y - \hat y)^2$.
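The definition translates directly into code; a tiny standalone sketch with hypothetical fitted values:

```python
import numpy as np

# Hypothetical observations and fitted values from a simple linear regression
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.06, 4.01, 5.96, 7.91, 9.86])

S_yy = np.sum((y - y.mean()) ** 2)   # total variation in y
SSE = np.sum((y - y_hat) ** 2)       # variation left unexplained by the fit
r_squared = (S_yy - SSE) / S_yy      # always between 0 and 1
print(f"r^2 = {r_squared:.4f}")
```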

Classical Linear Regression (OLS): Multiple regression

- Numeric response variable ($y$)
- $p$ numeric predictor variables
- Model: $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
- Partial regression coefficients: $\beta_i$ is the effect (on the mean response) of increasing the $i$-th predictor variable by 1 unit, holding all other predictors constant
Classical Linear Regression (OLS): Ordinary Least Squares estimation

Population model for the mean response:

$E(Y \mid x_1, \ldots, x_p) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$

Least squares fitted (predicted) equation, minimizing SSE:

$\hat Y = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p, \qquad \mathrm{SSE} = \sum \left( Y - \hat Y \right)^2$
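A minimal multiple-regression sketch (data hypothetical; np.linalg.lstsq is one of several ways to minimize the SSE):

```python
import numpy as np

# Hypothetical data: 6 observations, p = 2 predictors
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([5.1, 4.2, 9.8, 9.1, 14.9, 14.2])

# Prepend a column of ones so that beta[0] is the intercept
X1 = np.column_stack([np.ones(len(y)), X])

# Least squares: minimizes SSE = sum((y - X1 @ beta)^2)
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta
print("beta0, beta1, beta2 =", np.round(beta, 3))
print("SSE =", round(float(np.sum((y - y_hat) ** 2)), 4))
```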

Classical Linear Regression (OLS): Ordinary Least Squares estimation

Model: $\hat Y = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_p x_p$

OLS estimation: $\min \ \mathrm{SSE} = \sum \left( Y - \hat Y \right)^2$

LASSO estimation: $\min \ \mathrm{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat Y_i \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$

Ridge regression estimation: $\min \ \mathrm{SSE} = \sum_{i=1}^{n} \left( Y_i - \hat Y_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$
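A sketch of the two penalized fits with scikit-learn (data hypothetical; alpha is scikit-learn's name for the penalty weight $\lambda$, up to the library's internal scaling):

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))                 # hypothetical predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)           # L1 penalty: lambda * sum(|beta_j|)
ridge = Ridge(alpha=0.1).fit(X, y)           # L2 penalty: lambda * sum(beta_j^2)

# LASSO tends to set some coefficients exactly to zero;
# Ridge shrinks all of them smoothly towards zero.
print("OLS:  ", np.round(ols.coef_, 3))
print("LASSO:", np.round(lasso.coef_, 3))
print("Ridge:", np.round(ridge.coef_, 3))
```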
LASSO and Ridge estimation of model coefficients

[Figure: coefficient paths for LASSO and Ridge, plotted against sum(|beta|)]
Nonparametric (local) regression estimation: k-NN, Decision trees, smoothers

How to choose k or h?

- When k or h is small, single instances matter; bias is small and variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h, as in the sketch below
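A cross-validation sketch for k in k-NN regression with scikit-learn (data and candidate grid hypothetical):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))                   # hypothetical inputs
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)   # noisy target

# Small k: low bias, high variance (undersmoothing);
# large k: high bias, low variance (oversmoothing).
grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 10, 20, 50]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print("best k:", grid.best_params_["n_neighbors"])
print("CV MSE at best k:", -grid.best_score_)
```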
Linear Support Vector Regression

[Figure: three Expenditures-vs-Age fits with tubes of different widths (biggest, middle-sized and small area), support vectors marked]

- Lazy case (underfitting)
- Suspiciously smart case (overfitting)
- Compromise case: SVR (good generalisation)

The thinner the tube, the more complex the model.


Nonlinear Support Vector Regression

Map the data into a higher-dimensional space:


[Figure: Expenditures vs. Age data mapped into a higher-dimensional space]
Nonlinear Support Vector Regression: Technicalities

The SVR function: $f(x) = w^\top \varphi(x) + b$

To find the unknown parameters of the SVR function, solve:

$\min_{w,\, b,\, \xi,\, \xi^*} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$

Subject to:

$y_i - w^\top \varphi(x_i) - b \le \varepsilon + \xi_i, \qquad w^\top \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\, \xi_i^* \ge 0$

How to choose $C$, $\varepsilon$, $\gamma$?

$\varphi(x)^\top \varphi(x')$ = RBF kernel: $K(x, x') = \exp\left( -\gamma \|x - x'\|^2 \right)$

Find $C$, $\varepsilon$ and $\gamma$ from a cross-validation procedure.
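A minimal fit with scikit-learn's SVR (data hypothetical; the C, epsilon and gamma arguments correspond to $C$, $\varepsilon$ and $\gamma$ above):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(20, 70, size=(100, 1))     # hypothetical "Age"
y = 2.0 + np.sin(X[:, 0] / 8.0) + rng.normal(scale=0.2, size=100)  # "Expenditures"

# RBF-kernel SVR with an epsilon-insensitive tube of half-width 0.15
model = SVR(kernel="rbf", C=1.0, epsilon=0.15, gamma=0.1).fit(X, y)

# Points on or outside the tube become the support vectors
print("number of support vectors:", len(model.support_))
print("prediction at Age = 40:", model.predict([[40.0]])[0])
```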


SVR Technicalities: Model Selection

Do 5-fold cross-validation to find $C$ and $\gamma$ for several fixed values of $\varepsilon$.

[Figure: CV_MSE as a function of C and gamma, at epsilon = 0.15]
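The slide's grid search, sketched with scikit-learn (data and grid values hypothetical; $\varepsilon$ held fixed at 0.15 as in the figure):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(20, 70, size=(150, 1))     # hypothetical data as before
y = 2.0 + np.sin(X[:, 0] / 8.0) + rng.normal(scale=0.2, size=150)

# 5-fold CV over (C, gamma) with epsilon held fixed, as on the slide
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.15),
    param_grid={"C": [0.1, 1, 5, 10, 15], "gamma": [0.001, 0.005, 0.01, 0.02]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print("best (C, gamma):", grid.best_params_)
print("CV MSE at the optimum:", -grid.best_score_)
```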
SVR Study: Model Training, Selection and Prediction

[Figure, left: CVMSE over (IR*, HR*, CR*); right: true returns (red) and raw predictions (blue)]
SVR: Individual Effects

[Figure: four panels showing the estimated individual effect on SP500 of the 3-month treasury bill, the credit spread, the VIX, and VIX futures]
SVR Technicalities: SVR vs. OLS

Performance on the test set (Holiday Data, epsilon = 0.15):

[Figure: test-set observations and predictions for SVR (top) and OLS (bottom)]

- SVR: MSE = 0.04
- OLS: MSE = 0.23
Technical Note: Number of Training Errors vs. Model Complexity

[Figure: minimum number of training errors and test errors plotted against model complexity, for functions ordered in increasing complexity; the best trade-off lies between the low- and high-complexity extremes]
Variable selection for regression

Akaike Information Criterion (AIC). Final prediction error:

$\mathrm{AIC} = n \ln\left( \mathrm{SSE}/n \right) + 2p$

for a model with $p$ parameters fitted to $n$ observations.
Variable selection for regression

Bayesian Information Criterion (BIC), also known as the Schwarz criterion. Final prediction error:

$\mathrm{BIC} = n \ln\left( \mathrm{SSE}/n \right) + p \ln n$

BIC tends to choose simpler models than AIC.

Variable selection for regression

$R^2$-adjusted:

$R^2_{\mathrm{adj}} = 1 - (1 - R^2)\dfrac{n-1}{n-p-1}$

where $p$ is the number of predictors. Unlike $R^2$, it does not automatically increase when predictors are added. A sketch comparing all three criteria follows.
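A sketch comparing the three criteria on nested OLS models (hypothetical data; the AIC/BIC forms are the regression versions given above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.5, size=60)  # 2 real predictors

def criteria(X_sub, y):
    """AIC, BIC and adjusted R^2 for an OLS fit with intercept."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X_sub])
    p = X1.shape[1]                              # parameters, intercept included
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sse = float(np.sum((y - X1 @ beta) ** 2))
    r2 = 1 - sse / float(np.sum((y - y.mean()) ** 2))
    aic = n * np.log(sse / n) + 2 * p
    bic = n * np.log(sse / n) + p * np.log(n)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)    # n - p = n - (#predictors) - 1
    return aic, bic, r2_adj

for k in range(1, 5):                            # nested models with k predictors
    aic, bic, r2_adj = criteria(X[:, :k], y)
    print(f"k={k}: AIC={aic:8.2f}  BIC={bic:8.2f}  adj R^2={r2_adj:.4f}")
```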

Conclusion / Summary / References

- Classical Linear Regression: any introductory statistical/econometric book
- LASSO and Ridge Regression (linear and nonlinear): http://www-stat.stanford.edu/~tibs/lasso.html ; Bishop, 2006
- Nonparametric (local) regression estimation (kNN for regression, Decision trees, Smoothers): Alpaydin, 2004; Hastie et al., 2001
- Support Vector Regression (linear and nonlinear): Smola and Schoelkopf, 2003
- Variable/feature selection (AIC, BIC, R^2-adjusted): Hastie et al., 2001; any statistical/econometric book