Regression Analysis

Regression
What Is Regression
Regression measures the relationship between a dependent variable and a set of independent variables that affect its value.
Example: preference for purchase vs. price, popularity of the brand, and product performance.
The relationship is derived in the form of an equation: Y = a + b1x1 + b2x2 + b3x3 + ...

Where Y = dependent variable; x1, x2, x3, ... = independent variables; a = constant; b1, b2, b3, ... = coefficients.
Regression is usually done on variables that are measured on an interval scale.
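
As a minimal sketch of how an equation of the form Y = a + b1x1 + b2x2 can be estimated by least squares, assuming NumPy is available. The data and the variable meanings below are invented for illustration and do not come from the slides; Y is built from known coefficients so the fit can be checked.

```python
import numpy as np

# Hypothetical ratings from six respondents (illustrative only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # e.g. price rating
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])   # e.g. brand popularity rating

# Build Y exactly from known coefficients: a = 1.5, b1 = 2.0, b2 = -0.5.
y = 1.5 + 2.0 * x1 - 0.5 * x2

# Design matrix: a column of ones for the constant, then one column per variable.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares estimate of (a, b1, b2).
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
```

Because the data contain no noise, the estimated a, b1 and b2 should match the values used to build Y up to rounding error.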

Types of Regression
Linear:
Assumes a linear relationship between variables.
Simple: one independent variable is used to predict the value of the dependent variable.
Multiple: many independent variables are used to predict the value of the dependent variable.
Non-linear:
Used when the relationship is non-linear.
Many models are available: asymptotic, log-linear, log-logistic.
In most cases the underlying relationship is assumed to be linear.

3
How To Determine Which Variables To Include In Regression
Drop variables that are unlikely to affect the value of the dependent variable. Several methods are available for eliminating variables from a regression analysis.
Eliminate independent variables that have a low correlation with the dependent variable.
Stepwise regression: start with the independent variable that has the highest predictive value, then enter variables one by one, examining at each stage the improvement over the predictive power in the previous iteration. At each stage all variables in the equation are examined to check whether they are still needed; any found superfluous are dropped.
Forward selection: similar to stepwise regression, except that no variable is dropped once it has entered the equation.
Backward elimination: start with all independent variables and eliminate, one by one, those that contribute the least.
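
Forward selection can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure of any particular statistics package: R2 is used as the measure of predictive power, the stopping threshold `min_gain` is an arbitrary choice, and the data are randomly generated so that only columns 0 and 2 actually drive Y.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least-squares fit of y on the columns of X plus a constant."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def forward_selection(X, y, min_gain=0.01):
    """Enter one variable at a time, never dropping a variable once entered.

    At each stage the candidate giving the highest R^2 is entered; the loop
    stops when the improvement falls below min_gain (an assumed threshold).
    """
    selected, remaining = [], list(range(X.shape[1]))
    best_r2 = 0.0
    while remaining:
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2

# Synthetic data: columns 1 and 3 are pure noise with no effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

selected, r2 = forward_selection(X, y)
```

Stepwise regression would add a check inside the loop that re-examines the variables already entered and drops any that have become superfluous; backward elimination would start from all columns and remove the weakest one at a time.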
Linear Regression : Standard Output and Interpretation (1)

Model summary
R2 0.693
Adjusted R2 0.694
Interpretation
The equation explains 69% of the total variation in the dependent variable. This is a good fit, so one can proceed to draw further inferences on the assumption that the relationship is linear. Adjusted R2 refines R2 by taking into account the number of variables used for prediction. If R2 is low, a linear model explains little of the variation, and further inferences should not be drawn from it.
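
The usual adjustment of R2 for the number of predictors can be sketched as below. The slides do not report the sample size or the number of predictors behind their figures, so the inputs here (R2 of 0.80, 50 observations, 3 predictors) are hypothetical values chosen only to show the calculation.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical inputs, not taken from the slides.
adj = adjusted_r2(0.80, n_obs=50, n_predictors=3)
```

The penalty grows as more predictors are added for a fixed number of observations.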
Linear Regression : Standard Output and Interpretation (2)

ANOVA
Output: sum of squares, degrees of freedom (df), F, significance.
Interpretation
The F statistic tests whether any relationship exists between the dependent variable and the independent variables.
Significance test: if (100 - significance), expressed as a percentage, is high (95% or above), that is, the significance is 0.05 or below, a relationship exists and the model is robust for prediction.
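
As a rough sketch of how the F value in an ANOVA table is formed from the sums of squares and degrees of freedom. The sums of squares, sample size and number of predictors below are hypothetical; converting F into a significance level additionally requires the F distribution, which is left out here.

```python
def f_statistic(ss_regression, ss_residual, n_obs, n_predictors):
    """F = (regression mean square) / (residual mean square)."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return ms_regression / ms_residual

# Hypothetical ANOVA inputs, not taken from the slides.
F = f_statistic(ss_regression=60.0, ss_residual=40.0, n_obs=43, n_predictors=2)
```

With these inputs the regression mean square is 30 and the residual mean square is 1, so F is 30.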
Linear Regression : Standard Output and Interpretation (3)

Output and interpretation
Constant: the constant to be used in the regression equation.
B: there is a B value for each independent variable. It is the coefficient of that variable in the equation. A unit change in the independent variable produces B units of change in the dependent variable, if all other independent variables are held constant.
Standard error: the standard error of the coefficient B.
Beta: the normalised (standardised) value of B, which removes the effect of scale differences among the independent variables. It is a measure of relative importance because it indicates the expected change in the dependent variable, in standard deviations, per one-standard-deviation change in the independent variable.
Significance of t: if t is not significant, the independent variable is not a good predictor and should be removed from the analysis.
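
The quantities in this output table can be reproduced with a short least-squares calculation. The sketch below assumes NumPy and uses randomly generated data with two predictors on deliberately different scales; none of the numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)               # made-up data for illustration only
n = 60
X = rng.normal(size=(n, 2)) * [1.0, 10.0]    # two predictors on different scales
y = 2.0 + 1.5 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.5, size=n)

A = np.column_stack([np.ones(n), X])         # first column carries the constant
coef, *_ = np.linalg.lstsq(A, y, rcond=None) # coef[0] = constant, coef[1:] = B

resid = y - A @ coef
dof = n - A.shape[1]                         # observations minus estimated parameters
sigma2 = resid @ resid / dof                 # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))  # standard error of each coefficient
t = coef / se                                # t statistic of each coefficient

# Standardised coefficient: B rescaled by the ratio of standard deviations.
beta = coef[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
```

The two B values differ mainly because the predictors are on different scales, while the beta values are directly comparable.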
Applications Of Regression
Estimating Relative Importance Of Variables In Choice

The beta values can be used as a measure of the relative importance of independent variables in choice.
Beta rather than B should be used, as it eliminates problems caused by differences in the scale of measurement of the independent variables. However, if all independent variables have been measured on the same scale, it matters little whether beta or B is used.
Dependent variable: overall satisfaction with the store

Parameter              Beta     Relative importance
Cleanliness            0.593    43%
Duration of billing    0.794    57%

The inference
Both cleanliness and duration of billing are important contributors to overall satisfaction with the store. Duration of billing is the relatively more important contributor.
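
The 43% and 57% figures are consistent with each coefficient being expressed as a share of the sum of the two coefficients. The slides do not state the method, so this is an assumption; the sketch simply checks the arithmetic on the reported values.

```python
# Coefficients as reported for the two store attributes.
betas = {"Cleanliness": 0.593, "Duration of billing": 0.794}

# Relative importance as each coefficient's share of the total, in percent.
total = sum(betas.values())
importance = {name: round(100 * b / total) for name, b in betas.items()}
```

Here 0.593 / 1.387 is about 42.8% and 0.794 / 1.387 is about 57.2%, which round to the reported figures.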
Forecasting
The regression equation can be used to predict the value of the dependent variable when the values of the independent variables are known: Y = a + b1x1 + b2x2 + b3x3 + ...
Data available
Awareness of brand A during the period of a campaign.
GRPs on TV for the ad campaign.
Can predict
The likely levels of awareness of brand A during the next campaign, for which estimates of GRPs are available.
Some Caveats To Remember While Predicting

Predictions can be made only within the range of values from which the original estimation equation was obtained.
If the regression equation was obtained for the awareness of a brand vis-à-vis GRPs for a market leader, it cannot be extrapolated to a minor brand.
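
One way to enforce the range caveat in practice is to refuse predictions outside the range of the data used for estimation. The function name, the coefficients and the GRP range below are all hypothetical and serve only to illustrate the guard.

```python
def predict_within_range(x, a, b, x_min, x_max):
    """Return a + b*x, but only if x lies inside the range used for estimation."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the estimation range [{x_min}, {x_max}]"
        )
    return a + b * x

# Hypothetical awareness model estimated on campaigns of 100 to 500 GRPs.
awareness = predict_within_range(300, a=10.0, b=0.05, x_min=100, x_max=500)
```

A call with 900 GRPs would raise an error instead of silently extrapolating. The guard covers only the numeric range; it cannot detect that an equation estimated for one kind of brand is being applied to another.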
Is my model fit to predict sales?

[Figure: chart comparing Actual Sales with Prediction 1 and Prediction 2; axis scale from 0 to 80]
DISCRIMINANT ANALYSIS
What is Discriminant Analysis?

A modelling technique used when the dependent variable is categorical and the independent variables are continuous.

Applications
Selection process for a job; admission process of an educational program.
Dividing a group into potential buyers and non-buyers, or into high-risk and low-risk.

The relationship is derived in the form of an equation: Y = a + k1x1 + k2x2
k1 and k2 are the coefficients of the independent variables.
k1 and k2 should maximise the separation between the two groups.
Predicting the Group Membership

Model building is based on the linear discriminant equation.
The discriminant score Y is calculated for each case.
Cut-off point: the midpoint of the mean discriminant scores of the two groups.
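
The cut-off rule can be sketched as follows. The coefficients a, k1 and k2 are fixed by hand here for illustration; in a real analysis they would be estimated so as to maximise the separation between the groups. The six cases are invented.

```python
def discriminant_score(x1, x2, a=0.0, k1=1.0, k2=1.0):
    """Y = a + k1*x1 + k2*x2, with hypothetical (not estimated) coefficients."""
    return a + k1 * x1 + k2 * x2

# Invented cases with known group membership.
group1 = [(1, 1), (2, 1), (1, 2)]
group2 = [(4, 4), (5, 4), (4, 5)]

mean1 = sum(discriminant_score(*case) for case in group1) / len(group1)
mean2 = sum(discriminant_score(*case) for case in group2) / len(group2)

# Cut-off point: midpoint of the two group mean scores.
cutoff = (mean1 + mean2) / 2

def classify(x1, x2):
    """Assign group 2 if the score lies above the cut-off, otherwise group 1."""
    return 2 if discriminant_score(x1, x2) > cutoff else 1
```

With these values the group means are 8/3 and 26/3, so the cut-off is 17/3; a new case (3, 2) scores 5 and is assigned to group 1, while (4, 3) scores 7 and is assigned to group 2.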
Linear Discriminant Analysis Standard Outputs and Interpretation

Classification / Confusion Matrix

Percent Correct / Wrong column: 94.44%
The model has correctly classified 94.44% of the cases.
This level of accuracy may not hold for future predictions, but it is a good indication that the model is sound.
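
The percent-correct figure is the share of cases on the diagonal of the classification matrix. The slides do not show the underlying counts, so the matrix below is hypothetical, chosen only because 17 correct out of 18 happens to give the same percentage.

```python
# Rows: actual group, columns: predicted group (hypothetical counts).
confusion = [
    [9, 0],   # actual group 1: 9 classified as group 1, 0 as group 2
    [1, 8],   # actual group 2: 1 classified as group 1, 8 as group 2
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
percent_correct = round(100 * correct / total, 2)
```

The off-diagonal cells show which kind of misclassification occurred, which the single percentage hides.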

Wilks' Lambda
A low value of Wilks' Lambda indicates high significance of the model.
F Test
The p-value is the decision criterion.
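
For a single score variable, Wilks' Lambda reduces to the within-group sum of squares divided by the total sum of squares, so values near 0 mean the groups are well separated. The sketch below applies that ratio to invented discriminant scores for two groups; it does not compute the accompanying F test or p-value.

```python
def wilks_lambda(scores_by_group):
    """Within-group sum of squares over total sum of squares for one variable."""
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_within = 0.0
    for group in scores_by_group:
        group_mean = sum(group) / len(group)
        ss_within += sum((s - group_mean) ** 2 for s in group)
    return ss_within / ss_total

# Invented discriminant scores for two clearly separated groups.
lam = wilks_lambda([[2.0, 3.0, 3.0], [8.0, 9.0, 9.0]])
```

Here the within-group sum of squares is about 1.33 against a total of about 55.33, giving a Lambda of roughly 0.024.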

Relative Importance of Independent Variables
Standardized coefficients indicate the relative importance of the variables.

Classifying the cases
Means of canonical variables are computed from the raw coefficient table.
Cases to the right of the midpoint fall in Group 2; cases to the left fall in Group 1.
Case Study
A business school selects its students every year through a written test, an interview and a group discussion. It then tracks the performance of students during the two-year program by means of GPA. Students with a GPA above 2.75 out of 4.0 are defined as Successful, and those below as Unsuccessful.
Can you develop a model that predicts whether a student is likely to be successful?
How good is the model?
Statistical significance of the model
Predictors
Classification of a new student