Variables
TYPES OF RELATIONSHIP
Regression
► Regression analysis is a statistical technique
for investigating and modeling the relationship
between variables.
► Regression investigates the dependence of one variable, called the
dependent variable and denoted by Y, on one or more other variables,
called independent variables and denoted by X's. It provides an
equation for estimating or predicting the average value of the
dependent variable from known values of the independent variables.
Regression Analysis
► Regression Analysis is used to estimate a function f( )
that describes the relationship between a continuous
dependent variable and one or more independent
variables.
Y = f(X1, X2, X3,…, Xk) + ε
Note:
• f( ) describes systematic variation in the relationship.
• ε represents the unsystematic variation (or random error) in
the relationship
· Where Y = dependent variable (also called response, predictand, or regressand)
X = independent variable (also called stimulus, predictor, or regressor)
Examples
► Sales=f(Adv.Expenditure)+E
► Fiber=f(Weight of jute plant)+E
► Consumption Exp.=f( Income) +E
► Yield=f( fertilizer, seed rate, rainfall)+E
► Marks=f(Study hours, IQ level)+E
► Demand=f(Price, Price of related commodities,
Consumer income, Consumer taste, Adv. Expenses
for creation of demand)+E
Model building with one
regressor
Example: Consider the relationship between
PRICE (Y) and square feet living area SQFT (X)
of a house.
• There probably is a relationship...
...as SQFT increases, PRICE should
increase.
• But how would we measure and quantify this
relationship?
Consider the data on sale price (in thousands of dollars) and living
area (in square feet) of 14 houses in a particular location. Estimate
the relationship between PRICE and SQFT.

PRICE(Y)   SQFT(X)
199.9      1065
228        1254
235        1300
285        1577
239        1600
293        1750
285        1800
365        1870
295        1935
290        1948
385        2254
505        2600
425        2800
415        3000
Scatter plot

[Scatter plot of PRICE against SQFT]

The observed data points do not all fall on a straight line but cluster
about it. Many lines can be drawn through the data points; the problem
is to select among them.
Method of LEAST SQUARE
► The method of LEAST SQUARE results in a line that minimizes the sum
of squared vertical distances from the observed data points to the
line (i.e. the random errors). Any other line has a larger sum.
Best fit line to the data
LEAST SQUARE LINE
A least square line is described in terms of its Y-
intercept (the height at which it intercepts the Y-
axis) and its slope (the angle of the line). The line
can be expressed by the following relation
Ŷ = bo + b1X   (estimated regression of Y on X)

Where

► b1 = slope of the line:  b1 = S_XY / S²_X
► bo = intercept of the line:  bo = Ȳ − b1·X̄

S_XY = Σ(X − X̄)(Y − Ȳ) / (n − 1) = [ΣXY − (ΣX)(ΣY)/n] / (n − 1) = 46315.31

S²_X = Σ(X − X̄)² / (n − 1) = [ΣX² − (ΣX)²/n] / (n − 1) = 333803.30

b1 = S_XY / S²_X = 46315.31 / 333803.30 = 0.13875

bo = Ȳ − b1·X̄ = 317.49 − (0.13875)(1910.93) = 52.351

Ŷ = 52.35 + 0.1388X
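As a check of the arithmetic above, the slope and intercept can be recomputed from the raw data. A minimal sketch in Python, assuming the 14 (PRICE, SQFT) pairs are typed in exactly as tabulated earlier:

```python
# Least-squares fit for the house-price data, using the same
# S_XY / S²_X formulas as the slide (PRICE in thousands of dollars).
price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft  = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]

n = len(price)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# S_XY = [ΣXY − (ΣX)(ΣY)/n] / (n − 1)
s_xy = (sum(x * y for x, y in zip(sqft, price)) - sum(sqft) * sum(price) / n) / (n - 1)
# S²_X = [ΣX² − (ΣX)²/n] / (n − 1)
s_xx = (sum(x * x for x in sqft) - sum(sqft) ** 2 / n) / (n - 1)

b1 = s_xy / s_xx         # slope     ≈ 0.13875
b0 = y_bar - b1 * x_bar  # intercept ≈ 52.35
```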
Interpretation of the estimated
parameters
PRICE=52.35+0.1388SQFT
► The value of b1 = 0.13875 indicates that the average price of a
house is expected to go up by 0.13875 thousand dollars (i.e. $138.75)
with each one-square-foot increase in living area.
► The value of bo indicates that the estimated average price of an
empty lot (X = 0) is $52,351, but this interpretation is not always
valid. Be careful in interpreting the intercept coefficient when the
scope of the model does not cover X = 0.
Fitted Least Square Line

Ŷ = 52.35 + 0.1388·SQFT

PRICE(Y)   SQFT(X)   Ŷ
199.9      1065      200.12
228        1254      226.34
235        1300      232.73
285        1577      271.16
239        1600      274.35
(first five houses shown)

[Plot of observed Y and predicted Y against SQFT]
Ŷ = 52.35 + 0.1388X   (least squares line)
Ŷ = 50 + 0.1388X   (other line)

                    LEAST SQUARE LINE                OTHER LINE
Y        X          Ŷ        e = Y−Ŷ   e²            Ŷ        e        e²
199.9    1065       200.12   -0.22     0.05          197.82   2.08     4.32
228      1254       226.34   1.66      2.74          224.06   3.94     15.56
235      1300       232.73   2.27      5.17          230.44   4.56     20.79
285      1577       271.16   13.84     191.54        268.89   16.11    259.61
239      1600       274.35   -35.35    1249.72       272.08   -33.08   1094.29
293      1750       295.16   -2.16     4.68          292.90   0.10     0.01
285      1800       302.10   -17.10    292.46        299.84   -14.84   220.23
365      1870       311.81   53.19     2828.75       309.56   55.44    3074.04
295      1935       320.83   -25.83    667.33        318.58   -23.58   555.92
290      1948       322.64   -32.64    1065.14       320.38   -30.38   923.09
385      2254       365.09   19.91     396.24        362.86   22.14    490.39
505      2600       413.10   91.90     8445.30       410.88   94.12    8858.57
425      2800       440.85   -15.85    251.28        438.64   -13.64   186.05
415      3000       468.60   -53.60    2873.16       466.40   -51.40   2641.96
4444.9   26753      4444.9   0         18273.58      4413.32  31.58    18344.83
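The two error sums of squares in the table can be verified numerically. A short sketch, assuming the two candidate lines quoted above and the same 14 data points:

```python
# Comparing the SSE of the least-squares line with the "other" line
# ŷ = 50 + 0.1388x; the least-squares line should always have the
# smaller sum of squared vertical distances.
price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft  = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]

def sse(b0, b1):
    """Sum of squared vertical distances from the points to b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))

sse_least_squares = sse(52.35, 0.13875)  # ≈ 18273.6 (table total 18273.58)
sse_other_line    = sse(50.00, 0.1388)   # ≈ 18344.8 (table total 18344.83)
```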
[Plot of observed and predicted PRICE against SQFT with the fitted line]

The observed values of (X, Y) do not all fall on the regression line but
scatter away from it. The degree of scatter of the observed values
about the regression line is measured by what is called the standard
error of estimate (or standard error of regression), denoted by Se.

S²e = Σ(Y − Ŷ)² / (n − 2)   OR   S²e = [ΣY² − bo·ΣY − b1·ΣXY] / (n − 2) = 1522.79

Se = 39.023

SE(bo) = Se·√(1/n + X̄²/((n − 1)·S²X)) = 37.285

SE(b1) = Se·√(1/((n − 1)·S²X)) = 0.01873
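The standard errors follow directly from the formulas above. A sketch, again assuming the data are typed in from the earlier table:

```python
# Standard error of estimate and standard errors of the coefficients
# for the house-price fit, following the slide's formulas.
import math

price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft  = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]
n = len(price)
x_bar, y_bar = sum(sqft) / n, sum(price) / n

s_xy = (sum(x * y for x, y in zip(sqft, price)) - sum(sqft) * sum(price) / n) / (n - 1)
s_xx = (sum(x * x for x in sqft) - sum(sqft) ** 2 / n) / (n - 1)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# standard error of estimate: Se = sqrt(Σ(Y − Ŷ)² / (n − 2))
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
se = math.sqrt(sse / (n - 2))                                   # ≈ 39.02

# standard errors of the intercept and slope
se_b0 = se * math.sqrt(1 / n + x_bar ** 2 / ((n - 1) * s_xx))   # ≈ 37.285
se_b1 = se * math.sqrt(1 / ((n - 1) * s_xx))                    # ≈ 0.01873
```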
Inference in Simple Linear Regression
(From samples to population)
► Generally, more is sought in regression
analysis than a description of observed data.
One usually wishes to draw inferences about
the relationship of the variables in the
population from which the sample was taken
► The slope and the intercept estimated from a
single sample typically differ from the
population values and vary from sample to
sample. To use these estimates for inference
about the population values, the sampling
distributions of the two statistics are needed
Test of hypothesis
1) Construction of hypotheses:
   Ho: β1 = 0
   H1: β1 ≠ 0
2) Level of significance: α = 5%
3) Test statistic: t = (b1 − β1)/SE(b1) = (0.1388 − 0)/0.01873 = 7.41*
4) Decision rule: Reject Ho if |t_cal| ≥ tα/2(n−2) = 2.56
5) Result: Since 7.41 ≥ 2.56, reject Ho and conclude that there is a
significant relationship between PRICE and SQFT.
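The test statistic is a one-line computation from the slide's estimates:

```python
# t statistic for H0: β1 = 0, using b1 and SE(b1) from the slides.
b1, se_b1 = 0.1388, 0.01873
t = (b1 - 0) / se_b1   # ≈ 7.41, well beyond the tabulated critical value
```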
Confidence intervals for regression
parameters
► A statistic calculated from a sample provides
a point estimate of the unknown parameter.
► A point estimate can be thought of as the
single best guess for the population value.
► While the estimated value from the sample
is typically different from the value of the
unknown population parameter, the hope is
that it isn't too far away.
► Based on the sample estimates, it is
possible to calculate a range of values that,
with a designated likelihood, includes the
population value. Such a range is called a
confidence interval.
95% C.I. for β1

b1 ± tα/2(n−2)·SE(b1)
0.1388 ± t.025(12)·(0.01873)
(0.0909, 0.1867)

A 95% C.I. can be interpreted as follows: if we take 100 samples of
the same size under the same conditions and compute 100 C.I.s for the
parameter, one from each sample, then 95 such C.I.s will contain the
parameter (i.e. not all of the constructed C.I.s). A confidence
interval estimate of a parameter is more informative than a point
estimate because it provides a range of plausible values rather than
a single guess.
Ho: β1 = 0
H1: β1 ≠ 0
95% C.I. for β1: (0.0909, 0.1867). Since this interval does not
contain 0, reject Ho.

Total variation = Explained variation + Unexplained variation
(variation due to unknown factors)

Total variation = (n − 1)·S²y = 101815
Explained variation = b1·(n − 1)·S_xy = 83541
Unexplained variation = 101815 − 83541 = 18274
Goodness of Fit
A commonly used measure of the goodness of fit of a linear model is
R², called the coefficient of determination. If all the observations
fall on the regression line, R² is 1. If there is no linear
relationship between Y and X, R² is 0.
The coefficient of determination tells us the proportion of variation
in the dependent variable explained by the independent variable.

Coefficient of determination R² = (Explained variation / Total variation) × 100 = 82%
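R² can also be computed from the data as 1 minus the ratio of unexplained to total variation. A sketch using the fitted line from the earlier slides:

```python
# R² for the house-price fit: 1 − (unexplained variation)/(total variation).
price = [199.9, 228, 235, 285, 239, 293, 285, 365, 295, 290, 385, 505, 425, 415]
sqft  = [1065, 1254, 1300, 1577, 1600, 1750, 1800, 1870, 1935, 1948, 2254, 2600, 2800, 3000]
n = len(price)
y_bar = sum(price) / n

sst = sum((y - y_bar) ** 2 for y in price)  # total variation ≈ 101815
# unexplained variation, using the slide's fitted line Ŷ = 52.35 + 0.13875X
sse = sum((y - (52.35 + 0.13875 * x)) ** 2 for x, y in zip(sqft, price))  # ≈ 18274
r2 = 1 - sse / sst                          # ≈ 0.82
```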
Testing the hypothesis β1 = 0 by the analysis of variance procedure.

ANOVA TABLE

S.O.V        DF            SS       MSS = SS/df   Fcal     Ftab                p-Value
Regression   1             83541    83541         54.86*   F.05(1,12) = 4.84   0.000
Error        14 − 2 = 12   18274    1523
TOTAL        14 − 1 = 13   101815
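The ANOVA entries follow from the sums of squares already computed. A sketch of the table's arithmetic:

```python
# ANOVA arithmetic from the explained/unexplained sums of squares above.
reg_ss, err_ss = 83541, 18274
reg_df, err_df = 1, 14 - 2

reg_ms = reg_ss / reg_df   # 83541
err_ms = err_ss / err_df   # ≈ 1523
f = reg_ms / err_ms        # ≈ 54.86; note F = t² (7.41² ≈ 54.9)
```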
Example [2]:
Find the least squares regression line for the data on incomes (in
hundreds of dollars) and food expenditure of the seven households.

A plot of paired observations is called a scatter diagram.

[Scatter diagram: food expenditure against income]
Scatter diagram and straight lines

[Scatter diagram with several candidate straight lines: food
expenditure against income]
Least Squares Line

[Least squares regression line through the scatter of food expenditure
against income, with the errors e shown as vertical distances]
Error Sum of Squares (SSE)

SSE = Σe² = Σ(y − ŷ)²

b = S_xy / S²_x   and   a = ȳ − b·x̄

S_xy = [Σxy − (Σx)(Σy)/n] / (n − 1)
S²_x = [Σx² − (Σx)²/n] / (n − 1)
Solution

Income   Food expenditure
x        y        xy       x²
35       9        315      1225
49       15       735      2401
21       7        147      441
39       11       429      1521
15       5        75       225
28       8        224      784
25       9        225      625
Σx=212   Σy=64    Σxy=2150  Σx²=7222
S²x = [Σx² − (Σx)²/n] / (n − 1) = [7222 − (212)²/7] / 6 = 133.571

S_xy = [Σxy − (Σx)(Σy)/n] / (n − 1) = [2150 − (212)(64)/7] / 6 = 35.285

b = S_xy / S²x = 35.285 / 133.571 = 0.2642

a = ȳ − b·x̄ = 9.1429 − (0.2642)(30.2857) = 1.1414

Fitted line: ŷ = 1.1414 + 0.2642x
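The same fit can be reproduced from the raw data. A sketch, assuming income and food expenditure are typed in from the table (both in hundreds of dollars):

```python
# Least-squares fit for the food-expenditure example.
income = [35, 49, 21, 39, 15, 28, 25]
food   = [9, 15, 7, 11, 5, 8, 9]
n = len(income)
x_bar, y_bar = sum(income) / n, sum(food) / n

s_xy = (sum(x * y for x, y in zip(income, food)) - sum(income) * sum(food) / n) / (n - 1)
s_xx = (sum(x * x for x in income) - sum(income) ** 2 / n) / (n - 1)

b = s_xy / s_xx        # ≈ 0.2642
a = y_bar - b * x_bar  # ≈ 1.142 (the slide's 1.1414 comes from rounding b to 0.2642)
```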
Error of prediction

ŷ = 1.1414 + 0.2642x

[Plot: for the household with income x = 35, predicted food
expenditure = $1038.84, actual = $900, error e = −$138.84]
Interpretation of a and b
ŷ = 1.1414 + .2642 X
Interpretation of a
Consider a household with zero income:
ŷ = 1.1414 + 0.2642(0) = 1.1414 hundred
Thus, we can state that a household with no income is expected to
spend $114.14 per month on food.
Interpretation of a and b cont.
ŷ = 1.1414 + .2642 X
Interpretation of b
The value of b in the regression model gives the change in y due to a
one-unit change in x.
We can state that, on average, a $1 increase in the income of a
household will increase food expenditure by $0.2642.
The regression line is valid only for values of x between 15 and 49
(the scope of the model).
Goodness of Fit
R2=92%
The value of R2 indicates that about
92% variation in the dependent
variable has been explained by the
linear relationship with X and
remaining are due to some other
unknown factors.
41
Positive and negative linear relationships between x and y

[Two panels: a line with b > 0 (positive relationship) and a line
with b < 0 (negative relationship)]
Example [3]:
A random sample of eight drivers insured with a company and having
similar auto insurance policies was selected. The following table
lists their driving experience (in years) and monthly auto insurance
premiums.

Driving experience (years)   Monthly auto insurance premium ($)
5        64
2        87
12       50
9        71
15       44
6        56
25       42
16       60
a) Does the insurance premium depend on the driving experience, or
does the driving experience depend on the insurance premium? Do you
expect a positive or a negative relationship between these two
variables?

The premium depends on experience, and a negative relationship is
expected.

[Scatter diagram of premium against experience: the relationship is
negative and moderate]
c) Find the least squares regression line by choosing appropriate
dependent and independent variables based on your answer in part a.

Experience   Premium
x        y        xy       x²       y²
5        64       320      25       4096
2        87       174      4        7569
12       50       600      144      2500
9        71       639      81       5041
15       44       660      225      1936
6        56       336      36       3136
25       42       1050     625      1764
16       60       960      256      3600
Σx=90    Σy=474   Σxy=4739  Σx²=1396  Σy²=29642
SS_xy = Σxy − (Σx)(Σy)/n = 4739 − (90)(474)/8 = −593.5

SS_xx = Σx² − (Σx)²/n = 1396 − (90)²/8 = 383.5

SS_yy = Σy² − (Σy)²/n = 29,642 − (474)²/8 = 1557.5
LEAST SQUARE REGRESSION
LINE
b = SS_xy / SS_xx = −593.5000 / 383.5000 = −1.5476

a = ȳ − b·x̄ = 59.25 − (−1.5476)(11.25) = 76.6605

ŷ = 76.6605 − 1.5476x
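The line can be recomputed from the raw data as a check. A sketch, assuming the eight (experience, premium) pairs from the table:

```python
# Least-squares fit for the insurance-premium example, using the
# SS_xy / SS_xx quantities from the slide.
exp_years = [5, 2, 12, 9, 15, 6, 25, 16]
premium   = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(exp_years)

ss_xy = sum(x * y for x, y in zip(exp_years, premium)) - sum(exp_years) * sum(premium) / n  # −593.5
ss_xx = sum(x * x for x in exp_years) - sum(exp_years) ** 2 / n                             # 383.5

b = ss_xy / ss_xx                              # ≈ −1.5476
a = sum(premium) / n - b * sum(exp_years) / n  # ≈ 76.66
```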
d) Interpret the meaning of the
values of a and b calculated
ŷ = 76.6605 − 1.5476x

a = 76.6605 gives the value of ŷ for x = 0: the expected amount of the
monthly premium for a driver with no driving experience is about $76.66.

b = −1.5476 indicates that, on average, for every extra year of
driving experience, the monthly auto insurance premium decreases by
about $1.55.
f) Calculate coefficient of
determination
R² = 59%

59% of the total variation in insurance premiums is explained by years
of driving experience; the remaining 41% is due to other unknown
factors.
Predict the monthly auto insurance
for a driver with 10 years of driving
experience.
The predicted value of y for x = 10 is
ŷ = 76.6605 − 1.5476(10) = 61.18
so the expected monthly premium for a driver with 10 years of driving
experience is about $61.18.
Regression with more than one independent variable

Example: The following information has been gathered from a random
sample of apartment renters in a city. We are trying to predict rent
(in dollars per month) based on the size of the apartment (number of
rooms) and the distance from downtown (in miles).

Rent ($)   Number of rooms   Distance (miles)
[Y]        [X1]              [X2]
360        2                 1
1000       6                 1
450        3                 2
525        4                 3
350        2                 10
300        1                 4
Matrix Plot (to identify the relationships)

[Matrix plot of Y, X1, and X2]
Least squares regression by SPSS

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model        B         Std. Error          Beta      t       Sig.
1 (Constant) 96.458    118.121                       .817    .474
  X1         136.485   26.864              .943      5.081   .015
  X2         -2.403    14.171              -.031     -.170   .876
a. Dependent Variable: Y
Regression equation:
RENT = 96.458 + 136.485 NUM_ROOM − 2.403 DISTANCE
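The SPSS coefficients can be reproduced without any statistics package by solving the least-squares normal equations directly. A sketch in plain Python, assuming the six data rows from the rent table:

```python
# Reproducing the SPSS coefficients by solving the normal equations
# (XᵀX)b = Xᵀy with Gaussian elimination.
rent  = [360, 1000, 450, 525, 350, 300]
rooms = [2, 6, 3, 4, 2, 1]
dist  = [1, 1, 2, 3, 10, 4]

# design matrix rows [1, x1, x2] (intercept column first)
X = [[1.0, float(r), float(d)] for r, d in zip(rooms, dist)]

# normal equations: A = XᵀX (3x3), v = Xᵀy (length 3)
A = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
v = [sum(row[i] * y for row, y in zip(X, rent)) for i in range(3)]

# forward elimination with partial pivoting
for k in range(3):
    p = max(range(k, 3), key=lambda r: abs(A[r][k]))
    A[k], A[p] = A[p], A[k]
    v[k], v[p] = v[p], v[k]
    for r in range(k + 1, 3):
        m = A[r][k] / A[k][k]
        for c in range(k, 3):
            A[r][c] -= m * A[k][c]
        v[r] -= m * v[k]

# back substitution
b = [0.0, 0.0, 0.0]
for k in (2, 1, 0):
    b[k] = (v[k] - sum(A[k][c] * b[c] for c in range(k + 1, 3))) / A[k][k]

# b ≈ [96.458, 136.485, -2.403]: constant, NUM_ROOM, DISTANCE
```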
ANOVA considering both X1 & X2

ANOVA(b)
Model         Sum of Squares   df   Mean Square   F        Sig.
1 Regression  306910.2         2    153455.102    16.280   .025(a)
  Residual    28277.297        3    9425.766
  Total       335187.5         5
a. Predictors: (Constant), X2, X1
b. Dependent Variable: Y

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model        B         Std. Error          Beta      t       Sig.
1 (Constant) 96.458    118.121                       .817    .474
  X1         136.485   26.864              .943      5.081   .015
  X2         -2.403    14.171              -.031     -.170   .876
a. Dependent Variable: Y
ANOVA considering X1 only

ANOVA(b)
Model         Sum of Squares   df   Mean Square    F        Sig.
1 Regression  306639.1         1    306639.063     42.964   .003(a)

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model        B         Std. Error          Beta      t       Sig.
1 (Constant) 82.188    72.140                        1.139   .318
  X1         138.438   21.120              .956      6.555   .003
a. Dependent Variable: Y
QUESTION: Which regressor (independent variable) is relatively more
important in explaining variation in the response variable (dependent
variable)?

Answer: Use the standardized regression coefficients:

b1* = b1·(S_x1 / S_y)
b2* = b2·(S_x2 / S_y)

Y = RENT, X1 = number of rooms, X2 = distance
S_y = standard deviation of Y
S_x1, S_x2 = standard deviations of X1 and X2
b1, b2 = unstandardized coefficients
b1*, b2* = standardized coefficients
Standardized Regression Coefficient (Beta Coefficient)

Often the independent variables are measured in different units. The
standardized coefficients, or betas, are an attempt to make the
regression coefficients more comparable. A high standardized (beta)
coefficient indicates the relative importance of the independent
variable.

[SPSS coefficients table: Beta = .943 for X1 and −.031 for X2, so X1
(number of rooms) is the relatively more important regressor]
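The betas in the SPSS output can be recovered from the unstandardized coefficients with the b* = b·(Sx/Sy) formulas above. A sketch, assuming the rent data and the SPSS coefficients quoted earlier:

```python
# Standardized (beta) coefficients for the rent example.
import math

rent  = [360, 1000, 450, 525, 350, 300]
rooms = [2, 6, 3, 4, 2, 1]
dist  = [1, 1, 2, 3, 10, 4]

def sd(v):
    """Sample standard deviation (divisor n − 1)."""
    m = sum(v) / len(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

b1, b2 = 136.485, -2.403            # unstandardized coefficients (SPSS)
beta1 = b1 * sd(rooms) / sd(rent)   # ≈ 0.943, X1 relatively more important
beta2 = b2 * sd(dist) / sd(rent)    # ≈ -0.031
```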
[Four panels showing quadratic curves for the sign combinations:
β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0]

β1 = the coefficient of the linear term
β2 = the coefficient of the squared term
Estimation of Quadratic Regression Model

Yi = β0 + β1·Xi + β2·Xi² + εi

Convert the 2nd-degree model to a multiple linear regression model by
using the transformation X1 = X and X2 = X²:

Yi = β0 + β1·X1i + β2·X2i + εi

The above model is a multiple linear regression model with two
regressors, where the 2nd regressor is the square of the 1st regressor.
Testing for Significance: Quadratic Model

► Testing the quadratic effect: compare the quadratic model
Yi = β0 + β1·Xi + β2·Xi² + εi
with the linear model
Yi = β0 + β1·Xi + εi

Hypotheses
► H0: β2 = 0 (no quadratic term)
► H1: β2 ≠ 0 (quadratic term is needed)
Heating Oil Example

[Data table: monthly heating-oil consumption (Y), temperature (X1),
and insulation (X2) for 15 homes; e.g. Y = 40.80, X1 = 73, X2 = 6]
Scatter Diagram

a) Oil used vs Temp: a 1st-degree (straight-line) fit is appropriate
b) Oil used vs Insulation: a 2nd-degree (quadratic) fit is appropriate
Model Summary
Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
1       .986(a)  .973       .965                24.29378
a. Predictors: (Constant), X2_2, X1, X2

Coefficients(a)
             Unstandardized Coefficients   Standardized Coefficients
Model        B         Std. Error          Beta      t         Sig.
1 (Constant) 624.586   42.435                        14.719    .000
  X1         -5.363    .317                -.854     -16.910   .000
  X2         -44.587   14.955              -1.019    -2.981    .012
  X2_2       1.867     1.124               .568      1.661     .125
a. Dependent Variable: Y

Ŷ = 624.59 − 5.36·X1 − 44.59·X2 + 1.87·X2²
Test of overall significance of regression

Ho: β1 = β2 = β3 = 0
H1: At least one β is not zero

F = (RegSS / Reg df) / (ESS / E df) = RMS / EMS = 129.70*

ANOVA(b)
Model         Sum of Squares   df   Mean Square   F         Sig.
1 Regression  229643.2         3    76547.721     129.701   .000(a)
  Residual    6492.065         11   590.188
  Total       236135.2         14
a. Predictors: (Constant), X2_2, X1, X2
b. Dependent Variable: Y
Heating Oil Example: model with and without the quadratic insulation term

Full model:     Yi = β0 + β1·X1i + β2·X2i + β3·X2i² + εi
Reduced model:  Yi = β0 + β1·X1i + β2·X2i + εi

Hypotheses
► H0: β3 = 0 (no quadratic term in insulation)
► H1: β3 ≠ 0 (quadratic term in insulation is needed)
Test of significance of the quadratic term

Is the quadratic term in insulation needed in the model for monthly
consumption of heating oil? Test at α = 0.05.

H0: β3 = 0
H1: β3 ≠ 0
df = 11

Test statistic: t = (b3 − β3) / S_b3 = (1.8667 − 0) / 1.1238 = 1.6611

p-value = 0.1249

Decision: Do not reject H0 at α = 0.05.
Conclusion: There is not sufficient evidence that the quadratic term
in insulation is needed.
Full model:     Yi = β0 + β1·X1i + β2·X2i + β3·X2i² + εi   (H0: β3 = 0)
Reduced model:  Yi = β0 + β1·X1i + β2·X2i + εi

Unrestricted ANOVA (x1, x2, x2²)      Restricted ANOVA (x1, x2)
S.O.V        DF    SS                 S.O.V        DF    SS
Regression   3                        Regression   2
Error        11    ESS(UR)            Error        12    ESS(R)
Total        14                       Total        14
Unrestricted ANOVA
ANOVA(b)
Model         Sum of Squares   df
1 Regression  229643.2         3
  Residual    6492.065         11
  Total       236135.2         14
b. Dependent Variable: Y

Restricted ANOVA
ANOVA(b)
Model         Sum of Squares   df
1 Regression  228014.6         2
  Residual    8120.603         12
  Total       236135.2         14
b. Dependent Variable: Y
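The partial F statistic for dropping the quadratic term follows from the two residual sums of squares. A sketch using the ANOVA values above:

```python
# Partial F test for H0: β3 = 0, comparing the restricted and
# unrestricted models' residual sums of squares.
ess_ur, df_ur = 6492.065, 11   # unrestricted model (x1, x2, x2²)
ess_r,  df_r  = 8120.603, 12   # restricted model (x1, x2)

f = ((ess_r - ess_ur) / (df_r - df_ur)) / (ess_ur / df_ur)
# f ≈ 2.76, which matches t² = 1.6611² from the t test of the same hypothesis
```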