
Elements of regression theory

Done by:
Borysenko Alina
Dubovik Maksim

Definition
Regression is a statistical measure that
attempts to determine the strength of
the relationship between one dependent
variable (usually denoted by Y) and a
series of other changing variables
(known as independent variables).

Definition

Regression analysis is often used in the business or investment world to attempt to predict the effect of certain INPUTS on an OUTPUT.

For example:

A company may want to see if its sales can be predicted by a movement in the GDP.

A company may want to predict the effect of the price of steel on car sales.

Problems of regression analysis


1. MULTICOLLINEARITY

The case in which two or more explanatory variables in the regression model are highly correlated, making it difficult or impossible to isolate their individual effects on the dependent variable.
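A minimal sketch with made-up data: when one explanatory variable is nearly a linear function of another, their pairwise correlation is close to 1, which signals multicollinearity.

```python
import numpy as np

# Hypothetical data: x2 is almost a linear function of x1,
# so the two explanatory variables are highly correlated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=100)  # near-duplicate of x1

# A pairwise correlation close to 1 signals multicollinearity.
r = np.corrcoef(x1, x2)[0, 1]
print(round(r, 3))
```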

Problems of regression analysis


2. HETEROSCEDASTICITY

The case in which the error variance is not constant: it increases as the independent variable increases.
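A minimal simulation (made-up model) of this pattern: when the error standard deviation grows with x, the residuals fan out, and their spread in the upper half of the x-range is larger than in the lower half.

```python
import numpy as np

# Hypothetical illustration: the error standard deviation grows with x.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 500)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5 * x)  # noise scale proportional to x

residuals = y - (3.0 + 2.0 * x)           # true errors, for illustration
low = residuals[x <= 5.0].std()           # spread for small x
high = residuals[x > 5.0].std()           # spread for large x
print(round(low, 2), round(high, 2))
```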

Problems of regression analysis


3. AUTOCORRELATION

The case in which the error term in one time period is correlated with the error term in another time period: a statistical relationship between the values of a sequence taken with a shift in time.
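A minimal sketch with simulated AR(1) errors (a made-up error process): the sample correlation between the error series and its own one-period shift is far from zero.

```python
import numpy as np

# Hypothetical illustration: errors follow an AR(1) process, so each
# error term depends on the error term of the previous period.
rng = np.random.default_rng(2)
n = 500
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + rng.normal()

# Sample lag-1 autocorrelation of the error series (roughly 0.8 here).
r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
print(round(r1, 2))
```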

Problems of regression analysis


4. ERRORS IN VARIABLES

The case in which the variables in the regression model are measured with error. Errors in the explanatory (independent) variables lead to biased and inconsistent parameter estimates.

Correlation dependence

Correlation dependence is a statistical relationship between two or more random variables.

The correlation ratio and the correlation coefficient are mathematical measures of the correlation between two random variables.
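As a sketch with hypothetical data, the sample correlation coefficient can be computed directly:

```python
import numpy as np

# Hypothetical data with a nearly exact linear relationship (roughly y = 2x),
# so the correlation coefficient is very close to 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```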

Correlation table

The primary task of statistical processing of experimental data is the systematization of the data. That is where a correlation table will help you:

X takes the values X1, X2, …, Xm
Y takes the values Y1, Y2, …, Yk
n — the frequency of each (X, Y) pair

Correlation table (example)

We studied the relationship between the quality of goods Y (%) and the quantity of goods X (pcs). The observation results are shown in the form of a correlation table:

[Correlation table not recoverable from this copy: X takes the values 70, 75, 80, 85, 90; Y takes the values 18, 22, 26, 30; the cell frequencies together with the row and column totals sum to n = 100.]

Empirical lines of regression

Empirical regression is based on grouped data. It represents the dependence of the group mean values of the dependent variable (Y) on the group mean values of the independent variable (X).

The graphical representation of the empirical regression is a broken line composed of the points (abscissa: group mean value of X; ordinate: group mean value of Y).

Empirical line of regression (example)

The dependence between the amount of sales of a good (Y) and the costs of its advertisement (X) is represented below:

X | 1.5 | 4.0 | 5.0 | 7.0 | 8.5 | 10.0 | 11.0 | 12.5
Y | 5.0 | 4.5 | 7.0 | 6.5 | 9.5 | 9.0  | 11.0 | 9.0

Let's depict the experimental data as points in Cartesian coordinates. The broken line joining these points is called the empirical line of regression:
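The vertices of this broken line are simply the data points ordered by X (a plotting library such as matplotlib would then draw the connecting segments):

```python
# The advertising data from the table above.
X = [1.5, 4.0, 5.0, 7.0, 8.5, 10.0, 11.0, 12.5]
Y = [5.0, 4.5, 7.0, 6.5, 9.5, 9.0, 11.0, 9.0]

# Vertices of the empirical broken line, ordered by X.
vertices = sorted(zip(X, Y))
print(vertices[0], vertices[-1])
```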

The estimation of parameters using the Least Squares Method

A line of best fit is a straight line that is the best approximation of a given set of data. It is used to study the nature of the relation between two variables.

A more accurate way of finding the line of best fit is the least squares method.

The estimation of parameters using the Least Squares Method

Use the following steps to find the equation of the line of best fit for a set of ordered pairs (xi; yi), i = 1, …, n.

STEP 1: Calculate the mean of the X-values (x̄) and the mean of the Y-values (ȳ).

The estimation of parameters using the Least Squares Method

STEP 2: The following formula gives the slope (a) of the line of best fit:

a = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

The estimation of parameters using the Least Squares Method

STEP 3: Compute the y-intercept (b) of the line by using the formula:

b = ȳ − a·x̄

The estimation of parameters using the Least Squares Method

STEP 4: Use the slope a and the y-intercept b to form the equation of the line:

y = a·x + b
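The four steps above can be sketched in plain Python (the names best_fit_line, xs, ys and the sample data are illustrative):

```python
# The four least-squares steps, with a = slope and b = intercept,
# matching the notation used in these slides.
def best_fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n                      # STEP 1: means
    y_mean = sum(ys) / n
    a = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
        / sum((x - x_mean) ** 2 for x in xs)  # STEP 2: slope
    b = y_mean - a * x_mean                   # STEP 3: intercept
    return a, b                               # STEP 4: y = a*x + b

# Made-up data lying exactly on y = 2x, so a = 2.0 and b = 0.0.
a, b = best_fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(a, b)
```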

The estimation of parameters using the Least Squares Method (example)

[Data table of the pairs (xi, yi) not recoverable from this copy.]

1. Calculate the means x̄ and ȳ.

The estimation of parameters using the Least Squares Method (example)

2. Tabulate the deviations:

[Worked table of xi − x̄, yi − ȳ, (xi − x̄)(yi − ȳ) and (xi − x̄)² not recoverable from this copy; its column totals are Σ(xi − x̄)(yi − ȳ) = −131 and Σ(xi − x̄)² = 118.4.]

The estimation of parameters using the Least Squares Method (example)

2. Calculate the slope: a = −131 / 118.4 ≈ −1.11

3. Calculate the y-intercept: b = ȳ − a·x̄

4. Form the equation of the line: y = a·x + b
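Using the column totals from the worked table, Σ(xi − x̄)(yi − ȳ) = −131 and Σ(xi − x̄)² = 118.4, the slope can be checked directly:

```python
# Slope from the column totals of the worked table:
# sum of (x - x̄)(y - ȳ) = -131, sum of (x - x̄)² = 118.4.
sxy = -131.0
sxx = 118.4
a = sxy / sxx
print(round(a, 3))   # ≈ -1.106
```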

Point estimations

Point estimations of the parameters of regression (a and b):

a = (n·Σxiyi − Σxi·Σyi) / (n·Σxi² − (Σxi)²)

b = (Σyi − a·Σxi) / n

Point estimations (example)

[Table of xi, yi, xiyi and xi² not recoverable from this copy; its column totals are Σxi = 93, Σyi = 95, Σxiyi = 779, Σxi² = 1073.]

Point estimations (example)
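From the column totals of the table (Σx = 93, Σy = 95, Σxy = 779, Σx² = 1073), and assuming n = 10 observations as the table suggests, the point estimates work out as follows:

```python
# Point estimates of the slope a and intercept b from the table totals.
# n = 10 is an assumption read off the table; the sums are from the slide.
n, sx, sy, sxy, sxx = 10, 93.0, 95.0, 779.0, 1073.0

a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
b = (sy - a * sx) / n                           # intercept
print(round(a, 3), round(b, 2))
```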

Regression Slope: Confidence Interval

To construct a confidence interval for the slope of the regression line, we need to know the standard error of the sampling distribution of the slope.

STEP 1: Identify a sample statistic. The sample statistic is the regression slope (a) calculated from the sample data.

STEP 2: Select a confidence level. The confidence level describes the uncertainty of a sampling method. Often, researchers choose 90%, 95%, or 99% confidence levels.

STEP 3: Find the margin of error.

STEP 4: Specify the confidence interval. The range of the confidence interval is defined by the sample statistic ± margin of error.

Confidence Interval (example)

Regression equation: ŷ = 15 + 0.55x

Predictor | Coef | SE Coef | T    | P
Constant  | 15   | 5.0     |      | 0.00
X         | 0.55 | 0.24    | 2.29 | 0.01

What is the 99% confidence interval for the slope of the regression line?

Confidence Interval (example)

1. 99% confidence level (given).
2. Margin of error:
   1. The standard error is given in the regression output: SE = 0.24.
   2. Critical probability: p* = 1 − α/2 = 1 − 0.01/2 = 0.995.
   3. Degrees of freedom: df = n − 2.
   4. Using the table of probabilities for Student's t-distribution, we find that the critical value is 2.63.
   5. Margin of error: 2.63 × 0.24 ≈ 0.63.
3. The 99% confidence interval is 0.55 ± 0.63, i.e. −0.08 to 1.18.
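The computation can be checked directly from the numbers in the regression output (slope 0.55, standard error 0.24, critical t-value 2.63):

```python
# 99% confidence interval for the slope, using the values from the slide.
slope, se, t_crit = 0.55, 0.24, 2.63

margin = t_crit * se                 # margin of error ≈ 0.63
lo, hi = slope - margin, slope + margin
print(round(lo, 2), round(hi, 2))    # -0.08 to 1.18
```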

Nonlinear regression

Nonlinear regression is a form of regression analysis in which observational data are modeled by a function that is a nonlinear combination of the model parameters and depends on one or more independent variables.

The data are fitted by a method of successive approximations.
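A minimal sketch of such successive approximations, assuming a hypothetical exponential model y = a·e^(bx) and using the Gauss-Newton method with step halving (one common choice of iterative method; the data and starting values are invented):

```python
import numpy as np

# Hypothetical nonlinear model y = a * exp(b * x), synthetic data a = 2, b = 0.3.
x = np.linspace(0.0, 5.0, 6)
y = 2.0 * np.exp(0.3 * x)

def sse(a, b):
    """Sum of squared residuals for the current parameter guess."""
    return float(np.sum((y - a * np.exp(b * x)) ** 2))

a, b = 1.0, 0.1                          # initial approximation
for _ in range(100):
    r = y - a * np.exp(b * x)            # residuals
    # Jacobian of the model with respect to (a, b)
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    step, *_ = np.linalg.lstsq(J, r, rcond=None)
    t = 1.0
    while sse(a + t * step[0], b + t * step[1]) > sse(a, b) and t > 1e-8:
        t *= 0.5                         # halve the step if it overshoots
    a, b = a + t * step[0], b + t * step[1]  # successive approximation

print(round(a, 2), round(b, 2))          # approaches a = 2, b = 0.3
```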

Nonlinear regression

Examples: [example curves not recoverable from this copy]

Multiple Regression

The general purpose of multiple regression is to learn more about the relationship between several independent (predictor) variables and a dependent (criterion) variable.

Personnel professionals use multiple regression procedures to determine equitable compensation. You can determine a number of factors or dimensions, such as "amount of responsibility" (Resp) or "number of people to supervise" (No_Super), that you believe contribute to the value of a job.

The personnel analyst then usually conducts a salary survey among comparable companies in the market, recording the salaries and respective characteristics for different positions. This information can be used in a multiple regression analysis to build a regression equation of the form:

Salary = b0 + b1·Resp + b2·No_Super

Multiple Regression

Once this so-called regression line has been determined, the analyst can easily construct a graph of the expected (predicted) salaries and the actual salaries of job incumbents in his or her company.

RESULT: the analyst is able to determine which positions are underpaid (below the regression line), overpaid (above the regression line), or paid equitably.
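A sketch of this kind of analysis on made-up survey data (the variable names Resp and No_Super follow the slides above; the sample size and coefficients are invented for illustration):

```python
import numpy as np

# Hypothetical salary survey: Resp = amount of responsibility,
# No_Super = number of people to supervise.
rng = np.random.default_rng(3)
resp = rng.uniform(1, 10, size=50)
no_super = rng.uniform(0, 20, size=50)
salary = 20.0 + 0.5 * resp + 0.8 * no_super + rng.normal(scale=0.1, size=50)

# Design matrix with an intercept column; least-squares fit of
# Salary = b0 + b1*Resp + b2*No_Super.
X = np.column_stack([np.ones(50), resp, no_super])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
b0, b1, b2 = coef
print(round(b0, 1), round(b1, 2), round(b2, 2))
```

Predicted salaries from this equation can then be compared with actual salaries to flag under- and overpaid positions.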
