
Averages, Correlation, Regression

For Assignment or Dissertation Help, Please Contact:
Muhammad Sajid Saeed
+44 141 4045137
Email: todrsaeed@gmail.com
Skype ID: tosajidsaeed

QUESTION 1
(i) Correlation Coefficient r
a) A perfect correlation
A perfect correlation can be of two types: perfect positive correlation and perfect negative
correlation. A perfect positive correlation gives a correlation coefficient of r = +1 (Jain and
Ohri, 2010). This means that the two variables have a perfect positive relationship with each
other; for example, the relationship between supply and price. On the other hand, a perfect
negative correlation gives a correlation coefficient of r = -1 (Jain and Ohri, 2010). This means
that the two variables have a perfect inverse relationship with each other; for example, the
relationship between demand and supply. Figure 1 shows an example of each, where all points
lie on a single straight line.

b) A very strong correlation


As a rule of thumb, a correlation is very strong when the absolute value of the correlation
coefficient r is close to 1, preferably above 0.80. The following diagram illustrates an
example of a very strong correlation, where all the points lie close to a single straight line.

c) A null correlation
A null correlation (no correlation) means there is no linear relationship between two variables
X and Y. In this case the correlation coefficient r is zero. Figure 3 illustrates an example of
a null correlation, where the values are scattered and do not lie on a single line.

(ii) Product Moment Coefficient of Correlation


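| Year  | XS  | YA | XS·YA | XS²   | YA²  |
|-------|-----|----|-------|-------|------|
| 1995  | 50  | 8  | 400   | 2500  | 64   |
| 1996  | 54  | 31 | 1674  | 2916  | 961  |
| 1997  | 52  | 12 | 624   | 2704  | 144  |
| 1998  | 21  | 12 | 252   | 441   | 144  |
| 1999  | 70  | 15 | 1050  | 4900  | 225  |
| Total | 247 | 78 | 4000  | 13461 | 1538 |

XS = Sales (000)
YA = Advert (000)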

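$$r = \frac{n\sum X_S Y_A - \sum X_S \sum Y_A}{\sqrt{n\sum X_S^2 - \left(\sum X_S\right)^2}\,\sqrt{n\sum Y_A^2 - \left(\sum Y_A\right)^2}}$$

$$r = \frac{5(4000) - (247)(78)}{\sqrt{5(13461) - (247)^2}\,\sqrt{5(1538) - (78)^2}}$$

$$r = \frac{20000 - 19266}{\sqrt{67305 - 61009}\,\sqrt{7690 - 6084}}$$

$$r = \frac{734}{\sqrt{6296}\,\sqrt{1606}}$$

$$r = \frac{734}{79.347 \times 40.075}$$

$$r = \frac{734}{3179.839}$$

$$r = 0.231$$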

Discussion on Results
The correlation coefficient r = 0.231 is close to 0, which represents a relatively weak
association between sales and advertising. This suggests that the company did not develop an
adequate policy for advertising its products/services. It is therefore recommended that the
company increase its advertising budget in the coming years.
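The calculation above can be checked with a few lines of Python. This is a minimal sketch that applies the same raw-sums formula to the five yearly observations from the table; the function name `pearson_r` and the variable names are illustrative choices, not part of the original solution.

```python
from math import sqrt

# Yearly observations copied from the table (both series in thousands).
sales = [50, 54, 52, 21, 70]   # XS
advert = [8, 31, 12, 12, 15]   # YA


def pearson_r(x, y):
    """Product moment correlation coefficient via the raw-sums formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


r_sales_advert = pearson_r(sales, advert)
print(round(r_sales_advert, 3))  # 0.231
```

Working from the raw observations rather than the column totals also guards against a slip in any of the intermediate sums.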

QUESTION 2

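| Marks (%) | Frequency (f) | Midpoints (x) | Cumulative Freq. (c.f) |
|-----------|---------------|---------------|------------------------|
| 0-10      | 15            | 5             | 15                     |
| 10-20     | 14            | 15            | 29                     |
| 20-30     | 18            | 25            | 47                     |
| 30-40     | 18            | 35            | 65                     |
| 40-50     | 20            | 45            | 85                     |
| 50-60     | 8             | 55            | 93                     |
| 60-70     | 18            | 65            | 111                    |
| 70-80     | 24            | 75            | 135                    |
| 80-90     | 11            | 85            | 146                    |
| 90-100    | 5             | 95            | 151                    |
| Total     | 151           |               |                        |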

(i) Median Mark


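Median Class = N/2 = 151/2 = 75.50, which falls in the class 40-50.

In the table, the Q1 class (37.75) is 20-30, the median class (75.50) is 40-50, and the class
70-80 is both the modal class (maximum frequency) and the Q3 class (113.25).

$$\text{Median} = L_1 + \frac{\frac{N}{2} - B_M}{F_m} \times w$$

where L_1 is the lower boundary of the median class, B_M is the cumulative frequency below the
median class, F_m is the frequency of the median class, w is the class width, and N = 151.

$$\text{Median} = 40 + \frac{\frac{151}{2} - 65}{20} \times 10$$

$$\text{Median} = 40 + \frac{75.5 - 65}{20} \times 10$$

$$\text{Median} = 40 + \frac{10.5}{20} \times 10$$

$$\text{Median} = 40 + (0.525) \times 10$$

$$\text{Median} = 40 + 5.25$$

$$\text{Median} = 45.25$$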

(ii) Modal Mark


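Mode Class = class with the maximum frequency = 70-80 (f = 24)

$$\text{Mode} = L_m + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w$$

where L_m is the lower boundary of the modal class, Δ1 = 24 - 18 = 6 is the difference between
the modal frequency and the frequency of the preceding class, Δ2 = 24 - 11 = 13 is the
difference between the modal frequency and the frequency of the following class, and w is the
class width.

$$\text{Mode} = 70 + \frac{6}{6 + 13} \times 10$$

$$\text{Mode} = 70 + (0.3158) \times 10$$

$$\text{Mode} = 70 + 3.158$$

$$\text{Mode} = 73.158$$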
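The modal mark can be reproduced with a short Python sketch of the grouped-data mode formula used above. The function name `grouped_mode` is illustrative, and the sketch assumes equal class widths and a single modal class (if several classes tie, it simply takes the first).

```python
# Lower class boundaries, class width and frequencies from the marks table.
lower_bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [15, 14, 18, 18, 20, 8, 18, 24, 11, 5]
width = 10


def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data: L_m + d1 / (d1 + d2) * w."""
    m = freqs.index(max(freqs))  # index of the (first) modal class
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - prev_f  # modal frequency minus preceding frequency
    d2 = freqs[m] - next_f  # modal frequency minus following frequency
    return lower_bounds[m] + d1 / (d1 + d2) * width


mode_mark = grouped_mode(lower_bounds, freqs, width)
print(round(mode_mark, 3))  # 73.158
```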

(iii) Upper and Lower Quartiles


Lower Quartile (Q1)
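Q1 Class = N/4 = 151/4 = 37.75, which falls in the class 20-30.

$$Q_1 = L_{Q_1} + \frac{\frac{N}{4} - B_{Q_1}}{f_{Q_1}} \times w$$

where L_Q1 is the lower boundary of the Q1 class, B_Q1 is the cumulative frequency below the Q1
class, f_Q1 is the frequency of the Q1 class, and w is the class width.

$$Q_1 = 20 + \frac{\frac{151}{4} - 29}{18} \times 10$$

$$Q_1 = 20 + \frac{37.75 - 29}{18} \times 10$$

$$Q_1 = 20 + \frac{8.75}{18} \times 10$$

$$Q_1 = 20 + (0.4861) \times 10$$

$$Q_1 = 20 + 4.861$$

$$Q_1 = 24.861$$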

Upper Quartile (Q3)


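Q3 Class = 3N/4 = 3(151)/4 = 113.25, which falls in the class 70-80.

$$Q_3 = L_{Q_3} + \frac{\frac{3N}{4} - B_{Q_3}}{f_{Q_3}} \times w$$

where L_Q3 is the lower boundary of the Q3 class, B_Q3 is the cumulative frequency below the Q3
class, f_Q3 is the frequency of the Q3 class, and w is the class width.

$$Q_3 = 70 + \frac{\frac{3(151)}{4} - 111}{24} \times 10$$

$$Q_3 = 70 + \frac{113.25 - 111}{24} \times 10$$

$$Q_3 = 70 + \frac{2.25}{24} \times 10$$

$$Q_3 = 70 + (0.09375) \times 10$$

$$Q_3 = 70 + 0.9375$$

$$Q_3 = 70.9375$$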

(iv) Quartile Deviation (Q.D)

$$Q.D = \frac{1}{2}\left(Q_3 - Q_1\right)$$

$$Q.D = \frac{1}{2}\left(70.9375 - 24.861\right)$$

$$Q.D = \frac{46.0765}{2}$$

$$Q.D = 23.03825$$
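The median, the quartiles and the quartile deviation all use the same interpolation within a class, so they can be checked with one helper. The following is a minimal Python sketch under the assumption of equal class widths; `grouped_quantile` is an illustrative name, not part of the original solution.

```python
from itertools import accumulate

# Lower class boundaries, class width and frequencies from the marks table.
lower_bounds = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
freqs = [15, 14, 18, 18, 20, 8, 18, 24, 11, 5]
width = 10


def grouped_quantile(p, lower_bounds, freqs, width):
    """Quantile of grouped data: L + (p*N - B) / f * w."""
    target = p * sum(freqs)           # position p*N in the ordered data
    cum = list(accumulate(freqs))     # cumulative frequencies
    k = next(i for i, c in enumerate(cum) if c >= target)  # class holding the target
    below = cum[k - 1] if k > 0 else 0                      # c.f. below that class
    return lower_bounds[k] + (target - below) / freqs[k] * width


q1 = grouped_quantile(0.25, lower_bounds, freqs, width)
median = grouped_quantile(0.50, lower_bounds, freqs, width)
q3 = grouped_quantile(0.75, lower_bounds, freqs, width)
quartile_deviation = (q3 - q1) / 2

print(round(q1, 3), round(median, 2), round(q3, 4), round(quartile_deviation, 3))
```

Because the sketch carries Q1 at full precision instead of rounding it to 24.861, the quartile deviation comes out as roughly 23.038, which agrees with the hand calculation to three decimal places.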

QUESTION 3
(i) Product Moment Coefficient of Correlation
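| Y   | X3 | Y·X3 | Y²    | X3² |
|-----|----|------|-------|-----|
| 16  | 5  | 80   | 256   | 25  |
| 10  | 7  | 70   | 100   | 49  |
| 33  | 7  | 231  | 1089  | 49  |
| 15  | 3  | 45   | 225   | 9   |
| 77  | 9  | 693  | 5929  | 81  |
| 59  | 1  | 59   | 3481  | 1   |
| 75  | 8  | 600  | 5625  | 64  |
| 57  | 3  | 171  | 3249  | 9   |
| 88  | 12 | 1056 | 7744  | 144 |
| 26  | 15 | 390  | 676   | 225 |
| 456 | 70 | 3395 | 28374 | 656 |

The last row gives the column totals.

Y = Copies sold (000)
X3 = Number of competing books

$$r = \frac{n\sum Y X_3 - \sum Y \sum X_3}{\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}\,\sqrt{n\sum X_3^2 - \left(\sum X_3\right)^2}}$$

$$r = \frac{10(3395) - (456)(70)}{\sqrt{10(28374) - (456)^2}\,\sqrt{10(656) - (70)^2}}$$

$$r = \frac{33950 - 31920}{\sqrt{283740 - 207936}\,\sqrt{6560 - 4900}}$$

$$r = \frac{2030}{\sqrt{75804}\,\sqrt{1660}}$$

$$r = \frac{2030}{275.325 \times 40.743}$$

$$r = \frac{2030}{11217.566}$$

$$r = 0.181$$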

Interpretation of Results
The correlation coefficient r = 0.181 is close to zero, so it does not show a strong association
between Y (number of copies sold) and X3 (number of competing books). This means that the number
of copies sold does not depend much on the number of competing books.
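As a cross-check on the raw-sums calculation, the same coefficient can be computed from deviations about the means, which is algebraically equivalent. This Python sketch is illustrative only; the function and variable names are assumptions rather than part of the original solution.

```python
from math import sqrt

# The ten observations from the table: Y in thousands of copies, X3 as a count.
copies_sold = [16, 10, 33, 15, 77, 59, 75, 57, 88, 26]   # Y
competing = [5, 7, 7, 3, 9, 1, 8, 3, 12, 15]             # X3


def pearson_r_deviation(x, y):
    """Product moment correlation from sums of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r_copies_competing = pearson_r_deviation(competing, copies_sold)
print(round(r_copies_competing, 3))  # 0.181
```

The sum of squared deviations of Y in this form is 7580.4, the same as the Total sum of squares reported in the ANOVA table of part (ii).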

(ii) Regression Analysis

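Tab. 1 Variables Entered/Removed

| Model | Variables Entered                                        | Variables Removed | Method |
|-------|----------------------------------------------------------|-------------------|--------|
| 1     | Cost, Number of Competing, Pages, Advertising Budget (a) | .                 | Enter  |

a. All requested variables entered.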

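Tab. 2 Model Summary

| Model | R        | R Square | Adjusted R Square | Std. Error of the Estimate |
|-------|----------|----------|-------------------|----------------------------|
| 1     | .959 (a) | .921     | .857              | 10.97068                   |

a. Predictors: (Constant), Cost, Number of Competing, Pages, Advertising Budget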

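Tab. 3 ANOVA (b)

| Model |            | Sum of Squares | df | Mean Square | F      | Sig.     |
|-------|------------|----------------|----|-------------|--------|----------|
| 1     | Regression | 6978.620       | 4  | 1744.655    | 14.496 | .006 (a) |
|       | Residual   | 601.780        | 5  | 120.356     |        |          |
|       | Total      | 7580.400       | 9  |             |        |          |

a. Predictors: (Constant), Cost, Number of Competing, Pages, Advertising Budget
b. Dependent Variable: Copies Sold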

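Tab. 4 Coefficients (a)

| Model |                     | Unstandardised B | Unstandardised Std. Error | Standardized Beta | t      | Sig. |
|-------|---------------------|------------------|---------------------------|-------------------|--------|------|
| 1     | (Constant)          | 82.227           | 81.735                    |                   | 1.006  | .361 |
|       | Pages               | .126             | .032                      | .670              | 3.918  | .011 |
|       | Advertising Budget  | -.484            | 2.877                     | -.206             | -.168  | .873 |
|       | Number of Competing | .428             | 7.909                     | .063              | .054   | .959 |
|       | Cost                | -4.946           | 2.808                     | -.456             | -1.761 | .139 |

a. Dependent Variable: Copies Sold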

(iii) Discussion on Regression Output


a. The Regression Model
Table 1 lists the independent variables entered into the regression analysis. The model takes
copies sold as the dependent variable, which depends on four independent variables: cost,
number of competing books, pages, and advertising budget. In multiple regression analysis it is
important to specify the functional relationship between the dependent and independent
variables. This multiple regression model is therefore represented as
E(Y on X) = f(X1, X2, X3, X4), where Y denotes the dependent variable (copies sold) and X1, X2,
X3, and X4 denote the four independent variables.
b. R-Squared and Adjusted R-Square
Table 2 presents the model summary of the multiple regression analysis, which was performed in
SPSS. In this table, R is the square root of R Square and shows how strongly the independent
variables (cost, number of competing books, pages, and advertising budget) are jointly
associated with the dependent variable (copies sold). R Square gives the proportion of the
variability in the dependent variable that is explained by the independent variables (Albright,
2013). In Table 2, the value of R Square = 0.921 is close to 1, which means that about 92% of
the variation in copies sold is explained by cost, number of competing books, pages, and
advertising budget.

Adjusted R Square, on the other hand, accounts for statistical shrinkage. It penalizes the
model for each additional predictor variable (Albright, 2013) and so can help in selecting an
appropriate model. In this regression model the shrinkage of .064 (.921 - .857) is quite low,
which supports the relevance of the independent variables to the dependent variable.
c. Significance of the Model-Significance F
Table 3 shows the statistical significance of the independent variables (cost, number of
competing books, pages, and advertising budget), taken together, as predictors of the dependent
variable (copies sold), through the F and Sig. values. The Analysis of Variance (ANOVA) in
Table 3 gives a p-value of .006, which is below the 0.05 level, so the overall regression model
is statistically significant. This shows that the association among the variables is unlikely
to be due to chance. Many researchers believe that the significance level should be below the
0.05 mark (Upton and Cook, 2001; Caldwell, 2009). The high F value of 14.496 (1744.655/120.356)
also supports the significance of the relationship among the variables.
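The summary statistics discussed in parts b and c follow directly from the ANOVA sums of squares. The Python sketch below recomputes them as a consistency check. It assumes n = 10 observations (the ten data rows in part (i); the SPSS output shown does not state n) and k = 4 predictors; all variable names are illustrative.

```python
from math import sqrt

# Sums of squares as reported in Tab. 3.
ss_regression = 6978.620
ss_residual = 601.780

k = 4    # number of predictors
n = 10   # ASSUMPTION: number of observations, inferred from the data in part (i)

ss_total = ss_regression + ss_residual
df_residual = n - k - 1

r_squared = ss_regression / ss_total
multiple_r = sqrt(r_squared)
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / (n - 1))
f_statistic = (ss_regression / k) / (ss_residual / df_residual)
std_error_estimate = sqrt(ss_residual / df_residual)

print(round(multiple_r, 3), round(r_squared, 3), round(adjusted_r_squared, 3))
print(round(f_statistic, 3), round(std_error_estimate, 3))
```

Under these assumptions the results match Tab. 2 and Tab. 3 (R = .959, R Square = .921, Adjusted R Square = .857, F = 14.496). The p-value itself is not recomputed here because it needs the F distribution, which is outside the Python standard library.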
d. Interpretation of the Model-based on Size and Sign
Table 4 gives the coefficients of the regression equation. The regression equation of this
model can be written as follows.
Y = 82.227 + .126 (pages) - .484 (advertising budget) + .428 (number of competing) - 4.946 (cost)
It is evident from the Sig. column that, except for pages with a significance value of .011,
none of the other independent variables (cost, number of competing books, and advertising
budget) is a significant predictor of the dependent variable (copies sold). Similarly, pages
has the largest Beta value (0.670), which also indicates that it is the best predictor of
copies sold.
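Each t value in Tab. 4 is the unstandardised coefficient divided by its standard error, which gives a quick sanity check on the table. The Python sketch below recomputes those ratios from the rounded values shown; the dictionary layout and names are illustrative, and small differences from the SPSS figures are expected because the published coefficients are rounded to three decimals.

```python
# (B, Std. Error) pairs as reported in Tab. 4.
coefficients = {
    "(Constant)": (82.227, 81.735),
    "Pages": (0.126, 0.032),
    "Advertising Budget": (-0.484, 2.877),
    "Number of Competing": (0.428, 7.909),
    "Cost": (-4.946, 2.808),
}

# t = B / Std. Error for each term in the model.
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

Only pages has a ratio well above 2 in absolute value, which is consistent with it being the only predictor with a Sig. value below 0.05.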
(iv) Predicting Number of Copies Sold for a Book
It is given that Pages = 350, Advertising budget = 35, Competing books = 6, and Cost = 6.
Putting these values into the regression equation:

Y = 82.227 + .126 (pages) - .484 (advertising budget) + .428 (number of competing) - 4.946 (cost)
Y = 82.227 + .126 (350) - .484 (35) + .428 (6) - 4.946 (6)
Y = 82.227 + 44.1 - 16.94 + 2.568 - 29.676
Y = 82.279
Therefore, since Y is measured in thousands, approximately 82 thousand copies are predicted to
be sold for the information provided.
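The substitution above can be wrapped in a small function so that other books can be scored the same way. This is a minimal Python sketch using the coefficients from Tab. 4; the function and parameter names are illustrative, and the units of the inputs are assumed to match those used when the model was fitted.

```python
# Unstandardised coefficients from Tab. 4.
INTERCEPT = 82.227
B_PAGES = 0.126
B_ADVERTISING = -0.484
B_COMPETING = 0.428
B_COST = -4.946


def predict_copies_sold(pages, advertising_budget, competing_books, cost):
    """Predicted copies sold (in thousands) from the fitted regression equation."""
    return (
        INTERCEPT
        + B_PAGES * pages
        + B_ADVERTISING * advertising_budget
        + B_COMPETING * competing_books
        + B_COST * cost
    )


predicted = predict_copies_sold(pages=350, advertising_budget=35, competing_books=6, cost=6)
print(round(predicted, 3))  # 82.279
```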
References
Albright, B. (2013). Essentials of Mathematical Statistics. USA, Burlington: Jones & Bartlett
Publishers.
Caldwell, S. (2009). Statistics unplugged. 3rd edition, USA, Belmont: Cengage Learning.
Jain, T.R. and Ohri, V.K. (2010). Statistics for economics. India: FK Publications.
Upton, G.J.G. and Cook, I.T. (2001). Introducing statistics. 2nd edition, Oxford: Oxford
University Press.
