
Final Exam Anly 502

By Vatsal Patel

Q1. Using the attached worksheet, create a scatter plot and draw the regression line. Please
consider red triangular elements for data points, a dashed line for regression, and a frame for the
plot by applying the appropriate arguments (30 points)
Ans.

> Datadefect = read.csv("D:/Vatsal/VP/HBU/8th Sem/Ana 502/Dataset.csv",header = TRUE)


> Datadefect
Profit Number.of.Defective.Items
1 35 974
2 490 693
3 777 248
4 922 277
5 519 509
6 520 635
7 899 200
8 391 743
9 577 563
10 419 715
11 667 397
12 399 720
13 540 659
14 954 123
15 1078 8
16 563 444
17 619 464
18 625 483
19 351 715
20 674 444
21 547 639
22 578 503
23 609 565
24 228 785
25 871 286
26 188 842
27 632 480
28 442 721
29 442 571
30 1114 25
31 864 272
32 825 241
33 750 252
34 615 500
35 445 674
36 282 732
37 409 701
38 637 401
39 646 536
40 999 156
41 232 824
42 152 964
43 874 212
44 981 218
45 289 747
46 771 356
47 806 303
48 921 113
49 150 883
50 113 910
51 1084 85
52 350 745

> plot(Datadefect$Profit, Datadefect$Number.of.Defective.Items,
+      pch = 2, col = "red", frame.plot = TRUE)
> abline(lm(Number.of.Defective.Items ~ Profit, data = Datadefect), lty = 2)

Q2. A fish survey is done to see if the proportion of fish types is consistent with previous years.
Suppose the 3 types of fish recorded (parrotfish, grouper, tang) are historically in a 5:3:4
proportion, and in a survey the following counts are found. Please do a test of hypothesis to see if
this survey of fish has the same proportions as historically. (30 points)

Ans.
> FishSurvey = c(53,22,49)
> FishSurvey
[1] 53 22 49
> Historicdata = c(5,3,4)
> Historicdata
[1] 5 3 4
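> chisq.test(FishSurvey, p = Historicdata/sum(Historicdata))

Chi-squared test for given probabilities

data: FishSurvey
X-squared = 4.0694, df = 2, p-value = 0.1307

The historical ratio has to be passed as the probability vector p. Passing it as the second
argument makes R treat the two vectors as paired factors and run a test of independence instead.
The p-value (0.1307) is above 0.05, so we fail to reject the null hypothesis that the survey
proportions match the historical 5:3:4 ratio.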
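
The goodness-of-fit statistic can also be checked by hand. The following is a minimal sketch, assuming the survey counts 53, 22, 49 and the 5:3:4 ratio used above; the names observed, ratio, expected, stat and pval are illustrative and not part of the original answer.

```r
# Observed survey counts and the historical 5:3:4 ratio
observed <- c(parrotfish = 53, grouper = 22, tang = 49)
ratio    <- c(5, 3, 4)

# Expected counts under H0: scale the ratio to the survey total
expected <- sum(observed) * ratio / sum(ratio)

# Pearson statistic and its p-value on (number of categories - 1) df
stat <- sum((observed - expected)^2 / expected)
df   <- length(observed) - 1
pval <- pchisq(stat, df = df, lower.tail = FALSE)

# Cross-check against the built-in goodness-of-fit test
builtin <- chisq.test(observed, p = ratio, rescale.p = TRUE)

print(round(expected, 2))
print(c(statistic = stat, df = df, p.value = pval))
```

The by-hand statistic and the chisq.test result should agree, since both compare the observed counts with the same expected counts.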

Q3. It is well known that the more beer you drink, the more your blood alcohol level rises.
Suppose we have the following data on student beer consumption. Make a scatterplot and fit the
data with a regression line. Test the hypothesis that another beer raises your BAL by 2 percent
against the alternative that it is not. (40 points)

Ans.
> BeersCount = c(5,2,9,8,3,7,3,5,3,5)
> BeersCount
[1] 5 2 9 8 3 7 3 5 3 5
> AlcBAL = c(0.10,0.03,0.19,0.12,0.04,0.095,0.07,0.06,0.02,0.05)
> AlcBAL
[1] 0.100 0.030 0.190 0.120 0.040 0.095 0.070 0.060 0.020 0.050
> plot(BeersCount,AlcBAL)
> abline(lm(AlcBAL~BeersCount))
> summary(lm(AlcBAL~BeersCount))

Call:
lm(formula = AlcBAL ~ BeersCount)

Residuals:
    Min      1Q  Median      3Q     Max
-0.0275 -0.0187 -0.0071  0.0194  0.0357

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.018500   0.019230  -0.962 0.364200
BeersCount   0.019200   0.003511   5.469 0.000595 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02483 on 8 degrees of freedom
Multiple R-squared: 0.789, Adjusted R-squared: 0.7626
F-statistic: 29.91 on 1 and 8 DF, p-value: 0.0005953

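The p-value of 0.000595 in the summary tests the null hypothesis that the slope is 0, not that it
is 0.02, so it cannot be used directly here. To test H0: slope = 0.02 against H1: slope is not 0.02,
we use t = (0.0192 - 0.02)/0.003511, which is about -0.228 on 8 degrees of freedom. The two-sided
p-value is about 0.83, far above 0.05, so we fail to reject the hypothesis that each additional
beer raises BAL by 0.02 (in the units in which BAL is recorded here).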
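
The test of the slope against 0.02 can be sketched in R as follows. This is a minimal sketch, assuming the same BeersCount and AlcBAL vectors as above; the names fit, coefs, beta0, t_stat and p_two_sided are illustrative.

```r
BeersCount <- c(5, 2, 9, 8, 3, 7, 3, 5, 3, 5)
AlcBAL     <- c(0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06, 0.02, 0.05)

fit   <- lm(AlcBAL ~ BeersCount)
coefs <- summary(fit)$coefficients

# Estimated slope and its standard error from the fitted model
slope    <- coefs["BeersCount", "Estimate"]
slope_se <- coefs["BeersCount", "Std. Error"]

# Hypothesized slope under H0 (summary() itself only tests slope = 0)
beta0 <- 0.02

# t statistic for H0: slope = beta0, on the residual degrees of freedom
t_stat      <- (slope - beta0) / slope_se
dof         <- df.residual(fit)
p_two_sided <- 2 * pt(abs(t_stat), df = dof, lower.tail = FALSE)

print(c(t = t_stat, df = dof, p.value = p_two_sided))
```

A large two-sided p-value here means the data are consistent with a slope of 0.02; a small one would be evidence against it.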

Q4. What is the max and min of F value (F statistics) to accept the null hypothesis for 7 df for
numerator, and 12 df for denominator? (alpha = 0.1 and two sided F distribution) (20 points)

Ans.

> qf(0.95,df1=7,df2 =12)


[1] 2.913358

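Max F value = 2.913358

Min F value = qf(0.05, df1 = 7, df2 = 12) = 1/qf(0.95, df1 = 12, df2 = 7), approximately 0.28

With alpha = 0.1 split equally between the two tails, the upper critical value is the 0.95
quantile, which matches the F-distribution table value of 2.9134 for 7 numerator and 12
denominator df. The lower critical value is the 0.05 quantile. It is not 1/2.9134 = 0.343246,
because the reciprocal identity swaps the degrees of freedom: the table value for 12 numerator
and 7 denominator df is about 3.57, and 1/3.57 is about 0.28.
So the null hypothesis is accepted when the F statistic lies between about 0.28 and 2.9134.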
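
Both critical values can be obtained directly from qf. The following is a minimal sketch, assuming alpha = 0.1 is split equally between the two tails; the names lower, upper and lower_via_reciprocal are illustrative.

```r
alpha <- 0.10
df1   <- 7    # numerator degrees of freedom
df2   <- 12   # denominator degrees of freedom

# Two-sided critical values: alpha/2 in each tail
lower <- qf(alpha / 2,     df1, df2)
upper <- qf(1 - alpha / 2, df1, df2)

# Reciprocal identity: the lower quantile is 1 over the upper quantile
# of the F distribution with the degrees of freedom swapped
lower_via_reciprocal <- 1 / qf(1 - alpha / 2, df2, df1)

print(c(lower = lower, upper = upper))
```

Note that the reciprocal of the upper critical value with the same degrees of freedom is not the lower critical value; the degrees of freedom have to be swapped.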

Q5. Please perform a step-by-step ANOVA analysis process for the dataset below, and discuss the
results at each step. Finally, answer the question: do all three drugs have the same impact on
patients, and if not, how do they differ? (The 3 steps include “graphical comparison”, “fitting
ANOVA model” and “Why and how the means are different”). (80 points)
Drug A 3,5,6,1,2,4,5,7,8,9,0,10
Drug B 6,2,3,2,1,6,8,1,5,5,3,9
Drug C 4,7,3,7,3,8,5,4,6,5,1,8
(Drug impact factors out of 10)

Ans.

> DrugA= c(3,5,6,1,2,4,5,7,8,9,0,10)


> DrugA
[1] 3 5 6 1 2 4 5 7 8 9 0 10
> DrugB = c(6,2,3,2,1,6,8,1,5,5,3,9)
> DrugB
[1] 6 2 3 2 1 6 8 1 5 5 3 9
> DrugC= c(4,7,3,7,3,8,5,4,6,5,1,8)
> DrugC
[1] 4 7 3 7 3 8 5 4 6 5 1 8
> Drug = data.frame(DrugA,DrugB,DrugC)
> Drug = stack(Drug)
> names(Drug)
[1] "values" "ind"
> Drug
values ind
1 3 DrugA
2 5 DrugA
3 6 DrugA
4 1 DrugA
5 2 DrugA
6 4 DrugA
7 5 DrugA
8 7 DrugA
9 8 DrugA
10 9 DrugA
11 0 DrugA
12 10 DrugA
13 6 DrugB
14 2 DrugB
15 3 DrugB
16 2 DrugB
17 1 DrugB
18 6 DrugB
19 8 DrugB
20 1 DrugB
21 5 DrugB
22 5 DrugB
23 3 DrugB
24 9 DrugB
25 4 DrugC
26 7 DrugC
27 3 DrugC
28 7 DrugC
29 3 DrugC
30 8 DrugC
31 5 DrugC
32 4 DrugC
33 6 DrugC
34 5 DrugC
35 1 DrugC
36 8 DrugC

ANOVA Analysis:
1. Graphical Comparison:

> plot(values~ind, data=Drug)


The boxplot shows that Drugs A and C have nearly the same center (median about 5), while Drug B
is somewhat lower (median about 4). The three boxes overlap heavily.
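
The visual impression can be backed up with per-group summaries. The following is a minimal sketch, assuming the same three drug vectors as above; the names group_means, group_medians and group_sds are illustrative.

```r
Drug <- stack(data.frame(
  DrugA = c(3, 5, 6, 1, 2, 4, 5, 7, 8, 9, 0, 10),
  DrugB = c(6, 2, 3, 2, 1, 6, 8, 1, 5, 5, 3, 9),
  DrugC = c(4, 7, 3, 7, 3, 8, 5, 4, 6, 5, 1, 8)
))

# Per-group center and spread, indexed by the grouping factor
group_means   <- tapply(Drug$values, Drug$ind, mean)
group_medians <- tapply(Drug$values, Drug$ind, median)
group_sds     <- tapply(Drug$values, Drug$ind, sd)

print(round(rbind(mean = group_means, median = group_medians, sd = group_sds), 3))
```

Note that the thick line inside each box of a boxplot is the median, not the mean, so both are listed here.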

2. Fitting ANOVA model

> results=aov(values~ind,data=Drug)
> summary(results)
            Df Sum Sq Mean Sq F value Pr(>F)
ind          2   5.06   2.528   0.346   0.71
Residuals   33 241.17   7.308

The ANOVA table shows an F-value of 0.346 on 2 and 33 degrees of freedom, with a p-value of 0.71.
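
The entries of the ANOVA table can be reproduced from the sums of squares. The following is a minimal sketch, assuming the same three drug vectors as above; the names groups, ss_between, ss_within, f_value and p_value are illustrative.

```r
groups <- list(
  DrugA = c(3, 5, 6, 1, 2, 4, 5, 7, 8, 9, 0, 10),
  DrugB = c(6, 2, 3, 2, 1, 6, 8, 1, 5, 5, 3, 9),
  DrugC = c(4, 7, 3, 7, 3, 8, 5, 4, 6, 5, 1, 8)
)

all_values <- unlist(groups)
grand_mean <- mean(all_values)
k <- length(groups)       # number of groups
n <- length(all_values)   # total number of observations

# Between-group sum of squares: group means around the grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))

# Within-group sum of squares: observations around their own group mean
ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

# Mean squares, F ratio and upper-tail p-value
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, k - 1, n - k, lower.tail = FALSE)

print(c(SSB = ss_between, SSW = ss_within, F = f_value, p = p_value))
```

An F ratio well below 1 means the variation between the group means is smaller than the variation within the groups.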

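3. Why and how the means are different:

The p-value (0.71) is well above 0.05 and the F-value (0.346) is small, so we fail to reject the
null hypothesis that all 3 drugs have the same mean effect on patients. The data give no evidence
that the means differ, so there is no difference between the drugs left to explain.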
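
Had the F test been significant, the usual next step would be a pairwise comparison of the drugs. The following is a minimal sketch using Tukey's HSD, which is an assumption about the intended follow-up and not something the original answer ran; the names fit, tukey and all_nonsig are illustrative.

```r
Drug <- stack(data.frame(
  DrugA = c(3, 5, 6, 1, 2, 4, 5, 7, 8, 9, 0, 10),
  DrugB = c(6, 2, 3, 2, 1, 6, 8, 1, 5, 5, 3, 9),
  DrugC = c(4, 7, 3, 7, 3, 8, 5, 4, 6, 5, 1, 8)
))

fit <- aov(values ~ ind, data = Drug)

# Pairwise differences in means with family-wise 95% confidence intervals
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)

# TRUE when no pair of drugs differs at the 5% level
all_nonsig <- all(tukey$ind[, "p adj"] > 0.05)
print(all_nonsig)
```

Each row of the result compares one pair of drugs; an interval that contains 0 indicates no detectable difference for that pair.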
