Chapter 2

Question 8
(a)
college<-read.csv("College.csv", header = TRUE)
(b)
rownames(college) = college[,1]
college = college[,-1]
(c)
#i.
summary(college)
##  Private        Apps           Accept          Enroll       Top10perc
##  No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00
##  Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00
##            Median : 1558   Median : 1110   Median : 434   Median :23.00
##            Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56
##            3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00
##            Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00
##    Top25perc      F.Undergrad     P.Undergrad         Outstate
##  Min.   :  9.0   Min.   :  139   Min.   :    1.0   Min.   : 2340
##  1st Qu.: 41.0   1st Qu.:  992   1st Qu.:   95.0   1st Qu.: 7320
##  Median : 54.0   Median : 1707   Median :  353.0   Median : 9990
##  Mean   : 55.8   Mean   : 3700   Mean   :  855.3   Mean   :10441
##  3rd Qu.: 69.0   3rd Qu.: 4005   3rd Qu.:  967.0   3rd Qu.:12925
##  Max.   :100.0   Max.   :31643   Max.   :21836.0   Max.   :21700
##    Room.Board       Books           Personal         PhD
##  Min.   :1780   Min.   :  96.0   Min.   : 250   Min.   :  8.00
##  1st Qu.:3597   1st Qu.: 470.0   1st Qu.: 850   1st Qu.: 62.00
##  Median :4200   Median : 500.0   Median :1200   Median : 75.00
##  Mean   :4358   Mean   : 549.4   Mean   :1341   Mean   : 72.66
##  3rd Qu.:5050   3rd Qu.: 600.0   3rd Qu.:1700   3rd Qu.: 85.00
##  Max.   :8124   Max.   :2340.0   Max.   :6800   Max.   :103.00
##     Terminal       S.F.Ratio      perc.alumni        Expend
##  Min.   : 24.0   Min.   : 2.50   Min.   : 0.00   Min.   : 3186
##  1st Qu.: 71.0   1st Qu.:11.50   1st Qu.:13.00   1st Qu.: 6751
##  Median : 82.0   Median :13.60   Median :21.00   Median : 8377
##  Mean   : 79.7   Mean   :14.09   Mean   :22.74   Mean   : 9660
##  3rd Qu.: 92.0   3rd Qu.:16.50   3rd Qu.:31.00   3rd Qu.:10830
##  Max.   :100.0   Max.   :39.80   Max.   :64.00   Max.   :56233
##    Grad.Rate
##  Min.   : 10.00
##  1st Qu.: 53.00
##  Median : 65.00
##  Mean   : 65.46
##  3rd Qu.: 78.00
##  Max.   :118.00

#ii.
pairs(college[,1:10])

[Figure: scatterplot matrix of Private, Apps, Accept, Enroll, Top10perc, Top25perc, F.Undergrad, P.Undergrad, Outstate and Room.Board]

#iii.
# One box of Outstate per level of Private (two plain arguments would draw one box per argument)
boxplot(Outstate ~ Private, data = college)
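A minimal sketch of how boxplot() treats its arguments, using made-up tuition values (the data frame toy and its numbers are hypothetical, not taken from College.csv). A formula such as Outstate ~ Private draws one box per level of the grouping factor, whereas two separate vectors draw one box per vector.

```r
# Toy data (hypothetical values, not taken from College.csv)
toy <- data.frame(
  Outstate = c(5000, 6500, 7000, 12000, 15000, 18000),
  Private  = factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
)

# Formula interface: Outstate is split by the levels of Private
by_group <- boxplot(Outstate ~ Private, data = toy, plot = FALSE)
by_group$names   # group labels
by_group$n       # observations per group

# Two separate arguments: one box per argument, with the factor
# reduced to its integer codes
two_boxes <- boxplot(toy$Outstate, toy$Private, plot = FALSE)
two_boxes$n      # both boxes use all six rows
```

With plot = FALSE the function only returns the computed statistics, which makes the difference easy to inspect without drawing anything.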

#iv.
Elite<-rep("No", nrow(college))
Elite[college$Top10perc>50]="Yes"
Elite<-as.factor(Elite)
college<-data.frame(college, Elite)
summary(Elite)
## No Yes
## 699 78
boxplot(Outstate ~ Elite, data = college)

#v.
par(mfrow=c(2,2))
hist(college$Top10perc)
hist(college$PhD)
hist(college$Personal)
hist(college$Enroll)

[Figure: 2 x 2 grid of histograms of college$Top10perc, college$PhD, college$Personal and college$Enroll, each with Frequency on the vertical axis]

#vi.
dim(college)
## [1] 777  19

par(mfrow = c(1,1))
Summary of the data: this data set originally consists of 777 observations of 18 variables (19 after adding Elite).
Each observation corresponds to a university. Many of the variables are counts, and their histograms are
right-skewed, loosely resembling a Poisson distribution.
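One rough way to probe the Poisson impression is the variance-to-mean ratio, which is close to 1 for Poisson counts. The sketch below uses simulated data only (the helper dispersion_ratio and both simulated vectors are illustrative assumptions, not part of the original solution); lambda = 780 merely echoes the mean of Enroll reported above.

```r
set.seed(1)  # reproducible simulated data (not the College data)

# Variance-to-mean ratio: close to 1 for Poisson counts
dispersion_ratio <- function(x) var(x) / mean(x)

poisson_counts <- rpois(10000, lambda = 780)
skewed_counts  <- rnbinom(10000, size = 2, mu = 780)  # overdispersed counts

dispersion_ratio(poisson_counts)
dispersion_ratio(skewed_counts)
```

Applied to a column such as college$Enroll, a ratio far above 1 would suggest that the variable is right-skewed but more dispersed than a Poisson variable.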

Question 9
Reading data:
auto<-read.csv("Auto.csv", header = TRUE)

(a)
str(auto)

## 'data.frame':    397 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : Factor w/ 94 levels "?","100","102",..: 17 35 29 29 24 42 47 46 48 40 ...
##  $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2

Quantitative: mpg, cylinders, displacement, horsepower, weight, acceleration. Qualitative: year, origin, name.

(b)
range(auto$mpg)
## [1]  9.0 46.6

range(auto$cylinders)
## [1] 3 8
range(auto$displacement)
## [1]  68 455

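# Caution: horsepower was read as a factor (missing values are coded "?"),
# so as.numeric() returns the factor's level codes, not horsepower values.
range(as.numeric(auto$horsepower))
## [1]  1 94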
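A small sketch of the usual way to recover numeric values from a factor whose labels are numbers, shown on made-up values (hp_factor and hp_numeric are hypothetical names, and the numbers are not taken from Auto.csv). Converting through as.character() yields the labels themselves, and the "?" entries become NA.

```r
# Hypothetical values mimicking a column where "?" marks a missing entry
hp_factor <- factor(c("130", "165", "?", "95", "150"))

as.numeric(hp_factor)   # level codes, not horsepower

# Convert via character; "?" becomes NA (the coercion warning is suppressed)
hp_numeric <- suppressWarnings(as.numeric(as.character(hp_factor)))

range(hp_numeric, na.rm = TRUE)
mean(hp_numeric, na.rm = TRUE)
```

An alternative, under the same assumption about the file, is to pass na.strings = "?" to read.csv() so that the column is read as numeric from the start.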

range(auto$weight)
## [1] 1613 5140
range(auto$acceleration)
## [1]  8.0 24.8

(c)
mean(auto$mpg); sd(auto$mpg)
## [1] 23.51587
## [1] 7.825804

mean(auto$cylinders); sd(auto$cylinders)
## [1] 5.458438
## [1] 1.701577
mean(auto$displacement); sd(auto$displacement)
## [1] 193.5327
## [1] 104.3796
mean(as.numeric(auto$horsepower)); sd(as.numeric(auto$horsepower))
## [1] 51.51637
## [1] 29.8627
mean(auto$weight); sd(auto$weight)
## [1] 2970.262
## [1] 847.9041
mean(auto$acceleration); sd(auto$acceleration)
## [1] 15.55567
## [1] 2.749995

(d)
newAuto<-auto[-(10:85),]
mean(newAuto$mpg); sd(newAuto$mpg)
## [1] 24.43863
## [1] 7.908184
mean(newAuto$cylinders); sd(newAuto$cylinders)
## [1] 5.370717
## [1] 1.653486

mean(newAuto$displacement); sd(newAuto$displacement)
## [1] 187.0498
## [1] 99.63539
mean(as.numeric(newAuto$horsepower)); sd(as.numeric(newAuto$horsepower))
## [1] 50.99688
## [1] 30.07672
mean(newAuto$weight); sd(newAuto$weight)
## [1] 2933.963
## [1] 810.6429
mean(newAuto$acceleration); sd(newAuto$acceleration)
## [1] 15.72305
## [1] 2.680514
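The repeated mean()/sd() calls can be condensed with sapply(). The sketch below runs on a made-up data frame (toy and its values are hypothetical stand-ins for the quantitative Auto columns) and drops a block of rows in the same way as above.

```r
# Toy data frame standing in for the quantitative Auto columns
toy <- data.frame(mpg    = c(18, 15, 26, 31, 22, 35),
                  weight = c(3504, 3693, 2200, 1900, 2800, 1800))

# Drop rows 2 through 4, then compute mean and sd for every column
subset_rows <- toy[-(2:4), ]
stats <- sapply(subset_rows, function(x) c(mean = mean(x), sd = sd(x)))
stats
```

The result is a matrix with one column per variable and one row per statistic, which is easier to scan than separate printed values.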

(e)
pairs(auto)

[Figure: scatterplot matrix of mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin and name]

We can see that some variables are strongly correlated, while others, e.g. year, show weaker or no clear patterns with most of the remaining variables.

(f)
Weight seems to be a good predictor of gas mileage, as expected, although the relationship is not linear and
may be exponential. Displacement and cylinders show the same pattern, but these predictors are correlated
with one another and probably carry much of the same information. There also seems to be a relation between
year and mpg: as year increases, mpg increases, suggesting a growing concern with making more economical or
efficient vehicles.
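A hedged sketch of how the curved mpg and weight relationship could be examined: fit mpg on weight directly and on the log scale, then compare the fits. Since Auto.csv is not bundled with R, the built-in mtcars data (columns mpg and wt) is used here purely as a stand-in; the object names are illustrative.

```r
# mtcars ships with R; it is used here only as a stand-in for Auto
fit_linear <- lm(mpg ~ wt, data = mtcars)
fit_log    <- lm(log(mpg) ~ wt, data = mtcars)

# R-squared of each fit, each measured on its own response scale
r2 <- c(linear = summary(fit_linear)$r.squared,
        log    = summary(fit_log)$r.squared)
r2
```

Because the two responses are on different scales, the R-squared values are only a rough guide; plotting the residuals of each fit against wt is a more direct check for remaining curvature.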

Question 10
(a)
library(MASS)
?Boston
The data set has 506 rows and 14 columns. Each row represents a suburb of Boston, and each column is a variable
related to housing values there (medv is the median home value).

(b)
pairs(Boston)

[Figure: scatterplot matrix of crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat and medv]

(c)
The scatterplots show no clear relationship between per capita crime rate and the other predictors.
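A numeric complement to reading the scatterplots is to list the correlation of crim with every other column. This is a sketch assuming the Boston data frame from MASS as loaded above; crim_cor is an illustrative name.

```r
library(MASS)  # provides the Boston data frame

# Correlation of crim with every other column
crim_cor <- cor(Boston)[, "crim"]
crim_cor <- crim_cor[names(crim_cor) != "crim"]

# Sort by absolute strength, strongest first
crim_cor[order(abs(crim_cor), decreasing = TRUE)]
```

Pearson correlation only captures linear association, so a weak value does not rule out a nonlinear relationship.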

(d)
plot(Boston$crim)

[Figure: scatter plot of Boston$crim against observation Index]

(e)
sum(Boston$chas)
## [1] 35

(f)
median(Boston$ptratio)
## [1] 19.05
