Uploaded by Peter

Wol


P1.1 It does not make sense to pose the question in terms of causality. Economists would assume

that students choose a mix of studying and working (and other activities, such as attending class,

leisure, and sleeping) based on rational behavior, such as maximizing utility subject to the

constraint that there are only 168 hours in a week. We can then use statistical methods to

measure the association between studying and working, including regression analysis that we

cover starting in Chapter 2. But we would not be claiming that one variable causes the other.

They are both choice variables of the student.

P1.2 (i) Ideally, we could randomly assign students to classes of different sizes. That is, each

student is assigned a different class size without regard to any student characteristics such as

ability and family background. For reasons we will see in Chapter 2, we would like substantial

variation in class sizes (subject, of course, to ethical considerations and resource constraints).

(ii) A negative correlation means that larger class size is associated with lower performance.

We might find a negative correlation because larger class size actually hurts performance.

However, with observational data, there are other reasons we might find a negative relationship.

For example, children from more affluent families might be more likely to attend schools with

smaller class sizes, and affluent children generally score better on standardized tests. Another

possibility is that, within a school, a principal might assign the better students to smaller classes.

Or, some parents might insist their children are in the smaller classes, and these same parents

tend to be more involved in their children's education.

(iii) Given the potential for confounding factors, some of which are listed in (ii), finding a

negative correlation would not be strong evidence that smaller class sizes actually lead to better

performance. Some way of controlling for the confounding factors is needed, and this is the

subject of multiple regression analysis.

C1.1 (i) The average of educ is about 12.6 years. There are two people reporting zero years of

education, and 19 people reporting 18 years of education.

In EViews, open the variable educ and select View => Descriptive Statistics & Tests =>

Stats Table

(ii) The average of wage is about $5.90, which seems low in 2005.

In EViews, open the variable wage and select View => Descriptive Statistics & Tests =>

Stats Table

(iii) Using Table B-60 in the 2004 Economic Report of the President, the CPI was 56.9 in

1976 and 184.0 in 2003.

(iv) To convert 1976 dollars into 2003 dollars, we use the ratio of the CPIs, which is 184/56.9 ≈ 3.23. Therefore, the average hourly wage in 2003 dollars is roughly 3.23($5.90) ≈ $19.06, which is a reasonable figure.
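The CPI conversion can be cross-checked outside EViews. The following is a minimal Python sketch, not part of the original solution; the function name `to_2003_dollars` is hypothetical, and the CPI values and the $5.90 average are the ones quoted in the text.

```python
CPI_1976 = 56.9   # CPI in 1976, as quoted from Table B-60
CPI_2003 = 184.0  # CPI in 2003, as quoted from Table B-60


def to_2003_dollars(amount_1976: float) -> float:
    """Scale a 1976 dollar amount by the ratio of the two CPI values."""
    return amount_1976 * CPI_2003 / CPI_1976


ratio = CPI_2003 / CPI_1976
wage_2003 = to_2003_dollars(5.90)  # average hourly wage in 1976 dollars

print(f"CPI ratio: {ratio:.2f}")
print(f"Average wage in 2003 dollars: ${wage_2003:.2f}")
```

With the unrounded ratio the result is about $19.08 rather than $19.06; the small gap comes from rounding the ratio to 3.23 before multiplying.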

(v) The sample contains 252 women (the number of observations with female = 1) and 274

men.

In EViews, open the variable female and select One-Way Tabulation. Press ok with the default

configuration.

C1.2 (i) There are 1,388 women in the sample. In EViews, to make a frequency table, open cigs. In View, select One-Way Tabulation and press ok with the default configuration.

Tabulation of CIGS
Number of categories: 18

                             Cumulative   Cumulative
Value     Count    Percent        Count      Percent
    0      1176      84.73         1176        84.73
    1         3       0.22         1179        84.94
    2         4       0.29         1183        85.23
    3         7       0.50         1190        85.73
    4         9       0.65         1199        86.38
    5        19       1.37         1218        87.75
    6         6       0.43         1224        88.18
    7         4       0.29         1228        88.47
    8         5       0.36         1233        88.83
    9         1       0.07         1234        88.90
   10        55       3.96         1289        92.87
   12         5       0.36         1294        93.23
   15        19       1.37         1313        94.60
   20        62       4.47         1375        99.06
   30         5       0.36         1380        99.42
   40         6       0.43         1386        99.86
   46         1       0.07         1387        99.93
   50         1       0.07         1388       100.00
Total      1388     100.00         1388       100.00

There are 1,388 observations in the sample and 1,176 have cigs = 0. So 1,388 - 1,176 = 212 women have cigs > 0, i.e., they smoked during pregnancy.

(ii) Open cigs. Again, in View select Descriptive Statistics & Tests => Stats Table. The average of cigs is about 2.09, but this includes the 1,176 women who did not smoke. Reporting just the average masks the fact that almost 85 percent of the women did not smoke. It makes more sense to say that the typical woman does not smoke during pregnancy; indeed, the median number of cigarettes smoked is zero.

                    CIGS
Mean            2.087176
Median          0.000000
Maximum         50.00000
Minimum         0.000000
Std. Dev.       5.972688
Skewness        3.560448
Kurtosis        17.93397

Jarque-Bera     15830.76
Probability     0.000000

Sum             2897.000
Sum Sq. Dev.    49478.45

Observations        1388

(iii) The average of cigs for women with cigs > 0 is about 13.7. Of course, this is much higher than

the average over the entire sample because we are excluding all of the non-smokers (1,176

zeros). This is an example of a conditional average, because we are conditioning on smoking.
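The overall average and the conditional average for smokers can both be reproduced from the frequency counts alone. The sketch below uses Python rather than EViews and is only a cross-check; the dictionary name `cigs_counts` is an assumption, and its counts come from the One-Way Tabulation of cigs (single-digit counts are backed out of the cumulative counts).

```python
import statistics

# cigarettes per day -> number of women, from the One-Way Tabulation of cigs
cigs_counts = {
    0: 1176, 1: 3, 2: 4, 3: 7, 4: 9, 5: 19, 6: 6, 7: 4, 8: 5, 9: 1,
    10: 55, 12: 5, 15: 19, 20: 62, 30: 5, 40: 6, 46: 1, 50: 1,
}

n = sum(cigs_counts.values())                        # all women in the sample
total_cigs = sum(v * c for v, c in cigs_counts.items())
smokers = n - cigs_counts[0]                         # women with cigs > 0

overall_mean = total_cigs / n                        # average over everyone
conditional_mean = total_cigs / smokers              # average given cigs > 0

# Expand the table back into one value per woman to get the median
values = [v for v, c in sorted(cigs_counts.items()) for _ in range(c)]
median = statistics.median(values)

print(f"n = {n}, smokers = {smokers}")
print(f"overall mean = {overall_mean:.2f}, median = {median}")
print(f"mean given cigs > 0 = {conditional_mean:.1f}")
```

The total of 2,897 cigarettes matches the Sum entry in the Stats Table, which is a useful check that the frequency counts were read correctly.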

(iv) This time open fatheduc and in View select Descriptive Statistics & Tests => Stats Table. Remember to remove the cigs > 0 sample restriction first: open Sample and delete the IF condition. The average of fatheduc is about 13.2.

In View, select One-Way Tabulation and set the options so that NAs are included in the table. There are 196 observations with a missing value for fatheduc, leaving only 1,192 observations that can be used to compute the average.

Tabulation of FATHEDUC
Date: 01/30/10 Time: 10:33
Sample: 1 1388
Included observations: 1388
Number of categories: 19

                             Cumulative   Cumulative
Value     Count    Percent        Count      Percent
   NA       196      14.12          196        14.12
    1         1       0.07          197        14.19
    2         2       0.14          199        14.34
    3         4       0.29          203        14.63
    4         3       0.22          206        14.84
    5         4       0.29          210        15.13
    6        10       0.72          220        15.85
    7        10       0.72          230        16.57
    8        22       1.59          252        18.16
    9        17       1.22          269        19.38
   10        49       3.53          318        22.91
   11        64       4.61          382        27.52
   12       443      31.92          825        59.44
   13        87       6.27          912        65.71
   14       115       8.29         1027        73.99
   15        43       3.10         1070        77.09
   16       189      13.62         1259        90.71
   17        32       2.31         1291        93.01
   18        97       6.99         1388       100.00
Total      1388     100.00         1388       100.00
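The effect of the missing values on the average of fatheduc can be illustrated with the tabulated counts. This is a hedged Python sketch, not the EViews procedure: the name `fatheduc_counts` is an assumption, and `None` stands in for the NA category.

```python
# years of father's education -> count; None represents the NA category
fatheduc_counts = {
    None: 196, 1: 1, 2: 2, 3: 4, 4: 3, 5: 4, 6: 10, 7: 10, 8: 22, 9: 17,
    10: 49, 11: 64, 12: 443, 13: 87, 14: 115, 15: 43, 16: 189, 17: 32, 18: 97,
}

# Only observations with a reported value can enter the average
observed = {v: c for v, c in fatheduc_counts.items() if v is not None}

n_total = sum(fatheduc_counts.values())
n_used = sum(observed.values())
mean_fatheduc = sum(v * c for v, c in observed.items()) / n_used

print(f"missing: {n_total - n_used}, used: {n_used}")
print(f"average fatheduc: {mean_fatheduc:.1f}")
```

Dividing by all 1,388 observations instead of the 1,192 with a reported value would understate the average, which is why the missing values have to be dropped rather than treated as zeros.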

(v) This time open faminc and in View select Descriptive Statistics & Tests => Stats Table. The average and standard deviation are about 29.027 and 18.739, respectively, but faminc is measured in thousands of dollars. So, in dollars, the average and standard deviation are $29,027 and $18,739.

                  FAMINC
Mean            29.02666
Median          27.50000
Maximum         65.00000
Minimum         0.500000
Std. Dev.       18.73928
Skewness        0.617620
Kurtosis        2.473396

Jarque-Bera     104.2811
Probability     0.000000

Sum             40289.00
Sum Sq. Dev.    487060.0

Observations        1388

C1.3 (i) In EViews, open the variable math4 and select View => Descriptive Statistics &

Tests => Stats Table. The largest is 100, the smallest is 0.

(ii) In EViews, open the variable math4 and select One-Way Tabulation. Press ok with the

default configuration.

38 out of 1,823, or about 2.1 percent of the sample have a perfect math score.

Tabulation of MATH4
Date: 01/30/10 Time: 10:44
Sample: 1 1823
Included observations: 1823
Number of categories: 6

                                 Cumulative   Cumulative
Value         Count    Percent        Count      Percent
[0, 20)          31       1.70           31         1.70
[20, 40)        128       7.02          159         8.72
[40, 60)        264      14.48          423        23.20
[60, 80)        622      34.12         1045        57.32
[80, 100)       740      40.59         1785        97.92
[100, 120)       38       2.08         1823       100.00
Total          1823     100.00         1823       100.00

(iii) 17. To get this number, open math4, select Sample, and enter the relevant condition under IF.

(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least

in 2001, the reading test was harder to pass.

(v) In EViews, enter cor math4 read4 in the command window. The sample correlation

between math4 and read4 is about .843, which is a very high degree of (linear) association. Not

surprisingly, schools that have high pass rates on one test have a strong tendency to have high

pass rates on the other test.

             MATH4       READ4
MATH4     1.000000    0.842728
READ4     0.842728    1.000000

(vi) In EViews, open the variable exppp and select View => Descriptive Statistics &

Tests => Stats Table. The average of exppp is about $5,194.87. The standard deviation is

$1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88

and the maximum is $11,957.64.]

(vii) a) 100((6,000 - 5,500)/5,500) ≈ 9.1% and b) 100(log(6,000) - log(5,500)) ≈ 8.7%
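The exact percentage change and its log approximation can be compared directly. The snippet below is a small Python sketch added for illustration; the helper names `pct_change` and `log_pct_change` are hypothetical, and `math.log` is the natural logarithm.

```python
import math


def pct_change(old: float, new: float) -> float:
    """Exact percentage change from old to new."""
    return 100 * (new - old) / old


def log_pct_change(old: float, new: float) -> float:
    """Approximate percentage change: 100 times the difference in natural logs."""
    return 100 * (math.log(new) - math.log(old))


exact = pct_change(5500, 6000)
approx = log_pct_change(5500, 6000)

print(f"exact:  {exact:.1f}%")
print(f"approx: {approx:.1f}%")
```

The log difference always understates a positive exact change, and the gap widens as the change gets larger, so the approximation is best reserved for small changes.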

C1.4 (i) In EViews, open the variable train and select One-Way Tabulation. Press ok with the default configuration. 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.

Tabulation of TRAIN
Date: 01/30/10 Time: 11:04
Sample: 1 445
Included observations: 445
Number of categories: 2

                             Cumulative   Cumulative
Value     Count    Percent        Count      Percent
    0       260      58.43          260        58.43
    1       185      41.57          445       100.00
Total       445     100.00          445       100.00

(ii) Open re78. Select Sample and (a) type train = 0 under IF, which gives the sample of those who have not received training, and (b) type train = 1 under IF, which gives the sample of those who have received training. In both cases, you find the average by selecting View => Descriptive Statistics & Tests => Stats Table. For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving the job training had earnings about 40% higher than those not receiving training.
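The $1,800 gap and the "about 40%" figure follow directly from the two group averages. A minimal Python sketch of that arithmetic, using only the rounded averages reported in the text (variable names are assumptions):

```python
# Average re78 by group, in thousands of dollars, as reported in the text
mean_re78_trained = 6.35
mean_re78_untrained = 4.55

diff_dollars = (mean_re78_trained - mean_re78_untrained) * 1000
pct_higher = 100 * (mean_re78_trained - mean_re78_untrained) / mean_re78_untrained

print(f"difference: ${diff_dollars:,.0f}")
print(f"trained group earns about {pct_higher:.1f}% more on average")
```

The percentage is measured relative to the untrained group's average, which is the natural baseline when asking how much higher earnings are with training.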

(iii) Open unem78. Select Sample and (a) type train = 0 under IF, which gives the sample of those who have not received training, and (b) type train = 1 under IF, which gives the sample of those who have received training. In both cases, you find the fraction unemployed by selecting One-Way Tabulation (press ok with the default configuration). About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.

Tabulation of UNEM78
Date: 01/30/10 Time: 11:13
Sample: 1 445 IF TRAIN=0
Included observations: 260
Number of categories: 2

                             Cumulative   Cumulative
Value     Count    Percent        Count      Percent
    0       168      64.62          168        64.62
    1        92      35.38          260       100.00
Total       260     100.00          260       100.00

Tabulation of UNEM78
Date: 01/30/10 Time: 11:15
Sample: 1 445 IF TRAIN=1
Included observations: 185
Number of categories: 2

                             Cumulative   Cumulative
Value     Count    Percent        Count      Percent
    0       140      75.68          140        75.68
    1        45      24.32          185       100.00
Total       185     100.00          185       100.00
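The two unemployment rates are simple proportions of the tabulated counts. The following Python sketch recomputes them as a cross-check; the dictionary layout and key names are assumptions made for illustration, and the counts are those in the two tabulations of unem78.

```python
# Counts from the One-Way Tabulations of unem78 (1 = unemployed in 1978)
unem78_counts = {
    "train = 0": {"not_unemployed": 168, "unemployed": 92},
    "train = 1": {"not_unemployed": 140, "unemployed": 45},
}

rates = {}
for group, counts in unem78_counts.items():
    group_size = counts["not_unemployed"] + counts["unemployed"]
    rates[group] = 100 * counts["unemployed"] / group_size
    print(f"{group}: {rates[group]:.1f}% unemployed (n = {group_size})")

gap = rates["train = 0"] - rates["train = 1"]
print(f"gap: {gap:.1f} percentage points")
```

The gap is stated in percentage points, not percent, because it is a difference between two rates.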

(iv) The differences in earnings and unemployment rates suggest the training program had

strong, positive effects. Our conclusions about economic significance would be stronger if we

could also establish statistical significance (which is done in Computer Exercise C9.10 in

Chapter 9).

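P2.1 In the equation \(y = \beta_0 + \beta_1 x + u\), add and subtract \(\alpha_0 = E(u)\) from the right hand side to get \(y = (\alpha_0 + \beta_0) + \beta_1 x + (u - \alpha_0)\). Call the new error \(e = u - \alpha_0\), so that \(E(e) = 0\). The new intercept is \(\alpha_0 + \beta_0\), but the slope is still \(\beta_1\).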

P2.2 (i) Let \(y_i = GPA_i\), \(x_i = ACT_i\), and \(n = 8\). Then \(\bar{x} = 25.875\), \(\bar{y} = 3.2125\), \(\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = 5.8125\), and \(\sum_{i=1}^{n} (x_i - \bar{x})^2 = 56.875\). From equation (2.9), we obtain the slope as \(\hat{\beta}_1 = 5.8125/56.875 \approx .1022\), rounded to four places after the decimal. From (2.17), \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx 3.2125 - (.1022)25.875 \approx .5681\). So we can write

\[ \widehat{GPA} = .5681 + .1022\,ACT, \qquad n = 8. \]

The intercept does not have a useful interpretation because ACT is not close to zero for the population of interest. If ACT is 5 points higher, \(\widehat{GPA}\) increases by .1022(5) = .511.

(ii) The fitted values and residuals, rounded to four decimal places, are given along with the observation number i and GPA in the following table:

i    GPA    Fitted GPA    Residual
1    2.8        2.7143       .0857
2    3.4        3.0209       .3791
3    3.0        3.2253      -.2253
4    3.5        3.3275       .1725
5    3.6        3.5319       .0681
6    3.0        3.1231      -.1231
7    2.7        3.1231      -.4231
8    3.7        3.6341       .0659

You can verify that the residuals, as reported in the table, sum to -.0002, which is pretty close to zero given the inherent rounding error.

(iii) When ACT = 20, \(\widehat{GPA} = .5681 + .1022(20) \approx 2.61\).

(iv) The sum of squared residuals, \(\sum_{i=1}^{n} \hat{u}_i^2\), is about .4347, and the total sum of squares, \(\sum_{i=1}^{n} (y_i - \bar{y})^2\), is about 1.0288. So the R-squared from the regression is

\[ R^2 = 1 - SSR/SST \approx 1 - (.4347/1.0288) \approx .577. \]

Therefore, about 57.7% of the variation in GPA is explained by ACT in this small sample of

students.
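The hand calculations in P2.2 can be reproduced in a few lines of Python. This is a sketch under one stated assumption: the ACT scores below are not listed in the solution itself but are backed out of the reported fitted values (they average 25.875, matching the text). The GPA values are taken from the table in part (ii).

```python
# ACT scores are an assumption, inferred from the fitted values in part (ii)
act = [21, 24, 26, 27, 29, 25, 25, 30]
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]

n = len(gpa)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

# Sample covariation and variation that enter the OLS slope formula
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
sxx = sum((x - x_bar) ** 2 for x in act)

slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in act]
residuals = [y - f for y, f in zip(gpa, fitted)]

ssr = sum(u ** 2 for u in residuals)        # sum of squared residuals
sst = sum((y - y_bar) ** 2 for y in gpa)    # total sum of squares
r_squared = 1 - ssr / sst

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"prediction at ACT = 20: {intercept + slope * 20:.2f}")
print(f"SSR = {ssr:.4f}, SST = {sst:.4f}, R-squared = {r_squared:.3f}")
```

Computed without intermediate rounding, the residuals sum to zero up to floating-point error; the small nonzero sum in the table comes only from rounding each entry to four decimals.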

C2.1 (i) In both cases, open the variable and select View => Descriptive Statistics & Tests =>

Stats Table. The average prate is about 87.36 and the average mrate is about .732.

(ii) To estimate the equation in EViews, enter ls prate c mrate in the command window. The estimated equation is

\[ \widehat{prate} = 83.08 + 5.86\,mrate, \qquad n = 1{,}534,\ R^2 = .075. \]

Dependent Variable: PRATE
Method: Least Squares
Date: 01/30/10 Time: 11:41
Sample: 1 1534
Included observations: 1534

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C                83.07546      0.563284       147.4840   0.0000
MRATE            5.861079      0.527011       11.12137   0.0000

R-squared              0.074703    Mean dependent var       87.36291
Adjusted R-squared     0.074099    S.D. dependent var       16.71654
S.E. of regression     16.08528    Akaike info criterion    8.394989
Sum squared resid      396383.8    Schwarz criterion        8.401945
Log likelihood        -6436.956    Hannan-Quinn criter.     8.397578
F-statistic            123.6848    Durbin-Watson stat       1.908008
Prob(F-statistic)      0.000000

(iii) The intercept implies that, even if mrate = 0, the predicted participation rate is 83.08 percent. The coefficient on mrate implies that a one-dollar increase in the match rate, a fairly large increase, is estimated to increase prate by 5.86 percentage points. This assumes, of course, that this change in prate is possible (if, say, prate is already at 98, this interpretation makes no sense).

(iv) If we plug mrate = 3.5 into the equation we get \(\widehat{prate} = 83.08 + 5.86(3.5) = 103.59\). This is impossible, as we can have at most a 100 percent participation rate. This illustrates that, especially when dependent variables are bounded, a simple regression model can give strange predictions for extreme values of the independent variable. (In the sample of 1,534 firms, only 34 have mrate ≥ 3.5.)
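The out-of-range prediction is easy to reproduce from the estimated coefficients. The sketch below is illustrative Python, not part of the EViews session; the function name `predict_prate` is hypothetical, and the coefficients are copied from the regression output for PRATE.

```python
INTERCEPT = 83.07546  # coefficient on C in the EViews output
SLOPE = 5.861079      # coefficient on MRATE in the EViews output


def predict_prate(mrate: float) -> float:
    """Fitted participation rate (in percent) from the simple regression."""
    return INTERCEPT + SLOPE * mrate


for mrate in (0.0, 1.0, 3.5):
    prediction = predict_prate(mrate)
    flag = " (exceeds 100 percent)" if prediction > 100 else ""
    print(f"mrate = {mrate:.1f}: predicted prate = {prediction:.2f}{flag}")
```

A linear fit places no bound on its predictions, so nothing in the estimation prevents fitted values above 100 when mrate is far from the bulk of the data.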

(v) mrate explains about 7.5% of the variation in prate. This is not much, and suggests that

many other factors influence 401(k) plan participation rates.
