Everything so far has been linear in the X's.
The approximation that the regression function is linear might be good for some variables, but not for others.
The multiple regression framework can be extended to handle regression functions that are nonlinear in one or more X.
But the relation between TestScore and average district income looks nonlinear.

If a relation between Y and X is nonlinear:
The effect on Y of a change in X depends on the value of X; that is, the marginal effect of X is not constant.
A linear regression is mis-specified: the functional form is wrong.
The estimator of the effect on Y of X is biased; it needn't even be right on average.
The solution is to estimate a regression function that is nonlinear in X.
The General Nonlinear Population Regression Function
$Y_i = f(X_{1i}, X_{2i}, \ldots, X_{ki}) + u_i$, $i = 1, \ldots, n$

Assumptions
1. $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$ (same); implies that f is the conditional expectation of Y given the X's.
2. $(X_{1i}, \ldots, X_{ki}, Y_i)$ are i.i.d. (same).
3. "Enough" moments exist (same idea; the precise statement depends on the specific f).
4. No perfect multicollinearity (same idea; the precise statement depends on the specific f).
Nonlinear Functions of a Single Independent Variable (SW Section 6.2)

We'll look at two complementary approaches:
1. Polynomials in X
The population regression function is approximated by a quadratic, cubic, or higher-degree polynomial.
2. Logarithmic transformations
Y and/or X is transformed by taking its logarithm; this gives a "percentages" interpretation that makes sense in many applications.
1. Polynomials in X

Approximate the population regression function by a polynomial:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$
This is just the linear multiple regression model, except that the regressors are powers of X!
Estimation, hypothesis testing, etc. proceed as in the multiple regression model using OLS.
The coefficients are difficult to interpret, but the regression function itself is interpretable.

Example: the TestScore-Income relation
$Income_i$ = average district income in the $i$th district (thousands of dollars per capita)
Quadratic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + u_i$
Cubic specification:
$TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 (Income_i)^2 + \beta_3 (Income_i)^3 + u_i$
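The point that a polynomial regression is "just" multiple regression on powers of X can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the data are made up and noise-free (they are not the district data), and the helper names `solve_linear_system` and `polynomial_ols` are hypothetical. It builds the regressors 1, X, X² and solves the OLS normal equations directly.

```python
# Sketch: quadratic regression is OLS on the regressors 1, X, X^2.
# The data are made up and noise-free, so the fit can be checked by eye.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations act on both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def polynomial_ols(x, y, degree):
    """OLS coefficients [b0, b1, ..., b_degree] of y on powers of x."""
    # Step 1: define the new regressors 1, x, x^2, ..., x^degree.
    rows = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    # Step 2: solve the normal equations (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

income = [5.0, 10.0, 15.0, 20.0, 30.0, 40.0, 50.0]
score = [600.0 + 4.0 * v - 0.05 * v ** 2 for v in income]  # made-up curve
b0, b1, b2 = polynomial_ols(income, score, degree=2)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

Because the made-up outcome is an exact quadratic, the recovered coefficients should match the ones used to generate it up to floating-point error. With real data one would normally use a statistics package, which also reports standard errors.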
------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   3.850995   .2680941    14.36   0.000      3.32401    4.377979
     avginc2 |  -.0423085   .0047803    -8.85   0.000     -.051705   -.0329119
       _cons |   607.3017   2.901754   209.29   0.000     601.5978    613.0056
------------------------------------------------------------------------------

The t-statistic on $Income^2$ is −8.85, so the hypothesis of linearity is rejected against the quadratic alternative at the 1% significance level.
Interpreting the estimated regression function:
(a) Plot the predicted values
$\widehat{TestScore} = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2$
(standard errors: (2.9), (0.27), (0.0048))

(b) Compute "effects" for different values of X
Predicted change in TestScore for a change in income to $6,000 from $5,000 per capita:
$\Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4$

Predicted "effects" for different values of X
Change in Income (thousands of dollars per capita)    $\Delta\widehat{TestScore}$
from 5 to 6                                           3.4
from 25 to 26                                         1.7
from 45 to 46                                         0.0
The "effect" of a change in income is greater at low than at high income levels (perhaps a declining marginal benefit of an increase in school budgets?).
Caution: What about a change from 65 to 66? Don't extrapolate outside the range of the data.
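The "effects" in the quadratic specification can be reproduced by evaluating the fitted function at two income levels and taking the difference. The sketch below uses the rounded coefficients quoted in the text; the function names are illustrative only.

```python
# Sketch: predicted change in TestScore for a one-unit (thousand-dollar)
# income change, using the rounded quadratic estimates quoted in the text.
B0, B1, B2 = 607.3, 3.85, -0.0423

def predicted_score(income):
    """Predicted TestScore at a given district income (thousands of dollars)."""
    return B0 + B1 * income + B2 * income ** 2

def predicted_effect(income_from, income_to):
    """Difference in predicted TestScore between two income levels."""
    return predicted_score(income_to) - predicted_score(income_from)

for start in (5, 25, 45):
    print(start, start + 1, round(predicted_effect(start, start + 1), 1))
```

The intercept cancels in the difference, so the effect depends only on the slope terms and on where along the income range the change starts.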
------------------------------------------------------------------------------
             |               Robust
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   5.018677   .7073505     7.10   0.000     3.628251    6.409104
     avginc2 |  -.0958052   .0289537    -3.31   0.001    -.1527191   -.0388913
     avginc3 |   .0006855   .0003471     1.98   0.049     3.27e-06    .0013677
       _cons |    600.079   5.102062   117.61   0.000     590.0499     610.108
------------------------------------------------------------------------------

The cubic term is statistically significant at the 5%, but not the 1%, level.
Testing the null hypothesis of linearity against the alternative that the population regression is quadratic and/or cubic, that is, a polynomial of degree up to 3:
$H_0$: population coefficients on $Income^2$ and $Income^3$ = 0
$H_1$: at least one of these coefficients is nonzero.

Execute the test command after running the regression:

test avginc2 avginc3;

 ( 1)  avginc2 = 0.0
 ( 2)  avginc3 = 0.0

       F(  2,   416) =   37.69
            Prob > F =    0.0000

The hypothesis that the population regression is linear is rejected at the 1% significance level against the alternative that it is a polynomial of degree up to 3.
Summary: polynomial regression functions
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$
Estimation: by OLS after defining new regressors
Coefficients have complicated interpretations
To interpret the estimated regression function:
o plot predicted values as a function of x
o compute predicted $\Delta Y / \Delta X$ at different values of x
Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of) variable(s).
Choice of degree r:
o plot the data; t- and F-tests; check sensitivity of estimated effects; judgment
o or use model selection criteria (maybe later)

2. Logarithmic functions of Y and/or X

ln(X) = the natural logarithm of X
Logarithmic transforms permit modeling relations in "percentage" terms (like elasticities), rather than linearly.
Here's why:
$\ln(x + \Delta x) - \ln(x) = \ln\left(1 + \frac{\Delta x}{x}\right) \cong \frac{\Delta x}{x}$  (calculus: $\frac{d \ln(x)}{dx} = \frac{1}{x}$)
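How good the approximation is depends on the size of the relative change. The short sketch below compares the exact log difference with the proportional change for a few illustrative values, chosen here only for demonstration.

```python
import math

# Sketch: compare the exact log difference ln(x + dx) - ln(x) = ln(1 + dx/x)
# with the proportional change dx/x for a few relative changes.
for rel_change in (0.01, 0.05, 0.10, 0.50):
    exact = math.log(1.0 + rel_change)
    print(f"dx/x = {rel_change:.2f}   ln(1 + dx/x) = {exact:.5f}")
```

The two quantities agree closely for changes of a few percent and drift apart as the change grows, which is why the percentage interpretation is stated for small changes.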
Three cases:
Case               Population regression function
I.   linear-log    $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$
II.  log-linear    $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
III. log-log       $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$

The interpretation of the slope coefficient differs in each case.
The interpretation is found by applying the general "before and after" rule: "figure out the change in Y for a given change in X."
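I. Linear-log population regression function

$Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$
$Y = \beta_0 + \beta_1 \ln(X)$   (b)
Now change X:
$Y + \Delta Y = \beta_0 + \beta_1 \ln(X + \Delta X)$   (a)
Subtract (a) − (b):
$\Delta Y = \beta_1 [\ln(X + \Delta X) - \ln(X)]$
now $\ln(X + \Delta X) - \ln(X) \cong \frac{\Delta X}{X}$,
so $\Delta Y \cong \beta_1 \frac{\Delta X}{X}$
or $\beta_1 \cong \frac{\Delta Y}{\Delta X / X}$ (small $\Delta X$)
Since $100 \times \Delta X / X$ is the percentage change in X, a 1% increase in X is associated with a change in Y of $0.01\beta_1$.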
Example: TestScore vs. ln(Income)
First define the new regressor, ln(Income).
The model is now linear in ln(Income), so the linear-log model can be estimated by OLS:
$\widehat{TestScore} = 557.8 + 36.42 \ln(Income_i)$
(standard errors: (3.8), (1.40))
so a 1% increase in Income is associated with an increase in TestScore of 0.36 points on the test.
Standard errors, confidence intervals, $R^2$: all the usual tools of regression apply here.
How does this compare to the cubic model?
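The 0.36-point figure follows from the linear-log rule of thumb, and it can be checked against the exact log difference. The sketch below uses the slope quoted in the text; the variable names are illustrative.

```python
import math

# Sketch: effect of a 1% income increase in the linear-log model, using the
# slope quoted in the text (36.42).
SLOPE = 36.42

approx_change = SLOPE * 0.01            # rule of thumb: 0.01 * beta_1
exact_change = SLOPE * math.log(1.01)   # beta_1 * [ln(1.01 X) - ln(X)]

print(round(approx_change, 2), round(exact_change, 2))
```

For a 1% change the rule of thumb and the exact calculation agree to two decimal places.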
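II. Log-linear population regression function

$\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
$\ln(Y) = \beta_0 + \beta_1 X$   (b)
Now change X:
$\ln(Y + \Delta Y) = \beta_0 + \beta_1 (X + \Delta X)$   (a)
Subtract (a) − (b):
$\ln(Y + \Delta Y) - \ln(Y) = \beta_1 \Delta X$
so $\frac{\Delta Y}{Y} \cong \beta_1 \Delta X$
or $\beta_1 \cong \frac{\Delta Y / Y}{\Delta X}$ (small $\Delta X$)
Now $100 \times \Delta Y / Y$ is the percentage change in Y, so a change in X by one unit ($\Delta X = 1$) is associated with a $100\beta_1$% change in Y (Y increases by a factor of approximately $1 + \beta_1$).
Note: What are the units of $u_i$ and the SER?
o fractional (proportional) deviations
o for example, SER = .2 means…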
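As a side calculation, the approximate percentage interpretation can be compared with the exact relation obtained by exponentiating the log difference. This worked step uses only the log-linear model as stated; the numerical value $\beta_1 = 0.05$ mentioned afterwards is purely hypothetical.

```latex
\begin{aligned}
\ln(Y + \Delta Y) - \ln(Y) &= \beta_1 \Delta X \\
\frac{Y + \Delta Y}{Y} &= e^{\beta_1 \Delta X} \\
100 \times \frac{\Delta Y}{Y} &= 100 \times \left(e^{\beta_1 \Delta X} - 1\right)
  \approx 100\,\beta_1 \Delta X \quad \text{for small } \beta_1 \Delta X
\end{aligned}
```

For instance, with a hypothetical $\beta_1 = 0.05$ and $\Delta X = 1$, the exact change is about 5.13%, against the approximate 5%.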
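III. Log-log population regression function

$\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$
$\ln(Y) = \beta_0 + \beta_1 \ln(X)$   (b)
Now change X:
$\ln(Y + \Delta Y) = \beta_0 + \beta_1 \ln(X + \Delta X)$   (a)
Subtract (a) − (b):
$\ln(Y + \Delta Y) - \ln(Y) = \beta_1 [\ln(X + \Delta X) - \ln(X)]$
so $\frac{\Delta Y}{Y} \cong \beta_1 \frac{\Delta X}{X}$
or $\beta_1 \cong \frac{\Delta Y / Y}{\Delta X / X}$ (small $\Delta X$)
Now $100 \times \Delta Y / Y$ is the percentage change in Y and $100 \times \Delta X / X$ is the percentage change in X, so a 1% change in X is associated with a $\beta_1$% change in Y.
In the log-log specification, $\beta_1$ has the interpretation of an elasticity.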
Example: ln(TestScore) vs. ln(Income)
First define a new dependent variable, ln(TestScore), and the new regressor, ln(Income).
The model is now a linear regression of ln(TestScore) against ln(Income), which can be estimated by OLS:
$\widehat{\ln(TestScore)} = 6.336 + 0.0554 \ln(Income_i)$
(standard errors: (0.006), (0.0021))
A 1% increase in Income is associated with an increase of 0.0554% in TestScore (TestScore is multiplied by approximately 1.000554).
How does this compare to the log-linear model?
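The elasticity interpretation can be checked numerically: multiplying Income by 1.01 multiplies the fitted TestScore by 1.01 raised to the elasticity. The sketch below uses the coefficient quoted in the text; the variable names are illustrative.

```python
# Sketch: effect of a 1% income increase in the log-log model, using the
# elasticity quoted in the text (0.0554).
ELASTICITY = 0.0554

# Rule of thumb: a 1% change in X goes with an (elasticity)% change in Y.
approx_pct_change = ELASTICITY * 1.0

# Exact: multiplying Income by 1.01 multiplies the fitted TestScore by
# 1.01 ** elasticity.
exact_factor = 1.01 ** ELASTICITY
exact_pct_change = 100.0 * (exact_factor - 1.0)

print(round(approx_pct_change, 4), round(exact_pct_change, 4))
```

The exact percentage change is marginally smaller than the elasticity itself, because ln(1.01) is slightly below 0.01.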
Summary: Logarithmic transformations
Three cases, differing in whether Y and/or X is transformed by taking logarithms.
After creating the new variable(s) ln(Y) and/or ln(X), the regression is linear in the new variables and the coefficients can be estimated by OLS.
Hypothesis tests and confidence intervals are now standard.
The interpretation of $\beta_1$ differs from case to case.
Choice of specification should be guided by judgment (which interpretation makes the most sense in your application?), tests, and plotting predicted values.
Interactions Between Independent Variables (SW Section 6.3)

Perhaps a class size reduction is more effective in some circumstances than in others…
Perhaps smaller classes help more if there are many English learners, who need individual attention.
That is, $\frac{\Delta TestScore}{\Delta STR}$ might depend on PctEL.
More generally, $\frac{\Delta Y}{\Delta X_1}$ might depend on $X_2$.
How to model such "interactions" between $X_1$ and $X_2$?
We first consider binary X's, then continuous X's.
(a) Interactions between two binary variables

$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$
$D_{1i}$, $D_{2i}$ are binary
$\beta_1$ is the effect of changing $D_1 = 0$ to $D_1 = 1$. In this specification, this effect doesn't depend on the value of $D_2$.
To allow the effect of changing $D_1$ to depend on $D_2$, include the "interaction term" $D_{1i} \times D_{2i}$ as a regressor:
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$
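Interpreting the coefficients
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$
General rule: compare the various cases
$E(Y_i \mid D_{1i} = 0, D_{2i} = d_2) = \beta_0 + \beta_2 d_2$   (b)
$E(Y_i \mid D_{1i} = 1, D_{2i} = d_2) = \beta_0 + \beta_1 + \beta_2 d_2 + \beta_3 d_2$   (a)
Subtract (a) − (b):
$E(Y_i \mid D_{1i} = 1, D_{2i} = d_2) - E(Y_i \mid D_{1i} = 0, D_{2i} = d_2) = \beta_1 + \beta_3 d_2$
The effect of $D_1$ depends on $d_2$ (what we wanted).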
Example: estimated effects of HiSTR (coefficient standard errors: (1.4), (2.3), (1.9), (3.1))
"Effect" of HiSTR when HiEL = 0 is −1.9
"Effect" of HiSTR when HiEL = 1 is −1.9 − 3.5 = −5.4
Class size reduction is estimated to have a bigger effect when the percent of English learners is large.
This interaction isn't statistically significant: |t| = 3.5/3.1 ≈ 1.13, which is below 1.96.
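The two "effects" and the t-ratio on the interaction can be reproduced from the numbers quoted in the text. The sketch below assumes that −1.9 is the coefficient on HiSTR, −3.5 the coefficient on the HiSTR × HiEL interaction, and 3.1 the interaction's standard error, as the slide's own arithmetic implies; the names are illustrative.

```python
# Sketch: effect of the binary regressor HiSTR at each value of HiEL, using
# the two coefficients quoted in the text (-1.9 on HiSTR, -3.5 on HiSTR x HiEL).
COEF_HISTR = -1.9
COEF_INTERACTION = -3.5
SE_INTERACTION = 3.1  # standard error used in the text's t-ratio

def effect_of_histr(hiel):
    """Predicted change in TestScore when HiSTR goes from 0 to 1, given HiEL."""
    return COEF_HISTR + COEF_INTERACTION * hiel

t_interaction = COEF_INTERACTION / SE_INTERACTION

print(effect_of_histr(0), round(effect_of_histr(1), 1), round(t_interaction, 2))
```

The t-ratio is well inside the usual ±1.96 band, which is why the interaction is not statistically significant at the 5% level even though the point estimates differ noticeably.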
(b) Interactions between continuous and binary variables

$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + u_i$
$D_i$ is binary, X is continuous
As specified above, the effect on Y of X (holding D constant) is $\beta_2$, which does not depend on D.
To allow the effect of X to depend on D, include the "interaction term" $D_i \times X_i$ as a regressor:
$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$
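Interpreting the coefficients
$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$
General rule: compare the various cases
$Y = \beta_0 + \beta_1 D + \beta_2 X + \beta_3 (D \times X)$   (b)
Now change X:
$Y + \Delta Y = \beta_0 + \beta_1 D + \beta_2 (X + \Delta X) + \beta_3 [D \times (X + \Delta X)]$   (a)
Subtract (a) − (b):
$\Delta Y = \beta_2 \Delta X + \beta_3 D \Delta X$ or $\frac{\Delta Y}{\Delta X} = \beta_2 + \beta_3 D$
The effect of X depends on D (what we wanted).
$\beta_3$ = increment to the effect of X, when D = 1
Example: TestScore, STR, HiEL (= 1 if PctEL ≥ 20)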
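$\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6\,HiEL - 1.28\,(STR \times HiEL)$
(standard errors: (11.9), (0.59), (19.5), (0.97))
When HiEL = 0:
$\widehat{TestScore} = 682.2 - 0.97\,STR$
When HiEL = 1:
$\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6 - 1.28\,STR = 687.8 - 2.25\,STR$
Two regression lines: one for each HiEL group.
Class size reduction is estimated to have a larger effect when the percent of English learners is large.

Example, ctd.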
$\widehat{TestScore} = 682.2 - 0.97\,STR + 5.6\,HiEL - 1.28\,(STR \times HiEL)$
(standard errors: (11.9), (0.59), (19.5), (0.97))
Testing various hypotheses:
The two regression lines have the same slope, that is, the coefficient on STR×HiEL is zero:
t = −1.28/0.97 = −1.32, so we can't reject
The two regression lines have the same intercept, that is, the coefficient on HiEL is zero:
t = 5.6/19.5 = 0.29, so we can't reject
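Both the two fitted lines and the individual t-ratios follow directly from the four coefficients and their standard errors. The sketch below reproduces them from the rounded estimates quoted in the text; the function and variable names are illustrative.

```python
# Sketch: the two fitted regression lines and the individual t-ratios implied
# by the binary-continuous interaction estimates quoted in the text.
B_CONST, B_STR, B_HIEL, B_STR_HIEL = 682.2, -0.97, 5.6, -1.28
SE_HIEL, SE_STR_HIEL = 19.5, 0.97

def regression_line(hiel):
    """Return (intercept, slope) of predicted TestScore on STR for a HiEL group."""
    return B_CONST + B_HIEL * hiel, B_STR + B_STR_HIEL * hiel

for group in (0, 1):
    intercept, slope = regression_line(group)
    print(f"HiEL = {group}: intercept {intercept:.1f}, slope {slope:.2f}")

t_same_slope = B_STR_HIEL / SE_STR_HIEL   # H0: coefficient on STR x HiEL is zero
t_same_intercept = B_HIEL / SE_HIEL       # H0: coefficient on HiEL is zero
print(round(t_same_slope, 2), round(t_same_intercept, 2))
```

Neither t-ratio exceeds 1.96 in absolute value, so neither individual hypothesis is rejected at the 5% level.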
Joint hypothesis that the two regression lines are the same, that is, population coefficient on HiEL = 0 and population coefficient on STR×HiEL = 0:
F = 89.94 (p-value < .001) !!
Why do we reject the joint hypothesis but neither individual hypothesis?
Consequence of high but imperfect multicollinearity: high correlation between HiEL and STR×HiEL

Binary-continuous interactions: the two regression lines
$Y_i = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 (D_i \times X_i) + u_i$
Observations with $D_i = 0$ (the "D = 0" group):
$Y_i = \beta_0 + \beta_2 X_i + u_i$
Observations with $D_i = 1$ (the "D = 1" group):
$Y_i = \beta_0 + \beta_1 + \beta_2 X_i + \beta_3 X_i + u_i = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) X_i + u_i$
(c) Interactions between two continuous variables

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
$X_1$, $X_2$ are continuous
As specified, the effect of $X_1$ doesn't depend on $X_2$
As specified, the effect of $X_2$ doesn't depend on $X_1$
To allow the effect of $X_1$ to depend on $X_2$, include the "interaction term" $X_{1i} \times X_{2i}$ as a regressor:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$
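Coefficients in continuous-continuous interactions
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 (X_{1i} \times X_{2i}) + u_i$
General rule: compare the various cases
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2)$   (b)
Now change $X_1$:
$Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + \beta_3 [(X_1 + \Delta X_1) \times X_2]$   (a)
Subtract (a) − (b):
$\Delta Y = \beta_1 \Delta X_1 + \beta_3 X_2 \Delta X_1$ or $\frac{\Delta Y}{\Delta X_1} = \beta_1 + \beta_3 X_2$
The effect of $X_1$ depends on $X_2$ (what we wanted).
$\beta_3$ = increment to the effect of $X_1$ from a unit change in $X_2$
Example: TestScore, STR, PctEL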
(standard errors of the estimated coefficients: (11.8), (0.59), (0.37), (0.019))
The estimated effect of class size reduction is nonlinear because the size of the effect itself depends on PctEL:
$\frac{\Delta \widehat{TestScore}}{\Delta STR} = -1.12 + 0.0012\,PctEL$
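To see how little the estimated effect of STR varies with PctEL, the expression above can be evaluated at a few values. The sketch below uses the two coefficients quoted in the text; the PctEL values are chosen here only for illustration.

```python
# Sketch: estimated effect of STR on TestScore at several values of PctEL,
# using the coefficients quoted in the text (-1.12 on STR, 0.0012 on STR x PctEL).
B_STR = -1.12
B_STR_PCTEL = 0.0012

def effect_of_str(pct_el):
    """Predicted change in TestScore per unit change in STR at a given PctEL."""
    return B_STR + B_STR_PCTEL * pct_el

for pct_el in (0, 10, 20, 50):  # illustrative PctEL values
    print(pct_el, round(effect_of_str(pct_el), 3))
```

The effect moves by only a few hundredths of a point across this range, consistent with the small and statistically insignificant interaction coefficient.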
Does population coefficient on STR×PctEL = 0?
t = .0012/.019 = .06, so we can't reject the null at the 5% level
Does population coefficient on STR = 0?
t = −1.12/0.59 = −1.90, so we can't reject the null at the 5% level
Do the coefficients on both STR and STR×PctEL = 0?
F = 3.89 (p-value = .021), so we reject the null at the 5% level (!!)
(Why? high but imperfect multicollinearity)
Application: Nonlinear Effects on Test Scores of the Student-Teacher Ratio (SW Section 6.4)

Focus on two questions:
1. Are there nonlinear effects of class size reduction on test scores? (Does a reduction from 35 to 30 have the same effect as a reduction from 20 to 15?)
2. Are there nonlinear interactions between PctEL and STR? (Are small classes more effective when there are many English learners?)
Strategy for Question #1 (different effects for different STR?)
Estimate linear and nonlinear functions of STR, holding constant relevant demographic variables
o PctEL
o Income (remember the nonlinear TestScore-Income relation!)
o LunchPCT (fraction on free/subsidized lunch)
See whether adding the nonlinear terms makes an "economically important" quantitative difference ("economic" or "real-world" importance is different from statistical significance)
Test for whether the nonlinear terms are significant
An advantage of the logarithmic specification is that it is better behaved near the ends of the sample, especially for large values of income.
Base specification
From the scatterplots and preceding analysis, here are plausible starting points for the demographic control variables:
Dependent variable: TestScore

Independent variable    Functional form
PctEL                   linear
LunchPCT                linear
Income                  ln(Income) (or could use cubic)
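$\widehat{TestScore} = \ldots - 5.47\,HiEL - 0.420\,LunchPCT + 11.75 \ln(Income)$
(standard errors on the intercept and the STR, $STR^2$, $STR^3$ terms: (163.6), (24.86), (1.25), (.021); on HiEL, LunchPCT, ln(Income): (1.03), (.029), (1.78))
Interpretation of coefficients on:
HiEL?
LunchPCT?
ln(Income)?
STR, $STR^2$, $STR^3$?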
Interpreting the regression function via plots (preceding regression is labeled (5) in this figure)
For the cubic-in-STR regression above:
(a) $H_0$: quadratic in STR vs. $H_1$: cubic in STR?
t = .059/.021 = 2.86 (p = .005)
(b) $H_0$: linear in STR vs. $H_1$: nonlinear/up to cubic in STR?
F = 6.17 (p = .002)
Question #2: STR-PctEL interactions
(to simplify things, ignore $STR^2$, $STR^3$ terms for now)
$\widehat{TestScore} = 653.6 - 0.53\,STR + 5.50\,HiEL - 0.58\,(HiEL \times STR) - 0.411\,LunchPCT + 12.12 \ln(Income)$
(standard errors: (9.9), (0.34), (9.80), (0.50), (0.029), (1.80))
Interpretation of coefficients on:
STR?
HiEL? (wrong sign?)
HiEL×STR?
LunchPCT?
ln(Income)?
"Real-world" ("policy" or "economic") importance of the interaction term:
$\frac{\Delta \widehat{TestScore}}{\Delta STR} = -0.53 - 0.58\,HiEL$
= −1.12 if HiEL = 1
= −0.53 if HiEL = 0
The difference in the estimated effect of reducing the STR is substantial; class size reduction is more effective in districts with more English learners.
For the interaction regression above:
(a) $H_0$: coefficient on interaction = 0 vs. $H_1$: nonzero interaction
t = −1.17, not significant at the 10% level
(b) $H_0$: both coefficients involving STR = 0 vs. $H_1$: at least one coefficient is nonzero (STR enters)
F = 5.92 (p = .003)
Next: specifications with polynomials + interactions!
Summary: Nonlinear Regression Functions
Using functions of the independent variables such as ln(X) or $X_1 \times X_2$ allows recasting a large family of nonlinear regression functions as multiple regression.
Estimation and inference proceed in the same way as in the linear multiple regression model.
Interpretation of the coefficients is model-specific, but the general rule is to compute effects by comparing different cases (different values of the original X's).
Many nonlinear specifications are possible, so you must use judgment: What nonlinear effect do you want to analyze? What makes sense in your application?