Ed231C: Generalized Linear Models
26/02/10 12:26
Applied Categorical & Nonnormal Data Analysis

Generalized Linear Models
Most students are introduced to linear models through either multiple regression or analysis of variance.
In these methods the expected value of the response variable is modeled statistically; that is, it is
expressed as a linear combination of the explanatory variables. With categorical and count response
variables, however, the expected value cannot in general be a linear function of the explanatory
variables, because its range is restricted: a probability must lie between 0 and 1, and an expected count
cannot be negative. The problem of nonlinearity is handled through nonlinear functions that transform the
expected value of the categorical or count variable into a linear function of the explanatory variables.
Such transformations are referred to as link functions.
For example, in the analysis of count data the expected frequencies must be nonnegative. To ensure that
the predicted values from the linear model respect this constraint, the log link is used to transform the
expected value of the response variable. This log-linear transformation serves two purposes: it ensures that
the fitted values are always positive, as count data require, and it lets the unknown regression
parameters take any real value without constraint.
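To make the two purposes concrete, here is a minimal Python sketch (standard library only; the function names are illustrative, not part of Stata). The inverse of the log link, exp, turns any real-valued linear predictor into a positive expected count, and applying the log link to that count recovers the linear predictor.

```python
import math


def log_link(mu: float) -> float:
    """Log link g(mu) = ln(mu): maps a positive mean onto the whole real line."""
    return math.log(mu)


def inverse_log_link(eta: float) -> float:
    """Inverse log link mu = exp(eta): maps any real number to a positive mean."""
    return math.exp(eta)


# Linear predictors b0 + b1*X1 + ... can be any real number, negative or positive.
etas = [-10.0, -1.0, 0.0, 2.5, 8.0]
fitted_means = [inverse_log_link(eta) for eta in etas]

for eta, mu in zip(etas, fitted_means):
    print(f"linear predictor {eta:6.2f} -> expected count {mu:.5g}")

# Every fitted mean is a valid (positive) expected count ...
assert all(mu > 0 for mu in fitted_means)
# ... and applying the link recovers the linear predictor.
assert all(math.isclose(log_link(mu), eta, abs_tol=1e-12)
           for eta, mu in zip(etas, fitted_means))
```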
Different types of response variables use different link functions: the logit and probit links both
work with binomial response variables, while the log link works with both Poisson and negative
binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and
McCullagh & Nelder (1989), generalized linear models provide a unified framework that can be
applied to a wide range of 'linear' models.
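For a binomial response the expected value is a probability, so each inverse link must map the whole real line into the interval (0, 1). The Python sketch below (function names are ours) checks this for the inverse logit, inverse probit, and inverse complementary log-log. The three curves differ in shape; for instance, only the first two pass through 0.5 when the linear predictor is 0.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, sd 1


def inv_logit(eta: float) -> float:
    """Logit link inverse: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta: float) -> float:
    """Probit link inverse: mu = standard normal CDF evaluated at eta."""
    return _STANDARD_NORMAL.cdf(eta)


def inv_cloglog(eta: float) -> float:
    """Complementary log-log link inverse: mu = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


etas = [-3.0, -1.0, 0.0, 1.0, 3.0]
for name, inverse in [("logit", inv_logit), ("probit", inv_probit), ("cloglog", inv_cloglog)]:
    probs = [inverse(eta) for eta in etas]
    print(name, [round(p, 4) for p in probs])
    # Each inverse link yields a valid probability strictly between 0 and 1.
    assert all(0.0 < p < 1.0 for p in probs)
```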
Generalized linear models take the form:
g(E(y)) = xb,
y -> {F}
where F is the distribution family, g( ) is the link function, and xb = b0 + b1 X1 + b2 X2 + ... is the
linear predictor.

You might recognize this form more easily in the familiar case of OLS regression, written as follows:
Y' = b0 + b1 X1 + b2 X2 + ...
y -> {gaussian}
Now we can replace Y' with E(y),

E(y) = b0 + b1 X1 + b2 X2 + ...
y -> {gaussian}
In OLS the distribution family is gaussian (normal), i.e., y -> {gaussian} and the link function is identity,
i.e., g(y) = y. Thus, we can write g(E(y)) as just E(y).
http://www.gseis.ucla.edu/courses/ed231c/notes1/glm.html
Another example is Poisson regression, in which the distribution family is Poisson, i.e., y -> {poisson},
and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as
g(E(y)) = b0 + b1 X1 + b2 X2 + ...
y -> {poisson}
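Both examples share the same two-step structure: compute the linear predictor b0 + b1 X1 + b2 X2 + ..., then apply the inverse of the link function to get E(y). A minimal Python sketch of that structure; the coefficients and predictor values are hypothetical, chosen only for illustration:

```python
import math
from typing import Callable, Sequence


def linear_predictor(b: Sequence[float], x: Sequence[float]) -> float:
    """Return b0 + b1*X1 + b2*X2 + ...; b[0] is the intercept, x holds X1, X2, ..."""
    if len(b) != len(x) + 1:
        raise ValueError("expected one more coefficient than predictors")
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


def glm_mean(b: Sequence[float], x: Sequence[float],
             inverse_link: Callable[[float], float]) -> float:
    """E(y) = g^-1(b0 + b1*X1 + ...): the inverse link undoes g."""
    return inverse_link(linear_predictor(b, x))


def identity(eta: float) -> float:
    """Inverse of the identity link used in OLS: E(y) = eta."""
    return eta


b = [0.5, 0.2, -0.1]  # hypothetical b0, b1, b2
x = [3.0, 4.0]        # hypothetical X1, X2

ols_mean = glm_mean(b, x, identity)        # gaussian family, identity link
poisson_mean = glm_mean(b, x, math.exp)    # poisson family, log link (inverse is exp)
print(ols_mean, poisson_mean)
```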
Here are examples of distributions and link functions for some common estimation procedures:
type of estimation        distribution family   link function
OLS regression            gaussian              identity
logistic regression       binomial              logit
probit                    binomial              probit
cloglog                   binomial              cloglog
poisson regression        poisson               log
neg binomial regression   neg binomial          log
Stata's GLM Procedure

Stata's glm procedure estimates generalized linear models in which the user can specify both the
distribution family and the link function. Here is the basic syntax of the glm procedure:
glm depvar indvars [if exp] [in range] [, family(fname) link(lname) eform ]
where fname can take on the values gaussian | igaussian | binomial | poisson | nbinomial | gamma
and lname can take on the values identity | log | logit | probit | cloglog | nbinomial | power | opower | loglog | logc.
An OLS regression would look like this using regress and glm:
regress write read math gender
glm write read math gender, family(gaus) link(iden)
A logistic regression would look like this:
logistic honors read math gender
glm honors read math gender, family(binom) link(logit)
A poisson regression would look like this:
poisson days read math gender
glm days read math gender, family(poisson) link(log)
A negative binomial regression would look like this:
nbreg days read math gender
glm days read math gender, family(nbinom) link(log)
Here is a list of the allowable distribution families:
gaussian (normal)
inverse gaussian
bernoulli (binomial)
poisson
negative binomial
gamma
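Each family is characterized by how the variance of y depends on its mean u = E(y), through a variance function V(u), with Var(y) = scale * V(u). The glm output in the examples prints V(u) for several families (1, u*(1-u), u, u+(1)u^2, and u^2); the inverse gaussian entry, u^3, is the standard form and is not shown in these notes. A Python sketch of the lookup:

```python
from typing import Callable, Dict


def nb_variance(u: float, k: float = 1.0) -> float:
    """Negative binomial: V(u) = u + k*u^2 (the glm output below shows k = 1)."""
    return u + k * u ** 2


# Variance functions V(u), where u = E(y) and Var(y) = scale * V(u).
VARIANCE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "gaussian": lambda u: 1.0,            # constant variance
    "inverse gaussian": lambda u: u ** 3,
    "bernoulli": lambda u: u * (1.0 - u),
    "poisson": lambda u: u,               # variance equals the mean
    "negative binomial": nb_variance,     # extra-Poisson variation
    "gamma": lambda u: u ** 2,            # constant coefficient of variation
}

# u = 0.4 is chosen so that it is also a valid Bernoulli mean.
for family, v in VARIANCE_FUNCTIONS.items():
    print(f"{family:>18}: V(0.4) = {v(0.4):.4f}")
```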
And here is a list of the link functions that are available:
identity
log
logit
probit
complementary log-log
odds power
power
negative binomial
log-log
log-complement
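The first four links match the g(u) formulas that Stata prints in the glm output in the examples: u, ln(u), ln(u/(1-u)), and invnorm(u). The complementary log-log, log-log, log-complement, and power forms below are the standard definitions as we understand them; the odds power and negative binomial links are left out. A Python sketch pairing each link with its inverse and checking the round trip:

```python
import math
from statistics import NormalDist

_N01 = NormalDist()  # standard normal, used by the probit link

# Link functions g(u), where u = E(y).
LINKS = {
    "identity": lambda u: u,
    "log": math.log,
    "logit": lambda u: math.log(u / (1.0 - u)),
    "probit": _N01.inv_cdf,                              # invnorm(u)
    "cloglog": lambda u: math.log(-math.log(1.0 - u)),   # complementary log-log
    "loglog": lambda u: -math.log(-math.log(u)),
    "logc": lambda u: math.log(1.0 - u),                 # log-complement
}

# Matching inverse links u = g^-1(eta).
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": _N01.cdf,
    "cloglog": lambda eta: 1.0 - math.exp(-math.exp(eta)),
    "loglog": lambda eta: math.exp(-math.exp(-eta)),
    "logc": lambda eta: 1.0 - math.exp(eta),
}


def power_link(u: float, a: float) -> float:
    """Power link u**a; a = 0 is conventionally taken to mean ln(u)."""
    return math.log(u) if a == 0 else u ** a


# Round trip g^-1(g(u)) == u for means inside (0, 1), valid for every link above.
for name, g in LINKS.items():
    for u in (0.1, 0.3, 0.5, 0.9):
        assert math.isclose(INVERSE_LINKS[name](g(u)), u, rel_tol=1e-9), name
    print(f"{name:>8}: g(0.3) = {g(0.3):.4f}")
```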
Of course, if all that glm could do was duplicate OLS, logistic, Poisson and negative binomial regression,
it would not appear to be very useful. However, it is possible to combine distribution families and
link functions in ways that do not duplicate existing estimation procedures. The table below gives the
combinations that make sense from a data analysis perspective:
                    iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
gaussian            X     X                                    X
inverse gaussian    X     X                                    X
binomial            X     X    X      X       X                X      X       X       X
poisson             X     X                                    X
negative binomial   X     X                            X       X
gamma               X     X                                    X
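The marked combinations can be encoded as a simple lookup, for example to check a planned family/link pair before fitting. This Python sketch only mirrors our reading of the table above, with Stata's abbreviations spelled out; it is not how Stata itself validates the family() and link() options.

```python
from typing import Dict, List, Set

# Family/link combinations marked with an X in the table above.
SENSIBLE_LINKS: Dict[str, Set[str]] = {
    "gaussian": {"identity", "log", "power"},
    "inverse gaussian": {"identity", "log", "power"},
    "binomial": {"identity", "log", "logit", "probit", "cloglog",
                 "power", "opower", "loglog", "logc"},
    "poisson": {"identity", "log", "power"},
    "negative binomial": {"identity", "log", "nbinomial", "power"},
    "gamma": {"identity", "log", "power"},
}


def is_sensible(family: str, link: str) -> bool:
    """True if the table marks this family/link combination."""
    return link in SENSIBLE_LINKS.get(family, set())


def families_for(link: str) -> List[str]:
    """All families whose row has an X under the given link, sorted by name."""
    return sorted(f for f, links in SENSIBLE_LINKS.items() if link in links)


print(is_sensible("binomial", "probit"))  # probit is marked for binomial
print(is_sensible("poisson", "logit"))    # not marked in the table
print(families_for("nbinomial"))
```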
Examples
use http://www.gseis.ucla.edu/courses/data/hsb2
generate hon = write>=60
regress write read math female
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------
glm write read math female, link(iden) fam(gauss) nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =  43.23228
Deviance         =  8473.526357                    (1/df) Deviance =  43.23228
Pearson          =  8473.526357                    (1/df) Pearson  =  43.23228

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -658.4261736                    AIC             =  6.624262
BIC              =  7435.056153

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
        math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
      female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
       _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
------------------------------------------------------------------------------
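The two sets of estimates agree because a gaussian family with an identity link is exactly OLS. The differences lie only in the summary statistics and intervals: glm's scale parameter is the residual mean square (deviance / residual df), and glm builds its 95% intervals from the normal distribution (z), while regress uses the t distribution with 196 df, hence the slightly wider regress intervals. A Python check using numbers copied from the two outputs above:

```python
import math
from statistics import NormalDist

# Numbers copied from the regress and glm output above.
residual_ss = 8473.526357     # regress Residual SS = glm Deviance
residual_df = 196
coef_read, se_read = 0.3252389, 0.0607348

scale = residual_ss / residual_df   # glm "Scale parameter" = regress Residual MS
root_mse = math.sqrt(scale)         # regress "Root MSE"

z_975 = NormalDist().inv_cdf(0.975)  # about 1.96
ci_read = (coef_read - z_975 * se_read, coef_read + z_975 * se_read)

print(f"scale = {scale:.5f}, Root MSE = {root_mse:.4f}")
print(f"z-based 95% CI for read: {ci_read[0]:.7f} to {ci_read[1]:.7f}")
```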
logit hon read math female, nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------
logit, or

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
        math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
      female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
------------------------------------------------------------------------------
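The odds ratios in the second table are simply exp(b). Their standard errors follow from the delta method, exp(b) * SE(b), and their confidence limits are the exponentiated coefficient limits. A Python check using values copied from the logit output above (small discrepancies in the last digit come from the rounded inputs):

```python
import math

# (coefficient, std. error, lower 95% limit, upper 95% limit) from the logit output above.
logit_rows = {
    "read": (0.0752424, 0.027577, 0.0211924, 0.1292924),
    "math": (0.1317117, 0.0324607, 0.06809, 0.1953335),
    "female": (1.154801, 0.4340856, 0.304009, 2.005593),
}

odds_ratios = {}
for name, (b, se, lo, hi) in logit_rows.items():
    odds_ratio = math.exp(b)
    se_or = odds_ratio * se          # delta-method standard error of exp(b)
    odds_ratios[name] = (odds_ratio, se_or, math.exp(lo), math.exp(hi))
    print(f"{name:>6}: OR = {odds_ratio:.6f}  SE = {se_or:.7f}  "
          f"CI = [{math.exp(lo):.6f}, {math.exp(hi):.6f}]")
```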
glm hon read math female, link(logit) fam(bin) nolog

Optimization     : ML: Newton-Raphson              No. of obs      =       200
                                                   Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
        math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
      female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
       _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
------------------------------------------------------------------------------
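Two relationships can be checked directly in this output. Because each observation is 0/1 (Bernoulli), the saturated model has log likelihood 0, so the deviance equals -2 times the log likelihood. And the BIC printed by this version of Stata's glm matches the deviance-based form D - (residual df) * ln(N); other software, and other Stata releases, may define BIC differently, so treat that formula as an observation about these outputs. A Python check:

```python
import math

# Values copied from the logit glm output above.
deviance = 150.4196543
log_likelihood = -75.20982717
n_obs, residual_df = 200, 196

minus_two_ll = -2.0 * log_likelihood                 # equals the deviance for 0/1 data
bic = deviance - residual_df * math.log(n_obs)       # deviance-based BIC

print(f"-2 log L = {minus_two_ll:.7f}  (deviance = {deviance})")
print(f"D - df*ln(N) = {bic:.7f}")
```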
glm, eform

Optimization     : ML: Newton-Raphson              No. of obs      =       200
                                                   Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145    .029733     2.73   0.006     1.021417    1.138025
        math |   1.140779   .0370323     4.06   0.000     1.070458     1.21572
      female |   3.173393   1.377573     2.66   0.008      1.35524    7.430728
------------------------------------------------------------------------------
probit hon read math female, nolog

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(3)      =      81.80
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.3537

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164449    .0782076
        math |   .0735256   .0173216     4.24   0.000     .0395759    .1074754
      female |   .6824682   .2447275     2.79   0.005     .2028112    1.162125
       _cons |  -7.663304   .9921289    -7.72   0.000    -9.607841   -5.718767
------------------------------------------------------------------------------
glm hon read math female, link(probit) fam(bin) nolog

Optimization     : ML: Newton-Raphson              No. of obs      =       200
                                                   Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  149.4918859                    (1/df) Deviance =  .7627137
Pearson          =  160.9679286                    (1/df) Pearson  =  .8212649

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = invnorm(u)               [Probit]

Log likelihood   = -74.74594294                    AIC             =  .7874594
BIC              =  -888.978318

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164448    .0782077
        math |   .0735256   .0173217     4.24   0.000     .0395758    .1074755
      female |   .6824681   .2447281     2.79   0.005     .2028098    1.162126
       _cons |  -7.663303   .9921345    -7.72   0.000    -9.607851   -5.718755
------------------------------------------------------------------------------
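The logit and probit coefficients differ in scale (here the logit coefficients are roughly 1.6 to 1.8 times the probit ones), yet the fitted probabilities are very close. A Python sketch using the coefficients from the two glm fits above, for a hypothetical student with read = 60, math = 60, and female = 1 (these predictor values are made up for illustration):

```python
import math
from statistics import NormalDist

# Coefficients copied from the glm logit and glm probit output above.
logit_b = {"_cons": -13.12749, "read": 0.0752424, "math": 0.1317117, "female": 1.154801}
probit_b = {"_cons": -7.663303, "read": 0.0473262, "math": 0.0735256, "female": 0.6824681}

student = {"read": 60, "math": 60, "female": 1}  # hypothetical predictor values


def linear_predictor(b: dict, x: dict) -> float:
    """b0 plus the sum of coefficient * value over the predictors in x."""
    return b["_cons"] + sum(b[name] * value for name, value in x.items())


p_logit = 1.0 / (1.0 + math.exp(-linear_predictor(logit_b, student)))  # inverse logit
p_probit = NormalDist().cdf(linear_predictor(probit_b, student))       # inverse probit

print(f"P(hon = 1): logit {p_logit:.4f}, probit {p_probit:.4f}")
```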
use http://www.gseis.ucla.edu/courses/data/lahigh, clear
poisson daysabs langnce gender, nolog

Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------
poisson, irr

Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
                                                  Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------
glm daysabs langnce gender, link(log) fam(poisson) nolog

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   =  -1549.85665                    AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------
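Under a correctly specified Poisson model, deviance / df and Pearson chi2 / df should both be close to 1 (the scale parameter is fixed at 1). Here they are about 7.2 and 8.8, a sign of overdispersion, which motivates the negative binomial models that follow. One common remedy, sketched below in Python, is to inflate the Poisson standard errors by the square root of Pearson chi2 / df; as we understand it, Stata's glm offers this through its scale(x2) option, but the arithmetic here is only an illustration:

```python
import math

# Goodness-of-fit values copied from the Poisson glm output above.
deviance, pearson_chi2, residual_df = 2238.317597, 2752.913231, 313
se_langnce = 0.0012934   # Poisson standard error for langnce

deviance_ratio = deviance / residual_df      # glm "(1/df) Deviance"
pearson_ratio = pearson_chi2 / residual_df   # glm "(1/df) Pearson"

# Quasi-Poisson style adjustment: scale the SE by sqrt(Pearson chi2 / df).
adjusted_se = se_langnce * math.sqrt(pearson_ratio)

print(f"(1/df) Deviance = {deviance_ratio:.6f}, (1/df) Pearson = {pearson_ratio:.6f}")
print(f"langnce SE: {se_langnce} -> {adjusted_se:.7f} after scaling")
```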
glm, eform

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   =  -1549.85665                    AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------
nbreg daysabs langnce gender, nolog

Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -880.9274                        Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
       _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000
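nbreg estimates the overdispersion parameter on the log scale, so exponentiating /lnalpha and its limits gives the alpha line. The implied variance function is u + alpha*u^2, whereas the glm nbinomial family used next fixes the parameter at 1 (its output shows V(u) = u+(1)u^2), which is why the two sets of estimates differ slightly. As we understand Stata's syntax, a fixed value can instead be supplied to glm, e.g. family(nbinomial 1.289094). A Python sketch comparing the implied variances at a hypothetical mean of 5 days absent:

```python
import math

# Values copied from the nbreg output above.
lnalpha, lnalpha_lo, lnalpha_hi = 0.25394, 0.0667457, 0.4411342

alpha = math.exp(lnalpha)                                # alpha = exp(/lnalpha)
alpha_ci = (math.exp(lnalpha_lo), math.exp(lnalpha_hi))  # limits transform the same way

mu = 5.0  # hypothetical mean number of days absent
variances = {
    "poisson, V(u) = u": mu,
    "glm nbinomial, V(u) = u + 1*u^2": mu + 1.0 * mu ** 2,
    "nbreg, V(u) = u + alpha*u^2": mu + alpha * mu ** 2,
}

print(f"alpha = {alpha:.6f}, 95% CI [{alpha_ci[0]:.6f}, {alpha_ci[1]:.6f}]")
for label, var in variances.items():
    print(f"{label:>33}: Var(y) = {var:.2f}")
```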
glm daysabs langnce gender, link(log) fam(nbin) nolog

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156357   .0035438    -4.41   0.000    -.0225814   -.0086899
      gender |  -.4307736   .1253082    -3.44   0.001    -.6763732    -.185174
       _cons |   2.702606   .2052709    13.17   0.000     2.300282    3.104929
------------------------------------------------------------------------------
glm, eform

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844859   .0034888    -4.41   0.000     .9776716    .9913477
      gender |    .650006   .0814511    -3.44   0.001     .5084577    .8309596
------------------------------------------------------------------------------
glm daysabs langnce gender, fam(gamma) link(log) nolog

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              =  -1549.72029

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156852   .0040626    -3.86   0.000    -.0236478   -.0077226
      gender |  -.4326492   .1443719    -3.00   0.003    -.7156129   -.1496854
       _cons |   2.705757   .2383799    11.35   0.000     2.238541    3.172973
------------------------------------------------------------------------------
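The AIC values reported by glm for the three count models can be reproduced as (-2 log L + 2k) / N, with k = 3 coefficients and N = 316. By this criterion both the negative binomial and the gamma fits improve greatly on the Poisson model. Bear in mind that the gamma is a continuous distribution applied here to a count, so comparing its likelihood with those of the discrete families is only a rough guide. A Python check:

```python
# Log likelihoods copied from the glm output above (N = 316, three coefficients each).
log_likelihoods = {
    "poisson": -1549.85665,
    "negative binomial (k = 1)": -884.4953535,
    "gamma": -856.2487643,
}
n_obs, n_coef = 316, 3


def glm_aic(log_likelihood: float, k: int, n: int) -> float:
    """AIC per observation, (-2 log L + 2k) / N, the form matching glm's AIC above."""
    return (-2.0 * log_likelihood + 2.0 * k) / n


aic = {name: glm_aic(ll, n_coef, n_obs) for name, ll in log_likelihoods.items()}
for name, value in sorted(aic.items(), key=lambda item: item[1]):
    print(f"{name:>26}: AIC = {value:.6f}")
```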
glm, eform

Optimization     : ML: Newton-Raphson              No. of obs      =       316
                                                   Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              =  -1549.72029

------------------------------------------------------------------------------
     daysabs |       ExpB   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844372   .0039994    -3.86   0.000     .9766296    .9923071
      gender |   .6487881   .0936668    -3.00   0.003     .4888924    .8609788
------------------------------------------------------------------------------
Categorical Data Analysis Course

Phil Ender