Ed231C: Generalized Linear Models

26/02/10 12:26

Applied Categorical & Nonnormal Data Analysis


Generalized Linear Models
Most students are introduced to linear models through either multiple regression or analysis of variance.
With these methods the expected value of the response variable is modeled statistically; that is, it is
expressed as a linear combination of the explanatory variables. With categorical and count response
variables, the expected value is bounded (probabilities lie between 0 and 1, and counts are nonnegative),
so it cannot be modeled directly as a linear function. This problem is handled through nonlinear
functions that transform the expected value of the categorical or count variable into a linear function of
the explanatory variables. Such transformations are referred to as link functions.
For example, in the analysis of count data, the expected frequencies must be nonnegative. To ensure that
the predicted values from the linear model satisfy this constraint, the log link is used to transform the
expected value of the response variable. This log transformation serves two purposes: it ensures that
the fitted values are appropriate for count data, and it permits the unknown regression parameters to lie
anywhere on the real line.
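The effect of the log link can be seen in a small numerical sketch. It is written in Python rather than Stata, and the coefficients b0 and b1 are hypothetical values chosen only so that the linear predictor turns negative; they do not come from any data set in these notes.

```python
import math

# Hypothetical coefficients, for illustration only (not estimated from data).
b0, b1 = 1.0, -0.5

def linear_predictor(x):
    """xb = b0 + b1*x; this can be any real number."""
    return b0 + b1 * x

def identity_mean(x):
    """Identity link: E(y) = xb, which may be negative."""
    return linear_predictor(x)

def log_link_mean(x):
    """Log link: ln(E(y)) = xb, so E(y) = exp(xb) is always positive."""
    return math.exp(linear_predictor(x))

for x in (0, 2, 4, 6):
    print(x, identity_mean(x), round(log_link_mean(x), 4))
```

With the identity link the predicted count becomes negative once xb drops below zero, while the log link keeps every fitted value positive without restricting b0 or b1.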
Different types of response variables use different link functions: both the logit and probit link
functions work with binomial response variables, while the log link function works with both poisson and
negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and
McCullagh & Nelder (1989), generalized linear models provide a unified framework that can be
applied to various 'linear' models.
Generalized linear models take the form:
g(E(y)) = xb,

y -> {F}

where xb = b0 + b1 X1 + b2 X2 + ... is the linear predictor, F is the distribution family and g( ) is the link function.


You might recognize this form more easily in the case of OLS regression, which is usually written as follows:
Y' = b0 + b1 X1 + b2 X2 + ...

y -> {gaussian}

Now we can replace Y' with E(y),


E(y) = b0 + b1 X1 + b2 X2 + ...

y -> {gaussian}

In OLS the distribution family is gaussian (normal), i.e., y -> {gaussian} and the link function is identity,
i.e., g(y) = y. Thus, we can write g(E(y)) as just E(y).
http://www.gseis.ucla.edu/courses/ed231c/notes1/glm.html

Another example is poisson regression in which the distribution family is poisson, i.e., y -> {poisson}
and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as,
g(E(y)) = b0 + b1 X1 + b2 X2 + ...

y -> {poisson}

Here are examples of distributions and link functions for some common estimation procedures:
type of                    distribution    link
estimation                 family          function
OLS regression             gaussian        identity
logistic regression        binomial        logit
probit                     binomial        probit
cloglog                    binomial        cloglog
poisson regression         poisson         log
neg binomial regression    neg binomial    log
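The three binomial links in the table above (logit, probit and cloglog) all map a probability in (0, 1) onto the whole real line, and their inverses map any value of the linear predictor back to a valid probability. The following is a minimal sketch in Python (not Stata) using only the standard library; the function names are illustrative and are not Stata's.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def logit(p):
    """g(u) = ln(u / (1 - u))"""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def probit(p):
    """g(u) = inverse of the standard normal CDF"""
    return _std_normal.inv_cdf(p)

def inv_probit(eta):
    return _std_normal.cdf(eta)

def cloglog(p):
    """g(u) = ln(-ln(1 - u))"""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))

# Any real linear predictor is mapped back into (0, 1).
for eta in (-3.0, 0.0, 3.0):
    print(eta, inv_logit(eta), inv_probit(eta), inv_cloglog(eta))
```

The logit and probit links are symmetric around a probability of one half; the cloglog link is not, which is why it can give noticeably different fits when the outcome is rare or very common.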

Stata's GLM Procedure


Stata's glm procedure estimates generalized linear models in which the user can specify both the
distribution family and the link function. Here is the basic syntax of the glm procedure:
glm depvar indvars [if exp] [in range] [, family(fname) link(lname) eform ]
where fname can take on the values gaussian | igaussian | binomial | poisson | nbinomial | gamma
and lname can take on the values identity | log | logit | probit | cloglog | nbinomial | power | opower.
An OLS regression would look like this using regress and glm:
regress write read math gender
glm write read math gender, family(gaus) link(iden)
A logistic regression would look like this:
logistic honors read math gender
glm honors read math gender, family(binom) link(logit)
A poisson regression would look like this:
poisson days read math gender
glm days read math gender, family(poisson) link(log)
A negative binomial regression would look like this:
nbreg days read math gender
glm days read math gender, family(nbinom) link(log)
Here is a list of the allowable distribution families:
gaussian (normal)
inverse gaussian
bernoulli (binomial)
poisson
negative binomial
gamma
And here is a list of the link functions that are available:
identity
log
logit
probit
complementary log-log
odds power
power
negative binomial
log-log
log-complement
Of course, if all that glm could do was duplicate OLS, logistic, poisson and negative binomial regression,
then it would not appear to be very useful. However, it is possible to combine distribution families and
link functions in ways that do not duplicate existing estimation procedures. The table below gives the
possible combinations that make sense from a data analysis perspective:
                     iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
gaussian              X     X                                     X
inverse gaussian      X     X                                     X
binomial              X     X     X      X        X               X      X       X      X
poisson               X     X                                     X
negative binomial     X     X                             X       X
gamma                 X     X                                     X

Examples
use http://www.gseis.ucla.edu/courses/data/hsb2
generate hon = write>=60
regress write read math female
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------

glm write read math female, link(iden) fam(gauss) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =  43.23228
Deviance         =  8473.526357                    (1/df) Deviance =  43.23228
Pearson          =  8473.526357                    (1/df) Pearson  =  43.23228

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   = -658.4261736                    AIC             =  6.624262
BIC              =  7435.056153

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2062009     .444277
        math |   .3974826   .0664037     5.99   0.000     .2673336    .5276315
      female |    5.44337   .9349987     5.82   0.000     3.610806    7.275934
       _cons |   11.89566   2.862845     4.16   0.000      6.28459    17.50674
------------------------------------------------------------------------------

logit hon read math female, nolog
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424    .027577     2.73   0.006     .0211924    .1292924
        math |   .1317117   .0324607     4.06   0.000       .06809    .1953335
      female |   1.154801   .4340856     2.66   0.008      .304009    2.005593
       _cons |  -13.12749   1.850769    -7.09   0.000    -16.75493    -9.50005
------------------------------------------------------------------------------

logit, or
Logit estimates                                   Number of obs   =        200
                                                  LR chi2(3)      =      80.87
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.209827                       Pseudo R2       =     0.3496

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145   .0297321     2.73   0.006     1.021419    1.138023
        math |   1.140779   .0370305     4.06   0.000     1.070462    1.215716
      female |   3.173393   1.377524     2.66   0.008     1.355281    7.430502
------------------------------------------------------------------------------

glm hon read math female, link(logit) fam(bin) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0752424   .0275779     2.73   0.006     .0211906    .1292941
        math |   .1317117   .0324623     4.06   0.000     .0680869    .1953366
      female |   1.154801   .4341012     2.66   0.008     .3039785    2.005624
       _cons |  -13.12749   1.850893    -7.09   0.000    -16.75517   -9.499808
------------------------------------------------------------------------------

glm, eform
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  150.4196543                    (1/df) Deviance =  .7674472
Pearson          =  164.2509104                    (1/df) Pearson  =  .8380148

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -75.20982717                    AIC             =  .7920983
BIC              = -888.0505495

------------------------------------------------------------------------------
         hon | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   1.078145    .029733     2.73   0.006     1.021417    1.138025
        math |   1.140779   .0370323     4.06   0.000     1.070458     1.21572
      female |   3.173393   1.377573     2.66   0.008      1.35524    7.430728
------------------------------------------------------------------------------

probit hon read math female, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(3)      =      81.80
                                                  Prob > chi2     =     0.0000
Log likelihood = -74.745943                       Pseudo R2       =     0.3537

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164449    .0782076
        math |   .0735256   .0173216     4.24   0.000     .0395759    .1074754
      female |   .6824682   .2447275     2.79   0.005     .2028112    1.162125
       _cons |  -7.663304   .9921289    -7.72   0.000    -9.607841   -5.718767
------------------------------------------------------------------------------

glm hon read math female, link(probit) fam(bin) nolog
Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       196
                                                   Scale parameter =         1
Deviance         =  149.4918859                    (1/df) Deviance =  .7627137
Pearson          =  160.9679286                    (1/df) Pearson  =  .8212649

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = invnorm(u)               [Probit]
Standard errors  : OIM

Log likelihood   = -74.74594294                    AIC             =  .7874594
BIC              =  -888.978318

------------------------------------------------------------------------------
         hon |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .0473262   .0157561     3.00   0.003     .0164448    .0782077
        math |   .0735256   .0173217     4.24   0.000     .0395758    .1074755
      female |   .6824681   .2447281     2.79   0.005     .2028098    1.162126
       _cons |  -7.663303   .9921345    -7.72   0.000    -9.607851   -5.718755
------------------------------------------------------------------------------

use http://www.gseis.ucla.edu/courses/data/lahigh, clear
poisson daysabs langnce gender, nolog
Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------

poisson, irr
Poisson regression                                Number of obs   =        316
                                                  LR chi2(2)      =     171.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -1549.8567                       Pseudo R2       =     0.0524

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------

glm daysabs langnce gender, link(log) fam(poisson) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   =  -1549.85665                    AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |    -.01467   .0012934   -11.34   0.000    -.0172051   -.0121349
      gender |  -.4093528   .0482192    -8.49   0.000    -.5038606   -.3148449
       _cons |   2.646977   .0697764    37.94   0.000     2.510217    2.783736
------------------------------------------------------------------------------

glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =  2238.317597                    (1/df) Deviance =  7.151174
Pearson          =  2752.913231                    (1/df) Pearson  =   8.79525

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   =  -1549.85665                    AIC             =  9.828207
BIC              =  436.7702841

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9854371   .0012746   -11.34   0.000      .982942    .9879384
      gender |   .6640799   .0320214    -8.49   0.000     .6041936    .7299021
------------------------------------------------------------------------------

nbreg daysabs langnce gender, nolog
Negative binomial regression                      Number of obs   =        316
                                                  LR chi2(2)      =      20.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -880.9274                       Pseudo R2       =     0.0116

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156493   .0039485    -3.96   0.000    -.0233882   -.0079104
      gender |  -.4312069   .1396913    -3.09   0.002    -.7049968   -.1574169
       _cons |    2.70344   .2292762    11.79   0.000     2.254067    3.152813
-------------+----------------------------------------------------------------
    /lnalpha |     .25394    .095509                      .0667457    .4411342
-------------+----------------------------------------------------------------
       alpha |   1.289094   .1231201                      1.069024    1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0:  chibar2(01) = 1337.86 Prob>=chibar2 = 0.000
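As a side check on the nbreg output above: Stata reports both /lnalpha and alpha, and alpha is exp(/lnalpha). The sketch below, in Python rather than Stata, recomputes alpha from the printed /lnalpha and evaluates the negative binomial variance function u + alpha*u^2 next to the poisson variance function u. The function names are illustrative, and the mean-dispersion form of the variance is an assumption (it is the nbreg default).

```python
import math

ln_alpha = 0.25394            # /lnalpha copied from the nbreg output above
alpha = math.exp(ln_alpha)    # should reproduce the reported alpha, 1.289094

def poisson_variance(mu):
    """Poisson variance function: V(u) = u."""
    return mu

def nb_variance(mu, alpha):
    """Negative binomial variance function (assumed form): V(u) = u + alpha*u^2."""
    return mu + alpha * mu ** 2

print(round(alpha, 6))
for mu in (1.0, 5.0, 10.0):
    print(mu, poisson_variance(mu), nb_variance(mu, alpha))
```

When alpha is fixed at 1 this variance function becomes u + u^2, which matches the V(u) = u+(1)u^2 line printed by glm with fam(nbin). That glm fit appears to hold alpha at 1 rather than estimating it, which would explain why its estimates and log likelihood differ slightly from those of nbreg.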
glm daysabs langnce gender, link(log) fam(nbin) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156357   .0035438    -4.41   0.000    -.0225814   -.0086899
      gender |  -.4307736   .1253082    -3.44   0.001    -.6763732    -.185174
       _cons |   2.702606   .2052709    13.17   0.000     2.300282    3.104929
------------------------------------------------------------------------------

glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =         1
Deviance         =   425.603464                    (1/df) Deviance =  1.359755
Pearson          =  415.6288036                    (1/df) Pearson  =  1.327888

Variance function: V(u) = u+(1)u^2                 [Neg. Binomial]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -884.4953535                    AIC             =  5.617059
BIC              = -1375.943849

------------------------------------------------------------------------------
     daysabs |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844859   .0034888    -4.41   0.000     .9776716    .9913477
      gender |    .650006   .0814511    -3.44   0.001     .5084577    .8309596
------------------------------------------------------------------------------

glm daysabs langnce gender, fam(gamma) link(log) nolog
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              =  -1549.72029

------------------------------------------------------------------------------
     daysabs |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |  -.0156852   .0040626    -3.86   0.000    -.0236478   -.0077226
      gender |  -.4326492   .1443719    -3.00   0.003    -.7156129   -.1496854
       _cons |   2.705757   .2383799    11.35   0.000     2.238541    3.172973
------------------------------------------------------------------------------

glm, eform
Generalized linear models                          No. of obs      =       316
Optimization     : ML: Newton-Raphson              Residual df     =       313
                                                   Scale parameter =  1.583724
Deviance         =  251.8270233                    (1/df) Deviance =  .8045592
Pearson          =  495.7055497                    (1/df) Pearson  =  1.583724

Variance function: V(u) = u^2                      [Gamma]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -856.2487643                    AIC             =  5.438283
BIC              =  -1549.72029

------------------------------------------------------------------------------
     daysabs |       ExpB   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     langnce |   .9844372   .0039994    -3.86   0.000     .9766296    .9923071
      gender |   .6487881   .0936668    -3.00   0.003     .4888924    .8609788
------------------------------------------------------------------------------
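
The odds ratios, IRRs and ExpB values reported by the or, irr and eform options in the examples above are the exponentiated coefficients of the corresponding fit. The following minimal sketch, in Python rather than Stata, checks this against a few pairs copied from the output. The dictionary name, the function names and the tolerance are illustrative choices, and the standard error formula exp(b)*se(b) is the usual delta-method approximation, assumed here to be what Stata reports.

```python
import math

# (coefficient, reported exponentiated value) pairs copied from the output above.
reported = {
    "logit read":      (0.0752424, 1.078145),
    "logit math":      (0.1317117, 1.140779),
    "logit female":    (1.154801, 3.173393),
    "poisson langnce": (-0.01467, 0.9854371),
    "poisson gender":  (-0.4093528, 0.6640799),
    "gamma langnce":   (-0.0156852, 0.9844372),
    "gamma gender":    (-0.4326492, 0.6487881),
}

def max_abs_error(pairs):
    """Largest |exp(coef) - reported value| over all pairs."""
    return max(abs(math.exp(b) - e) for b, e in pairs.values())

def eform_se(b, se_b):
    """Delta-method standard error of exp(b): exp(b) * se(b)."""
    return math.exp(b) * se_b

# Differences are at the level of rounding in the printed output.
print(max_abs_error(reported) < 1e-5)
# Compare with the reported odds-ratio standard error for read, .0297321.
print(eform_se(0.0752424, 0.027577))
```

The z statistics and p-values are the same in the coefficient and exponentiated tables because the test is carried out on the coefficient scale; only the point estimate, standard error and confidence limits are transformed.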

Categorical Data Analysis Course


Phil Ender
