You are on page 1of 32

PRI workshop

Multilevel modeling: A comparison of software packages

Juan Leon Jara Almonte


Pennsylvania State University
814 Oswald
jjl292@psu.edu

March 2009

I. Multilevel Modeling
Multilevel models are statistical models used to analyze data that have a hierarchical or
nested structure, for example:
-

Students nested in schools


Children nested in households
Patients nested in hospitals
Criminals nested in jails
Repeated measures nested in individuals (e.g.: students, patients)

Thus, to address this issues, using a multilevel approach will be a more accurate model
than a ordinary least squares regression (OLS). The names received for this model in the
different fields or literature around this topic is: random coefficient model, variance
component model, and hierarchical linear model.
The basic two level model:
Level 1
Yij

0j + 1jXij + rij

- equation 1

00 + 01Zj + u0j
10

- equation 2
- equation 3

Level 2
0j
1j

=
=

Substitution of ( 2 ) and ( 3 ) in ( 1 )
Yij

00 + 10Xij + 01Zj + rij + u0j

- equation 4

Where
i
j
0j
00
Xij
10
rij and u0j

: subjects
: groups
: the mean of Yij for group j
: the grand mean of Yij
: the predictor variable for the subject i in the group j
: the fixed effect of the predictor variable Xij across groups
: random components normally distributed and independent of each other

We could add more predictors at level 1 and we could add random components at level 2.
However, the basic idea of multilevel modeling is to decompose the component variance
in two non-observables: one at a subject level and one at a group level.

To estimate this model, Maximum Likelihood estimators (MLE) are generally used.
MLEs require an iterative procedure until the model converges, allowing us to obtain
unbiased and efficient estimators.
Assumptions of OLS and Multilevel modeling
OLS estimation
Multilevel estimation
Linearity: Function
Linearity: at each level
Normality: Residuals
Normality: at each level
Homoscedasticity: Constant variance
Homoscedasticity: At level 1
Independence: between observations
Independence:
- No correlation between residuals at
level 1 and level 2.
- Observations in the highest level are
independent of each other.
Mean indicators for multilevel modeling
Proportion of variance:
At level 1: rij / (rij + u0j )
At level 2: u0j / (rij + u0j)

intraclass correlation coefficient

Variance explained (Proxy for an R-square):


Take as a base the null model
At level 1: ( rij(null) - rij(estimated))/ rij(null)
At level 2: ( u0j(null) - u0j(estimated))/ u0j(null)

Datasets that will be employed:


Student_level.dta/sav :
School_level.dta/sav :

Student level variables for Sixth grade students


Classroom and school level variables for each student

Variables in the dataset for the exercises:


idschool
idclass
idstudent
math
age
gender
nsibbl
nuclear
ses_i
indig
gestion
tipo
area
prog_ebi

:
:
:
:
:
:
:
:
:
:
:
:
:
:

bserv
dtotal

:
:

Schools ID
Classroom code
Students code
Math achievement in 2004
Age (years)
1 Female 0 Male
Number of siblings
The student lives with both parents
Socioeconomic index
The student has as a mother tongue an indigenous language
The school is public
Full grade school
1 Urban 0 Rural
Schools that participate in the government-headed bilingual
program
Number of basic services at school
Days of class during the year

II. Mixed models STATA


*There is two commands to run a multilevel model in STATA
*xtmixed
*gllamm
*XTMIXED
*********
*1st - Identified the dependent variable: math
*2nd - Number of levels: 2
*3rd - Identified level two id: idschool
*4th - With or without random coefficients: Without
*5th - Identified predictors at level 1: age gender nsibbl nuclear ses_i
indig
*6th - Identified predictors at level 2: gestion tipo area bserv dtotal
prog_ebi
*Syntax*
********
*Note: the id for level two has to be numeric
destring
idschool, replace
*****************************************a two level model
*xtmixed [dependent variable] [independent variables] || [id level 2]:,
[type of estimation: reml mle] [random component: var sd]
*****************************************a three level model
*xtmixed [dependent variable] [independent variables] || [id level 3]:
|| [id level 2]:, [type of estimation: reml mle] [random component: var
sd]
*****************************************a two level model with random
coefficients
*xtmixed [dependent variable] [independent variables] || [id level 2]:
[variable with random coefficient], [type of estimation: reml mle]
[random component: var sd]
******************************************************Random components
*var(_cons)
: Level 2 variance
*var(Residual)
: Level 1 variance
*Examples*
**********

*Null Model
xtmixed

math || idschool:, var

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:

log restricted-likelihood = -55799.507


log restricted-likelihood = -55799.507

Computing standard errors:


Mixed-effects REML regression
Group variable: idschool

Log restricted-likelihood = -55799.507

Number of obs
Number of groups

=
=

10327
564

Obs per group: min =


avg =
max =

1
18.3
30

Wald chi2(0)
Prob > chi2

=
=

.
.

-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------_cons |
291.5814
2.171078
134.30
0.000
287.3262
295.8366
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
2422.165
160.494
2127.172
2758.067
-----------------------------+-----------------------------------------------var(Residual) |
2489.893
35.66315
2420.967
2560.782
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 4803.15 Prob >= chibar2 = 0.0000

*Model 1 - Student variables


xtmixed
math age gender nsibbl nuclear ses_i indig ||
idschool:, var
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:

log restricted-likelihood = -54213.956


log restricted-likelihood = -54213.956

Computing standard errors:


Mixed-effects REML regression
Group variable: idschool

Log restricted-likelihood = -54213.956

Number of obs
Number of groups

=
=

10084
564

Obs per group: min =


avg =
max =

1
17.9
30

Wald chi2(6)
Prob > chi2

=
=

581.48
0.0000

-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -8.197397
.5646148
-14.52
0.000
-9.304021
-7.090772
gender | -10.04743
1.053241
-9.54
0.000
-12.11175
-7.983119
nsibbl | -1.418298
.2599372
-5.46
0.000
-1.927766
-.9088307
nuclear | -4.110831
1.19185
-3.45
0.001
-6.446815
-1.774847
ses_i |
.4424885
.0532977
8.30
0.000
.338027
.54695
indig | -16.40169
2.547377
-6.44
0.000
-21.39446
-11.40892
_cons |
360.9887
9.142899
39.48
0.000
343.069
378.9085
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
1418.413
111.9487
1215.126
1655.709
-----------------------------+-----------------------------------------------var(Residual) |
2415.155
35.33061
2346.892
2485.404
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 1760.38 Prob >= chibar2 = 0.0000

*Model 2 - School variables


xtmixed
math age gender nsibbl nuclear ses_i indig gestion
tipo area bserv dtotal prog_ebi || idschool:, var
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0:
Iteration 1:

log restricted-likelihood = -54042.506


log restricted-likelihood = -54042.506

Computing standard errors:


Mixed-effects REML regression
Group variable: idschool

Log restricted-likelihood = -54042.506

Number of obs
Number of groups

=
=

10084
564

Obs per group: min =


avg =
max =

1
17.9
30

Wald chi2(12)
Prob > chi2

=
=

1170.77
0.0000

-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -7.704743
.5607968
-13.74
0.000
-8.803884
-6.605601
gender | -10.23747
1.045682
-9.79
0.000
-12.28697
-8.187969
nsibbl | -1.104413
.2590295
-4.26
0.000
-1.612102
-.5967245
nuclear | -3.823928
1.185488
-3.23
0.001
-6.147442
-1.500415
ses_i |
.1828045
.0559183
3.27
0.001
.0732067
.2924024
indig | -13.38815
2.510379
-5.33
0.000
-18.3084
-8.467896
gestion | -39.19536
4.042729
-9.70
0.000
-47.11896
-31.27175
tipo |
11.72788
3.866162
3.03
0.002
4.150338
19.30541
area |
4.210408
4.328804
0.97
0.331
-4.273892
12.69471
bserv |
10.77618
1.946826
5.54
0.000
6.960472
14.59189
dtotal |
.404681
.1593815
2.54
0.011
.092299
.7170629
prog_ebi | -16.20831
4.441248
-3.65
0.000
-24.91299
-7.503619
_cons |
270.5271
44.33491
6.10
0.000
183.6323
357.4219
----------------------------------------------------------------------------------------------------------------------------------------------------------Random-effects Parameters |
Estimate
Std. Err.
[95% Conf. Interval]
-----------------------------+-----------------------------------------------idschool: Identity
|
var(_cons) |
817.257
63.49015
701.8292
951.6688
-----------------------------+-----------------------------------------------var(Residual) |
2400.291
34.86962
2332.912
2469.617
-----------------------------------------------------------------------------LR test vs. linear regression: chibar2(01) = 1479.13 Prob >= chibar2 = 0.0000

*GLLAMM*
********
*1st - Identified the dependent variable: math
*2nd - Number of levels: 2
*3rd - Identified level two id: idschool
*4th - With or without random coefficients: Without
*5th - Identified predictors at level 1: age gender nsibbl nuclear ses_i
indig
*6th - Identified predictors at level 2: gestion tipo area bserv dtotal
prog_ebi
*Syntax
******************************a two level model
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*gllamm
[dependent variable] [independent variables], i([id
level variable]) eqs([equations]) nrf([number of random components])
[other options]
******************************a three level model
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*gllamm
[dependent variable] [independent variables], i([level
2 level 3]) eqs([equations]) nrf([number of random components]) [other
options]
******************************a two level model with random coefficients
*Define equation
*gen
cons=1
*eq int
: cons
/* Intercept equation */
*eq slope
: [variable with random coefficient]
*gllamm
[dependent variable] [independent variables], i([id
level variable]) eqs([equations]) nrf([number of random components])
[other options]

*Examples*
**********
*Null model
gen
eq int
gllamm

cons=1
: cons
math, i(idschool) eqs(int) nrf(1) adapt

Running adaptive quadrature


Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood

= -58198.98
= -56186.657
= -55826.109
= -55801.265
= -55801.26

Adaptive quadrature has converged, running Newton-Raphson


Iteration 0:
log likelihood = -55801.26
Iteration 1:
log likelihood = -55801.237
Iteration 2:
log likelihood =
-55801.2
Iteration 3:
log likelihood =
-55801.2
number of level 1 units = 10327
number of level 2 units = 564
Condition Number = 303.60003
gllamm model
log likelihood = -55801.2
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------_cons |
291.5841
2.171009
134.31
0.000
287.329
295.8392
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2489.9024 (35.663352)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 2417.2909 (160.0545)
------------------------------------------------------------------------------

10

*Model 1 - Student variables


gllamm
math age gender nsibbl nuclear ses_i indig,
i(idschool) eqs(int) nrf(1) adapt
Running adaptive quadrature
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood

= -55094.61
= -54598.877
= -54489.213
= -54489.213

Adaptive quadrature has converged, running Newton-Raphson


Iteration 0:
log likelihood = -54489.213
Iteration 1:
log likelihood = -54236.507
Iteration 2:
log likelihood = -54217.441
Iteration 3:
log likelihood = -54217.18
Iteration 4:
log likelihood = -54217.18
number of level 1 units = 10084
number of level 2 units = 564
Condition Number = 1283.6933
gllamm model
log likelihood = -54217.18
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -8.198358
.5652368
-14.50
0.000
-9.306201
-7.090514
gender | -10.04702
1.053157
-9.54
0.000
-12.11117
-7.982871
nsibbl | -1.419342
.2613114
-5.43
0.000
-1.931503
-.9071808
nuclear |
-4.11179
1.191921
-3.45
0.001
-6.447912
-1.775668
ses_i |
.4432147
.0580544
7.63
0.000
.3294303
.5569991
indig | -16.41488
2.601117
-6.31
0.000
-21.51298
-11.31679
_cons |
360.9406
9.283431
38.88
0.000
342.7454
379.1358
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2413.8148 (35.301464)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 1413.3884 (111.42859)
------------------------------------------------------------------------------

11

*Model 2 - School variables


gllamm
math age gender nsibbl nuclear ses_i indig gestion
tipo area bserv dtotal prog_ebi, i(idschool) eqs(int) nrf(1) adapt
Running adaptive quadrature
Iteration 0:
log likelihood
Iteration 1:
log likelihood
Iteration 2:
log likelihood
Iteration 3:
log likelihood
Iteration 4:
log likelihood

=
=
=
=
=

-54787.036
-54495.168
-54276.704
-54069.796
-54069.796

Adaptive quadrature has converged, running Newton-Raphson


Iteration 0:
log likelihood = -54069.796 (not concave)
Iteration 1:
log likelihood = -54055.649
Iteration 2:
log likelihood = -54054.891
Iteration 3:
log likelihood = -54054.886
number of level 1 units = 10084
number of level 2 units = 564
Condition Number = 9373.0729
gllamm model
log likelihood = -54054.886
-----------------------------------------------------------------------------math |
Coef.
Std. Err.
z
P>|z|
[95% Conf. Interval]
-------------+---------------------------------------------------------------age | -7.705684
.5605493
-13.75
0.000
-8.80434
-6.607027
gender | -10.23462
1.045321
-9.79
0.000
-12.28341
-8.185827
nsibbl | -1.106948
.2591939
-4.27
0.000
-1.614959
-.5989372
nuclear | -3.827367
1.185166
-3.23
0.001
-6.15025
-1.504485
ses_i |
.1848875
.0567318
3.26
0.001
.0736953
.2960798
indig | -13.46722
2.534508
-5.31
0.000
-18.43476
-8.499676
gestion | -39.15573
4.01795
-9.75
0.000
-47.03076
-31.28069
tipo |
11.7064
3.839827
3.05
0.002
4.180478
19.23232
area |
4.186199
4.298805
0.97
0.330
-4.239304
12.6117
bserv |
10.77714
1.933419
5.57
0.000
6.987705
14.56657
dtotal |
.4046668
.1582621
2.56
0.011
.0944787
.7148548
prog_ebi | -16.14732
4.418169
-3.65
0.000
-24.80677
-7.487869
_cons |
270.3501
44.04093
6.14
0.000
184.0315
356.6687
-----------------------------------------------------------------------------Variance at level 1
-----------------------------------------------------------------------------2399.0027 (34.841723)
Variances and covariances of random effects
-----------------------------------------------------------------------------***level 2 (idschool)
var(1): 803.02789 (62.165406)
------------------------------------------------------------------------------

*Main problem: Time to estimate the models*


*******************************************

12

III. Mixed models SPSS


*To run the multilevel model we have to use the command MIXED
***********************************************a two level model
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT | SUBJECT([id level variable])
: Indicate the random part and the grouping variable for the data
***********************************************a three level model
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT | SUBJECT([id level variable])
: Indicate the random part and the grouping variable for level 2
*/ RANDOM = INTERCEPT | SUBJECT([id level variable 2]*[id level variable 3])
: Indicate the random part and the grouping variable for level 2 and level
3
*********************************************** a two level model with random
coefficient
*MIXED [dependent variable] WITH [independent variables]
*/ PRINT = SOLUTION TESTCOV
: To print the model and the test for the matrix of variance covariance
*/ FIXED = INTERCEPT [independent variables]
: Indicate the fixed part in the model
*/ RANDOM = INTERCEPT [other random variables] | SUBJECT([id level variable])
COVTYPE(UN)
: Indicate the random part and the grouping variable for the data
and the type of matrix
*
covariance for the level 2 (unestructured)
*Examples*
**********
*Null Model
mixed math
/ print = solution testcov
/ fixed = intercept
/ random = intercept | subject(idschool).

13

Estimates of Fixed Effects(a)


95% Confidence Interval
Parameter
Intercept

Estimate
Std. Error
291.58139
2.171077
1
a Dependent Variable: MAth.

df

539.227

Sig.

134.303

Lower Bound

.000

Upper Bound

287.316585

295.846196

Estimates of Covariance Parameters(a)


95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.

Estimate
2489.8932
07
2422.1624
26

Std. Error

Wald Z

Sig.

Lower Bound

Upper Bound

35.663150

69.817

.000

2420.966727

2560.782067

160.49373
8

15.092

.000

2127.170040

2758.063864

*Model 1 - Student variables


mixed math with age gender nsibbl nuclear ses_i indig
/ print = solution testcov
/ fixed = intercept age gender nsibbl nuclear ses_i indig
/ random = intercept | subject(idschool).
Estimates of Fixed Effects(a)
95% Confidence Interval
Parameter
Intercept

Nsibbl

Estimate
360.98869
8
-8.197397
10.047433
-1.418299

nuclear

-4.110831

Age
gender

Std. Error

df

Sig.

Lower Bound

Upper Bound

9.142899

9964.925

39.483

.000

343.066768

378.910628

.564615

9830.408

-14.519

.000

-9.304158

-7.090636

1.053241

9675.797

-9.540

.000

-12.112006

-7.982860

.259937

9791.842

-5.456

.000

-1.927829

-.908768

1.191850

9579.555

-3.449

.001

-6.447110

-1.774552

.442489
.053298
2.547377
16.401693
a Dependent Variable: MAth.

8310.954

8.302

.000

.338012

.546965

8871.112

-6.439

.000

-21.395141

-11.408245

ses_i

Indigo

Estimates of Covariance Parameters(a)


95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.

Estimate
2415.1550
55
1418.4124
24

Std. Error

Wald Z

Sig.

Lower Bound

Upper Bound

35.330614

68.359

.000

2346.891618

2485.404054

111.94861
1

12.670

.000

1215.125679

1655.708409

*Model 2 - School variables


mixed math with age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal prog_ebi

14

/ print = solution testcov


/ fixed = intercept age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal prog_ebi
/ random = intercept | subject(idschool).
Estimates of Fixed Effects(a)
95% Confidence Interval
Parameter
Intercept

Nsibbl

Estimate
270.52710
2
-7.704743
10.237467
-1.104413

nuclear

-3.823929

1.185488

9732.357

-3.226

.001

-6.147731

-1.500126

ses_i

.055918

10022.105

3.269

.001

.073193

.292416

2.510379

8566.070

-5.333

.000

-18.309099

-8.467202

4.042729

502.568

-9.695

.000

-47.138085

-31.252623

Tipo

.182805
13.388150
39.195354
11.727876

3.866161

506.687

3.033

.003

4.132195

19.323557

Area

4.210408

4.328804

489.480

.973

.331

-4.294922

12.715738

Bserv

10.776182

1.946826

547.469

5.535

.000

6.952018

14.600345

.404681
.159381
prog_ebi
4.441248
16.208304
a Dependent Variable: MAth.

514.081

2.539

.011

.091562

.717800

549.555

-3.649

.000

-24.932203

-7.484406

Age
gender

Indigo
gestion

Std. Error

df

Sig.

Lower Bound

Upper Bound

44.334909

547.503

6.102

.000

183.439762

357.614443

.560797

9941.506

-13.739

.000

-8.804018

-6.605467

1.045682

9863.263

-9.790

.000

-12.287217

-8.187717

.259030

9865.989

-4.264

.000

-1.612164

-.596662

Dtotal

Estimates of Covariance Parameters(a)


95% Confidence Interval
Parameter
Residual
Intercept [subject = Variance
idschool]
a Dependent Variable: MAth.

Estimate
2400.2913
25
817.25668
9

Std. Error

Wald Z

Sig.

Lower Bound

Upper Bound

34.869622

68.836

.000

2332.911916

2469.616793

63.490120

12.872

.000

701.828974

951.668455

15

IV. Hierarchical Linear Model Program - HLM


1. Select the datasets for level 1 and level 2

2. Select the variables that will be used for the analysis at each level

16

3. Save the template file (.mdmt) and the mdm file (.mdm) with names that you prefer.

4. Press Make MDM to create the data file with the information of both levels. In the
case of a error, a message will appear on the screen.
5. Press Check Stats and save the descriptive statistics of your dataset at each level
(mean, standard deviation, minimum and maximum).
6. Press Done and you will have the following window

7. Once you have obtained the window with the data at both levels, we could start
running the different multilevel models.
17

8. Running a null model: Do a click on Math and select the option Outcome variable

Thus, we could see that we are running a simple anova with random effects by school

18

Select Run Analysis from the menu and then under option, choose Run the model
shown

Then go to File from your menu options and select View Output and a txt file will
appear with the regression analysis developed by the program
Program:
Authors:
Publisher:

HLM 6 Hierarchical Linear and Nonlinear Modeling


Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:35:36
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------

Level 1
Level 2
Precision

Weighting?
no
no
no

Weight
Variable
Name

The outcome variable is

Normalized?

MATH

The model specified for the fixed effects was:


---------------------------------------------------Level-1
Coefficients
----------------------

Level-2
Predictors
---------------

19

INTRCPT1, B0

INTRCPT2, G00

The model specified for the covariance components was:


--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + R
Level-2 Model
B0 = G00 + U0
Iterations stopped due to small change in likelihood function
******* ITERATION 5 *******
Sigma_squared =
Tau
INTRCPT1,B0

2489.89329

2422.15990

Tau (as correlations)


INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
---------------------------------------------------INTRCPT1, B0
0.911
---------------------------------------------------The value of the likelihood function at iteration 5 = -5.579951E+004
The outcome variable is

MATH

Final estimation of fixed effects:


---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
291.581398
2.171072
134.303
563
0.000
---------------------------------------------------------------------------The outcome variable is

MATH

Final estimation of fixed effects


(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
291.581398
2.169134
134.423
563
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
-----------------------------------------------------------------------------

20

INTRCPT1,
U0
49.21544
2422.15990
563
9762.13458
0.000
level-1,
R
49.89883
2489.89329
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 111599.013039
Number of estimated parameters = 2

21

9. Running the model 1 Student variables only

We add the level one variable by clicking on each variable and selecting the option add
variable uncentered. There are other options like: group mean centered and grand mean
centered. However, for this exercise we will run all the variables uncentered.
Program:
Authors:
Publisher:

HLM 6 Hierarchical Linear and Nonlinear Modeling


Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:38: 7
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------

Level 1

Weighting?
no

Weight
Variable
Name

Normalized?

22

Level 2
Precision

no
no

The outcome variable is

MATH

The model specified for the fixed effects was:


---------------------------------------------------Level-1
Coefficients
---------------------INTRCPT1, B0
#
AGE slope, B1
#
GENDER slope, B2
#
NSIBBL slope, B3
#
NUCLEAR slope, B4
#
SES_I slope, B5
#
INDIG slope, B6

Level-2
Predictors
--------------INTRCPT2, G00
INTRCPT2, G10
INTRCPT2, G20
INTRCPT2, G30
INTRCPT2, G40
INTRCPT2, G50
INTRCPT2, G60

'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.
The model specified for the covariance components was:
--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60
Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared =
Tau
INTRCPT1,B0

2415.14524

1418.53123

Tau (as correlations)


INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
---------------------------------------------------INTRCPT1, B0
0.865
---------------------------------------------------The value of the likelihood function at iteration 6 = -5.421304E+004

23

The outcome variable is

MATH

Final estimation of fixed effects:


---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
360.992707
9.142942
39.483
563
0.000
For
AGE slope, B1
INTRCPT2, G10
-8.197324
0.564615
-14.518
10077
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.047485
1.053240
-9.540
10077
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.418230
0.259937
-5.456
10077
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-4.110753
1.191849
-3.449
10077
0.001
For
SES_I slope, B5
INTRCPT2, G50
0.442432
0.053298
8.301
10077
0.000
For
INDIG slope, B6
INTRCPT2, G60
-16.400386
2.547400
-6.438
10077
0.000
---------------------------------------------------------------------------The outcome variable is

MATH

Final estimation of fixed effects


(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
360.992707
9.682294
37.284
563
0.000
For
AGE slope, B1
INTRCPT2, G10
-8.197324
0.586866
-13.968
10077
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.047485
1.131407
-8.881
10077
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.418230
0.249420
-5.686
10077
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-4.110753
1.231744
-3.337
10077
0.001
For
SES_I slope, B5
INTRCPT2, G50
0.442432
0.057503
7.694
10077
0.000
For
INDIG slope, B6
INTRCPT2, G60
-16.400386
2.673658
-6.134
10077
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
----------------------------------------------------------------------------INTRCPT1,
U0
37.66339
1418.53123
563
6158.23620
0.000
level-1,
R
49.14413
2415.14524
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 108426.073337
Number of estimated parameters = 2

24

10. To run the model 2 School variables, we need to click on the level 2 button and
select where we want to add the level two variables: intercept or coefficients; in other
words if we want to add predictors for the intercept or cross-level interactions. In this
case for simplicity we only add predictors to the intercept.

Program:
Authors:
Publisher:

HLM 6 Hierarchical Linear and Nonlinear Modeling


Stephen Raudenbush, Tony Bryk, & Richard Congdon
Scientific Software International, Inc. (c) 2000
techsupport@ssicentral.com
www.ssicentral.com
------------------------------------------------------------------------------Module:
HLM2.EXE (6.04.2754.2)
Date:
30 March 2009, Monday
Time:
3:40:15
------------------------------------------------------------------------------SPECIFICATIONS FOR THIS HLM2 RUN
Problem Title: no title
The data source for this run = C:\seminar\datasets\hlm\sixthgrade.mdm
The command file for this run = whlmtemp.hlm
Output file name
= C:\seminar\datasets\hlm\hlm2.txt
The maximum number of level-1 units = 10327
The maximum number of level-2 units = 564
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------

Level 1

Weighting?
no

Weight
Variable
Name

Normalized?

25

Level 2
Precision

no
no

The outcome variable is

MATH

The model specified for the fixed effects was:


---------------------------------------------------Level-1
Coefficients
---------------------INTRCPT1, B0

#
#
#
#
#
#

AGE
GENDER
NSIBBL
NUCLEAR
SES_I
INDIG

slope,
slope,
slope,
slope,
slope,
slope,

B1
B2
B3
B4
B5
B6

Level-2
Predictors
--------------INTRCPT2, G00
GESTION, G01
TIPO, G02
AREA, G03
PROG_EBI, G04
BSERV, G05
DTOTAL, G06
INTRCPT2, G10
INTRCPT2, G20
INTRCPT2, G30
INTRCPT2, G40
INTRCPT2, G50
INTRCPT2, G60

'#' - The residual parameter variance for this level-1 coefficient has been set
to zero.
The model specified for the covariance components was:
--------------------------------------------------------Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
--------------------------------------------------Level-1 Model
Y = B0 + B1*(AGE) + B2*(GENDER) + B3*(NSIBBL) + B4*(NUCLEAR) + B5*(SES_I) +
B6*(INDIG) + R
Level-2 Model
B0 = G00 + G01*(GESTION) + G02*(TIPO) + G03*(AREA) + G04*(PROG_EBI)
+ G05*(BSERV) + G06*(DTOTAL) + U0
B1 = G10
B2 = G20
B3 = G30
B4 = G40
B5 = G50
B6 = G60
Run-time deletion has reduced the number of level-1 records to 10084
Iterations stopped due to small change in likelihood function
******* ITERATION 6 *******
Sigma_squared =
Tau
INTRCPT1,B0

2400.29539
817.21163

Tau (as correlations)


INTRCPT1,B0 1.000
---------------------------------------------------Random level-1 coefficient
Reliability estimate
----------------------------------------------------

26

INTRCPT1, B0
0.799
---------------------------------------------------The value of the likelihood function at iteration 6 = -5.404159E+004
The outcome variable is

MATH

Final estimation of fixed effects:


---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
270.525672 44.332588
6.102
557
0.000
GESTION, G01
-39.195039
4.042508
-9.696
557
0.000
TIPO, G02
11.727700
3.865951
3.034
557
0.003
AREA, G03
4.210208
4.328564
0.973
557
0.332
PROG_EBI, G04
-16.207826
4.441016
-3.650
557
0.001
BSERV, G05
10.776193
1.946724
5.536
557
0.000
DTOTAL, G06
0.404681
0.159373
2.539
557
0.012
For
AGE slope, B1
INTRCPT2, G10
-7.704750
0.560796
-13.739
10071
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.237444
1.045681
-9.790
10071
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.104433
0.259029
-4.264
10071
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-3.823956
1.185488
-3.226
10071
0.002
For
SES_I slope, B5
INTRCPT2, G50
0.182821
0.055918
3.269
10071
0.001
For
INDIG slope, B6
INTRCPT2, G60
-13.388784
2.510361
-5.333
10071
0.000
---------------------------------------------------------------------------The outcome variable is

MATH

Final estimation of fixed effects


(with robust standard errors)
---------------------------------------------------------------------------Standard
Approx.
Fixed Effect
Coefficient
Error
T-ratio
d.f.
P-value
---------------------------------------------------------------------------For
INTRCPT1, B0
INTRCPT2, G00
270.525672 48.248823
5.607
557
0.000
GESTION, G01
-39.195039
4.046229
-9.687
557
0.000
TIPO, G02
11.727700
3.976412
2.949
557
0.004
AREA, G03
4.210208
4.054383
1.038
557
0.300
PROG_EBI, G04
-16.207826
4.640755
-3.492
557
0.001
BSERV, G05
10.776193
2.026056
5.319
557
0.000
DTOTAL, G06
0.404681
0.174626
2.317
557
0.021
For
AGE slope, B1
INTRCPT2, G10
-7.704750
0.585244
-13.165
10071
0.000
For
GENDER slope, B2
INTRCPT2, G20
-10.237444
1.119275
-9.146
10071
0.000
For
NSIBBL slope, B3
INTRCPT2, G30
-1.104433
0.245914
-4.491
10071
0.000
For NUCLEAR slope, B4
INTRCPT2, G40
-3.823956
1.219071
-3.137
10071
0.002
For
SES_I slope, B5
INTRCPT2, G50
0.182821
0.056285
3.248
10071
0.002
For
INDIG slope, B6
INTRCPT2, G60
-13.388784
2.500433
-5.355
10071
0.000
---------------------------------------------------------------------------Final estimation of variance components:
----------------------------------------------------------------------------Random Effect
Standard
Variance
df
Chi-square P-value
Deviation
Component
----------------------------------------------------------------------------INTRCPT1,
U0
28.58691
817.21163
557
3813.89027
0.000

27

level-1,
R
48.99281
2400.29539
----------------------------------------------------------------------------Statistics for current covariance components model
-------------------------------------------------Deviance
= 108083.173301
Number of estimated parameters = 2

11. If we want to have random coefficients, we need to click next to the coefficient for
each variable at Level 2 Model, and then add a random component for this variable
across schools.

In this case, we are adding a random component for the coefficient of age; this means that
this coefficient varies randomly across schools.

28

V. Multilevel in SAS
The command to run a multilevel model in SAS is PROC MIXED and the syntax is:
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / SUBJECT=[level 2 id]
RUN;
COVTEST
METHOD
CLASS
MODEL
SOLUTION
RANDOM
SUBJECT
RUN

: option calls the significance test of random components (Wald test).


: method that will be used to estimate the model.
: indicates which variable is categorical.
: to define the model that will be estimated.
: requires to SAS to print the results of the estimation in the output
: indicates the component or variables that will be random in the model
: indicates the grouping variable in the model
: to run the program.

*********************************************a two level model


PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / SUBJECT=[level 2 id];
RUN;
*********************************************a three level model
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id and level 3 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / TYPE=UN SUBJECT=[level 3 id];
RANDOM [random variables] / TYPE=UN SUBJECT=[level 2 id([level 3 id])];
RUN;
Note: TYPE = UN indicates that will be estimated the unstructured variance for the
random effects.
*********************************************a two level model with random
coefficients
PROC MIXED DATA = [name of the data] METHOD = REML COVTEST;
CLASS [level 2 id];
MODEL [dependent variable] = [independent variables] / SOLUTION;
RANDOM [random variables] / TYPE=UN SUBJECT=[level 2 id];

29

RUN;
*Example*
*********
*Model Null
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
*Model 1 Student variables
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;
*Model 2 School variables
PROC MIXED DATA = achievement METHOD = REML COVTEST;
CLASS idschool;
MODEL math = age gender nsibbl nuclear ses_i indig gestion tipo area bserv dtotal
prog_ebi / SOLUTION;
RANDOM intercept / SUBJECT=idschool;
RUN;

30

VI. Summary
In sum, the three programs give us the same results in terms of significance, coefficients and standard errors estimated, as we could
see in the following table:
Table 1. Math Achievement in Math for Sixth Grade in Peru (2004)
Age (years)
Female
Number of siblings
Nuclear family
Socioeconomic status
Indigenous
Intercept
Level 1 variance
Level 2 variance

GLLAMM

se ( )
-8.20
0.57
-10.05
1.05
-1.42
0.26
-4.11
1.19
0.44
0.06
-16.41
2.60
360.94
9.28
2414
1413

63
37

***
***
***
**
***
***
***
***
***

XTMIXED

se ( )
-8.20
0.56
-10.05
1.05
-1.42
0.26
-4.11
1.19
0.44
0.05
-16.40
2.55
360.99
9.14
2415
1418

63
37

***
***
***
**
***
***
***

-8.20
-10.05
-1.42
-4.11
0.44
-16.40
360.99

***
***

2415
1418

MIXED
se ( )
0.56
1.05
0.26
1.19
0.05
2.55
9.14
63
37

***
***
***
**
***
***
***

-8.20
-10.05
-1.42
-4.11
0.44
-16.40
360.99

***
***

2415
1419

HLM
se ( )
0.59
1.13
0.25
1.23
0.06
2.67
9.68
63
37

***
***
***
**
***
***
***
***
***

***p<.001, **p<.01, *p<.05, +p<.10

The main difference between the different programs relies on the type of dependent variable that we want to model. Thus, in the case
of HLM and GLLAMM, it is possible to model two level or three level models with dependent variables with discrete or binary
outcomes; while the XTMIXED and MIXED in STATA and SPSS respectively could be employed with continuous variables. Also,
one of the disadvantages of STATA is that XTMIXED and GLLAMM could not be estimated in combination with sampling weights,
while SPSS and HLM allow the user to use sampling weights. Also, in the case of HLM, one advantage is that it is possible to use
sampling weights at each level.

31

Useful links for multilevel modeling


UCLA: Statistical Computing web page
http://www.ats.ucla.edu/
Urbana Champaign Illinois: Multilevel Analysis Seminar
http://www.ed.uiuc.edu/courses/EdPsy490CK/
University of Texas at Austin: Division of Statistics + Scientific computation
http://ssc.utexas.edu/
Indiana University: Stat/Math Center
http://www.indiana.edu/~statmath/contact/consult.html

32

You might also like