27 October 2010
Lecture 14:
Random Number Generation
Presented by:
Paul Wileyto, Ph.D.
Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
John von Neumann
A good analog random number generator.
Why do we need Random
Numbers?
Simulation Input
Statistical Sampling
Assignment in Trials
Games
Where do you get random numbers?
Published Tables
Make Them
Computer Algorithms
Good pseudo-random numbers:
Long period
Fast
Mod in SAS
proc iml; /* begin IML session */
q={20,30,40,50,70,90,160};
t=mod(q,7);
qt=q||t;
print qt; /* print matrix */
quit;
qt
20 6
30 2
40 5
50 1
70 0
90 6
160 6
SAS
Linear Congruential Generator (LCG)
Most common
$X_n = (a X_{n-1} + c) \bmod m$
$X_0$ = seed, modulus $m$ (large prime), multiplier $a$, and increment $c$
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.
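The recurrence above can be sketched in a few lines of Python. This is only an illustration: the function names `lcg` and `period` are hypothetical, and the toy parameters (a = 5, c = 3, m = 16) are an assumption chosen so that the wrap-around is easy to see. The second call uses a = 16,807, c = 0, m = 2,147,483,647, values that appear later in these slides.

```python
def lcg(seed, a, c, m):
    """Yield successive states of X_n = (a * X_{n-1} + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def period(seed, a, c, m):
    """Count the steps between two visits to the same state."""
    seen = {}
    for step, x in enumerate(lcg(seed, a, c, m)):
        if x in seen:
            return step - seen[x]
        seen[x] = step


# Toy parameters (illustrative assumption, far too small for real use).
gen = lcg(seed=1, a=5, c=3, m=16)
first_states = [next(gen) for _ in range(8)]
print(first_states)
print("period:", period(1, 5, 3, 16))

# Parameters quoted later in the slides: a = 16,807, c = 0, m = 2^31 - 1.
m = 2**31 - 1
gen = lcg(seed=1, a=16807, c=0, m=m)
print([next(gen) / m for _ in range(3)])  # scaled to unit random variates
```

With the toy modulus the generator visits all 16 states before repeating, which is the repetition the slide attributes to modular wrapping.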
Mod in R
q<-matrix(seq(10,100, by=10),10,1)
qm=q%%13
qall<-cbind(q,qm)
qall
[,1] [,2]
[1,] 10 10
[2,] 20 7
[3,] 30 4
[4,] 40 1
[5,] 50 11
[6,] 60 8
[7,] 70 5
[8,] 80 2
[9,] 90 12
[10,] 100 9
R
proc iml; /* begin IML session */
seed = 123456;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
SAS
RNGkind()
[1] "Mersenne-Twister" "Inversion"
set.seed(as.integer(format(Sys.time(), "%S%M%H")))
c<-matrix(runif(5),5,1)
c
[,1]
[1,] 0.9919911
[2,] 0.2598466
[3,] 0.1818524
[4,] 0.3357782
[5,] 0.2754353
Unit Random Variates in R
R
proc iml; /* begin IML session */
seed = 0;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
Set seed to 0 to
grab a seed value
from the system
clock.
SAS
RANUNI() and IML UNIFORM() use a multiplicative
linear congruential generator (from SAS docs) where
SEED = mod( SEED * 397204094, 2**31-1 )
and then returns
SEED / (2**31-1)
SAS
Testing Randomness
Is it Uniform?
[Figure: two histograms of generated values over the interval 0 to 1; one count axis runs from 0 to 300, the other from 0 to 3 x 10^4]
Testing Randomness
Might see
correlation in higher
dimensions
Plot $X_i$ versus $X_{i+k}$ for serial correlation
[Figure: scatter plots of y versus x and of y2 versus x1, all axes from 0 to 1]
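Both checks (is it uniform, and is $X_i$ correlated with $X_{i+k}$) can also be computed numerically rather than read off a plot. The sketch below is a minimal illustration, not the method used to draw the figures: the helper names `lag_correlation` and `chi_square_uniform` are hypothetical, and the sample comes from Python's built-in generator (a Mersenne Twister) with an arbitrary seed.

```python
import random


def lag_correlation(xs, k=1):
    """Pearson correlation between X_i and X_{i+k}."""
    a, b = xs[:-k], xs[k:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def chi_square_uniform(xs, bins=10):
    """Chi-square statistic for equal-width bins on [0, 1) with equal expected counts."""
    counts = [0] * bins
    for x in xs:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(xs) / bins
    return sum((obs - expected) ** 2 / expected for obs in counts)


rng = random.Random(2010)  # arbitrary seed, used only for reproducibility
sample = [rng.random() for _ in range(5000)]
print("lag-1 correlation:", lag_correlation(sample, 1))
print("chi-square (9 df):", chi_square_uniform(sample))
```

A lag correlation near zero and a chi-square value that is unremarkable for 9 degrees of freedom are consistent with a usable generator, although neither check rules out structure in higher dimensions.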
Linear Congruential Generator
The good
Fast
The Bad
Sequential correlation
a = 16,807, c = 0, M = 2,147,483,647
Overflow Method for integers
$I_{j+1} = a I_j + c$
Divide by $2^{32}$ to get floating point values between 0 and 1.
Very Fast
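Python integers do not overflow, so the overflow method has to be imitated by masking to the low 32 bits. The sketch below does that. The constants a = 1664525 and c = 1013904223 are an assumption borrowed from a commonly cited 32-bit generator, since the slides do not say which values were used, and the function name `overflow_lcg` is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits, imitating unsigned 32-bit overflow


def overflow_lcg(seed, n, a=1664525, c=1013904223):
    """Return n unit variates from I_{j+1} = a * I_j + c with 32-bit wraparound."""
    state = seed & MASK32
    out = []
    for _ in range(n):
        state = (a * state + c) & MASK32  # the "mod 2^32" comes from the wraparound
        out.append(state / 2**32)         # divide by 2^32 to land in [0, 1)
    return out


print(overflow_lcg(seed=0, n=5))
```

In a language with fixed-width unsigned integers the mask disappears, which is why the method is so fast there.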
Blum, Blum, Shub
Very slow
Cryptographically secure
$X_{n+1} = X_n^2 \bmod M, \quad M = pq$,
where p and q are large primes
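A toy version of the squaring recurrence, for illustration only. The primes p = 11 and q = 23 and the seed 3 are assumptions chosen to keep the numbers small; real use needs very large primes. Emitting the least significant bit of each state is the usual convention and is also an assumption here, since the slide shows only the recurrence. The function name `blum_blum_shub` is hypothetical.

```python
def blum_blum_shub(seed, p, q, n):
    """Return (states, bits) for n steps of X_{n+1} = X_n^2 mod M, with M = p * q."""
    m = p * q
    x = seed
    states, bits = [], []
    for _ in range(n):
        x = (x * x) % m
        states.append(x)
        bits.append(x & 1)  # least significant bit of each state
    return states, bits


# p and q are both congruent to 3 mod 4, and the seed shares no factor with M.
states, bits = blum_blum_shub(seed=3, p=11, q=23, n=6)
print(states)
print(bits)
```

One modular squaring yields only one bit in this sketch, which shows why the generator is slow compared with an LCG.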
Mersenne Twister
[Figure: Mersenne Twister diagram showing bit patterns combined by repeated XOR steps, with words passed from the last word and to the next word. From: John Savard's Cryptology Page, http://www.quadibloc.com]
Mersenne Twister
Fast
Rejection Methods
What this means is:
proc iml; /* begin IML session */
u = j(1000,1,.);
call randgen(u,'uniform');
exrand=-log(1-u)/.04;
tbl=u||exrand;
print tbl;
varnames={"u","erand"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc means data=erand;
var u erand;
run;
title 'Analysis Exponential RVs';
proc univariate data=erand noprint;
histogram erand / midpoints=5 to 205 by 10 exp;
run;
tbl
0.115794 3.0766303
0.754043 35.064963
0.157732 4.2914261
0.0431113 1.1017045
0.1086405 2.8751851
0.2632565 7.6378865
0.9448316 72.434124
0.3589581 11.116512
0.7109185 31.026164
0.4665676 15.710572
The MEANS Procedure
Variable      N       Mean    Std Dev    Minimum     Maximum
u          1000     0.4819     0.2877     0.0010      0.9996
erand      1000    23.4993    23.7694     0.0240    194.5330
> r=matrix(runif(10000), 10000,1)
> exrand=-log(1-r)/.04
> hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
> help.search("means")
R
[Figure: Histogram of exrand; x axis: exrand, 0 to 250; y axis: Density, 0.000 to 0.020]
R
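The SAS and R runs above both turn a uniform variate u into an exponential one with -log(1-u)/.04. The same inverse transform can be sketched in Python; the function name `exponential_variates` and the seed are assumptions made for this illustration.

```python
import math
import random


def exponential_variates(n, rate, rng):
    """Inverse-transform draws: t = -ln(1 - U) / rate, with U uniform on [0, 1)."""
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


rng = random.Random(14)  # arbitrary seed, used only for reproducibility
draws = exponential_variates(10000, rate=0.04, rng=rng)
sample_mean = sum(draws) / len(draws)
print("mean:", sample_mean)          # the theoretical mean is 1/0.04 = 25
print("1/mean:", 1.0 / sample_mean)  # an estimate of the rate
```

As in the R session, the reciprocal of the sample mean should come out close to the rate 0.04 that went in.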
Survival Time: $S(t) = U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left[ -\ln(U) / \lambda \right]^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$
Weibull Survival
SAS
proc iml; /* begin IML session */
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
Quit;
proc means data=wrand;
var u weibrand;
run;
title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;
SAS
SAS
Survival Time: $S(t) = U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left[ -\ln(U) / \lambda \right]^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$
Weibull Survival
R
> r=matrix(runif(10000), 10000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> hist(wrand, freq = FALSE, main = paste("Histogram of Survival Times"), breaks=50, xlab = "Survival Time")
R
R
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0);
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
axis2 minor=none major=(number=6)
label=(angle=90 'Survival Distribution Function');
symbol1 i=stepj c=BLUE l=1 width=1;
plot survival * weibrand=1 /
description="SDF of weibrand"
frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join c=BLUE l=1 width=1;
quit;
goptions ftext= ctext= htext= reset=symbol;
SAS
SAS
> library(survival)
> r=matrix(runif(1000), 1000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> event=wrand<=200
> wrand2=wrand*(event)+200*(1-event)
> fit <- survfit(Surv(wrand2, event) ~ 1, data = aml)
> plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")
R
R
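Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\beta_0 + \beta_1 x)$
Inverse Prob Transform: $t = \left[ -\ln(P) / \lambda \right]^{1/\gamma}$
$\gamma = 1.5, \quad \beta_0 = \ln(0.001) \approx -6.91$
$HR(\text{Drug}) = 0.5, \quad \beta_1 = \ln(0.5) \approx -0.69$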
Simulating Weibull Regression Data, with
Proportional Hazards
SAS
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(200,1,0) // j(200,1,1);
call randgen(u,'uniform');
wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
c = wrand<=200;
wrand=wrand#c + 200*(1-c); /* censor at 200; # is elementwise multiplication */
tbl=u || wrand || d || c ;
print tbl;
varnames={"u","weibrand","treat", "cens"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;
Data, drug treatment (0,1)
Constant Value
Drug effect * drug
SAS
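The simulation of Weibull survival times with a proportional drug effect and censoring at 200 can be mirrored in Python as a rough cross-check. This is a sketch under assumptions: the function name `weibull_ph_times` is hypothetical, the defaults simply copy the constants in the IML code (shape 1.5, log(.001), drug effect -0.69, cutoff 200), and the seed is arbitrary.

```python
import math
import random


def weibull_ph_times(drug, rng, shape=1.5, beta0=math.log(0.001),
                     beta1=-0.69, cutoff=200.0):
    """Return (time, event) pairs; event is 1 if failure occurred by the cutoff."""
    records = []
    for d in drug:
        lam = math.exp(beta0 + beta1 * d)            # constant value plus drug effect * drug
        u = rng.random()
        t = (-math.log(1.0 - u) / lam) ** (1.0 / shape)
        event = 1 if t <= cutoff else 0
        records.append((min(t, cutoff), event))      # censored times are set to the cutoff
    return records


drug = [0] * 200 + [1] * 200
records = weibull_ph_times(drug, random.Random(27))  # arbitrary seed
for arm in (0, 1):
    events = sum(e for (t, e), d in zip(records, drug) if d == arm)
    print("drug =", arm, "events:", events, "of 200")
```

Because the drug halves the hazard, the treated arm should show fewer events by the cutoff than the untreated arm.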
options pageno=1;
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0); strata TREAT;
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
title;
footnote;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
axis2 minor=none major=(number=6)
label=(angle=90 'Survival Distribution Function');
symbol1 i=stepj l=1 width=1; symbol2 i=stepj l=2 width=1; symbol3 i=stepj l=3 width=1;
plot survival * weibrand = treat /
description="SDF of weibrand by treat"
frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join l=1 width=1; symbol2 i=join l=2 width=1; symbol3 i=join l=3 width=1;
quit;
SAS
SAS
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 202.5356 1 <.0001
Score 209.0491 1 <.0001
Wald 201.2807 1 <.0001
Analysis of Maximum Likelihood Estimates
Parameter Standard Hazard
Parameter DF Estimate Error Chi-Square Pr > ChiSq Ratio
treat 1 -0.70413 0.04963 201.2807 <.0001 0.495
*** Proportional Hazards Models *** ;
options pageno=1;
proc phreg data=Work.Wrand;
model WEIBRAND * CENS (0) = TREAT;
run; quit;
SAS
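Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\beta_0 + \beta_1 x)$
Inverse Prob Transform: $t = \left[ -\ln(P) / \lambda \right]^{1/\gamma}$
$\gamma = 1.5, \quad \beta_0 = \ln(0.001) \approx -6.91$
$HR(\text{Drug}) = 0.5, \quad \beta_1 = \ln(0.5) \approx -0.69$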
Simulating Weibull Regression Data, with
Proportional Hazards
R
library(survival)
r=matrix(runif(400), 400,1)
drug=rbind(matrix(1,200,1),matrix(0,200,1))
wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)
event = wrand<=200;
wrand=wrand*event + 200*(1-event)
survreg(Surv(wrand, event) ~ drug, dist='weibull', model=TRUE, scale=1)
Call:
survreg(formula = Surv(wrand, event) ~ drug, dist = "weibull",
scale = 1, model = TRUE)
Coefficients:
(Intercept) drug
4.6042014 0.5978962
Scale fixed at 1
Loglik(model)= -1918.1 Loglik(intercept only)= -1932.6
Chisq= 29.06 on 1 degrees of freedom, p= 7e-08
n= 400
lsurv2 <- survfit(Surv(wrand, event) ~ drug, aml, type='fleming')
plot(lsurv2, lty=2:3,xlab = "Days", ylab="Survival")
Data, drug treatment (0,1)
Constant Value
Drug effect * drug
R
Package survival
R
Call:
phreg(formula = Surv(enter, wrand, event) ~ drug)
Covariate W.mean Coef Exp(Coef) se(Coef) Wald p
drug 0.586 -0.731 0.481 0.113 0.000
log(scale) 4.663 105.903 0.050 0.000
log(shape) 0.402 1.495 0.047 0.000
Events 327
Total time at risk 44359
Max. log. likelihood -1886.9
LR test statistic 42.4
Degrees of freedom 1
Overall p-value 7.38224e-11
> enter=matrix(0,400,1)
> fit <- phreg(Surv(enter, wrand, event) ~ drug)
> fit
> plot.phreg(fn="sur)
R
Package eha
R
Generating Numbers from
Specific Distributions
Rejection Method
Fast
Rejection
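Choose $x_U$ based on inverse transform of the integrated dominance function (F(x)).
Choose a uniform random number U1 in the range from 0 to the total area under the dominance function g(x).
Calculate x by setting F(x)=U1, and solving (the quadratic) for x.
[Figure: dominance function g(x) drawn above the target density f(x), with the candidate $x_U$ marked on the x axis]
Rejection
Choose a second uniform random number U2 between 0 and g($x_U$).
Reject if U2 > f($x_U$).
[Figure: g(x) and f(x) with the candidate $x_U$]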
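The accept/reject steps can be sketched as follows. To keep the example short it uses a flat dominance function rather than the sloped or Weibull-shaped ones on the slides: the target f(x) = 6x(1 - x) on [0, 1] and the constant envelope g(x) = 1.5 are both assumptions made for illustration, as are the function names.

```python
import random


def target_density(x):
    """Illustrative target f(x) = 6x(1 - x) on [0, 1] (an assumption, not from the slides)."""
    return 6.0 * x * (1.0 - x)


ENVELOPE_HEIGHT = 1.5  # constant g(x) >= f(x); 1.5 is the maximum of f


def rejection_sample(n_candidates, rng):
    """Return (accepted draws, number rejected) for n_candidates proposals."""
    accepted, rejected = [], 0
    for _ in range(n_candidates):
        x = rng.random()                      # candidate from the envelope (flat, so uniform)
        u2 = rng.random() * ENVELOPE_HEIGHT   # uniform height between 0 and g(x)
        if u2 > target_density(x):            # reject if the point lies above f(x)
            rejected += 1
        else:
            accepted.append(x)
    return accepted, rejected


accepted, rejected = rejection_sample(1000, random.Random(3))  # arbitrary seed
print(len(accepted), "In,", rejected, "Rejected")
```

The expected acceptance fraction is the area under f divided by the area under g, here 1/1.5. A tighter dominance function wastes fewer candidates.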
[Figure: histogram of accepted draws, x from -5 to 25, with Count and Frequency axes from 0 to 0.25]
315 In, 685 Rejected
Weibull Function?
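$f(x) = \frac{\kappa}{\eta} \left( \frac{x}{\eta} \right)^{\kappa - 1} \exp\left[ -\left( \frac{x}{\eta} \right)^{\kappa} \right]$ (times 2)
$F(x) = 1 - \exp\left[ -\left( \frac{x}{\eta} \right)^{\kappa} \right]$
$F(x) = u \;\Rightarrow\; x = \eta \left[ -\ln(1 - u) \right]^{1/\kappa}, \quad \eta = 6.5, \; \kappa = 1.8$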
[Figure: successive frames of the rejection sampler filling in a histogram, x from -5 to 25, with Count and Frequency axes from 0 to 0.25]
574 In, 426 Rejected
Binomial Distribution
(Bernoulli Trials are the simplest
example of the rejection method.)
Probability Pr(X=1): p
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
b=r>0.5;
print b;
quit;
b
1
0
1
1
0
0
0
0
0
1
SAS
> r=matrix(runif(10), 10,1)
> b=r<=0.5
> cbind(r,b)
[,1] [,2]
[1,] 0.4919652 1
[2,] 0.5088624 0
[3,] 0.5955355 0
[4,] 0.5243394 0
[5,] 0.5923056 0
[6,] 0.1610980 1
[7,] 0.9663659 0
[8,] 0.2548106 1
[9,] 0.4582953 1
[10,] 0.1170421 1
>
R
Simulating Outcomes from a
Logistic Model
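$\text{logit}(p) = \beta_0 + \beta_1 x$
$p = \Pr(Y = 1) = \dfrac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}$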
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(400,1,1)||(j(200,1,0) // j(200,1,1));
bta= {-1.0986 , 0.6931};
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
outcome=u<=expit;
tbl=u || d || expit || outcome ;
varnames={"u","const","treat", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc logistic data=Work.Erand DESCEND;
model OUTCOME = TREAT;
run;
SAS
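$d = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \end{bmatrix}, \quad bta = \begin{bmatrix} bta_1 \\ bta_2 \end{bmatrix}$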
SAS
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.2367 0.1693 53.3383 <.0001
treat 1 0.5510 0.2261 5.9402 0.0148
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
treat 1.735 1.114 2.702
r=matrix(runif(400), 400,1)
drug=rbind(matrix(0,200,1),matrix(1,200,1))
d=cbind(matrix(1,400,1), drug)
parms=matrix(c(-1.0986 , 0.6931),2,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r<=expit
R
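$d = \begin{bmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \end{bmatrix}, \quad parm = \begin{bmatrix} parm_1 \\ parm_2 \end{bmatrix}$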
R
> drugtrial<-glm(outcome~drug, family = binomial(link="logit"))
> summary(drugtrial)
Call:
glm(formula = outcome ~ drug, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.036 -1.036 -0.776 1.326 1.641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***
drug 0.7026 0.2158 3.255 0.00113 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 511.49 on 399 degrees of freedom
Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67
Number of Fisher Scoring iterations: 4
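The logistic simulation (compare a uniform draw with the expit of the linear predictor) can be sketched in Python with the same coefficients, -1.0986 and 0.6931. The function names are hypothetical and the seed is arbitrary. Because the only covariate is a 0/1 indicator, the fitted intercept and slope can be recovered from the two sample log-odds, so the sketch needs no model-fitting library.

```python
import math
import random


def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def simulate_logistic(drug, beta0, beta1, rng):
    """Return 0/1 outcomes with Pr(Y = 1) = expit(beta0 + beta1 * drug)."""
    return [1 if rng.random() <= expit(beta0 + beta1 * d) else 0 for d in drug]


def log_odds(outcomes):
    """Sample log-odds of a list of 0/1 outcomes."""
    p = sum(outcomes) / len(outcomes)
    return math.log(p / (1.0 - p))


drug = [0] * 200 + [1] * 200
y = simulate_logistic(drug, -1.0986, 0.6931, random.Random(400))  # arbitrary seed
b0_hat = log_odds(y[:200])            # intercept: log-odds in the untreated arm
b1_hat = log_odds(y[200:]) - b0_hat   # slope: difference in log-odds between arms
print("intercept:", b0_hat, "drug:", b1_hat)
```

With 200 subjects per arm the estimates scatter around the true values about as much as the SAS and R fits above do.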
Normally Distributed Random Numbers
Box-Muller Transform
Box-Muller Transform
$X_1, X_2$ specify a position within the unit circle
Random angle, random radius
Marsaglia Method
Standardize
Recall that Var(u)=1/12
$X = \sum_{i=1}^{12} u_i - 6$
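The three routes to a standard normal named above can be compared side by side. The sketch below is illustrative only: the function names are hypothetical, the seed is arbitrary, and the Box-Muller line follows the form used in the IML and R code in these slides, sqrt(-2 log r1) times cos(2 pi r2).

```python
import math
import random


def box_muller(rng):
    """One standard normal from a random radius and a random angle."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def marsaglia_polar(rng):
    """One standard normal from a point (x1, x2) drawn inside the unit circle."""
    while True:
        x1 = 2.0 * rng.random() - 1.0
        x2 = 2.0 * rng.random() - 1.0
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:  # points outside the circle are discarded
            return x1 * math.sqrt(-2.0 * math.log(s) / s)


def sum_of_twelve(rng):
    """Approximate normal: 12 uniforms sum to mean 6 and variance 12 * (1/12) = 1."""
    return sum(rng.random() for _ in range(12)) - 6.0


def sample_moments(draw, n, seed):
    """Sample mean and variance of n draws from the given generator function."""
    rng = random.Random(seed)
    xs = [draw(rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


for method in (box_muller, marsaglia_polar, sum_of_twelve):
    print(method.__name__, sample_moments(method, 20000, seed=12))
```

All three should report a mean near 0 and a variance near 1. The sum of twelve uniforms can never fall outside -6 to 6, so its tails are only approximate.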
Mixture distributions
Generating Multivariate Normal
Random Numbers
Desired Covariance Matrix: $V$
$V = R'R$, where $R$ is the Cholesky Decomposition of $V$
Begin with independent standard normal RVs: $Z \sim N(0,1)$
Correlated (Multivariate) Normal RVs: $X = R'Z$
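A dependency-free sketch of the Cholesky route to correlated normals, using the same correlation matrix and standard deviations (53, 36, 12, 47) as the SAS and R code in these slides. The function names are hypothetical, and Python's `random.gauss` stands in for the Box-Muller step used there, which is a substitution made for brevity.

```python
import math
import random


def cholesky_upper(v):
    """Return upper-triangular R with V = R'R for a symmetric positive-definite V."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r


def correlated_normals(r, rng):
    """Return one draw X = R'Z, where Z holds independent N(0, 1) values."""
    n = len(r)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(r[k][i] * z[k] for k in range(n)) for i in range(n)]


corr = [[1.0, 0.3, 0.2, 0.1],
        [0.3, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.3],
        [0.1, 0.2, 0.3, 1.0]]
sd = [53.0, 36.0, 12.0, 47.0]
cov = [[corr[i][j] * sd[i] * sd[j] for j in range(4)] for i in range(4)]

upper = cholesky_upper(cov)
for row in upper:
    print([round(value, 4) for value in row])

rng = random.Random(4)  # arbitrary seed
print(correlated_normals(upper, rng))
```

The printed factor can be compared with the `upr` and `rr` matrices produced by SAS `half()` and R `chol()`.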
Generating Multivariate Normal
Random Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={53 36 12 47};
cvmat=rmat#(sigvec`*sigvec);
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
/* Lets be wasteful */
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
z1=z1*upr;
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
53 36 12 47
cvmat
2809 572.4 127.2 249.1
572.4 1296 129.6 338.4
127.2 129.6 144 169.2
249.1 338.4 169.2 2209
upr
53 10.8 2.4 4.7
0 34.341811 3.0190603 8.3757958
0 0 11.36333 11.672016
0 0 0 44.503035
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299
x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335
x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908
x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.30338 0.20341 0.11397
<.0001 <.0001 0.0003
x2 0.30338 1.00000 0.28186 0.24150
<.0001 <.0001 <.0001
x3 0.20341 0.28186 1.00000 0.34421
<.0001 <.0001 <.0001
x4 0.11397 0.24150 0.34421 1.00000
0.0003 <.0001 <.0001
SAS
Generating Multivariate Normal
Random Numbers
cmat<-rbind(c(1, .3, .2, .1), c(.3, 1, .3, .2), c(.2, .3, 1, .3) , c(.1, .2, .3, 1))
sigvec=c(53, 36, 12, 47)
vv=cmat*(sigvec%*%t(sigvec))
rr=chol(vv)
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
z1=sqrt(-2*log(r1))*cos(2*pi*r2)
rvs=z1%*%rr
R
> cmat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.3 0.2 0.1
[2,] 0.3 1.0 0.3 0.2
[3,] 0.2 0.3 1.0 0.3
[4,] 0.1 0.2 0.3 1.0
> vv
[,1] [,2] [,3] [,4]
[1,] 2809.0 572.4 127.2 249.1
[2,] 572.4 1296.0 129.6 338.4
[3,] 127.2 129.6 144.0 169.2
[4,] 249.1 338.4 169.2 2209.0
R
> rr
[,1] [,2] [,3] [,4]
[1,] 53 10.80000 2.400000 4.700000
[2,] 0 34.34181 3.019060 8.375796
[3,] 0 0.00000 11.363330 11.672016
[4,] 0 0.00000 0.000000 44.503035
> cov(rvs)
[,1] [,2] [,3] [,4]
[1,] 2832.4200 561.2585 134.0656 533.7351
[2,] 561.2585 1235.7616 124.2373 382.5441
[3,] 134.0656 124.2373 127.4132 160.2173
[4,] 533.7351 382.5441 160.2173 2205.5903
> cor(rvs)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.2999969 0.2231676 0.2135426
[2,] 0.2999969 1.0000000 0.3130961 0.2317137
[3,] 0.2231676 0.3130961 1.0000000 0.3022317
[4,] 0.2135426 0.2317137 0.3022317 1.0000000
> sd(rvs)
[1] 53.22048 35.15340 11.28774 46.96371
Subject-specific Random Effects
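We have an error term ($e_{ij}$) for measurement j in subject i.
Subject-specific random intercept $K_i$, shared by every measurement on subject i:
$\text{logit}\,\Pr(y_{ij} = 1) = \beta_0 + \beta_1\,\text{treat}_i + \beta_2\,t_{ij} + K_i$
$K_i \sim N(0, 1)$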
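The key step in simulating a subject-specific effect is to draw it once per subject and reuse it for each of that subject's measurements. The sketch below illustrates this with the coefficient values from the IML code (-1.0986, 0.6931, 0.4055, and a unit-variance subject effect). The function name `simulate_random_intercept` and the seed are assumptions.

```python
import math
import random


def simulate_random_intercept(n_subjects, n_visits, beta, sigma_k, rng):
    """Return rows (id, treat, t, outcome) from a logistic model with subject effect K_i."""
    b0, b1, b2 = beta
    rows = []
    for i in range(n_subjects):
        treat = 1 if i >= n_subjects // 2 else 0
        k_i = rng.gauss(0.0, sigma_k)  # one draw per subject, shared by all visits
        for t in range(n_visits):
            eta = b0 + b1 * treat + b2 * t + k_i
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((i + 1, treat, t, 1 if rng.random() <= p else 0))
    return rows


rows = simulate_random_intercept(200, 3, (-1.0986, 0.6931, 0.4055), 1.0,
                                 random.Random(5))  # arbitrary seed
print(len(rows), "rows; first subject:", rows[:3])
```

The rows come out sorted by subject and time, which is the layout a mixed-model routine that requires sorted input expects.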
proc iml; /* begin IML session */
u = j(600,1,.);
d1=j(100,1,0)//j(100,1,1);
d1=d1//d1//d1;
id=j(200,1,0);
do i=1 to 200 by 1;
id[i,1]=i;
end;
id=id//id//id;
t=j(200,1,0)//j(200,1,1)//j(200,1,2);
k=j(200,1,.);
call randgen(k,'normal');
k=k//k//k;
bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
y=u<=expit;
tbl=id||u || d || expit || y ;
varnames={"id","u","const","treat","t","k", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
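d: columns are a constant (1), the treatment indicator (0 or 1), and the subject effect k, with the 200-subject block stacked three times
bta: $(bta_1, bta_2, 1)'$
id: the subject IDs, stacked three times (ID, ID, ID), matching k (k, k, k)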
SAS
. xtlogit outcome treat t, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 11.07
Log likelihood = -394.754 Prob > chi2 = 0.0040
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | .6023501 .2279107 2.64 0.008 .1556533 1.049047
t | .235297 .1123071 2.10 0.036 .015179 .4554149
_cons | -.9334699 .2040373 -4.57 0.000 -1.333376 -.5335642
-------------+----------------------------------------------------------------
/lnsig2u | -.1281394 .3971684 -.9065751 .6502964
-------------+----------------------------------------------------------------
sigma_u | .9379396 .18626 .6355353 1.384236
rho | .2109869 .0661172 .1093476 .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 13.01 Prob >= chibar2 = 0.000
. di -394.754*(-2)
789.508
Stata, SAS data
Finally got this to run in SAS. I had forgotten that SAS requires you to sort.
Stata does not require sorted data for their mixed models.
proc sort data=erand;
by id t;
run;
proc nlmixed data=erand qpoints=5 ;
parms b0=0 b1=-.7 b2=.6 sig=0 ;
theta2 = b0+b1*treat+b2*t+u;
prb= exp(theta2)/(1+exp(theta2));
model outcome ~ binary(prb);
random u ~normal(0,sig) subject=id ;
run;
SAS
The NLMIXED Procedure
Fit Statistics
-2 Log Likelihood 789.5
AIC (smaller is better) 797.5
AICC (smaller is better) 797.6
BIC (smaller is better) 810.7
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient
b0 -0.9332 0.2039 199 -4.58 <.0001 0.05 -1.3354 -0.5310 0.00039
b1 0.6021 0.2278 199 2.64 0.0089 0.05 0.1529 1.0513 -0.00007
b2 0.2353 0.1123 199 2.10 0.0374 0.05 0.01382 0.4567 0.000737
sig 0.8781 0.3480 199 2.52 0.0124 0.05 0.1917 1.5644 0.000363
We see tiny differences between this and Stata results, owing to differences in optimization
specs.
SAS
id=matrix(seq(1:200), 200,1)
k1=matrix(runif(200), 200,1)
k2=matrix(runif(200), 200,1)
k=sqrt(-2*log(k1))*cos(2*pi*k2)
id=rbind(id,id,id)
k=rbind(k,k,k)
drug =rbind(matrix(0,100,1), matrix(1,100,1))
drug=rbind(drug,drug,drug)
d=cbind(matrix(1,600,1),drug,k)
parms=matrix(c(-1.0986 , 0.6931,1),3,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
r=matrix(runif(600), 600,1)
outcome=r<=expit
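d: columns are a constant (1), the treatment indicator (0 or 1), and the subject effect k, with the 200-subject block stacked three times
parms: $(parm_1, parm_2, 1)'$
id: the subject IDs, stacked three times (ID, ID, ID), matching k (k, k, k)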
R
. xtlogit outcome drug, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(1) = 14.55
Log likelihood = -368.112 Prob > chi2 = 0.0001
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .8331252 .2183881 3.81 0.000 .4050924 1.261158
_cons | -1.240213 .1708219 -7.26 0.000 -1.575018 -.9054085
-------------+----------------------------------------------------------------
/lnsig2u | -.6325958 .5624138 -1.734907 .469715
-------------+----------------------------------------------------------------
sigma_u | .7288423 .2049555 .4200198 1.264729
rho | .1390212 .0673177 .050895 .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5.08 Prob >= chibar2 = 0.012
R
Got this to run in R, using a mixed effects package called Zelig.
z.out1 <- zelig(outcome ~ drug + tag(1 | id),data=NULL, model="logit.mixed")
Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic
model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical
Software," http://gking.harvard.edu/zelig
summary(z.out1)
Generalized linear mixed model fit by the Laplace approximation
Formula: outcome ~ drug + tag(1 | id)
AIC BIC logLik deviance
743.4 756.6 -368.7 737.4
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.39486 0.62838
Number of obs: 600, groups: id, 200
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2172 0.1511 -8.054 8.02e-16 ***
drug 0.8174 0.2023 4.041 5.32e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
drug -0.747
R
General Approach to Correlated
Multivariate Random Numbers
Copulas
Simulating Weibull Regression Data, with
Time-Dependency in Drug Effect
$f(t) = \exp\left[ -\exp(-2.30 + 0.3 \cdot drug - 0.004 \cdot drug \cdot t)\, t^{0.67} \right] - P$
Solve $f(t) = 0$
$f'(t) \approx \dfrac{f(t + d) - f(t)}{d}$
$t_1 = t_0 - \dfrac{f(t_0)}{f'(t_0)}$
clear
set obs 400
gen drug=_n>200
gen double P=uniform()
gen double t=1
gen double tpd=t+.0001
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001
forvalues i=1/50 {
qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
qui replace slope=(fp-f)/0.0001
qui replace t=t-f/slope
qui replace tpd=t+.0001
}
(Stata)
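The Stata loop applies Newton-Raphson with a forward-difference slope to every observation at once. A scalar Python sketch of the same iteration is given below, assuming the same start value (t = 1), step (0.0001), and 50 iterations. The function names are hypothetical. The floor that keeps t positive is an added safeguard, not part of the Stata code, and the sketch does not check that a root exists for a given P.

```python
import math


def survival_gap(t, drug, p):
    """f(t) = S(t) - P for the time-dependent Weibull model."""
    hazard_term = math.exp(-2.30 + 0.3 * drug - 0.004 * drug * t)
    return math.exp(-hazard_term * t ** 0.67) - p


def solve_time(drug, p, t0=1.0, d=1e-4, iterations=50):
    """Newton-Raphson with a forward-difference slope, as in the Stata loop."""
    t = t0
    for _ in range(iterations):
        f = survival_gap(t, drug, p)
        slope = (survival_gap(t + d, drug, p) - f) / d
        t = max(t - f / slope, 1e-8)  # added safeguard (assumption): keep t positive
    return t


for drug in (0, 1):
    t = solve_time(drug, 0.5)
    print("drug =", drug, "t =", t, "f(t) =", survival_gap(t, drug, 0.5))
```

For drug = 0 the model reduces to an ordinary Weibull, so the result can be checked against the closed form $t = [-\ln(P)/\exp(-2.30)]^{1/0.67}$.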
Matlab
>> drug=[zeros(1000,1);ones(1000,1)];
>> P=rand(2000,1);
>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));
>> outcome=P<=cdf0;
>> b = glmfit(drug,outcome,'binomial')
b =
-1.0616
0.6562
Matlab
[Figure: Kaplan-Meier survival estimates, drug = 0 versus drug = 1; x axis: analysis time, 0 to 200; y axis: 0.00 to 1.00]
(Stata)
. gen P=uniform()
. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))
. list in 1/10
+----------------------------+
| drug P cdf0 |
|----------------------------|
1. | 0 .2865897 .2500023 |
2. | 0 .3788754 .2500023 |
3. | 1 .3597057 .3999916 |
4. | 1 .7182508 .3999916 |
5. | 1 .4315197 .3999916 |
|----------------------------|
6. | 1 .2963237 .3999916 |
7. | 1 .7961193 .3999916 |
8. | 0 .056983 .2500023 |
9. | 0 .4622037 .2500023 |
10. | 0 .5336403 .2500023 |
+----------------------------+
(Stata)
. gen outcome=P<=cdf0
. list in 1/10
+--------------------------------------+
| drug P cdf0 outcome |
|--------------------------------------|
1. | 0 .2865897 .2500023 0 |
2. | 0 .3788754 .2500023 0 |
3. | 1 .3597057 .3999916 1 |
4. | 1 .7182508 .3999916 0 |
5. | 1 .4315197 .3999916 0 |
|--------------------------------------|
6. | 1 .2963237 .3999916 1 |
7. | 1 .7961193 .3999916 0 |
8. | 0 .056983 .2500023 1 |
9. | 0 .4622037 .2500023 0 |
10. | 0 .5336403 .2500023 0 |
+--------------------------------------+
(Stata)
. gen outcome=P<=cdf0
. logistic outcome drug
Logistic regression Number of obs = 2000
LR chi2(1) = 58.65
Prob > chi2 = 0.0000
Log likelihood = -1245.3138 Pseudo R2 = 0.0230
--------------------------------------------------------------------
outcome | OR SE z P>|z| [95% CI]
-------------+------------------------------------------------------
drug | 2.084 0.202 7.57 0.000 1.723 2.519
--------------------------------------------------------------------
_cons | -1.077 0.073 -14.83 0.000 -1.220 -0.935
--------------------------------------------------------------------
.
(Stata)