
BSTA 670 Statistical Computing

27 October 2010
Lecture 14:
Random Number Generation
Presented by:
Paul Wileyto, Ph.D.
Anyone who uses software to produce random
numbers is in a state of sin.
John von Neumann
A good analog
random number
generator.
Why do we need Random
Numbers?

Simulation Input

Statistical Sampling

Assignment in Trials

Games
Where do you get random numbers?

Uniform Random Numbers

Published Tables

Make Them

Computer Algorithms

Harvest from Nature

Random draws from a specific distribution

Make them from Uniform Random Numbers
Software Random Number Generators

There are no true random number generators, but there are Pseudo-Random Number Generators

Computers have only a limited number of bits to represent a number

Sooner or later, the sequence of random numbers will repeat itself (period of the generator)

The trick is to be good enough to look like random numbers
Algorithms for Uniform Random Numbers
Good pseudo-random numbers:

Independent of the previous number

Long period

Sequence reproducible if started with


same initial conditions

Fast
Good pseudo-random numbers:

Equal probability for any number inside


interval [a,b]
Probability Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$$

We are interested primarily in


uniform random numbers in the
interval [0,1].

We'll refer to the realization of a uniform random number over [0,1] as U.

Many of the algorithms produce integer


valued random numbers over interval
[0,b].

Transform to interval [0,1]


Linear Congruential Generator (LCG)
Most common
$$X_n = (a X_{n-1} + c) \bmod m$$
$X_0$ = seed, modulus m (large prime), multiplier a, and increment c
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.
Mod in SAS
proc iml; /* begin IML session */
q={20,30,40,50,70,90,160};
t=mod(q,7);
qt=q||t;
print qt; /* print matrix */
quit;
qt
20 6
30 2
40 5
50 1
70 0
90 6
160 6
SAS
Mod in R
q<-matrix(seq(10,100, by=10),10,1)
qm=q%%13
qall<-cbind(q,qm)
qall
[,1] [,2]
[1,] 10 10
[2,] 20 7
[3,] 30 4
[4,] 40 1
[5,] 50 11
[6,] 60 8
[7,] 70 5
[8,] 80 2
[9,] 90 12
[10,] 100 9
R
proc iml; /* begin IML session */
seed = 123456;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
SAS
RNGkind()
[1] "Mersenne-Twister" "Inversion"
set.seed(as.integer(format(Sys.time(), "%S%M%H")))
c<-matrix(runif(5),5,1)
c
[,1]
[1,] 0.9919911
[2,] 0.2598466
[3,] 0.1818524
[4,] 0.3357782
[5,] 0.2754353
Unit Random Variates in R
R
proc iml; /* begin IML session */
seed = 0;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
Set seed to 0 to
grab a seed value
from the system
clock.
SAS
RANUNI() and IML UNIFORM() use a multiplicative
linear congruential generator (from SAS docs) where
SEED = mod( SEED * 397204094, 2**31-1 )
and then returns
SEED / (2**31-1)
SAS
Testing Randomness

Is it Uniform?
[Figure: two histograms of uniform random numbers on the interval 0 to 1, one with bin counts up to about 300 and one with bin counts up to about 3 x 10^4]
Testing Randomness

Generate two sets


and plot against
each other

Might see
correlation in higher
dimensions
Plot X_i versus X_{i+k} for serial correlation
[Figure: two scatter plots of uniform draws on the unit square, y versus x and y2 versus x1]
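The two checks above (is it uniform, and is there serial correlation) can be sketched numerically as well as graphically. The following is a minimal illustration in Python using only the standard library; the function names `chi_square_uniform` and `lag_correlation`, the choice of 10 bins, and the lag of 1 are assumptions made for this sketch and do not come from the lecture.

```python
import random

def chi_square_uniform(u, bins=10):
    """Chi-square statistic comparing binned counts of u with a flat histogram."""
    counts = [0] * bins
    for x in u:
        counts[min(int(x * bins), bins - 1)] += 1
    expected = len(u) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def lag_correlation(u, k=1):
    """Pearson correlation between X_i and X_{i+k}."""
    a, b = u[:-k], u[k:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(12345)            # Python's generator is a Mersenne Twister
u = [rng.random() for _ in range(10000)]
chi2 = chi_square_uniform(u)          # compare with a chi-square on bins - 1 df
r1 = lag_correlation(u, k=1)          # should be close to zero
print(chi2, r1)
```

A large chi-square value (relative to 9 degrees of freedom here) would suggest the histogram is not flat, and a lag correlation far from zero would suggest sequential dependence. Neither check replaces the plots, which can reveal banding that a single summary number misses.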
Linear Congruential Generator

The good

Fast

Up to period of m random numbers

The Bad

Sequential correlation

Plots in more than 1 dimension do not fill in the


space uniformly, but tend to form bands

Not cryptographically secure

Selections of m, a, and c are important


Linear Congruential Generator

Good magic number for linear congruential method:

a = 16,807, c = 0, M = 2,147,483,647
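As a rough sketch of how these constants are used, the generator below iterates X_n = (a X_{n-1} + c) mod M and scales each state to the unit interval. It is written in Python for illustration; the function names `lcg_next` and `lcg_uniform` are hypothetical, and dividing by M to reach [0,1] follows the SAS description earlier in the lecture rather than anything stated on this slide.

```python
A = 16807            # multiplier a from the slide
C = 0                # increment c from the slide
M = 2147483647       # modulus M from the slide, equal to 2**31 - 1

def lcg_next(x):
    """One step of the linear congruential recurrence."""
    return (A * x + C) % M

def lcg_uniform(seed, n):
    """Return n values in (0, 1) by scaling successive states by M."""
    x = seed
    out = []
    for _ in range(n):
        x = lcg_next(x)
        out.append(x / M)
    return out

# Walk the state forward 10,000 steps from seed 1.
x = 1
for _ in range(10000):
    x = lcg_next(x)
print(x)
print(lcg_uniform(1, 3))
```

Starting from seed 1, the first two states are 16807 and 282475249, and the 10,000th state should be 1043618065, the check value commonly quoted for this generator. A seed of 0 must be avoided when c = 0, because the state would then stay at 0 forever.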
Overflow Method for integers

Multiply two 32-bit numbers to get a 64


bit integer, that cannot be represented
in 32-bit space.

Low order 32 bits remain after the


overflow.

Divide by 2^32 to get floating point values between 0 and 1.

Very Fast

$$I_{j+1} = a I_j + c$$
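The overflow idea can be imitated in a language without fixed-width integers by masking to the low-order 32 bits. This is a sketch only: Python integers do not overflow, so the mask stands in for the hardware behaviour, and the constants a = 1664525 and c = 1013904223 are an assumption (a commonly published choice for a 32-bit generator), not values given in the lecture.

```python
MASK32 = 0xFFFFFFFF          # keeps only the low-order 32 bits

def overflow_lcg(i, a=1664525, c=1013904223):
    """I_{j+1} = a*I_j + c, with the high-order bits discarded as on overflow."""
    return (a * i + c) & MASK32

def to_unit(i):
    """Divide by 2**32 to map a 32-bit state into [0, 1)."""
    return i / 2**32

state = 0
values = []
for _ in range(5):
    state = overflow_lcg(state)
    values.append(to_unit(state))
print(values)
```

Masking with `0xFFFFFFFF` is the same as taking the result mod 2^32, which is why no explicit modulus operation (and no division) is needed in the recurrence itself.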
Blum, Blum, Shub

Very slow

Not suited to simulation

Passes all tests

Cryptographically secure
$$X_{n+1} = X_n^2 \bmod M, \quad M = pq,$$
where p and q are large primes
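A toy version of the recurrence shows the mechanics. The primes 11 and 23 below are far too small for any real use and are chosen only so the arithmetic can be checked by hand; requiring both primes to be congruent to 3 mod 4 and taking the lowest bit of each state as output are the usual conventions for this generator and are assumptions here, since the slide does not state them. The function name `bbs_states` is hypothetical.

```python
def bbs_states(seed, p, q, n):
    """Return n successive states of X_{n+1} = X_n**2 mod (p*q)."""
    m = p * q
    x = seed
    out = []
    for _ in range(n):
        x = (x * x) % m
        out.append(x)
    return out

# Toy parameters: p = 11 and q = 23 are both congruent to 3 mod 4, M = 253.
states = bbs_states(3, 11, 23, 6)
bits = [x & 1 for x in states]     # one output bit per squaring
print(states)   # [9, 81, 236, 36, 31, 202]
print(bits)     # [1, 1, 0, 0, 1, 0]
```

Each output bit costs a modular squaring of a very large number when realistic primes are used, which is why the method is slow and not suited to simulation.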

Mersenne Twister

By Matsumoto and Nishimura (1997)

Caused a great deal of excitement in


1997.

Good statistical properties

Not good for cryptography

SAS IML RANDGEN function

Default technique for R runif()


Mersenne Twister

I'm just going to give you the flavor of it

It's a bit shifting algorithm


32 bit word:

0 0 0 0 1 1 1 1
Mersenne Twister

XOR

Logical bitwise comparison


function

Compares two bits

If they are different, value is


1
If they are the same, value is
zero
>> a=[0 0 1 1]
a =
0 0 1 1
>> b=[0 1 1 0]
b =
0 1 1 0
>> c=xor(a,b)
c =
0 1 0 1
>>
MATLAB
> a<-c(0, 0, 1, 1)
> b<-c(0, 1, 1, 0)
> c<-xor(a,b)
> c
[1] FALSE TRUE FALSE TRUE
> as.integer(c)
[1] 0 1 0 1
R
Mersenne Twister

Bit shifting algorithm


Use XOR function to flip values
32 bit word:

0 0 0 1 1 1 1 1
XOR
Mersenne Twister

Use 624 32 bit words to make one


19937 bit word (623*32 + 1)

XOR flip function in each 32-bit word


32 bit word:

0 0 0 0 1 1 1 1
XOR
To
next
word
From
last
word
[Figure: Mersenne Twister diagram. From: John Savard's Cryptology Page, http://www.quadibloc.com]
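To give a feel for "shift, then XOR" on a 32-bit word, here is a small sketch in Python. It is not the Mersenne Twister: it is Marsaglia's much simpler xorshift generator, swapped in because it uses the same two bit operations on a single word. The shift amounts 13, 17 and 5 are the commonly published ones for the 32-bit variant, and the name `xorshift32` is an assumption for this sketch.

```python
MASK32 = 0xFFFFFFFF   # confine results to a 32-bit word

def xorshift32(x):
    """One step of a 32-bit xorshift generator: three shift-and-XOR passes."""
    x ^= (x << 13) & MASK32   # shift left, XOR back into the word
    x ^= x >> 17              # shift right, XOR back into the word
    x ^= (x << 5) & MASK32    # shift left again, XOR back into the word
    return x & MASK32

xor_example = 0b0011 ^ 0b0110   # same inputs as the XOR slides: gives 0b0101
first = xorshift32(1)
print(bin(xor_example), first)  # 0b101 270369
```

The Mersenne Twister applies the same kind of operations across an array of 624 words, which is what gives it its very long period.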
Mersenne Twister

By Matsumoto and Nishimura (1997)

Mersenne Prime Numbers (powers of 2, minus 1) give period length: 2^19937 - 1 for 32 bit numbers

Free C source code

Fast

Passes all randomness smell tests

Not cryptographically secure


proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r

0.0151013
0.5743561
0.5829185
0.6437729
0.1823678
0.3977417
0.476881
0.9845982
0.3211301
0.9623223
SAS
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.14645262
[2,] 0.04558767
[3,] 0.79254901
[4,] 0.57810786
[5,] 0.57831079
[6,] 0.30258424
[7,] 0.08682622
[8,] 0.77980499
[9,] 0.34161593
[10,] 0.98705945
R
Both R and SAS automatically grab a seed value
from the system clock at first use, unless you call
set.seed (in R) or randseed (in SAS) to set a specific
starting point
Grabbing a Seed from the
System Clock (SAS)
SAS
proc iml; /* begin IML session */
call randseed(12345);
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r

0.5832971
0.9936254
0.5878877
0.8574689
0.8246889
0.2805668
0.6473969
0.3819192
0.4489572
0.8757847
SAS
> set.seed(12345)
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.7209039
[2,] 0.8757732
[3,] 0.7609823
[4,] 0.8861246
[5,] 0.4564810
[6,] 0.1663718
[7,] 0.3250954
[8,] 0.5092243
[9,] 0.7277053
[10,] 0.9897369
R
Obtaining Random Numbers from
Specific Distributions

Inverse Probability Transform


Methods

Rejection Methods

Mixed Rejection and Transform

Methods for Correlated Random


Numbers
Obtaining Random Numbers from
Specific Distributions

Inverse Probability Transform methods

Let X be a random variable described by CDF F(X)

We wish to generate values of X distributed


according to F(X).

Given a continuous Uniform Random Variable U in [0,1], the Random Variable X = F^{-1}(U) has CDF F, where

$$F^{-1}(u) = \inf\{x \mid F(x) \ge u\}, \quad 0 < u < 1$$
What this means is:

Solve for X in the CDF, so that when


you plug in U for F, you get a random
number from that specific distribution.
Example: Exponential Distribution
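$$f(x) = \lambda e^{-\lambda x}\, I_{(0,\infty)}(x), \qquad F(x) = \left(1 - e^{-\lambda x}\right) I_{(0,\infty)}(x)$$
Let $y = 1 - e^{-\lambda x}$.
Solving for x: $x = -\dfrac{\log(1-y)}{\lambda}$
So $F^{-1}(y) = -\dfrac{\log(1-y)}{\lambda}$,
which means that $-\log(1-U)/\lambda$ is an rv distributed as Exponential($\lambda$).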


proc iml; /* begin IML session */
u = j(1000,1,.);
call randgen(u,'uniform');
exrand=-log(1-u)/.04;
tbl=u||exrand;
print tbl;
varnames={"u","erand"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc means data=erand;
var u erand;
run;

title 'Analysis Exponential RVs';
proc univariate data=erand noprint;
histogram erand / midpoints=5 to 205 by 10 exp;
run;
tbl
0.115794 3.0766303
0.754043 35.064963
0.157732 4.2914261
0.0431113 1.1017045
0.1086405 2.8751851
0.2632565 7.6378865
0.9448316 72.434124
0.3589581 11.116512
0.7109185 31.026164
0.4665676 15.710572
The MEANS Procedure
Variable      N       Mean    Std Dev    Minimum     Maximum
u          1000     0.4819     0.2877     0.0010      0.9996
erand      1000    23.4993    23.7694     0.0240    194.5330
> r=matrix(runif(10000), 10000,1)
> exrand=-log(1-r)/.04
> hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
R
[Figure: Histogram of exrand; x-axis exrand from 0 to 250, y-axis Density from 0.000 to 0.020]
R
Weibull Survival
Survival Time: $S(t) = U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left[\dfrac{-\ln(U)}{\lambda}\right]^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$
SAS
proc iml; /* begin IML session */
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
Quit;
proc means data=wrand;
var u weibrand;
run;
title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;
SAS
SAS
R
> r=matrix(runif(10000), 10000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> hist(wrand, freq = FALSE, main = paste("Histogram of
Survival Times"), breaks=50, xlab = "Survival Time")
R
R
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

time WEIBRAND * CENS (0);

run; quit;

goptions reset=all device=WIN;

data work._surv; set work._surv;

if survival > 0 then _lsurv = -log(survival);

if _lsurv > 0 then _llsurv = log(_lsurv);

run;

** Survival plots **;

goptions reset=symbol;

goptions ftext=SWISS ctext=BLACK htext=1 cells;

proc gplot data=work._surv ;

label weibrand = 'Survival Time';

axis2 minor=none major=(number=6)

label=(angle=90 'Survival Distribution Function');

symbol1 i=stepj c=BLUE l=1 width=1;

plot survival * weibrand=1 /

description="SDF of weibrand"

frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';

run;

symbol1 i=join c=BLUE l=1 width=1;

quit;

goptions ftext= ctext= htext= reset=symbol;

SAS
SAS
> library(survival)
> r=matrix(runif(1000), 1000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> event=wrand<=200
> wrand2=wrand*(event)+200*(1-event)
> fit <- survfit(Surv(wrand2, event) ~ 1)
> plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")
R
R
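Simulating Weibull Regression Data, with Proportional Hazards
Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\beta_0 + \beta_1 x)$
Inverse Prob Transform: $t = \left[\dfrac{-\ln(P)}{\lambda}\right]^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = \exp(\beta_0 + \beta_1 x)$
$\beta_0 = \ln(0.001) \approx -6.91$
$HR(\text{Drug}) = 0.5, \quad \beta_1 = \ln(0.5) \approx -0.69$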
SAS
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(200,1,0) // j(200,1,1);
call randgen(u,'uniform');
wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
c = wrand<=200;
wrand=wrand#c + 200*(1-c); /* # is element-wise multiplication; censored times are set to 200 */
tbl=u || wrand || d || c ;
print tbl;
varnames={"u","weibrand","treat", "cens"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;
Data: drug treatment (0,1)
Constant value, drug effect * drug
SAS
options pageno=1;
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

time WEIBRAND * CENS (0); strata TREAT;

run; quit;
goptions reset=all device=WIN;

data work._surv; set work._surv;

if survival > 0 then _lsurv = -log(survival);

if _lsurv > 0 then _llsurv = log(_lsurv);

run;
** Survival plots **;
title;
footnote;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;

proc gplot data=work._surv ;

label weibrand = 'Survival Time';

axis2 minor=none major=(number=6)

label=(angle=90 'Survival Distribution Function');

symbol1 i=stepj l=1 width=1; symbol2 i=stepj l=2 width=1; symbol3 i=stepj l=3 width=1;

plot survival * weibrand = treat /

description="SDF of weibrand by treat"

frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';

run;
symbol1 i=join l=1 width=1; symbol2 i=join l=2 width=1; symbol3 i=join l=3 width=1;

quit;
SAS
SAS
Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio     202.5356    1       <.0001
Score                209.0491    1       <.0001
Wald                 201.2807    1       <.0001

Analysis of Maximum Likelihood Estimates

Parameter   DF   Parameter Estimate   Standard Error   Chi-Square   Pr > ChiSq   Hazard Ratio
treat        1             -0.70413          0.04963     201.2807       <.0001          0.495
*** Proportional Hazards Models *** ;


options pageno=1;

proc phreg data=Work.Wrand;


model WEIBRAND * CENS (0) = TREAT;


run; quit;
SAS
R
r=matrix(runif(400), 400,1)
drug=rbind(matrix(1,200,1),matrix(0,200,1))
wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)
event = wrand<=200;
wrand=wrand*event + 200*(1-event)
survreg(Surv(wrand, event) ~ drug, dist='weibull', model=TRUE, scale=1)
Call:
survreg(formula = Surv(wrand, event) ~ drug, dist = "weibull",
scale = 1, model = TRUE)
Coefficients:
(Intercept) drug
4.6042014 0.5978962
Scale fixed at 1
Loglik(model)= -1918.1 Loglik(intercept only)= -1932.6
Chisq= 29.06 on 1 degrees of freedom, p= 7e-08
n= 400
lsurv2 <- survfit(Surv(wrand, event) ~ drug, type='fleming')
plot(lsurv2, lty=2:3,xlab = "Days", ylab="Survival")
Data, drug treatment (0,1)
Constant Value
Drug effect * drug
R
Package survival
R
Call:
phreg(formula = Surv(enter, wrand, event) ~ drug)
Covariate W.mean Coef Exp(Coef) se(Coef) Wald p
drug 0.586 -0.731 0.481 0.113 0.000
log(scale) 4.663 105.903 0.050 0.000
log(shape) 0.402 1.495 0.047 0.000
Events 327
Total time at risk 44359
Max. log. likelihood -1886.9
LR test statistic 42.4
Degrees of freedom 1
Overall p-value 7.38224e-11
> enter=matrix(0,400,1)
> fit <- phreg(Surv(enter, wrand, event) ~ drug)
> fit
> plot(fit, fn="sur")
R
Package eha
R
Generating Numbers from
Specific Distributions

Rejection Method

Fast

Good for Count Models

Good when you cannot find F^{-1}, but have f(x)

Generally Use Pairs of Random Numbers

Just like playing the game Battleship


The Rejection Method is Like Playing the Game Battleship
Rejection

Choose pairs of uniform random numbers
x_U between X_min and X_max
y_U between Y_min and Y_max
Reject x_U if y_U > f(x) at x_U
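Rejection
[Figure: f(x) drawn inside the rectangle from X_min to X_max and Y_min to Y_max; a point under the curve is a hit, points above it are misses]
Sample the area (two dimensions) containing the probability distribution or density function uniformly.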
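The hit-or-miss scheme can be sketched in a few lines. The target density below, f(x) = 6x(1 - x) on [0, 1] with Y_max = 1.5, is an assumption chosen only because its peak height is easy to verify; the function name `rejection_sample` and the seed are likewise hypothetical.

```python
import random

def f(x):
    """Illustrative target density on [0, 1]; its maximum is 1.5 at x = 0.5."""
    return 6.0 * x * (1.0 - x)

def rejection_sample(n_pairs, x_min=0.0, x_max=1.0, y_min=0.0, y_max=1.5, seed=2010):
    """Draw n_pairs points uniformly in the rectangle and keep x when the point is a hit."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_pairs):
        x_u = rng.uniform(x_min, x_max)
        y_u = rng.uniform(y_min, y_max)
        if y_u <= f(x_u):          # under the curve: hit, keep x_u
            hits.append(x_u)
    return hits

hits = rejection_sample(20000)
accept_rate = len(hits) / 20000    # area under f divided by area of the rectangle
mean_hit = sum(hits) / len(hits)
print(accept_rate, mean_hit)
```

The acceptance rate estimates the ratio of the area under f(x) to the area of the rectangle (here 1 / 1.5, about two thirds), so a taller or wider rectangle wastes more draws.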
Rejection

Simple version becomes inefficient if


the rejection area is large.
[Figure (Matlab): rejection sampling of a binomial count (p = 0.2, trials = 20); frequency versus count, with g(x) and a large dead zone marked; 239 in, 761 rejected]
Rejection

Can be made more efficient by uniform


sampling over a smaller target area.
g(x)
Smaller
Dead Zone
The trick is to sample uniformly over
the smaller area.
Rejection
First, define "dominating function" f(x), and corresponding integral or Cumulative Distribution F(x).
F(x) need not be normalized.
[Figure: g(x) under the dominating function f(x), leaving a smaller dead zone]
Rejection
$$f(x) = a - bx, \quad 0 \le x \le \frac{a}{b}$$
$$F(x) = ax - \frac{b}{2}x^2$$
$$F_{Max} = F\left(\frac{a}{b}\right) = \frac{a^2}{2b}$$
Rejection
Choose x_U based on inverse transform of the integrated dominance function (F(x)).
Choose a uniform random number U1 in the range: $0 \le U_1 \le \dfrac{a^2}{2b}$
Calculate x by setting F(x) = U1, and solving (the quadratic) for x.
Rejection

Evaluate f(x), choose a second uniform random number U2 between 0 and f(x).

Reject if U2 > g(x)
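A sketch of rejection sampling under a linear dominating function f(x) = a - bx follows. Everything specific here is an assumption for illustration: the target g(x) = (pi/2) cos(pi x / 2) on [0, 1], the choice a = b = pi^2 / 4 (which keeps f(x) >= g(x) on the whole interval), and the function names. The first draw inverts F(x) = ax - (b/2)x^2 by solving the quadratic; the second draw is uniform between 0 and f(x) and is kept only when it falls under g(x).

```python
import math
import random

A = math.pi ** 2 / 4      # intercept a of the dominating line f(x) = a - b*x
B = A                     # slope b, so f reaches zero at x = a/b = 1
F_MAX = A * A / (2 * B)   # F(a/b) = a^2 / (2b), the total area under f

def f_dom(x):
    return A - B * x

def g_target(x):
    """Illustrative target density on [0, 1]; it lies under f_dom everywhere."""
    return (math.pi / 2) * math.cos(math.pi * x / 2)

def draw_x(u1):
    """Solve a*x - (b/2)*x^2 = u1 for the root in [0, a/b]."""
    disc = max(0.0, A * A - 2 * B * u1)   # guard against tiny negative rounding
    return (A - math.sqrt(disc)) / B

def sample(n_tries, seed=670):
    rng = random.Random(seed)
    kept = []
    for _ in range(n_tries):
        x = draw_x(rng.uniform(0.0, F_MAX))   # x distributed in proportion to f
        u2 = rng.uniform(0.0, f_dom(x))       # uniform height under f at x
        if u2 <= g_target(x):                 # keep only points under g
            kept.append(x)
    return kept

kept = sample(20000)
rate = len(kept) / 20000
mean_kept = sum(kept) / len(kept)
print(rate, mean_kept)
```

With this pairing the expected acceptance rate is the area under g divided by the area under f, 8 / pi^2 or about 0.81, compared with about 0.64 for a flat envelope at the same peak height.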
[Figure: frequency versus count for the binomial example with the smaller target area; 315 in, 685 rejected]
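Weibull Function?
$$f(x) = 2 \times \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta-1}\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right]$$
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right]$$
$$F^{-1}(u) = \alpha\left[-\ln(1-u)\right]^{1/\beta}, \quad \alpha = 6.5, \ \beta = 1.8$$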
[Figure: sequence of frequency-versus-count panels for the binomial example; 574 in, 426 rejected]
Binomial Distribution
(Bernoulli Trials are the simplest
example of the rejection method.)
Probability Pr(X=1): p
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
b=r>0.5;
print b;
quit;
b
1
0
1
1
0
0
0
0
0
1
SAS
> r=matrix(runif(10), 10,1)
> b=r<=0.5
> cbind(r,b)
[,1] [,2]
[1,] 0.4919652 1
[2,] 0.5088624 0
[3,] 0.5955355 0
[4,] 0.5243394 0
[5,] 0.5923056 0
[6,] 0.1610980 1
[7,] 0.9663659 0
[8,] 0.2548106 1
[9,] 0.4582953 1
[10,] 0.1170421 1
>
R
Simulating Outcomes from a
Logistic Model

But then, you never have just one


value of p for your Bernoulli Trials

Placebo Controlled Drug Trial

25% Success for Placebo

Odds Ratio of 2.0 for Treatment

Two different success probabilities,


based on logistic model
Logistic Model
Placebo: $\dfrac{\exp(\beta_0)}{1+\exp(\beta_0)} = 0.25, \quad \beta_0 = -1.0986$
Drug (0,1): OR=2.0, ln(OR)=0.6931
$$\text{CDF} = \Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931 \cdot \text{Drug})}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug})}$$
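The model above can be turned into simulated outcomes by comparing a uniform draw with each subject's success probability. This is a minimal Python sketch of that idea; the function names `expit` and `simulate_trial`, the arm size of 200, and the seed are assumptions for illustration.

```python
import math
import random

B0 = -1.0986   # intercept: log odds of success on placebo, ln(0.25 / 0.75)
B1 = 0.6931    # drug effect: ln(OR) with OR = 2.0

def expit(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def simulate_trial(n_per_arm=200, seed=14):
    """Return (drug, outcome) rows; outcome is 1 when the uniform draw is <= p."""
    rng = random.Random(seed)
    rows = []
    for drug in (0, 1):
        p = expit(B0 + B1 * drug)
        for _ in range(n_per_arm):
            rows.append((drug, 1 if rng.random() <= p else 0))
    return rows

rows = simulate_trial()
p_placebo = expit(B0)        # about 0.25
p_drug = expit(B0 + B1)      # about 0.40
print(p_placebo, p_drug, len(rows))
```

An odds ratio of 2.0 applied to placebo odds of 1/3 gives odds of 2/3, which is a success probability of 0.40 on drug; the two arms therefore use two different Bernoulli probabilities.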
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(400,1,1)||(j(200,1,0) // j(200,1,1));
bta= {-1.0986 , 0.6931};
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
outcome=u<=expit;
tbl=u || d || expit || outcome ;
varnames={"u","const","treat", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;


proc logistic data=Work.Erand DESCEND;

model OUTCOME = TREAT;

run;
SAS
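[Figure: the design matrix d, with a constant column of 1s and a treat column of 0s and 1s, multiplies the coefficient vector bta = (bta1, bta2)]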
SAS
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.2367 0.1693 53.3383 <.0001
treat 1 0.5510 0.2261 5.9402 0.0148
Odds Ratio Estimates
Point 95% Wald
Effect Estimate Confidence Limits
treat 1.735 1.114 2.702
r=matrix(runif(400), 400,1)
drug=rbind(matrix(0,200,1),matrix(1,200,1))
d=cbind(matrix(1,400,1), drug)
parms=matrix(c(-1.0986 , 0.6931),2,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r<=expit


R
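[Figure: the design matrix d, with a constant column of 1s and a drug column of 0s and 1s, multiplies the coefficient vector parm = (parm1, parm2)]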
R
> drugtrial<-glm(outcome~drug, family = binomial(link="logit"))
> summary(drugtrial)
Call:
glm(formula = outcome ~ drug, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.036 -1.036 -0.776 1.326 1.641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***
drug 0.7026 0.2158 3.255 0.00113 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 511.49 on 399 degrees of freedom
Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67
Number of Fisher Scoring iterations: 4
Normally Distributed Random Numbers

Inverse transform methods inefficient


for normal random numbers

Box-Muller Transform

z transformation of two random uniform variates [X_1, X_2 ~ U(0,1)]

Random radius r, random angle θ

Get two z variates from two uniform random numbers, X_1 and X_2:
$$z_1 = r\cos(\theta) = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2)$$
$$z_2 = r\sin(\theta) = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2)$$
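The transform can be written directly from the two equations. The sketch below is in Python with the standard library only; the function name `box_muller`, the seed, and the use of `1.0 - rng.random()` to keep the first uniform strictly above zero (so the logarithm is defined) are assumptions for illustration.

```python
import math
import random

def box_muller(x1, x2):
    """Map two uniforms (x1 in (0, 1], x2 in [0, 1)) to two standard normals."""
    r = math.sqrt(-2.0 * math.log(x1))   # random radius
    theta = 2.0 * math.pi * x2           # random angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(27)
z = []
for _ in range(5000):
    x1 = 1.0 - rng.random()   # in (0, 1], avoids log(0)
    x2 = rng.random()
    z.extend(box_muller(x1, x2))

mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(mean_z, var_z)          # should be near 0 and 1
```

Both variates from each pair are kept here, so 5,000 pairs of uniforms give 10,000 normal draws.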




Normally Distributed Random Numbers

Box-Muller Transform
$X_1, X_2$ specify a position within the unit circle
Random angle, random radius

Would be more efficient if it did not make calls to


trigonometric functions.

Marsaglia Method

Places the Unit Circle within a square, -1 to +1,


and samples the square uniformly.
Rejects draws that fall outside the circle.
But it avoids calls to trig functions.
$$s = X_1^2 + X_2^2, \quad s < 1$$
$$z_1 = X_1\sqrt{\frac{-2\ln(s)}{s}}, \qquad z_2 = X_2\sqrt{\frac{-2\ln(s)}{s}}$$
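A sketch of the polar method described above: sample the square from -1 to +1, reject points outside the unit circle, and scale the accepted point. The function name `marsaglia_pair` and the seed are hypothetical; excluding s = 0 is an added guard so that the logarithm and the division are defined.

```python
import math
import random

def marsaglia_pair(rng):
    """Return two standard normal draws using the polar (rejection) form."""
    while True:
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:                 # inside the unit circle: accept
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor

rng = random.Random(1997)
z = []
for _ in range(5000):
    z.extend(marsaglia_pair(rng))

mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(mean_z, var_z)                      # should be near 0 and 1
```

About pi/4 (roughly 79%) of the points in the square fall inside the circle, so the price of avoiding the trigonometric calls is discarding roughly one pair in five.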


Generating Numbers from
Specific Distributions

Normal, Using CLT (quick & dirty)

Sum several iterations of u

Standardize
Recall that Var(u) = 1/12
$$X = \sum_{i=1}^{12} u_i - 6$$
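The quick and dirty recipe is short enough to write out. Summing 12 uniforms gives mean 12 x 1/2 = 6 and variance 12 x 1/12 = 1, so subtracting 6 standardizes the sum. The function name `clt_normal` and the seed below are assumptions for this sketch.

```python
import random

def clt_normal(rng):
    """Approximate N(0, 1) draw: sum of 12 uniforms minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(12)
x = [clt_normal(rng) for _ in range(10000)]
mean_x = sum(x) / len(x)
var_x = sum((v - mean_x) ** 2 for v in x) / (len(x) - 1)
print(mean_x, var_x, min(x), max(x))
```

The approximation cannot produce values beyond -6 or +6, and its tails are thinner than those of a true normal, which is why it is only suitable when the tails do not matter.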

Correlated Multivariate Random Numbers

Simulating panel data, repeated


measures

Mixture distributions
Generating Multivariate Normal
Random Numbers
Desired Covariance Matrix: V
V = R'R, where R is the Cholesky Decomposition of V
Begin with independent standard normal RVs: Z ~ N(0,1)
Correlated (Multivariate) Normal RVs: X = R'Z, so that X ~ N(0, V)
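To make the recipe concrete, the sketch below builds the upper-triangular factor R with V = R'R by hand and uses it to turn independent normals into correlated ones. It is written in plain Python so that every step is visible; the function names `cholesky_upper` and `mvn_draw` are hypothetical, and the correlation matrix and standard deviations are the ones used in the SAS and R examples of this lecture.

```python
import math
import random

def cholesky_upper(v):
    """Return upper-triangular R such that V = R'R (V symmetric positive definite)."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r

corr = [[1, .3, .2, .1], [.3, 1, .3, .2], [.2, .3, 1, .3], [.1, .2, .3, 1]]
sig = [53, 36, 12, 47]
V = [[corr[i][j] * sig[i] * sig[j] for j in range(4)] for i in range(4)]
R = cholesky_upper(V)

def mvn_draw(rng):
    """One draw of X = R'Z with Z a vector of independent N(0, 1) values."""
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return [sum(R[k][j] * z[k] for k in range(4)) for j in range(4)]

rng = random.Random(2010)
draws = [mvn_draw(rng) for _ in range(2000)]
print([round(v, 4) for v in R[0]])   # first row of R
```

The first row of R should match the first row of the `upr` matrix printed by SAS (53, 10.8, 2.4, 4.7), which is a quick way to confirm that the factor is the upper-triangular one.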
Generating Multivariate Normal
Random Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={53 36 12 47};
cvmat=rmat#(sigvec`*sigvec);
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
/* Lets be wasteful */
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
z1=z1*upr;
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
53 36 12 47
cvmat
2809 572.4 127.2 249.1
572.4 1296 129.6 338.4
127.2 129.6 144 169.2
249.1 338.4 169.2 2209
upr
53 10.8 2.4 4.7
0 34.341811 3.0190603 8.3757958
0 0 11.36333 11.672016
0 0 0 44.503035
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299
x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335
x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908
x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.30338 0.20341 0.11397
<.0001 <.0001 0.0003
x2 0.30338 1.00000 0.28186 0.24150
<.0001 <.0001 <.0001
x3 0.20341 0.28186 1.00000 0.34421
<.0001 <.0001 <.0001
x4 0.11397 0.24150 0.34421 1.00000
0.0003 <.0001 <.0001
SAS
Generating Multivariate Normal
Random Numbers
cmat<-rbind(c(1, .3, .2, .1), c(.3, 1, .3, .2), c(.2, .3, 1, .3) , c(.1, .2, .3, 1))
sigvec=c(53, 36, 12, 47)
vv=cmat*(sigvec%*%t(sigvec))
rr=chol(vv)
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
z1=sqrt(-2*log(r1))*cos(2*pi*r2)
rvs=z1%*%rr
R
> cmat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.3 0.2 0.1
[2,] 0.3 1.0 0.3 0.2
[3,] 0.2 0.3 1.0 0.3
[4,] 0.1 0.2 0.3 1.0
> vv
[,1] [,2] [,3] [,4]
[1,] 2809.0 572.4 127.2 249.1
[2,] 572.4 1296.0 129.6 338.4
[3,] 127.2 129.6 144.0 169.2
[4,] 249.1 338.4 169.2 2209.0
R
> rr
[,1] [,2] [,3] [,4]
[1,] 53 10.80000 2.400000 4.700000
[2,] 0 34.34181 3.019060 8.375796
[3,] 0 0.00000 11.363330 11.672016
[4,] 0 0.00000 0.000000 44.503035
> cov(rvs)
[,1] [,2] [,3] [,4]
[1,] 2832.4200 561.2585 134.0656 533.7351
[2,] 561.2585 1235.7616 124.2373 382.5441
[3,] 134.0656 124.2373 127.4132 160.2173
[4,] 533.7351 382.5441 160.2173 2205.5903
> cor(rvs)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.2999969 0.2231676 0.2135426
[2,] 0.2999969 1.0000000 0.3130961 0.2317137
[3,] 0.2231676 0.3130961 1.0000000 0.3022317
[4,] 0.2135426 0.2317137 0.3022317 1.0000000
> sd(rvs)
[1] 53.22048 35.15340 11.28774 46.96371
Subject-specific Random Effects
We have an error term (e_ij) for measurement j in subject i.

We also have a subject specific random effect (k_i)

For the i-th subject in the j-th measurement:
$$y_{ij} = \beta x_{ij} + e_{ij} + k_i$$
Recipe for Subject-specific Random Effects

Create subjects for study

Assign treatment, covariates

Give each subject a random effect

Drawn from, say, N(0,V)

Generate predicted values based on


regression + random effects

Generate outcomes for each repeated


measure from specific distribution
Logistic Model
Placebo: $\dfrac{\exp(\beta_0)}{1+\exp(\beta_0)} = 0.25, \quad \beta_0 = -1.0986$
Drug: OR=2.0, ln(OR)=0.6931
Time (0,1,2): OR 1.5, ln(OR)=0.4055
$K_i \sim N(0,1)$
$$\text{CDF} = \Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}$$
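The recipe for subject-specific random effects can be sketched as a loop over subjects: draw one K_i per subject, reuse it for each of that subject's three time points, and compare a fresh uniform with the resulting probability. This is an illustrative Python version; the names `simulate_panel` and `BETA`, the seed, and the row layout are assumptions, while the coefficients and the 200-subject, three-visit design follow the slides.

```python
import math
import random

BETA = {"const": -1.0986, "drug": 0.6931, "time": 0.4055}

def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_panel(n_subjects=200, seed=2010):
    """Return rows (id, drug, time, k_i, outcome) with a shared k_i within subject."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_subjects + 1):
        drug = 0 if i <= n_subjects // 2 else 1   # first half placebo, second half drug
        k_i = rng.gauss(0.0, 1.0)                 # subject-specific effect, K_i ~ N(0, 1)
        for t in (0, 1, 2):
            eta = BETA["const"] + BETA["drug"] * drug + BETA["time"] * t + k_i
            y = 1 if rng.random() <= expit(eta) else 0
            rows.append((i, drug, t, k_i, y))
    return rows

rows = simulate_panel()
print(len(rows), rows[:3])
```

Because k_i is shared across a subject's rows, outcomes within a subject are positively correlated even though each uniform draw is independent, which is the feature a random-effects model is meant to recover.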
proc iml; /* begin IML session */
u = j(600,1,.);
d1=j(100,1,0)//j(100,1,1);
d1=d1//d1//d1;
id=j(200,1,0);
do i=1 to 200 by 1;
id[i,1]=i;
end;
id=id//id//id;
t=j(200,1,0)//j(200,1,1)//j(200,1,2);
k=j(200,1,.);
call randgen(k,'normal');
k=k//k//k;
bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
y=u<=expit;
tbl=id||u || d || expit || y ;
varnames={"id","u","const","treat","t","k", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
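[Figure: layout of the design matrix d (constant, treatment, and random-effect columns), the coefficient vector bta, and the id and k vectors, each stacked three times]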
SAS
. xtlogit outcome treat t, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 11.07
Log likelihood = -394.754 Prob > chi2 = 0.0040
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | .6023501 .2279107 2.64 0.008 .1556533 1.049047
t | .235297 .1123071 2.10 0.036 .015179 .4554149
_cons | -.9334699 .2040373 -4.57 0.000 -1.333376 -.5335642
-------------+----------------------------------------------------------------
/lnsig2u | -.1281394 .3971684 -.9065751 .6502964
-------------+----------------------------------------------------------------
sigma_u | .9379396 .18626 .6355353 1.384236
rho | .2109869 .0661172 .1093476 .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 13.01 Prob >= chibar2 = 0.000
. di -394.754*(-2)
789.508
Stata, SAS data
Finally got this to run in SAS. I had forgotten that SAS requires you to sort.
Stata does not require sorted data for their mixed models.
proc sort data=erand;
by id t;
run;

proc nlmixed data=erand qpoints=5 ;
parms b0=0 b1=-.7 b2=.6 sig=0 ;
theta2 = b0+b1*treat+b2*t+u;
prb= exp(theta2)/(1+exp(theta2));
model outcome ~ binary(prb);
random u ~normal(0,sig) subject=id ;
run;
SAS
The NLMIXED Procedure
Fit Statistics
-2 Log Likelihood 789.5
AIC (smaller is better) 797.5
AICC (smaller is better) 797.6
BIC (smaller is better) 810.7
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient
b0 -0.9332 0.2039 199 -4.58 <.0001 0.05 -1.3354 -0.5310 0.00039
b1 0.6021 0.2278 199 2.64 0.0089 0.05 0.1529 1.0513 -0.00007
b2 0.2353 0.1123 199 2.10 0.0374 0.05 0.01382 0.4567 0.000737
sig 0.8781 0.3480 199 2.52 0.0124 0.05 0.1917 1.5644 0.000363
We see tiny differences between this and Stata results, owing to differences in optimization
specs.
SAS
id=matrix(1:200, 200,1)
k1=matrix(runif(200), 200,1)
k2=matrix(runif(200), 200,1)
k=sqrt(-2*log(k1))*cos(2*pi*k2)   # Box-Muller: subject-specific N(0,1) effect
id=rbind(id,id,id)
k=rbind(k,k,k)
drug =rbind(matrix(0,100,1), matrix(1,100,1))
drug=rbind(drug,drug,drug)
d=cbind(matrix(1,600,1),drug,k)
parms=matrix(c(-1.0986 , 0.6931,1),3,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
r=matrix(runif(600), 600,1)        # one uniform draw per observation
outcome=r<=expit
R
. xtlogit outcome drug, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(1) = 14.55
Log likelihood = -368.112 Prob > chi2 = 0.0001
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .8331252 .2183881 3.81 0.000 .4050924 1.261158
_cons | -1.240213 .1708219 -7.26 0.000 -1.575018 -.9054085
-------------+----------------------------------------------------------------
/lnsig2u | -.6325958 .5624138 -1.734907 .469715
-------------+----------------------------------------------------------------
sigma_u | .7288423 .2049555 .4200198 1.264729
rho | .1390212 .0673177 .050895 .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5.08 Prob >= chibar2 = 0.012
R
Got this to run in R, using a mixed effects package called Zelig.
z.out1 <- zelig(outcome ~ drug + tag(1 | id),data=NULL, model="logit.mixed")
Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic
model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical
Software," http://gking.harvard.edu/zelig
summary(z.out1)
Generalized linear mixed model fit by the Laplace approximation
Formula: outcome ~ drug + tag(1 | id)
AIC BIC logLik deviance
743.4 756.6 -368.7 737.4
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.39486 0.62838
Number of obs: 600, groups: id, 200
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2172 0.1511 -8.054 8.02e-16 ***
drug 0.8174 0.2023 4.041 5.32e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
drug -0.747
R
General Approach to Correlated
Multivariate Random Numbers

Copulas

allow us to draw correlated random


numbers from different distributions

Random effects in Mixture Models

They use CDF probabilities of correlated


variables on the inside to map to
correlated uniform random numbers on
the margins

Those correlated uniform RVs may be used


to marry vastly different distributions.

Maintain Marginal Distributions


Generating Multivariate Random
Numbers
From SAS documentation, a Gaussian Copula

Independent Normal (N(0,1) ) random variables are


generated

These variables are transformed to a correlated set of


z-scores by using the Cholesky Decomposition of the
covariance matrix.
These correlated normal RVs are transformed to a uniform by using Φ(z).

F^{-1}(·) is used to compute the final sample value
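The four steps above can be sketched for two variables. This Python illustration makes several assumptions that are not in the SAS documentation quoted here: a correlation of 0.4, an exponential marginal with rate 0.04 and a Weibull marginal with gamma = 1.5 and lambda = 0.001 (both borrowed from the inverse-transform examples earlier in the lecture), the hypothetical function names, and a small clamp on the uniforms to keep the logarithms finite.

```python
import math
import random

def phi(z):
    """Standard normal CDF, used to map a z-score to a uniform."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def exp_inv(u, rate):
    return -math.log(1.0 - u) / rate

def weibull_inv(u, lam, gamma):
    return (-math.log(1.0 - u) / lam) ** (1.0 / gamma)

def copula_pairs(n, rho=0.4, seed=670):
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)   # independent N(0, 1)
        z1 = e1                                             # Cholesky step for a
        z2 = rho * e1 + math.sqrt(1.0 - rho * rho) * e2     # 2 x 2 correlation matrix
        u1 = min(phi(z1), 1.0 - 1e-12)                      # correlated uniforms
        u2 = min(phi(z2), 1.0 - 1e-12)
        out.append((exp_inv(u1, 0.04), weibull_inv(u2, 0.001, 1.5)))
    return out

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

pairs = copula_pairs(5000)
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(r)
```

Each margin keeps its own distribution, while the dependence comes entirely from the correlated z-scores. The Pearson correlation of the final values is generally somewhat smaller than the 0.4 imposed on the normal scale, as the SAS and MATLAB outputs in this lecture also show.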
Generating Multivariate Random
Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={1 1 1 1};
cvmat=rmat#(sigvec`*sigvec);
/* # is element-wise multiplication */
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
/* Note I could have gotten another z here */
z1=z1*upr;
z1=cdf('Normal',z1);
z1=gaminv(z1,3.0);
/* Standardized gamma parameter, also the
mean */
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
1 1 1 1
cvmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 2.96320 1.73566 2963 0.11528 12.19072
x2 1000 3.01249 1.68236 3012 0.14039 10.20117
x3 1000 3.00336 1.68803 3003 0.34496 13.72023
x4 1000 3.08106 1.79858 3081 0.11148 13.25409
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
           x1         x2         x3         x4
x1    1.00000    0.25874    0.19005    0.10052
                  <.0001     <.0001     0.0015
x2    0.25874    1.00000    0.22622    0.13944
       <.0001                <.0001     <.0001
x3    0.19005    0.22622    1.00000    0.32082
       <.0001     <.0001                <.0001
x4    0.10052    0.13944    0.32082    1.00000
       0.0015     <.0001     <.0001
SAS
Generating Multivariate Random
Numbers
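# 4 x 4 correlation matrix with all off-diagonal entries 0.4
cmat<-rbind(c(1, .4, .4, .4), c(.4, 1, .4, .4), c(.4, .4, 1, .4) , c(.4, .4, .4, 1))
rr=chol(cmat)   # upper-triangular factor: t(rr) %*% rr equals cmat
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
# Box-Muller: each (r1, r2) pair gives two independent N(0,1) values,
# one from the cosine and one from the sine of the same angle
z1=rbind(sqrt(-2*log(r1))*cos(2*pi*r2),sqrt(-2*log(r1))*sin(2*pi*r2))
rvs=z1%*%rr                 # correlated z-scores
cd=pnorm(rvs,mean=0,sd=1)   # correlated uniforms
# qinvgamma() and corr() are not base R; they must be supplied by an add-on package.
# corr() here prints a single coefficient; base R cor() would return the full 4 x 4 matrix.
g<-qinvgamma(cd,2,3)
corr(cd)
[1] 0.4150358
corr(rvs)
[1] 0.4188932
corr(g)
[1] 0.2337756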
R
>> U = copularnd('Gaussian',.4,10)
U =
0.8017 0.9388
0.3650 0.2250
0.8104 0.6253
0.3467 0.0988
0.6067 0.6561
0.4743 0.6723
0.6273 0.7427
0.9905 0.8249
0.4427 0.6925
0.3443 0.2711
>> U = copularnd('Gaussian',.4,10000);
>> corr(U)
ans =
1.0000 0.3765
0.3765 1.0000
>> X = norminv(U,0,1);
>> corr(X)
ans =
1.0000 0.3901
0.3901 1.0000
>> Xg = gaminv(U,2,3);
>> corr(Xg)
ans =
1.0000 0.3721
0.3721 1.0000
>>
Matlab
(a little more clear)
Old Slides
program define seedset
local ct =c(current_time)
local s1=substr("`ct'",7,2)
local s2=substr("`ct'",4,2)
local s3=substr("`ct'",2,1)
global newseed=real("`s1'" +"`s2'" +"`s3'")
di $newseed
set seed $newseed
end
Grabbing a Seed from the
System Clock (Stata)
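The seedset program above builds its seed from pieces of the "hh:mm:ss" clock string: seconds, then minutes, then the second digit of the hour. A rough Python equivalent, offered only as a sketch (the function name seed_from_clock is an assumption, not part of the slides):

```python
import random
import time

def seed_from_clock(current_time=None):
    """Build a seed as seconds + minutes + last hour digit, as seedset does."""
    if current_time is None:
        current_time = time.strftime("%H:%M:%S")  # same "hh:mm:ss" layout as c(current_time)
    seconds = current_time[6:8]      # Stata substr(ct, 7, 2)
    minutes = current_time[3:5]      # Stata substr(ct, 4, 2)
    hour_digit = current_time[1:2]   # Stata substr(ct, 2, 1)
    return int(seconds + minutes + hour_digit)

seed = seed_from_clock()
random.seed(seed)   # seed Python's generator from the clock-derived value
print(seed)
```

For example, a clock reading of 11:49:23 gives 23491. This scheme can produce at most 60 x 60 x 10 = 36000 distinct seeds, so two runs started at matching clock positions will repeat the same stream.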
Stata's default generator, uniform(), is KISS (an LCG combined with other generators)
. set obs 100
obs was 0, now 100
. gen x0=ceil(uniform()*100)
. gen m=ceil(uniform()*10)
. gen x1=mod(x0,m)
. list in 1/10
+------------+
| x0 m x1 |
|------------|
1. | 70 2 0 |
2. | 62 7 6 |
3. | 92 7 1 |
4. | 53 1 0 |
5. | 37 3 1 |
|------------|
6. | 78 1 0 |
7. | 47 2 1 |
8. | 91 2 1 |
9. | 98 1 0 |
10. | 71 9 8 |
+------------+
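The mod() function listed above is the wrapping step of a linear congruential generator, X_n = (a*X_{n-1} + c) mod m. A minimal sketch follows; the default constants are the Park-Miller "minimal standard" values, chosen here as an assumption for illustration. They are not taken from these slides and are not Stata's generator.

```python
def lcg_states(seed, n, a=16807, c=0, m=2**31 - 1):
    """Return n successive integer states of X_n = (a * X_{n-1} + c) mod m."""
    states = []
    x = seed
    for _ in range(n):
        x = (a * x + c) % m   # modular arithmetic wraps the value into [0, m)
        states.append(x)
    return states

def to_unit_interval(states, m=2**31 - 1):
    """Scale integer states to uniform values by dividing by the modulus."""
    return [x / m for x in states]

# Toy generator with a tiny modulus: the sequence repeats after 8 values
print(lcg_states(1, 9, a=5, c=3, m=8))   # [0, 3, 2, 5, 4, 7, 6, 1, 0]

# Larger modulus, scaled to the unit interval
print(to_unit_interval(lcg_states(1, 3)))
```

The toy call makes the period visible: with m = 8 the ninth state equals the first, and the sequence cycles from there.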
Testing Randomness (Stata)
Correlogram of $X_i$ versus $X_{i+k}$ for serial correlation
. gen tv=_n
. tsset tv
time variable: tv, 1 to 20000
delta: 1 unit
. corrgram x, lags(40)
                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial Autocor]
-------------------------------------------------------------------------------
1        0.0026   0.0026   .13551  0.7128          |                  |
2       -0.0011  -0.0011   .15995  0.9231          |                  |
3       -0.0004  -0.0004   .16301  0.9833          |                  |
4       -0.0131  -0.0131   3.5987  0.4630          |                  |
5       -0.0008  -0.0007   3.6115  0.6066          |                  |
6        0.0119   0.0118   6.4238  0.3774          |                  |
7       -0.0060  -0.0061   7.1533  0.4131          |                  |
8        0.0004   0.0003   7.1571  0.5198          |                  |
9        0.0057   0.0057    7.815  0.5529          |                  |
10      -0.0049  -0.0046   8.2893  0.6006          |                  |
11      -0.0097  -0.0099    10.19  0.5134          |                  |
12       0.0044   0.0043   10.581  0.5651          |                  |
13       0.0087   0.0090   12.102  0.5193          |                  |
14       0.0081   0.0079   13.424  0.4935          |                  |
15       0.0068   0.0064   14.357  0.4986          |                  |
16       0.0068   0.0071   15.285  0.5039          |                  |
17      -0.0029  -0.0025   15.449  0.5632          |                  |
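The AC column of the correlogram above is the lag-k sample autocorrelation. A minimal sketch of that calculation follows; the function name and the seed are assumptions, and Python's own generator stands in for the Stata variable x.

```python
import random

def autocorrelation(x, lag):
    """Lag-k sample autocorrelation: covariance of x[i] and x[i+lag] over the variance."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return num / denom

rng = random.Random(2010)
u = [rng.random() for _ in range(20000)]
for k in range(1, 6):
    print(k, round(autocorrelation(u, k), 4))
```

For an independent sequence of length n, the autocorrelations should mostly fall within about 2/sqrt(n) of zero, which is about 0.014 for n = 20000 and in line with the values in the table.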

. seedset
23491
. set obs 2000
obs was 0, now 2000
. gen P=uniform()
. gen enum=-ln(P)/.04
. ci
Variable | Obs Mean Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
P | 2000 .5030619 .0065298 .4902559 .5158678
enum | 2000 24.79822 .553316 23.71309 25.88336
(Stata)
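The enum line above is an inverse probability transform for the exponential distribution: setting S(t) = exp(-0.04 t) = P and solving gives t = -ln(P)/0.04, with mean 1/0.04 = 25. A minimal sketch, assuming a hypothetical helper name and seed:

```python
import math
import random

def exponential_draws(n, rate, seed=23491):
    """Inverse-transform draws from an exponential distribution with the given rate."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log() is safe;
    # P and 1 - P are both uniform, so this matches -ln(P)/rate in distribution.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

draws = exponential_draws(2000, 0.04)
print(round(sum(draws) / len(draws), 2))   # should be close to 1/0.04 = 25
```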
. set obs 200
obs was 0, now 200
. gen P=uniform()
. gen tte=(-ln(P)/0.1)^1.5
. gen fail=1
. replace fail=0 if tte>200
. replace tte=200 if tte>200
. stset tte, fail(fail)
failure event: fail != 0 & fail < .
obs. time interval: (0, tte]
exit on or before: failure
-------------------------------------------------------------------
200 total obs.
0 exclusions
-------------------------------------------------------------------
200 obs. remaining, representing
188 failures in single record/single failure data
9019.163 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 200
(Stata)
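The tte line above is an inverse probability transform of a Weibull survival function. A short derivation, writing the scale as \lambda and the shape as \delta (these symbols are an assumption; the Stata code shows only the numbers 0.1 and 1.5):

```latex
S(t) = \exp\!\left(-\lambda t^{\delta}\right) = P
\quad\Longrightarrow\quad
t = \left(\frac{-\ln P}{\lambda}\right)^{1/\delta},
\qquad \lambda = 0.1,\quad \frac{1}{\delta} = 1.5 \;\Rightarrow\; \delta \approx 0.667 .
```

Times above 200 are then censored at 200, with fail set to 0.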
[Figure: Kaplan-Meier survival estimate. x-axis: analysis time (0 to 200); y-axis: 0.00 to 1.00]
(Stata)
. streg, d(w) nohr
failure _d: fail
analysis time _t: tte
Weibull regression -- log relative-hazard form
No. of subjects = 200 Number of obs = 200
No. of failures = 188
Time at risk = 9019.163067
LR chi2(0) = 0.00
Log likelihood = -403.39593 Prob > chi2 = .
--------------------------------------------------------------------
_t | Coef. SE z P>|z| [95% CI]
-------------+------------------------------------------------------
_cons | -2.245 0.167 -13.47 0.000 -2.572 -1.918
-------------+------------------------------------------------------
delta | 0.625 0.036 0.558 0.701
--------------------------------------------------------------------
.
(Stata)
. gen P=uniform()
. gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5
. gen fail=1
. replace fail=0 if tte>200
(39 real changes made)
. replace tte=200 if tte>200
(39 real changes made)
. stset tte, fail(fail)
failure event: fail != 0 & fail < .
obs. time interval: (0, tte]
exit on or before: failure
------------------------------------------------------------------------------
400 total obs.
0 exclusions
------------------------------------------------------------------------------
400 obs. remaining, representing
361 failures in single record/single failure data
25170.04 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 200
.
(Stata)
. list drug P tte fail in 1/15
+-----------------------------------+
| drug P tte fail |
|-----------------------------------|
1. | 1 .842721 6.331312 1 |
2. | 0 .3839878 29.6119 1 |
3. | 1 .3483792 96.8484 1 |
4. | 0 .6035132 11.34804 1 |
5. | 1 .8460417 6.114305 1 |
|-----------------------------------|
6. | 1 .4935982 53.06192 1 |
7. | 1 .5173908 47.84433 1 |
8. | 1 .385052 83.39208 1 |
9. | 0 .8726683 1.589515 1 |
10. | 0 .0356283 192.5611 1 |
|-----------------------------------|
11. | 0 .8018837 3.280757 1 |
12. | 0 .6059877 11.21039 1 |
13. | 1 .7919235 10.07838 1 |
14. | 0 .1920578 67.02081 1 |
15. | 0 .0819428 125.1301 1 |
+-----------------------------------+
(Stata)
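The two-arm simulation above can be sketched in Python as follows. The function names, the seed, and the equal arm sizes are assumptions made for the example; the constants 0.1, 0.5, 1.5 and 200 come from the Stata code.

```python
import math
import random

def time_to_event(p, drug, base_rate=0.1, hazard_ratio=0.5, power=1.5):
    """Inverse-transform a uniform p into a Weibull time, as in the tte line above."""
    rate = math.exp(math.log(base_rate) + math.log(hazard_ratio) * drug)
    return (-math.log(p) / rate) ** power

def simulate_weibull_trial(n_per_arm, censor_at=200.0, seed=6819):
    """Return rows of (drug, P, tte, fail) with administrative censoring."""
    rng = random.Random(seed)
    rows = []
    for drug in (0, 1):
        for _ in range(n_per_arm):
            p = 1.0 - rng.random()             # uniform on (0, 1], so log(p) is defined
            tte = time_to_event(p, drug)
            fail = 0 if tte > censor_at else 1  # censored observations get fail = 0
            rows.append((drug, p, min(tte, censor_at), fail))
    return rows

rows = simulate_weibull_trial(200)
print(sum(fail for _, _, _, fail in rows), "failures out of", len(rows))
print(round(time_to_event(0.842721, 1), 4))   # first listed row; compare with tte = 6.331312
```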
[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1). x-axis: analysis time (0 to 200); y-axis: 0.00 to 1.00]
(Stata)
. streg drug, d(w) nohr
failure _d: fail
analysis time _t: tte
Weibull regression -- log relative-hazard form
No. of subjects = 400 Number of obs = 400
No. of failures = 361
Time at risk = 25170.03819
LR chi2(1) = 53.70
Log likelihood = -757.69677 Prob > chi2 = 0.0000
--------------------------------------------------------------------
_t | Coef. SE z P>|z| [95% CI]
-------------+------------------------------------------------------
drug | -0.788 0.107 -7.34 0.000 -0.998 -0.577
_cons | -2.458 0.143 -17.19 0.000 -2.738 -2.177
-------------+------------------------------------------------------
delta | 0.706 0.031 0.648 0.768
--------------------------------------------------------------------
.
(Stata)
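As a check on the fit above, the values implied by the simulation code can be written out in the log relative-hazard parameterization, S(t) = \exp(-\exp(x\beta)\, t^{\delta}). The symbols \beta_0, \beta_{drug} and \delta are assumptions chosen to match the rows _cons, drug and delta of the output:

```latex
\beta_{0} = \ln(0.1) \approx -2.303, \qquad
\beta_{drug} = \ln(0.5) \approx -0.693, \qquad
\delta = \frac{1}{1.5} \approx 0.667 .
```

All three lie inside the reported 95% intervals: (-2.738, -2.177) for _cons, (-0.998, -0.577) for drug, and (0.648, 0.768) for delta.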
Generating Multivariate Normal
Random Numbers
In Stata, gennorm (webseek to download):
Typing
. gennorm a b c, corr(.2 .3 .4)
creates a, b, and c with values drawn from a N(0,S) distribution where
    +-           -+
    | 1           |
S = | .2   1      |
    | .3  .4   1  |
    +-           -+
That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4
(Stata)
Generating Multivariate Normal
Random Numbers
In Stata:
Example
-------
. set obs 10000
obs was 0, now 10000
. set seed 6819
. gennorm a b c, corr(.2 .3 .4)
. summarize a b c

Variable | Obs Mean Std. Dev. Min Max
-------------+----------------------------------------------------------------------------
a | 10000 -.0105333 1.005723 -3.694448 3.775433
b | 10000 -.0042212 1.000254 -3.695302 3.648826
c | 10000 -.0069625 .9989002 -3.996779 3.606923
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------------
a | 1.0000
b | 0.2137 1.0000
c | 0.3035 0.3952 1.0000

(Stata)
Generating Multivariate Normal
Random Numbers
In Stata, drawnorm:
. clear
. matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)
. drawnorm a b c, n(10000) corr(C)
(obs 10000)
. summarize a b c
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
a | 10000 -.0176275 .9920181 -3.701594 3.7838
b | 10000 .0009005 1.003002 -3.709259 3.518793
c | 10000 -.0149926 .9925292 -3.716346 4.009713
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------
a | 1.0000
b | 0.1937 1.0000
c | 0.3051 0.4056 1.0000
(Stata)
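One standard way to produce draws like those above is to multiply independent N(0,1) values by a Cholesky factor of the correlation matrix. The sketch below illustrates that technique only; it is not a description of how gennorm or drawnorm work internally, and the function names and seed are assumptions.

```python
import math
import random

def cholesky_lower(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw_mvn(n, corr, seed=6819):
    """Draw n rows from N(0, corr) by transforming independent N(0,1) values."""
    rng = random.Random(seed)
    low = cholesky_lower(corr)
    dim = len(corr)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(dim)])
    return rows

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

C = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
rows = draw_mvn(10000, C)
a, b, c = ([r[i] for r in rows] for i in range(3))
print(round(pearson(a, b), 3), round(pearson(a, c), 3), round(pearson(b, c), 3))
```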
Survival Time: $S(t) = P = \exp(-\lambda t^{\delta})$, $\lambda = \exp(x\beta)$
$x\beta = -2.30 + 0.3 \cdot drug - 0.08 \cdot drug \cdot t$
Inverse Prob Transform: ?????????
How do you solve for $t$? (Not all answers are in the book.)
Simulating Weibull Regression Data, with
Time-Dependency in Drug Effect
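$f(t) = \exp\left(-\exp(-2.30 + 0.3 \cdot drug - 0.004 \cdot drug \cdot t)\, t^{\delta}\right) - P$
$f(t) = 0$
Remember Newton's Method?
$f'(t_0) \approx \dfrac{f(t_0 + d) - f(t_0)}{d}$
$t_1 = t_0 - \dfrac{f(t_0)}{f'(t_0)}$
[Figure: Newton's method, one step from $t_0$ to $t_1$]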
clear
set obs 400
gen drug=_n>200
gen double P=uniform()
* starting value for t, and a nearby point for the numerical derivative
gen double t=1
gen double tpd=t+.0001
* f(t) = S(t) - P; we want the t where f(t) = 0
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001
* 50 Newton steps: t = t - f(t)/f'(t), with f' approximated by a finite difference
forvalues i=1/50 {
qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
qui replace slope=(fp-f)/0.0001
qui replace t=t-f/slope
qui replace tpd=t+.0001
}
(Stata)
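The same Newton iteration can be sketched for a single observation in Python. This is an illustration under stated assumptions rather than a translation of the Stata run: the function names, the convergence tolerance, the iteration cap, and the step-halving guard that keeps t positive are all additions for the example.

```python
import math

def survival_gap(t, drug, p, shape=0.67):
    """f(t) = S(t) - P for the Weibull model with a time-dependent drug effect."""
    log_rate = -2.30 + 0.3 * drug - 0.004 * drug * t
    return math.exp(-math.exp(log_rate) * t ** shape) - p

def newton_time(p, drug, t0=1.0, d=1e-4, max_iter=200, tol=1e-8):
    """Solve f(t) = 0 by Newton's method with a finite-difference slope.

    Returns None if the iteration does not converge.
    """
    t = t0
    for _ in range(max_iter):
        f = survival_gap(t, drug, p)
        slope = (survival_gap(t + d, drug, p) - f) / d   # numerical derivative
        if slope == 0.0:
            return None
        t_new = t - f / slope                            # Newton update
        if t_new <= 0.0:
            t_new = t / 2.0     # assumed guard: keep t positive so t ** shape is defined
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return None

print(newton_time(0.5, 0))   # median survival time, drug = 0
print(newton_time(0.5, 1))   # median survival time, drug = 1
```

With drug = 1 the cumulative hazard stops increasing after t = 0.67/0.004 = 167.5, so the survival curve has a floor and sufficiently small P has no solution; the sketch returns None in that case instead of a time.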
Matlab
>> drug=[zeros(1000,1);ones(1000,1)];
>> P=rand(2000,1);
>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));
>> outcome=P<=cdf0;
>> b = glmfit(drug,outcome,'binomial')
b =
-1.0616
0.6562
Matlab
[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1). x-axis: analysis time (0 to 200); y-axis: 0.00 to 1.00]
(Stata)
. gen P=uniform()
. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))
. list in 1/10
+----------------------------+
| drug P cdf0 |
|----------------------------|
1. | 0 .2865897 .2500023 |
2. | 0 .3788754 .2500023 |
3. | 1 .3597057 .3999916 |
4. | 1 .7182508 .3999916 |
5. | 1 .4315197 .3999916 |
|----------------------------|
6. | 1 .2963237 .3999916 |
7. | 1 .7961193 .3999916 |
8. | 0 .056983 .2500023 |
9. | 0 .4622037 .2500023 |
10. | 0 .5336403 .2500023 |
+----------------------------+
(Stata)
. gen outcome=P<=cdf0
. list in 1/10
+--------------------------------------+
| drug P cdf0 outcome |
|--------------------------------------|
1. | 0 .2865897 .2500023 0 |
2. | 0 .3788754 .2500023 0 |
3. | 1 .3597057 .3999916 1 |
4. | 1 .7182508 .3999916 0 |
5. | 1 .4315197 .3999916 0 |
|--------------------------------------|
6. | 1 .2963237 .3999916 1 |
7. | 1 .7961193 .3999916 0 |
8. | 0 .056983 .2500023 1 |
9. | 0 .4622037 .2500023 0 |
10. | 0 .5336403 .2500023 0 |
+--------------------------------------+
(Stata)
. gen outcome=P<=cdf0
. logistic outcome drug
Logistic regression Number of obs = 2000
LR chi2(1) = 58.65
Prob > chi2 = 0.0000
Log likelihood = -1245.3138 Pseudo R2 = 0.0230
--------------------------------------------------------------------
outcome | OR SE z P>|z| [95% CI]
-------------+------------------------------------------------------
drug | 2.084 0.202 7.57 0.000 1.723 2.519
--------------------------------------------------------------------
_cons | -1.077 0.073 -14.83 0.000 -1.220 -0.935
--------------------------------------------------------------------
.
(Stata)
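The logistic simulation above compares a uniform draw P with the model probability and records a success when P <= cdf0. A minimal Python sketch of the same idea follows; the function names and seed are assumptions. Because drug is the only covariate and it is binary, the fitted logistic coefficients equal the sample log-odds and their difference, so no iterative fit is needed here.

```python
import math
import random

def inv_logit(x):
    """Logistic CDF, equal to exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def simulate_binary_trial(n_per_arm, intercept=-1.0986, log_odds_ratio=0.6931, seed=2010):
    """Return rows of (drug, outcome), with outcome = 1 when a uniform P <= probability."""
    rng = random.Random(seed)
    rows = []
    for drug in (0, 1):
        prob = inv_logit(intercept + log_odds_ratio * drug)
        for _ in range(n_per_arm):
            rows.append((drug, 1 if rng.random() <= prob else 0))
    return rows

def log_odds(rows, drug):
    """Sample log-odds of the outcome within one arm."""
    outcomes = [y for d, y in rows if d == drug]
    events = sum(outcomes)
    return math.log(events / (len(outcomes) - events))

rows = simulate_binary_trial(1000)
b0 = log_odds(rows, 0)          # estimate of the intercept
b1 = log_odds(rows, 1) - b0     # estimate of the log odds ratio for drug
print(round(b0, 3), round(b1, 3), round(math.exp(b1), 3))
```

The inputs -1.0986 and 0.6931 correspond to success probabilities of about 0.25 and 0.40 and an odds ratio of about 2, which is what the Matlab and Stata fits recover.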
