
BSTA 670 Statistical Computing
27 October 2010
Lecture 14:
Random Number Generation
Presented by:
Paul Wileyto, Ph.D.
Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
John von Neumann
A good analog random number generator.
Why do we need random numbers?
Simulation Input
Statistical Sampling
Assignment in Trials
Games
Where do you get random numbers?
Uniform Random Numbers
Published Tables
Make Them
Computer Algorithms
Harvest from Nature
Random draws from a specific distribution
Make them from Uniform Random Numbers
Software Random Number Generators
There are no true random number generators, but there are pseudo-random number generators.
Computers have only a limited number of bits to represent a number.
Sooner or later, the sequence of random numbers will repeat itself (the period of the generator).
The trick is to be good enough to look like random numbers.
Algorithms for Uniform Random Numbers
Good pseudo-random numbers:
Independent of the previous number
Long period
Sequence reproducible if started with the same initial conditions
Fast
Equal probability for any number inside the interval [a,b]
Probability Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$$
We are interested primarily in uniform random numbers in the interval [0,1].
We'll refer to the realization of a uniform random number over [0,1] as U.
Many of the algorithms produce integer-valued random numbers over the interval [0,b].
Transform to the interval [0,1] by dividing by b.

Linear Congruential Generator (LCG)
Most common
$$X_{n+1} = (a X_n + c) \bmod m$$
X_0 = seed, modulus m (large prime), multiplier a, and increment c
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.
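A minimal Python sketch of the recurrence above, using deliberately tiny, hypothetical constants (a = 5, c = 3, m = 16, chosen only for illustration) so that the wrap-around and the period are visible. With these values the generator happens to visit all 16 residues before it repeats.

```python
def lcg(seed, a, c, m):
    """Yield X_1, X_2, ... from X_{n+1} = (a*X_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, a, c, m):
    """Count steps until the generator returns to a state it has already visited."""
    seen = {}
    gen = lcg(seed, a, c, m)
    for step in range(m + 1):          # at most m distinct states, so a repeat must occur
        x = next(gen)
        if x in seen:
            return step - seen[x]
        seen[x] = step
    return None

# Hypothetical toy constants: small enough to print the whole cycle.
gen = lcg(seed=7, a=5, c=3, m=16)
first = [next(gen) for _ in range(20)]
print(first)                            # the sequence repeats after 16 values
print([x / 16 for x in first[:4]])      # divide by m to map onto [0, 1)
print(period(seed=7, a=5, c=3, m=16))   # 16
```

Real generators use the same update with a very large m, which is what makes the period long.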
Mod in SAS
proc iml; /* begin IML session */
q={20,30,40,50,70,90,160};
t=mod(q,7);
qt=q||t;
print qt; /* print matrix */
quit;
qt
20 6
30 2
40 5
50 1
70 0
90 6
160 6
SAS
Mod in R
q<-matrix(seq(10,100, by=10),10,1)
qm=q%%13
qall<-cbind(q,qm)
qall
[,1] [,2]
[1,] 10 10
[2,] 20 7
[3,] 30 4
[4,] 40 1
[5,] 50 11
[6,] 60 8
[7,] 70 5
[8,] 80 2
[9,] 90 12
[10,] 100 9
R
Unit Random Variates in SAS
proc iml;
seed = 123456;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
SAS
Unit Random Variates in R
RNGkind()
[1] "Mersenne-Twister" "Inversion"
# build a seed from the clock's seconds, minutes, and hours
set.seed(as.integer(format(Sys.time(), "%S%M%H")))
c<-matrix(runif(5),5,1)
c
[,1]
[1,] 0.9919911
[2,] 0.2598466
[3,] 0.1818524
[4,] 0.3357782
[5,] 0.2754353
R
Unit Random Variates in SAS
Set seed to 0 to grab a seed value from the system clock.
proc iml;
seed = 0;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
SAS
RANUNI() and IML UNIFORM() use a multiplicative linear congruential generator (from the SAS documentation) where
SEED = mod( SEED * 397204094, 2**31-1 )
and then return
SEED / (2**31-1)
SAS
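The same update can be sketched in Python. This only mirrors the recurrence quoted above (multiplier 397204094, modulus 2^31 - 1); it does not reproduce how SAS turns a seed of 0 or a negative seed into a clock-based starting state, so it should not be taken as a bit-for-bit replica of RANUNI().

```python
M = 2**31 - 1        # modulus quoted from the SAS documentation above
A = 397204094        # multiplier quoted from the SAS documentation above

def ranuni_like(seed, n):
    """Multiplicative LCG: SEED = mod(SEED*A, M), returning SEED/M at each step.

    Assumes 0 < seed < M; SAS's own handling of seed <= 0 is not reproduced here.
    """
    out = []
    for _ in range(n):
        seed = (seed * A) % M
        out.append(seed / M)
    return out, seed

u, last_state = ranuni_like(123456, 5)
print(u)   # five values strictly between 0 and 1
```

Because M is prime and the seed is nonzero, the state can never become 0, so every value lies strictly inside (0, 1).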
Testing Randomness
Is it Uniform?
[Figure: histograms of generated U values on [0,1]; the left panel has counts up to about 300, the right panel counts up to about 3 × 10^4.]
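A common numerical companion to the histogram is a chi-square goodness-of-fit test on equal-width bins. The sketch below tests Python's built-in random.random() (itself a Mersenne Twister); the choice of 10 bins is arbitrary.

```python
import random

def chi_square_uniform(samples, bins=10):
    """Pearson chi-square statistic for H0: samples ~ Uniform[0, 1)."""
    counts = [0] * bins
    for u in samples:
        counts[min(int(u * bins), bins - 1)] += 1
    expected = len(samples) / bins
    return sum((obs - expected) ** 2 / expected for obs in counts), counts

rng = random.Random(2010)
stat, counts = chi_square_uniform([rng.random() for _ in range(10_000)])
# With 10 bins there are 9 degrees of freedom; the 95% critical value is about 16.92.
print(counts, round(stat, 2), "reject" if stat > 16.92 else "looks uniform")
```

A flat histogram only checks one-dimensional uniformity; it says nothing about dependence between successive draws.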
Testing Randomness
Generate two sets and plot them against each other
Might see correlation in higher dimensions
Plot X_i versus X_{i+k} for serial correlation
[Figure: two scatter plots on the unit square, x versus y and x versus y2, both axes from 0 to 1.]
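The X_i versus X_{i+k} check can also be reduced to a number, the lag-k sample correlation. A Python sketch follows; the "bad" sawtooth sequence is a hypothetical stand-in for a poor generator, included only to show what a failure looks like.

```python
import random

def lag_correlation(x, k):
    """Sample Pearson correlation between x[i] and x[i+k]."""
    a, b = x[:-k], x[k:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(1)
u = [rng.random() for _ in range(20_000)]
for k in (1, 2, 5):
    # For independent draws, expect values within roughly +/- 2/sqrt(n), about 0.014 here.
    print(k, round(lag_correlation(u, k), 4))

# A deliberately bad, hypothetical "generator": a slow sawtooth that climbs in
# steps of 0.0105 and wraps around at 1. Its lag-1 correlation is far from 0.
bad = [(i * 0.0105) % 1.0 for i in range(1000)]
print(round(lag_correlation(bad, 1), 3))
```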
Linear Congruential Generator
The good
Fast
Period of up to m random numbers
The bad
Sequential correlation
Plots in more than one dimension do not fill the space uniformly, but tend to form bands
Not cryptographically secure
Choices of m, a, and c are important
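A classic illustration of the banding problem (not one of the original slides) is IBM's RANDU generator, X_{n+1} = 65539 X_n mod 2^31. Because 65539^2 ≡ 6·65539 - 9 (mod 2^31), every triple satisfies X_{n+2} = 6X_{n+1} - 9X_n (mod 2^31), so points (U_n, U_{n+1}, U_{n+2}) fall on only 15 planes in the unit cube. The sketch checks that identity numerically.

```python
M = 2**31
A = 65539   # RANDU multiplier (c = 0)

def randu(seed, n):
    """Return n successive RANDU states, starting from an odd seed."""
    xs, x = [], seed
    for _ in range(n):
        x = (A * x) % M
        xs.append(x)
    return xs

xs = randu(seed=1, n=10_000)
# Every consecutive triple obeys the same linear relation mod 2^31,
# which is why 3-D scatter plots of RANDU output show a few parallel planes.
violations = sum((xs[i + 2] - 6 * xs[i + 1] + 9 * xs[i]) % M != 0
                 for i in range(len(xs) - 2))
print(violations)   # 0
```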

Linear Congruential Generator
Good magic numbers for the linear congruential method:
a = 16,807, c = 0, M = 2,147,483,647
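These are the Park and Miller "minimal standard" constants (M = 2^31 - 1, which is prime). A short Python sketch; Python integers do not overflow, so a·I_j can be reduced modulo M directly. The check value 1043618065 after 10,000 steps from a seed of 1 is the one the C++ standard specifies for std::minstd_rand0, which uses the same constants.

```python
A, M = 16807, 2_147_483_647   # a = 7**5, M = 2**31 - 1

def minimal_standard(seed):
    """Yield integer states I_{j+1} = a * I_j mod M (c = 0). Seed must be in 1..M-1."""
    state = seed
    while True:
        state = (A * state) % M
        yield state

gen = minimal_standard(1)
print([next(gen) for _ in range(3)])   # [16807, 282475249, 1622650073]

gen = minimal_standard(1)
for _ in range(10_000):
    last = next(gen)
print(last, last / M)                  # 1043618065, and its value scaled to (0, 1)
```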
Overflow Method for integers
Multiply two 32-bit numbers to get a 64-bit integer that cannot be represented in 32-bit space.
The low-order 32 bits remain after the overflow.
Divide by 2^32 to get floating-point values between 0 and 1.
Very Fast
$$I_{j+1} = (a I_j + c) \bmod 2^{32}$$
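Python integers never overflow, so the sketch below imitates 32-bit unsigned overflow by masking with 0xFFFFFFFF, which keeps exactly the low-order 32 bits. The constants a = 1664525 and c = 1013904223 are an assumption here (the well-known "quick and dirty" pair from Numerical Recipes), not values given on the slide.

```python
MASK32 = 0xFFFFFFFF          # keeps the low-order 32 bits, like unsigned overflow
A, C = 1664525, 1013904223   # assumed constants (Numerical Recipes' quick-and-dirty LCG)

def overflow_lcg(seed, n):
    """Return n floats in [0, 1) from I_{j+1} = (a*I_j + c) mod 2**32."""
    state, out = seed & MASK32, []
    for _ in range(n):
        state = (A * state + C) & MASK32   # the "overflow" step
        out.append(state / 2**32)          # divide by 2^32 to land in [0, 1)
    return out

print(overflow_lcg(seed=0, n=5))
```

In C the mask is unnecessary: unsigned 32-bit arithmetic wraps by itself, which is why this method is so fast.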
Blum, Blum, Shub
Very slow
Not suited to simulation
Passes all tests
Cryptographically secure
$$X_{n+1} = X_n^2 \bmod M, \qquad M = pq$$
where p and q are large primes (each congruent to 3 mod 4)
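A toy Python sketch of the squaring recurrence. The primes p = 11 and q = 23 (both ≡ 3 mod 4) and the seed 3 are hypothetical values for illustration only; real use needs primes hundreds of digits long, which is exactly why the method is slow. Each step outputs the least significant bit of X_n.

```python
P, Q = 11, 23          # toy primes, both congruent to 3 mod 4 (hypothetical example values)
M = P * Q              # M = 253

def bbs_bits(seed, n):
    """Return (states, bits) for X_{n+1} = X_n**2 mod M, emitting the parity bit."""
    assert seed % P and seed % Q, "seed must be coprime to M"
    x, states, bits = seed, [], []
    for _ in range(n):
        x = (x * x) % M
        states.append(x)
        bits.append(x & 1)   # least significant bit
    return states, bits

states, bits = bbs_bits(seed=3, n=8)
print(states)   # [9, 81, 236, 36, 31, 202, 71, 234]
print(bits)     # [1, 1, 0, 0, 1, 0, 1, 0]
```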

Mersenne Twister
By Matsumoto and Nishimura (1997)
Caused a great deal of excitement in 1997.
Good statistical properties
Not good for cryptography
SAS IML RANDGEN function
Default technique for R runif()

Mersenne Twister
I'm just going to give you the flavor of it.
It's a bit-shifting algorithm.
[Figure: a 32-bit word drawn as a row of bits]
Mersenne Twister
XOR
Logical bitwise comparison function
Compares two bits
If they are different, the value is 1
If they are the same, the value is 0
>> a=[0 0 1 1]
a =
0 0 1 1
>> b=[0 1 1 0]
b =
0 1 1 0
>> c=xor(a,b)
c =
0 1 0 1
>>
MATLAB
> a<-c(0, 0, 1, 1)
> b<-c(0, 1, 1, 0)
> c<-xor(a,b)
> c
[1] FALSE TRUE FALSE TRUE
> as.integer(c)
[1] 0 1 0 1
R
Mersenne Twister
Bit-shifting algorithm
Use the XOR function to flip values
[Figure: a 32-bit word with bits flipped by XOR]
Mersenne Twister
Use 624 32-bit words to make one 19937-bit word (623*32 + 1)
XOR flip function in each 32-bit word
[Figure: each 32-bit word is combined by XOR with bits from the last word and passes bits on to the next word]
[Figure: Mersenne Twister diagram, from John Savard's Cryptology Page, http://www.quadibloc.com]
Mersenne Twister
By Matsumoto and Nishimura (1997)
Uses a Mersenne prime (a prime of the form 2^p - 1) to give a period length of 2^19937 - 1 for 32-bit numbers
Free C source code
Fast
Passes most standard randomness tests (it fails a few linear-complexity tests, e.g. in TestU01's BigCrush)
Not cryptographically secure
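To make the twist-and-temper idea concrete, here is a compact Python sketch of 32-bit MT19937 following the published reference algorithm (624 state words, offset 397, and the constants below). It is written for reading, not speed. The check values in the usage comment (3499211612 for the first output and 4123659995 for the 10,000th, seed 5489) are the ones quoted for std::mt19937 in C++.

```python
MASK32 = 0xFFFFFFFF

class MT19937:
    """Compact 32-bit Mersenne Twister (MT19937), written for readability."""
    N, M = 624, 397                 # 624 words of state; "middle word" offset
    MATRIX_A = 0x9908B0DF           # twist matrix constant
    UPPER, LOWER = 0x80000000, 0x7FFFFFFF

    def __init__(self, seed=5489):
        self.mt = [0] * self.N
        self.mt[0] = seed & MASK32
        for i in range(1, self.N):  # standard initialization recurrence
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK32
        self.index = self.N         # forces a twist on the first draw

    def _twist(self):
        """Regenerate all 624 words: join bits of neighbouring words, shift, and XOR."""
        for i in range(self.N):
            y = (self.mt[i] & self.UPPER) | (self.mt[(i + 1) % self.N] & self.LOWER)
            value = self.mt[(i + self.M) % self.N] ^ (y >> 1)
            if y & 1:
                value ^= self.MATRIX_A
            self.mt[i] = value
        self.index = 0

    def next_u32(self):
        """Return the next 32-bit output after tempering (more shifts and XORs)."""
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK32

    def uniform(self):
        """Float in [0, 1): divide the 32-bit output by 2**32."""
        return self.next_u32() / 2**32

gen = MT19937(5489)
print(gen.next_u32())   # 3499211612
```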

proc iml;
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r

0.0151013
0.5743561
0.5829185
0.6437729
0.1823678
0.3977417
0.476881
0.9845982
0.3211301
0.9623223
SAS
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.14645262
[2,] 0.04558767
[3,] 0.79254901
[4,] 0.57810786
[5,] 0.57831079
[6,] 0.30258424
[7,] 0.08682622
[8,] 0.77980499
[9,] 0.34161593
[10,] 0.98705945
R
Both R and SAS automatically grab a seed value from the system clock at first use, unless you call set.seed (in R) or randseed (in SAS) to set a specific starting point.
Grabbing a Seed from the System Clock (SAS)
SAS
proc iml;
call randseed(12345);
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r

0.5832971
0.9936254
0.5878877
0.8574689
0.8246889
0.2805668
0.6473969
0.3819192
0.4489572
0.8757847
SAS
> set.seed(12345)
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.7209039
[2,] 0.8757732
[3,] 0.7609823
[4,] 0.8861246
[5,] 0.4564810
[6,] 0.1663718
[7,] 0.3250954
[8,] 0.5092243
[9,] 0.7277053
[10,] 0.9897369
R
Obtaining Random Numbers from Specific Distributions
Inverse Probability Transform Methods
Rejection Methods
Mixed Rejection and Transform
Methods for Correlated Random Numbers
Obtaining Random Numbers from Inverse Probability Transform Methods
Let X be a random variable described by CDF F(x).
We wish to generate values of X distributed according to F(x).
Given a continuous uniform random variable U on [0,1], the random variable X = F^{-1}(U) has CDF F, where
$$F^{-1}(u) = \inf\{x \mid F(x) \ge u\}, \qquad 0 < u < 1.$$
What this means is: solve for x in the CDF, so that when you plug in U for F(x), you get a random number from that specific distribution.
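The infimum in the definition is what makes the transform work for discrete distributions too: X is the smallest support point whose cumulative probability reaches U. A minimal Python sketch using bisect on the cumulative probabilities; the three-point distribution is a made-up example.

```python
import bisect
import random
from itertools import accumulate

def make_inverse_cdf(values, probs):
    """Return F^{-1}(u) = inf{x : F(x) >= u} for a discrete distribution."""
    cdf = list(accumulate(probs))               # F at each support point
    def inverse(u):
        # bisect_left finds the first index whose cumulative probability is >= u
        i = bisect.bisect_left(cdf, u)
        return values[min(i, len(values) - 1)]  # guard against round-off in cdf[-1]
    return inverse

# Hypothetical example: P(X=0)=0.2, P(X=1)=0.5, P(X=2)=0.3
inv = make_inverse_cdf([0, 1, 2], [0.2, 0.5, 0.3])
print(inv(0.1), inv(0.2), inv(0.65), inv(0.95))   # 0 0 1 2

rng = random.Random(42)
draws = [inv(rng.random()) for _ in range(100_000)]
print([draws.count(v) / len(draws) for v in (0, 1, 2)])   # close to [0.2, 0.5, 0.3]
```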
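Example: Exponential Distribution
$$f(x) = \lambda e^{-\lambda x} I_{(0,\infty)}(x), \qquad F(x) = \left(1 - e^{-\lambda x}\right) I_{(0,\infty)}(x)$$
Let $y = 1 - e^{-\lambda x}$. Solving for x:
$$x = -\frac{\log(1-y)}{\lambda}, \qquad \text{so} \qquad F^{-1}(y) = -\frac{\log(1-y)}{\lambda}.$$
This means that $-\log(1-U)/\lambda$ is a random variable distributed as Exponential(λ).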

proc iml;
u = j(1000,1,.);
call randgen(u,'uniform');
exrand=-log(1-u)/.04;   /* inverse transform with rate 0.04, i.e. mean 25 */
tbl=u||exrand;
print tbl;
varnames={"u","erand"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
proc means data=erand;
var u erand;
run;

title 'Analysis Exponential RVs';
proc univariate data=erand noprint;
histogram erand / midpoints=5 to 205 by 10 exp;
run;
tbl
0.115794 3.0766303
0.754043 35.064963
0.157732 4.2914261
0.0431113 1.1017045
0.1086405 2.8751851
0.2632565 7.6378865
0.9448316 72.434124
0.3589581 11.116512
0.7109185 31.026164
0.4665676 15.710572
The MEANS Procedure
Variable     N      Mean     Std Dev    Minimum    Maximum
u         1000    0.4819      0.2877     0.0010     0.9996
erand     1000   23.4993     23.7694     0.0240   194.5330
> r=matrix(runif(10000), 10000,1)
> exrand=-log(1-r)/.04
> hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
R
[Figure (R): Histogram of exrand on the density scale; exrand from 0 to 250, density from 0 to 0.020.]
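Weibull Survival
$$\text{Survival time: } S(t) = \exp\left(-\lambda t^{\gamma}\right)$$
$$\text{Inverse probability transform: } U = S(t) \;\Rightarrow\; t = \left(\frac{-\ln U}{\lambda}\right)^{1/\gamma}, \qquad \gamma = 1.5,\ \lambda = 0.001$$
(The code uses 1 - U in place of U, which has the same Uniform[0,1] distribution.)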
SAS
proc iml;
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);   /* ## is element-wise power */
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;
proc means data=wrand;
var u weibrand;
run;
title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;
SAS
R
> r=matrix(runif(10000), 10000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> hist(wrand, freq = FALSE, main = "Histogram of Survival Times", breaks=50, xlab = "Survival Time")
R
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
   time WEIBRAND;   /* WRAND has no censoring variable: every time is an observed event */
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
   if survival > 0 then _lsurv = -log(survival);
   if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
   label weibrand = 'Survival Time';
   axis2 minor=none major=(number=6)
         label=(angle=90 'Survival Distribution Function');
   symbol1 i=stepj c=BLUE l=1 width=1;
   plot survival * weibrand=1 /
        description="SDF of weibrand"
        frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join c=BLUE l=1 width=1;
quit;
goptions ftext= ctext= htext= reset=symbol;
SAS
> library(survival)
> r=matrix(runif(1000), 1000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> event=wrand<=200
> wrand2=wrand*(event)+200*(1-event)   # censor at day 200
> fit <- survfit(Surv(wrand2, event) ~ 1)
> plot(fit, lty = 2:3, xlab = "Days", ylab = "Survival")
R
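Simulating Weibull Regression Data, with Proportional Hazards
$$S(t \mid x) = \exp\left(-\lambda_x t^{\gamma}\right), \qquad \lambda_x = \exp(\beta_0 + \beta_1 x)$$
$$t = \left(\frac{-\ln U}{\lambda_x}\right)^{1/\gamma}$$
$$\gamma = 1.5, \qquad \beta_0 = \ln(0.001) \approx -6.91, \qquad HR(\text{Drug}) = e^{\beta_1} = 0.5, \qquad \beta_1 = \ln(0.5) \approx -0.69$$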
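A quick Python check of the proportional-hazards construction, assuming the parameter values above. Under this Weibull model the hazard is λ_x γ t^(γ-1), so a hazard ratio of 0.5 stretches every survival quantile by (1/0.5)^(1/γ) = 2^(1/1.5) ≈ 1.59; the sketch compares simulated medians against that ratio. Censoring is left out for simplicity.

```python
import math
import random
import statistics

GAMMA = 1.5
BETA0 = math.log(0.001)   # baseline log-scale, about -6.91
BETA1 = math.log(0.5)     # log hazard ratio for drug, about -0.69

def weibull_ph_time(u, drug):
    """Inverse transform t = (-ln U / lambda_x)^(1/gamma), lambda_x = exp(b0 + b1*x)."""
    lam = math.exp(BETA0 + BETA1 * drug)
    return (-math.log(u) / lam) ** (1 / GAMMA)

rng = random.Random(7)
# 1 - rng.random() lies in (0, 1], which avoids log(0)
placebo = [weibull_ph_time(1 - rng.random(), 0) for _ in range(20_000)]
treated = [weibull_ph_time(1 - rng.random(), 1) for _ in range(20_000)]

theory_median_placebo = (math.log(2) / math.exp(BETA0)) ** (1 / GAMMA)   # about 78.3
ratio = statistics.median(treated) / statistics.median(placebo)
print(round(statistics.median(placebo), 1), round(theory_median_placebo, 1))
print(round(ratio, 3), round(2 ** (1 / GAMMA), 3))   # both near 1.587
```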
SAS
proc iml;
u = j(400,1,.);
call randgen(u,'uniform');
d = j(200,1,0) // j(200,1,1);                         /* data: drug treatment (0,1) */
wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);    /* log(.001): constant value; -0.69*d: drug effect * drug */
c = wrand<=200;
wrand=wrand#c + 200*(1-c);                            /* censor at 200; # is element-wise multiplication */
tbl=u || wrand || d || c ;
print tbl;
varnames={"u","weibrand","treat", "cens"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;
SAS
options pageno=1;
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
   time WEIBRAND * CENS (0); strata TREAT;
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
   if survival > 0 then _lsurv = -log(survival);
   if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
title;
footnote;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
   label weibrand = 'Survival Time';
   axis2 minor=none major=(number=6)
         label=(angle=90 'Survival Distribution Function');
   symbol1 i=stepj l=1 width=1; symbol2 i=stepj l=2 width=1; symbol3 i=stepj l=3 width=1;
   plot survival * weibrand = treat /
        description="SDF of weibrand by treat"
        frame cframe=CXF7E1C2 caxis=BLACK vaxis=axis2 hminor=0 name='SDF';
run;
symbol1 i=join l=1 width=1; symbol2 i=join l=2 width=1; symbol3 i=join l=3 width=1;
quit;
SAS
*** Proportional Hazards Models *** ;
options pageno=1;
proc phreg data=Work.Wrand;
   model WEIBRAND * CENS (0) = TREAT;
run; quit;

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio     202.5356    1       <.0001
Score                209.0491    1       <.0001
Wald                 201.2807    1       <.0001

Analysis of Maximum Likelihood Estimates
Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio
treat       1            -0.70413         0.04963    201.2807      <.0001         0.495
SAS
Proportional Hazards
R
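library(survival)
r=matrix(runif(400), 400,1)
drug=rbind(matrix(1,200,1),matrix(0,200,1))
wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)
event = wrand<=200;
wrand=wrand*event + 200*(1-event)
# scale=1 fixes the Weibull scale at 1, i.e. an exponential model; omit it to estimate the scale (true value 1/1.5 here)
survreg(Surv(wrand, event) ~ drug, dist='weibull', model=TRUE, scale=1)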
Call:
survreg(formula = Surv(wrand, event) ~ drug, dist = "weibull",
scale = 1, model = TRUE)
Coefficients:
(Intercept) drug
4.6042014 0.5978962
Scale fixed at 1
Loglik(model)= -1918.1 Loglik(intercept only)= -1932.6
Chisq= 29.06 on 1 degrees of freedom, p= 7e-08
n= 400
lsurv2 <- survfit(Surv(wrand, event) ~ drug, type='fleming')
plot(lsurv2, lty=2:3, xlab = "Days", ylab="Survival")
In the wrand formula, drug is the data (treatment, 0/1), log(.001) is the constant value, and -0.69*drug is the drug effect * drug.
Package survival
R
> library(eha)
> enter=matrix(0,400,1)
> fit <- phreg(Surv(enter, wrand, event) ~ drug)
> fit
> plot(fit, fn = "sur")
Call:
phreg(formula = Surv(enter, wrand, event) ~ drug)
Covariate      W.mean     Coef   Exp(Coef)   se(Coef)   Wald p
drug            0.586   -0.731       0.481      0.113    0.000
log(scale)               4.663     105.903      0.050    0.000
log(shape)               0.402       1.495      0.047    0.000
Events                    327
Total time at risk      44359
Max. log. likelihood  -1886.9
LR test statistic        42.4
Degrees of freedom          1
Overall p-value   7.38224e-11
Package eha
R
Generating Numbers with the Rejection Method
Fast
Good for Count Models
Good when you cannot find F^{-1}, but have f(x)
Generally uses pairs of random numbers
Just like playing the game Battleship

The Rejection Method Is Like Playing the Game Battleship
Rejection
Choose pairs of uniform random numbers:
x_U between X_min and X_max
y_U between Y_min and Y_max
Reject x_U if y_U > f(x_U)
[Figure: a box from X_min to X_max and Y_min to Y_max around f(x); points under the curve are hits, points above it are misses.]
Sample the area (two dimensions) containing the probability distribution or density function uniformly.
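A minimal Python sketch of the box version, assuming a target density f(x) = 6x(1 - x) on [0, 1] (a Beta(2,2), chosen only for illustration). Its maximum is 1.5, so the box is [0,1] × [0,1.5]; the box has area 1.5 and f has area 1, so about 1/1.5 ≈ 67% of pairs should be hits.

```python
import random

def f(x):
    """Target density: Beta(2, 2), f(x) = 6x(1-x) on [0, 1] (illustrative choice)."""
    return 6.0 * x * (1.0 - x)

X_MIN, X_MAX, Y_MIN, Y_MAX = 0.0, 1.0, 0.0, 1.5   # box enclosing f

def box_rejection(n, rng):
    """Draw (x_U, y_U) uniformly in the box; keep x_U when y_U <= f(x_U)."""
    kept, tries = [], 0
    while len(kept) < n:
        tries += 1
        x_u = rng.uniform(X_MIN, X_MAX)
        y_u = rng.uniform(Y_MIN, Y_MAX)
        if y_u <= f(x_u):          # hit: under the curve
            kept.append(x_u)
    return kept, n / tries         # samples and acceptance rate

rng = random.Random(3)
samples, accept = box_rejection(50_000, rng)
print(round(sum(samples) / len(samples), 3), round(accept, 3))   # mean near 0.5, rate near 0.667
```

The acceptance rate is simply (area under f) / (area of the box), which is why a large dead zone wastes draws.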
Rejection
The simple version becomes inefficient if the rejection area is large.
[Figure (Matlab): rejection sampling of a Binomial count (p = 0.2, 20 trials) under a rectangular g(x) with a large dead zone; Count from -5 to 25 vs. Frequency from 0 to 0.25. 239 in, 761 rejected.]
Rejection
Can be made more efficient by uniform sampling over a smaller target area.
[Figure: g(x) with a smaller dead zone.]
The trick is to sample uniformly over the smaller area.
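Rejection
[Figure: the target g(x) under a dominating function f(x), with a smaller dead zone.]
First, define a "dominating function" f(x) that lies on or above the target g(x) everywhere, and its corresponding integral or cumulative distribution F(x).
F(x) need not be normalized.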
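Rejection
[Figure: g(x) under a linear dominating function f(x), with a smaller dead zone.]
$$f(x) = a - bx, \qquad 0 \le x \le \frac{a}{b}$$
$$F(x) = ax - \frac{b}{2}x^2, \qquad F_{\max} = F\!\left(\frac{a}{b}\right) = \frac{a^2}{2b}$$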

Rejection
Choose x_U based on the inverse transform of the integrated dominating function F(x).
Choose a uniform random number U1 in the range:
$$0 \le U_1 \le \frac{a^2}{2b}$$
Calculate x by setting F(x) = U1 and solving the quadratic for x:
$$x_U = \frac{a - \sqrt{a^2 - 2bU_1}}{b}$$
[Figure: g(x), f(x), and the chosen x_U.]

Rejection
Evaluate f(x_U), and choose a second uniform random number U2 between 0 and f(x_U).
Reject x_U if U2 > g(x_U).
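Putting the steps together, a Python sketch under illustrative assumptions: the target is g(x) = 3(1 - x)^2 on [0, 1] (a Beta(1,3) density, not the binomial from the slides), and the dominating line is f(x) = 3 - 3x, i.e. a = b = 3, which lies on or above g on [0, 1]. The envelope has area a^2/(2b) = 1.5, so about 2/3 of proposals should be accepted.

```python
import math
import random

A, B = 3.0, 3.0                       # dominating line f(x) = a - b*x on [0, a/b]

def g(x):
    """Target density (illustrative): Beta(1, 3), g(x) = 3(1-x)^2 on [0, 1]."""
    return 3.0 * (1.0 - x) ** 2

def f(x):
    """Dominating function; g(x) <= f(x) on [0, 1] because (1-x)^2 <= (1-x)."""
    return A - B * x

def envelope_rejection(n, rng):
    kept, tries = [], 0
    f_area = A * A / (2 * B)                            # F(a/b) = a^2 / (2b)
    while len(kept) < n:
        tries += 1
        u1 = rng.uniform(0.0, f_area)
        x_u = (A - math.sqrt(A * A - 2 * B * u1)) / B   # solve F(x) = u1 for x
        u2 = rng.uniform(0.0, f(x_u))                   # height under the envelope
        if u2 <= g(x_u):                                # keep if under the target
            kept.append(x_u)
    return kept, n / tries

rng = random.Random(11)
samples, accept = envelope_rejection(50_000, rng)
print(round(sum(samples) / len(samples), 3), round(accept, 3))   # mean near 0.25, rate near 0.667
```

The closer f hugs g, the smaller the dead zone and the higher the acceptance rate.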
[Figure: g(x), f(x), and x_U, followed by two histogram panels of Count (-5 to 25) vs. Frequency (0 to 0.25).]
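Weibull Function?
Dominating function: twice a Weibull density
$$f(x) = 2 \times \frac{\gamma}{\lambda}\left(\frac{x}{\lambda}\right)^{\gamma-1}\exp\left[-\left(\frac{x}{\lambda}\right)^{\gamma}\right]$$
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\lambda}\right)^{\gamma}\right], \qquad F(x) = u \;\Rightarrow\; x = \lambda\left[-\ln(1-u)\right]^{1/\gamma}$$
$$\lambda = 6.5, \qquad \gamma = 1.8$$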

[Figure: a sequence of rejection-sampling histogram frames, Count (-5 to 25) vs. Frequency (0 to 0.25).]
Binomial Distribution
(Bernoulli trials are the simplest example of the rejection method.)
Probability Pr(X=1): p
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
b=r>0.5;   /* Bernoulli(0.5): 1 when r exceeds 0.5 */
print b;
quit;
b
1
0
1
1
0
0
0
0
0
1
SAS
> r=matrix(runif(10), 10,1)
> b=r<=0.5
> cbind(r,b)
[,1] [,2]
[1,] 0.4919652 1
[2,] 0.5088624 0
[3,] 0.5955355 0
[4,] 0.5243394 0
[5,] 0.5923056 0
[6,] 0.1610980 1
[7,] 0.9663659 0
[8,] 0.2548106 1
[9,] 0.4582953 1
[10,] 0.1170421 1
>
R
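Summing Bernoulli trials gives a binomial count. A short Python sketch, reusing the Binomial(20, 0.2) setting from the earlier rejection figure as an illustrative choice; each trial is a success when U ≤ p, so Pr(X = 1) = p.

```python
import random

def bernoulli(p, rng):
    """1 with probability p: success when the uniform draw falls at or below p."""
    return 1 if rng.random() <= p else 0

def binomial(n_trials, p, rng):
    """Binomial count as the sum of n_trials independent Bernoulli(p) trials."""
    return sum(bernoulli(p, rng) for _ in range(n_trials))

rng = random.Random(20)
counts = [binomial(20, 0.2, rng) for _ in range(20_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(round(mean, 3), round(var, 3))   # near n*p = 4 and n*p*(1-p) = 3.2
```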
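Simulating Outcomes from a Logistic Model
But then, you never have just one value of p for your Bernoulli trials
Placebo Controlled Drug Trial
25% Success for Placebo
Odds Ratio of 2.0 for Treatment
Two different success probabilities, based on a logistic model
Logistic Model
$$\Pr(\text{Success}) = \text{CDF}(\beta_0 + \beta_1\,\text{Drug}) = \frac{\exp(\beta_0 + \beta_1\,\text{Drug})}{1 + \exp(\beta_0 + \beta_1\,\text{Drug})}$$
Placebo: $0.25 = \dfrac{\exp(\beta_0)}{1+\exp(\beta_0)}$, so $\beta_0 = \ln(0.25/0.75) = -1.0986$
Drug (0,1): OR = 2.0, ln(OR) = 0.6931
$$\Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931\cdot\text{Drug})}{1 + \exp(-1.0986 + 0.6931\cdot\text{Drug})}$$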
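The arithmetic behind the two success probabilities can be checked directly: placebo odds are 0.25/0.75 = 1/3, doubling them gives 2/3, so the drug arm's success probability is (2/3)/(1 + 2/3) = 0.4. A Python sketch of that calculation plus the simulation step (outcome = 1 when U ≤ expit), with 200 subjects per arm; the seed is arbitrary.

```python
import math
import random

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

beta0 = math.log(0.25 / 0.75)   # logit of the placebo success rate, about -1.0986
beta1 = math.log(2.0)           # log odds ratio for treatment, about 0.6931

p_placebo = expit(beta0)            # 0.25
p_drug = expit(beta0 + beta1)       # 0.40
print(round(p_placebo, 4), round(p_drug, 4))   # 0.25 0.4

rng = random.Random(2010)
drug = [0] * 200 + [1] * 200
outcome = [1 if rng.random() <= expit(beta0 + beta1 * d) else 0 for d in drug]
print(sum(outcome[:200]) / 200, sum(outcome[200:]) / 200)   # sample rates scatter around 0.25 and 0.40
```

With only 200 per arm, the fitted odds ratio will wander noticeably around 2.0 from one simulated trial to the next.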
proc iml;
u = j(400,1,.);
call randgen(u,'uniform');
d = j(400,1,1)||(j(200,1,0) // j(200,1,1));   /* intercept column and treatment indicator */
bta= {-1.0986 , 0.6931};
expit=exp(d*bta)/(1+exp(d*bta));
outcome=u<=expit;
tbl=u || d || expit || outcome ;
varnames={"u","const","treat", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;

proc logistic data=Work.Erand DESCEND;
   model OUTCOME = TREAT;
run;
SAS
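Design matrix d: a column of 1s (intercept) and the treatment indicator (0 for the first 200 rows, 1 for the last 200); bta = (bta1, bta2)' holds the intercept and the treatment log odds ratio.
SAS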
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
                           Standard        Wald
Parameter  DF  Estimate       Error  Chi-Square  Pr > ChiSq
Intercept   1   -1.2367      0.1693     53.3383      <.0001
treat       1    0.5510      0.2261      5.9402      0.0148
Odds Ratio Estimates
           Point        95% Wald
Effect  Estimate   Confidence Limits
treat      1.735     1.114    2.702
r=matrix(runif(400), 400,1)
drug=rbind(matrix(0,200,1),matrix(1,200,1))
d=cbind(matrix(1,400,1), drug)
parms=matrix(c(-1.0986 , 0.6931),2,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r<=expit

R
Design matrix d: a column of 1s and the drug indicator (0 for the first 200 rows, 1 for the last 200); parms = (parm1, parm2)'.
R
> drugtrial<-glm(outcome~drug, family = binomial(link="logit"))
> summary(drugtrial)
Call:
glm(formula = outcome ~ drug, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.036 -1.036 -0.776 1.326 1.641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***
drug 0.7026 0.2158 3.255 0.00113 **
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 511.49 on 399 degrees of freedom
Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67
Number of Fisher Scoring iterations: 4
Normally Distributed Random Numbers
Inverse transform methods are inefficient for normal random numbers
Box-Muller Transform
z transformation of two uniform random variates [X1, X2 ~ U(0,1)]
Random radius, random angle:
$$r = \sqrt{-2\ln X_1}, \qquad \theta = 2\pi X_2$$
Get two z variates from two uniform random numbers, X1 and X2:
$$z_1 = r\cos(\theta) = \sqrt{-2\ln X_1}\,\cos(2\pi X_2)$$
$$z_2 = r\sin(\theta) = \sqrt{-2\ln X_1}\,\sin(2\pi X_2)$$
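A Python sketch of the two formulas. Using 1 - random() for X1 keeps it in (0, 1] so that ln X1 is always defined; that adjustment is an implementation detail of this sketch, not part of the slide.

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0,1) draws from two uniforms (Box-Muller)."""
    x1 = 1.0 - rng.random()              # in (0, 1], so log(x1) is finite
    x2 = rng.random()
    r = math.sqrt(-2.0 * math.log(x1))   # random radius
    theta = 2.0 * math.pi * x2           # random angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(5)
z = [v for _ in range(25_000) for v in box_muller(rng)]
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(round(mean, 3), round(var, 3))   # near 0 and 1
```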

Normally Distributed Random Numbers
Box-Muller Transform
X1 and X2 supply a random radius and a random angle, i.e. a random position in the plane
Would be more efficient if it did not make calls to trigonometric functions.
Marsaglia Method
Places the unit circle within a square, -1 to +1, and samples the square uniformly (X1, X2 ~ U(-1,1)).
Rejects draws that fall outside the circle.
But it avoids calls to trig functions.
$$s = X_1^2 + X_2^2 \qquad (\text{keep only } 0 < s < 1)$$
$$z_1 = X_1\sqrt{\frac{-2\ln s}{s}}, \qquad z_2 = X_2\sqrt{\frac{-2\ln s}{s}}$$
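A Python sketch of the polar method. It also counts how many (X1, X2) pairs were drawn: the circle covers π/4 ≈ 78.5% of the square, which is the expected acceptance rate.

```python
import math
import random

def marsaglia_polar(rng):
    """Return two independent N(0,1) draws and the number of (X1, X2) pairs tried."""
    tries = 0
    while True:
        tries += 1
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:            # inside the unit circle (and not the origin)
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor, tries

rng = random.Random(8)
z, total_tries = [], 0
for _ in range(25_000):
    z1, z2, t = marsaglia_polar(rng)
    z += [z1, z2]
    total_tries += t
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(round(mean, 3), round(var, 3), round(25_000 / total_tries, 3))   # near 0, 1, pi/4
```

The trade is about 21% wasted pairs in exchange for no sin or cos calls.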

Generating Numbers from a Normal, Using the CLT (quick & dirty)
Sum several uniform draws u_i
Standardize
Recall that Var(u) = 1/12, so the sum of 12 draws has mean 6 and variance 1:
$$X = \sum_{i=1}^{12} u_i - 6$$
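A Python sketch of the 12-uniform recipe. One limitation follows directly from the construction: X can never fall outside [-6, 6], so the extreme tails are missing.

```python
import random

def clt_normal(rng):
    """Approximate N(0,1): sum of 12 Uniform(0,1) draws minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(12)
x = [clt_normal(rng) for _ in range(50_000)]
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / (len(x) - 1)
print(round(mean, 3), round(var, 3), min(x), max(x))
```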
Correlated Multivariate Random Numbers
Simulating panel data, repeated measures
Mixture distributions
Generating Multivariate Normal Random Numbers
Desired covariance matrix: V
V = R'R, where R is the Cholesky decomposition of V
Begin with independent standard normal RVs: Z ~ N(0,1)
Correlated (multivariate) normal RVs:
$$X = R'Z$$
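A stdlib-only Python sketch, using the correlation matrix and standard deviations of the rmat/sigvec example (0.3, 0.2, 0.1 correlations; SDs 53, 36, 12, 47). It computes the upper-triangular R with V = R'R, then forms each draw as X = R'Z; written with Z as a row vector, that is Z times R, the same product as z1*upr in the IML code.

```python
import random

def cholesky_upper(v):
    """Upper-triangular R with V = R'R (V symmetric positive definite)."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r[i][i] = (v[i][i] - sum(r[k][i] ** 2 for k in range(i))) ** 0.5
        for j in range(i + 1, n):
            r[i][j] = (v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))) / r[i][i]
    return r

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5

rmat = [[1, .3, .2, .1], [.3, 1, .3, .2], [.2, .3, 1, .3], [.1, .2, .3, 1]]
sigvec = [53, 36, 12, 47]
V = [[rmat[i][j] * sigvec[i] * sigvec[j] for j in range(4)] for i in range(4)]
R = cholesky_upper(V)

rng = random.Random(4)
X = []
for _ in range(20_000):
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]                         # independent N(0,1)
    X.append([sum(z[k] * R[k][j] for k in range(4)) for j in range(4)])  # row form of X = R'Z

cols = list(zip(*X))
print([round(v, 4) for v in R[0]])        # first row of R: 53, 10.8, 2.4, 4.7
print(round(corr(cols[0], cols[1]), 3))   # near 0.3
```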
Generating Multivariate Normal Random Numbers
proc iml;
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={53 36 12 47};
cvmat=rmat#(sigvec`*sigvec);   /* # is element-wise multiplication */
upr=half(cvmat);               /* upper-triangular Cholesky factor */
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
/* Let's be wasteful: keep only the cosine half of each Box-Muller pair */
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
z1=z1*upr;
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
53 36 12 47
cvmat
2809 572.4 127.2 249.1
572.4 1296 129.6 338.4
127.2 129.6 144 169.2
249.1 338.4 169.2 2209
upr
53 10.8 2.4 4.7
0 34.341811 3.0190603 8.3757958
0 0 11.36333 11.672016
0 0 0 44.503035
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299
x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335
x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908
x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.30338 0.20341 0.11397
<.0001 <.0001 0.0003
x2 0.30338 1.00000 0.28186 0.24150
<.0001 <.0001 <.0001
x3 0.20341 0.28186 1.00000 0.34421
<.0001 <.0001 <.0001
x4 0.11397 0.24150 0.34421 1.00000
0.0003 <.0001 <.0001
SAS
Generating Multivariate Normal Random Numbers
cmat<-rbind(c(1, .3, .2, .1), c(.3, 1, .3, .2), c(.2, .3, 1, .3) , c(.1, .2, .3, 1))
sigvec=c(53, 36, 12, 47)
vv=cmat*(sigvec%*%t(sigvec))
rr=chol(vv)    # upper-triangular factor: t(rr) %*% rr equals vv
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
z1=sqrt(-2*log(r1))*cos(2*pi*r2)
rvs=z1%*%rr
R
> cmat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.3 0.2 0.1
[2,] 0.3 1.0 0.3 0.2
[3,] 0.2 0.3 1.0 0.3
[4,] 0.1 0.2 0.3 1.0
> vv
[,1] [,2] [,3] [,4]
[1,] 2809.0 572.4 127.2 249.1
[2,] 572.4 1296.0 129.6 338.4
[3,] 127.2 129.6 144.0 169.2
[4,] 249.1 338.4 169.2 2209.0
R
> rr
[,1] [,2] [,3] [,4]
[1,] 53 10.80000 2.400000 4.700000
[2,] 0 34.34181 3.019060 8.375796
[3,] 0 0.00000 11.363330 11.672016
[4,] 0 0.00000 0.000000 44.503035
> cov(rvs)
[,1] [,2] [,3] [,4]
[1,] 2832.4200 561.2585 134.0656 533.7351
[2,] 561.2585 1235.7616 124.2373 382.5441
[3,] 134.0656 124.2373 127.4132 160.2173
[4,] 533.7351 382.5441 160.2173 2205.5903
> cor(rvs)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.2999969 0.2231676 0.2135426
[2,] 0.2999969 1.0000000 0.3130961 0.2317137
[3,] 0.2231676 0.3130961 1.0000000 0.3022317
[4,] 0.2135426 0.2317137 0.3022317 1.0000000
> apply(rvs, 2, sd)
[1] 53.22048 35.15340 11.28774 46.96371
Subject-specific Random Effects
We have an error term (e_ij) for measurement j in subject i.
We also have a subject-specific random effect (k_i).
For the jth measurement on the ith subject:
$$y_{ij} = x_{ij}\beta + k_i + e_{ij}$$
Recipe for Subject-specific Random Effects
Create subjects for the study
Assign treatment, covariates
Give each subject a random effect
   Drawn from, say, N(0,V)
Generate predicted values based on regression + random effects
Generate outcomes for each repeated measure from a specific distribution
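The recipe can be sketched in Python for the simplest case, a normal outcome y_ij = β0 + β1·treat_i + k_i + e_ij with three measurements per subject. The coefficients and variances (V = 1 for k_i, 1 for e_ij) are hypothetical choices. Because every measurement on a subject shares k_i, two measurements on the same subject should correlate at about V/(V + σ_e^2) = 0.5.

```python
import random

rng = random.Random(670)
N_SUBJECTS, N_TIMES = 4000, 3
B0, B1 = 10.0, 2.0            # hypothetical intercept and treatment effect
V, SIGMA_E2 = 1.0, 1.0        # random-effect variance and residual variance

rows = []                     # (id, treat, time, y)
for i in range(N_SUBJECTS):                       # step 1: create subjects
    treat = 1 if i >= N_SUBJECTS // 2 else 0      # step 2: assign treatment
    k_i = rng.gauss(0.0, V ** 0.5)                # step 3: subject random effect
    for t in range(N_TIMES):
        mu = B0 + B1 * treat + k_i                # step 4: regression + random effect
        y = rng.gauss(mu, SIGMA_E2 ** 0.5)        # step 5: outcome for this measure
        rows.append((i, treat, t, y))

# Correlation between the first two measurements of each subject,
# after removing the known treatment shift B1 * treat.
y0 = [r[3] - B1 * r[1] for r in rows if r[2] == 0]
y1 = [r[3] - B1 * r[1] for r in rows if r[2] == 1]
m0, m1 = sum(y0) / len(y0), sum(y1) / len(y1)
cov = sum((a - m0) * (b - m1) for a, b in zip(y0, y1))
icc = cov / (sum((a - m0) ** 2 for a in y0) * sum((b - m1) ** 2 for b in y1)) ** 0.5
print(len(rows), round(icc, 3))   # 12000 rows; correlation near 0.5
```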
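Logistic Model
$$\Pr(\text{Success}) = \frac{\exp(\beta_0 + \beta_1\,\text{Drug} + \beta_2\,\text{Time} + K_i)}{1 + \exp(\beta_0 + \beta_1\,\text{Drug} + \beta_2\,\text{Time} + K_i)}$$
Placebo: 0.25, β0 = -1.0986
Drug: OR = 2.0, ln(OR) = 0.6931
Time (0,1,2): OR = 1.5, ln(OR) = 0.4055
K_i ~ N(0,1)
$$\Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931\cdot\text{Drug} + 0.4055\cdot\text{Time} + K_i)}{1 + \exp(-1.0986 + 0.6931\cdot\text{Drug} + 0.4055\cdot\text{Time} + K_i)}$$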
proc iml;
u = j(600,1,.);
call randgen(u,'uniform');
d1=j(100,1,0)//j(100,1,1);   /* 100 placebo and 100 drug subjects */
d1=d1//d1//d1;               /* repeated for the three time points */
id=j(200,1,0);
do i=1 to 200 by 1;
id[i,1]=i;
end;
id=id//id//id;
t=j(200,1,0)//j(200,1,1)//j(200,1,2);
k=j(200,1,.);
call randgen(k,'normal');    /* one random effect per subject */
k=k//k//k;
bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;
expit=exp(d*bta)/(1+exp(d*bta));
y=u<=expit;
tbl=id||u || d || expit || y ;
varnames={"id","u","const","treat","t","k", "expit","outcome"};
create erand from tbl [colname=varnames];
append from tbl;
quit;
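Design matrix d: columns const (1), treat (0/1), t (0, 1, 2), and k, the subject's random effect repeated at each time point; bta = (bta1, bta2, bta3, 1)', so k enters with a fixed coefficient of 1. The id vector repeats the subject IDs in each time block.
SAS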
. xtlogit outcome treat t, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 11.07
Log likelihood = -394.754 Prob > chi2 = 0.0040
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
treat | .6023501 .2279107 2.64 0.008 .1556533 1.049047
t | .235297 .1123071 2.10 0.036 .015179 .4554149
_cons | -.9334699 .2040373 -4.57 0.000 -1.333376 -.5335642
-------------+----------------------------------------------------------------
/lnsig2u | -.1281394 .3971684 -.9065751 .6502964
-------------+----------------------------------------------------------------
sigma_u | .9379396 .18626 .6355353 1.384236
rho | .2109869 .0661172 .1093476 .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 13.01 Prob >= chibar2 = 0.000
. di -394.754*(-2)
789.508
Stata, SAS data
Finally got this to run in SAS. I had forgotten that PROC NLMIXED needs the data sorted by subject (it does not sort them itself).
Stata does not require sorted data for its mixed models.
proc sort data=erand;
by id t;
run;

proc nlmixed data=erand qpoints=5 ;
parms b0=0 b1=-.7 b2=.6 sig=0 ;
theta2 = b0+b1*treat+b2*t+u;
prb= exp(theta2)/(1+exp(theta2));
model outcome ~ binary(prb);
random u ~normal(0,sig) subject=id ;
run;
SAS
The NLMIXED Procedure
Fit Statistics
-2 Log Likelihood 789.5
AIC (smaller is better) 797.5
AICC (smaller is better) 797.6
BIC (smaller is better) 810.7
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient
b0 -0.9332 0.2039 199 -4.58 <.0001 0.05 -1.3354 -0.5310 0.00039
b1 0.6021 0.2278 199 2.64 0.0089 0.05 0.1529 1.0513 -0.00007
b2 0.2353 0.1123 199 2.10 0.0374 0.05 0.01382 0.4567 0.000737
sig 0.8781 0.3480 199 2.52 0.0124 0.05 0.1917 1.5644 0.000363
We see tiny differences between this and the Stata results, owing to differences in optimization settings. (Here sig is the random-effect variance; compare Stata's sigma_u^2 = 0.938^2 ≈ 0.88.)
SAS
id=matrix(1:200, 200,1)
k1=matrix(runif(200), 200,1)
k2=matrix(runif(200), 200,1)
k=sqrt(-2*log(k1))*cos(2*pi*k2)   # one N(0,1) random effect per subject (Box-Muller)
id=rbind(id,id,id)
k=rbind(k,k,k)
drug =rbind(matrix(0,100,1), matrix(1,100,1))
drug=rbind(drug,drug,drug)
d=cbind(matrix(1,600,1),drug,k)
parms=matrix(c(-1.0986 , 0.6931,1),3,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
r=matrix(runif(600), 600,1)
outcome=r<=expit
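Design matrix d: columns of 1s, drug (0/1), and k, the subject's random effect repeated at each of the three measurements; parms = (-1.0986, 0.6931, 1)'. This version omits the time effect.
R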
. xtlogit outcome drug, i(id)
Random-effects logistic regression Number of obs = 600
Group variable: id Number of groups = 200
Random effects u_i ~ Gaussian Obs per group: min = 3
avg = 3.0
max = 3
Wald chi2(1) = 14.55
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
drug | .8331252 .2183881 3.81 0.000 .4050924 1.261158
_cons | -1.240213 .1708219 -7.26 0.000 -1.575018 -.9054085
-------------+----------------------------------------------------------------
/lnsig2u | -.6325958 .5624138 -1.734907 .469715
-------------+----------------------------------------------------------------
sigma_u | .7288423 .2049555 .4200198 1.264729
rho | .1390212 .0673177 .050895 .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) = 5.08 Prob >= chibar2 = 0.012
R
Got this to run in R, using a mixed effects package called Zelig.
z.out1 <- zelig(outcome ~ drug + tag(1 | id), data=NULL, model="logit.mixed")
Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical Software," http://gking.harvard.edu/zelig
summary(z.out1)
Generalized linear mixed model fit by the Laplace approximation
Formula: outcome ~ drug + tag(1 | id)
AIC BIC logLik deviance
743.4 756.6 -368.7 737.4
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.39486 0.62838
Number of obs: 600, groups: id, 200
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2172 0.1511 -8.054 8.02e-16 ***
drug 0.8174 0.2023 4.041 5.32e-05 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Correlation of Fixed Effects:
(Intr)
drug -0.747
R
General Approach to Correlated Multivariate Random Numbers
Copulas
allow us to draw correlated random numbers from different distributions
Random effects in mixture models
They use CDF probabilities of correlated variables on the inside to map to correlated uniform random numbers on the margins
Those correlated uniform RVs may be used to marry vastly different distributions
Maintain marginal distributions

Generating Multivariate Random Numbers
From the SAS documentation, a Gaussian copula:
Independent normal (N(0,1)) random variables are generated
These variables are transformed to a correlated set of z-scores by using the Cholesky decomposition of the covariance matrix
These correlated normal RVs are transformed to a uniform by using Φ(z)
F^{-1}(·) is used to compute the final sample value
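The four steps can be sketched in Python with the standard library, assuming a bivariate case with correlation ρ = 0.5 and Exponential(1) margins (both illustrative choices). statistics.NormalDist().cdf plays the role of Φ, and -ln(1 - u) is the exponential inverse CDF from earlier. For a Gaussian copula the correlation of the uniforms is (6/π)·arcsin(ρ/2) ≈ 0.483, slightly below ρ.

```python
import math
import random
from statistics import NormalDist

RHO = 0.5
PHI = NormalDist().cdf                    # standard normal CDF

def gaussian_copula_exponential(n, rng):
    """Return (uniform pairs, exponential pairs) linked by a Gaussian copula."""
    c = math.sqrt(1.0 - RHO * RHO)        # Cholesky factor of [[1, RHO], [RHO, 1]]
    uniforms, expos = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)              # step 1: independent N(0,1)
        x1, x2 = z1, RHO * z1 + c * z2                         # step 2: correlated z-scores
        u1, u2 = PHI(x1), PHI(x2)                              # step 3: correlated uniforms
        uniforms.append((u1, u2))
        expos.append((-math.log(1 - u1), -math.log(1 - u2)))   # step 4: F^{-1} margins
    return uniforms, expos

def corr(pairs):
    a, b = zip(*pairs)
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5

rng = random.Random(2138)
U, E = gaussian_copula_exponential(20_000, rng)
print(round(corr(U), 3), round(6 / math.pi * math.asin(RHO / 2), 3))   # both near 0.483
print(round(sum(e[0] for e in E) / len(E), 3))                         # exponential mean near 1
```

Swapping the exponential inverse CDF for any other F^{-1} changes the margins while keeping the same dependence structure.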
Generating Multivariate Random Numbers
proc iml;
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={1 1 1 1};
cvmat=rmat#(sigvec`*sigvec);
/* # is element-wise multiplication */
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
/* Note I could have gotten another z here (using sin) */
z1=z1*upr;
z1=cdf('Normal',z1);
z1=gaminv(z1,3.0);
/* Standardized gamma parameter, also the mean */
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
1 1 1 1
cvmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
SAS
The CORR Procedure
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 2.96320 1.73566 2963 0.11528 12.19072
x2 1000 3.01249 1.68236 3012 0.14039 10.20117
x3 1000 3.00336 1.68803 3003 0.34496 13.72023
x4 1000 3.08106 1.79858 3081 0.11148 13.25409
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.25874 0.19005 0.10052
<.0001 <.0001 0.0015
x2 0.25874 1.00000 0.22622 0.13944
<.0001 <.0001 <.0001
x3 0.19005 0.22622 1.00000 0.32082
<.0001 <.0001 <.0001
x4 0.10052 0.13944 0.32082 1.00000
0.0015 <.0001 <.0001
SAS
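Generating Multivariate Random Numbers
library(boot)     # corr(): correlation between the first two columns of a matrix
library(actuar)   # qinvgamma(): inverse gamma quantiles (not in base R)
cmat<-rbind(c(1, .4, .4, .4), c(.4, 1, .4, .4), c(.4, .4, 1, .4) , c(.4, .4, .4, 1))
rr=chol(cmat)
r1=matrix(runif(1000), 250,4)
r2=matrix(runif(1000), 250,4)
# cos and sin halves of the same Box-Muller pairs are independent N(0,1)
z1=rbind(sqrt(-2*log(r1))*cos(2*pi*r2), sqrt(-2*log(r1))*sin(2*pi*r2))
rvs=z1%*%rr
cd=pnorm(rvs,mean=0,sd=1)
g<-qinvgamma(cd,2,3)
corr(cd)
[1] 0.4150358
corr(rvs)
[1] 0.4188932
corr(g)
[1] 0.2337756
R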
>> U = copularnd('Gaussian',.4,10)
U =
0.8017 0.9388
0.3650 0.2250
0.8104 0.6253
0.3467 0.0988
0.6067 0.6561
0.4743 0.6723
0.6273 0.7427
0.9905 0.8249
0.4427 0.6925
0.3443 0.2711
>> U = copularnd('Gaussian',.4,10000);
>> corr(U)
ans =
1.0000 0.3765
0.3765 1.0000
>> X = norminv(U,0,1);
>> corr(X)
ans =
1.0000 0.3901
0.3901 1.0000
>> Xg = gaminv(U,2,3);
>> corr(Xg)
ans =
1.0000 0.3721
0.3721 1.0000
>>
Matlab
(a little more clear)
Old Slides
Grabbing a Seed from the System Clock (Stata)
program define seedset
local ct =c(current_time)
local s1=substr("`ct'",7,2)
local s2=substr("`ct'",4,2)
local s3=substr("`ct'",1,2)
global newseed=real("`s1'" +"`s2'" +"`s3'")
di $newseed
set seed $newseed
end
LCG is default for Stata
. set obs 100
obs was 0, now 100
. gen x0=ceil(uniform()*100)
. gen m=ceil(uniform()*10)
. gen x1=mod(x0,m)
. list in 1/10
+------------+
| x0 m x1 |
|------------|
1. | 70 2 0 |
2. | 62 7 6 |
3. | 92 7 1 |
4. | 53 1 0 |
5. | 37 3 1 |
|------------|
6. | 78 1 0 |
7. | 47 2 1 |
8. | 91 2 1 |
9. | 98 1 0 |
10. | 71 9 8 |
+------------+
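The mod step demonstrated above is what keeps a linear congruential generator inside [0, m). A minimal Python sketch of the recurrence X_n = (a·X_{n-1} + c) mod m, assuming the Park-Miller "minimal standard" constants a = 16807, c = 0, m = 2^31 - 1 as defaults; these constants are an illustrative choice, not the ones used by any package on these slides. The toy call with m = 8 shows the sequence wrapping around and repeating after at most m values.

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Yield X_n = (a * X_{n-1} + c) mod m, starting from X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_uniforms(seed, n, a=16807, c=0, m=2**31 - 1):
    """Map the first n LCG integers to [0, 1) by dividing by m."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(n)]


# Toy modulus: the period is at most m = 8, so the values repeat quickly
print(lcg_uniforms(1, 12, a=5, c=3, m=8))
```

With a = 5, c = 3, m = 8 the generator happens to reach its full period of 8 before repeating; a small modulus like this is useful only for seeing the wrap-around.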
Testing Randomness (Stata)
Correlogram of $X_i$ versus $X_{i+k}$ for serial correlation
. gen tv=_n
. tsset tv
time variable: tv, 1 to 20000
delta: 1 unit
. corrgram x, lags(40)
LAG      AC        PAC       Q        Prob>Q
--------------------------------------------
  1    0.0026    0.0026    .13551    0.7128
  2   -0.0011   -0.0011    .15995    0.9231
  3   -0.0004   -0.0004    .16301    0.9833
  4   -0.0131   -0.0131    3.5987    0.4630
  5   -0.0008   -0.0007    3.6115    0.6066
  6    0.0119    0.0118    6.4238    0.3774
  7   -0.0060   -0.0061    7.1533    0.4131
  8    0.0004    0.0003    7.1571    0.5198
  9    0.0057    0.0057    7.815     0.5529
 10   -0.0049   -0.0046    8.2893    0.6006
 11   -0.0097   -0.0099    10.19     0.5134
 12    0.0044    0.0043    10.581    0.5651
 13    0.0087    0.0090    12.102    0.5193
 14    0.0081    0.0079    13.424    0.4935
 15    0.0068    0.0064    14.357    0.4986
 16    0.0068    0.0071    15.285    0.5039
 17   -0.0029   -0.0025    15.449    0.5632
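The AC and Q columns above can be reproduced by hand: r_k is the usual sample autocorrelation at lag k, and Q is the Ljung-Box portmanteau statistic Q = n(n+2)·Σ_{j=1..k} r_j²/(n-j), compared against a chi-squared distribution with k degrees of freedom. A minimal Python sketch of those two columns (PAC is omitted), assuming Python's own random.random as the sequence under test rather than the Stata generator on the slide.

```python
import random


def autocorrelations(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag: lag-k cross products over the total sum of squares."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def ljung_box(x, max_lag):
    """Cumulative Ljung-Box Q for lags 1..max_lag, like the Q column above."""
    n = len(x)
    q, out = 0.0, []
    for k, r in enumerate(autocorrelations(x, max_lag), start=1):
        q += r * r / (n - k)
        out.append(n * (n + 2) * q)
    return out


rng = random.Random(2010)
series = [rng.random() for _ in range(20000)]
ac = autocorrelations(series, 17)
q = ljung_box(series, 17)
for k in range(1, 6):
    print(k, round(ac[k - 1], 4), round(q[k - 1], 3))
```

A good generator should give autocorrelations within a few multiples of 1/sqrt(n) of zero and large Prob>Q values, as in the Stata output.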

. seedset
23491
. set obs 2000
obs was 0, now 2000
. gen P=uniform()
. gen enum=-ln(P)/.04
. ci
Variable | Obs Mean Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
P | 2000 .5030619 .0065298 .4902559 .5158678
enum | 2000 24.79822 .553316 23.71309 25.88336
(Stata)
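The enum line is the inverse probability transform: with S(t) = exp(-λt), solving S(T) = U (equivalently F(T) = 1 - U, which is also uniform) gives T = -ln(U)/λ, an exponential draw with mean 1/λ = 1/0.04 = 25, in line with the confidence interval for enum. A minimal Python sketch; the function name and seed are illustrative.

```python
import math
import random


def exponential_inverse(u, rate):
    """Solve exp(-rate * t) = u for t, the inverse-transform exponential draw."""
    return -math.log(u) / rate


rng = random.Random(1)
# 1 - random() lies in (0, 1], so math.log never receives 0
draws = [exponential_inverse(1.0 - rng.random(), 0.04) for _ in range(20000)]
mean = sum(draws) / len(draws)
print(round(mean, 2))  # should be near 1 / 0.04 = 25
```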
. set obs 200
obs was 0, now 200
. gen P=uniform()
. gen tte=(-ln(P)/0.1)^1.5
. gen fail=1
. replace fail=0 if tte>200
. replace tte=200 if tte>200
. stset tte, fail(fail)
failure event: fail != 0 & fail < .
obs. time interval: (0, tte]
exit on or before: failure
-------------------------------------------------------------------
200 total obs.
0 exclusions
-------------------------------------------------------------------
200 obs. remaining, representing
188 failures in single record/single failure data
9019.163 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 200
(Stata)
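Here the survival function is Weibull, S(t) = exp(-λ·t^p) with λ = 0.1 and p = 1/1.5, so solving S(T) = U gives T = (-ln U / λ)^{1/p} = (-ln U / 0.1)^{1.5}; times beyond 200 are then administratively censored. A minimal Python sketch of the same two steps; the function name, seed, and sample size are illustrative assumptions.

```python
import math
import random


def weibull_censored(n, lam=0.1, power=1.5, cutoff=200.0, seed=1):
    """Return (time, fail) pairs: T = (-ln U / lam) ** power, censored at cutoff."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = 1.0 - rng.random()                 # U in (0, 1]
        t = (-math.log(u) / lam) ** power      # inverse of S(t) = exp(-lam * t**(1/power))
        if t > cutoff:
            out.append((cutoff, 0))            # censored: fail = 0, time set to cutoff
        else:
            out.append((t, 1))                 # observed failure
    return out


data = weibull_censored(50000)
censored = sum(1 for _, fail in data if fail == 0) / len(data)
expected = math.exp(-0.1 * 200.0 ** (1 / 1.5))  # S(200), about 0.033
print(round(censored, 4), round(expected, 4))
```

The expected censored fraction is simply S(200), which gives a quick check on the simulated data before fitting streg.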
[Figure: Kaplan-Meier survival estimate; survival probability 0.00 to 1.00 vs. analysis time 0 to 200]
(Stata)
. streg, d(w) nohr
failure _d: fail
analysis time _t: tte
Weibull regression -- log relative-hazard form
No. of subjects = 200 Number of obs = 200
No. of failures = 188
Time at risk = 9019.163067
LR chi2(0) = 0.00
Log likelihood = -403.39593 Prob > chi2 = .
--------------------------------------------------------------------
_t | Coef. SE z P>|z| [95% CI]
-------------+------------------------------------------------------
_cons | -2.245 0.167 -13.47 0.000 -2.572 -1.918
-------------+------------------------------------------------------
delta | 0.625 0.036 0.558 0.701
--------------------------------------------------------------------
.
(Stata)
. gen P=uniform()
. gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5
. gen fail=1
. replace fail=0 if tte>200
(39 real changes made)
. replace tte=200 if tte>200
(39 real changes made)
. stset tte, fail(fail)
failure event: fail != 0 & fail < .
obs. time interval: (0, tte]
exit on or before: failure
------------------------------------------------------------------------------
400 total obs.
0 exclusions
------------------------------------------------------------------------------
400 obs. remaining, representing
361 failures in single record/single failure data
25170.04 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 200
.
(Stata)
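With a binary covariate, the slide scales the Weibull rate to λ = exp(ln 0.1 + ln 0.5·drug), so the treated arm has hazard ratio 0.5 and S_1(t) = S_0(t)^0.5. A minimal Python check of that construction, comparing each arm's sample median with the theoretical median m = (ln 2 / λ)^{1.5} obtained from S(m) = 0.5. Arm sizes and seed are illustrative assumptions (the slide's set-up lines for drug are not shown), and censoring at 200 is left out because it does not change either median.

```python
import math
import random
import statistics


def weibull_ph_time(u, drug, power=1.5):
    """Inverse-transform Weibull time with rate exp(ln 0.1 + ln 0.5 * drug)."""
    lam = math.exp(math.log(0.1) + math.log(0.5) * drug)
    return (-math.log(u) / lam) ** power


rng = random.Random(2)
arms = {0: [], 1: []}
for drug in (0, 1):
    for _ in range(50000):
        arms[drug].append(weibull_ph_time(1.0 - rng.random(), drug))

for drug in (0, 1):
    lam = 0.1 * 0.5 ** drug
    theory = (math.log(2) / lam) ** 1.5        # solves S(m) = 0.5
    print(drug, round(statistics.median(arms[drug]), 2), round(theory, 2))
```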
. list drug P tte fail in 1/15
+-----------------------------------+
| drug P tte fail |
|-----------------------------------|
1. | 1 .842721 6.331312 1 |
2. | 0 .3839878 29.6119 1 |
3. | 1 .3483792 96.8484 1 |
4. | 0 .6035132 11.34804 1 |
5. | 1 .8460417 6.114305 1 |
|-----------------------------------|
6. | 1 .4935982 53.06192 1 |
7. | 1 .5173908 47.84433 1 |
8. | 1 .385052 83.39208 1 |
9. | 0 .8726683 1.589515 1 |
10. | 0 .0356283 192.5611 1 |
|-----------------------------------|
11. | 0 .8018837 3.280757 1 |
12. | 0 .6059877 11.21039 1 |
13. | 1 .7919235 10.07838 1 |
14. | 0 .1920578 67.02081 1 |
15. | 0 .0819428 125.1301 1 |
+-----------------------------------+
(Stata)
[Figure: Kaplan-Meier survival estimates for drug = 0 and drug = 1; survival probability 0.00 to 1.00 vs. analysis time 0 to 200]
(Stata)
. streg drug, d(w) nohr
failure _d: fail
analysis time _t: tte
Weibull regression -- log relative-hazard form
No. of subjects = 400 Number of obs = 400
No. of failures = 361
Time at risk = 25170.03819
LR chi2(1) = 53.70
--------------------------------------------------------------------
_t | Coef. SE z P>|z| [95% CI]
-------------+------------------------------------------------------
drug | -0.788 0.107 -7.34 0.000 -0.998 -0.577
_cons | -2.458 0.143 -17.19 0.000 -2.738 -2.177
-------------+------------------------------------------------------
delta | 0.706 0.031 0.648 0.768
--------------------------------------------------------------------
.
(Stata)
Random Numbers
In Stata, gennorm (use webseek to download it):
Typing
. gennorm a b c, corr(.2 .3 .4)
creates a, b, and c with values drawn from a N(0,S) distribution where
    +-           -+
    | 1           |
S = | .2   1      |
    | .3   .4   1 |
    +-           -+
That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4
CONTINUED NEXT PAGE
(Stata)
Random Numbers
In Stata:
Example
-------
. set obs 10000
obs was 0, now 10000
. set seed 6819
. gennorm a b c, corr(.2 .3 .4)
. summarize a b c

Variable | Obs Mean Std. Dev. Min Max
-------------+----------------------------------------------------------------------------
a | 10000 -.0105333 1.005723 -3.694448 3.775433
b | 10000 -.0042212 1.000254 -3.695302 3.648826
c | 10000 -.0069625 .9989002 -3.996779 3.606923
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------------
a | 1.0000
b | 0.2137 1.0000
c | 0.3035 0.3952 1.0000

(Stata)
Random Numbers
In Stata, drawnorm:
. clear
. matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)
. drawnorm a b c, n(10000) corr(C)
(obs 10000)
. summarize a b c
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
a | 10000 -.0176275 .9920181 -3.701594 3.7838
b | 10000 .0009005 1.003002 -3.709259 3.518793
c | 10000 -.0149926 .9925292 -3.716346 4.009713
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------
a | 1.0000
b | 0.1937 1.0000
c | 0.3051 0.4056 1.0000
(Stata)
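gennorm and drawnorm both return N(0, S) variables with a requested correlation matrix. A common way to build such draws (assumed here for illustration; the slides do not show either command's internals) is to factor S = L·L^T with a Cholesky decomposition and set x = L·z for a vector z of independent N(0,1) draws. A minimal pure-Python sketch using the slide's matrix; function names and seed are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L @ L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def draw_correlated(n, corr, seed=1):
    """Return n rows of N(0, corr) draws: x = L z with z independent N(0,1)."""
    rng = random.Random(seed)
    L = cholesky(corr)
    p = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        rows.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)])
    return rows


def sample_corr(rows, i, j):
    """Pearson correlation between columns i and j."""
    a = [r[i] for r in rows]
    b = [r[j] for r in rows]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


C = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
rows = draw_correlated(10000, C)
print([round(sample_corr(rows, i, j), 3) for i, j in ((0, 1), (0, 2), (1, 2))])
```

As in the Stata output, sample correlations from 10000 draws land within a few hundredths of 0.2, 0.3, and 0.4.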
Time-Dependency in Drug Effect
$S(t) = P$, where the drug effect changes with time: $x = -2.30 + 0.3\cdot drug - 0.08\cdot drug\cdot t$
Inverse Prob Transform: ?????????
How do you solve for $t$? (Not all answers are in the book.)
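$f(t) = \exp\left(-\exp(-2.30 + 0.3\cdot drug - 0.004\cdot drug\cdot t)\, t^{0.67}\right) - P$
Solve $f(t) = 0$ for $t$, using
$f'(t) = \frac{d f(t)}{dt} \approx \frac{f(t + \Delta t) - f(t)}{\Delta t}$
Remember Newton's Method?
$t_1 = t_0 - \frac{f(t_0)}{f'(t_0)}$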
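clear
set obs 400
gen drug=_n>200                  // observations 201-400 get drug = 1
gen double P=uniform()           // target survival probability: solve S(t) = P
gen double t=1                   // starting value t0
gen double tpd=t+.0001           // t + 0.0001 for the finite difference
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001   // forward-difference estimate of f'(t)
// 50 Newton iterations, applied to every observation at once
forvalues i=1/50 {
qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
qui replace slope=(fp-f)/0.0001
qui replace t=t-f/slope          // Newton step: t1 = t0 - f(t0)/f'(t0)
qui replace tpd=t+.0001
}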
(Stata)
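The Stata loop solves f(t) = S(t) - P = 0 for every subject with Newton's method, t_1 = t_0 - f(t_0)/f'(t_0), using a forward-difference slope with step 0.0001. A minimal Python sketch of the same iteration for a single draw. The positivity safeguard (halving t when a step would reach t <= 0) is an addition that is not in the Stata code; it keeps t**0.67 real when P is close to 1 and the first step overshoots. Function names are hypothetical.

```python
import math


def survival(t, drug):
    """S(t) = exp(-exp(-2.30 + 0.3*drug - 0.004*drug*t) * t**0.67), as on the slide."""
    x = -2.30 + 0.3 * drug - 0.004 * drug * t
    return math.exp(-math.exp(x) * t ** 0.67)


def invert_survival(p, drug, t0=1.0, h=1e-4, iters=50):
    """Solve S(t) = p for t by Newton's method with a forward-difference slope."""
    t = t0
    for _ in range(iters):
        f = survival(t, drug) - p
        slope = (survival(t + h, drug) - survival(t, drug)) / h   # approximates f'(t)
        if slope == 0.0:
            break                                # flat: no Newton step possible
        t_new = t - f / slope                    # t1 = t0 - f(t0) / f'(t0)
        if t_new <= 0.0:
            t_new = t / 2.0                      # safeguard: keep t > 0
        t = t_new
    return t


for drug in (0, 1):
    t = invert_survival(0.5, drug)
    print(drug, round(t, 3), survival(t, drug))
```

For drug = 0 the answer can be checked against the closed form t = (-ln P / e^{-2.30})^{1/0.67}. For drug = 1 the cumulative hazard exp(-2.0 - 0.004t)·t^0.67 peaks at t = 0.67/0.004 = 167.5, so S(t) never falls below about 0.12 and the iteration cannot converge for draws with smaller P.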
Matlab
>> drug=[zeros(1000,1);ones(1000,1)];
>> P=rand(2000,1);
>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));
>> outcome=P<=cdf0;
>> b = glmfit(drug,outcome,'binomial')
b =
-1.0616
0.6562
Matlab
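Both the Matlab and Stata versions draw a binary outcome by comparing a uniform P with the model probability: outcome = 1 when P <= p(drug), where logit p = -1.0986 + 0.6931·drug, so p is about 0.25 for drug = 0 and 0.40 for drug = 1 (an odds ratio of about 2). A minimal Python sketch; 1000 subjects per group mirrors the Matlab slide, and the seed and function names are illustrative.

```python
import math
import random


def logistic_p(drug, b0=-1.0986, b1=0.6931):
    """Model probability p = exp(b0 + b1*drug) / (1 + exp(b0 + b1*drug))."""
    eta = b0 + b1 * drug
    return math.exp(eta) / (1.0 + math.exp(eta))


def simulate_binary(n_per_group, seed=1):
    """Return (drug, outcome) pairs; outcome = 1 when a U(0,1) draw P <= p(drug)."""
    rng = random.Random(seed)
    data = []
    for drug in (0, 1):
        p = logistic_p(drug)
        for _ in range(n_per_group):
            data.append((drug, int(rng.random() <= p)))
    return data


def group_rate(data, drug):
    """Observed proportion of outcome = 1 in one drug group."""
    ys = [y for d, y in data if d == drug]
    return sum(ys) / len(ys)


data = simulate_binary(1000)
print(round(group_rate(data, 0), 3), round(group_rate(data, 1), 3))
```

Since P is uniform, Pr(P <= p) = p, so each group's outcome rate estimates its model probability; fitting a logistic regression to the simulated data should recover the intercept and slope up to sampling error, as in the glmfit output.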
[Figure: Kaplan-Meier survival estimates for drug = 0 and drug = 1; survival probability 0.00 to 1.00 vs. analysis time 0 to 200]
(Stata)
. gen P=uniform()
. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))
. list in 1/10
+----------------------------+
| drug P cdf0 |
|----------------------------|
1. | 0 .2865897 .2500023 |
2. | 0 .3788754 .2500023 |
3. | 1 .3597057 .3999916 |
4. | 1 .7182508 .3999916 |
5. | 1 .4315197 .3999916 |
|----------------------------|
6. | 1 .2963237 .3999916 |
7. | 1 .7961193 .3999916 |
8. | 0 .056983 .2500023 |
9. | 0 .4622037 .2500023 |
10. | 0 .5336403 .2500023 |
+----------------------------+
(Stata)
. gen outcome=P<=cdf0
. list in 1/10
+--------------------------------------+
| drug P cdf0 outcome |
|--------------------------------------|
1. | 0 .2865897 .2500023 0 |
2. | 0 .3788754 .2500023 0 |
3. | 1 .3597057 .3999916 1 |
4. | 1 .7182508 .3999916 0 |
5. | 1 .4315197 .3999916 0 |
|--------------------------------------|
6. | 1 .2963237 .3999916 1 |
7. | 1 .7961193 .3999916 0 |
8. | 0 .056983 .2500023 1 |
9. | 0 .4622037 .2500023 0 |
10. | 0 .5336403 .2500023 0 |
+--------------------------------------+
(Stata)
. gen outcome=P<=cdf0
. logistic outcome drug
Logistic regression Number of obs = 2000
LR chi2(1) = 58.65
Prob > chi2 = 0.0000
Log likelihood = -1245.3138 Pseudo R2 = 0.0230
--------------------------------------------------------------------
outcome | OR SE z P>|z| [95% CI]
-------------+------------------------------------------------------
drug | 2.084 0.202 7.57 0.000 1.723 2.519
--------------------------------------------------------------------
_cons | -1.077 0.073 -14.83 0.000 -1.220 -0.935
--------------------------------------------------------------------
.
(Stata)
