
University of South Florida

Homework 8 - Final project


Andres Garcia-Arce
HSC 6055 - Survival Analysis
Spring 2014
Contents
Problem description
Descriptive analysis
  Descriptive statistics
  Survival and hazard curve estimator
Fixed covariates Cox proportional model
  Proportional Hazard model
  Correlation Matrix and interaction effects
  Assumptions assessment
  Martingale residual analysis
  Schoenfeld residual analysis
Goodness of fit of the model
  Cox Snell residuals
Code


Tables:

Table 1 Descriptives of Numerical Variables
Table 2 Descriptives for the Categorical variables
Table 3 Frequencies of categorical variables
Table 4 Coefficient estimates for the selected effects in the model with fixed covariates
Table 5 Fit statistics for the model with fixed covariates
Table 6 Global null hypothesis test for the model with fixed covariates
Table 7 Pearson Correlation among selected variables
Table 8 Model comparison to let in the variable tbili (bili in time) in the analysis
Table 9 Supremum test for proportional hazard assumption
Table 10 Significance of variables in regressions of Schoenfeld residuals by variable
Table 11 Coefficients for the final model

Figures:

Figure 1 Survival curve for the event
Figure 2 Survival curve adjusted by strata Sex (Logrank test p-value=0.12)
Figure 3 Survival curves adjusted by strata drug (Logrank test p-value=0.81)
Figure 4 Survival Curve adjusted by strata stage (logrank test p-value=0.13)
Figure 5 (a) K-M estimated survival curve; (b) Survival curve fitted by the model
Figure 6 Cumulative martingale residual analysis for linearity for bili (a); albumin (b); copper (c); SGOT (d) and protime (e)
Figure 7 Martingale residuals for prothrombin (a); SGOT (b); copper (c); albumin (d); bilirubin (e)
Figure 8 Cumulative martingales for logbili (a); albumin (b); logcopper (c); SGOT (d); protime (e)
Figure 9 Martingale residual plot for logbili (a); albumin (b); logcopper (c); SGOT (d); prothrombin (e)
Figure 10 PH assumption plot test
Figure 11 Schoenfeld residual plots for the PH assumption
Figure 12 Cox-Snell residuals for original model (a); and for the final model (b)

Equations:

Equation 1 Model with Fixed covariates
Equation 2 Final PH model

Problem description

Below is a description of the variables recorded from the Mayo Clinic trial
in primary biliary cirrhosis (PBC) of the liver conducted between 1974 and 1984. A
total of 424 PBC patients, referred to Mayo Clinic during that ten-year interval, met
eligibility criteria for the randomized placebo controlled trial of the drug
D-penicillamine. The first 312 cases in the data set participated in the randomized
trial, and contain largely complete data. The additional 112 cases did not participate
in the clinical trial, but consented to have basic measurements recorded and to be
followed for survival. Six of those cases were lost to follow-up shortly after
diagnosis, so there are data here on an additional 106 cases as well as the 312
randomized participants.

id = case number
futime = number of days between registration and the earlier of death,
transplantation, or study analysis time in July, 1986
status = 0=alive, 1=liver transplant, 2=dead
drug = 1=D-penicillamine, 2=placebo
age = age in days
sex = 0=male, 1=female
ascites = presence of ascites: 0=no 1=yes
hepato = presence of hepatomegaly 0=no 1=yes
spiders = presence of spiders 0=no 1=yes
edema = presence of edema 0=no edema and no diuretic therapy for edema;
0.5 = edema present without diuretics, or edema resolved by diuretics; 1 =
edema despite diuretic therapy
bili = serum bilirubin in mg/dl
chol = serum cholesterol in mg/dl
albumin = albumin in gm/dl
copper = urine copper in ug/day
alk_phos = alkaline phosphatase in U/liter
sgot = SGOT in U/ml
trig = triglycerides in mg/dl
platelet = platelets per cubic ml/1000
protime = prothrombin time in seconds
stage = histologic stage of disease

1) Use stepwise selection procedure to select the important covariates that are
associated with the survival outcome;
2) Assess the linearity assumption for each identified continuous covariate and
fix the linearity violations if they exist;
3) Assess the PH assumption for each identified covariate and fix the PH
assumption violations if they exist;
4) Test the overall goodness of fit of the final model.
Descriptive analysis

Descriptive statistics:

Basic descriptive statistics show that the data contain 11 numerical
variables and 8 categorical variables. Some of the blood lab tests are missing
because of the 106 extra cases. For some variables the number of missing values
exceeds 106, so part of the missing data may be imputable. Table 1 and Table 2
show the descriptive statistics in detail. One interesting fact is that the values
stored in the variable age have been transformed (age is recorded in days), which
suggests that these are real data.

Table 1 Descriptives of Numerical Variables
Variable  N    NMiss  Min      Mean      Median    Max       StdMean  Mode
age       418  0      9598.00  18533.35  18628.00  28650.00  186.64   19724
albumin   418  0      1.96     3.50      3.53      4.64      0.02     3.35
alk_phos  312  106    289.00   1982.66   1259.00   13862.40  121.18   559
bili      418  0      0.30     3.22      1.40      28.00     0.22     0.7
chol      284  134    120.00   369.51    309.50    1775.00   13.76    260
copper    310  108    4.00     97.65     73.00     588.00    4.86     52
platelet  407  11     62.00    257.02    251.00    721.00    4.87     344
protime   416  2      9.00     10.73     10.60     18.00     0.05     10.6
sgot      312  106    26.35    122.56    114.70    457.25    3.21     71.3
trig      282  136    33.00    124.70    108.00    598.00    3.88     118
futime    418  0      41.00    1917.78   1730.00   4795.00   54.03    41



Table 2 Descriptives for the Categorical variables
Variable N NMiss Mode
ascites 312 106 0
drug 312 106 1
edema 418 0 0
hepato 312 106 1
sex 418 0 1
spiders 312 106 0
stage 412 6 3
status 418 0 0







Table 3 Frequencies of categorical variables
Variable  Value  Freq  %
stage     1      16    5.14
          2      66    21.22
          3      120   38.59
          4      109   35.05
edema     0      262   84.24
          0.5    29    9.32
          1      20    6.43
drug      1      158   50.8
          2      153   49.2
spiders   0      222   71.38
          1      89    28.62
hepato    0      151   48.55
          1      160   51.45
ascites   0      287   92.28
          1      24    7.72
sex       0      36    11.58
          1      275   88.42




From the description of the data, the missing-data mechanism is considered
non-random, so the missing values cannot easily be imputed. For this work
imputation has been discarded and only the first 312 observations are studied.

Survival and hazard curve estimator:

For this analysis the survival time is the variable "futime", while the
"transplant" (1) and "alive" (0) values of the variable status are treated as
censored. The censoring variable status is therefore recoded as "0" for censored
and "1" for observed (death).

The estimated K-M survival curve for the survival time is shown in Figure 1a.
The curve decreases, with an unusual behavior in its second half. The smoothed
hazard rate curve also shows a peak in the second half, which supports that
observation (see Figure 1b).
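
For reference, the K-M (product-limit) estimator behind the survival curve, and the Nelson-Aalen estimator of the cumulative hazard requested with the aalen option in the appendix code, can be written as below. The notation is an assumption introduced here and is not taken from the report: t_j are the distinct observed death times, d_j is the number of deaths at t_j, and n_j is the number of patients still at risk just before t_j.

```latex
\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right),
\qquad
\hat{H}(t) = \sum_{j:\, t_j \le t} \frac{d_j}{n_j}
```

Censored patients (alive or transplanted) leave the risk set n_j without contributing to d_j, which is why late portions of the curve, where few patients remain at risk, tend to be less stable.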


(a) (b)
Figure 1 Survival curve for the event.

To further inspect the survival experience and the role of the explanatory
variables available in the dataset, stratified K-M curves are plotted and log-rank
tests are performed. For the K-M curves stratified by sex, Figure 2a shows that
before 1000 days sex=1 (female) has worse survival than sex=0 (male); after that
point the pattern changes markedly. Combining the group frequencies with this
plot, we also see that the male survival curve is less stable than the female
curve, because many more women than men are enrolled in this study
(N_male=36 compared to N_female=275). Figure 2b shows a pattern similar to the
unadjusted hazard rate plot, which supports the peak and also shows how strongly
the female group influences the overall data because of its number of
observations (see Table 3).


(a) (b)
Figure 2 Survival curve adjusted by strata Sex (Logrank test p-value=0.12).

For the use of D-penicillamine, we see a positive influence in the early
stage of the disease (0-1000 days), after which the effect is worse than placebo.
At late stages the effect is unclear. The hazard rates suggest that patients under
treatment have a higher risk of death (see Figure 3), although the log-rank test
does not detect a difference between the groups (p-value=0.81).


(a) (b)
Figure 3 Survival curves adjusted by strata drug. (Logrank test p-value=0.81)


The last K-M curve to analyze is the one stratified by stage. Figure 4 shows
how patient survival varies with the progression of the disease.

(a) (b)
Figure 4 Survival Curve adjusted by strata stage (logrank test p-value=0.13).

Fixed covariates Cox proportional model
Proportional Hazard model:

A proportional hazards regression model is fitted to test the significance of
the different covariates for the survival outcome. This method is semi-parametric
and relies on three main assumptions: non-informative censoring, linearity of the
effects and proportional hazards. The first assumption relates to the experiment
itself; since we only have the data, we will assume that it holds. After a
candidate model is selected we will check the other assumptions using different
statistical tools.
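
As a reminder of what these assumptions refer to, the proportional hazards model can be written as below. The symbols are an assumption introduced here for illustration: h_0(t) is the unspecified baseline hazard, x_1, ..., x_p are the covariates and beta_1, ..., beta_p their coefficients.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp\left(\beta_1 x_1 + \dots + \beta_p x_p\right)

\frac{h(t \mid x)}{h(t \mid x^{*})} = \exp\left(\sum_{k=1}^{p} \beta_k \left(x_k - x_k^{*}\right)\right)
```

Linearity means that each continuous covariate enters the log hazard linearly, and proportional hazards means that the ratio in the second line does not depend on t.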

Using SAS 9.3 we fit a proportional hazards model with the procedure PHREG
(code included in the appendix). In a first step we consider only fixed effects as
candidate variables. Variables are selected stepwise, with a significance level of
0.1 both to enter and to stay in the model. The candidate model obtained at the end
of the fitting process is shown in Equation 1.

Equation 1 Model with Fixed covariates
proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili albumin copper sgot protime stage;
run;

Table 4 shows the coefficients and p-values for each selected variable. At this
step all selected variables are significant. Additionally, the AIC with and without
covariates shows that the fitted model is better than the model without
covariates (Table 5).

Table 4 Coefficient estimates for the selected effects in the model with fixed covariates.
Analysis of Maximum Likelihood Estimates
Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr>ChiSq  Hazard Ratio  Label
age        1   0.0000897           0.0000259       12.0139     0.0005    1.000
edema      1   0.82964             0.31094         7.1192      0.0076    2.292
bili       1   0.08483             0.01864         20.7175     <.0001    1.089         Serum Bilirubin mg/dl
albumin    1   -0.78665            0.25513         9.5071      0.0020    0.455         Albumin in gm/dl
copper     1   0.00274             0.0009224       8.8347      0.0030    1.003         Urine copper in ug/day
sgot       1   0.00470             0.00166         8.0370      0.0046    1.005         SGOT
protime    1   0.26791             0.09107         8.6540      0.0033    1.307         prothrombin time in seconds
stage      1   0.40488             0.13552         8.9252      0.0028    1.499         histologic stage of disease
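
The Hazard Ratio column is the exponential of the parameter estimate. The short data step below is only an illustrative check under that relation, not part of the original analysis; the data set names hr_check and age_per_year are hypothetical, and the estimates are copied from Table 4. Because age is recorded in days, the second step rescales its coefficient to a one-year increase.

```sas
/* Illustrative check only: hazard ratio = exp(parameter estimate).    */
/* Estimates are copied from Table 4; hr_check is a hypothetical name. */
data hr_check;
   length parameter $ 8;
   input parameter $ estimate;
   hazard_ratio = exp(estimate);   /* multiplicative change in hazard per unit */
   datalines;
edema    0.82964
albumin -0.78665
protime  0.26791
;
run;

/* age is recorded in days, so rescale its coefficient to one year */
data age_per_year;
   estimate_per_day = 0.0000897;
   hr_per_year      = exp(365.25 * estimate_per_day);
run;

proc print data=hr_check noobs; run;
proc print data=age_per_year noobs; run;
```

The first listing should agree with the Hazard Ratio column of Table 4 up to rounding. The second one explains why age shows a ratio of 1.000 per day while still being significant: per year of age the hazard increases by roughly 3%.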





Table 5 Fit statistics for the model with fixed covariates.
Model Fit Statistics
Criterion  Without Covariates  With Covariates
-2 LOG L   1267.801            1074.006
AIC        1267.801            1090.006
SBC        1267.801            1112.568
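
The AIC values in Table 5 can be reproduced from the -2 LOG L row. The worked arithmetic below assumes the usual definition of AIC, with p the number of estimated coefficients (p = 8 for the model in Table 4 and p = 0 for the model without covariates).

```latex
\mathrm{AIC} = -2\log L + 2p

\mathrm{AIC}_{\text{with covariates}} = 1074.006 + 2 \cdot 8 = 1090.006,
\qquad
\mathrm{AIC}_{\text{without covariates}} = 1267.801 + 2 \cdot 0 = 1267.801
```

A lower AIC indicates a better trade-off between fit and number of parameters, so the drop from 1267.801 to 1090.006 favors the model with covariates.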

The global test of significance of the model (see Table 6) indicates that
the model as a whole is significant.


Table 6 Global null hypothesis test for the model with fixed covariates.
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr>ChiSq
Likelihood Ratio 193.7954 8 <.0001
Score 304.8057 8 <.0001
Wald 205.2342 8 <.0001



(a) (b)

Figure 5 (a) K-M estimated survival curve; (b) Survival curve fitted by the model
The fitted curve looks similar to the K-M estimated curve. More analysis is
needed to assess the validity of the model. Next, interactions among the effects
chosen by the model are inspected.

Correlation Matrix and interaction effects:

To decide whether any interaction effect is needed, a Pearson correlation
coefficient is computed for each pair of effects. The results are shown in
Table 7. They show that (besides time) bili and copper have some level of
correlation. Models with the interaction and with each variable separately are
therefore fitted to compare the results.

Table 7 Pearson Correlation among selected variables



Table 8 Model comparison to let in the variable tbili (bili in time) in the analysis
     No covariates  Original model  Adding cobili  Only cobili
AIC  1267.539       1089.94         Pval=0.149     1104.419

When cobili (the interaction between bili and copper) is added to the
original model, the new variable is not significant. When copper and bili are
removed and only cobili is kept, the AIC is higher than the AIC of the original
model. The model comparison therefore shows that the original model is the best
one: even when the interaction variable is significant, the AIC does not improve
enough to include it in the analysis. We move forward with the assessment of the
assumptions.
Assumptions assessment:

As discussed above, the proportional hazards model has 3 main
assumptions: non-informative censoring, linearity of the effects and
proportional hazards (PH assumption). First we inspect the functional form with
the martingale residual analysis.

Martingale residual analysis:

This technique is used to assess the linearity of the covariates. The plots
show the observed cumulative martingale residual path together with paths
simulated under the fitted model (1000 simulations). If the observed path is
consistent with the simulated paths and shows no systematic trend, we say the
variable does not need a transformation (see Figure 6).
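
For readers less familiar with the residual being plotted, a sketch of its definition for fixed covariates follows. The notation is an assumption introduced here: delta_i is the event indicator of patient i (1 = death, 0 = censored), t_i the follow-up time, and \hat{H}_0 the estimated baseline cumulative hazard.

```latex
\hat{M}_i = \delta_i - \hat{H}_0(t_i)\, \exp\left(\hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip}\right)

W_k(x) = \sum_{i=1}^{n} I\left(x_{ik} \le x\right) \hat{M}_i
```

The first line is the observed minus the expected number of events for patient i. The second line is the cumulative sum of these residuals over increasing values of covariate k, which is the kind of path the cumulative plots display; a systematic drift away from zero suggests that the covariate needs a different functional form.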


Figure 6 Cumulative martingale residual analysis for linearity for bili (a); albumin (b); copper (c); SGOT (d) and protime (e).
The results of the previous test suggest that the variable bili does not
comply with the linearity assumption. In Figure 7 we see that the trend is also
present in copper. To fix this, a log transformation is introduced for bili
and copper, and the test is then repeated.


Figure 7 Martingale residuals for prothrombin (a); SGOT (b); copper (c); albumin (d); bilirubin (e).
After the transformation, the martingale residuals do not show evidence
of non-linearity in the data (see Figure 8 and Figure 9). Additionally, the
transformation of these two variables has decreased the AIC from 1089.49
to about 1071.5, which confirms that the model with transformed variables
outperforms the original model.



Figure 8 Cumulative martingales for logbili (a); albumin (b); logcopper (c); SGOT (d); protime (e).


Figure 9 Martingale residual plot for logbili (a); albumin (b); logcopper (c); SGOT (d); prothrombin (e).

The supremum test shows that the continuous variable protime does not
comply with the PH assumption (see Figure 10 and Table 9). To inspect this
further, we examine the Schoenfeld residual plots for the variables.

Table 9 Supremum test for proportional hazard assumption.
Supremum Test for Proportionals Hazards Assumption
Variable   Maximum Absolute Value  Replications  Seed       Pr > MaxAbsVal
age        0.8346                  1000          856517000  0.3860
edema      1.4757                  1000          856517000  0.0140
logbili    1.3354                  1000          856517000  0.1570
albumin    0.7722                  1000          856517000  0.6460
logcopper  1.0610                  1000          856517000  0.2210
sgot       1.0275                  1000          856517000  0.2590
protime    1.8215                  1000          856517000  <.0001
stage      1.0574                  1000          856517000  0.1850




Figure 10 PH assumption plot test.




Schoenfeld residual analysis:

The Schoenfeld residual plots confirm that protime shows a trend, which
is evidence of a violation of the PH assumption (see Figure 11).




Figure 11 Schoenfeld residual plots for the PH assumption

Additionally, we regress the residuals in each plot on time; the only
regression with a significant effect is the one for the variable protime (see
Table 10).

Table 10 Significance of variables in regressions of Schoenfeld residuals by variable


To solve this problem we introduce a dummy variable to account for two
different hazards separated by a cut point. This point is found by fitting PH
models for all candidate observed event times and choosing the time with the
highest likelihood; the variable is then divided into two time-dependent
variables.
The time that maximizes the criterion is 750 days, so it is used as the cut
point to fit a new model. The new model improves the AIC from about 1071.5 to
just over 1061.
At this point SGOT and the protime value after 750 days were not significant;
after removing these variables the model achieves an AIC of just over 1059
(see Equation 2).
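
The split of protime at the cut point can be summarized as below. This is a sketch of the construction used in the appendix code (variables z1 and z2); the coefficient names gamma_1 and gamma_2 are assumptions introduced here for illustration.

```latex
z_1(t) = \text{protime} \cdot I(t \le 750),
\qquad
z_2(t) = \text{protime} \cdot I(t > 750)

h(t \mid x) = h_0(t)\, \exp\left(\dots + \gamma_1 z_1(t) + \gamma_2 z_2(t)\right)
```

The effect of protime on the log hazard is therefore gamma_1 during the first 750 days and gamma_2 afterwards, which relaxes the PH assumption for this covariate. Dropping z2 from the model, as done in the final step, amounts to setting gamma_2 = 0.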

Equation 2 Final PH model.
proc phreg data = three;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema logbili albumin logcopper stage z1;
/* z1 carries protime only during the first 750 days of follow-up */
if (futime <= 750) then z1 = protime; else z1 = 0;
run;


Var P-Value
age 0686#
logbili 05$8
albumin 05%%%
logcopper 0$"0!
protime 00$"#
SGOT 0$!05

Goodness of fit of the model
Cox Snell residuals:

The goodness of fit is assessed through the Cox-Snell residuals. To
compare the two models, we plot the Cox-Snell residuals for the original and
the final model.
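
A brief sketch of the quantity behind these plots follows, with notation that is an assumption introduced here: \hat{H}_0 is the estimated baseline cumulative hazard and \hat{S}(t_i \mid x_i) the fitted survival probability of patient i at the observed time t_i.

```latex
r_i = \hat{H}_0(t_i)\, \exp\left(\hat{\beta}_1 x_{i1} + \dots + \hat{\beta}_p x_{ip}\right)
    = -\log \hat{S}(t_i \mid x_i)
```

If the model fits, the r_i behave approximately like a censored sample from a unit exponential distribution, so the estimated cumulative hazard of the residuals plotted against the residuals should follow the 45-degree line. This matches the appendix code, which takes the negative of LOGSURV and overlays the reference line h*h.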



(a) (b)

Figure 12 Cox-Snell residuals for original model (a); and for the final model (b).
The Cox-Snell residual plots show that the overall fit has improved
in the process (see Figure 12). The coefficient estimates for the final
model are shown in Table 11.

Table 11 Coefficients for the final model.
Analysis of Maximum Likelihood Estimates
Parameter  DF  Parameter Estimate  Standard Error  Chi-Square  Pr > ChiSq  Hazard Ratio  Label
age        1   0.0000849           0.0000237       12.8665     0.0003      1.000
edema      1   0.83535             0.29121         8.2285      0.0041      2.306
logbili    1   0.73118             0.11481         40.5621     <.0001      2.078
albumin    1   -0.77556            0.25287         9.4064      0.0022      0.460         Albumin in gm/dl
logcopper  1   0.35758             0.13613         6.8996      0.0086      1.430
stage      1   0.28121             0.13420         4.3909      0.0361      1.325         histologic stage of disease
z1         1   0.64732             0.13804         21.9894     <.0001      1.910

From the coefficients we can say that logbili and protime during the first 750
days are the two most influential variables for the survival outcome.


Code

data one ;
infile 'S:\HW8\data.csv' delimiter = ',' MISSOVER DSD
lrecl=32767 obs=312 ;
input
futime
status
drug
age
sex
ascites
hepato
spiders
edema
bili
chol
albumin
copper
alk_phos
sgot
trig
platelet
protime
stage
;
run;

/* Label the variables and recode status into an event indicator: */
/* 0 = censored (alive or liver transplant), 1 = dead.            */
data two; set one;
label futime="Time to event or end of study"
ascites="Presence of ascites"
hepato="Presence of Hepatomegaly"
spiders="Presence of Spiders"
bili="Serum Bilirubin mg/dl"
chol="Serum Cholesterol mg/dl"
albumin="Albumin in gm/dl"
copper="Urine copper in ug/day"
alk_phos="Alkaline phosphatase in U/liter"
sgot="SGOT"
trig="Triglicerides in mg/dl"
platelet="platelets per cubic ml/1000"
protime="prothrombin time in seconds"
stage="histologic stage of disease";
if status=1 then status=0;
else if status=2 then status=1;
run;

/* One-way frequencies for every variable (run once data set two exists) */
proc freq data=two;
run;

proc univariate data=two MODE;
histogram;
run;

proc lifetest data=two aalen plot=hazard ;
time futime*status(0);
run;

proc lifetest data=two aalen plot=hazard ;
time futime*status(0);
strata sex;
run;

proc lifetest data=two aalen plot=hazard;
time futime*status(0);
strata drug;
run;

proc lifetest data=two aalen plot=hazard;
time futime*status(0);
strata stage;
run;

proc phreg data=two;
model futime*status(0)= drug age sex ascites hepato
spiders edema bili chol albumin copper alk_phos sgot trig platelet
protime stage
/selection=Stepwise SLentry=0.1 SLstay=0.1;
;
run;

/*Exploring interactions among variables*/

proc corr data=two plots(maxpoints=none)=matrix(histogram);
var futime age edema bili albumin copper sgot protime stage ;
run;


proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili albumin copper sgot protime
stage ;
run;

proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili copper cobili albumin sgot
protime stage ;
cobili=bili*copper;
run;

proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema cobili albumin sgot protime stage ;
cobili=bili*copper;
run;

proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema tbili talbumin copper sgot protime
stage ;
tbili=bili*futime;
talbumin=futime*albumin;
run;

proc phreg data=two ;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema tbili albumin copper sgot protime
stage ;
tbili=bili*futime;
run;

******************************************************************;
*********model with fixed covariates and interactions
(FC)******************************;
******************************************************************;

proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili albumin copper sgot protime
stage ;
run;

proc phreg data=two plots(overlay)=(survival);
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema tbili albumin copper sgot protime
stage ;
tbili=bili*futime;
assess var=(tbili) ph /crpanel resample;
run;

***********************;
/*martingale plot*/;
**********************;

proc phreg data = two;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili albumin copper sgot protime
stage ;
assess var=(bili albumin copper sgot protime) /crpanel resample;
output out = figure11_4 RESMART = mgale ;
run;


proc loess data=figure11_4;
title ' Martingale residuals for Bili';
ods output OutputStatistics=figure11_4a;
model mgale = bili / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for Albumin';
ods output OutputStatistics=figure11_4a;
model mgale = albumin / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for copper';
ods output OutputStatistics=figure11_4a;
model mgale = copper / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for sgot';
ods output OutputStatistics=figure11_4a;
model mgale = sgot / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for protime';
ods output OutputStatistics=figure11_4a;
model mgale = protime / smooth=0.6 direct;
run;

/*transforming the variables*/

data three; set two;
logbili=log(bili);
logcopper=log(copper);
run;

proc phreg data = three;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema logbili albumin logcopper sgot
protime stage ;
assess var=(logbili albumin logcopper sgot protime)ph /crpanel
resample;
output out = figure11_4 RESMART = mgale ;
run;


proc loess data=figure11_4;
title ' Martingale residuals for Bili';
ods output OutputStatistics=figure11_4a;
model mgale = logbili / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for Albumin';
ods output OutputStatistics=figure11_4a;
model mgale = albumin / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for copper';
ods output OutputStatistics=figure11_4a;
model mgale = logcopper / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for sgot';
ods output OutputStatistics=figure11_4a;
model mgale = sgot / smooth=0.6 direct;
run;

proc loess data=figure11_4;
title ' Martingale residuals for protime';
ods output OutputStatistics=figure11_4a;
model mgale = protime / smooth=0.6 direct;
run;

/*checking the PH assumption*/

proc phreg data = three;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema logbili albumin logcopper sgot
protime stage ;
assess ph /crpanel resample;
run;

***************************************;
******* Schoenfeld residual plot*******;
***************************************;

proc phreg data=three;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)=age logbili albumin logcopper protime sgot
edema stage;
output out=b wtressch=wtsch1 wtsch2 wtsch3 wtsch4 wtsch5 wtsch6;
run;


proc phreg data=three;
model futime*status(0)=age logbili albumin logcopper protime sgot
edema stage;
output out=b ressch= sch1 sch2 sch3 sch4 sch5 sch6;
run;


proc reg data=b;
model sch1=futime;
run;
quit;


proc reg data=b;
model sch2=futime;
run;
quit;

proc reg data=b;
model sch3=futime;
run;
quit;

proc reg data=b;
model sch4=futime;
run;
quit;

proc reg data=b;
model sch5=futime;
run;
quit;

proc reg data=b;
model sch6=futime;
run;
quit;


******************************************************************;
***************fixing time dependent variable*********************;
******************************************************************;

/*to extract the event time (not censored)*/
proc sql noprint;
select distinct futime into :event_time separated by ' '
from three
where status = 1;
quit;
%put &event_time;

/*Macro coded to run the analysis trying the different cutting times*/
%macro event_lpl(data, time, censor, var, delta_list);
%let k=1;
%let whole =;
%let dep = %scan(&delta_list, &k, ' ');
%do %while(&dep NE );
ods listing close;
proc phreg data=&data;
model &time*&censor(0) = &var z2;
if &time > &dep then z2 = &var;
else z2 = 0;
ods output FitStatistics = _temp&k;
run;
ods output close;
ods listing;
%let whole= &whole%str( _temp&k);
%let k = %eval(&k + 1);
%let dep = %scan(&delta_list, &k, ' ');
%end;

data whole;
set &whole;
if Criterion = "-2 LOG L" ;
run;
data whole;
set whole;
e_time = scan("&delta_list", _n_, ' ');
logp = - withcovariates/2;
run;
proc print data = whole noobs;
var e_time logp;
run;
%mend;
/*run the macro for our dataset and the times collected before*/
%event_lpl(three, futime, status, protime, &event_time)

proc phreg data = three;
model futime*status(0)= age edema logbili albumin logcopper sgot stage
z1 z2 ;
if (futime <= 750 ) then z1=protime; else z1=0;
if (futime >750) then z2=protime; else z2=0;
run;

proc phreg data = three;
model futime*status(0)= age edema logbili albumin logcopper stage z1 ;
if (futime <= 750 ) then z1=protime; else z1=0;
run;



***********************;
/*Cox-Snell plot* Original model*/
***********************;

proc phreg data=two;
title' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema bili albumin copper sgot protime
stage ;
output out = figure11_1 LOGSURV = h /method = ch; /*-logsurv is the
cox-snell residual*/
run;

data figure11_1a;
set figure11_1;
h = -h;
cons = 1;
run;

proc phreg data = figure11_1a ;
model h*status(0) = cons;
output out = figure11_1b logsurv = ls /method = ch;
run;

data figure11_1c;
set figure11_1b;
haz = - ls;
run;

proc sort data = figure11_1c;
by h;
run;

title "Cox-Snell residuals plot Original model";
axis1 order = (0 to 3 by .5) minor = none;
axis2 order = (0 to 3 by .5) minor = none label = ( a=90);
symbol1 i = stepjl c= blue;
symbol2 i = join c = red l = 3;

proc gplot data = figure11_1c;
plot haz*h =1 h*h =2 /overlay haxis=axis1 vaxis= axis2;
label haz = "Estimated Cumulative Hazard Rates";
label h = "Residual";
run;
quit;


***********************;
/*Cox-Snell plot* Final model*/
***********************;

proc phreg data=three;
title ' S-Model PBC Data fixed covariates';
model futime*status(0)= age edema logbili albumin logcopper stage ;

output out = figure11_1 LOGSURV = h /method = ch; /*-logsurv is the
cox-snell residual*/
run;

data figure11_1a;
set figure11_1;
h = -h;
cons = 1;
run;

proc phreg data = figure11_1a ;
model h*status(0) = cons;
output out = figure11_1b logsurv = ls /method = ch;
run;

data figure11_1c;
set figure11_1b;
haz = - ls;
run;

proc sort data = figure11_1c;
by h;
run;

title "Cox-Snell residuals plot final model";
axis1 order = (0 to 3 by .5) minor = none;
axis2 order = (0 to 3 by .5) minor = none label = ( a=90);
symbol1 i = stepjl c= blue;
symbol2 i = join c = red l = 3;

proc gplot data = figure11_1c;
plot haz*h =1 h*h =2 /overlay haxis=axis1 vaxis= axis2;
label haz = "Estimated Cumulative Hazard Rates";
label h = "Residual";
run;
quit;
