Solution: ########### Problem 1: ########### part a) ####### I fit proportional hazard model and use the ANOVA command. It tells us that only the main effect of x1 is significant with p-value 0.024. The p-vlaue of the main effect of x2 is 0.435 and the p-value of the interaction term is 0.679. Even after I exclude the interaction term, the p-value of the main effect is also 0.435. Thus, the final model is the proportional hazard model with only x1. Then, I compute the Kaplan-Meier estimator and plot the log(time) versus log(-log(survival)) for the two groups of x1. We can find that that the slopes for the same log(time) are different but not far from each other. At least, one increases, the other increases. Thus, we conclude that the proportional hazard property is not seriourly wrong. ####### Part b) ####### The estimate of the parameter is beta=0.982 if x1=1; and beta=0 if x1=0. (You may state it as a continous variable). In the output, survival0 gives us the estimate of h_0(t) and S_0(t) which are baseline hazard and survival functions. Becuase beta=0.982>0 if X1=1, the patients younger than 50 expected survive longer. when t=20, the estimates and the confidence intervals are survival 95% CI x1=0: 0.497 [0.299,0.825] x1=1: 0.1542 [0.0528,0.450] I plot the sruvival functions from cox proportional hazard model as well as those from KM method. Ther are not very different. ############### Output Problem 1: ######### > aml <- read.table("h:\\data\\aml.data",h=T) > aml weeks censor x1 x2 1 18 1 0 0 2 9 1 0 1 3 28 0 0 0 4 31 1 0 1 5 39 0 0 1 6 19 0 0 1 7 45 0 0 1 8 6 1 0 1 9 8 1 0 1 10 15 1 0 1 11 23 1 0 0 12 28 0 0 0 13 7 1 0 1 14 12 1 1 0 15 9 1 1 0 16 8 1 1 0 17 2 1 1 1 18 26 0 1 0 19 10 1 1 0 20 4 1 1 0 21 3 1 1 0 22 4 1 1 0 23 18 1 1 1 24 8 1 1 1 25 3 1 1 1 26 14 1 1 1 27 3 1 1 0 28 13 1 1 1 29 13 1 1 1 30 35 0 1 0 ######### part a) ######### > library(survival) > g <- coxph(Surv(weeks,censor)~x1*x2,data=aml) > anova(g,test="Chi") Analysis of Deviance Table Cox model: response is Surv(weeks, censor) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 30 130.644 x1 1 5.084 29 125.560 0.024 x2 1 0.608 28 124.952 0.435 x1:x2 1 0.171 27 124.781 0.679 ############ > g <- coxph(Surv(weeks,censor)~x1+x2,data=aml) > anova(g,test="Chi") Analysis of Deviance Table Cox model: response is Surv(weeks, censor) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 30 130.644 factor(x1) 1 5.084 29 125.560 0.024 factor(x2) 1 0.608 28 124.952 0.435 ############ > gkm$surv [1] 0.9230769 0.8461538 0.7692308 0.6923077 0.6153846 0.5384615 0.5384615 0.4487179 0.4487179 [10] 0.2991453 0.2991453 0.2991453 0.9411765 0.7647059 0.6470588 0.5294118 0.4705882 0.4117647 [19] 0.3529412 0.2352941 0.1764706 0.1176471 0.1176471 0.1176471 > gkm$time [1] 6 7 8 9 15 18 19 23 28 31 39 45 2 3 4 8 9 10 12 13 14 18 26 35 ############# > first <- gkm$surv[1:12] > time1 <- gkm$time[1:12] > second <- gkm$surv[13:24] > time2 <- gkm$time[13:24] ############## > plot(log(time1),log(-log(first)),xlim=c(0.5,4.0),ylim=c(-3.0,1.0),type="l") > lines(log(time2),log(-log(second))) ################# part b) ################# > g <- coxph(Surv(weeks,censor)~x1,data=aml) > summary(g) Call: coxph(formula = Surv(weeks, censor) ~ x1, data = aml) n= 30 coef exp(coef) se(coef) z p x1 0.982 2.67 0.449 2.19 0.029 exp(coef) exp(-coef) lower .95 upper .95 x1 2.67 0.374 1.11 6.44 Rsquare= 0.156 (max possible= 0.987 ) Likelihood ratio test= 5.08 on 1 df, p=0.0241 Wald test = 4.79 on 1 df, p=0.0287 Score (logrank) test = 5.13 on 1 df, p=0.0236 > survival0 <- survfit(g,newdata=data.frame(x1=0)) > survival1 <- survfit(g,newdata=data.frame(x1=1)) ################ > summary(survival0) Call: survfit.coxph(object = g, newdata = data.frame(x1 = 0)) time n.risk n.event survival std.err lower 95% CI upper 95% CI 2 30 1 0.983 0.0178 0.949 1.000 3 29 3 0.929 0.0415 0.851 1.000 4 26 2 0.890 0.0550 0.788 1.000 6 24 1 0.869 0.0615 0.756 0.998 7 23 1 0.848 0.0677 0.725 0.992 8 22 3 0.784 0.0853 0.634 0.970 9 19 2 0.738 0.0964 0.571 0.953 10 17 1 0.714 0.1018 0.540 0.944 12 16 1 0.689 0.1070 0.508 0.934 13 15 2 0.633 0.1165 0.441 0.908 14 13 1 0.602 0.1206 0.406 0.891 15 12 1 0.567 0.1239 0.370 0.870 18 11 2 0.497 0.1287 0.299 0.825 23 8 1 0.455 0.1307 0.259 0.799 31 4 1 0.381 0.1381 0.187 0.775 > summary(survival1) Call: survfit.coxph(object = g, newdata = data.frame(x1 = 1)) time n.risk n.event survival std.err lower 95% CI upper 95% CI 2 30 1 0.9553 0.0439 0.8730 1.000 3 29 3 0.8212 0.0829 0.6739 1.000 4 26 2 0.7318 0.0972 0.5640 0.950 6 24 1 0.6871 0.1027 0.5127 0.921 7 23 1 0.6442 0.1065 0.4660 0.891 8 22 3 0.5223 0.1114 0.3438 0.793 9 19 2 0.4444 0.1110 0.2724 0.725 10 17 1 0.4070 0.1096 0.2400 0.690 12 16 1 0.3696 0.1079 0.2086 0.655 13 15 2 0.2948 0.1030 0.1486 0.585 14 13 1 0.2574 0.0999 0.1203 0.551 15 12 1 0.2200 0.0961 0.0934 0.518 18 11 2 0.1542 0.0843 0.0528 0.450 23 8 1 0.1219 0.0763 0.0357 0.416 31 4 1 0.0761 0.0641 0.0146 0.396 > plot(survival0,conf.int=F,lty=1) > lines(survival1,lty=2) > legend(25,1,c("<50",">=50"),lty=1:2) ##################### > plot(survival0,conf.int=F,lty=1) > lines(survival1) > lines(gkm,conf.int=F,lty=c(2,2)) > legend(25,1,c("Cox","KM"),lty=1:2)