########### # Problem 1 ########### (e) You can either use the difference between the G2 values or the Wald test. The difference of G2 value is 59.375-58.114=1.261. Based on chisq distribution with one degrees of freedom. It is not significant. The Wald statistic value is abs((0.886469-1)/0.099297)=1.14. It is less than 1.96. It is still not significant. Thus, we conclude that the log of service can be one. (f) The G2 value is 23.587 based on 17 degrees of freedom. Its p-value is 0.1311. Thus, it fits the data (h) The outputs are given in (e) and (f). Clearly, the interaction effect model fits the data. (i) When overdispersion is present, we do not change the first column but we need to adjust the standard error. The way is to multiply it by sqrt(dispersion). I prefer the main effect model with overdispersion because it is easy to interpret. (j) The interpretation is the same as the model without the dispersion parameter. Based on the output, we conclude that the incident rates were reduced by types E, A, C, D, and B. Both Year and Period increased the incident rate. ########### # Problem 2. ########### (a) Read my code. (b) The Pearson Chi-square is 22.37 based on 4 df. It is significant. Thus, we conclude that the two variables are not independent. (c) The G2 value of the main effect model is 22.254 based on 4 df. It does not fit the data. Therefore, we conclude that the two variables are not independent. This is consistent with the previous one. (d) I define a linear-by-linear association term. The interaction term has 1 df. It is given by the product of scores of the two variables. The scores are 1,2,3 accoring to their ordinal levels. The residual deviance of the model is 1.4449 with 3 df. The fit is good. (e) The wald statistic value is 4.479 with p-value 7*10^(-6). It is significant. The likelihood ratio statistic is 22.254-1.4449=20.8091. Based on chi-square with one degrees of freedom, it is also significant. ########### # Problem 3. ########### (a) Read my code. (b) I calculated the Pearson Chi-square. The value is 65.8129. Based on 6 df, it is significant. Thus, we conclude that the two variables are not independent. (c) I fit a main-effect Poisson model. The residual deviance is 51.795 based on 6 degrees of freedom. It is significant. Thus, we still conclude that the two variables are not independent. (d) We find the largest (positive) residual occurred at tumor=freckle and site=head. The least (negative) residual occurred at tumer=superficial) and site=head. These two are concerns. (e) Look at my plot. Interpret it by yourself. The main issue is the same as I said in (d). (f) I remove the first four rows and refit the model. The residual deviance is 2.16 based on 3 degrees of freedom. It is good. Therefore, we conclude that the two variables are independent if site=head is excluded. ######################### # Appendix: R code and Output ######################### Problem 1 library(MASS) data(ships) summary(ships) (e) mod.one <- glm(incidents~offset(log(service))+type+year+period,data=ships[ships$service!=0,],family=poisson) summary(mod.one) mod.main <- glm(incidents~log(service)+type+year+period,data=ships[ships$service!=0,],family=poisson) summary(mod.main) (f) mod.interaction <- glm(incidents~log(service)+(type+year+period)^2,data=ships[ships$service!=0,],family=poisson) summary(mod.interaction) (i) phi.one <- sum(residuals(mod.one,"pearson")^2)/mod.one$df.residual summary(mod.one,dispersion=phi.one) phi.interaction <- sum(residuals(mod.interaction,"pearson")^2)/mod.interaction$df.residual summary(mod.interaction,dispersion=phi.interaction) #### # Output #### (e) > mod.one <- glm(incidents~offset(log(service))+type+year+period,data=ships[ships$service!=0,],family=poisson) > summary(mod.one) Call: glm(formula = incidents ~ offset(log(service)) + type + year + period, family = poisson, data = ships[ships$service != 0, ]) Deviance Residuals: Min 1Q Median 3Q Max -2.5348 -0.9319 -0.3686 0.4654 2.8833 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -10.079076 0.876149 -11.504 < 2e-16 *** typeB -0.546090 0.178415 -3.061 0.002208 ** typeC -0.632631 0.329500 -1.920 0.054862 . typeD -0.232257 0.287979 -0.807 0.419951 typeE 0.405975 0.234933 1.728 0.083981 . year 0.042247 0.012826 3.294 0.000988 *** period 0.023705 0.008091 2.930 0.003392 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 146.328 on 33 degrees of freedom Residual deviance: 59.375 on 27 degrees of freedom AIC: 171.24 Number of Fisher Scoring iterations: 5 > mod.main <- glm(incidents~log(service)+type+year+period,data=ships[ships$service!=0,],family=poisson) > summary(mod.main) Call: glm(formula = incidents ~ log(service) + type + year + period, family = poisson, data = ships[ships$service != 0, ]) Deviance Residuals: Min 1Q Median 3Q Max -2.2355 -1.0345 -0.4454 0.6005 2.8353 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -8.616856 1.528004 -5.639 1.71e-08 *** log(service) 0.886469 0.099297 8.927 < 2e-16 *** typeB -0.330248 0.261301 -1.264 0.2063 typeC -0.736295 0.341342 -2.157 0.0310 * typeD -0.284220 0.291989 -0.973 0.3304 typeE 0.335936 0.242645 1.384 0.1662 year 0.035468 0.013802 2.570 0.0102 * period 0.022079 0.008114 2.721 0.0065 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 614.539 on 33 degrees of freedom Residual deviance: 58.114 on 26 degrees of freedom AIC: 171.98 Number of Fisher Scoring iterations: 5 (f) > mod.interaction <- glm(incidents~log(service)+(type+year+period)^2,data=ships[ships$service!=0,],family=poisson) > summary(mod.interaction) Call: glm(formula = incidents ~ log(service) + (type + year + period)^2, family = poisson, data = ships[ships$service != 0, ]) Deviance Residuals: Min 1Q Median 3Q Max -1.7275 -0.6130 -0.2162 0.6046 1.6672 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -54.594381 13.306378 -4.103 4.08e-05 *** log(service) 1.572735 0.208786 7.533 4.97e-14 *** typeB -7.223085 4.597967 -1.571 0.116199 typeC -0.878883 7.072650 -0.124 0.901105 typeD 8.012326 6.273882 1.277 0.201570 typeE 21.440036 6.492087 3.302 0.000958 *** year 0.653653 0.184827 3.537 0.000405 *** period 0.606442 0.168771 3.593 0.000327 *** typeB:year 0.075003 0.059813 1.254 0.209857 typeC:year 0.100314 0.104084 0.964 0.335153 typeD:year -0.125715 0.088079 -1.427 0.153492 typeE:year -0.320983 0.098754 -3.250 0.001153 ** typeB:period 0.006657 0.029908 0.223 0.823864 typeC:period -0.091645 0.048176 -1.902 0.057132 . typeD:period 0.016939 0.062733 0.270 0.787151 typeE:period 0.018335 0.037186 0.493 0.621961 year:period -0.008846 0.002478 -3.570 0.000357 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 614.539 on 33 degrees of freedom Residual deviance: 23.587 on 17 degrees of freedom AIC: 155.45 Number of Fisher Scoring iterations: 6 > 1-pchisq(23.587,17) [1] 0.131098 (i) > phi.one <- sum(residuals(mod.one,"pearson")^2)/mod.one$df.residual > summary(mod.one,dispersion=phi.one) Call: glm(formula = incidents ~ offset(log(service)) + type + year + period, family = poisson, data = ships[ships$service != 0, ]) Deviance Residuals: Min 1Q Median 3Q Max -2.5348 -0.9319 -0.3686 0.4654 2.8833 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -10.07908 1.36829 -7.366 1.76e-13 *** typeB -0.54609 0.27863 -1.960 0.0500 . typeC -0.63263 0.51458 -1.229 0.2189 typeD -0.23226 0.44974 -0.516 0.6056 typeE 0.40597 0.36690 1.107 0.2685 year 0.04225 0.02003 2.109 0.0349 * period 0.02370 0.01264 1.876 0.0607 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 2.438925) Null deviance: 146.328 on 33 degrees of freedom Residual deviance: 59.375 on 27 degrees of freedom AIC: 171.24 Number of Fisher Scoring iterations: 5 > > > phi.interaction <- sum(residuals(mod.interaction,"pearson")^2)/mod.interaction$df.residual > summary(mod.interaction,dispersion=phi.interaction) Call: glm(formula = incidents ~ log(service) + (type + year + period)^2, family = poisson, data = ships[ships$service != 0, ]) Deviance Residuals: Min 1Q Median 3Q Max -1.7275 -0.6130 -0.2162 0.6046 1.6672 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -54.594381 15.366085 -3.553 0.000381 *** log(service) 1.572735 0.241104 6.523 6.89e-11 *** typeB -7.223085 5.309690 -1.360 0.173716 typeC -0.878883 8.167432 -0.108 0.914307 typeD 8.012326 7.245022 1.106 0.268767 typeE 21.440036 7.497003 2.860 0.004239 ** year 0.653653 0.213436 3.063 0.002195 ** period 0.606442 0.194895 3.112 0.001861 ** typeB:year 0.075003 0.069072 1.086 0.277535 typeC:year 0.100314 0.120195 0.835 0.403944 typeD:year -0.125715 0.101713 -1.236 0.216464 typeE:year -0.320983 0.114040 -2.815 0.004883 ** typeB:period 0.006657 0.034537 0.193 0.847160 typeC:period -0.091645 0.055633 -1.647 0.099494 . typeD:period 0.016939 0.072444 0.234 0.815126 typeE:period 0.018335 0.042942 0.427 0.669394 year:period -0.008846 0.002861 -3.092 0.001990 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1.333542) Null deviance: 614.539 on 33 degrees of freedom Residual deviance: 23.587 on 17 degrees of freedom AIC: 155.45 Number of Fisher Scoring iterations: 6 ############## # Problem 2 ############## parstum <- read.table("c:\\tonglinzhang\\stat526\\spring2018\\parstum.txt",h=T) parstum (a) (ov <- xtabs(count~parent+student,data=parstum)) (b) (pp <- prop.table(xtabs(count~student,data=parstum))) (qq <- prop.table(xtabs(count~parent,data=parstum))) (fv <- outer(qq,pp)*sum(ov)) sum((ov-fv)^2/fv) (c) mod.main <- glm(count~parent+student,family=poisson,data=parstum) summary(mod.main) (d) uu <- rep(1:3,rep(3,3)) vv <- rep(1:3,3) uu vv mod.ll <- glm(count~parent+student+I(uu*vv),family=poisson,data=parstum) summary(mod.ll) ################ # Output ################ > (ov <- xtabs(count~parent+student,data=parstum)) student parent Never Occasional Regular Both 17 11 19 Neither 141 54 40 One 68 44 51 student Never Occasional Regular 0.5078652 0.2449438 0.2471910 > (qq <- prop.table(xtabs(count~parent,data=parstum))) parent Both Neither One 0.1056180 0.5280899 0.3662921 > (fv <- outer(qq,pp)*sum(ov)) student parent Never Occasional Regular Both 23.86966 11.51236 11.61798 Neither 119.34831 57.56180 58.08989 One 82.78202 39.92584 40.29213 > > sum((ov-fv)^2/fv) [1] 22.37314 > mod.main <- glm(count~parent+student,family=poisson,data=parstum) > summary(mod.main) Call: glm(formula = count ~ parent + student, family = poisson, data = parstum) Deviance Residuals: 1 2 3 4 5 6 7 8 9 1.9261 -0.4744 -2.5161 -1.6770 0.6342 1.6194 -1.4832 -0.1522 1.9818 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.1726 0.1531 20.716 < 2e-16 *** parentNeither 1.6094 0.1598 10.072 < 2e-16 *** parentOne 1.2436 0.1656 7.511 5.85e-14 *** studentOccasional -0.7292 0.1166 -6.253 4.03e-10 *** studentRegular -0.7201 0.1163 -6.194 5.88e-10 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 218.594 on 8 degrees of freedom Residual deviance: 22.254 on 4 degrees of freedom AIC: 81.584 Number of Fisher Scoring iterations: 4 > uu <- rep(1:3,rep(3,3)) > vv <- rep(1:3,3) > > uu [1] 1 1 1 2 2 2 3 3 3 > vv [1] 1 2 3 1 2 3 1 2 3 > > mod.ll <- glm(count~parent+student+I(uu*vv),family=poisson,data=parstum) > summary(mod.ll) Call: glm(formula = count ~ parent + student + I(uu * vv), family = poisson, data = parstum) Deviance Residuals: 1 2 3 4 5 6 7 8 9 0.19377 -0.06715 -0.27943 -0.58653 0.26591 0.45951 0.68322 -0.36221 -0.31759 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.49784 0.42767 3.502 0.000461 *** parentNeither 3.04624 0.37619 8.098 5.61e-16 *** parentOne 2.01532 0.25501 7.903 2.72e-15 *** studentOccasional -1.32261 0.17344 -7.626 2.42e-14 *** studentRegular -1.97630 0.31000 -6.375 1.83e-10 *** I(uu * vv) 0.38832 0.08669 4.479 7.49e-06 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 218.5943 on 8 degrees of freedom Residual deviance: 1.4449 on 3 degrees of freedom AIC: 62.775 Number of Fisher Scoring iterations: 4 ################ # R code for problem 3 ################ melanoma <- read.table("c:\\tonglinzhang\\stat526\\spring2018\\melanoma.txt",h=T) melanoma (a) (ov <- xtabs(count~tumor+site,data=melanoma)) (b) (pp <- prop.table(xtabs(count~site,data=melanoma))) (qq <- prop.table(xtabs(count~tumor,data=melanoma))) (fv <- outer(qq,pp)*sum(ov)) sum((ov-fv)^2/fv) (c) mod.main <- glm(count~tumor+site,family=poisson,data=melanoma) summary(mod.main) (d) z <- xtabs(residuals(mod.main)~tumor+site,data=melanoma) z svdz <- svd(z) svdz$d svdz$d^2/sum(svdz$d^2) leftsv <- svdz$u[,1:2]%*%diag(sqrt(svdz$d[1:2])) rightsv <- svdz$v[,1:2]%*%diag(sqrt(svdz$d[1:2])) ll <- 1.1*max(abs(rightsv),abs(leftsv)) plot(rbind(leftsv,rightsv),asp=1,xlim=c(-ll,ll),ylim=c(-ll,ll),xlab="SV1",ylab="SV2",type="n") abline(h=0,v=0) text(leftsv,dimnames(z)[[1]]) text(rightsv,dimnames(z)[[2]]) (e) mod.main <- glm(count~tumor+site,family=poisson,data=melanoma[-(1:4),]) summary(mod.main) ######### # Output ######### > (ov <- xtabs(count~tumor+site,data=melanoma)) site tumor extremity head trunk freckle 10 22 2 indeterminate 28 11 17 nodular 73 19 33 superficial 115 16 54 > (pp <- prop.table(xtabs(count~site,data=melanoma))) site extremity head trunk 0.565 0.170 0.265 > (qq <- prop.table(xtabs(count~tumor,data=melanoma))) tumor freckle indeterminate nodular superficial 0.0850 0.1400 0.3125 0.4625 > (fv <- outer(qq,pp)*sum(ov)) site tumor extremity head trunk freckle 19.210 5.78 9.010 indeterminate 31.640 9.52 14.840 nodular 70.625 21.25 33.125 superficial 104.525 31.45 49.025 > > sum((ov-fv)^2/fv) [1] 65.81293 > mod.main <- glm(count~tumor+site,family=poisson,data=melanoma) > summary(mod.main) Call: glm(formula = count ~ tumor + site, family = poisson, data = melanoma) Deviance Residuals: Min 1Q Median 3Q Max -3.0453 -1.0741 0.1297 0.5857 5.1354 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.9554 0.1770 16.696 < 2e-16 *** tumorindeterminate 0.4990 0.2174 2.295 0.0217 * tumornodular 1.3020 0.1934 6.731 1.68e-11 *** tumorsuperficial 1.6940 0.1866 9.079 < 2e-16 *** sitehead -1.2010 0.1383 -8.683 < 2e-16 *** sitetrunk -0.7571 0.1177 -6.431 1.27e-10 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 295.203 on 11 degrees of freedom Residual deviance: 51.795 on 6 degrees of freedom AIC: 122.91 Number of Fisher Scoring iterations: 5 > z <- xtabs(residuals(mod.main)~tumor+site,data=melanoma) > z site tumor extremity head trunk freckle -2.31583297 5.13537787 -2.82829426 indeterminate -0.66016102 0.46798432 0.54787007 nodular 0.28104581 -0.49711084 -0.02173229 superficial 1.00813975 -3.04533605 0.69899703 > mod.main <- glm(count~tumor+site,family=poisson,data=melanoma[-(1:4),]) > summary(mod.main) Call: glm(formula = count ~ tumor + site, family = poisson, data = melanoma[-(1:4), ]) Deviance Residuals: 5 6 7 8 9 10 11 12 -1.03073 0.00574 -0.14558 0.67478 0.61880 -0.00393 0.09909 -0.48271 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.1003 0.2911 7.215 5.40e-13 *** tumorindeterminate 1.3218 0.3249 4.068 4.74e-05 *** tumornodular 2.1785 0.3046 7.153 8.51e-13 *** tumorsuperficial 2.6450 0.2987 8.854 < 2e-16 *** sitetrunk -0.7571 0.1177 -6.431 1.27e-10 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 237.2178 on 7 degrees of freedom Residual deviance: 2.1647 on 3 degrees of freedom AIC: 52.677 Number of Fisher Scoring iterations: 4