> ############### > # Section 1 > ############### > > y <- c(320,14,80,36) > particle <- gl(2,1,4,labels=c("no","yes")) > quality <- gl(2,2,labels=c("good","bad")) > (wafer <- data.frame(y,particle,quality)) y particle quality 1 320 no good 2 14 yes good 3 80 no bad 4 36 yes bad > (ov <- xtabs(y ~ quality+particle)) particle quality no yes good 320 14 bad 80 36 > > modl <- glm(y ~ particle+quality, poisson) > library(faraway) Attaching package: ‘faraway’ The following object is masked _by_ ‘.GlobalEnv’: wafer Warning message: package ‘faraway’ was built under R version 3.4.3 > sumary(modl) Estimate Std. Error z value Pr(>|z|) (Intercept) 5.69336 0.05720 99.5350 < 2.2e-16 particleyes -2.07944 0.15000 -13.8630 < 2.2e-16 qualitybad -1.05755 0.10777 -9.8129 < 2.2e-16 n = 4 p = 3 Deviance = 54.03045 Null Deviance = 474.09877 (Difference = 420.06832) > drop1(modl,test="Chi") Single term deletions Model: y ~ particle + quality Df Deviance AIC LRT Pr(>Chi) 54.03 83.77 particle 1 363.91 391.66 309.88 < 2.2e-16 *** quality 1 164.22 191.96 110.19 < 2.2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > ############## > # drop1 is similar to SS3 > # anova is similar to SS1 > ############## > > (t(model.matrix(modl)) %*% y)[,] (Intercept) particleyes qualitybad 450 50 116 > (pp <- prop.table( xtabs(y ~ particle))) particle no yes 0.8888889 0.1111111 > (qp <- prop.table( xtabs(y ~ quality))) quality good bad 0.7422222 0.2577778 > (fv <- outer(qp,pp)*450) particle quality no yes good 296.8889 37.11111 bad 103.1111 12.88889 > > ############ > # Predicted count > ############ > > 2*sum(ov*log(ov/fv)) [1] 54.03045 > sum( (ov-fv)^2/fv) [1] 62.81231 > prop.test(ov) 2-sample test for equality of proportions with continuity correction data: ov X-squared = 60.124, df = 1, p-value = 8.907e-15 alternative hypothesis: two.sided 95 percent confidence interval: 0.1757321 0.3611252 sample estimates: prop 1 prop 2 0.9580838 0.6896552 > > ############ > # The way to compute X2 and G2 > # The last one contains continuity correction. > ############ > > (m <- matrix(y,nrow=2)) [,1] [,2] [1,] 320 80 [2,] 14 36 > modb <- glm(m ~ 1, family=binomial) > deviance(modb) [1] 54.03045 > fisher.test(ov) Fisher's Exact Test for Count Data data: ov p-value = 2.955e-13 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 5.090628 21.544071 sample estimates: odds ratio 10.21331 > > ############ > # The above if Fisher exact test. It is > # developed based on multinomial distirbution but not > # normal approximation. > ############ > > (320*36)/(14*80) [1] 10.28571 > > ############### > # The above is sample odds ratio > ############### > > ############### > # Section 2 > ############### > > data(haireye, package="faraway") > haireye y eye hair 1 5 green BLACK 2 29 green BROWN 3 14 green RED 4 16 green BLOND 5 15 hazel BLACK 6 54 hazel BROWN 7 14 hazel RED 8 10 hazel BLOND 9 20 blue BLACK 10 84 blue BROWN 11 17 blue RED 12 94 blue BLOND 13 68 brown BLACK 14 119 brown BROWN 15 26 brown RED 16 7 brown BLOND > (ct <- xtabs(y ~ hair + eye, haireye)) eye hair green hazel blue brown BLACK 5 15 20 68 BROWN 29 54 84 119 RED 14 14 17 26 BLOND 16 10 94 7 > summary(ct) Call: xtabs(formula = y ~ hair + eye, data = haireye) Number of cases in table: 592 Number of factors: 2 Test for independence of all factors: Chisq = 138.29, df = 9, p-value = 2.325e-25 > dotchart(ct) > mosaicplot(ct, color=TRUE, main=NULL, las=1) > modc <- glm(y ~ hair+eye, family=poisson, haireye) > sumary(modc) Estimate Std. Error z value Pr(>|z|) (Intercept) 2.45751 0.15230 16.1361 < 2.2e-16 hairBROWN 0.97386 0.11294 8.6227 < 2.2e-16 hairRED -0.41945 0.15279 -2.7453 0.006045 hairBLOND 0.16206 0.13089 1.2381 0.215690 eyehazel 0.37372 0.16241 2.3010 0.021389 eyeblue 1.21175 0.14239 8.5099 < 2.2e-16 eyebrown 1.23474 0.14202 8.6940 < 2.2e-16 n = 16 p = 7 Deviance = 146.44358 Null Deviance = 453.30765 (Difference = 306.86407) > > ############ > # No way to fit a better model. > ############ > > ############ > # Section 3 > ############ > > z <- xtabs(residuals(modc,type="pearson")~hair+eye,haireye) > svdz <- svd(z,2,2) > leftsv <- svdz$u %*% diag(sqrt(svdz$d[1:2])) > rightsv <- svdz$v %*% diag(sqrt(svdz$d[1:2])) > > ############# > # Singular Value Decomposition (SVD). > # It is a popular method for dimension reduction. > ############# > > ll <- 1.1*max(abs(rightsv),abs(leftsv)) > plot(rbind(leftsv,rightsv),asp=1,xlim=c(-ll,ll),ylim=c(-ll,ll), xlab="SV1",ylab="SV2",type="n") > abline(h=0,v=0) > text(leftsv,dimnames(z)[[1]]) > text(rightsv,dimnames(z)[[2]]) > > ############### > # Pay attention to how to explain the figure. > # You need to focus on the first dimension and then look at the second dimension. > ############### > > ############ > # Section 4 > ############ > > data(eyegrade) > (ct <- xtabs(y ~ right+left, eyegrade)) left right best second third worst best 1520 266 124 66 second 234 1512 432 78 third 117 362 1772 205 worst 36 82 179 492 > summary(ct) Call: xtabs(formula = y ~ right + left, data = eyegrade) Number of cases in table: 7477 Number of factors: 2 Test for independence of all factors: Chisq = 8097, df = 9, p-value = 0 > (symfac <- factor(apply(eyegrade[,2:3],1, function(x) paste(sort(x),collapse="-")))) 1 2 3 4 5 best-best best-second best-third best-worst best-second 6 7 8 9 10 second-second second-third second-worst best-third second-third 11 12 13 14 15 third-third third-worst best-worst second-worst third-worst 16 worst-worst 10 Levels: best-best best-second best-third best-worst ... worst-worst > > ########### > # We want to define a new variable symfac to fit a symmetric model. > # This is a trick. > ########### > > mods <- glm(y ~ symfac, eyegrade, family=poisson) > summary(mods) Call: glm(formula = y ~ symfac, family = poisson, data = eyegrade) Deviance Residuals: Min 1Q Median 3Q Max -2.2185 -0.4776 0.0000 0.4700 2.0083 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 7.326466 0.025649 285.638 < 2e-16 *** symfacbest-second -1.805005 0.051555 -35.011 < 2e-16 *** symfacbest-third -2.534816 0.069334 -36.559 < 2e-16 *** symfacbest-worst -3.394640 0.102283 -33.189 < 2e-16 *** symfacsecond-second -0.005277 0.036322 -0.145 0.884 symfacsecond-third -1.342529 0.043787 -30.660 < 2e-16 *** symfacsecond-worst -2.944439 0.083114 -35.427 < 2e-16 *** symfacthird-third 0.153399 0.034960 4.388 1.15e-05 *** symfacthird-worst -2.068970 0.057114 -36.225 < 2e-16 *** symfacworst-worst -1.127987 0.051869 -21.747 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 8692.334 on 15 degrees of freedom Residual deviance: 19.249 on 6 degrees of freedom AIC: 156.63 Number of Fisher Scoring iterations: 4 > > c(deviance(mods),df.residual(mods)) [1] 19.24919 6.00000 > pchisq(deviance(mods),df.residual(mods),lower=F) [1] 0.003762852 > > round(matrix(mods$fit,4),4) [,1] [,2] [,3] [,4] [1,] 1520.0 250 120.5 51 [2,] 250.0 1512 397.0 80 [3,] 120.5 397 1772.0 192 [4,] 51.0 80 192.0 492 > > ########### > # Predicted counts are symmetric. > # Pay attention the residual DF. Interpret. > ########### > > round(xtabs(residuals(mods) ~ right+left, eyegrade),3) left right best second third worst best 0.000 1.001 0.317 2.008 second -1.023 0.000 1.732 -0.225 third -0.320 -1.783 0.000 0.928 worst -2.219 0.223 -0.949 0.000 > > margin.table(ct,1) right best second third worst 1976 2256 2456 789 > margin.table(ct,2) left best second third worst 1907 2222 2507 841 > modq <- glm(y ~ right+left+symfac, eyegrade, family=poisson) > summary(modq) Call: glm(formula = y ~ right + left + symfac, family = poisson, data = eyegrade) Deviance Residuals: 1 2 3 4 5 6 7 8 0.0000 0.1612 -0.8394 0.8894 -0.1706 0.0000 0.6325 -1.1284 9 10 11 12 13 14 15 16 0.9114 -0.6760 0.0000 0.2409 -1.0933 1.2003 -0.2548 0.0000 Coefficients: (3 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) 7.32647 0.02565 285.638 < 2e-16 *** rightsecond -2.43955 0.09055 -26.942 < 2e-16 *** rightthird -1.61523 0.06955 -23.223 < 2e-16 *** rightworst -0.72288 0.05641 -12.816 < 2e-16 *** leftsecond -2.33241 0.09149 -25.493 < 2e-16 *** leftthird -1.39721 0.07012 -19.927 < 2e-16 *** leftworst -0.40510 0.05641 -7.182 6.88e-13 *** symfacbest-second 0.57954 0.09462 6.125 9.07e-10 *** symfacbest-third -1.03453 0.08633 -11.983 < 2e-16 *** symfacbest-worst -2.84322 0.10266 -27.696 < 2e-16 *** symfacsecond-second 4.76669 0.16668 28.598 < 2e-16 *** symfacsecond-third 2.54814 0.11038 23.085 < 2e-16 *** symfacsecond-worst NA NA NA NA symfacthird-third 3.16584 0.11415 27.734 < 2e-16 *** symfacthird-worst NA NA NA NA symfacworst-worst NA NA NA NA --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 8692.3336 on 15 degrees of freedom Residual deviance: 7.2708 on 3 degrees of freedom AIC: 150.65 Number of Fisher Scoring iterations: 4 > > pchisq(deviance(modq),df.residual(modq),lower=F) [1] 0.06375054 > > ########## > # The above contains main effects and a symmetric interaction effect. > # Explain the residual DF by yourself. > # You need to compare the previous two models. > ########## > > anova(mods,modq,test="Chi") Analysis of Deviance Table Model 1: y ~ symfac Model 2: y ~ right + left + symfac Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 6 19.2492 2 3 7.2708 3 11.978 0.007457 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > modqi <- glm(y ~ right + left, eyegrade, family=poisson, subset= -c(1,6,11,16)) > pchisq(deviance(modqi),df.residual(modqi),lower=F) [1] 4.411781e-41 > > ########### > # Section 5 > ########### > > data(femsmoke) > femsmoke y smoker dead age 1 2 yes yes 18-24 2 1 no yes 18-24 3 3 yes yes 25-34 4 5 no yes 25-34 5 14 yes yes 35-44 6 7 no yes 35-44 7 27 yes yes 45-54 8 12 no yes 45-54 9 51 yes yes 55-64 10 40 no yes 55-64 11 29 yes yes 65-74 12 101 no yes 65-74 13 13 yes yes 75+ 14 64 no yes 75+ 15 53 yes no 18-24 16 61 no no 18-24 17 121 yes no 25-34 18 152 no no 25-34 19 95 yes no 35-44 20 114 no no 35-44 21 103 yes no 45-54 22 66 no no 45-54 23 64 yes no 55-64 24 81 no no 55-64 25 7 yes no 65-74 26 28 no no 65-74 27 0 yes no 75+ 28 0 no no 75+ > (ct <- xtabs(y ~ smoker+dead,femsmoke)) dead smoker yes no yes 139 443 no 230 502 > prop.table(ct,1) dead smoker yes no yes 0.2388316 0.7611684 no 0.3142077 0.6857923 > summary(ct) Call: xtabs(formula = y ~ smoker + dead, data = femsmoke) Number of cases in table: 1314 Number of factors: 2 Test for independence of all factors: Chisq = 9.121, df = 1, p-value = 0.002527 > (cta <- xtabs(y ~ smoker+dead,femsmoke, subset=(age=="55-64"))) dead smoker yes no yes 51 64 no 40 81 > prop.table(cta,1) dead smoker yes no yes 0.4434783 0.5565217 no 0.3305785 0.6694215 > > ############# > # Given angle group, compute the observed probabilities. > ############# > > prop.table(xtabs(y ~ smoker+age, femsmoke),2) age smoker 18-24 25-34 35-44 45-54 55-64 65-74 75+ yes 0.4700855 0.4412811 0.4739130 0.6250000 0.4872881 0.2181818 0.1688312 no 0.5299145 0.5587189 0.5260870 0.3750000 0.5127119 0.7818182 0.8311688 > > ########## > # All of the observed probabilityes. > ########## > > fisher.test(cta) Fisher's Exact Test for Count Data data: cta p-value = 0.08304 alternative hypothesis: true odds ratio is not equal to 1 95 percent confidence interval: 0.9203097 2.8334038 sample estimates: odds ratio 1.610329 > > ########### > # Exact test > ########### > > ct3 <- xtabs(y ~ smoker+dead+age,femsmoke) > ct3 , , age = 18-24 dead smoker yes no yes 2 53 no 1 61 , , age = 25-34 dead smoker yes no yes 3 121 no 5 152 , , age = 35-44 dead smoker yes no yes 14 95 no 7 114 , , age = 45-54 dead smoker yes no yes 27 103 no 12 66 , , age = 55-64 dead smoker yes no yes 51 64 no 40 81 , , age = 65-74 dead smoker yes no yes 29 7 no 101 28 , , age = 75+ dead smoker yes no yes 13 0 no 64 0 > > apply(ct3, 3, function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1])) 18-24 25-34 35-44 45-54 55-64 65-74 75+ 2.301887 0.753719 2.400000 1.441748 1.613672 1.148515 NaN > mantelhaen.test(ct3,exact=TRUE) Exact conditional test of independence in 2 x 2 x k tables data: ct3 S = 139, p-value = 0.01591 alternative hypothesis: true common odds ratio is not equal to 1 95 percent confidence interval: 1.068889 2.203415 sample estimates: common odds ratio 1.530256 > summary(ct3) Call: xtabs(formula = y ~ smoker + dead + age, data = femsmoke) Number of cases in table: 1314 Number of factors: 3 Test for independence of all factors: Chisq = 790.6, df = 19, p-value = 2.14e-155 > > ########## > # Descriptive Analysis and test independent given age groups. > ########## > > modi <- glm(y ~ smoker + dead + age, femsmoke, family=poisson) > c(deviance(modi),df.residual(modi)) [1] 735.0028 19.0000 > (coefsmoke <- exp(c(0,coef(modi)[2]))) smokerno 1.000000 1.257732 > coefsmoke/sum(coefsmoke) smokerno 0.4429224 0.5570776 > prop.table(xtabs(y ~ smoker, femsmoke)) smoker yes no 0.4429224 0.5570776 > > ############# > # It is the main effect Poisson model, which is also the independence model. > ############ > > modj <- glm(y ~ smoker*dead + age, femsmoke, family=poisson) > c(deviance(modj),df.residual(modj)) [1] 725.8025 18.0000 > > ########### > # The above is joint independence model. > ########### > > modc <- glm(y ~ smoker*age + age*dead, femsmoke, family=poisson) > c(deviance(modc),df.residual(modc)) [1] 8.326939 7.000000 > > ########## > # It is conditional independence model. > ########## > > modu <- glm(y ~ (smoker+age+dead)^2, femsmoke, family=poisson) > > ######### > # It is uniform association model. > # It does not have a good interpretation. > ######### > > ctf <- xtabs(fitted(modu) ~ smoker+dead+age,femsmoke) # predicted probabilities > > apply(ctf, 3, function(x) (x[1,1]*x[2,2])/(x[1,2]*x[2,1])) 18-24 25-34 35-44 45-54 55-64 65-74 75+ 1.533275 1.533275 1.533275 1.533275 1.533275 1.533275 1.533275 > exp(coef(modu)['smokerno:deadno']) smokerno:deadno 1.533275 > modsat <- glm(y ~ smoker*age*dead, femsmoke, family=poisson) > summary(modsat) Call: glm(formula = y ~ smoker * age * dead, family = poisson, data = femsmoke) Deviance Residuals: [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [26] 0 0 0 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 6.931e-01 7.071e-01 0.980 0.326959 smokerno -6.931e-01 1.225e+00 -0.566 0.571426 age25-34 4.055e-01 9.129e-01 0.444 0.656923 age35-44 1.946e+00 7.559e-01 2.574 0.010047 * age45-54 2.603e+00 7.328e-01 3.552 0.000383 *** age55-64 3.239e+00 7.208e-01 4.493 7.02e-06 *** age65-74 2.674e+00 7.311e-01 3.658 0.000254 *** age75+ 1.872e+00 7.596e-01 2.464 0.013727 * deadno 3.277e+00 7.203e-01 4.550 5.38e-06 *** smokerno:age25-34 1.204e+00 1.426e+00 0.844 0.398485 smokerno:age35-44 -9.573e-15 1.309e+00 0.000 1.000000 smokerno:age45-54 -1.178e-01 1.273e+00 -0.093 0.926278 smokerno:age55-64 4.502e-01 1.243e+00 0.362 0.717172 smokerno:age65-74 1.941e+00 1.243e+00 1.562 0.118321 smokerno:age75+ 2.287e+00 1.262e+00 1.812 0.069937 . smokerno:deadno 8.337e-01 1.239e+00 0.673 0.501027 age25-34:deadno 4.200e-01 9.276e-01 0.453 0.650685 age35-44:deadno -1.362e+00 7.751e-01 -1.758 0.078824 . age45-54:deadno -1.938e+00 7.521e-01 -2.577 0.009960 ** age55-64:deadno -3.050e+00 7.444e-01 -4.097 4.18e-05 *** age65-74:deadno -4.699e+00 8.344e-01 -5.631 1.79e-08 *** age75+:deadno -2.914e+01 6.965e+04 0.000 0.999666 smokerno:age25-34:deadno -1.116e+00 1.443e+00 -0.773 0.439232 smokerno:age35-44:deadno 4.174e-02 1.330e+00 0.031 0.974964 smokerno:age45-54:deadno -4.679e-01 1.296e+00 -0.361 0.718160 smokerno:age55-64:deadno -3.552e-01 1.268e+00 -0.280 0.779372 smokerno:age65-74:deadno -6.953e-01 1.326e+00 -0.524 0.600044 smokerno:age75+:deadno -2.428e+00 9.851e+04 0.000 0.999980 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 1.1939e+03 on 27 degrees of freedom Residual deviance: 3.0330e-10 on 0 degrees of freedom AIC: 190.19 Number of Fisher Scoring iterations: 21 > > ########### > # Saturated Model > ########### > > drop1(modsat,test="Chi") Single term deletions Model: y ~ smoker * age * dead Df Deviance AIC LRT Pr(>Chi) 0.0000 190.19 smoker:age:dead 6 2.3809 180.58 2.3809 0.8815 > drop1(modu,test="Chi") Single term deletions Model: y ~ (smoker + age + dead)^2 Df Deviance AIC LRT Pr(>Chi) 2.38 180.58 smoker:age 6 92.63 258.83 90.25 < 2e-16 *** smoker:dead 1 8.33 184.52 5.95 0.01475 * age:dead 6 632.30 798.49 629.92 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > ybin <- matrix(femsmoke$y,ncol=2) > > ########### > # Poisson can be linked to binomial models. > ########### > > modbin <- glm(ybin ~ smoker*age, femsmoke[1:14,], family=binomial) > drop1(modbin,test="Chi") Single term deletions Model: ybin ~ smoker * age Df Deviance AIC LRT Pr(>Chi) 0.0000 74.996 smoker:age 6 2.3809 65.377 2.3809 0.8815 > > ########### > # Saturated binomial model > ########### > > modbinr <- glm(ybin ~ smoker+age, femsmoke[1:14,], family=binomial) > summary(modbinr) Call: glm(formula = ybin ~ smoker + age, family = binomial, data = femsmoke[1:14, ]) Deviance Residuals: Min 1Q Median 3Q Max -0.72545 -0.22836 0.00005 0.19146 0.68162 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -3.4327 0.5901 -5.817 6.00e-09 *** smokerno -0.4274 0.1770 -2.414 0.015762 * age25-34 0.1201 0.6865 0.175 0.861178 age35-44 1.3411 0.6286 2.134 0.032874 * age45-54 2.1134 0.6121 3.453 0.000555 *** age55-64 3.1808 0.6006 5.296 1.18e-07 *** age65-74 5.0880 0.6195 8.213 < 2e-16 *** age75+ 27.8073 11293.1430 0.002 0.998035 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 641.4963 on 13 degrees of freedom Residual deviance: 2.3809 on 6 degrees of freedom AIC: 65.377 Number of Fisher Scoring iterations: 20 > drop1(modbinr,test="Chi") Single term deletions Model: ybin ~ smoker + age Df Deviance AIC LRT Pr(>Chi) 2.38 65.38 smoker 1 8.33 69.32 5.95 0.01475 * age 6 632.30 683.29 629.92 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > ########### > # Compare it with Poisson models. It is equivalent to the model with all > # of the three two-factor interaciton effects > ########### > > deviance(modu) [1] 2.380927 > deviance(modbinr) [1] 2.380927 > exp(-coef(modbinr)[2]) smokerno 1.533275 > modbinull <- glm(ybin ~ 1, femsmoke[1:14,], family=binomial) > deviance(modbinull) [1] 641.4963 > modj <- glm(y ~ smoker*age + dead, femsmoke, family=poisson) > deviance(modj) [1] 641.4963 > > ########## > # The two models are equivalent > ########## > > ############# > # Section 6 > ############# > > data(nes96) > xtabs( ~ PID + educ, nes96) educ PID MS HSdrop HS Coll CCdeg BAdeg MAdeg strDem 5 19 59 38 17 40 22 weakDem 4 10 49 36 17 41 23 indDem 1 4 28 15 13 27 20 indind 0 3 12 9 3 6 4 indRep 2 7 23 16 8 22 16 weakRep 0 5 35 40 15 38 17 strRep 1 4 42 33 17 53 25 > (partyed <- as.data.frame.table(xtabs( ~ PID + educ, nes96))) PID educ Freq 1 strDem MS 5 2 weakDem MS 4 3 indDem MS 1 4 indind MS 0 5 indRep MS 2 6 weakRep MS 0 7 strRep MS 1 8 strDem HSdrop 19 9 weakDem HSdrop 10 10 indDem HSdrop 4 11 indind HSdrop 3 12 indRep HSdrop 7 13 weakRep HSdrop 5 14 strRep HSdrop 4 15 strDem HS 59 16 weakDem HS 49 17 indDem HS 28 18 indind HS 12 19 indRep HS 23 20 weakRep HS 35 21 strRep HS 42 22 strDem Coll 38 23 weakDem Coll 36 24 indDem Coll 15 25 indind Coll 9 26 indRep Coll 16 27 weakRep Coll 40 28 strRep Coll 33 29 strDem CCdeg 17 30 weakDem CCdeg 17 31 indDem CCdeg 13 32 indind CCdeg 3 33 indRep CCdeg 8 34 weakRep CCdeg 15 35 strRep CCdeg 17 36 strDem BAdeg 40 37 weakDem BAdeg 41 38 indDem BAdeg 27 39 indind BAdeg 6 40 indRep BAdeg 22 41 weakRep BAdeg 38 42 strRep BAdeg 53 43 strDem MAdeg 22 44 weakDem MAdeg 23 45 indDem MAdeg 20 46 indind MAdeg 4 47 indRep MAdeg 16 48 weakRep MAdeg 17 49 strRep MAdeg 25 > nomod <- glm(Freq ~ PID + educ, partyed, family= poisson) > pchisq(deviance(nomod),df.residual(nomod),lower=F) [1] 0.2696086 > > ############## > # Main-effect model. Fit is good. > ############## > > partyed$oPID <- unclass(partyed$PID) > partyed$oeduc <- unclass(partyed$educ) > ormod <- glm(Freq ~ PID + educ + I(oPID*oeduc), partyed, family= poisson) > anova(nomod,ormod,test="Chi") Analysis of Deviance Table Model 1: Freq ~ PID + educ Model 2: Freq ~ PID + educ + I(oPID * oeduc) Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 36 40.743 2 35 30.568 1 10.175 0.001424 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > ############## > # Linear-by-linear association model. Only 1 df in the interaction effect. > ############## > > summary(ormod)$coef['I(oPID * oeduc)',] Estimate Std. Error z value Pr(>|z|) 0.028744615 0.009061742 3.172084969 0.001513487 > apid <- c(1,2,5,6,7,10,11) > aedu <- c(1,1,1,2,2,3,3) > ormoda <- glm(Freq ~ PID + educ + I(apid[oPID]*aedu[oeduc]), partyed, family= poisson) > anova(nomod,ormoda,test="Chi") Analysis of Deviance Table Model 1: Freq ~ PID + educ Model 2: Freq ~ PID + educ + I(apid[oPID] * aedu[oeduc]) Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 36 40.743 2 35 30.929 1 9.8145 0.001731 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > > ############# > # It is also a linear-by-linear association model. The definition of scores are different. > ############# > > round(xtabs(predict(ormod,type="response") ~ PID + educ, partyed),2) educ PID MS HSdrop HS Coll CCdeg BAdeg MAdeg strDem 3.58 13.36 59.22 41.34 18.34 42.46 21.71 weakDem 2.92 11.22 51.20 36.78 16.80 40.02 21.06 indDem 1.59 6.27 29.45 21.78 10.23 25.09 13.59 indind 0.49 2.00 9.65 7.34 3.55 8.96 5.00 indRep 1.12 4.71 23.41 18.33 9.13 23.70 13.60 weakRep 1.61 6.95 35.59 28.68 14.69 39.28 23.19 strRep 1.69 7.49 39.48 32.74 17.26 47.49 28.85 > log(39.28*28.85/(47.49*23.19)) [1] 0.02858516 > log(23.70*23.19/(39.28*13.60)) [1] 0.02841092 > > apid [1] 1 2 5 6 7 10 11 > aedu [1] 1 1 1 2 2 3 3 > > ############ > # The first is log-odds of the last two rows and columns. > # Compare it with the second. > # Interpretation must be based on score values. > ############ > > round(xtabs(residuals(ormod,type="response") ~ PID + educ, partyed),2) educ PID MS HSdrop HS Coll CCdeg BAdeg MAdeg strDem 1.42 5.64 -0.22 -3.34 -1.34 -2.46 0.29 weakDem 1.08 -1.22 -2.20 -0.78 0.20 0.98 1.94 indDem -0.59 -2.27 -1.45 -6.78 2.77 1.91 6.41 indind -0.49 1.00 2.35 1.66 -0.55 -2.96 -1.00 indRep 0.88 2.29 -0.41 -2.33 -1.13 -1.70 2.40 weakRep -1.61 -1.95 -0.59 11.32 0.31 -1.28 -6.19 strRep -0.69 -3.49 2.52 0.26 -0.26 5.51 -3.85 > cmod <- glm(Freq ~ PID + educ + educ:oPID, partyed, family= poisson) > summary(cmod) Call: glm(formula = Freq ~ PID + educ + educ:oPID, family = poisson, data = partyed) Deviance Residuals: Min 1Q Median 3Q Max -1.40202 -0.47936 -0.08035 0.32315 1.48657 Coefficients: (1 not defined because of singularities) Estimate Std. Error z value Pr(>|z|) (Intercept) 1.945679 0.471322 4.128 3.66e-05 *** PIDweakDem -0.075007 0.109326 -0.686 0.492661 PIDindDem -0.560399 0.140521 -3.988 6.66e-05 *** PIDindind -1.610437 0.210250 -7.660 1.86e-14 *** PIDindRep -0.660584 0.192522 -3.431 0.000601 *** PIDweakRep -0.178992 0.211862 -0.845 0.398193 PIDstrRep -0.013421 0.241404 -0.056 0.955663 educHSdrop 1.061194 0.527977 2.010 0.044439 * educHS 2.161235 0.484489 4.461 8.16e-06 *** educColl 1.650803 0.491805 3.357 0.000789 *** educCCdeg 0.971275 0.513874 1.890 0.058744 . educBAdeg 1.722897 0.489151 3.522 0.000428 *** educMAdeg 1.281529 0.501813 2.554 0.010655 * educMS:oPID -0.312217 0.154051 -2.027 0.042692 * educHSdrop:oPID -0.194451 0.077228 -2.518 0.011806 * educHS:oPID -0.055347 0.048196 -1.148 0.250810 educColl:oPID 0.004460 0.050603 0.088 0.929760 educCCdeg:oPID -0.008699 0.060667 -0.143 0.885978 educBAdeg:oPID 0.034554 0.048782 0.708 0.478740 educMAdeg:oPID NA NA NA NA --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 626.798 on 48 degrees of freedom Residual deviance: 22.761 on 30 degrees of freedom AIC: 270.46 Number of Fisher Scoring iterations: 5 > > ########## > # Column-effect model. 6 df in interaction-effects. One is the baseline. > ########## > > anova(nomod,cmod,test="Chi") Analysis of Deviance Table Model 1: Freq ~ PID + educ Model 2: Freq ~ PID + educ + educ:oPID Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 36 40.743 2 30 22.761 6 17.982 0.006278 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > summary(cmod)$coef[14:19,] Estimate Std. Error z value Pr(>|z|) educMS:oPID -0.312216883 0.15405106 -2.02671032 0.04269205 educHSdrop:oPID -0.194451322 0.07722785 -2.51789108 0.01180598 educHS:oPID -0.055346984 0.04819553 -1.14838422 0.25080999 educColl:oPID 0.004460469 0.05060281 0.08814666 0.92976011 educCCdeg:oPID -0.008699398 0.06066725 -0.14339530 0.88597800 educBAdeg:oPID 0.034553940 0.04878224 0.70833027 0.47874018 > anova(ormod,cmod,test="Chi") Analysis of Deviance Table Model 1: Freq ~ PID + educ + I(oPID * oeduc) Model 2: Freq ~ PID + educ + educ:oPID Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 35 30.568 2 30 22.762 5 7.8068 0.1672 > > ########### > # Look at the df. > ########### > > aedu <- c(1,1,2,2,2,2,2) > ormodb <- glm(Freq ~ PID + educ + I(oPID*aedu[oeduc]), partyed, family= poisson) > summary(ormodb) Call: glm(formula = Freq ~ PID + educ + I(oPID * aedu[oeduc]), family = poisson, data = partyed) Deviance Residuals: Min 1Q Median 3Q Max -1.57015 -0.51363 -0.07621 0.40611 1.62183 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.26072 0.28643 4.402 1.07e-05 *** PIDweakDem -0.50311 0.15572 -3.231 0.001234 ** PIDindDem -1.41529 0.26481 -5.344 9.07e-08 *** PIDindind -2.89086 0.39979 -7.231 4.79e-13 *** PIDindRep -2.36537 0.49601 -4.769 1.85e-06 *** PIDweakRep -2.30701 0.61308 -3.763 0.000168 *** PIDstrRep -2.56356 0.73506 -3.488 0.000487 *** educHSdrop 1.38629 0.31008 4.471 7.79e-06 *** educHS 2.23864 0.33986 6.587 4.49e-11 *** educColl 1.95632 0.34179 5.724 1.04e-08 *** educCCdeg 1.22502 0.35012 3.499 0.000467 *** educBAdeg 2.15016 0.34041 6.316 2.68e-10 *** educMAdeg 1.56940 0.34546 4.543 5.55e-06 *** I(oPID * aedu[oeduc]) 0.20925 0.06246 3.350 0.000807 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 626.798 on 48 degrees of freedom Residual deviance: 28.451 on 35 degrees of freedom AIC: 266.15 Number of Fisher Scoring iterations: 4 > > aedu[partyed$oeduc] [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 [39] 2 2 2 2 2 2 2 2 2 2 2 > ########### > # Still the linear-by-linear association model. > ########## > > deviance(ormodb) [1] 28.45076 > deviance(ormod) [1] 30.56833