################## # Problem 1. ################## Problem 1: (a). I fit the proportational odds model with all the main effects and interaction effect and compute the goodness of the fit, which is 15.0366 based on 48-25=23 degrees of freedom. The p-value is 0.8934, which means the fit is good. (b). I fit the proportational odds model with the main effects only and compute the goodness of the fit, which is 47.728 based on 48-8=40 degrees of freedom. The p-value is 0.1874, which means the fit is good. The test shows that at least there is a significant interaction effect. (c). The test results are Model (HIC) (HI,IC,HC) (IH,HC) (I,HC) (I,H,C) Deviacne 3446.458 3448.583 3448.695 3470.483 3479.149 Df 25 19 17 11 8 Test HIC IC IH HC diff 2.1250 0.1116 21.7882 8.6662 p-value 0.9078 0.9457 0.0013 0.0341 Thus, the result shows that IH and HC are significant. I also compute the partial and margianl association but it is not required. The results are Effect Partial df P-value Margianl df p-value HIC 2.1250 6 0.9078 2.1250 6 0.9078 IC 0.1116 2 0.9457 0.2089 2 0.9008 HI 21.7174 6 0.0013 22.5094 6 0.0098 HC 7.9246 3 0.0476 8.6662 3 0.0341 ################## # Problem 2. 7.1 on textbook ################## (d) I fit a multinomial model with gender, race, ses, schtype, and the five subjects. The residual deviance is 305.8705. The model degrees of freedom is 26. I find that the sign of coefficien of science is positive. The rest coefficient of the five subject was negative. Thus, science gives unexpected coefficient. It provide an alteantive way to affect the programs. (e) The update model use the total scores of the five subject. The residual deviance is 328.6081. The model df is 26. The likelihood ratio statistic value is 328.6081-305.8705=22.73. Based on 8 degrees of freedom chi-square test, the p-value is 0.0037. Then, we prefer the previous model. (f) I use the step wise model selection for model with seperate subjects. The final model contains ses, math, science, and socty. The residual deviance is 315.55. The model degrees of freedom is 14. Compare it with model in (d). The likeilihood ratio statistic value is 315.5511-305.8705=9.6806. Based on the chisquare distribution with 12 degrees of freedom, we conclude the test is not significant. (i) The fitted probabilities are 0.64426309 0.27665609 0.07908082 for academic, general and vocation respectively. I provide two ways. The first is to direct use the R output. The second is to use coefficients. It is base on he variables are "ses", "schtyp", "math", "science" and "socst". ################## # Problem 3: 7.4 on textbook ################## (b) I use year as the explain variable. The residual deviance is 417.4496. The model df is 4. The coefficients shows that mild and severe were similar. The coefficients and intecepts are almost proportial. (c) I use log year to do the model again. The residual deviance is reduced to 408.8689. We still have the similar conclusion. The probability probabilities are 0.7922 for normal, 0.1109 for mild, and 0.0969 for severe. (e) I fit a proportional odds model. The estimate of theta1 is 9.6783. It interprets the probability of normal relative to the overall probability of mild and severe. The predicted probabilities are 0.7887 for normalk, 0.1136 for mild, and 0.0977 for severe. ############## # Output ############## # Problem 1. ############## > housesat <- read.table("c:\\data\\housesat.data",h=T) > housesat Satisfaction Influence Housing Contact Freq 1 Low Low Tower Low 21 2 Low Medium Tower Low 34 3 Low High Tower Low 10 4 Low Low Apartment Low 61 5 Low Medium Apartment Low 43 6 Low High Apartment Low 26 7 Low Low Atrium Low 13 8 Low Medium Atrium Low 8 9 Low High Atrium Low 6 10 Low Low Terraced Low 18 11 Low Medium Terraced Low 15 12 Low High Terraced Low 7 13 Low Low Tower High 14 14 Low Medium Tower High 17 15 Low High Tower High 3 16 Low Low Apartment High 78 17 Low Medium Apartment High 48 18 Low High Apartment High 15 19 Low Low Atrium High 20 20 Low Medium Atrium High 10 21 Low High Atrium High 7 22 Low Low Terraced High 57 23 Low Medium Terraced High 31 24 Low High Terraced High 5 25 Medium Low Tower Low 21 26 Medium Medium Tower Low 22 27 Medium High Tower Low 11 28 Medium Low Apartment Low 23 29 Medium Medium Apartment Low 35 30 Medium High Apartment Low 18 31 Medium Low Atrium Low 9 32 Medium Medium Atrium Low 8 33 Medium High Atrium Low 7 34 Medium Low Terraced Low 6 35 Medium Medium Terraced Low 13 36 Medium High Terraced Low 5 37 Medium Low Tower High 19 38 Medium Medium Tower High 23 39 Medium High Tower High 5 40 Medium Low Apartment High 46 41 Medium Medium Apartment High 45 42 Medium High Apartment High 25 43 Medium Low Atrium High 23 44 Medium Medium Atrium High 22 45 Medium High Atrium High 10 46 Medium Low Terraced High 23 47 Medium Medium Terraced High 21 48 Medium High Terraced High 6 49 High Low Tower Low 28 50 High Medium Tower Low 36 51 High High Tower Low 36 52 High Low Apartment Low 17 53 High Medium Apartment Low 40 54 High High Apartment Low 54 55 High Low Atrium Low 10 56 High Medium Atrium Low 12 57 High High Atrium Low 9 58 High Low Terraced Low 7 59 High Medium Terraced Low 13 60 High High Terraced Low 11 61 High Low Tower High 37 62 High Medium Tower High 40 63 High High Tower High 23 64 High Low Apartment High 43 65 High Medium Apartment High 86 66 High High Apartment High 62 67 High Low Atrium High 20 68 High Medium Atrium High 24 69 High High Atrium High 21 70 High Low Terraced High 13 71 High Medium Terraced High 13 72 High High Terraced High 13 (a) > g1 <- polr(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat) > g1$deviance [1] 3446.458 > g1$edf [1] 25 > proportion <- g1$fit[1:24,] > observed.freq <- matrix(housesat$Freq,24) > total <- apply(observed.freq,1,sum) > predicted.freq <- total*proportion > > real.proportion <- observed.freq/total > good1 <- 2*sum(observed.freq*log(observed.freq/predicted.freq)) > good1 [1] 15.03668 > > 1-pchisq(good1,48-g1$edf) [1] 0.8933142 > g.saturated <- multinom(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat) # weights: 75 (48 variable) initial value 1846.767257 iter 10 value 1723.797532 iter 20 value 1716.392454 iter 30 value 1715.780520 iter 40 value 1715.714937 final value 1715.710853 converged > g.saturated$deviance [1] 3431.422 > 1-pchisq(g1$deviance-g.saturated$deviance,g.saturated$edf-g1$edf) [1] 0.8933158 (b) > g <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat) > proportion <- g$fit[1:24,] > observed.freq <- matrix(housesat$Freq,24) > total <- apply(observed.freq,1,sum) > predicted.freq <- total*proportion > > real.proportion <- observed.freq/total > good <- 2*sum(observed.freq*log(observed.freq/predicted.freq)) > good [1] 47.72764 > 1-pchisq(good-good1,g1$edf-g$edf) [1] 0.01233195 > 1-pchisq(g$deviance-g1$deviance,g1$edf-g$edf) [1] 0.01233195 > > 1-pchisq(good,48-g$edf) [1] 0.1874401 > > 1-pchisq(g$deviance-g.saturated$deviance,g.saturated$edf-g$edf) [1] 0.1874413 (c) > #### Influence:Housing:Contact > g.large <- polr(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 2.124984 6.000000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.9078523 > > #### Housing:Contact > g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~Influence*(Housing+Contact),weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 7.924571 3.000000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.04759659 > > g.large <- polr(Satisfaction~Influence+Housing*Contact,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 8.666155 3.000000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.03407516 > > #### Influence:Housing > g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~(Influence+Housing)*Contact,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 21.71740 6.00000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.001362224 > > g.large <- polr(Satisfaction~Influence*Housing+Contact,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 22.50935 6.00000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.0009786183 > > #### Influence:Contact > > g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~(Influence+Contact)*Housing,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.1116041 2.0000000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.9457263 > > g.large <- polr(Satisfaction~Influence*Contact+Housing,weight=Freq,data=housesat) > g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat) > c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.2089536 2.0000000 > 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf) [1] 0.9007957 > > ########## > g2 <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat) > g3 <- polr(Satisfaction~(Influence+Contact)*Housing,weight=Freq,data=housesat) > g4 <- polr(Satisfaction~Influence+Contact*Housing,weight=Freq,data=housesat) > g5 <- polr(Satisfaction~Influence+Contact+Housing,weight=Freq,data=housesat) > > 1-pchisq(g2$deviance-g1$deviance,g1$edf-g2$edf) [1] 0.9078523 > 1-pchisq(g3$deviance-g2$deviance,g2$edf-g3$edf) [1] 0.9457263 > 1-pchisq(g4$deviance-g3$deviance,g3$edf-g4$edf) [1] 0.001322641 > 1-pchisq(g5$deviance-g4$deviance,g4$edf-g5$edf) [1] 0.03407516 > c(g1$dev,g2$deviance,g3$deviance,g4$deviance,g5$deviance) [1] 3446.458 3448.583 3448.695 3470.483 3479.149 > c(g1$edf,g2$edf,g3$edf,g4$edf,g5$edf) [1] 25 19 17 11 8 ########## # Problem 2. 7.1 ########## hsb <- read.table("u:\\stat526\\dataset\\ascdata\\hsb.txt",h=T) summary(hsb) ###### # (d) ###### library(nnet) mod.sep <- multinom(prog~gender+race+ses+schtyp+read+write +math+science+socst,data=hsb) summary(mod.sep) mod.sep$deviance mod.sep$edf ######## # (e) ######## mod.sum <- multinom(prog~gender+race+ses+schtyp+I(read+write+math+science+socst),data=hsb) mod.sum$deviance mod.sum$edf ######## # (f) ######## step(mod.sep) ######### # (i) ######### mod.final <- multinom(prog~ses+schtyp+math+science+socst,data=hsb) summary(mod.final) mod.final$deviance mod.final$edf mod.final$fit[hsb$id==99,] hsb[hsb$id==99,] pi2overpi1 <- exp(2.587029+0.6468812-0.1212242*56+0.08209791*66-0.04441228*61) pi3overpi1 <- exp(6.687272+1.9955504-0.1369641*56+0.03941237*66-0.09363417*61) aa <- c(1,pi2overpi1,pi3overpi1) aa/sum(aa) ####################### # Some of the Oupuut ####################### # (d) ######## > mod.sep <- multinom(prog~gender+race+ses+schtyp+read+write +math+science+socst,data=hsb) # weights: 42 (26 variable) initial value 219.722458 iter 10 value 171.814970 iter 20 value 153.793692 iter 30 value 152.935260 final value 152.935256 converged > summary(mod.sep) Call: multinom(formula = prog ~ gender + race + ses + schtyp + read + write + math + science + socst, data = hsb) Coefficients: (Intercept) gendermale raceasian racehispanic racewhite seslow sesmiddle schtyppublic read write math science socst general 3.631901 -0.09264717 1.352739 -0.6322019 0.2965156 1.09864111 0.7029621 0.5845405 -0.04418353 -0.03627381 -0.1092888 0.10193746 -0.01976995 vocation 7.481381 -0.32104341 -0.700070 -0.1993556 0.3358881 0.04747323 1.1815808 2.0553336 -0.03481202 -0.03166001 -0.1139877 0.05229938 -0.08040129 Std. Errors: (Intercept) gendermale raceasian racehispanic racewhite seslow sesmiddle schtyppublic read write math science socst general 1.823452 0.4548778 1.058754 0.8935504 0.7354829 0.6066763 0.5045938 0.5642925 0.03103707 0.03381324 0.03522441 0.03274038 0.02712589 vocation 2.104698 0.5021132 1.470176 0.8393676 0.7480573 0.7045772 0.5700833 0.8348229 0.03422409 0.03585729 0.03885131 0.03424763 0.02938212 Residual Deviance: 305.8705 AIC: 357.8705 > > mod.sep$deviance [1] 305.8705 > mod.sep$edf [1] 26 > ######## > # (f) > ######## > > step(mod.sep) Start: AIC=357.87 prog ~ gender + race + ses + schtyp + read + write + math + science + socst multinom(formula = prog ~ ses + schtyp + math + science + socst, data = hsb) Coefficients: (Intercept) seslow sesmiddle schtyppublic math science socst general 2.587029 0.87607389 0.6978995 0.6468812 -0.1212242 0.08209791 -0.04441228 vocation 6.687272 -0.01569301 1.2065000 1.9955504 -0.1369641 0.03941237 -0.09363417 Residual Deviance: 315.5511 AIC: 343.5511 > ######### > # (i) > ######### > > mod.final <- multinom(prog~ses+schtyp+math+science+socst,data=hsb) > summary(mod.final) Call: multinom(formula = prog ~ ses + schtyp + math + science + socst, data = hsb) Coefficients: (Intercept) seslow sesmiddle schtyppublic math science socst general 2.587029 0.87607389 0.6978995 0.6468812 -0.1212242 0.08209791 -0.04441228 vocation 6.687272 -0.01569301 1.2065000 1.9955504 -0.1369641 0.03941237 -0.09363417 Residual Deviance: 315.5511 AIC: 343.5511 > mod.final$deviance [1] 315.5511 > mod.final$edf [1] 14 > mod.final$fit[hsb$id==99,] academic general vocation 0.64426309 0.27665609 0.07908082 > > hsb[hsb$id==99,] id gender race ses schtyp prog read write math science socst 102 99 female white high public general 47 59 56 66 61 > > pi2overpi1 <- exp(2.587029+0.6468812-0.1212242*56+0.08209791*66-0.04441228*61) > pi3overpi1 <- exp(6.687272+1.9955504-0.1369641*56+0.03941237*66-0.09363417*61) > > aa <- c(1,pi2overpi1,pi3overpi1) > aa/sum(aa) [1] 0.64426301 0.27665605 0.07908094 ################# # Problem 3. 7.4. ################# data(pneumo) pneumo ####### # (b) ###### pneumo$status <- relevel(pneumo$status,ref="normal") summary(pneumo) mod.year <- multinom(status~year,weights=Freq,data=pneumo) summary(mod.year) mod.year$edf ####### # (c) ####### mod.logyear <- multinom(status~log(year),weights=Freq,data=pneumo) summary(mod.logyear) mod.logyear$edf eta <- summary(mod.logyear)$coeff%*%c(1,log(25)) aa <- c(1,exp(eta)) aa/sum(aa) ##### # (e) ##### library(MASS) mod.prop <- polr(status~log(year),weights=Freq,data=pneumo) summary(mod.prop) eta1 <- mod.prop$zeta[1]-mod.prop$coefficients*log(25) eta2 <- mod.prop$zeta[2]-mod.prop$coefficients*log(25) p1 <- exp(eta1)/(1+exp(eta1)) p3 <- 1/(1+exp(eta2)) p2 <- 1-p1-p3 round(c(p1,p2,p3),4) #################### # Output #################### > data(pneumo) > pneumo Freq status year 1 98 normal 5.8 2 51 normal 15.0 3 34 normal 21.5 4 35 normal 27.5 5 32 normal 33.5 6 23 normal 39.5 7 12 normal 46.0 8 4 normal 51.5 9 0 mild 5.8 10 2 mild 15.0 11 6 mild 21.5 12 5 mild 27.5 13 10 mild 33.5 14 7 mild 39.5 15 6 mild 46.0 16 2 mild 51.5 17 0 severe 5.8 18 1 severe 15.0 19 3 severe 21.5 20 8 severe 27.5 21 9 severe 33.5 22 8 severe 39.5 23 10 severe 46.0 24 5 severe 51.5 > > ####### > # (b) > ###### > > pneumo$status <- relevel(pneumo$status,ref="normal") > summary(pneumo) Freq status year Min. : 0.00 normal:8 Min. : 5.80 1st Qu.: 3.75 mild :8 1st Qu.:19.88 Median : 7.50 severe:8 Median :30.50 Mean :15.46 Mean :30.04 3rd Qu.:14.75 3rd Qu.:41.12 Max. :98.00 Max. :51.50 > > mod.year <- multinom(status~year,weights=Freq,data=pneumo) > summary(mod.year) Call: multinom(formula = status ~ year, data = pneumo, weights = Freq) Coefficients: (Intercept) year mild -4.291680 0.08356529 severe -5.059849 0.10928549 Std. Errors: (Intercept) year mild 0.5214120 0.01528046 severe 0.5964319 0.01646978 Residual Deviance: 417.4496 AIC: 425.4496 > mod.year$edf [1] 4 > > ####### > # (c) > ####### > > mod.logyear <- multinom(status~log(year),weights=Freq,data=pneumo) > summary(mod.logyear) Call: multinom(formula = status ~ log(year), data = pneumo, weights = Freq) Coefficients: (Intercept) log(year) mild -8.936034 2.165374 severe -11.975104 3.067470 Std. Errors: (Intercept) log(year) mild 1.580439 0.4574870 severe 2.000447 0.5652072 Residual Deviance: 408.8689 AIC: 416.8689 > mod.logyear$edf [1] 4 > eta <- summary(mod.logyear)$coeff%*%c(1,log(25)) > aa <- c(1,exp(eta)) > aa/sum(aa) [1] 0.79219318 0.11092365 0.09688318 > ##### > # (e) > ##### > > library(MASS) > mod.prop <- polr(status~log(year),weights=Freq,data=pneumo) > summary(mod.prop) Re-fitting to get Hessian Call: polr(formula = status ~ log(year), data = pneumo, weights = Freq) Coefficients: Value Std. Error t value log(year) 2.597 0.381 6.817 Intercepts: Value Std. Error t value normal|mild 9.6763 1.3233 7.3123 mild|severe 10.5820 1.3437 7.8750 Residual Deviance: 408.5483 AIC: 414.5483 > eta1 <- mod.prop$zeta[1]-mod.prop$coefficients*log(25) > eta2 <- mod.prop$zeta[2]-mod.prop$coefficients*log(25) > > p1 <- exp(eta1)/(1+exp(eta1)) > p3 <- 1/(1+exp(eta2)) > p2 <- 1-p1-p3 > round(c(p1,p2,p3),4) normal|mild normal|mild mild|severe 0.7887 0.1136 0.0977