##################
# Problem 1.
##################
Problem 1:
(a).  I fit the proportational odds model with all the main effects and interaction effect
     and compute the goodness of the fit, which is 15.0366 based on 48-25=23 degrees of freedom. 
     The p-value is 0.8934, which means the fit is good.
     
(b). I fit the proportational odds model with the main effects only and compute the goodness of 
     the fit, which is 47.728 based on 48-8=40 degrees of freedom. 
     The p-value is 0.1874, which means the fit is good. The test shows that at least there is 
     a significant interaction effect.
(c). The test results are 

   Model    (HIC)       (HI,IC,HC)      (IH,HC)      (I,HC)     (I,H,C)
  Deviacne  3446.458      3448.583      3448.695    3470.483    3479.149
  Df         25             19            17           11         8

   Test       HIC      IC       IH       HC   
   diff      2.1250  0.1116   21.7882   8.6662
   p-value   0.9078  0.9457    0.0013   0.0341

      
Thus, the result shows that IH and HC are significant. I also compute the partial and margianl association
but it is not required.

  The results are

   Effect  Partial  df  P-value   Margianl   df  p-value
   HIC     2.1250    6  0.9078    2.1250     6   0.9078  
   IC      0.1116    2  0.9457    0.2089     2  0.9008
   HI     21.7174    6  0.0013   22.5094     6  0.0098
   HC      7.9246    3  0.0476    8.6662     3  0.0341    
        


##################
# Problem 2. 7.1 on textbook
##################

(d) I fit a multinomial model with gender, race, ses, schtype, and the five subjects. 
   The residual deviance is 305.8705. The model degrees of freedom is 26. I find that 
   the sign of coefficien of science is positive. The rest coefficient of the five
   subject was negative. Thus, science gives unexpected coefficient. It provide an 
   alteantive way to affect the programs. 

(e) The update model use the total scores of the five subject. The residual deviance is 
    328.6081. The model df is 26. The likelihood ratio statistic value is 328.6081-305.8705=22.73.
    Based on 8 degrees of freedom chi-square test, the p-value is 0.0037. 
    Then, we prefer the previous model.

(f) I use the step wise model selection for model with seperate subjects. The final model 
    contains ses, math, science, and socty. The residual deviance is 315.55. The model
    degrees of freedom is 14. Compare it with model in (d). The likeilihood ratio statistic
    value is 315.5511-305.8705=9.6806. Based on the chisquare distribution with 12 degrees
    of freedom, we conclude the test is not significant. 


(i) The fitted probabilities are 0.64426309 0.27665609 0.07908082 for academic, general and vocation respectively.
    I provide two ways. The first is to direct use the R output. 
    The second is to use coefficients. It is base on he variables are "ses", "schtyp", "math", "science" and "socst".


##################
# Problem 3: 7.4 on textbook
##################

(b) I use year as the explain variable. The residual deviance is 417.4496. The model df is 4.
    The coefficients shows that mild and severe were similar. The coefficients and intecepts are 
    almost proportial. 
   
(c) I use log year to do the model again. The residual deviance is reduced to 408.8689.
    We still have the similar conclusion.
    The probability probabilities are 0.7922 for normal, 0.1109 for mild, and 0.0969 for severe.

(e) I fit a proportional odds model. The estimate of theta1 is 9.6783. It interprets the probability of normal
    relative to the overall probability of mild and severe. 
    The predicted probabilities are 0.7887 for normalk, 0.1136 for mild, and 0.0977 for severe.


##############
# Output 
##############
# Problem 1.
##############

> housesat <- read.table("c:\\data\\housesat.data",h=T)
> housesat
   Satisfaction Influence   Housing Contact Freq
1           Low       Low     Tower     Low   21
2           Low    Medium     Tower     Low   34
3           Low      High     Tower     Low   10
4           Low       Low Apartment     Low   61
5           Low    Medium Apartment     Low   43
6           Low      High Apartment     Low   26
7           Low       Low    Atrium     Low   13
8           Low    Medium    Atrium     Low    8
9           Low      High    Atrium     Low    6
10          Low       Low  Terraced     Low   18
11          Low    Medium  Terraced     Low   15
12          Low      High  Terraced     Low    7
13          Low       Low     Tower    High   14
14          Low    Medium     Tower    High   17
15          Low      High     Tower    High    3
16          Low       Low Apartment    High   78
17          Low    Medium Apartment    High   48
18          Low      High Apartment    High   15
19          Low       Low    Atrium    High   20
20          Low    Medium    Atrium    High   10
21          Low      High    Atrium    High    7
22          Low       Low  Terraced    High   57
23          Low    Medium  Terraced    High   31
24          Low      High  Terraced    High    5
25       Medium       Low     Tower     Low   21
26       Medium    Medium     Tower     Low   22
27       Medium      High     Tower     Low   11
28       Medium       Low Apartment     Low   23
29       Medium    Medium Apartment     Low   35
30       Medium      High Apartment     Low   18
31       Medium       Low    Atrium     Low    9
32       Medium    Medium    Atrium     Low    8
33       Medium      High    Atrium     Low    7
34       Medium       Low  Terraced     Low    6
35       Medium    Medium  Terraced     Low   13
36       Medium      High  Terraced     Low    5
37       Medium       Low     Tower    High   19
38       Medium    Medium     Tower    High   23
39       Medium      High     Tower    High    5
40       Medium       Low Apartment    High   46
41       Medium    Medium Apartment    High   45
42       Medium      High Apartment    High   25
43       Medium       Low    Atrium    High   23
44       Medium    Medium    Atrium    High   22
45       Medium      High    Atrium    High   10
46       Medium       Low  Terraced    High   23
47       Medium    Medium  Terraced    High   21
48       Medium      High  Terraced    High    6
49         High       Low     Tower     Low   28
50         High    Medium     Tower     Low   36
51         High      High     Tower     Low   36
52         High       Low Apartment     Low   17
53         High    Medium Apartment     Low   40
54         High      High Apartment     Low   54
55         High       Low    Atrium     Low   10
56         High    Medium    Atrium     Low   12
57         High      High    Atrium     Low    9
58         High       Low  Terraced     Low    7
59         High    Medium  Terraced     Low   13
60         High      High  Terraced     Low   11
61         High       Low     Tower    High   37
62         High    Medium     Tower    High   40
63         High      High     Tower    High   23
64         High       Low Apartment    High   43
65         High    Medium Apartment    High   86
66         High      High Apartment    High   62
67         High       Low    Atrium    High   20
68         High    Medium    Atrium    High   24
69         High      High    Atrium    High   21
70         High       Low  Terraced    High   13
71         High    Medium  Terraced    High   13
72         High      High  Terraced    High   13

(a)

> g1 <- polr(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat)
> g1$deviance
[1] 3446.458
> g1$edf
[1] 25
> proportion <- g1$fit[1:24,]
> observed.freq <- matrix(housesat$Freq,24)
> total <- apply(observed.freq,1,sum)
> predicted.freq <- total*proportion
> 
> real.proportion <- observed.freq/total
> good1 <- 2*sum(observed.freq*log(observed.freq/predicted.freq))
> good1
[1] 15.03668
> 
> 1-pchisq(good1,48-g1$edf)
[1] 0.8933142

> g.saturated <- multinom(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat)
# weights:  75 (48 variable)
initial  value 1846.767257 
iter  10 value 1723.797532
iter  20 value 1716.392454
iter  30 value 1715.780520
iter  40 value 1715.714937
final  value 1715.710853 
converged
> g.saturated$deviance
[1] 3431.422
> 1-pchisq(g1$deviance-g.saturated$deviance,g.saturated$edf-g1$edf)
[1] 0.8933158

(b)


> g <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat)
> proportion <- g$fit[1:24,]
> observed.freq <- matrix(housesat$Freq,24)
> total <- apply(observed.freq,1,sum)
> predicted.freq <- total*proportion
> 
> real.proportion <- observed.freq/total
> good <- 2*sum(observed.freq*log(observed.freq/predicted.freq))
> good
[1] 47.72764
> 1-pchisq(good-good1,g1$edf-g$edf)
[1] 0.01233195
> 1-pchisq(g$deviance-g1$deviance,g1$edf-g$edf)
[1] 0.01233195
> 
> 1-pchisq(good,48-g$edf)
[1] 0.1874401
> 
> 1-pchisq(g$deviance-g.saturated$deviance,g.saturated$edf-g$edf)
[1] 0.1874413

(c)

> #### Influence:Housing:Contact
> g.large <- polr(Satisfaction~Influence*Housing*Contact,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 2.124984 6.000000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.9078523
> 
> #### Housing:Contact
> g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~Influence*(Housing+Contact),weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 7.924571 3.000000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.04759659
> 
> g.large <- polr(Satisfaction~Influence+Housing*Contact,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 8.666155 3.000000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.03407516
> 
> #### Influence:Housing

> g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~(Influence+Housing)*Contact,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 21.71740  6.00000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.001362224
> 
> g.large <- polr(Satisfaction~Influence*Housing+Contact,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 22.50935  6.00000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.0009786183

> 
> #### Influence:Contact
> 
> g.large <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~(Influence+Contact)*Housing,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.1116041 2.0000000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.9457263
> 
> g.large <- polr(Satisfaction~Influence*Contact+Housing,weight=Freq,data=housesat)
> g.small <- polr(Satisfaction~Influence+Housing+Contact,weight=Freq,data=housesat)
> c(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.2089536 2.0000000
> 1-pchisq(g.small$deviance-g.large$deviance,g.large$edf-g.small$edf)
[1] 0.9007957

> 
> ##########
> g2 <- polr(Satisfaction~(Influence+Housing+Contact)^2,weight=Freq,data=housesat)
> g3 <- polr(Satisfaction~(Influence+Contact)*Housing,weight=Freq,data=housesat)
> g4 <- polr(Satisfaction~Influence+Contact*Housing,weight=Freq,data=housesat)
> g5 <- polr(Satisfaction~Influence+Contact+Housing,weight=Freq,data=housesat)
> 
> 1-pchisq(g2$deviance-g1$deviance,g1$edf-g2$edf)
[1] 0.9078523
> 1-pchisq(g3$deviance-g2$deviance,g2$edf-g3$edf)
[1] 0.9457263
> 1-pchisq(g4$deviance-g3$deviance,g3$edf-g4$edf)
[1] 0.001322641
> 1-pchisq(g5$deviance-g4$deviance,g4$edf-g5$edf)
[1] 0.03407516
> c(g1$dev,g2$deviance,g3$deviance,g4$deviance,g5$deviance)
[1] 3446.458 3448.583 3448.695 3470.483 3479.149
> c(g1$edf,g2$edf,g3$edf,g4$edf,g5$edf)
[1] 25 19 17 11  8

##########
# Problem 2. 7.1
##########

hsb <- read.table("u:\\stat526\\dataset\\ascdata\\hsb.txt",h=T)
summary(hsb)

######
# (d)
######

library(nnet)

mod.sep <- multinom(prog~gender+race+ses+schtyp+read+write +math+science+socst,data=hsb)
summary(mod.sep)

mod.sep$deviance
mod.sep$edf

########
# (e)
########

mod.sum <- multinom(prog~gender+race+ses+schtyp+I(read+write+math+science+socst),data=hsb)
mod.sum$deviance
mod.sum$edf

########
# (f)
########

step(mod.sep)

#########
# (i)
#########

mod.final <- multinom(prog~ses+schtyp+math+science+socst,data=hsb)
summary(mod.final)
mod.final$deviance
mod.final$edf
mod.final$fit[hsb$id==99,]

hsb[hsb$id==99,]

pi2overpi1 <- exp(2.587029+0.6468812-0.1212242*56+0.08209791*66-0.04441228*61)
pi3overpi1 <- exp(6.687272+1.9955504-0.1369641*56+0.03941237*66-0.09363417*61)

aa <- c(1,pi2overpi1,pi3overpi1)
aa/sum(aa)




#######################
# Some of the Oupuut
#######################
# (d)
########

> mod.sep <- multinom(prog~gender+race+ses+schtyp+read+write +math+science+socst,data=hsb)
# weights:  42 (26 variable)
initial  value 219.722458 
iter  10 value 171.814970
iter  20 value 153.793692
iter  30 value 152.935260
final  value 152.935256 
converged
> summary(mod.sep)
Call:
multinom(formula = prog ~ gender + race + ses + schtyp + read + 
    write + math + science + socst, data = hsb)

Coefficients:
         (Intercept)  gendermale raceasian racehispanic racewhite     seslow sesmiddle schtyppublic        read       write       math    science       socst
general     3.631901 -0.09264717  1.352739   -0.6322019 0.2965156 1.09864111 0.7029621    0.5845405 -0.04418353 -0.03627381 -0.1092888 0.10193746 -0.01976995
vocation    7.481381 -0.32104341 -0.700070   -0.1993556 0.3358881 0.04747323 1.1815808    2.0553336 -0.03481202 -0.03166001 -0.1139877 0.05229938 -0.08040129

Std. Errors:
         (Intercept) gendermale raceasian racehispanic racewhite    seslow sesmiddle schtyppublic       read      write       math    science      socst
general     1.823452  0.4548778  1.058754    0.8935504 0.7354829 0.6066763 0.5045938    0.5642925 0.03103707 0.03381324 0.03522441 0.03274038 0.02712589
vocation    2.104698  0.5021132  1.470176    0.8393676 0.7480573 0.7045772 0.5700833    0.8348229 0.03422409 0.03585729 0.03885131 0.03424763 0.02938212

Residual Deviance: 305.8705 
AIC: 357.8705 
> 
> mod.sep$deviance
[1] 305.8705
> mod.sep$edf
[1] 26

> ########
> # (f)
> ########
> 
> step(mod.sep)
Start:  AIC=357.87
prog ~ gender + race + ses + schtyp + read + write + math + science + 
    socst

multinom(formula = prog ~ ses + schtyp + math + science + socst, 
    data = hsb)

Coefficients:
         (Intercept)      seslow sesmiddle schtyppublic       math    science       socst
general     2.587029  0.87607389 0.6978995    0.6468812 -0.1212242 0.08209791 -0.04441228
vocation    6.687272 -0.01569301 1.2065000    1.9955504 -0.1369641 0.03941237 -0.09363417

Residual Deviance: 315.5511 
AIC: 343.5511 

> #########
> # (i)
> #########
> 
> mod.final <- multinom(prog~ses+schtyp+math+science+socst,data=hsb)
> summary(mod.final)
Call:
multinom(formula = prog ~ ses + schtyp + math + science + socst, 
    data = hsb)

Coefficients:
         (Intercept)      seslow sesmiddle schtyppublic       math    science       socst
general     2.587029  0.87607389 0.6978995    0.6468812 -0.1212242 0.08209791 -0.04441228
vocation    6.687272 -0.01569301 1.2065000    1.9955504 -0.1369641 0.03941237 -0.09363417
Residual Deviance: 315.5511 
AIC: 343.5511 
> mod.final$deviance
[1] 315.5511
> mod.final$edf
[1] 14
> mod.final$fit[hsb$id==99,]
  academic    general   vocation 
0.64426309 0.27665609 0.07908082 
> 
> hsb[hsb$id==99,]
    id gender  race  ses schtyp    prog read write math science socst
102 99 female white high public general   47    59   56      66    61
> 
> pi2overpi1 <- exp(2.587029+0.6468812-0.1212242*56+0.08209791*66-0.04441228*61)
> pi3overpi1 <- exp(6.687272+1.9955504-0.1369641*56+0.03941237*66-0.09363417*61)
> 
> aa <- c(1,pi2overpi1,pi3overpi1)
> aa/sum(aa)
[1] 0.64426301 0.27665605 0.07908094


#################
# Problem 3. 7.4. 
#################

data(pneumo)
pneumo

#######
# (b)
######

pneumo$status <- relevel(pneumo$status,ref="normal")
summary(pneumo)

mod.year <- multinom(status~year,weights=Freq,data=pneumo)
summary(mod.year)
mod.year$edf

#######
# (c)
#######

mod.logyear <- multinom(status~log(year),weights=Freq,data=pneumo)
summary(mod.logyear)
mod.logyear$edf

eta <- summary(mod.logyear)$coeff%*%c(1,log(25))
aa <- c(1,exp(eta))
aa/sum(aa)

#####
# (e)
#####

library(MASS)
mod.prop <- polr(status~log(year),weights=Freq,data=pneumo)
summary(mod.prop)



eta1 <- mod.prop$zeta[1]-mod.prop$coefficients*log(25)
eta2 <- mod.prop$zeta[2]-mod.prop$coefficients*log(25)

p1 <- exp(eta1)/(1+exp(eta1))
p3 <- 1/(1+exp(eta2))
p2 <- 1-p1-p3
round(c(p1,p2,p3),4)

####################
# Output
####################

> data(pneumo)
> pneumo
   Freq status year
1    98 normal  5.8
2    51 normal 15.0
3    34 normal 21.5
4    35 normal 27.5
5    32 normal 33.5
6    23 normal 39.5
7    12 normal 46.0
8     4 normal 51.5
9     0   mild  5.8
10    2   mild 15.0
11    6   mild 21.5
12    5   mild 27.5
13   10   mild 33.5
14    7   mild 39.5
15    6   mild 46.0
16    2   mild 51.5
17    0 severe  5.8
18    1 severe 15.0
19    3 severe 21.5
20    8 severe 27.5
21    9 severe 33.5
22    8 severe 39.5
23   10 severe 46.0
24    5 severe 51.5
> 
> #######
> # (b)
> ######
> 
> pneumo$status <- relevel(pneumo$status,ref="normal")
> summary(pneumo)
      Freq          status       year      
 Min.   : 0.00   normal:8   Min.   : 5.80  
 1st Qu.: 3.75   mild  :8   1st Qu.:19.88  
 Median : 7.50   severe:8   Median :30.50  
 Mean   :15.46              Mean   :30.04  
 3rd Qu.:14.75              3rd Qu.:41.12  
 Max.   :98.00              Max.   :51.50  
> 
> mod.year <- multinom(status~year,weights=Freq,data=pneumo)
> summary(mod.year)
Call:
multinom(formula = status ~ year, data = pneumo, weights = Freq)

Coefficients:
       (Intercept)       year
mild     -4.291680 0.08356529
severe   -5.059849 0.10928549

Std. Errors:
       (Intercept)       year
mild     0.5214120 0.01528046
severe   0.5964319 0.01646978

Residual Deviance: 417.4496 
AIC: 425.4496 
> mod.year$edf
[1] 4
> 
> #######
> # (c)
> #######
> 
> mod.logyear <- multinom(status~log(year),weights=Freq,data=pneumo)
> summary(mod.logyear)
Call:
multinom(formula = status ~ log(year), data = pneumo, weights = Freq)

Coefficients:
       (Intercept) log(year)
mild     -8.936034  2.165374
severe  -11.975104  3.067470

Std. Errors:
       (Intercept) log(year)
mild      1.580439 0.4574870
severe    2.000447 0.5652072

Residual Deviance: 408.8689 
AIC: 416.8689 
> mod.logyear$edf
[1] 4
> eta <- summary(mod.logyear)$coeff%*%c(1,log(25))
> aa <- c(1,exp(eta))
> aa/sum(aa)
[1] 0.79219318 0.11092365 0.09688318

> #####
> # (e)
> #####
> 
> library(MASS)
> mod.prop <- polr(status~log(year),weights=Freq,data=pneumo)
> summary(mod.prop)

Re-fitting to get Hessian

Call:
polr(formula = status ~ log(year), data = pneumo, weights = Freq)

Coefficients:
          Value Std. Error t value
log(year) 2.597      0.381   6.817

Intercepts:
            Value   Std. Error t value
normal|mild  9.6763  1.3233     7.3123
mild|severe 10.5820  1.3437     7.8750

Residual Deviance: 408.5483 
AIC: 414.5483 

> eta1 <- mod.prop$zeta[1]-mod.prop$coefficients*log(25)
> eta2 <- mod.prop$zeta[2]-mod.prop$coefficients*log(25)
> 
> p1 <- exp(eta1)/(1+exp(eta1))
> p3 <- 1/(1+exp(eta2))
> p2 <- 1-p1-p3
> round(c(p1,p2,p3),4)
normal|mild normal|mild mild|severe 
     0.7887      0.1136      0.0977