Problem 1:

(a). I fit the additive model and the saturated model. The ANOVA table shows the 
p-value between them is 98.9%. Thus, all the interaction effects are not significant.

(b) The anova of the additive model shows that Sex and Intact are significant, but the race
    is not.

Thus, the final model is 

log(pi/(1-pi))=-0.26448868-0.09866467 I(Male)+0.16245245 Index(Yes)


(c) The qq-plot shows that there is a point which is a little bit away from the straight
    line. Thus, it is not a significant outlier. The dispersion parameter is 0.4845 which
    is less than one. We choose estimate of phi equal to 1.
    Thus, there is not over-dispersion in this model.

(d) The 95% confidence interval for male Hispanic person in the intact family is 
    [0.4338455,0.4662456]


Problem 2:

(a). I fit the additive Poisson GLM model. The p-value for the goodness of fit is
     0.21. Thus, the fit is good. Based on this model, we can say that the interaction 
     effect is not significant. Thus, the two variables, Rate of Satisfaction and Levels
     of Income are independent.

(b). I fit the model with two main effects as factor variables and interaction effects as
     the continuous varible. The ANOVA result shows that the p-value for the interaction
     effect is 0.0019. Thus, this ANOVA result tells us that the interaction effect is 
     significant. Therefore, the two variables are not independent.

(c). The conclusion in part (a) and (b) are inconsistence. Because the test in (b) is more
     powerful than the test in (a). Thus, if there exists the interaction effect, the test 
     in (a) can not find it, but the test in (b) can.


Problem 3:
(a). The deviance goodness of fit is 2.862 with p-value 0.4134 and 
     the Pearson goodness of fit is 2.895 with p-value 0.4081. Thus both indicate 
     the model fits the data. However since the deviance goodness of fit value of 
     the null model is very large with p-value almost 0 indicates the proportion signifiantlt
     depends on the independent variable.

(b) The confidence intervals are 
                 logistic model: (0.4281,0.6145)
                   probit model: (0.4288,0.6080)
    complementary log-log model: (0.4081,0.5954)
(c) The estimate of parameters are the same but the goodness of fit statistics are different. 
    This is caused by the definition of the saturated model. In part (a), we assume the proportion
    for items in the same cell are the same but this is not assumed in part (c).

###########
Output

1.

> high <- read.table("c:\\data\\highschool.data",h=T)
> high
   GradYes GradNo    Sex     Race Intact
1      843    982   Male    White    Yes
2      864    931 Female    White    Yes
3      346    441   Male    Black    Yes
4      360    410 Female    Black    Yes
5      237    305   Male Hispanic    Yes
6      208    259 Female Hispanic    Yes
7      168    243   Male    White     No
8      161    208 Female    White     No
9      231    337   Male    Black     No
10     225    283 Female    Black     No
11      82    128   Male Hispanic     No
12      78     98 Female Hispanic     No
> 
> g <- glm(cbind(GradYes,GradNo)~Sex+Race+Intact,fam=binomial,data=high)
> g.saturated <- glm(cbind(GradYes,GradNo)~Sex*Race*Intact,fam=binomial,data=high) 
> anova(g,g.saturated,test="Chi")
Analysis of Deviance Table

Model 1: cbind(GradYes, GradNo) ~ Sex + Race + Intact
Model 2: cbind(GradYes, GradNo) ~ Sex * Race * Intact
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         7     1.5397                      
2         0  5.498e-13  7   1.5397    0.9809

(b)


> anova(g,test="Chi")
Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(GradYes, GradNo)

Terms added sequentially (first to last)


       Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                      11    20.4154          
Sex     1   5.3300        10    15.0853    0.0210
Race    2   4.9483         8    10.1370    0.0842
Intact  1   8.5973         7     1.5397    0.0034

> g <- glm(cbind(GradYes,GradNo)~Sex+Intact,fam=binomial,data=high)
> summary(g)

Call:
glm(formula = cbind(GradYes, GradNo) ~ Sex + Intact, family = binomial, 
    data = high)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2617  -0.5829  -0.1146   0.2772   1.0233  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.26449    0.04861  -5.441  5.3e-08 ***
SexMale     -0.09866    0.04384  -2.250  0.02442 *  
IntactYes    0.16245    0.04982   3.261  0.00111 ** 
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 20.4154  on 11  degrees of freedom
Residual deviance:  4.4103  on  9  degrees of freedom
AIC: 91.406

Number of Fisher Scoring iterations: 2


(c)

> qqnorm(residuals(g,type="pearson"))
> qqline(residuals(g,type="pearson"))
> 
> sum(residuals(g,"pearson")^2)/9
[1] 0.4894722

> x0 <- c(1,1,1)
> y0 <- sum(x0*g$coeff)
> y0.std <- sqrt(t(x0)%*%summary(g)$cov.unscaled%*%x0)
> y0.interval <- c(y0-1.96*y0.std,y0+1.96*y0.std)
> interval <- exp(y0.interval)/(1+exp(y0.interval))
> interval
[1] 0.4338455 0.4662456

####
2.
####

(a)

> job <- read.table("h:\\data\\job.data",h=T)
> job
   Freq Satisfaction Income
1    20            1      1
2    24            2      1
3    80            3      1
4    82            4      1
5    22            1      2
6    38            2      2
7   104            3      2
8   125            4      2
9    13            1      3
10   28            2      3
11   81            3      3
12  113            4      3
13    7            1      4
14   18            2      4
15   54            3      4
16   92            4      4

> options(contrasts=c("contr.treatment","contr.treatment"))
> g <- glm(Freq~factor(Satisfaction)+factor(Income),data=job,fam=poisson)
> summary(g)

Call:
glm(formula = Freq ~ factor(Satisfaction) + factor(Income), family = poisson, 
    data = job)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.50416  -0.67501  -0.08592   0.53800   1.51852  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)            2.65151    0.14092  18.816  < 2e-16 ***
factor(Satisfaction)2  0.55500    0.15929   3.484 0.000494 ***
factor(Satisfaction)3  1.63806    0.13874  11.806  < 2e-16 ***
factor(Satisfaction)4  1.89389    0.13617  13.908  < 2e-16 ***
factor(Income)2        0.33855    0.09118   3.713 0.000205 ***
factor(Income)3        0.13171    0.09544   1.380 0.167585    
factor(Income)4       -0.18621    0.10344  -1.800 0.071841 .  
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 445.763  on 15  degrees of freedom
Residual deviance:  12.037  on  9  degrees of freedom
AIC: 115.07

Number of Fisher Scoring iterations: 3

> 1-pchisq(12.037,9)
[1] 0.2112315


(b)
> g1 <- glm(Freq~factor(Satisfaction)+factor(Income)+I(Satisfaction*Income),data=job,fam=poisson)
> summary(g1)

Call:
glm(formula = Freq ~ factor(Satisfaction) + factor(Income) + 
    I(Satisfaction * Income), family = poisson, data = job)

Deviance Residuals: 
      Min         1Q     Median         3Q        Max  
-1.034350  -0.175387   0.005087   0.270322   0.579845  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               2.85061    0.14826  19.228  < 2e-16 ***
factor(Satisfaction)2     0.30753    0.17547   1.753 0.079678 .  
factor(Satisfaction)3     1.13006    0.20826   5.426 5.76e-08 ***
factor(Satisfaction)4     1.11197    0.28091   3.958 7.54e-05 ***
factor(Income)2          -0.01053    0.14344  -0.073 0.941478    
factor(Income)3          -0.57692    0.24745  -2.331 0.019728 *  
factor(Income)4          -1.26389    0.36669  -3.447 0.000567 ***
I(Satisfaction * Income)  0.11194    0.03641   3.075 0.002108 ** 
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 445.7627  on 15  degrees of freedom
Residual deviance:   2.3859  on  8  degrees of freedom
AIC: 107.42

Number of Fisher Scoring iterations: 3

> anova(g,g1,test="Chi")
Analysis of Deviance Table

Model 1: Freq ~ factor(Satisfaction) + factor(Income)
Model 2: Freq ~ factor(Satisfaction) + factor(Income) + I(Satisfaction * 
    Income)
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1         9    12.0369                      
2         8     2.3859  1   9.6509    0.0019

######
# 3.
######
# (a)
######
> y1 <- c(6,16,16,24,27)
> y2 <- c(24,14,14,6,3)
> y <- cbind(y1,y2)
> x <- c(1,2,3,4,5)
> g <- glm(y~x,fam=binomial)
> summary(g)

Call:
glm(formula = y ~ x, family = binomial)

Deviance Residuals: 
      1        2        3        4        5  
-0.5662   1.2382  -0.9830   0.1671   0.1190  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.9491     0.4536  -4.297 1.73e-05 ***
x             0.8149     0.1512   5.391 7.00e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 40.2226  on 4  degrees of freedom
Residual deviance:  2.8619  on 3  degrees of freedom
AIC: 24.350

Number of Fisher Scoring iterations: 3

> X2 <- sum(residuals(g,"pearson")^2)
> G2 <- g$deviance
> X2
[1] 2.895187
> G2
[1] 2.861914
> 1-pchisq(G2,3)
[1] 0.413409
> 1-pchisq(X2,3)
[1] 0.4080692
> 1-pchisq(40.2226,3)
[1] 9.558048e-09
#######
# (b)
#######
> g1 <- glm(y~x,fam=binomial)
> g2 <- glm(y~x,fam=binomial(link=probit))
> g3 <- glm(y~x,fam=binomial(link=cloglog))
> 
> a <- c(1,2.5)
> eta <- sum(a*g1$coeff)
> eta.sigma <- sqrt(t(a)%*%(summary(g1)$cov.unscaled)%*%a)
> CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma)
> CI.p1 <- exp(CI.eta)/(1+exp(CI.eta))
> CI.p1
[1] 0.4280876 0.6144586
> 
> eta <- sum(a*g2$coeff)
> eta.sigma <- sqrt(t(a)%*%(summary(g2)$cov.unscaled)%*%a)
> CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma)
> CI.p1 <- pnorm(CI.eta)
> CI.p1
[1] 0.4287688 0.6079522
> 
> eta <- sum(a*g3$coeff)
> eta.sigma <- sqrt(t(a)%*%(summary(g3)$cov.unscaled)%*%a)
> CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma)
> CI.p3 <- 1-exp(-exp(CI.eta))
> CI.p3
[1] 0.4081025 0.5953910
####
# (c)
####
> g <- glm(Success~X,fam=binomial,data=nail)
> summary(g)

Call:
glm(formula = Success ~ X, family = binomial, data = nail)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1158  -0.7469   0.4749   0.9043   1.6811  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -1.9491     0.4536  -4.297 1.73e-05 ***
X             0.8149     0.1512   5.391 7.00e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 202.69  on 149  degrees of freedom
Residual deviance: 165.33  on 148  degrees of freedom
AIC: 169.33

Number of Fisher Scoring iterations: 4

> 1-pchisq(202.69,149)
[1] 0.002275330
> 1-pchisq(165.33,148)
[1] 0.1565229