Problem 1: (a). I fit the additive model and the saturated model. The ANOVA table shows the p-value between them is 98.9%. Thus, all the interaction effects are not significant. (b) The anova of the additive model shows that Sex and Intact are significant, but the race is not. Thus, the final model is log(pi/(1-pi))=-0.26448868-0.09866467 I(Male)+0.16245245 Index(Yes) (c) The qq-plot shows that there is a point which is a little bit away from the straight line. Thus, it is not a significant outlier. The dispersion parameter is 0.4845 which is less than one. We choose estimate of phi equal to 1. Thus, there is not over-dispersion in this model. (d) The 95% confidence interval for male Hispanic person in the intact family is [0.4338455,0.4662456] Problem 2: (a). I fit the additive Poisson GLM model. The p-value for the goodness of fit is 0.21. Thus, the fit is good. Based on this model, we can say that the interaction effect is not significant. Thus, the two variables, Rate of Satisfaction and Levels of Income are independent. (b). I fit the model with two main effects as factor variables and interaction effects as the continuous varible. The ANOVA result shows that the p-value for the interaction effect is 0.0019. Thus, this ANOVA result tells us that the interaction effect is significant. Therefore, the two variables are not independent. (c). The conclusion in part (a) and (b) are inconsistence. Because the test in (b) is more powerful than the test in (a). Thus, if there exists the interaction effect, the test in (a) can not find it, but the test in (b) can. Problem 3: (a). The deviance goodness of fit is 2.862 with p-value 0.4134 and the Pearson goodness of fit is 2.895 with p-value 0.4081. Thus both indicate the model fits the data. However since the deviance goodness of fit value of the null model is very large with p-value almost 0 indicates the proportion signifiantlt depends on the independent variable. (b) The confidence intervals are logistic model: (0.4281,0.6145) probit model: (0.4288,0.6080) complementary log-log model: (0.4081,0.5954) (c) The estimate of parameters are the same but the goodness of fit statistics are different. This is caused by the definition of the saturated model. In part (a), we assume the proportion for items in the same cell are the same but this is not assumed in part (c). ########### Output 1. > high <- read.table("c:\\data\\highschool.data",h=T) > high GradYes GradNo Sex Race Intact 1 843 982 Male White Yes 2 864 931 Female White Yes 3 346 441 Male Black Yes 4 360 410 Female Black Yes 5 237 305 Male Hispanic Yes 6 208 259 Female Hispanic Yes 7 168 243 Male White No 8 161 208 Female White No 9 231 337 Male Black No 10 225 283 Female Black No 11 82 128 Male Hispanic No 12 78 98 Female Hispanic No > > g <- glm(cbind(GradYes,GradNo)~Sex+Race+Intact,fam=binomial,data=high) > g.saturated <- glm(cbind(GradYes,GradNo)~Sex*Race*Intact,fam=binomial,data=high) > anova(g,g.saturated,test="Chi") Analysis of Deviance Table Model 1: cbind(GradYes, GradNo) ~ Sex + Race + Intact Model 2: cbind(GradYes, GradNo) ~ Sex * Race * Intact Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 7 1.5397 2 0 5.498e-13 7 1.5397 0.9809 (b) > anova(g,test="Chi") Analysis of Deviance Table Model: binomial, link: logit Response: cbind(GradYes, GradNo) Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL 11 20.4154 Sex 1 5.3300 10 15.0853 0.0210 Race 2 4.9483 8 10.1370 0.0842 Intact 1 8.5973 7 1.5397 0.0034 > g <- glm(cbind(GradYes,GradNo)~Sex+Intact,fam=binomial,data=high) > summary(g) Call: glm(formula = cbind(GradYes, GradNo) ~ Sex + Intact, family = binomial, data = high) Deviance Residuals: Min 1Q Median 3Q Max -1.2617 -0.5829 -0.1146 0.2772 1.0233 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.26449 0.04861 -5.441 5.3e-08 *** SexMale -0.09866 0.04384 -2.250 0.02442 * IntactYes 0.16245 0.04982 3.261 0.00111 ** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 20.4154 on 11 degrees of freedom Residual deviance: 4.4103 on 9 degrees of freedom AIC: 91.406 Number of Fisher Scoring iterations: 2 (c) > qqnorm(residuals(g,type="pearson")) > qqline(residuals(g,type="pearson")) > > sum(residuals(g,"pearson")^2)/9 [1] 0.4894722 > x0 <- c(1,1,1) > y0 <- sum(x0*g$coeff) > y0.std <- sqrt(t(x0)%*%summary(g)$cov.unscaled%*%x0) > y0.interval <- c(y0-1.96*y0.std,y0+1.96*y0.std) > interval <- exp(y0.interval)/(1+exp(y0.interval)) > interval [1] 0.4338455 0.4662456 #### 2. #### (a) > job <- read.table("h:\\data\\job.data",h=T) > job Freq Satisfaction Income 1 20 1 1 2 24 2 1 3 80 3 1 4 82 4 1 5 22 1 2 6 38 2 2 7 104 3 2 8 125 4 2 9 13 1 3 10 28 2 3 11 81 3 3 12 113 4 3 13 7 1 4 14 18 2 4 15 54 3 4 16 92 4 4 > options(contrasts=c("contr.treatment","contr.treatment")) > g <- glm(Freq~factor(Satisfaction)+factor(Income),data=job,fam=poisson) > summary(g) Call: glm(formula = Freq ~ factor(Satisfaction) + factor(Income), family = poisson, data = job) Deviance Residuals: Min 1Q Median 3Q Max -1.50416 -0.67501 -0.08592 0.53800 1.51852 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.65151 0.14092 18.816 < 2e-16 *** factor(Satisfaction)2 0.55500 0.15929 3.484 0.000494 *** factor(Satisfaction)3 1.63806 0.13874 11.806 < 2e-16 *** factor(Satisfaction)4 1.89389 0.13617 13.908 < 2e-16 *** factor(Income)2 0.33855 0.09118 3.713 0.000205 *** factor(Income)3 0.13171 0.09544 1.380 0.167585 factor(Income)4 -0.18621 0.10344 -1.800 0.071841 . --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 445.763 on 15 degrees of freedom Residual deviance: 12.037 on 9 degrees of freedom AIC: 115.07 Number of Fisher Scoring iterations: 3 > 1-pchisq(12.037,9) [1] 0.2112315 (b) > g1 <- glm(Freq~factor(Satisfaction)+factor(Income)+I(Satisfaction*Income),data=job,fam=poisson) > summary(g1) Call: glm(formula = Freq ~ factor(Satisfaction) + factor(Income) + I(Satisfaction * Income), family = poisson, data = job) Deviance Residuals: Min 1Q Median 3Q Max -1.034350 -0.175387 0.005087 0.270322 0.579845 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 2.85061 0.14826 19.228 < 2e-16 *** factor(Satisfaction)2 0.30753 0.17547 1.753 0.079678 . factor(Satisfaction)3 1.13006 0.20826 5.426 5.76e-08 *** factor(Satisfaction)4 1.11197 0.28091 3.958 7.54e-05 *** factor(Income)2 -0.01053 0.14344 -0.073 0.941478 factor(Income)3 -0.57692 0.24745 -2.331 0.019728 * factor(Income)4 -1.26389 0.36669 -3.447 0.000567 *** I(Satisfaction * Income) 0.11194 0.03641 3.075 0.002108 ** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 445.7627 on 15 degrees of freedom Residual deviance: 2.3859 on 8 degrees of freedom AIC: 107.42 Number of Fisher Scoring iterations: 3 > anova(g,g1,test="Chi") Analysis of Deviance Table Model 1: Freq ~ factor(Satisfaction) + factor(Income) Model 2: Freq ~ factor(Satisfaction) + factor(Income) + I(Satisfaction * Income) Resid. Df Resid. Dev Df Deviance P(>|Chi|) 1 9 12.0369 2 8 2.3859 1 9.6509 0.0019 ###### # 3. ###### # (a) ###### > y1 <- c(6,16,16,24,27) > y2 <- c(24,14,14,6,3) > y <- cbind(y1,y2) > x <- c(1,2,3,4,5) > g <- glm(y~x,fam=binomial) > summary(g) Call: glm(formula = y ~ x, family = binomial) Deviance Residuals: 1 2 3 4 5 -0.5662 1.2382 -0.9830 0.1671 0.1190 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.9491 0.4536 -4.297 1.73e-05 *** x 0.8149 0.1512 5.391 7.00e-08 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 40.2226 on 4 degrees of freedom Residual deviance: 2.8619 on 3 degrees of freedom AIC: 24.350 Number of Fisher Scoring iterations: 3 > X2 <- sum(residuals(g,"pearson")^2) > G2 <- g$deviance > X2 [1] 2.895187 > G2 [1] 2.861914 > 1-pchisq(G2,3) [1] 0.413409 > 1-pchisq(X2,3) [1] 0.4080692 > 1-pchisq(40.2226,3) [1] 9.558048e-09 ####### # (b) ####### > g1 <- glm(y~x,fam=binomial) > g2 <- glm(y~x,fam=binomial(link=probit)) > g3 <- glm(y~x,fam=binomial(link=cloglog)) > > a <- c(1,2.5) > eta <- sum(a*g1$coeff) > eta.sigma <- sqrt(t(a)%*%(summary(g1)$cov.unscaled)%*%a) > CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma) > CI.p1 <- exp(CI.eta)/(1+exp(CI.eta)) > CI.p1 [1] 0.4280876 0.6144586 > > eta <- sum(a*g2$coeff) > eta.sigma <- sqrt(t(a)%*%(summary(g2)$cov.unscaled)%*%a) > CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma) > CI.p1 <- pnorm(CI.eta) > CI.p1 [1] 0.4287688 0.6079522 > > eta <- sum(a*g3$coeff) > eta.sigma <- sqrt(t(a)%*%(summary(g3)$cov.unscaled)%*%a) > CI.eta <- c(eta-1.96*eta.sigma,eta+1.96*eta.sigma) > CI.p3 <- 1-exp(-exp(CI.eta)) > CI.p3 [1] 0.4081025 0.5953910 #### # (c) #### > g <- glm(Success~X,fam=binomial,data=nail) > summary(g) Call: glm(formula = Success ~ X, family = binomial, data = nail) Deviance Residuals: Min 1Q Median 3Q Max -2.1158 -0.7469 0.4749 0.9043 1.6811 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.9491 0.4536 -4.297 1.73e-05 *** X 0.8149 0.1512 5.391 7.00e-08 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 202.69 on 149 degrees of freedom Residual deviance: 165.33 on 148 degrees of freedom AIC: 169.33 Number of Fisher Scoring iterations: 4 > 1-pchisq(202.69,149) [1] 0.002275330 > 1-pchisq(165.33,148) [1] 0.1565229