########
Solution
######## 
problem 1
#########
(a) The random effect model is 

     Y_{ij}=mu+beta_i+alpha_j+epsilon_{ij}, i=A,B,C,D,E, j=g1,g2,g3,g4,g5,g6,g7

  where mu and beta_i(for method) are fixed effect, alpha_j(for girder) is random effect, and
  epsilon_{ij} is error term.

  The estimates are 

   mu=0.9389, beta_A=0, beta_B=-0.0200, beta_c=-0.0451, beta_D=-0.04329, beta_D=-0.0623.

  The estimate of 

   std of grider 0.0410
   std of error  0.0442
  

 (i) The hypothesis is 

   H_0:  beta_A=...=beta_E=0  versus H_A: not all equal


   The F-value for methed is 2.0821 with df 4 and 24. The p-value is 0.1147.

   The LR p-value for LR test is 0.0799. Bootstrap p-value is 0.114 (your answer should be close to this number).
   Thus, H_0 is accepted. There is no significant different among those levels. (It is not necessary to use bootstrap method
   to compute the p-value).

 (ii) The test for girder effect is 

   H_0: sigma_g=0   (sigma_g is the std of girder) versus H_A: sigma_g>0.

   The p-value for likelihood-ratio statistic is 0.0015 (ML method) and 0.0033(REML meth0d). Bootstrap p-value is 0.002 (must be included).
   Thus, H_0 is reject. Thus, there is a significant girder effect.

(b) The p-values based on the F-table for the model with the main fxied effects of method and girder 
    are 0.1146 for method and 0.0013 for girder. Thus, method effect is not significant, but
    girder is. 

(c) The two methods are consistent.

##########
Problem 2
##########
(a) Omitted.
(b) The p-value of treat is 0.0449. It is significant.
    The p-value of block is 0.2145. It is insignificant.
(c) The first treatment has the largest value (16.25). 
    The second is the next (10.00). 
    The third is the lowest (-26.25)
(d) The method did not converge. It provides a negative F-value, which is signficant different from the previous one.
(e) It is also different, which is still caused by convergence problem.
(f) I use the likelihood ratio test. The p-value is 0.061.

##########
# Problem 3
##########

(a) Omitted.
(b) manufact has 2 levels. machine has 6 levels, speed has 2 levels. 
    manufact level A only appears in machine m1, m2, and m3.
    manufact level B only appears in machine m4, m5, and m6.
    They cannot be both estimated.
(c) I fit the model. If the same machine were test at the same speed, it should be the residual SD, which was 9.215.
    If different machines were tested at the same speed, then it should be the combination of machine and residual.
    The SD is sqrt(49.92+72.68+84.92)=14.45.

(d) I used the bootstrap method for the p-value. It was 0.005, which means that the interaction effect is significant.

(e) Variability between machine is SD=7.065. 

(f) In the nested model, variable between machine SD=12.19 and variability between manufact is SD=6.755.

##########
Output
##########
Problem 1
#########

> library(nls)
Warning message:
package 'nls' has been merged into 'stats' 
> library(lattice)
> library(nlme)
> 
> girder <- read.table("c:\\tonglinzhang\\stat526\\dataset\\Girder.dat",h=T)
> girder
   girder method strength
1      g1      A    0.878
2      g2      A    0.930
3      g3      A    0.916
4      g4      A    0.963
5      g5      A    0.929
6      g6      A    0.990
7      g7      A    0.966
8      g1      B    0.976
9      g2      B    0.958
10     g3      B    0.907
11     g4      B    0.872
12     g5      B    0.950
13     g6      B    0.925
14     g7      B    0.844
15     g1      C    0.907
16     g2      C    0.918
17     g3      C    0.861
18     g4      C    0.786
19     g5      C    0.962
20     g6      C    0.969
21     g7      C    0.867
22     g1      D    0.932
23     g2      D    0.910
24     g3      D    0.870
25     g4      D    0.838
26     g5      D    0.973
27     g6      D    0.918
28     g7      D    0.828
29     g1      E    0.949
30     g2      E    0.916
31     g3      E    0.821
32     g4      E    0.737
33     g5      E    0.975
34     g6      E    0.927
35     g7      E    0.811
> 
> ##
> # (a)
> ##
> 
> mod.reml <- lme(strength~method,random=~1|girder,data=girder)
> summary(mod.reml)
Linear mixed-effects model fit by REML
 Data: girder 
        AIC       BIC   logLik
  -68.29413 -58.48575 41.14706

Random effects:
 Formula: ~1 | girder
        (Intercept)   Residual
StdDev:  0.04101475 0.04417325

Fixed effects: strength ~ method 
                 Value  Std.Error DF  t-value p-value
(Intercept)  0.9388571 0.02278310 24 41.20850  0.0000
methodB     -0.0200000 0.02361160 24 -0.84704  0.4053
methodC     -0.0431429 0.02361160 24 -1.82719  0.0801
methodD     -0.0432857 0.02361160 24 -1.83324  0.0792
methodE     -0.0622857 0.02361160 24 -2.63793  0.0144
 Correlation: 
        (Intr) methdB methdC methdD
methodB -0.518                     
methodC -0.518  0.500              
methodD -0.518  0.500  0.500       
methodE -0.518  0.500  0.500  0.500

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.94844438 -0.60009384  0.03591486  0.49822467  1.75774161 

Number of Observations: 35
Number of Groups: 7 
> 
> # F-method
> 
> amod <- lme(strength~method,random=~1|girder,data=girder,method="ML")
> nmod <- lme(strength~1,random=~1|girder,data=girder,method="ML")
> anova(amod,nmod)
     Model df       AIC       BIC   logLik   Test  L.Ratio p-value
amod     1  7 -98.75623 -87.86880 56.37812                        
nmod     2  3 -98.41527 -93.74922 52.20763 1 vs 2 8.340967  0.0799
> 
> # likelihood method
> 
> 2*(logLik(amod)-logLik(nmod))
[1] 8.340967
attr(,"nall")
[1] 35
attr(,"nobs")
[1] 35
attr(,"df")
[1] 7
attr(,"class")
[1] "logLik"
> 
> # bootstrap p-value
> 
> lrstat <- numeric(1000)
> sigmaa <- 0.03641
> sigma <- 0.04746
> mu <- 0.9051
> 
> random <- rep(rnorm(7,0,sigmaa),5)
> error <- rnorm(35,0,sigma)
> rstrength <- mu+random+error
> rstrength
 [1] 0.9544562 0.8464705 1.0155808 0.9106024 0.9563436 0.9655796 0.7917269 0.8607714 0.8238111 0.9941654 0.8533801 0.8751194
[13] 0.9210014 0.8625383 0.8496021 0.8548478 1.0807228 0.9077487 0.8148547 0.9488718 0.9047379 0.9215406 0.7225758 0.9304280
[25] 0.8508462 0.9787897 0.9277308 0.8660172 0.8693693 0.7420250 0.9785317 0.8406518 0.8979515 1.0158236 0.8934263
> 
> for(i in 1:1000){
+ random <- rep(rnorm(7,0,sigmaa),5)
+ error <- rnorm(35,0,sigma)
+ rstrength <- mu+random+error
+ amodr <- lme(rstrength~method,random=~1|girder,data=girder,method="ML")
+ nmodr <- lme(rstrength~1,random=~1|girder,data=girder,method="ML")
+ lrstat[i] <- 2*(logLik(amodr)-logLik(nmodr))
+ }
> 
> mean(lrstat>8.3401)
[1] 0.114

> ### test for random effect
> 
> options(contrasts=c("contr.sum", "contr.poly"))
> 
> amod <- lme(strength~method,random=~1|girder,data=girder)
> nmod <- lm(strength~method,data=girder)
> anova(amod,nmod)
     Model df       AIC       BIC   logLik   Test  L.Ratio p-value
amod     1  7 -65.07525 -55.26687 39.53763                        
nmod     2  6 -58.44215 -50.03496 35.22107 1 vs 2 8.633107  0.0033
> 
> # likelihood method
> 2*(logLik(amod)-logLik(nmod))
[1] -23.60902
attr(,"nall")
[1] 35
attr(,"nobs")
[1] 30
attr(,"df")
[1] 7
attr(,"class")
[1] "logLik"
> 1-pchisq(2*(logLik(amod)-logLik(nmod)),1)
[1] 1
attr(,"nall")
[1] 35
attr(,"nobs")
[1] 30
attr(,"df")
[1] 7
attr(,"class")
[1] "logLik"
> 
> ##
> amod <- lme(strength~method,random=~1|girder,data=girder,method="ML")
> nmod <- lm(strength~method,data=girder)
> anova(amod,nmod)
     Model df       AIC       BIC   logLik   Test  L.Ratio p-value
amod     1  7 -98.75623 -87.86880 56.37812                        
nmod     2  6 -90.68428 -81.35219 51.34214 1 vs 2 10.07196  0.0015
> 
> # likelihood method
> 2*(logLik(amod)-logLik(nmod))
[1] 10.07196
attr(,"nall")
[1] 35
attr(,"nobs")
[1] 35
attr(,"df")
[1] 7
attr(,"class")
[1] "logLik"
> 1-pchisq(2*(logLik(amod)-logLik(nmod)),1)
[1] 0.001505429
attr(,"nall")
[1] 35
attr(,"nobs")
[1] 35
attr(,"df")
[1] 7
attr(,"class")
[1] "logLik"
> ###
> 
> mu <- nmod$coeff[1]
> mua <- c(nmod$coeff[-1],-sum(nmod$coeff[-1]))
> mua
     method1      method2      method3      method4              
 0.033742857  0.013742857 -0.009400000 -0.009542857 -0.028542857 
> sigma <- 0.06028
> 
> fixed <- rep(mua,rep(7,5))
> error <- rnorm(35,0,sigma)
> rstrength <- mu+fixed+error
> 
> lrstat <- rep(0,1000)
> for(i in 1:1000){
+ fixed <- rep(mua,rep(7,5))
+ error <- rnorm(35,0,sigma)
+ rstrength <- mu+fixed+error
+ amodr <- lme(rstrength~method,random=~1|girder,data=girder,method="ML")
+ nmodr <- lm(rstrength~method,data=girder)
+ lrstat[i] <- 2*(logLik(amodr)-logLik(nmodr))
+ }
> mean(lrstat>10.07196)
[1] 0.002
#######
Part b)
########

> g <- lm(strength~method+girder,data=girder)
> anova(g)
Analysis of Variance Table

Response: strength
          Df   Sum Sq  Mean Sq F value   Pr(>F)   
method     4 0.016251 0.004063  2.0821 0.114657   
girder     6 0.062174 0.010362  5.3105 0.001313 **
Residuals 24 0.046831 0.001951                    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

################
#  Problem 2.
################
> 
> eggprod <- read.table("u:\\stat526\\dataset\\ascdata\\eggprod.txt",h=T)
> eggprod
   treat block eggs
1      O     1  330
2      O     2  288
3      O     3  295
4      O     4  313
5      E     1  372
6      E     2  340
7      E     3  343
8      E     4  341
9      F     1  359
10     F     2  337
11     F     3  373
12     F     4  302
> 
> # (a) Omitted
> 
> # (b)
> 
> lmod <- aov(eggs~treat+factor(block),data=eggprod)
> summary(lmod)
              Df Sum Sq Mean Sq F value Pr(>F)  
treat          2   4212  2106.2   5.444 0.0449 *
factor(block)  3   2330   776.8   2.008 0.2145  
Residuals      6   2322   386.9                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> # (c)
> 
> library(lattice)
> library(nlme)
> 
> options(contrasts=c("contr.sum", "contr.sum"))
> g <- lme(eggs~treat,random=~1|factor(block),data=eggprod,method="ML")
> summary(g)
Linear mixed-effects model fit by maximum likelihood
 Data: eggprod 
       AIC     BIC    logLik
  114.8885 117.313 -52.44424

Random effects:
 Formula: ~1 | factor(block)
        (Intercept) Residual
StdDev:    9.872099 17.03489

Fixed effects: eggs ~ treat 
             Value Std.Error DF  t-value p-value
(Intercept) 332.75  8.045444  6 41.35881  0.0000
treat1       16.25  8.030324  6  2.02358  0.0895
treat2       10.00  8.030324  6  1.24528  0.2595
 Correlation: 
       (Intr) treat1
treat1  0.0         
treat2  0.0   -0.5  

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-1.97722972 -0.54794104 -0.03285658  0.74126662  1.65055481 

Number of Observations: 12
Number of Groups: 4 
> 
> # (d) 
> 
> library(pbkrtest)
> amod <- lmer(eggs~treat+(1|block),data=eggprod,REML=FALSE)
> nmod <- lmer(eggs~1+(1|block),data=eggprod,REML=FALSE)
> result <- KRmodcomp(amod,nmod)
Warning message:
In deviance.merMod(object, ...) :
  deviance() is deprecated for REML fits; use REMLcrit for the REML criterion or deviance(.,REML=FALSE) for deviance calculated at the REML fit
> result
F-test with Kenward-Roger approximation; computing time: 0.04 sec.
large : eggs ~ treat + (1 | block)
small : eggs ~ 1 + (1 | block)
         stat     ndf     ddf F.scaling p.value
Ftest -5.2377  2.0000  1.9210         1       1
> result$stats$Fstat
[1] -5.237667
> 
> # (e) 
> 
> run <- 1000
> lstat <- rep(0,1000)
> for(i in 1:1000){
+  y <- unlist(simulate(nmod,use.u=TRUE))
+  mod.n <- lmer(y~1+(1|block),data=eggprod,REML=FALSE)
+  mod.a <- lmer(y~treat+(1|block),data=eggprod,REML=FALSE)
+  result.sim <- KRmodcomp(mod.a,mod.n)
+  lstat[i] <- result.sim$stats$Fstat
+ }
There were 50 or more warnings (use warnings() to see the first 50)
> mean(lstat>result$stats$Fstat)
[1] 0.954
> 
> # (f) 
> 
> g.a <- lme(eggs~treat,random=~1|factor(block),data=eggprod,method="ML")
> g.n <- lme(eggs~1,random=~1|factor(block),data=eggprod,method="ML")
> 2*(logLik(g.a) - logLik(g.n))
'log Lik.' 8.424536 (df=5)
> 
> summary(g.n)
Linear mixed-effects model fit by maximum likelihood
 Data: eggprod 
      AIC      BIC    logLik
  119.313 120.7677 -56.65651

Random effects:
 Formula: ~1 | factor(block)
        (Intercept) Residual
StdDev: 0.001605146 27.17881

Fixed effects: eggs ~ 1 
             Value Std.Error DF  t-value p-value
(Intercept) 332.75  8.194718  8 40.60542       0

Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-1.6465035 -0.8278509  0.2115619  0.5243056  1.4809333 

Number of Observations: 12
Number of Groups: 4 
> 
> mu <- rep(332.75,12)
> sigmaran <- 0.001605
> sigma <- 27.17
> 
> lrstat <- rep(0,1000)
> for(i in 1:1000){
+  random <- rep(rnorm(4,0,sigmaran),3)
+  error <- rnorm(12,0,sigma)
+  y <- mu+random+error
+  gnull <- lme(y~1,random=~1|block,data=eggprod,method="ML")
+  galt <- lme(y~treat,random=~1|block,data=eggprod,method="ML")
+  lrstat[i] <- 2*(logLik(galt) - logLik(gnull)) 
+ }
> mean(lrstat>8.42)
[1] 0.061


> ###########
> # Problem 3.
> ###########
> 
> lawn <- read.table("u:\\stat526\\dataset\\ascdata\\lawn.txt",h=T)
> lawn
   manufact machine speed time
1         A      m1     L  211
2         A      m1     H  278
3         B      m4     L  205
4         B      m4     H  247
5         A      m1     L  230
6         A      m1     H  278
7         B      m4     L  217
8         B      m4     H  251
9         A      m2     L  184
10        A      m2     H  249
11        B      m5     L  169
12        B      m5     H  239
13        A      m2     L  188
14        A      m2     H  272
15        B      m5     L  168
16        B      m5     H  252
17        A      m3     L  216
18        A      m3     H  275
19        B      m6     L  200
20        B      m6     H  261
21        A      m3     L  232
22        A      m3     H  271
23        B      m6     L  187
24        B      m6     H  242
> 
> # (a) Omitted
> 
> # (b)
> g <- lm(time~manufact+machine+speed,data=lawn)
> summary(g)

Call:
lm(formula = time ~ manufact + machine + speed, data = lawn)

Residuals:
   Min     1Q Median     3Q    Max 
-12.50  -8.50  -3.00   7.50  19.25 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  230.083      2.284 100.758  < 2e-16 ***
manufact1     23.083      5.106   4.521 0.000302 ***
machine1      -3.917      6.459  -0.606 0.552255    
machine2     -29.917      6.459  -4.632 0.000238 ***
machine3      -4.667      6.459  -0.723 0.479790    
machine4      23.000      7.910   2.908 0.009804 ** 
machine5          NA         NA      NA       NA    
speed1        29.500      2.284  12.919 3.23e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.19 on 17 degrees of freedom
Multiple R-squared:  0.9251,    Adjusted R-squared:  0.8986 
F-statistic: 34.97 on 6 and 17 DF,  p-value: 1.183e-08

> 
> # (c) 
> amod <- lmer(time~manufact+speed+(1|manufact:machine)+(1|speed:machine),data=lawn,REML=FALSE)
> summary(amod)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: time ~ manufact + speed + (1 | manufact:machine) + (1 | speed:machine)
   Data: lawn

     AIC      BIC   logLik deviance df.resid 
   201.7    208.8    -94.9    189.7       18 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.1925 -0.6324 -0.2089  0.8129  1.3719 

Random effects:
 Groups           Name        Variance Std.Dev.
 speed:machine    (Intercept) 49.92    7.065   
 manufact:machine (Intercept) 72.68    8.525   
 Residual                     84.92    9.215   
Number of obs: 24, groups:  speed:machine, 12; manufact:machine, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  230.083      4.451   51.69
manufact1     10.250      4.451    2.30
speed1        29.500      2.775   10.63

Correlation of Fixed Effects:
          (Intr) mnfct1
manufact1 0.000        
speed1    0.000  0.000 
> 
> # (d) 
> nmod <- lm(time~manufact+speed,data=lawn)
> 2*(logLik(amod) - logLik(nmod))
'log Lik.' 6.444842 (df=6)
> lrstat <- rep(0,1000)
> for(i in 1:1000){
+  y <- unlist(simulate(nmod,use.u=TRUE))
+  mod.a <- lmer(y~manufact+speed+(1|manufact:machine)+(1|speed:machine),data=lawn,REML=FALSE)
+  mod.n <- lm(y~manufact+speed,data=lawn)
+  lrstat[i] <- 2*(logLik(mod.a) - logLik(mod.n))
+ }
> 
> mean(lrstat>2*(logLik(amod) - logLik(nmod)))
[1] 0.005
> 
> # (f)
> g <- lmer(time~speed+(1|manufact)+(1|manufact:machine),data=lawn,REML=FALSE)
> summary(g)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: time ~ speed + (1 | manufact) + (1 | manufact:machine)
   Data: lawn

     AIC      BIC   logLik deviance df.resid 
   204.6    210.5    -97.3    194.6       19 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.15784 -0.70450 -0.06307  0.64212  1.59863 

Random effects:
 Groups           Name        Variance Std.Dev.
 manufact:machine (Intercept) 148.75   12.196  
 manufact         (Intercept)  45.63    6.755  
 Residual                     118.19   10.872  
Number of obs: 24, groups:  manufact:machine, 6; manufact, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)  230.083      7.248   31.75
speed1        29.500      2.219   13.29

Correlation of Fixed Effects:
       (Intr)
speed1 0.000