Read in and check the Effluent Data
> eff <- read.table("effluent.data", header=T)
> eff
(Notice that the first 11 observations were from the commercial lab and the remaining 11 observations were from the state lab.)
Make a plot with a different label for the two labs and with lines joining the paired samples.
> win.graph()
> plot(eff$bod, eff$ss, type="n", xlab="bod", ylab="ss")
(There are a few plot types: "p" points; "l" lines; "b" both points and lines; "h" high-density plot; "s" stair-step plot; "n" no data plotted yet.)
> text(eff$bod, eff$ss, c(rep("c",11), rep("s",11)))
> for(i in 1:11) lines(c(eff$bod[i], eff$bod[i+11]), c(eff$ss[i], eff$ss[i+11]))
(When i=1, for example, observations 1 and 12 are the two measurements on the same experimental unit, so the loop joins the points (eff$bod[1], eff$ss[1]) and (eff$bod[12], eff$ss[12]) with a line.)
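(As an aside, and only as an alternative sketch: the same connecting lines can be drawn in one vectorized call with segments(), avoiding the explicit loop.)
> segments(eff$bod[1:11], eff$ss[1:11], eff$bod[12:22], eff$ss[12:22])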
Compute the difference between the two labs.
> d <- eff[1:11,3:4] - eff[12:22,3:4]
> d
The mean:
> mu <- apply(d, 2, mean)
> mu
The variance:
> S <- var(d)
> S
The T^2 test that the true mean is 0 (i.e. that the difference is zero):
> t2 <- 11 * mu %*% solve(S) %*% mu
> t2
The critical value is
> fcrit <- (2*10/9)*qf(0.95,2,9)
> fcrit
So, what is your conclusion about the null hypothesis that there is no difference between the commercial and the state labs?
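(As background, this is the standard Hotelling T^2 calculation: with n = 11 paired differences and p = 2 variables, T^2 = n * mu' S^{-1} mu, and under the null hypothesis ((n-p)/(p(n-1))) * T^2 has an F distribution with p and n-p degrees of freedom. The critical value for T^2 is therefore (p(n-1)/(n-p)) * F(0.95; p, n-p) = (2*10/9)*qf(0.95,2,9), which is exactly the fcrit computed above.)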
Note that if we compute the simultaneous confidence intervals for each variable separately:
> sintv1 <- c(mu[1]-sqrt(fcrit)*sqrt(S[1,1]/11), mu[1]+sqrt(fcrit)*sqrt(S[1,1]/11))
> sintv1
> sintv2 <- c(mu[2]-sqrt(fcrit)*sqrt(S[2,2]/11), mu[2]+sqrt(fcrit)*sqrt(S[2,2]/11))
> sintv2
Both intervals contain zero, so the null hypothesis would be (falsely) accepted. This can be seen pictorially:
> plot(ellipse(S/11, cent=mu, t=sqrt(fcrit)), type="l")
> abline(h=c(mu[2]-sqrt(fcrit)*sqrt(S[2,2]/11), mu[2]+sqrt(fcrit)*sqrt(S[2,2]/11)))
> abline(v=c(mu[1]-sqrt(fcrit)*sqrt(S[1,1]/11), mu[1]+sqrt(fcrit)*sqrt(S[1,1]/11)))
> points(0,0)
See what happens if you use the t-critical value:
> tcrit <- qt(0.975,10)
> abline(v=c(mu[1]-tcrit*sqrt(S[1,1]/11), mu[1]+tcrit*sqrt(S[1,1]/11)), lty=2)
> abline(h=c(mu[2]-tcrit*sqrt(S[2,2]/11), mu[2]+tcrit*sqrt(S[2,2]/11)), lty=2)
Thus a person who knew nothing of multivariate methods would also be misled.
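(A software note, stated as an assumption about your setup rather than part of the original commands: in a current R session the ellipse() function used above is not in base R but comes from the add-on ellipse package, whose centre and t arguments correspond to the cent and t used here, so you may first need:)
> library(ellipse)   # provides ellipse() for computing the confidence ellipse outline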
Now we turn to the dog data. The treatments are high CO2 without halothane (V1), low CO2 without halothane (V2), high CO2 with halothane (V3), and low CO2 with halothane (V4).
> dog <- read.table("dog.data")
> dog
Plot the four groups:
> boxplot(dog$V1, dog$V2, dog$V3, dog$V4, names=c("high-no","low-no","high-yes","low-yes"))
Also make a plot with lines joining the measurements on the same dog:
> matplot(1:4, t(dog), type="b", axes=F)
> axis(2)
> axis(1, at=1:4, c("high-no","low-no","high-yes","low-yes"))
Notice that the lines cross much less than would be expected by chance. This can be attributed to the dependence between measurements on the same dog.
Now compute the difference between successive columns:
> ndog <- cbind(dog$V1-dog$V2, dog$V2-dog$V3, dog$V4-dog$V3)
> ndog
What graphical examination of this new matrix would be appropriate?
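(One possibility, offered only as a sketch rather than the intended answer: boxplots of the three difference variables, and a pairs plot to examine their joint behaviour.)
> boxplot(ndog[,1], ndog[,2], ndog[,3])
> pairs(ndog)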
Compute the mean and variance:
> mu <- apply(ndog, 2, mean)
> mu
> S <- var(ndog)
The T^2 statistic:
> t2 <- 19 * mu %*% solve(S) %*% mu
> t2
which is much bigger than the F-critical value
> fcrit <- (3*18/16)*qf(0.95,3,16)
> fcrit
so we reject the null hypothesis of equality.
We can also do this using contrast matrices and the original data:
> cm <- cbind(1, diag(rep(-1,3)))
> cm
> omu <- apply(dog, 2, mean)
> omu
> oS <- var(dog)
> oS
> t2 <- 19 * t(cm %*% omu) %*% solve(cm %*% oS %*% t(cm)) %*% (cm %*% omu)
> t2
and get the same result, but the first method seems easier.
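(The reason the two computations agree, stated as standard theory rather than anything in the original commands: if C is a full-rank contrast matrix, the differenced data have mean C*xbar and variance C S C', so T^2 = n (C xbar)' (C S C')^{-1} (C xbar), and this value is the same for any full-rank matrix whose rows span the same space of contrasts.)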
Apart from lack of equality, what are the conclusions? Our choice of contrasts is not well suited to comparing the treatment levels. A better choice is to compare high versus low CO2 (averaged over halothane), halothane versus no halothane (averaged over CO2), and the remaining interaction effect.
> ndog <- cbind(dog$V1+dog$V3-dog$V2-dog$V4, dog$V3+dog$V4-dog$V1-dog$V2, dog$V1+dog$V4-dog$V2-dog$V3)
> ndog
(cbind is a function that forms a matrix by binding several vectors together as columns.) Recompute T^2:
> mu <- apply(ndog, 2, mean)
> mu
> S <- var(ndog)
> S
> t2 <- 19 * mu %*% solve(S) %*% mu
> t2
and get the same result. Now compute the CI for each of the contrasts:
> for(i in 1:3) print(c(mu[i]-sqrt(fcrit*S[i,i]/19), mu[i]+sqrt(fcrit*S[i,i]/19)))
The first interval shows that lower CO2 produces longer times between heartbeats. The second shows that halothane produces longer times between heartbeats. The third shows that the interaction effect is not significant (zero is in the interval). Note that the halothane runs occurred after the no-halothane runs, so there could be a confounding time effect.
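(As a cross-check against the contrast-matrix formulation above, and assuming omu and oS are still in the workspace, these three contrasts can be written as rows of a matrix; this is only a sketch, not part of the original handout.)
> cm2 <- rbind(c(1,-1,1,-1), c(-1,-1,1,1), c(1,-1,-1,1))   # CO2, halothane, interaction contrasts
> t2 <- 19 * t(cm2 %*% omu) %*% solve(cm2 %*% oS %*% t(cm2)) %*% (cm2 %*% omu)
> t2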
Now return to the effluent data and treat the two labs as independent samples. Compute the means and variances in the two groups:
> mu1 <- apply(eff[1:11,3:4], 2, mean)
> mu2 <- apply(eff[12:22,3:4], 2, mean)
> S1 <- var(eff[1:11,3:4])
> S2 <- var(eff[12:22,3:4])
and the pooled variance:
> Spool <- ((11-1)*S1 + (11-1)*S2)/(11+11-2)
> Spool
The T^2 statistic for testing equality is
> t2 <- (mu1-mu2) %*% solve(2*Spool/11) %*% (mu1-mu2)
> t2
whereas the critical value is
> fcrit <- (20*2/19)*qf(0.95,2,19)
> fcrit
so the hypothesis of equality is rejected.
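(Again as background, this is the standard two-sample Hotelling T^2 calculation: with n1 = n2 = 11 and p = 2, T^2 = (xbar1 - xbar2)' [Spool*(1/n1 + 1/n2)]^{-1} (xbar1 - xbar2), and under the null hypothesis ((n1+n2-p-1)/(p(n1+n2-2))) * T^2 has an F distribution with p and n1+n2-p-1 degrees of freedom, which gives the critical value (2*20/19)*qf(0.95,2,19) computed above.)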
Let's take a look at the confidence region:
> plot(ellipse(2*Spool/11, cent=(mu1-mu2), t=sqrt(fcrit)), type="l")
> abline(v=0, h=0, lty=2)
> points(mu1[1]-mu2[1], mu1[2]-mu2[2])
Here is the simultaneous confidence interval for bod:
> sintv1 <- c(mu1[1]-mu2[1]-sqrt(fcrit)*sqrt(2*Spool[1,1]/11), mu1[1]-mu2[1]+sqrt(fcrit)*sqrt(2*Spool[1,1]/11))
> sintv1
> abline(v=sintv1, lty=5)
Can you compute and plot the interval for ss?
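(One way to do it, following the same pattern as sintv1; treat this as a sketch of an answer rather than the official solution. Since ss is the second variable and lies on the vertical axis of the ellipse plot, its interval is drawn with horizontal lines.)
> sintv2 <- c(mu1[2]-mu2[2]-sqrt(fcrit)*sqrt(2*Spool[2,2]/11), mu1[2]-mu2[2]+sqrt(fcrit)*sqrt(2*Spool[2,2]/11))
> sintv2
> abline(h=sintv2, lty=5)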
Look at the ratio of corresponding entries:
> S1/S2
There is a somewhat worrying difference in variation.
Read in the anteater data:
> ant <- read.table("anteater.data", header=T)
> ant
Now fit the model - the response is a matrix of three columns and the predictor is the location:
> g <- manova(cbind(basal, occipital, nasal) ~ location, data=ant)
Now compute the 4 different test statistics - note that the Pillai trace is the default:
> summary(g)
> summary(g, test="Wilks")
> summary(g, test="Roy")
> summary(g, test="Hotelling-Lawley")
The "approx F" refers to the fact that all 4 test statistics are transformed to approximately match an F distribution for the computation of the p-value. Note that the p-value varies, but the null is accepted by all 4 tests. This is a small sample, so not finding a difference is hardly surprising.
Now compute some of the auxiliary quantities:
> gs <- summary(g)
The eigenvalues lambda:
> gs$E
Note that the last one is 0 (within rounding error).
The sums of squares and cross-products matrices: the first is the between-groups matrix (B) and the second is the within-groups matrix (W):
> gs$SS
With these quantities, we can construct the MANOVA tables.
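For instance, the four test statistics can be reproduced directly from these two matrices. The sketch below assumes a current version of R, where the summary object stores them as gs$SS$location and gs$SS$Residuals; the relationships themselves are standard: Wilks = det(W)/det(B+W), Pillai = trace(B(B+W)^{-1}), Hotelling-Lawley = trace(W^{-1}B), and Roy's statistic is based on the largest eigenvalue of W^{-1}B.
> B <- gs$SS$location                      # between-groups (hypothesis) matrix
> W <- gs$SS$Residuals                     # within-groups (error) matrix
> det(W)/det(B+W)                          # Wilks' lambda
> sum(diag(B %*% solve(B+W)))              # Pillai's trace
> sum(diag(solve(W) %*% B))                # Hotelling-Lawley trace
> max(Re(eigen(solve(W) %*% B)$values))    # largest eigenvalue, used by Roy's test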