One Sample Multivariate Analysis

In this lab, we construct tests and confidence intervals based on single multivariate sample. We use the microwave radiation data set again. If you have not read the data into Splus, you need to save it as h:\\stat524\\Datasets\\radiation.data, open Splus and type
>rad_read.table("h:\\stat524\\Datasets\\radiation.data",header=T,row.names
=NULL)
>rad
Recall that square-root transformation is need to assure multivariate normality.
>srad_sqrt(rad)
>srad
We compute the mean and variance of srad:
>xbar_apply(srad,2,mean)
>S_var(srad)
Let's test the hypothesis that the mean vector is (0.4,0.4)'. Compute the T2 statistic:
>mu_c(0.4,0.4)
>t2_42* (xbar-mu) %*% solve(S) %*% (xbar-mu)
>t2
Assume the significance level is 5%, we compute the critical value:
>n_nrow(srad)
>p_ncol(srad)
>fcrit_((n-1)*p/(n-p))*qf(0.95,p,n-p)
>fcrit
where qf(0.95,p,n-p) gives you the 95% percentile of F distribution with degrees of freedom (p,n-p). Based on the observed statistic t2 and the critical value, what is your conclusion? Now let's compute the P-value of the observed statistic:
>pvalue_1-pf(((n-p)/((n-1)*p))*t2, p, n-p)
>pvalue
Can you draw the same conclusion?

Now let's draw the confidence region for the mean vector mu. First we need open the splus graphics window:

>win.graph()
Compute a 95% confidence region for true mean:
>plot(ellipse(S/42,centre=xbar,t=sqrt(fcrit)),type="l")
>points(xbar[1],xbar[2])
From now on, we want to calculate confidence intervals. Recall that we need critical values for student's t distribution for each individual parameter. Assume that the confidence level is 95%.
>tcrit_qt(0.975,n-1)
>tcrit
First, the 95% confidence interval for mu1 (mean of the first variable)
>intv1_c(xbar[1]-sqrt(S[1,1]/n)*tcrit, xbar[2]+sqrt(S[2,2]/n)*tcrit)
>intv1
And drawn on the plot, it looks like:
>abline(v=intv1)
The simultaneous confidence interval for mu1 is
>sintv1_c(xbar[1]-sqrt(S[1,1]*fcrit/n), xbar[1]+sqrt(S[1,1]*fcrit/n))
>sintv1
Notice that sintv1 is wider than intv1. Now draw sintv1 on the same plot:
>abline(v=sintv1,lty=2)
Notice that how these lines meet the ellipse at a tangent. Now let's construct confidence intervals for mu1-mu2. So, a'=(1,-1).
>ap_c(1,-1)
>zbar_ap%*%xbar
>zvar_ap%*% S %*% ap
The one-at-a-time interval is
>intv12_c(zbar-tcrit*sqrt(zvar/n), zbar+tcrit*sqrt(zvar/n))
>intv12
which drawn on the plot look like this
>abline(-intv12[1], 1)
>abline(-intv12[2], 1)
Because intv12[1]<=mu1-mu2<=intv12[2] implies that mu2<=-intv12[1]+mu1 and mu2>=-intv12[2]+mu1. These are the two lines we just drew.

The simultaneous intervals for mu1-mu2 are

>sintv12_c(zbar-sqrt(fcrit)*sqrt(zvar/n), zbar+sqrt(fcrit)*sqrt(zvar/n))
>sintv12
>abline(-sintv12[1],1,lty=2)
>abline(-sintv12[2],1,lty=2)
For a large sample, the chi-square may be used as an approximation to the usual F-statistic. Consider the radiation data example. If we assume that the data set is from a normal population, the critical value is
>fcrit_((n-1)*p*/(n-p))*qf(0.95,p,n-p)
If we cannot assume normality, the critical value is
>chicrit_qchisq(0.95,p)
Notice that chicrit is less than fcrit. It is because the sample size in fact is not large enough. Suppose we have 1000 observations, then
>fcrit2_((1000-1)*2/(1000-2))*pf(0.95,2,1000-2)
>fcrit2
Now chicrit and fcrit2 are very close, F distribution and chi-square distribution will lead to the same conclusion.