One Sample Multivariate Analysis

In this lab, we construct tests and confidence intervals based on single multivariate sample. We use the microwave radiation data set again. If you have not read the data into Splus, you need to save it as h:\\stat524\\Datasets\\radiation.data, open Splus and type

>rad_read.table("h:\\stat524\\Datasets\\radiation.data",header=T,row.names
=NULL)
>rad

Recall that square-root transformation is need to assure multivariate normality.

>srad_sqrt(rad)
>srad

We compute the mean and variance of srad:

>xbar_apply(srad,2,mean)
>S_var(srad)

Let's test the hypothesis that the mean vector is (0.4,0.4)'. Compute the T² statistic:

>mu_c(0.4,0.4)
>t2_42* (xbar-mu) %*% solve(S) %*% (xbar-mu)
>t2

Assume the significance level is 5%, we compute the critical value:

>n_nrow(srad)
>p_ncol(srad)
>fcrit_((n-1)*p/(n-p))*qf(0.95,p,n-p)
>fcrit

where qf(0.95,p,n-p) gives you the 95% percentile of F distribution with degrees of freedom (p,n-p). Based on the observed statistic t2 and the critical value, what is your conclusion? Now let's compute the P-value of the observed statistic:

>pvalue_1-pf(((n-p)/((n-1)*p))*t2, p, n-p)
>pvalue

Can you draw the same conclusion?

Now let's draw the confidence region for the mean vector mu. First we need open the splus graphics window:

>win.graph()

Compute a 95% confidence region for true mean:

>plot(ellipse(S/42,centre=xbar,t=sqrt(fcrit)),type="l")
>points(xbar[1],xbar[2])

From now on, we want to calculate confidence intervals. Recall that we need critical values for student's t distribution for each individual parameter. Assume that the confidence level is 95%.

>tcrit_qt(0.975,n-1)
>tcrit

First, the 95% confidence interval for mu1 (mean of the first variable)

>intv1_c(xbar[1]-sqrt(S[1,1]/n)*tcrit, xbar[2]+sqrt(S[2,2]/n)*tcrit)
>intv1

And drawn on the plot, it looks like:

>abline(v=intv1)

The simultaneous confidence interval for mu1 is

>sintv1_c(xbar[1]-sqrt(S[1,1]*fcrit/n), xbar[1]+sqrt(S[1,1]*fcrit/n))
>sintv1

Notice that sintv1 is wider than intv1. Now draw sintv1 on the same plot:

>abline(v=sintv1,lty=2)

Notice that how these lines meet the ellipse at a tangent. Now let's construct confidence intervals for mu1-mu2. So, a'=(1,-1).

>ap_c(1,-1)
>zbar_ap%*%xbar
>zvar_ap%*% S %*% ap

The one-at-a-time interval is

>intv12_c(zbar-tcrit*sqrt(zvar/n), zbar+tcrit*sqrt(zvar/n))
>intv12

which drawn on the plot look like this

>abline(-intv12[1], 1)
>abline(-intv12[2], 1)

Because intv12[1]<=mu1-mu2<=intv12[2] implies that mu2<=-intv12[1]+mu1 and mu2>=-intv12[2]+mu1. These are the two lines we just drew.

The simultaneous intervals for mu1-mu2 are

>sintv12_c(zbar-sqrt(fcrit)*sqrt(zvar/n), zbar+sqrt(fcrit)*sqrt(zvar/n))
>sintv12
>abline(-sintv12[1],1,lty=2)
>abline(-sintv12[2],1,lty=2)

For a large sample, the chi-square may be used as an approximation to the usual F-statistic. Consider the radiation data example. If we assume that the data set is from a normal population, the critical value is

>fcrit_((n-1)*p*/(n-p))*qf(0.95,p,n-p)

If we cannot assume normality, the critical value is

>chicrit_qchisq(0.95,p)

Notice that chicrit is less than fcrit. It is because the sample size in fact is not large enough. Suppose we have 1000 observations, then

>fcrit2_((1000-1)*2/(1000-2))*pf(0.95,2,1000-2)
>fcrit2

Now chicrit and fcrit2 are very close, F distribution and chi-square distribution will lead to the same conclusion.