>rad_read.table("h:\\stat524\\Datasets\\radiation.data",header=T,row.names =NULL) >radRecall that square-root transformation is need to assure multivariate normality.
>srad_sqrt(rad) >sradWe compute the mean and variance of srad:
>xbar_apply(srad,2,mean) >S_var(srad)Let's test the hypothesis that the mean vector is (0.4,0.4)'. Compute the T2 statistic:
>mu_c(0.4,0.4) >t2_42* (xbar-mu) %*% solve(S) %*% (xbar-mu) >t2Assume the significance level is 5%, we compute the critical value:
>n_nrow(srad) >p_ncol(srad) >fcrit_((n-1)*p/(n-p))*qf(0.95,p,n-p) >fcritwhere qf(0.95,p,n-p) gives you the 95% percentile of F distribution with degrees of freedom (p,n-p). Based on the observed statistic t2 and the critical value, what is your conclusion? Now let's compute the P-value of the observed statistic:
>pvalue_1-pf(((n-p)/((n-1)*p))*t2, p, n-p) >pvalueCan you draw the same conclusion?
Now let's draw the confidence region for the mean vector mu. First we need open the splus graphics window:
>win.graph()Compute a 95% confidence region for true mean:
>plot(ellipse(S/42,centre=xbar,t=sqrt(fcrit)),type="l") >points(xbar[1],xbar[2])From now on, we want to calculate confidence intervals. Recall that we need critical values for student's t distribution for each individual parameter. Assume that the confidence level is 95%.
>tcrit_qt(0.975,n-1) >tcritFirst, the 95% confidence interval for mu1 (mean of the first variable)
>intv1_c(xbar[1]-sqrt(S[1,1]/n)*tcrit, xbar[2]+sqrt(S[2,2]/n)*tcrit) >intv1And drawn on the plot, it looks like:
>abline(v=intv1)The simultaneous confidence interval for mu1 is
>sintv1_c(xbar[1]-sqrt(S[1,1]*fcrit/n), xbar[1]+sqrt(S[1,1]*fcrit/n)) >sintv1Notice that sintv1 is wider than intv1. Now draw sintv1 on the same plot:
>abline(v=sintv1,lty=2)Notice that how these lines meet the ellipse at a tangent. Now let's construct confidence intervals for mu1-mu2. So, a'=(1,-1).
>ap_c(1,-1) >zbar_ap%*%xbar >zvar_ap%*% S %*% apThe one-at-a-time interval is
>intv12_c(zbar-tcrit*sqrt(zvar/n), zbar+tcrit*sqrt(zvar/n)) >intv12which drawn on the plot look like this
>abline(-intv12[1], 1) >abline(-intv12[2], 1)Because intv12[1]<=mu1-mu2<=intv12[2] implies that mu2<=-intv12[1]+mu1 and mu2>=-intv12[2]+mu1. These are the two lines we just drew.
The simultaneous intervals for mu1-mu2 are
>sintv12_c(zbar-sqrt(fcrit)*sqrt(zvar/n), zbar+sqrt(fcrit)*sqrt(zvar/n)) >sintv12 >abline(-sintv12[1],1,lty=2) >abline(-sintv12[2],1,lty=2)For a large sample, the chi-square may be used as an approximation to the usual F-statistic. Consider the radiation data example. If we assume that the data set is from a normal population, the critical value is
>fcrit_((n-1)*p*/(n-p))*qf(0.95,p,n-p)If we cannot assume normality, the critical value is
>chicrit_qchisq(0.95,p)Notice that chicrit is less than fcrit. It is because the sample size in fact is not large enough. Suppose we have 1000 observations, then
>fcrit2_((1000-1)*2/(1000-2))*pf(0.95,2,1000-2) >fcrit2Now chicrit and fcrit2 are very close, F distribution and chi-square distribution will lead to the same conclusion.