In this note, I show how to choose a sample size that ensures the simultaneous confidence intervals are no wider than a specified length. We have already discussed how to choose a sample size that gives the F-test a desired power to detect a target difference. These are the two primary ways to determine the sample size for planning purposes.

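Note that all simultaneous confidence intervals have the form \[\text{estimate} \pm w \times \text{standard error},\] where \(w\) depends on the method. Here \(v\) is the number of treatments, \(n\) is the total number of observations, and \(m\) is the number of intervals in the Bonferroni method. The following table summarizes the \(w\) value to be used.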

| Method | Bonferroni | Scheffé | Tukey | Dunnett |
|---|---|---|---|---|
| \(w\) | \(t_{n-v, \alpha /(2 m)}\) | \(\sqrt{(v-1) F_{v-1, n-v, \alpha}}\) | \(\frac{q_{v, n-v, \alpha}}{\sqrt{2}}\) | needs the multivariate t-distribution |
| SAS function for the critical value (df = n-v) | tinv(1-alpha/(2*m),df) | finv(1-alpha,v-1,df) | probmc('range',.,prob,df,v) | probmc('dunnett2',.,prob,df,v-1) |
| Table | A.4, p. 802 | F value in A.6, p. 804 | q value in A.8, p. 814 | A.10, p. 818 |

The standard error depends on the contrast as well as on the \(MSE\). For example, for a pairwise difference \(\mu_i-\mu_j\) with equal sample size \(r\), the standard error is \(\sqrt{MSE\times(2/r)}\). At the planning stage no \(MSE\) is available, so we need a value to put in its place. Because \(MSE\) is an estimate of \(\sigma^2\), one uses either an educated guess or an upper confidence limit for \(\sigma^2\) from earlier data, for example a 90% upper confidence limit.
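To see how the two pieces combine, consider Tukey's method for a pairwise difference with equal sample sizes \(r\), so that \(n = vr\). The factor \(\sqrt{2}\) in \(w\) cancels the 2 in the standard error, and the half-width simplifies as sketched here:

```latex
\[
\frac{q_{v,n-v,\alpha}}{\sqrt{2}} \times \sqrt{MSE \times \frac{2}{r}}
  = q_{v,n-v,\alpha}\,\sqrt{\frac{MSE}{r}}, \qquad n = vr .
\]
```

The half-width therefore shrinks roughly like \(1/\sqrt{r}\), with a small extra gain because the critical value \(q_{v,n-v,\alpha}\) also decreases as the error degrees of freedom grow.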

Consider the trout experiment in Exercise 15 of Chap. 3. The SAS code for the analysis of variance is given below.

data trout;
  do sulfa = 1 to 4;
    do rep = 1 to 10;
      input hemo @@;
      output;
    end;
  end;
  lines;
   6.7  7.8  5.5  8.4  7.0  7.8  8.6  7.4  5.8  7.0
   9.9  8.4 10.4  9.3 10.7 11.9  7.1  6.4  8.6 10.6
  10.4  8.1 10.6  8.7 10.7  9.1  8.8  8.1  7.8  8.0
   9.3  9.3  7.2  7.8  9.3 10.2  8.7  8.6  9.3  7.2
;
run;

proc glm data=trout;
class sulfa;
model hemo=sulfa;
lsmeans sulfa/cl adjust=Tukey;
run;

The ANOVA table is produced below.

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 3 | 26.80275000 | 8.93425000 | 5.70 | 0.0027 |
| Error | 36 | 56.47100000 | 1.56863889 | | |
| Corrected Total | 39 | 83.27375000 | | | |

Suppose the experiment were to be repeated and we would like the 95% simultaneous confidence intervals from Tukey’s method to have a half-width of at most 1 g per 100 ml. We will use the 90% upper confidence limit for \(\sigma^2\) for planning purposes. Assuming equal sample sizes, how large should the sample size \(r\) be?

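First, find the 90% upper confidence limit for \(\sigma^2\). It is given by \(SSE/\chi_{n-v, 0.90}^2=56.4710/\chi_{36, 0.90}^2=56.4710/25.6433=2.2022\), where \(\chi_{36, 0.90}^2\) is the value exceeded with probability 0.90, that is, the 10th percentile of the chi-square distribution with 36 degrees of freedom. Then, with this value in place of \(MSE\), we can calculate the minimum significant difference (MSD), which is the half-width of Tukey’s 95% simultaneous confidence intervals, for several candidate values of \(r\).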
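The upper limit can also be computed in SAS rather than read from a table. The following is a minimal sketch, assuming the CINV function is used for the chi-square quantile; the data set name sigma2 and the variable names are illustrative and not part of the original analysis.

```sas
/* Sketch: 90% upper confidence limit for sigma^2 from the trout ANOVA. */
/* Data set and variable names are illustrative assumptions.            */
data sigma2;
  SSE = 56.4710;            /* error sum of squares from the ANOVA table */
  df = 36;                  /* error degrees of freedom, n - v           */
  chi2 = cinv(0.10, df);    /* 10th percentile of chi-square with df     */
  upper = SSE / chi2;       /* 90% upper confidence limit for sigma^2    */
run;

proc print data=sigma2;
run;
```

The printed values of chi2 and upper should agree, up to rounding, with the 25.6433 and 2.2022 quoted above.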

data q;
input r @@;
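alpha=0.05;                        /* simultaneous error rate */
v=4;                               /* number of treatments */
MSE=2.2022;                        /* 90% upper confidence limit for sigma^2 */
n=v*r;                             /* total number of observations */
df=n-v;                            /* error degrees of freedom */
prob=1-alpha;
qT=probmc("range",.,prob,df,v);    /* Studentized range critical value */
msd=(qT/2**0.5)*(MSE*2/r)**0.5;    /* half-width of the Tukey intervals */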
lines;
20 30 40
;
proc print;
run;

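The SAS output is reproduced below. Of the three sample sizes tried, \(r=30\) is the smallest whose half-width (0.99878) falls below the target of 1.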

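| Obs | r | alpha | v | MSE | n | df | prob | qT | msd |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 20 | 0.05 | 4 | 2.2022 | 80 | 76 | 0.95 | 3.71485 | 1.23269 |
| 2 | 30 | 0.05 | 4 | 2.2022 | 120 | 116 | 0.95 | 3.68638 | 0.99878 |
| 3 | 40 | 0.05 | 4 | 2.2022 | 160 | 156 | 0.95 | 3.67263 | 0.86174 |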
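The output can be checked by rearranging the half-width requirement. Since the half-width equals \(q\sqrt{MSE/r}\), requiring it to be at most 1 is equivalent to \(r \ge q^2 \times MSE\). The critical value \(q\) itself depends on \(r\) through the degrees of freedom, which is why trial values of \(r\) are needed. A worked check using the qT values in the output, rounded to two decimals:

```latex
\[
\begin{aligned}
r = 20:&\quad q^2 \times MSE = 3.71485^2 \times 2.2022 \approx 30.39 > 20
        \quad (\text{too small}),\\
r = 30:&\quad q^2 \times MSE = 3.68638^2 \times 2.2022 \approx 29.93 \le 30
        \quad (\text{sufficient}),\\
r = 40:&\quad q^2 \times MSE = 3.67263^2 \times 2.2022 \approx 29.70 \le 40
        \quad (\text{sufficient}).
\end{aligned}
\]
```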

Read Example 4.5.1 about the bean-soaking experiment. We revise the code slightly to easily get the desired sample size.

data size;
input r @@;
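alpha=0.05;                        /* simultaneous error rate */
v=5;                               /* number of treatments */
MSE=10;                            /* value used in place of MSE for planning */
n=v*r;                             /* total number of observations */
df=n-v;                            /* error degrees of freedom */
prob=1-alpha;
qT=probmc("range",.,prob,df,v);    /* Studentized range critical value */
msd=(qT/2**0.5)*(MSE*2/r)**0.5;    /* half-width of the Tukey intervals */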
lines;
10 15 16 17 18 19
;
proc print data=size;
run;

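The SAS output is below. A sample size of \(r=18\) per treatment is the smallest that gives a half-width below 3 (msd = 2.93797), so that the simultaneous confidence intervals have width less than 6. With \(r=17\) the half-width is 3.02723, which is slightly too large.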

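| Obs | r | alpha | v | MSE | n | df | prob | qT | msd |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 10 | 0.05 | 5 | 10 | 50 | 45 | 0.95 | 4.01842 | 4.01842 |
| 2 | 15 | 0.05 | 5 | 10 | 75 | 70 | 0.95 | 3.96001 | 3.23334 |
| 3 | 16 | 0.05 | 5 | 10 | 80 | 75 | 0.95 | 3.95308 | 3.12519 |
| 4 | 17 | 0.05 | 5 | 10 | 85 | 80 | 0.95 | 3.94703 | 3.02723 |
| 5 | 18 | 0.05 | 5 | 10 | 90 | 85 | 0.95 | 3.94170 | 2.93797 |
| 6 | 19 | 0.05 | 5 | 10 | 95 | 90 | 0.95 | 3.93696 | 2.85617 |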
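Instead of typing candidate values of r by hand, the search can be automated with a loop that stops at the first r meeting the target. This is only a sketch under stated assumptions: the data set name size_search, the variable target, and the search range 2 to 200 are illustrative choices, and the target half-width of 3 is inferred from the requirement that the interval width be less than 6.

```sas
/* Sketch: find the smallest r whose Tukey half-width is at most the target. */
/* size_search, target, and the search range are illustrative assumptions.   */
data size_search;
  alpha = 0.05;
  v = 5;                  /* number of treatments */
  MSE = 10;               /* value used in place of MSE for planning */
  target = 3;             /* desired half-width, i.e. width of 6 */
  do r = 2 to 200;
    n = v*r;
    df = n - v;
    qT = probmc("range", ., 1 - alpha, df, v);
    msd = qT * sqrt(MSE / r);   /* same as (qT/sqrt(2))*sqrt(MSE*2/r) */
    if msd <= target then do;
      output;             /* keep only the first r that meets the target */
      leave;              /* exit the loop */
    end;
  end;
run;

proc print data=size_search;
run;
```

Because the half-width decreases as r increases, the first r that satisfies the condition is the smallest adequate sample size. For the settings above, the loop should stop at the same r = 18 found in the table.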