Determining Sample Size

In this note, I show how to choose a sample size large enough that the simultaneous confidence intervals have at most a specified length. We have already discussed how to choose a sample size that gives the F-test a desired power to detect a target difference. These are the two primary ways to determine the sample size when planning an experiment.

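Note that all the simultaneous confidence intervals considered here have the form \[\text{estimate} \pm w \times \text{standard error},\] where the critical coefficient \(w\) depends on the method. The following table summarizes the \(w\) values, the SAS expressions that compute them (with df \(= n-v\)), and the appendix tables that list the critical values. Here \(m\) denotes the number of intervals in the Bonferroni method.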

Method	Bonferroni	Scheffé	Tukey	Dunnett
\(w\)	\(t_{n-v, \alpha /(2 m)}\)	\(\sqrt{(v-1) F_{v-1, n-v, \alpha}}\)	\(q_{v, n-v, \alpha}/\sqrt{2}\)	needs the multivariate t-distribution
SAS expression	tinv(1-alpha/(2*m), df)	sqrt((v-1)*finv(1-alpha, v-1, df))	probmc('range', ., 1-alpha, df, v)/sqrt(2)	probmc('dunnett2', ., 1-alpha, df, v-1)
Table	A.4, p. 802	A.6 (F value), p. 804	A.8 (q value), p. 814	A.10, p. 818
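As an illustration of the Bonferroni entry (the choice of contrasts here is only an example), if all pairwise differences among \(v\) treatments are wanted, the number of intervals and the resulting coefficient are

\[
m=\binom{v}{2}=\frac{v(v-1)}{2},\qquad v=4:\quad m=6,\quad w=t_{n-v,\,0.05/12}\approx t_{n-v,\,0.00417},
\]

so the corresponding SAS call would be tinv(1-0.05/12, n-v). For the other three methods \(w\) does not depend on \(m\), but every entry depends on the error degrees of freedom \(n-v\), and hence on the sample size.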

The standard error depends on the contrast as well as on \(MSE\). For example, for a pairwise difference \(\mu_i-\mu_j\) with equal sample sizes \(r\), the standard error is \(\sqrt{MSE\times(2/r)}\). At the planning stage, therefore, we need a value to use in place of \(MSE\). Since \(MSE\) estimates \(\sigma^2\), one can use either an educated guess for \(\sigma^2\) or an upper confidence limit for \(\sigma^2\) computed from a previous experiment, for example a 90% upper confidence limit.
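Combining the interval form with this standard error gives an explicit planning condition. Write \(d\) for the desired half-width and \(\hat\sigma^2\) for the planning value of \(\sigma^2\); both symbols are introduced here only for illustration. The estimated standard error of a general contrast \(\sum_i c_i\tau_i\) with \(r_i\) observations on treatment \(i\), and the resulting condition for pairwise differences with equal replication \(r\), are then

\[
\widehat{\text{se}}=\sqrt{MSE\sum_{i=1}^{v}\frac{c_i^2}{r_i}},\qquad
w\sqrt{\frac{2\hat\sigma^2}{r}}\le d
\iff
r\ge\frac{2\,w^2\hat\sigma^2}{d^2}.
\]

Because \(w\) itself depends on \(r\) through the error degrees of freedom \(n-v=v(r-1)\), this inequality usually cannot be solved directly. A practical approach is to evaluate the half-width for a few trial values of \(r\), as the SAS programs in this note do.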

Consider the trout experiment in Exercise 15 of Chapter 3. The SAS code for the analysis of variance is given below.

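data trout;
  /* sulfa = treatment level (1-4); rep = fish within treatment (1-10) */
  do sulfa = 1 to 4;
    do rep = 1 to 10;
      input hemo @@;   /* @@ keeps reading values from the same data line */
      output;          /* write one observation per value read */
    end;
  end;
  lines;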
   6.7  7.8  5.5  8.4  7.0  7.8  8.6  7.4  5.8  7.0
   9.9  8.4 10.4  9.3 10.7 11.9  7.1  6.4  8.6 10.6
  10.4  8.1 10.6  8.7 10.7  9.1  8.8  8.1  7.8  8.0
   9.3  9.3  7.2  7.8  9.3 10.2  8.7  8.6  9.3  7.2
;
run;

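proc glm data=trout;
  class sulfa;                      /* treatment factor */
  model hemo = sulfa;               /* one-way analysis of variance */
  lsmeans sulfa / cl adjust=tukey;  /* Tukey-adjusted comparisons with confidence limits */
run;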

The resulting ANOVA table is shown below.

Source	DF	Sum of Squares	Mean Square	F Value	Pr > F
Model	3	26.80275000	8.93425000	5.70	0.0027
Error	36	56.47100000	1.56863889
Corrected Total	39	83.27375000
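As a quick check on the table, with \(n=40\) observations on \(v=4\) treatments the error mean square and the F ratio are

\[
MSE=\frac{SSE}{n-v}=\frac{56.471}{36}\approx 1.56864,\qquad
F=\frac{MS_{\text{Model}}}{MSE}=\frac{8.93425}{1.56864}\approx 5.70.
\]

Note that \(SSE=(n-v)\,MSE\). The chi-squared upper confidence limit for \(\sigma^2\) uses \(SSE\), not \(MSE\).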

Suppose the experiment is to be repeated, and we would like the 95% simultaneous confidence intervals for pairwise differences, computed by Tukey's method, to have a half-width of at most 1 g per 100 ml. For planning purposes we use the 90% upper confidence limit for \(\sigma^2\). Assuming equal sample sizes, how large should the sample size \(r\) be?

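First, find the 90% upper confidence limit for \(\sigma^2\). It is given by \(SSE/\chi_{n-v, 0.90}^2=56.4710/\chi_{36, 0.90}^2=56.4710/25.6433=2.2022\). Here \(\chi^2_{36,0.90}\) is the percentile of the chi-squared distribution with 36 degrees of freedom that has probability 0.90 to its right. Next, we calculate the minimum significant difference (msd) for several candidate values of \(r\). The msd is the half-width of Tukey's 95% simultaneous confidence intervals.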
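The limit rests on the standard result that, when the errors are independent and normally distributed with variance \(\sigma^2\), \(SSE/\sigma^2\) has a chi-squared distribution with \(n-v\) degrees of freedom. Inverting the probability statement gives the one-sided limit:

\[
P\!\left(\frac{SSE}{\sigma^2}\ge\chi^2_{n-v,\,0.90}\right)=0.90
\iff
P\!\left(\sigma^2\le\frac{SSE}{\chi^2_{n-v,\,0.90}}\right)=0.90,
\qquad
\frac{56.471}{25.6433}\approx 2.2022.
\]

In SAS, the critical value should be obtainable as cinv(0.10, 36), because cinv returns a lower-tail quantile. This call is a suggestion and is not part of the original program.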

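data q;
  input r @@;                 /* candidate sample sizes per treatment */
  alpha = 0.05;
  v = 4;                      /* number of treatments */
  MSE = 2.2022;               /* planning value: 90% upper limit for sigma^2 */
  n = v*r;
  df = n - v;                 /* error degrees of freedom */
  prob = 1 - alpha;
  qT = probmc("range", ., prob, df, v);   /* studentized range critical value */
  msd = (qT/2**0.5)*(MSE*2/r)**0.5;       /* Tukey half-width for a pairwise difference */
  lines;
20 30 40
;
proc print data=q;
run;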
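The msd statement in the program is the Tukey half-width for a pairwise difference, \((q/\sqrt{2})\sqrt{2\,MSE/r}\). The two \(\sqrt2\) factors cancel, which makes a hand check of the output easy. For example, using the value \(q_{4,116,0.05}=3.68638\) that SAS reports for \(r=30\):

\[
msd=\frac{q_{v,n-v,0.05}}{\sqrt2}\sqrt{\frac{2\,MSE}{r}}=q_{v,n-v,0.05}\sqrt{\frac{MSE}{r}},\qquad
3.68638\sqrt{\frac{2.2022}{30}}\approx 0.9988.
\]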

The SAS output is reproduced below.

Obs	r	alpha	v	MSE	n	df	prob	qT	msd
1	20	0.05	4	2.2022	80	76	0.95	3.71485	1.23269
2	30	0.05	4	2.2022	120	116	0.95	3.68638	0.99878
3	40	0.05	4	2.2022	160	156	0.95	3.67263	0.86174
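The output shows that \(r=30\) meets the target half-width of 1, while \(r=20\) does not. A short argument shows that no value between them works either. The studentized range critical value \(q_{4,\,4r-4,\,0.05}\) decreases as the error degrees of freedom \(4r-4\) increase, so \(q_{4,4r-4,0.05}\ge q_{4,116,0.05}=3.68638\) whenever \(r\le 30\). Therefore

\[
q_{4,\,4r-4,\,0.05}\sqrt{\frac{2.2022}{r}}\le 1
\iff
r\ge 2.2022\,q_{4,\,4r-4,\,0.05}^2\ge 2.2022\times 3.68638^2\approx 29.93 .
\]

Hence \(r=30\) observations per treatment (\(n=120\)) is the smallest equal sample size that achieves the target under the planning value \(\sigma^2=2.2022\).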

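Now read Example 4.5.1 about the bean-soaking experiment. We revise the code slightly: it uses \(v = 5\) treatments and a planning value of 10 for \(\sigma^2\), and it tries several values of \(r\) at once so that the smallest adequate sample size is easy to read off.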

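data size;
  input r @@;                 /* candidate sample sizes per treatment */
  alpha = 0.05;
  v = 5;                      /* number of treatments */
  MSE = 10;                   /* planning value for sigma^2 */
  n = v*r;
  df = n - v;                 /* error degrees of freedom */
  prob = 1 - alpha;
  qT = probmc("range", ., prob, df, v);   /* studentized range critical value */
  msd = (qT/2**0.5)*(MSE*2/r)**0.5;       /* Tukey half-width for a pairwise difference */
  lines;
10 15 16 17 18 19
;
proc print data=size;
run;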

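The SAS output is below. Each interval has full width \(2\times msd\). A sample size of \(r = 18\) is the smallest one listed that makes this width less than 6 (\(2\times 2.93797\approx 5.88\), whereas \(r=17\) gives \(2\times 3.02723\approx 6.05\)). Because msd decreases as \(r\) increases, no smaller \(r\) would work either.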

Obs	r	alpha	v	MSE	n	df	prob	qT	msd
1	10	0.05	5	10	50	45	0.95	4.01842	4.01842
2	15	0.05	5	10	75	70	0.95	3.96001	3.23334
3	16	0.05	5	10	80	75	0.95	3.95308	3.12519
4	17	0.05	5	10	85	80	0.95	3.94703	3.02723
5	18	0.05	5	10	90	85	0.95	3.94170	2.93797
6	19	0.05	5	10	95	90	0.95	3.93696	2.85617
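The same conclusion can be checked by hand. With \(v=5\), the planning value 10 for \(\sigma^2\), and a full width \(2\times msd\) of at most 6, the requirement is

\[
2\,q_{5,\,5r-5,\,0.05}\sqrt{\frac{10}{r}}\le 6
\iff
r\ge\frac{10}{9}\,q_{5,\,5r-5,\,0.05}^2.
\]

Using the critical values from the SAS output:

\[
r=17:\ \tfrac{10}{9}(3.94703)^2\approx 17.31>17,\qquad
r=18:\ \tfrac{10}{9}(3.94170)^2\approx 17.26\le 18.
\]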

Hao Zhang