Comparison of Means

The F-test in ANOVA tests only for the equality of all the treatment means. What about specific differences among the means?

Example: The battery experiment. The four treatments are coded as 1 = alkaline, name brand; 2 = alkaline, store brand; 3 = heavy duty, name brand; 4 = heavy duty, store brand. Four batteries of each treatment were tested to obtain the lifetime per unit cost. The F-test has a p-value < 0.0001, which indicates a significant difference among the treatment means. However, we might be interested in the following differences as well:

\(\mu_2-\mu_1, \mu_2-\mu_3, \mu_2-\mu_4\) or possibly \(\frac{\mu_1+\mu_2}{2}-\frac{\mu_3+\mu_4}{2}\).

Each is a contrast, and each has a specific interpretation.

Contrast

Recall:

\(\mu_i\): treatment mean; \(\tau_i\): treatment effect.

A linear combination \(\sum_{i=1}^v c_i\mu_i\) is a contrast if \(\sum_i c_i=0\). \((c_1, c_2, \ldots, c_v)\) is the vector of contrast coefficients.

For a contrast, \[ \sum_{i=1}^v c_i\mu_i=\sum_{i=1}^v c_i\tau_i. \] A contrast can be written in terms of treatment means or treatment effects. It is often expressed in terms of treatment effects, particularly for two-way or multi-way models.
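This identity holds because \(\mu_i=\mu+\tau_i\) in the one-way model, so whenever \(\sum_i c_i=0\),
\[ \sum_{i=1}^{v} c_i\mu_i=\mu\sum_{i=1}^{v} c_i+\sum_{i=1}^{v} c_i\tau_i=\sum_{i=1}^{v} c_i\tau_i. \]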

Common Types of Contrast

Pairwise Comparisons

A pairwise comparison is a difference between any two treatment effects, \(\tau_i-\tau_j\). The contrast coefficient vector is \((0,0, \ldots, 1,0, \ldots,-1,0, \ldots)\), where the 1 and \(-1\) are in the \(i\)th and \(j\)th positions, respectively.

Q: If there are \(v=4\) treatments, how many pairwise comparisons need to be investigated?

Treatment Versus Control

If treatment 1 is the control, then the treatment-versus-control contrasts are \(\tau_2-\tau_1, \ldots, \tau_v-\tau_1\).

Difference of Averages

When the treatments divide naturally into two or more groups and the experimenter is interested in the difference of group averages, difference-of-averages contrasts are used. For example, in the pedestrian light experiment, it is of interest to use the contrast \(\left(\tau_2+\tau_3+\tau_4\right) / 3-\tau_1\).

Trend Contrast

Used when the levels of the treatment factor are quantitative and have a natural ordering, as in the heart-lung pump experiment. Table A.2 in the textbook lists contrast coefficients for orthogonal polynomial trend contrasts when the levels are equally spaced and the numbers of replicates are equal.

These coefficients are obtained by fitting a regression model to the uncoded levels of the treatment factor.
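For example, with \(v=5\) equally spaced levels and equal replication, the standard linear and quadratic trend contrast coefficients are \((-2,-1,0,1,2)\) and \((2,-1,-2,-1,2)\), respectively.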

Formula for Linear Trend Contrast: Suppose the treatment factor has \(v\) quantitative levels \(x_1, x_2, \ldots, x_v\) in ascending order. The contrast coefficients are

\[ c_i=r_i\left(x_i-\bar{x}_{\cdot}\right), \text{ where } \bar{x}_{\cdot} =\left(\Sigma r_i x_i\right) / n \]

where \(r_i\) is the number of observations taken on the \(i\) th uncoded level \(x_i\) of the treatment factor, and \(n=\Sigma r_i\).

Example: The five treatment levels are 50, 75, 100, 125, 150 revolutions per minute, but the numbers of replicates for the treatments are \(r_1=r_3=r_5=5\), \(r_2=3\), and \(r_4=2\). Now \(n=\sum r_i=20\), and

\[ \bar x_{\cdot}=\left(\sum r_i x_i\right) / n=(5(50)+3(75)+5(100)+2(125)+5(150))/20=98.75. \]

The contrast coefficients are \((-243.75, -71.25, 6.25, 52.50, 256.25)\).
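These coefficients can be reproduced with a small SAS data step; this is a minimal sketch (not textbook code) that applies the formula \(c_i=r_i\left(x_i-\bar{x}_{\cdot}\right)\) to the levels and replicate counts above:

* linear trend contrast coefficients c_i = r_i*(x_i - xbar.);
data trendco;
 array x{5} _temporary_ (50 75 100 125 150); * uncoded levels;
 array r{5} _temporary_ (5 3 5 2 5);         * numbers of replicates;
 n = 0; sumrx = 0;
 do i = 1 to 5;
   n = n + r{i};
   sumrx = sumrx + r{i}*x{i};
 end;
 xbar = sumrx/n;                 * weighted mean = 98.75;
 do i = 1 to 5;
   c = r{i}*(x{i} - xbar);       * -243.75, -71.25, 6.25, 52.50, 256.25;
   output;
 end;
 keep i xbar c;
run;
proc print data=trendco; run;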

Inferences for Individual Contrasts and Treatment Means

Least squares estimates

A contrast \(\sum c_i \tau_i\) or \(\sum c_i \mu_i\) is estimated by \(\sum c_i \bar Y_{i\cdot}\), which has a normal distribution with mean \(\sum c_i \tau_i\) and variance \(\sigma^2\sum_i c_i^2/r_i\). The variance \(\sigma^2\) is estimated by \(MSE\), which has \(n-v\) degrees of freedom. Hence inferences about the contrast involve the \(t_{n-v}\) distribution.
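The variance formula follows because the treatment sample means \(\bar Y_{1\cdot}, \ldots, \bar Y_{v\cdot}\) are independent with \(\operatorname{Var}(\bar Y_{i\cdot})=\sigma^2/r_i\):
\[ \operatorname{Var}\Big(\sum_i c_i \bar Y_{i\cdot}\Big)=\sum_i c_i^2 \operatorname{Var}(\bar Y_{i\cdot})=\sigma^2\sum_i \frac{c_i^2}{r_i}. \]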

Confidence Interval

A 100(1-\(\alpha\))% confidence interval for \(\sum_ic_i\mu_i\) is

\[ \sum c_i \bar{Y}_{i \cdot} \pm t_{n-v, \alpha / 2} \sqrt{MSE \sum c_i^2 / r_i}. \]

This is derived from \(\sum c_i \bar{Y}_{i \cdot}\sim N(\sum c_i \mu_i, \sigma^2 \sum c_i^2 / r_i )\). It applies to any linear combination \(\sum_ic_i\mu_i\) (e.g., each individual \(\mu_i\)).

Example: In the battery experiment, find a \(95 \%\) confidence interval for \(\mu_2-\mu_1\). The point estimate is \(\bar{y}_{2 .}-\bar{y}_{1 .}=289.75\). The standard error of the estimator is \(\sqrt{M S E *\left(\frac{1}{r_2}+\frac{1}{r_1}\right)}=\sqrt{2367.7083 *\left(\frac{1}{4}+\frac{1}{4}\right)}=34.41\). The number of degrees of freedom is \(n-v=12\), and \(t_{12,0.025}=2.179\). The minimum significant difference is \(m s d=2.179 * 34.41=74.98\). The confidence interval is \(289.75 \pm 74.98\) or \((214.77, 364.73)\).
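For reference, this interval can be reproduced with a SAS data step in the same style as the tinv example given later in these notes; this is a sketch, with MSE and the estimate taken from the values quoted above:

data ci;
 mse = 2367.7083;               * mean square for error from the ANOVA;
 est = 289.75;                  * ybar2. - ybar1.;
 se  = sqrt(mse*(1/4 + 1/4));   * standard error with r1 = r2 = 4;
 tcv = tinv(1 - 0.05/2, 12);    * t_{12, 0.025} = 2.179;
 msd = tcv*se;                  * minimum significant difference = 74.98;
 lower = est - msd;
 upper = est + msd;
proc print;
run;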

Hypothesis Testing

\[H_0: \sum c_i\tau_i=0, \qquad H_1: \sum c_i\tau_i\ne 0.\] Reject \(H_0\) if \[ \left|\frac{\sum c_i \bar{Y}_{i\cdot}}{\sqrt{MSE \sum c_i^2 / r_i}}\right|>t_{n-v, \alpha / 2}. \]
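For the contrast \(\mu_2-\mu_1\) in the battery experiment, for example, the test statistic is \(289.75/34.41 \approx 8.42\), which exceeds \(t_{12, 0.025}=2.179\), so \(H_0\) is rejected. A sketch of the corresponding two-sided p-value computation in SAS, using the numbers above:

data ttest;
 t = 289.75/34.41;               * estimate divided by its standard error;
 p = 2*(1 - probt(abs(t), 12));  * two-sided p-value with 12 error df;
proc print;
run;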

Multiple Comparisons

Multiple Confidence Intervals

Suppose we want several pairwise comparisons, say \(\mu_2-\mu_1, \mu_3-\mu_2, \mu_4-\mu_1\), so we would have several confidence intervals. If each one has \(95 \%\) coverage, the probability that the confidence intervals are simultaneously correct is much smaller than \(95 \%\). We would like to control the probability that they are simultaneously correct. Several methods have been developed; the resulting intervals are called simultaneous confidence intervals.
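For instance, if the three intervals were independent, the joint coverage would be only \(0.95^3\approx 0.857\); in general, the joint coverage is only guaranteed to be at least \(1-3(0.05)=0.85\) by the Bonferroni inequality.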

Multiple Comparison Methods

  1. Bonferroni method for preplanned comparisons: Applies to any \(m\) preplanned estimable contrasts. Can be used for any design. Cannot be used for data snooping (that is, choosing the contrasts that the data suggest are highly significant).

  2. Scheffé method for all comparisons: Applies to any number of contrasts. Can be used for any design, and allows for data snooping.

  3. Tukey method for all pairwise comparisons: Best for all pairwise comparisons. Can be used for completely randomized designs, randomized block designs, and balanced incomplete block designs.

  4. Dunnett method for treatment-versus-control contrasts: Best for all treatment-versus-control contrasts. Can be used for completely randomized designs, randomized block designs, and balanced incomplete block designs.

  5. Hsu method for multiple comparisons with the best treatment: Selects the best treatment and identifies those treatments that are significantly worse than the best. Can be used for completely randomized designs, randomized block designs, and balanced incomplete block designs.

Bonferroni method:

Suppose there are \(m\) preplanned contrasts and we want a set of \(m\) simultaneous \(100(1-\alpha) \%\) confidence intervals. The Bonferroni method makes each individual interval a \(100\left(1-\frac{\alpha}{m}\right) \%\) confidence interval, which uses the upper \(\alpha/(2m)\) critical value. Hence the intervals are given by

\[\text{estimate} \pm t_{n-v, \alpha /(2 m)} \times \text{standard error}.\]

A larger critical value is used.

Example: For the Battery Experiment, find the simultaneous \(95 \%\) confidence intervals for \(\mu_2-\mu_1, \mu_2-\mu_3, \mu_2-\mu_4\) using the Bonferroni method. Here \(m=3\), and the critical value is \(t_{12,0.05 /(2 * 3)}=2.779\). This critical coefficient is found in SAS by the function tinv:

data;
 prob = 1 - 0.05/6;     * 1 - alpha/(2m) with alpha = 0.05 and m = 3;
 df = 12;               * error degrees of freedom n - v;
 tcv = tinv(prob, df);  * critical coefficient t_{12, 0.05/6} = 2.779;
proc print;
run;

The standard error of the difference between two sample means is \(\sqrt{M S E * \frac{2}{r}}=48.67 / \sqrt{2}= 34.41\) (why?). The margin of error is \(2.779 * 34.41=95.63\).

The estimates of \(\mu_2-\mu_1, \mu_2-\mu_3, \mu_2-\mu_4\) are \(\bar{y}_{2 .}-\bar{y}_{1 .}=289.75, \bar{y}_{2 .}-\bar{y}_{3 .}=427.50, \bar{y}_{2 .}-\bar{y}_{4 .}=364.25\). Hence the simultaneous 95% confidence intervals are \(289.75 \pm 95.63,427.50 \pm 95.63\) and \(364.25 \pm 95.63 .\)

Note these confidence intervals have the same margin of error because of equal sample sizes.
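For reference, Bonferroni-adjusted intervals can also be requested in PROC GLM by adding an LSMEANS statement with ADJUST=BON to the program given at the end of these notes; note that this adjusts for all \(m=6\) pairwise differences rather than only the three preplanned contrasts above, so the intervals are wider:

 lsmeans TYPEBAT/cl pdiff adjust=bon alpha=0.05;  * Bonferroni SCIs for all pairwise differences;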

Two drawbacks:

  1. It is dangerous to apply the Bonferroni method after the data are collected, because one might choose some comparisons that appear to be significant. This is called data snooping.
  2. The confidence intervals become wider as \(m\) increases.

Scheffé Method

Applies to all contrasts. The simultaneous confidence intervals for \(\sum_i c_i \mu_i\) are given by \[ \sum_i c_i \bar{y}_{i \cdot} \pm \sqrt{(v-1) F_{v-1, n-v, \alpha}} \sqrt{\operatorname{MSE} \sum_i c_i^2 / r_i}. \]

It is based on the following result due to Henry Scheffé (1953): let \[ K=\max _{\sum c_i=0} \frac{\left(\sum c_i \bar{Y}_{i \cdot}-\sum c_i \mu_i\right)^2}{M S E \sum c_i^2 / r_i} . \] Then \[ \frac{K}{v-1} \sim F_{v-1, n-v}, \] where \((v-1)\) is the dimension of the contrasts (the number of independent contrasts), and \((n-v)\) is the degrees of freedom of MSE. Therefore we have \[ P\left(K \leq(v-1) F_{v-1, n-v, \alpha}\right)=1-\alpha, \text { or } P\left(\sqrt{K} \leq \sqrt{(v-1) F_{v-1, n-v, \alpha}}\right)=1-\alpha . \] Equivalently, \[ P\left(\left|\sum c_i \bar{Y}_{i\cdot}-\sum c_i \mu_i\right| \leq \sqrt{(v-1) F_{v-1, n-v, \alpha}} \sqrt{M S E * \sum\left(\frac{c_i^2}{r_i}\right)} \text { for all } \sum c_i=0\right)=1-\alpha. \]

Scheffé’s method usually produces wide intervals. Use it only when the number of contrasts \(m\) is very large.

Tukey’s method for all pairwise comparisons

The best method for all pairwise comparisons. It is based on \[ Q=\frac{\max _{i, j}\left|\left(\bar{Y}_{i .}-\bar{Y}_{j .}\right)-\left(\mu_i-\mu_j\right)\right|}{\sqrt{M S E / r}} \] where \(r\) is the common sample size. For each single pair, \(\left[\left(\bar{Y}_{i\cdot}-\bar{Y}_{j\cdot}\right)-\left(\mu_i-\mu_j\right)\right]/\sqrt{2\,MSE/r}\) has a \(t_{n-v}\) distribution. The maximum \(Q\) has the Studentized Range Distribution, whose upper critical value is denoted by \(q_{v, n-v, \alpha}\) and is provided in Table A.8.

The confidence interval is given by \[ \text{estimate} \pm \frac{1}{\sqrt{2}} q_{v, n-v, \alpha} * \text{(standard error)}. \]

Example: In the Battery Experiment, \(v=4\), \(n-v=12\), \(\frac{q_{v, n-v, 0.05}}{\sqrt{2}}=\frac{4.20}{\sqrt{2}}=2.970\). The critical coefficient for the Tukey SCIs for all pairwise comparisons is \(2.97\).

Using the Bonferroni method, since there are \(m=6\) pairwise comparisons in all, the critical coefficient is \(t_{12,0.05 /(2 m)}=3.152\).

For Scheffé’s method, the critical coefficient is \[ \sqrt{(v-1) F_{v-1, n-v, 0.05}}=\sqrt{(4-1) * 3.49}=3.24 \text {. } \]
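These three critical coefficients can be verified in a SAS data step; this is a sketch using the TINV and FINV quantile functions and the PROBMC function with the 'RANGE' (Studentized range) distribution:

data critval;
 v = 4; dfe = 12; alpha = 0.05;
 tukey   = probmc('RANGE', ., 1-alpha, dfe, v)/sqrt(2);  * q_{4,12,0.05}/sqrt(2) = 2.97;
 bonf    = tinv(1 - alpha/12, dfe);                      * t_{12,0.05/12} = 3.15 for m = 6 pairs;
 scheffe = sqrt((v-1)*finv(1-alpha, v-1, dfe));          * sqrt(3*F_{3,12,0.05}) = 3.24;
proc print;
run;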

Tukey’s method is the best for ALL pairwise comparisons. If only some, rather than all, pairwise comparisons are needed, Tukey’s method may not be the best one.

Dunnett Method for Treatment-versus-Control Comparisons

Let treatment 1 be the control; then the \((v-1)\) treatment-versus-control contrasts are \(\mu_i-\mu_1\), \(i=2, \ldots, v\). Dunnett (1955) developed a multiple comparison method based on the joint distribution of the estimators \(\bar{Y}_{i\cdot}-\bar{Y}_{1\cdot}\), \(i=2, \ldots, v\), which is a multivariate \(t\)-distribution and depends on the correlations between the differences \(\bar{Y}_{i\cdot}-\bar{Y}_{1\cdot}\). When all sample sizes are equal, the critical coefficient \(w_D\) is given in Table A.10 for two-sided SCIs and Table A.9 for one-sided SCIs. The one-sided SCIs are used to select those treatments that have higher means than the control.
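With equal sample sizes, the two-sided Dunnett critical coefficient can also be computed in SAS with the PROBMC function and the 'DUNNETT2' distribution; this is a sketch, assuming that omitting the optional parameters corresponds to the equal-sample-size case and that nparms is the number of treatment-versus-control comparisons:

data dunnett;
 wd = probmc('DUNNETT2', ., 0.95, 12, 3);  * two-sided critical coefficient, v-1 = 3 comparisons, 12 error df;
proc print;
run;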

SAS Example

In the battery experiment, there are two treatment factors, each having two levels. These are battery “duty” (level \(1=\) alkaline, level 2 = heavy duty) and “brand” (level \(1=\) name brand, level \(2=\) store brand). This gives four treatment combinations coded 11, 12, 21, 22. We will recode these treatment combinations as \(1,2,3,4\). Thus, the levels of battery type are:

Level Treatment Combination
1 alkaline, name brand (11)
2 alkaline, store brand (12)
3 heavy duty, name brand (21)
4 heavy duty, store brand (22)
* battery.sas, battery experiment, Table 4.2 (page 94);
options ls=85;
DATA BATTERY;
 INPUT TYPEBAT LIFEUC ORDER;
 LINES;
  1 611 1 
  2 923 2 
  1 537 3 
  4 476 4 
  1 542 5 
  1 593 6 
  2 794 7 
  3 445 8 
  4 569 9 
  2 827 10 
  2 898 11 
  3 490 12 
  4 480 13 
  3 384 14 
  4 460 15 
  3 413 16
;
run;
PROC GLM DATA=BATTERY;
 CLASSES TYPEBAT;
 MODEL LIFEUC = TYPEBAT;
 ESTIMATE 'DUTY'
         TYPEBAT   1  1 -1 -1 /DIVISOR = 2;  * (mu1+mu2)/2 - (mu3+mu4)/2: alkaline vs heavy duty;
 ESTIMATE 'BRAND'
         TYPEBAT   1 -1  1 -1 /DIVISOR = 2;  * (mu1+mu3)/2 - (mu2+mu4)/2: name brand vs store brand;
 lsmeans TYPEBAT/cl pdiff adjust=TUKEY alpha=0.05;      * Tukey SCIs for all pairwise comparisons;
 lsmeans TYPEBAT/cl pdiff=control('2') adjust=Dunnett;  * two-sided Dunnett SCIs with treatment 2 as control;
 lsmeans TYPEBAT/pdiff=controll('2') adjust=Dunnett;    * one-sided Dunnett comparisons with treatment 2 as control;
run;