Two sample t-test with SAS

  1. Idea and demo example
  2. Assumptions
  3. Compare two independent samples with t-test

Idea and demo example

The idea of the two-sample t-test is to compare two population means by comparing two independent samples. A common experimental design has a test condition and a control condition, with each subject randomly assigned to one of them. One variable is then measured and compared between the two conditions (samples).

Suppose a study compares two study methods to see whether they improve grades differently. There is a new method (treatment, or t) and a standard method (control, or c). Each participant is randomly assigned to one of the two methods. After training with the assigned method, the participant's performance is measured as a grade. The data set is “reading.csv”. The question is whether the two methods lead to different grades. The model for this problem is

          Grade (continuous) ~ method (categorical: 2 levels)

Read the data set into SAS.

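data read;
	/* Reserve room for the full label: list input otherwise keeps only 8 characters, which would cut "treatment" short */
	length method $ 10;
	/* Comma-delimited file; firstobs=2 skips the header row */
	infile "H:\sas\data\reading.csv" dlm=',' firstobs=2;
	input method $ grade;
run;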
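
Before testing, it can help to confirm that the file was read as intended. The step below is a minimal sketch that assumes the data set read and the variables method and grade created above; it prints the rows and summarizes grade by method.

proc print data=read;
run;

proc means data=read n mean std min max;
	class method;
	var grade;
run;

If the import worked, each method should appear as its own row in the summary, with its label intact.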

Checking assumptions

Two sample t-test assumes that

  1. There is one continuous dependent variable and one categorical independent variable (with 2 levels);
  2. The two samples are independent;
  3. Each sample comes from a normally distributed population, which can be verified with a normality check.
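
One way to carry out the normality check in item 3 is PROC UNIVARIATE with the NORMAL option, applied to each method separately. This is only a sketch, assuming the data set read from the demo example; the combination of normality tests and a Q-Q plot is an illustration, not necessarily the check used for this example.

proc univariate data=read normal;
	class method;
	var grade;
	/* Q-Q plot against a normal distribution with estimated mean and standard deviation */
	qqplot grade / normal(mu=est sigma=est);
run;

In the "Tests for Normality" table, a Shapiro-Wilk p-value above the chosen significance level gives no evidence against normality for that group. With small samples such tests have little power, so the plot is worth a look as well.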

When the assumptions are not met, other tests may be appropriate, depending on the two samples:

  • Two dependent samples that follow a normal distribution: use the paired t-test;
  • Two independent samples that do not follow a normal distribution: use the Wilcoxon-Mann-Whitney (WMW) test;
  • Two dependent samples that do not follow a normal distribution: use the Wilcoxon signed-rank test.
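
For reference, the alternatives listed above can be requested roughly as follows. This is a sketch: the first step reuses the data set read, while the data set pairs and its variables before and after are hypothetical names standing in for data measured twice on the same subjects.

/* Wilcoxon-Mann-Whitney test for two independent samples */
proc npar1way data=read wilcoxon;
	class method;
	var grade;
run;

/* Paired t-test; pairs, before and after are hypothetical names */
proc ttest data=pairs;
	paired before*after;
run;

/* Signed-rank test on the paired differences (same hypothetical data) */
data pairdiff;
	set pairs;
	diff = before - after;
run;

proc univariate data=pairdiff;
	var diff;
run;

PROC UNIVARIATE reports the signed-rank statistic in its "Tests for Location" table, next to the Student's t and sign tests.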

In this demo example, the two samples (control and treatment) are independent and pass the normality check, so we continue with the two-sample t-test. Note that the test is two-sided (sides=2), the significance level is 0.05 (alpha=0.05), and the test compares the difference between the two means (mu1 - mu2) against 0 (h0=0).

Compare two independent samples

 
proc ttest data=read sides=2 alpha=0.05 h0=0;
	title "Two sample t-test example";
	class method;
	var grade;
run;

Reading the output

          two sample t example                                                                                                       

                                       The TTEST Procedure
 
                                        Variable:  Grade

          Method         N        Mean     Std Dev     Std Err     Minimum     Maximum

          control        5     88.6000      7.3007      3.2650     80.0000     98.0000
          treatment      5       101.6      2.0736      0.9274     99.0000       104.0
          Diff (1-2)          -13.0000      5.3666      3.3941                        

  Method        Method               Mean       95% CL Mean        Std Dev      95% CL Std Dev

  control                         88.6000     79.5350  97.6650      7.3007      4.3741  20.9789
  treatment                         101.6     99.0252    104.2      2.0736      1.2424   5.9587
  Diff (1-2)    Pooled           -13.0000    -20.8268  -5.1732      5.3666      3.6249  10.2811
  Diff (1-2)    Satterthwaite    -13.0000    -21.9317  -4.0683                                 

                   Method           Variances        DF    t Value    Pr > |t|

                   Pooled           Equal             8      -3.83      0.0050
                   Satterthwaite    Unequal      4.6412      -3.83      0.0141

                                      Equality of Variances
 
                        Method      Num DF    Den DF    F Value    Pr > F

                        Folded F         4         4      12.40    0.0318
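
The test statistics in this output can be reproduced from the summary rows alone, which shows where the "Pooled" and "Satterthwaite" lines come from. The DATA step below is a sketch that plugs in the N, Mean and Std Dev values printed above; the variable names are chosen here for illustration, and small differences in the last digit are possible because the inputs are already rounded.

data _null_;
	/* Summary statistics copied from the output: group 1 = control, group 2 = treatment */
	n1 = 5; m1 = 88.6;  s1 = 7.3007;
	n2 = 5; m2 = 101.6; s2 = 2.0736;

	/* Pooled (equal variances) */
	sp        = sqrt(((n1-1)*s1**2 + (n2-1)*s2**2) / (n1 + n2 - 2));
	se_pooled = sp * sqrt(1/n1 + 1/n2);
	t_pooled  = (m1 - m2) / se_pooled;
	df_pooled = n1 + n2 - 2;
	p_pooled  = 2 * (1 - probt(abs(t_pooled), df_pooled));

	/* Satterthwaite (unequal variances) */
	v1     = s1**2 / n1;
	v2     = s2**2 / n2;
	t_satt = (m1 - m2) / sqrt(v1 + v2);
	df_satt = (v1 + v2)**2 / (v1**2/(n1-1) + v2**2/(n2-1));
	p_satt = 2 * (1 - probt(abs(t_satt), df_satt));

	/* Folded F: larger variance over smaller variance, two-sided p-value */
	/* The degrees of freedom are equal here; in general the numerator DF belongs to the group with the larger variance */
	f   = max(s1, s2)**2 / min(s1, s2)**2;
	p_f = 2 * (1 - probf(f, n1-1, n2-1));

	put sp= 8.4 se_pooled= 8.4 t_pooled= 8.2 df_pooled= p_pooled= 8.4;
	put t_satt= 8.2 df_satt= 8.4 p_satt= 8.4;
	put f= 8.2 p_f= 8.4;
run;

Because both groups have the same size, the pooled and unequal-variance standard errors coincide, which is why the two t values agree and only the degrees of freedom and p-values differ.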



Note that the output reports both a "Pooled" and a "Satterthwaite" result. Which one to read depends on whether the two sample variances can be treated as equal. The test on Equality of Variances is given at the end of the output and is repeated below.

                                      Equality of Variances
 
                        Method      Num DF    Den DF    F Value    Pr > F

                        Folded F         4         4      12.40    0.0318
 
 
 

Some people use a simple rule here:

  • When the p-value (shown under "Pr > F") is greater than 0.05, treat the variances as equal and read the "Pooled" rows of the result;
  • When the p-value (shown under "Pr > F") is no more than 0.05, treat the variances as unequal and read the "Satterthwaite" rows of the result.
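
To apply this rule in a program rather than by eye, the two tables can be captured as data sets with ODS OUTPUT. This is a sketch: Equality and TTests are the ODS table names PROC TTEST uses for these tables, while eq and tt are arbitrary data set names chosen here.

ods output Equality=eq TTests=tt;
proc ttest data=read sides=2 alpha=0.05 h0=0;
	class method;
	var grade;
run;

/* Folded F test for equality of variances */
proc print data=eq;
run;

/* Pooled and Satterthwaite t tests, one row each */
proc print data=tt;
run;

Printing the captured data sets shows the column names, which can then be used in a later step to pick the row that matches the rule.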

In this example, the p-value = 0.0318 < 0.05, so we should read the "Satterthwaite" rows. For example:

  • The 95% confidence interval for the difference (control - treatment) is (-21.9317, -4.0683);
  • For the hypothesis test comparing control and treatment, the t-value is -3.83 and the p-value is 0.0141.

The conclusion is to reject the null hypothesis: the reading grades of the two methods are significantly different.

Note that SAS performs a two-sided test here, meaning the alternative hypothesis is that the two group means differ in either direction. If one wants to test whether one group mean is greater (or smaller) than the other, the two-sided p-value can be divided by 2, provided the observed difference lies in the hypothesized direction. For example, the treatment mean is higher than the control mean, and p-value/2 = 0.0141/2 = 0.007 < 0.05, hence the conclusion for the one-sided test is to reject the null hypothesis: the new method improves the grades.
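
Instead of halving the p-value by hand, the one-sided test can be requested directly through the SIDES= option. The sketch below assumes the same data set read and relies on the group order shown in the output, where the difference is control minus treatment; sides=L then tests the alternative that this difference is below 0, that is, that the treatment mean is higher.

proc ttest data=read sides=L alpha=0.05 h0=0;
	title "One-sided two sample t-test example";
	class method;
	var grade;
run;

With sides=L, SAS reports the one-sided p-value itself, so no division is needed; sides=U would test the opposite direction.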