One sample t-test with SAS

  1. Explain the question with an example
  2. One sample t-test procedure

Explain the question with an example

When studying a single continuous variable, one can generally ask two questions:

  • Based on the sample mean, what is a confidence interval for the population mean?
  • Based on the sample, how can a hypothesis about the population mean be tested?

The first is a confidence interval problem and the second is a hypothesis test problem. Here is an example. College-aged adults need at least 7 hours of sleep each night to stay healthy. Sleep deprivation can lead to decreased immune system function, lack of concentration, and poor memory. In the data set sleep.csv, a simple random sample of college students report the number of hours of sleep they had last night.

  • What is a 90% confidence interval for the population average sleeping time, based on the sample?
  • Is there evidence that the true mean hours of sleep for the population of college students is different from the recommended 7 hours?

Both problems can be answered with the same SAS procedure, PROC TTEST.

Here is the standard procedure

Open the data set in SAS, or import it with the following DATA step.

 
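  data sleeptime;
      /* Read the CSV file, skipping the header row */
      infile "H:\sas\data\sleep.csv" dlm=',' firstobs=2;
      /* One numeric column: hours of sleep last night */
      input time;
  run;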
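
As an alternative to the DATA step, the same file could probably be read with PROC IMPORT. The sketch below is not part of the original procedure. It assumes that the first row of sleep.csv holds the column name; the variable in the resulting data set takes its name from that header, so it may not be called time and the later steps may need adjusting.

```sas
/* Sketch: import the CSV with PROC IMPORT instead of a DATA step.  */
/* Assumption: the first row of sleep.csv holds the column name.    */
proc import datafile="H:\sas\data\sleep.csv"
            out=sleeptime
            dbms=csv
            replace;
    getnames=yes;
run;

/* Check the variable name and the first few rows before analysis */
proc print data=sleeptime(obs=5);
run;
```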

The t-test can then be requested as follows.

  proc ttest data=sleeptime sides=2 alpha=0.1 h0=7;
      var time;
  run;
   

The VAR statement indicates that the time variable is being studied, while the H0= option specifies that the mean of the time variable should be compared to the null value 7 rather than the default of 0. The SIDES=2 option reflects the focus of the research question: whether the mean sleep time is different from 7 hours, rather than specifically less than or greater than 7 hours (in which case you would set SIDES=L or SIDES=U, respectively, for a one-sided test and a one-sided confidence interval). The ALPHA=0.1 option sets the significance level to 0.1, so the confidence limits are 90% limits. The default significance level is 0.05.
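
For comparison, a one-sided version of the same request might look like the sketch below. It is an illustration rather than part of this analysis: it assumes the research question were whether students sleep less than 7 hours on average, which calls for SIDES=L.

```sas
/* Sketch: lower-tailed test of H0: mean = 7 against H1: mean < 7. */
/* Assumes the sleeptime data set created above.                   */
proc ttest data=sleeptime sides=L alpha=0.1 h0=7;
    var time;
run;
```

With SIDES=L the confidence interval for the mean becomes one-sided as well, running from minus infinity to an upper limit.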

Reading the output of the t test

				The TTEST Procedure

                                         Variable:  time

                   N        Mean     Std Dev     Std Err     Minimum     Maximum

                  30      6.4500      1.4839      0.2709      3.5000      9.0000

                     Mean       90% CL Mean        Std Dev      90% CL Std Dev

                   6.4500      5.9897   6.9103      1.4839      1.2249   1.8989

                                      DF    t Value    Pr > |t|

                                      29      -2.03      0.0516
                        

Interpreting the result of the t-test

Summary statistics appear at the top of the output. The sample size (N), mean, standard deviation, and standard error are displayed with the minimum and maximum values of the time variable. The 90% confidence limits for the mean and standard deviation are shown next. Because of the SIDES=2 option, the interval for the mean is a two-sided interval, (5.9897, 6.9103) hours.

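At the bottom of the output are the degrees of freedom, the t statistic, and the p-value for the test. At the 0.1 significance level, the test indicates that the mean sleep time is significantly different from 7 hours (t = -2.03, p-value = 0.0516 < 0.1). This agrees with the 90% confidence interval, which does not contain 7. Note that the same p-value would not lead to rejection at the default 0.05 level.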
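
To see where these numbers come from, the t statistic, p-value, and confidence limits can be recomputed from the summary statistics. The DATA step below is a minimal sketch, not output from the original analysis. The inputs are copied from the PROC TTEST listing, and the variable names (xbar, se, tcrit, and so on) are chosen here for illustration.

```sas
/* Sketch: recompute the t statistic, p-value, and 90% limits by hand. */
/* The inputs are copied from the PROC TTEST output above.             */
data _null_;
    n     = 30;       /* sample size               */
    xbar  = 6.45;     /* sample mean               */
    s     = 1.4839;   /* sample standard deviation */
    mu0   = 7;        /* null value (H0=7)         */
    alpha = 0.1;      /* significance level        */

    df    = n - 1;
    se    = s / sqrt(n);                   /* standard error of the mean */
    t     = (xbar - mu0) / se;             /* t statistic                */
    p     = 2 * (1 - probt(abs(t), df));   /* two-sided p-value          */
    tcrit = tinv(1 - alpha/2, df);         /* critical t value           */
    lower = xbar - tcrit * se;             /* lower 90% limit            */
    upper = xbar + tcrit * se;             /* upper 90% limit            */

    put se= 6.4 t= 6.2 p= 6.4 lower= 8.4 upper= 8.4;
run;
```

The values written to the log should agree with the PROC TTEST output up to rounding.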

Checking assumption for the t-test

The one-sample t-test assumes that the data are normally distributed. PROC UNIVARIATE with the NORMAL option can be used to check the normality assumption numerically.
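
A minimal sketch of such a check is shown here. The NORMAL option requests the table of tests for normality. The QQPLOT statement is an addition for illustration, not something the original procedure calls for, and gives a graphical check alongside the numerical one.

```sas
/* Sketch: numerical and graphical checks of normality for time. */
/* Assumes the sleeptime data set created above.                 */
proc univariate data=sleeptime normal;
    var time;
    /* Q-Q plot against a normal distribution with estimated parameters */
    qqplot time / normal(mu=est sigma=est);
run;
```

Small p-values in the tests for normality, or clear curvature in the Q-Q plot, would suggest that the normality assumption is doubtful.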

If the data are not normally distributed, a one-sample test on the median can be used instead, in which case we test whether the median (not the mean) sleeping time is different from 7 hours. In PROC UNIVARIATE, the MU0= option sets the null value for the tests for location, and the LOCCOUNT option adds a table of location counts, as shown below.

	proc univariate data=sleeptime loccount mu0=7;
	    var time;
	run;

From the output:

		Tests for Location: Mu0=7

                        Test           -Statistic-    -----p Value------

                        Student's t    t  -2.03013    Pr > |t|    0.0516
                        Sign           M        -5    Pr >= |M|   0.0987
                        Signed Rank    S       -89    Pr >= |S|   0.0645


                                   Location Counts: Mu0=7.00

                                   Count                Value

                                   Num Obs > Mu0           10
                                   Num Obs ^= Mu0          30
                                   Num Obs < Mu0           20


The tests for location include two nonparametric alternatives to the t-test, the sign test and the signed rank test, and both are displayed together with the t-test. The sign test compares the number of sleep times below 7 hours (20) with the number above 7 hours (10) and tests whether the two counts differ significantly.

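Specifically, the p-value of the sign test is 0.0987. At a significance level of 0.1 the null hypothesis is rejected, meaning that the number of students who slept less than 7 hours differs from the number who slept more than 7 hours; in other words, the median sleeping time differs from 7 hours. The statistic M = -5 is half the difference between the two counts, (10 - 20)/2.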
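
The sign test result can be checked directly from the location counts. The DATA step below is a sketch under the usual assumption that, when the median is 7, each observation falls above 7 with probability 0.5, so the smaller count follows a Binomial(30, 0.5) distribution. The variable names are chosen here for illustration.

```sas
/* Sketch: reproduce the sign test statistic and p-value from the counts. */
data _null_;
    n_above = 10;                  /* Num Obs > Mu0                 */
    n_below = 20;                  /* Num Obs < Mu0                 */
    n       = n_above + n_below;   /* observations not equal to Mu0 */

    m = (n_above - n_below) / 2;   /* sign statistic M */

    /* Two-sided p-value: twice the binomial probability of a count */
    /* at least as small as the smaller of the two observed counts. */
    p = min(1, 2 * cdf('BINOMIAL', min(n_above, n_below), 0.5, n));

    put m= p= 6.4;
run;
```

The values written to the log should match the Sign row of the Tests for Location table.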

Furthermore, the p-value of the signed rank test is 0.0645, so at a significance level of 0.1 the null hypothesis is again rejected. Unlike the sign test, which uses only the direction of each deviation from 7 hours, the signed rank test also uses the ranks of the absolute deviations. It therefore takes into account how far each sleeping time lies from 7 hours, not just how many students fall on each side. The test assumes that the distribution of sleeping times is symmetric, and rejection indicates that the distribution is not centered at 7 hours.