One-way ANOVA: SAS instruction
Procedure demonstrated with an example

The example and data

A t-test can be used to compare two conditions in an experiment. But when there are three or more conditions, t-tests are not appropriate. For example, three reading instructions are given to 15 subjects; then a reading test is given in which the number of words per minute is recorded for each subject. The question is whether the three instructions make any difference to the reading score. Since the test is for the global effect, i.e., "any difference among methods A, B and C", it is sometimes known as the global test. After the global effect is confirmed, further tests are needed to check what the differences are, e.g., "A greater than B or C". These tests are known as multiple comparisons. The data is "words.csv".

Setting up the data

Open the data set from SAS, or import it with the following commands.

    data words;
      /* comma-delimited file; firstobs=2 starts reading at the second line */
      infile "H:\sas\data\words.csv" dlm=',' firstobs=2;
      /* word is numeric; method is character ($) */
      input word method $;
    run;

Analyzing the data, syntax

    proc ANOVA data=words;
      title "Example of one-way ANOVA";
      class method;                      /* grouping (independent) variable */
      model word = method;               /* dependent = independent */
      means method / hovtest welch;      /* group means, variance check, Welch's test */
    run;

The "means" statement generates the mean value of the dependent variable ("word") for each method; the "hovtest" option checks the assumption of homogeneity of variances, and the "welch" option performs Welch's test, which is used when that assumption is not met. More details on assumption checking are given below.
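
Before reading the ANOVA output, it can help to confirm that the import step read the file as intended. The following is only a minimal sketch; it assumes the data set "words" with the variables "word" and "method" created by the data step above, and uses the standard PRINT and MEANS procedures.

```sas
/* Sketch: inspect the imported data before running the ANOVA.      */
/* Assumes the data set WORDS (variables WORD and METHOD) exists.   */

/* List every observation to spot misread values or missing rows. */
proc print data=words;
run;

/* Count, mean and standard deviation of WORD within each METHOD. */
proc means data=words n mean std;
  class method;
  var word;
run;
```

With the example described above, the listing should contain 15 observations, five for each method.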

Reading the output of the global test

    Example of one-way ANOVA

    The ANOVA Procedure

    Dependent Variable: word

                                 Sum of
    Source             DF       Squares    Mean Square   F Value   Pr > F
    Model               2   215613.3333    107806.6667     16.78   0.0003
    Error              12    77080.0000      6423.3333
    Corrected Total    14   292693.3333

    R-Square   Coeff Var   Root MSE   word Mean
    0.736653    12.98256   80.14570    617.3333

    Source    DF      Anova SS    Mean Square   F Value   Pr > F
    method     2   215613.3333    107806.6667     16.78   0.0003

The means of the three methods are also available, as follows:

    The ANOVA Procedure

    Level of         -------------word------------
    method      N          Mean        Std Dev
    A           5    786.000000     113.929803
    B           5    518.000000      54.037024
    C           5    548.000000      58.051701

Interpreting the result of the global test

Based on the data, conduct a hypothesis test (with a 0.05 significance level) to see if there is evidence that the words per minute differ significantly among the three treatment groups. The "Source" column lists the sources of variance considered in the data, where "Model" means the effects of all of the independent variables (in this case, the effect of the method). If there is more than one independent variable to consider, for example, method and gender, the "Model" part includes not only the effect of each individual variable but also their interactions. "Error" means what cannot be explained by the independent variables. In this case it is the variance in the word count after the effect of method has been removed. One-way ANOVA is based on the F-distribution. The F value is the ratio of the two mean squares, 107806.6667 / 6423.3333 = 16.78, with a p-value of 0.0003. Since the p-value is less than 0.05, the conclusion should be that the reading-instruction methods were not all equivalent. Note that the above test is for the global effect, i.e., the question of "any difference". In order to test "what are the differences", one needs to perform multiple comparisons, as shown next.

Checking assumptions for the global test

The ANOVA test relies on several assumptions, concerning the variable types, the experimental design, Normality of the data, and equal variances across groups.
Among the assumptions, the variable types and the experimental design can be checked by going over the design of the experiment. Normality can be checked with the UNIVARIATE procedure, and ANOVA is relatively robust even when the data are not Normally distributed. The assumption of equal variances (homogeneity of variances) can be checked with the "hovtest" option. The result is shown below:

    The ANOVA Procedure

    Levene's Test for Homogeneity of word Variance
    ANOVA of Squared Deviations from Group Means

                       Sum of        Mean
    Source    DF      Squares      Square   F Value   Pr > F
    method     2     2.0668E8    1.0334E8      3.74   0.0545
    Error     12     3.3121E8    27601053

As shown, the p-value is 0.0545, which is at the borderline. There are two possible conclusions: with a significance level of no more than 0.05, there is no evidence to reject the hypothesis of homogeneity; but with a significance level bigger than 0.0545 (such as 0.1), the hypothesis of homogeneity is rejected. In the latter case, one should refer to the result of Welch's test, as shown below:

    Welch's ANOVA for word

    Source        DF   F Value   Pr > F
    method    2.0000     10.52   0.0065
    Error     7.5552

Recall that the p-value from the regular ANOVA (assuming homogeneity) is 0.0003, while the p-value from Welch's test (not assuming homogeneity) is 0.0065. At a significance level of 0.05, the conclusions are the same: method makes a significant difference to the results.

Multiple comparison, syntax

In general, methods used to find group differences after the global test are called multiple comparison tests, or post hoc tests. SAS provides a variety of tests to investigate differences between levels of the independent variables, for example, Duncan's multiple-range test, the Student-Newman-Keuls multiple-range test, the least-significant-difference test, Tukey's studentized range test, Scheffe's multiple-comparison procedure, and others. Each has a SAS option name (e.g., DUNCAN, SNK, LSD, TUKEY and SCHEFFE). To request a multiple comparison test, place the SAS option name for the test you want after a slash (/) on the "means" statement.
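
As a minimal sketch of where the option goes, the step below requests Tukey's test with the TUKEY option named above. It assumes the same "words" data set and variable names used in this example; the choice of Tukey's test here is only for illustration.

```sas
/* Sketch: request a multiple comparison test on the MEANS statement. */
/* Assumes the data set WORDS with variables WORD and METHOD.         */
proc anova data=words;
  class method;
  model word = method;
  /* The test's option name follows the slash; ALPHA= sets the level. */
  means method / tukey alpha=0.05;
run;
quit;
```

More than one option name can be listed after the slash if several tests are wanted in the same run.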
In practice, it is easier to include the request for a multiple comparison test at the same time as the global test. But note that the multiple comparison result should be examined only after the global effect has been confirmed. For example, to use the Student-Newman-Keuls (SNK) test, the syntax (with a significance level of 0.05) is:

    proc ANOVA data=words;
      title "Example of one-way ANOVA";
      class method;
      model word = method;
      means method / SNK alpha=0.05;     /* SNK test at the 0.05 level */
    run;

Reading the output of the multiple comparison

    Example of one-way ANOVA

    The ANOVA Procedure

    Student-Newman-Keuls Test for word

    NOTE: This test controls the Type I experimentwise error rate under the complete null hypothesis but not under partial null hypotheses.

    Alpha                        0.05
    Error Degrees of Freedom       12
    Error Mean Square        6423.333

    Number of Means            2            3
    Critical Range     110.44095    135.23025

    Means with the same letter are not significantly different.

    SNK Grouping      Mean    N   method
               A    786.00    5   A
               B    548.00    5   C
               B
               B    518.00    5   B

Interpreting the result of the multiple comparison

Under the "SNK Grouping" column, the same letter means no significant difference. For example, methods C and B both have the letter 'B' in the grouping column and are therefore not significantly different. Method A has the letter 'A' and is therefore significantly different (p-value < 0.05) from methods C and B. Hence we conclude that method A is superior to both methods B and C, and that methods B and C are not significantly different.
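
The letter grouping can be retraced by hand from the critical ranges in the output. The data step below is only an illustrative sketch, not something SAS requires: all numbers are copied from the output above, and the variable names (mean_a, range2, sig_ab, and so on) are made up for this illustration. A difference between two means is compared with the critical range for the number of ordered means it spans.

```sas
/* Sketch: retrace the SNK grouping from the printed critical ranges. */
/* All values are copied from the output; names are illustrative.     */
data _null_;
  /* Group means, ordered from largest to smallest: A, C, B */
  mean_a = 786;
  mean_c = 548;
  mean_b = 518;

  range2 = 110.44095;  /* critical range when the pair spans 2 ordered means */
  range3 = 135.23025;  /* critical range when the pair spans 3 ordered means */

  diff_ab = mean_a - mean_b;  /* A vs B spans all 3 ordered means */
  diff_ac = mean_a - mean_c;  /* A vs C are adjacent in the ordering */
  diff_cb = mean_c - mean_b;  /* C vs B are adjacent in the ordering */

  /* A comparison evaluates to 1 when true and 0 when false */
  sig_ab = (diff_ab > range3);
  sig_ac = (diff_ac > range2);
  sig_cb = (diff_cb > range2);

  /* Write the differences and decisions to the SAS log */
  put diff_ab= sig_ab=;
  put diff_ac= sig_ac=;
  put diff_cb= sig_cb=;
run;
```

The differences are 268 (A vs B), 238 (A vs C) and 30 (C vs B). The first two exceed their critical ranges and the last does not, which agrees with the grouping: A stands alone, while C and B share the letter 'B'.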

© COPYRIGHT 2010 ALL RIGHTS RESERVED tqin@purdue.edu