Contingency test: SAS instruction
Procedure demonstrated with an example
The contingency test
Questionnaires and surveys are common and useful ways to collect information. Measurements (variables) in a survey are often categorical, for example gender (F/M) or race (White/Hispanic/African American/Other). To study the relationship between two categorical variables, a chi-square (contingency) test can be used if its assumptions are met: the observations are independent of one another, and the expected count in each cell of the two-way table is large enough.
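For reference, the statistic behind the contingency test is the standard Pearson chi-square. The notation here is introduced only for illustration. O_ij is the observed count in row i and column j. R_i and C_j are the row and column totals, n is the sample size, and the table has r rows and c columns. The expected counts, the statistic, and its degrees of freedom are:

```latex
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\mathrm{df} = (r - 1)(c - 1)
```

Under the null hypothesis of independence, X^2 approximately follows a chi-square distribution with (r - 1)(c - 1) degrees of freedom. That approximation becomes unreliable when the expected counts E_ij are small.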
Analyzing survey data with a contingency test
Suppose a survey asks 8 questions about respondents' profiles and political opinions. The survey form is shown here, and the responses are recorded in "questionnaire.csv". Suppose we want to test whether race and opinion of the president are related. In the following sections, we first organize the data and then run the contingency test. Open the data set in SAS, or import it with the following DATA step:

    data questionnaire;
       infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
       input age gender race marital education president arm city;
    run;

    proc format;
       value $gender  '1'='Male' '2'='Female' OTHER='Miscoded';
       value $race    '1'='White' '2'='African Am.' '3'='Hispanic' '4'='Other';
       value $marital '1'='Single' '2'='Married' '3'='Widowed' '4'='Divorced';
       value $educ    '1'='High School or Less' '2'='Two Yr. College'
                      '3'='Four Yr. College' '4'='Graduate Degree';
       value opinion  1='Str Disagree' 2='Disagree' 3='No opinion' 4='Agree' 5='Str Agree';
       value agegroup 1='0-20' 2='21-40' 3='41-60' 4='Greater than 60';
    run;

    data questionnaire;
       infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
       input age gender $ race $ marital $ education $ president arm city;
       /* derive a categorical age group from the numeric age */
       IF age GE 0 AND age LE 20 THEN agegroup=1;
       ELSE IF age GT 20 AND age LE 40 THEN agegroup=2;
       ELSE IF age GT 40 AND age LE 60 THEN agegroup=3;
       ELSE IF age GT 60 THEN agegroup=4;
       /* attach the formats defined in PROC FORMAT */
       format agegroup agegroup. gender $gender. race $race. marital $marital.
              education $educ. president arm city opinion.;
    run;

The second DATA step re-reads the file with gender, race, marital, and education as character variables ($), so that the character formats ($gender. and so on) can be applied to them. The formats improve readability by displaying descriptive labels instead of numeric codes. Furthermore, a new categorical variable, agegroup, is derived from age and labeled with the agegroup. format. The IF and ELSE IF statements have the following general form and can be used to define new variables:

    IF condition THEN statement;
    ELSE IF condition THEN statement;

A frequency table for each categorical variable can be produced with the following step:
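The IF/ELSE IF chain is easy to check outside SAS. The sketch below mirrors it in Python purely as an illustration. The names `age_group`, `age_label`, and `AGEGROUP_FORMAT` are hypothetical, and Python's `None` stands in for a SAS missing value. Note the boundaries: an age of exactly 20 falls in group 1 and an age of exactly 60 in group 3, because each upper bound uses LE (less than or equal).

```python
# Sketch: Python mirror of the SAS IF/ELSE IF recoding and the agegroup. format.
# AGEGROUP_FORMAT, age_group, and age_label are hypothetical names, not SAS features.

AGEGROUP_FORMAT = {1: "0-20", 2: "21-40", 3: "41-60", 4: "Greater than 60"}


def age_group(age):
    """Return the agegroup code (1-4) for an age, or None if no branch applies.

    None plays the role of a SAS missing value: a missing or negative age
    matches none of the IF/ELSE IF conditions, so agegroup stays missing.
    """
    if age is None:
        return None
    if 0 <= age <= 20:
        return 1
    elif 20 < age <= 40:
        return 2
    elif 40 < age <= 60:
        return 3
    elif age > 60:
        return 4
    return None  # negative age: no condition is true


def age_label(age):
    """Map the code to its label; '.' is how SAS displays a missing numeric value."""
    return AGEGROUP_FORMAT.get(age_group(age), ".")


print(age_label(35))  # 21-40
```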
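    proc freq data=questionnaire;
       title "Frequency Counts for Categorical Variables";
       tables gender race marital education president arm city;
    run;

Now request a contingency test with PROC FREQ:

    proc freq data=questionnaire;
       title "Contingency test for race and president";
       tables race*president / CHISQ expected norow nocol nopercent;
    run;

The "CHISQ" option requests the contingency (chi-square) test. The "expected" option prints the expected count of each cell so that the assumptions can be checked. The "norow", "nocol", and "nopercent" options suppress the row, column, and overall percentages, which makes the output easier to read.

Checking assumptions
The chi-square test assumes that the observations are independent and that the expected cell counts are not too small. A common rule of thumb is that no more than 20% of the cells should have an expected count below 5, and no cell should have an expected count below 1. Compare the expected counts printed by the "expected" option against this rule.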
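The expected counts that the "expected" option prints come from simple arithmetic: each is the row total times the column total, divided by the sample size. The Python sketch below uses the standard library only, and its helper names are hypothetical. It builds the race by president table from (race, president) code pairs and reports the share of cells whose expected count is below 5. The six pairs are read off the cell counts in the output for this example; they are not taken from the CSV file itself.

```python
# Sketch: cross-tabulate two coded variables and compute expected counts,
# the quantities PROC FREQ prints for TABLES race*president / EXPECTED.
from collections import Counter


def crosstab(pairs):
    """Return (row levels, column levels, count matrix) for (row, col) pairs.

    Only levels that actually occur in the data appear in the table.
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def expected_counts(table):
    """Expected count under independence: row total * column total / n."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def share_below(expected, threshold=5.0):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


# (race code, president code): 1 = White, 2 = African Am., 3 = Hispanic;
# 1 = Str Disagree, 2 = Disagree, 3 = No opinion, 4 = Agree.
PAIRS = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 4), (3, 4)]

rows, cols, table = crosstab(PAIRS)
expected = expected_counts(table)
print(table)                  # [[1, 1, 1, 0], [0, 0, 0, 2], [0, 0, 0, 1]]
print(share_below(expected))  # 1.0, i.e. 100% of the cells
```

Every expected count is below 5, which is the situation the SAS warning in the output reports.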
Reading the output

    The FREQ Procedure

    Table of race by president

    race          president

    Frequency  |
    Expected   |Str Disagree|Disagree|No opinion|Agree |  Total
    -----------+------------+--------+----------+------+
    White      |          1 |      1 |        1 |    0 |      3
               |        0.5 |    0.5 |      0.5 |  1.5 |
    -----------+------------+--------+----------+------+
    African Am.|          0 |      0 |        0 |    2 |      2
               |       0.33 |   0.33 |     0.33 |    1 |
    -----------+------------+--------+----------+------+
    Hispanic   |          0 |      0 |        0 |    1 |      1
               |       0.17 |   0.17 |     0.17 |  0.5 |
    -----------+------------+--------+----------+------+
    Total                 1        1          1      3       6

    Statistics for Table of race by president

    Statistic                    DF      Value      Prob
    ----------------------------------------------------
    Chi-Square                    6     6.0000    0.4232
    Likelihood Ratio Chi-Square   6     8.3178    0.2157
    Mantel-Haenszel Chi-Square    1     3.0000    0.0833
    Phi Coefficient                     1.0000
    Contingency Coefficient             0.7071
    Cramer's V                          0.7071

    WARNING: 100% of the cells have expected counts less than 5.
             Chi-Square may not be a valid test.

    Sample Size = 6

Interpreting the result
In the "Statistics for Table of race by president" table, the p-value of the contingency (chi-square) test is 0.4232. Therefore, at α = 0.05, we do not reject the null hypothesis: the data give no evidence that race and opinion of the president are related. Failing to reject the null hypothesis does not prove that the two variables are independent.

The expected count of each cell is shown in the table. For example, the expected count for White respondents who strongly disagree is 3 × 1 / 6 = 0.5. This is the row total (3 White respondents) times the column total (1 respondent who strongly disagrees), divided by the sample size (6).

Strictly speaking, this example should not be analyzed with the chi-square test because the sample is too small. Every expected count is below 5, as the WARNING in the output notes. If the question can be reduced to a 2 by 2 table, e.g., race (White/non-White) by opinion (agree/disagree), one could consider methods suited to small 2 by 2 tables.
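The values in the output can be recomputed from the cell counts, which shows how they are derived. The Python sketch below is not SAS; it uses the standard library only, and all of its function names are hypothetical. It applies the standard textbook formulas for the following:

- the Pearson and likelihood-ratio chi-square statistics and their p-values
- the Mantel-Haenszel statistic, computed as (n - 1) times the squared correlation of the row and column scores
- the phi, contingency, and Cramer's V coefficients

Assumption: row and column scores of 1, 2, 3, ... match the scores PROC FREQ used for this table, because the race levels are ordered White, African Am., Hispanic and the president codes present are 1 to 4.

```python
# Sketch: recompute the "Statistics for Table of race by president" values
# from the cell counts, using only the Python standard library.
import math

# Cell counts from the output. Rows: White, African Am., Hispanic.
# Columns: Str Disagree, Disagree, No opinion, Agree.
TABLE = [[1, 1, 1, 0], [0, 0, 0, 2], [0, 0, 0, 1]]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square(df) variable.

    Uses the power series of the regularized lower incomplete gamma function.
    Adequate for the small statistics here; 1 - lower loses precision when
    the p-value is tiny, so this is not a general-purpose routine.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower


def table_stats(table):
    """Pearson and likelihood-ratio chi-square plus derived association measures."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    pearson = lr = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            pearson += (obs - exp) ** 2 / exp
            if obs > 0:  # empty cells contribute 0 to the likelihood ratio
                lr += 2 * obs * math.log(obs / exp)
    k = min(len(row_tot), len(col_tot)) - 1
    return {
        "df": df,
        "chi_square": pearson,
        "p_chi_square": chi2_sf(pearson, df),
        "lr_chi_square": lr,
        "p_lr": chi2_sf(lr, df),
        "phi": math.sqrt(pearson / n),  # unsigned form, as for tables larger than 2 x 2
        "contingency_coef": math.sqrt(pearson / (pearson + n)),
        "cramers_v": math.sqrt(pearson / (n * k)),
    }


def mantel_haenszel(table):
    """(n - 1) * r**2, with r the correlation of row and column scores 1, 2, 3, ..."""
    n = sx = sy = sxx = syy = sxy = 0.0
    for i, row in enumerate(table, start=1):
        for j, f in enumerate(row, start=1):
            n += f
            sx += f * i
            sy += f * j
            sxx += f * i * i
            syy += f * j * j
            sxy += f * i * j
    cov = sxy - sx * sy / n
    r2 = cov * cov / ((sxx - sx * sx / n) * (syy - sy * sy / n))
    return (n - 1) * r2


stats = table_stats(TABLE)
mh = mantel_haenszel(TABLE)
print(round(stats["chi_square"], 4), round(stats["p_chi_square"], 4))  # 6.0 0.4232
print(round(mh, 4), round(chi2_sf(mh, 1), 4))                          # 3.0 0.0833
```

The printed values match the Chi-Square and Mantel-Haenszel rows of the output. The series-based p-value helper only keeps the sketch dependency-free; with SciPy installed, `scipy.stats.chi2.sf` would be the usual choice.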
For small tables, one option is Fisher's exact test. For further discussion of these methods, especially for 2 by 2 tables, see Campbell (2007).
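Fisher's exact test avoids the large-sample chi-square approximation. With the row and column totals of a 2 by 2 table held fixed, the count in one cell follows a hypergeometric distribution, so the p-value can be computed exactly. In SAS, the FISHER option of the TABLES statement in PROC FREQ requests this test.

The Python sketch below has a hypothetical function name, uses the standard library only, and needs Python 3.8+ for `math.comb`. It uses one common two-sided definition: the sum of the probabilities of all tables with the same margins that are no more likely than the observed table. The table [[3, 1], [1, 3]] is illustrative only, not survey data.

```python
# Sketch: two-sided Fisher's exact test for a 2 x 2 table [[a, b], [c, d]].
import math


def fisher_exact_2x2(a, b, c, d):
    """Return the two-sided p-value of Fisher's exact test.

    With the margins fixed, the top-left count x has a hypergeometric
    distribution. The p-value sums P(x) over every x whose table is no
    more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):
        # P(top-left = x) given the row 1 total, the column 1 total, and n
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # A tiny relative tolerance keeps tables that tie with the observed one
    # from being dropped because of floating-point rounding.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(round(fisher_exact_2x2(3, 1, 1, 3), 4))  # 0.4857
```

For this table the possible top-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70. The observed count 3 has probability 16/70, so the p-value is (1 + 16 + 16 + 1) / 70 ≈ 0.4857.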
© COPYRIGHT 2010 ALL RIGHTS RESERVED tqin@purdue.edu