Contingency test: SAS instruction
Procedure demonstrated with an example
The contingency test
Questionnaires and surveys are a common and useful way to collect information. Measurements (variables) in a survey are often categorical, for example, gender (F/M) or race (White/Hispanic/African American/Other). To study the relationship between two such variables, a chi-square test can be used, provided its assumptions are met, in particular that the expected count in each cell is not too small (SAS warns when cells have expected counts less than 5).
Analyzing survey data with the contingency test
Suppose there is a survey with 8 questions about profile and political opinions. The survey form is shown here, and the responses are recorded in "questionnaire.csv". Suppose the question is whether race and opinion on the president are related. In the following sections, we first organize the data and then run the contingency test. Open the data set in SAS, or import it with the following commands.
data questionnaire;
   /* comma-delimited file; firstobs=2 starts reading at the second line */
   infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
   input age gender race marital education president arm city;
run;
proc format;
   value $gender  '1'='Male'
                  '2'='Female'
                  OTHER='Miscoded';
   value $race    '1'='White'
                  '2'='African Am.'
                  '3'='Hispanic'
                  '4'='Other';
   value $marital '1'='Single'
                  '2'='Married'
                  '3'='Widowed'
                  '4'='Divorced';
   value $educ    '1'='High School or Less'
                  '2'='Two Yr. College'
                  '3'='Four Yr. College'
                  '4'='Graduate Degree';
   value opinion  1='Str Disagree'
                  2='Disagree'
                  3='No opinion'
                  4='Agree'
                  5='Str Agree';
   value agegroup 1='0-20'
                  2='21-40'
                  3='41-60'
                  4='Greater than 60';
run;
data questionnaire;
   infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
   /* the $ reads gender, race, marital and education as character codes */
   input age gender $ race $ marital $ education $ president arm city;
   /* derive a categorical age group from age */
   IF age GE 0 AND age LE 20 THEN agegroup=1;
   ELSE IF age GT 20 AND age LE 40 THEN agegroup=2;
   ELSE IF age GT 40 AND age LE 60 THEN agegroup=3;
   ELSE IF age GT 60 THEN agegroup=4;
   /* attach the formats defined in proc format */
   format agegroup agegroup.
          gender $gender.
          race $race.
          marital $marital.
          education $educ.
          president arm city opinion.;
run;
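The formats above give readable names to the coded values. The variables themselves could also be given descriptive labels with a LABEL statement. The following is only a minimal sketch: the label texts are illustrative assumptions and do not come from the survey form, and the variables arm and city are left unlabeled because their wording is not given here.

```sas
/* Sketch: attach descriptive variable labels to the existing data set. */
/* The label texts are illustrative assumptions, not survey wording.    */
data questionnaire;
   set questionnaire;
   label age       = "Age in years"
         agegroup  = "Age group"
         gender    = "Gender"
         race      = "Race"
         marital   = "Marital status"
         education = "Education level"
         president = "Opinion on the president";
run;
```

Procedures such as proc freq then print the label next to the variable name in their output.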
Note that readability is improved by attaching formats to the variables, so that the coded values are displayed with descriptive labels. Furthermore, a new variable, agegroup, has been defined from age and given a format. The agegroup variable is categorical. The IF and ELSE IF statements have the following general form and can be used to define new variables:
IF condition THEN statement;
A summary table can be produced with the following statements.
proc freq data=questionnaire;
   title "Frequency Counts for Categorical Variables";
   tables gender race marital education president arm city;
run;
Now request a contingency test with proc freq.
proc freq data=questionnaire;
   title "Contingency test for race and president";
   tables race*president / CHISQ expected norow nocol nopercent;
run;
The "CHISQ" option requests a contingency test; the "expected" option requests the expected values for checking the assumption; and "norow, nocol, and nopercent" hide the minor results and make the output more readable.
Checking assumptions
The contingency test relies on the expected cell counts being large enough. The "expected" option prints them so that this can be checked, and SAS issues a warning when cells have expected counts less than 5.
Reading the output
The FREQ Procedure
Table of race by president
race         president

Frequency   |
Expected    |Str Disa|Disagree|No opini|Agree   |  Total
            |gree    |        |on      |        |
____________|________|________|________|________|_______
White       |      1 |      1 |      1 |      0 |      3
            |    0.5 |    0.5 |    0.5 |    1.5 |
____________|________|________|________|________|_______
African Am. |      0 |      0 |      0 |      2 |      2
            |   0.33 |   0.33 |   0.33 |      1 |
____________|________|________|________|________|_______
Hispanic    |      0 |      0 |      0 |      1 |      1
            |   0.17 |   0.17 |   0.17 |    0.5 |
____________|________|________|________|________|_______
Total               1        1        1        3       6
Statistics for Table of race by president

Statistic                      DF       Value      Prob
_________________________________________________________
Chi-Square                      6      6.0000    0.4232
Likelihood Ratio Chi-Square     6      8.3178    0.2157
Mantel-Haenszel Chi-Square      1      3.0000    0.0833
Phi Coefficient                        1.0000
Contingency Coefficient                0.7071
Cramer's V                             0.7071
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Sample Size = 6
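The expected counts and the chi-square statistic in this output can be reproduced by hand. The worked calculation below uses the standard formulas for a contingency table; the symbols O_{ij}, E_{ij}, R_i, C_j and N are notation introduced here for illustration and do not appear in the SAS output.

```latex
% Expected count for row i, column j: (row total)(column total) / sample size
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
E_{\text{White},\,\text{Str Disagree}} = \frac{3 \times 1}{6} = 0.5

% Pearson chi-square statistic, summed over all 12 cells;
% the three terms are the contributions of the White, African Am. and Hispanic rows
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 3 + 2 + 1 = 6

% Degrees of freedom for a table with 3 rows and 4 columns
\mathrm{DF} = (3 - 1)(4 - 1) = 6
```

These values agree with the Chi-Square row of the statistics table (DF = 6, Value = 6.0000).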
Interpreting the result
In the "Statistics for Table of race by president", the p-value of the contingency test (Chi-Square test) is 0.4232. Therefore, at α = 0.05, we do not reject the null hypothesis that race and opinion on the president are independent; the data provide no evidence of a relationship. The expected value of each cell is shown in the resulting table. For example, the expected count for White and Strongly disagree is 0.5, because 3 respondents are White, 1 of the 6 respondents strongly disagrees, and 3 × 1 / 6 = 0.5. Strictly speaking, this example should not be analyzed with the contingency test because of the small sample size. If the question can be modified to a 2 by 2 table, i.e., race (white/non-white) by opinion (agree/disagree), one option would be to use Fisher's exact test. For further discussion of the methods, especially for 2 by 2 tables, please refer to Campbell (2007).
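One way the 2 by 2 version of the analysis might look in SAS is sketched below. The recoding rules are assumptions made for illustration: the data set name questionnaire2 and the variables race2 and opinion2 are hypothetical, "No opinion" responses are left missing (and so dropped from the table), and "Other" is grouped with non-white. The FISHER option of the TABLES statement requests Fisher's exact test.

```sas
/* Sketch: collapse to a 2 by 2 table and request Fisher's exact test.  */
/* The recoding rules below are assumptions made for illustration only. */
data questionnaire2;
   set questionnaire;
   length race2 $ 9 opinion2 $ 8;
   /* race holds the character codes '1'-'4'; '1' is White */
   if race = '1' then race2 = 'White';
   else if race in ('2', '3', '4') then race2 = 'Non-white';
   /* president is a numeric 1-5 score; 3 (No opinion) stays missing */
   if president in (4, 5) then opinion2 = 'Agree';
   else if president in (1, 2) then opinion2 = 'Disagree';
run;

proc freq data=questionnaire2;
   title "Fisher's exact test for race by opinion (2 by 2)";
   tables race2*opinion2 / fisher expected norow nocol nopercent;
run;
```

By default proc freq excludes observations with a missing value in either table variable, so respondents with no opinion do not enter this test.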
© COPYRIGHT 2010 ALL RIGHTS RESERVED tqin@purdue.edu