Contingency procedure demonstrated with an example
The contingency test
The contingency table test is used when both the dependent and independent variables are categorical. It is usually used to check whether two variables are related.
Analyzing simple data with counts
The easiest way to carry out a contingency test is when counts are available for each category, for example, when the frequencies are given as below.
Frequency table:
opinion   female   male
yes           55     50
no            65     80
Then one can use the counts, rather than the original data file, to run the contingency test. The counts can be entered and analyzed as below.
data simple;
  input opinion $ gender $ count;
  datalines;
yes female 55
yes male 50
no female 65
no male 80
;
proc freq data=simple;
  tables opinion*gender / chisq nocol norow nopercent expected;
  weight count;
run;
The "chisq" option requests a chi-square test; "nocol", "norow", and "nopercent" simplify the output; and "expected" requests the expected values. The "weight" statement tells the procedure how many subjects there are for each combination of gender and opinion.
The FREQ Procedure
Table of opinion by gender
opinion     gender
Frequency
Expected      female     male    Total
-----------------------------------
no                65       80      145
                69.6     75.4
-----------------------------------
yes               55       50      105
                50.4     54.6
-----------------------------------
Total            120      130      250
Statistics for Table of opinion by gender
Statistic DF Value Prob
------------------------------------------------------
Chi-Square 1 1.3920 0.2381
Likelihood Ratio Chi-Square 1 1.3926 0.2380
Continuity Adj. Chi-Square 1 1.1059 0.2930
Mantel-Haenszel Chi-Square 1 1.3865 0.2390
Phi Coefficient -0.0746
Contingency Coefficient 0.0744
Cramer's V -0.0746
Fisher's Exact Test
----------------------------------
Cell (1,1) Frequency (F) 65
Left-sided Pr <= F 0.1465
Right-sided Pr >= F 0.9046
Table Probability (P) 0.0511
Two-sided Pr <= P 0.2508
Sample Size = 250
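The expected counts and the chi-square value in the output above can be checked by hand. The following is a worked sketch using the standard formulas for a contingency table; the symbols O (observed count), E (expected count), R_i (row total), C_j (column total), and n (sample size) are notation introduced here for illustration.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad
E_{\text{no},\,\text{female}} = \frac{145 \times 120}{250} = 69.6

\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
       = \frac{(65 - 69.6)^2}{69.6} + \frac{(80 - 75.4)^2}{75.4}
       + \frac{(55 - 50.4)^2}{50.4} + \frac{(50 - 54.6)^2}{54.6}
       \approx 1.392
```

Each observed count differs from its expected count by 4.6 in absolute value, so the four terms differ only in their denominators.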
The results include a contingency table with observed and expected values. A list of tests is reported, of which the first is the classic chi-square test. With a large p-value of 0.2381, we do not reject the null hypothesis of independence, so there is no evidence that gender and opinion are associated. Note that the expected values for all combinations are large (69.6, 75.4, 50.4, 54.6), so the assumption is met and the conclusion is sound. Otherwise, we should use Fisher's exact test at the end of the output, where the two-sided p-value is 0.2508.
Analyzing comprehensive questionnaire data with the contingency test, without knowing the counts
An important application of the contingency test is to analyze questionnaire and survey data, where measurements (variables) are often categorical. Suppose there is a survey with 8 questions about profile and political opinions. The survey form is shown here, and the responses are recorded in "questionnaire.csv". Suppose the question is to test whether race and opinion of the president are related. In the following sections, we first organize the data and then run the contingency test.
Open the data set from SAS, or import it with the following command.
data questionnaire;
  /* Read the comma-delimited file, skipping the header row */
  infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
  input age gender race marital education president arm city;
run;
proc format;
  /* Character formats for variables read as character codes */
  value $gender '1'='Male'
                '2'='Female'
                OTHER='Miscoded';
  value $race '1'='White'
              '2'='African Am.'
              '3'='Hispanic'
              '4'='Other';
  value $marital '1'='Single'
                 '2'='Married'
                 '3'='Widowed'
                 '4'='Divorced';
  value $educ '1'='High School or Less'
              '2'='Two Yr. College'
              '3'='Four Yr. College'
              '4'='Graduate Degree';
  /* Numeric formats for the opinion questions and the age groups */
  value opinion 1='Str Disagree'
                2='Disagree'
                3='No opinion'
                4='Agree'
                5='Str Agree';
  value agegroup 1='0-20'
                 2='21-40'
                 3='41-60'
                 4='Greater than 60';
run;
data questionnaire;
  infile "H:\sas\data\questionnaire.csv" dlm=',' firstobs=2;
  /* gender, race, marital and education are read as character codes */
  input age gender $ race $ marital $ education $ president arm city;
  /* Group age into four categories */
  IF age GE 0 AND age LE 20 THEN agegroup=1;
  ELSE IF age GT 20 AND age LE 40 THEN agegroup=2;
  ELSE IF age GT 40 AND age LE 60 THEN agegroup=3;
  ELSE IF age GT 60 THEN agegroup=4;
  /* Attach the formats defined in proc format */
  format agegroup agegroup.
         gender $gender.
         race $race.
         marital $marital.
         education $educ.
         president arm city opinion.;
run;
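One way to confirm that the agegroup recoding above behaves as intended is to list the range of age within each group. The following is a minimal sketch, not part of the original program; it assumes the questionnaire data set has been created by the data step above.

```sas
/* Sanity check for the agegroup recode (illustrative sketch). */
/* The MISSING option keeps records whose agegroup is missing, */
/* e.g. when age is missing or negative.                       */
proc means data=questionnaire n min max;
  class agegroup / missing;
  var age;
run;
```

If the recode is correct, the minimum and maximum age in each group should fall inside the range named by the group's format label.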
Note that readability can be improved by adding labels for the variables. Furthermore, a new variable agegroup has been defined from age and labeled. The agegroup is a categorical variable. The IF and ELSE IF statements have the following general form and can be used to define new variables:
IF condition THEN statement;
A summary table can be produced with the following statements.
proc freq data=questionnaire;
  title "Frequency Counts for Categorical Variables";
  tables gender race marital education president arm city;
run;
Now request a contingency test with SAS proc freq.
proc freq data=questionnaire;
  title "Contingency test for race and president";
  tables race*president / CHISQ expected norow nocol nopercent;
run;
The "CHISQ" option requests a contingency test; "expected" requests the expected values for checking the assumption; and "norow", "nocol", and "nopercent" hide the minor results and make the output more readable.
Checking assumptions
The chi-square test assumes that the expected count in each cell is sufficiently large; SAS prints a warning when cells have expected counts less than 5.
Reading the output
The FREQ Procedure
Table of race by president
race          president
Frequency   |
Expected    |Str Disa|Disagree|No opini|Agree   |  Total
            |gree    |        |on      |        |
_________________________________________________________
White       |      1 |      1 |      1 |      0 |      3
            |    0.5 |    0.5 |    0.5 |    1.5 |
_________________________________________________________
African Am. |      0 |      0 |      0 |      2 |      2
            |   0.33 |   0.33 |   0.33 |      1 |
_________________________________________________________
Hispanic    |      0 |      0 |      0 |      1 |      1
            |   0.17 |   0.17 |   0.17 |    0.5 |
_________________________________________________________
Total              1        1        1        3        6
Statistics for Table of race by president
Statistic DF Value Prob
_________________________________________________________
Chi-Square 6 6.0000 0.4232
Likelihood Ratio Chi-Square 6 8.3178 0.2157
Mantel-Haenszel Chi-Square 1 3.0000 0.0833
Phi Coefficient 1.0000
Contingency Coefficient 0.7071
Cramer's V 0.7071
WARNING: 100% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Sample Size = 6
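As a rough cross-check of the statistics above, the degrees of freedom and Cramer's V can be recomputed from the table dimensions. This is a worked sketch with the standard formulas; r (number of rows), c (number of columns), and n (sample size) are notation introduced here for illustration.

```latex
\mathrm{DF} = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 6

V = \sqrt{\frac{\chi^2}{n \, \min(r - 1,\; c - 1)}}
  = \sqrt{\frac{6.0000}{6 \times 2}}
  \approx 0.7071
```

Both values agree with the printed output.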
Interpreting the result
In the "Statistics for Table of race by president", the p-value of the contingency test (chi-square test) is 0.4232. Therefore, at α = 0.05, we do not reject the null hypothesis, so there is no evidence that race and opinion on the president are associated. Expected values for each cell are shown in the resulting table. For example, the expected number of White respondents who strongly disagree is 0.5, given that 3 of the 6 respondents are White and 1 of the 6 strongly disagrees (3 × 1 / 6 = 0.5). Theoretically speaking, this example should not be analyzed with the contingency test because of the small sample size: every cell has an expected count below 5, as the warning in the output notes. If the question can be reduced to a 2 by 2 table, i.e., race (white/non-white) by opinion (agree/disagree), methods designed for 2 by 2 tables could be considered.
Another option would be to use Fisher's exact test. For further discussion of the methods, especially for 2 by 2 tables, please refer to Campbell (2007).
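The two alternatives mentioned above can be sketched in SAS as follows. This is an illustrative sketch under stated assumptions, not the author's program: the data set name questionnaire2 and the variables race2 and opinion2 are hypothetical, and dropping the "No opinion" answers when collapsing is an assumption made here only to obtain two opinion levels.

```sas
/* Fisher's exact test on the original race by president table. */
/* The FISHER option requests the exact test for tables larger  */
/* than 2 x 2.                                                  */
proc freq data=questionnaire;
  tables race*president / fisher expected norow nocol nopercent;
run;

/* Hypothetical collapse to a 2 x 2 table. */
data questionnaire2;
  set questionnaire;
  length race2 $ 9 opinion2 $ 8;
  /* race holds the character codes '1' to '4' */
  if race = '1' then race2 = 'White';
  else if race in ('2', '3', '4') then race2 = 'Non-white';
  /* president holds the numeric codes 1 to 5 */
  if president in (4, 5) then opinion2 = 'Agree';
  else if president in (1, 2) then opinion2 = 'Disagree';
  /* Assumption: president = 3 ("No opinion") is left missing, */
  /* so those records drop out of the 2 x 2 table.             */
run;

/* For a 2 x 2 table, CHISQ also reports Fisher's exact test. */
proc freq data=questionnaire2;
  tables race2*opinion2 / chisq expected norow nocol nopercent;
run;
```

With only 6 respondents, any result from the collapsed table should still be read with caution.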
© COPYRIGHT 2010 ALL RIGHTS RESERVED tqin@purdue.edu