Contingency procedure demonstrated with an example

  1. The contingency test
  2. Analyzing data with contingency test
  3. Output, interpretation and assumption checking

The contingency test

The contingency table test (the chi-square test of independence) is used when both the dependent and the independent variable are categorical. It is usually used to check whether the two variables are associated.

The test relies on two assumptions:

  1. No more than 20% of the cells have an expected value less than 5; otherwise Fisher's exact test (discussed at the end) should be used.
  2. Samples must be independent. For example, when checking the effect of gender (female/male) on an opinion (yes/no), the females and males must be selected independently. If the females and males are dependent (as with husbands and wives), McNemar's test should be used instead.
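
For the dependent case in assumption 2, a minimal sketch of McNemar's test is given below. The data set name "couples", the variable names and all four counts are made up purely for illustration; they are not part of the opinion data used in this example. The "agree" option of the tables statement requests McNemar's test for a 2x2 table.

```sas
/* Hypothetical paired data: each count is a number of couples, not individuals */
data couples;
	input husband $ wife $ count;
	datalines;
yes yes 40
yes no  15
no  yes 25
no  no  20
;

proc freq data=couples;
	tables husband*wife / agree;   /* AGREE prints McNemar's test for a 2x2 table */
	weight count;                  /* each row stands for "count" couples */
run;
```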

Analyzing simple data with counts

To check the relationship between opinion and gender, a data set was generated and saved as opinion.csv.

A common situation is that only the counts for each category are available, for example, the frequencies given below.

Frequency table (counts):

	               gender
	opinion    female    male    total
	yes            55      50      105
	no             65      80      145
	total         120     130      250

One can then use the counts, rather than the original data file, to run the contingency test. The counts can be entered and analyzed as below.

	data simple;
	input opinion $ gender $ count;
	datalines;
	yes female 55
	yes male 50
	no female 65
	no male 80
	;
	
	proc freq data=simple;
	tables opinion*gender / chisq nocol norow nopercent expected;
	weight count;
	run;

The "chisq" option requests a chi-square test; "nocol", "norow" and "nopercent" simplify the output by suppressing the column, row and overall percentages; and "expected" requests the expected values.

The "weight" statement tells the procedure how many subjects there are for each combination of gender and opinion.

When the counts are not given, the same test can be run on the original data file as below. No "weight" statement is needed because each record is one subject, and the results remain the same.

	data simple2;
	infile "H:\sas\data\opinion.csv" dlm=',' firstobs=2;
	input opinion $ gender $;
	run;
	
	proc freq data=simple2;
	tables opinion*gender / chisq nocol norow nopercent expected;
	run;
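
To see why the two approaches agree, the raw records can be collapsed into counts first. The sketch below assumes the data set simple2 has been read as above; the output data set name "counts" is chosen here for illustration. The "out=" option of the tables statement writes one row per opinion and gender combination, with the cell frequency stored in a variable named COUNT.

```sas
/* Collapse the raw records into one row per cell; NOPRINT suppresses the listing */
proc freq data=simple2 noprint;
	tables opinion*gender / out=counts;   /* COUNT and PERCENT are created automatically */
run;

/* The counts should match the frequency table entered by hand */
proc print data=counts;
run;

/* Same analysis as before, now driven by the derived counts */
proc freq data=counts;
	tables opinion*gender / chisq nocol norow nopercent expected;
	weight count;
run;
```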

Output, interpretation and assumption checking

 The FREQ Procedure

                                   Table of opinion by gender

                               opinion     gender

                               Frequency
                               Expected   female    male   Total
                               ---------------------------------
                               no             65      80     145
                                            69.6    75.4
                               ---------------------------------
                               yes            55      50     105
                                            50.4    54.6
                               ---------------------------------
                               Total         120     130     250


                            Statistics for Table of opinion by gender

                     Statistic                     DF       Value      Prob
                     ------------------------------------------------------
                     Chi-Square                     1      1.3920    0.2381
                     Likelihood Ratio Chi-Square    1      1.3926    0.2380
                     Continuity Adj. Chi-Square     1      1.1059    0.2930
                     Mantel-Haenszel Chi-Square     1      1.3865    0.2390
                     Phi Coefficient                      -0.0746
                     Contingency Coefficient               0.0744
                     Cramer's V                           -0.0746


                                      Fisher's Exact Test
                               ----------------------------------
                               Cell (1,1) Frequency (F)        65
                               Left-sided Pr <= F          0.1465
                               Right-sided Pr >= F         0.9046

                               Table Probability (P)       0.0511
                               Two-sided Pr <= P           0.2508

                                        Sample Size = 250


The results include a contingency table with observed and expected values. A list of tests is reported, the first of which is the classic chi-square test. With a large p-value of 0.2381 we do not reject the null hypothesis of independence, so there is no evidence that gender and opinion are associated.
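
The expected values and the chi-square statistic can also be reproduced by hand, which helps in reading the output. Each expected value is (row total x column total) / sample size, and the statistic is the sum of (observed - expected)^2 / expected over the four cells. The sketch below is only a check of that arithmetic; the data set name "check" and the variable names are chosen here for illustration.

```sas
/* One row per cell: observed count plus its row and column totals */
data check;
	input opinion $ gender $ observed rowtotal coltotal;
	n = 250;                                         /* sample size */
	expected = rowtotal * coltotal / n;              /* e.g. 145*120/250 = 69.6 */
	cellchi2 = (observed - expected)**2 / expected;  /* contribution of this cell */
	chisq + cellchi2;                                /* running sum across cells */
	datalines;
no  female 65 145 120
no  male   80 145 130
yes female 55 105 120
yes male   50 105 130
;

/* Write the final statistic and its p-value (1 degree of freedom) to the log */
data _null_;
	set check end=last;
	if last then do;
		p = 1 - probchi(chisq, 1);
		put chisq= 8.4 p= 6.4;
	end;
run;
```

The values written to the log should agree with the Chi-Square row of the output (1.3920 with a p-value of 0.2381).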

Note that the expected values for all four cells are well above 5 (69.6, 75.4, 50.4, 54.6), so the assumption is met and the chi-square conclusion is sound. Otherwise, we should use Fisher's exact test at the end of the output, where the two-sided p-value is 0.2508.
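
For a 2x2 table such as this one, the "chisq" option already prints Fisher's exact test. For larger tables it is not produced automatically; a minimal sketch of requesting it explicitly with the "fisher" option is shown below, reusing the data set simple from above. Exact tests can be slow for large tables with large counts.

```sas
proc freq data=simple;
	/* FISHER requests Fisher's exact test, also for tables larger than 2x2 */
	tables opinion*gender / fisher expected nocol norow nopercent;
	weight count;
run;
```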