GOF test with SAS

  1. Explain the question with an example
  2. GOF test procedure

Explain the question with an example

When studying a single categorical variable, a typical question concerns inference about its category proportions. For example, suppose a sample is drawn from an area where four races are identified: 1 for 'Hispanic', 2 for 'Asian', 3 for 'African American' and 4 for 'White'. Suppose the hypothesis is that the general population consists of 10% Hispanic, 10% Asian, 10% African American and 70% White folks. A chi-square goodness-of-fit (GOF) test checks whether the sample proportions are consistent with these hypothesized values.
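For reference, the hypotheses and the Pearson chi-square statistic used by a one-way GOF test can be written as below. The notation is introduced here for illustration and is not part of the original example: i = 1..4 follows the race codes above, O_i are observed counts, E_i expected counts, p_{0i} the hypothesized proportions, n the sample size and k the number of categories.

```latex
\begin{aligned}
H_0 &: p_1 = p_2 = p_3 = 0.10,\quad p_4 = 0.70 \\
H_a &: p_i \neq p_{0i} \text{ for at least one } i \\
\chi^2 &= \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
  \qquad E_i = n\,p_{0i},
  \qquad \mathrm{df} = k - 1 = 3
\end{aligned}
```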

The original data are in "race.csv". The file lists the race of each subject, but a similar test is also possible when only the frequency counts for each race are available, as demonstrated later in this section.

Here is the standard procedure for a GOF test in SAS.

Open the data set in SAS, or import it with the following DATA step. The "$" after race in the INPUT statement declares race as a categorical (character) variable rather than a numeric one.

 
  data race;
     infile "H:\sas\data\race.csv" dlm=',' firstobs=2;  /* firstobs=2 skips the header row */
     input race $;                                        /* $ reads race as character */
  run;

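Then request the GOF test with PROC FREQ. The CHISQ option asks for the chi-square test, and TESTP= lists the hypothesized percentages (they must sum to 100). TESTP= values are matched to the levels in the order they appear in the frequency table, which by default is the sorted order of the values (africa, asian, hispanic, white), not the order in which the races were coded above.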

   proc freq data = race;
      tables race / chisq testp=(10 10 10 70);  /* order: africa asian hispanic white */
   run;
   

Reading the output of the GOF test

				 The FREQ Procedure

                                                       Cumulative    Cumulative
                  race        Frequency     Percent     Frequency      Percent
          
                  africa            20       10.00            20        10.00
                  asian             11        5.50            31        15.50
                  hispanic          24       12.00            55        27.50
                  white            145       72.50           200       100.00


                                         Chi-Square Test
                                    for Specified Proportions
                                      
                                      Chi-Square     5.0286
                                      DF                  3
                                      Pr > ChiSq     0.1697

                                        Sample Size = 200



These results show that the racial composition of the sample does not differ significantly from the hypothesized values we supplied (chi-square with three degrees of freedom = 5.0286, p = 0.1697). Since p > 0.05, we fail to reject the null hypothesis at the 5% level.
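As a sanity check, the reported statistic can be reproduced by hand from the standard Pearson formula. With n = 200 and the hypothesized proportions, the expected counts are 20 each for Hispanic, Asian and African American and 140 for White; the observed counts are taken from the frequency table above.

```latex
\begin{aligned}
\chi^2 &= \frac{(24-20)^2}{20} + \frac{(11-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(145-140)^2}{140} \\
       &= 0.80 + 4.05 + 0 + 0.1786 = 5.0286, \\
p &= P\left(\chi^2_3 > 5.0286\right) \approx 0.1697
\end{aligned}
```

Most of the statistic comes from the Asian category, where 11 subjects were observed against 20 expected.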

Checking the assumption of the GOF test

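The chi-square goodness-of-fit test assumes that the expected count n*p is reasonably large for every category; a common rule of thumb is at least 5. Here n*Pr(Hispanic) = n*Pr(Asian) = n*Pr(African American) = 200(0.1) = 20 and n*Pr(White) = 200(0.7) = 140. All expected counts are greater than 5, so the GOF test is appropriate.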

Run GOF on frequencies

A GOF test can also be run when only the frequency counts are available, as in the following table.

 
		race                  frequency
		-------------------------------
		1 (Hispanic)          24
		2 (Asian)             11
		3 (African American)  20
		4 (White)             145

Read the frequencies into SAS, then run PROC FREQ with a WEIGHT statement so that each row is counted as many times as its frequency.

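    data chisq;
       input race $ count;   /* race as character, count = frequency */
    datalines;
    hispanic 24
    asian 11
    africa 20
    white 145
    ;
    run;

    /* WEIGHT: each row stands for COUNT subjects */
    proc freq data=chisq;
       tables race / chisq testp=(10 10 10 70);  /* order: africa asian hispanic white */
       weight count;
    run;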

The same output shows up as before, because the WEIGHT statement makes the analysis use the same 200 observations.