GOF test with SAS

Explain the question with an example

When studying a single categorical variable, a typical question concerns inference for proportions. For example, a sample of race is drawn from an area where four races are identified: 1 for 'Hispanic', 2 for 'Asian', 3 for 'African American' and 4 for 'White'. Suppose the hypothesis is that the general population consists of 10% Hispanic, 10% Asian, 10% African American and 70% White. The original data are in "race.csv". Note that the file lists the race of each subject, but the same test is also possible when only the frequency count for each race is available, as demonstrated later in the section.

Here is the standard procedure for GOF test

Open the data set in SAS, or import it with the following commands. Note that the "$" in the input statement marks race as a character (categorical) variable rather than a numeric one.

    data race;
      /* comma-delimited file; firstobs=2 skips the header row */
      infile "H:\sas\data\race.csv" dlm=',' firstobs=2;
      input race $;
    run;

Then the GOF test can be requested with the "freq" procedure. The testp= option supplies the hypothesized percentages, listed in the order in which the levels of race appear in the output table (alphabetical here: africa, asian, hispanic, white).

    proc freq data = race;
      tables race / chisq testp=(10 10 10 70);
    run;

Reading the output of the GOF test

    The FREQ Procedure

                                         Test   Cumulative   Cumulative
    race        Frequency   Percent   Percent    Frequency      Percent
    -------------------------------------------------------------------
    africa             20     10.00     10.00           20        10.00
    asian              11      5.50     10.00           31        15.50
    hispanic           24     12.00     10.00           55        27.50
    white             145     72.50     70.00          200       100.00

         Chi-Square Test
    for Specified Proportions
    -------------------------
    Chi-Square         5.0286
    DF                      3
    Pr > ChiSq         0.1697

    Sample Size = 200

These results show that the racial composition in our sample does not differ significantly from the hypothesized values that we supplied (chi-square with three degrees of freedom = 5.0286, p = .1697).

Checking assumption for the GOF test

The chi-square goodness-of-fit test assumes that the expected count n*p in every category is reasonably large; a common rule of thumb is at least 5. Here n*Pr(Hispanic) = n*Pr(Asian) = n*Pr(African American) = 200(0.1) = 20 and n*Pr(White) = 200(0.7) = 140. All are greater than 5, so the GOF test is appropriate.

Run GOF on frequencies

The GOF test can also be done when one has only the frequencies, as shown in the following table.
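
    race    frequency
    -----------------
    1       24
    2       11
    3       20
    4       145

The frequencies need to be read into SAS, followed by the "freq" procedure. The weight statement tells SAS to treat each row as count subjects rather than as a single observation, and the testp= option is repeated so that the same hypothesized percentages are tested.

    data chisq;
      input race $ count;
      datalines;
    hispanic 24
    asian 11
    africa 20
    white 145
    ;
    run;

    proc freq data=chisq;
      tables race / chisq testp=(10 10 10 70);
      weight count;   /* each row stands for count subjects */
    run;

The same output shows up as before.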
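
As a hand check on the statistic quoted in the text above (5.0286 on three degrees of freedom), the goodness-of-fit statistic can be worked out directly from the four counts and the hypothesized proportions of 10%, 10%, 10% and 70%. The symbols O_i (observed count), E_i (expected count n*p_i) and k (number of categories) are notation introduced here for illustration; they do not appear in the SAS output. Terms are listed in the order Hispanic, Asian, African American, White.

```latex
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \mathrm{df} = k - 1 = 3
\]
\[
\chi^2 = \frac{(24-20)^2}{20} + \frac{(11-20)^2}{20}
       + \frac{(20-20)^2}{20} + \frac{(145-140)^2}{140}
       = 0.8 + 4.05 + 0 + 0.1786 \approx 5.0286
\]
```

By comparison, if no percentages are supplied the chisq option tests equal proportions, so every expected count is 200/4 = 50 and the statistic becomes 13.52 + 30.42 + 18 + 180.5 = 242.44. A value of that size in the output is therefore a sign that testp= was left off the tables statement.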
© COPYRIGHT 2010 ALL RIGHTS RESERVED tqin@purdue.edu