In this document, we illustrate how to run z tests and t tests for the material in Chapter 8 and Chapter 9. The z test is available in a separate package, TeachingDemos, which you need to install first.

install.packages('TeachingDemos')

After installing it, load the package with

library(TeachingDemos)

We now have z.test and t.test available to run the appropriate tests and to compute the corresponding confidence intervals.

### 1. One sample z test for known variance, or large sample test for mean/population proportion

The following simulates a sample of size 15 from a normal distribution with mean 10 and standard deviation 5. Note that we assume the standard deviation is known.

x<-rnorm(15, mean=10, sd=5)
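Because rnorm draws random numbers, each run produces a different sample, so your output will not match the numbers printed in this document exactly. As a small sketch, fixing the seed with set.seed makes a run reproducible; the seed value 2015 and the names x_a and x_b are arbitrary choices made here for illustration only.

```r
set.seed(2015)                       # arbitrary seed, chosen only for illustration
x_a <- rnorm(15, mean = 10, sd = 5)  # first draw
set.seed(2015)                       # resetting the seed replays the same draws
x_b <- rnorm(15, mean = 10, sd = 5)  # second draw, identical to the first
identical(x_a, x_b)                  # check that both samples agree
```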

We will test $$H_0: \mu=11$$ vs. $$H_1: \mu \neq 11$$.

z.test(x, mu=11, sd=5, alternative="two.sided")
##
##  One Sample z-test
##
## data:  x
## z = -1.0719, n = 15.000, Std. Dev. = 5.000, Std. Dev. of the
## sample mean = 1.291, p-value = 0.2838
## alternative hypothesis: true mean is not equal to 11
## 95 percent confidence interval:
##   7.085856 12.146461
## sample estimates:
## mean of x
##  9.616159
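As a check on this output, the z statistic, p-value, and confidence interval can be reproduced with base R from the formula $$z=(\bar{x}-\mu_0)/(\sigma/\sqrt{n})$$. The sketch below plugs in the sample mean printed above, because the simulated sample itself changes from run to run; the variable names are illustrative and not part of TeachingDemos.

```r
# Summary values taken from the z.test output above
xbar  <- 9.616159   # sample mean reported by z.test
n     <- 15         # sample size
sigma <- 5          # known population standard deviation
mu0   <- 11         # mean under H0

se <- sigma / sqrt(n)                      # standard deviation of the sample mean
z  <- (xbar - mu0) / se                    # z statistic
p_value <- 2 * pnorm(-abs(z))              # two-sided p-value
ci <- xbar + c(-1, 1) * qnorm(0.975) * se  # 95 percent confidence interval

round(c(z = z, p.value = p_value), 4)
ci
```

Up to rounding, these values agree with the statistic, p-value, and interval reported by z.test.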

We will use the GPA data set to illustrate the large sample test. The GPA data contain 224 observations and 7 variables; we focus on the gpa variable. Because the sample size is large, checking normality is not essential, but we do it anyway to get a better understanding of the data.

gpa<-read.table('http://www.stat.purdue.edu/~lingsong/teaching/2015spring/data/gpa.txt', header=T, as.is=T)
x<-gpa$gpa
qqplot(qnorm(ppoints(224), mean=mean(x), sd=sd(x)), sort(x), xlab='Theoretical Quantile', ylab='Sample Quantile', asp=1)
abline(0, 1, col=2, lwd=2)

This plot suggests that the bulk of the data is consistent with a normal distribution, but the tails do not fit very well. Next, we test the hypotheses $$H_0: \mu=5$$ vs. $$H_1: \mu<5$$, using the sample standard deviation in place of the unknown population standard deviation.

z.test(x, mu=5, sd=sd(x), alternative='less')
##
##  One Sample z-test
##
## data:  x
## z = -7.0048, n = 224.000, Std. Dev. = 0.779, Std. Dev. of the
## sample mean = 0.052, p-value = 1.237e-12
## alternative hypothesis: true mean is less than 5
## 95 percent confidence interval:
##     -Inf 4.72088
## sample estimates:
## mean of x
##  4.635223

The z test for a proportion can be done with prop.test. The following example resembles a Gallup poll of 2000 calls, in which 1050 respondents support candidate A. We want to check whether the proportion equals .5: $$H_0: p=.5$$ vs. $$H_1: p \neq .5$$. Note that prop.test reports a chi-squared statistic (by default with a continuity correction) rather than a z statistic; without the correction, X-squared is the square of z.

prop.test(x=1050, n=2000, p=.5, alternative='two.sided')
##
##  1-sample proportions test with continuity correction
##
## data:  1050 out of 2000, null probability 0.5
## X-squared = 4.9005, df = 1, p-value = 0.02685
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.5028373 0.5470658
## sample estimates:
##     p
## 0.525

### 2. One sample $$t$$ test

The following loads the class data set and then performs a one sample comparison. We will use the height variable as input.

class<-read.table('http://www.stat.purdue.edu/~lingsong/teaching/2015spring/data/class.txt', header=T, as.is=T)
x<-class$height

We first check normality with a Q-Q plot.

qqplot(qnorm(ppoints(length(x)), mean=mean(x), sd=sd(x)), sort(x), xlab='Theoretical Quantile', ylab='Sample Quantile', asp=1)
abline(0, 1, col=2, lwd=2)

The Q-Q plot suggests that the normality assumption is reasonable. We then test whether the mean height is greater than 60: $$H_0: \mu=60$$ vs. $$H_1: \mu>60$$.

t.test(x, mu=60, alternative='greater')
##
##  One Sample t-test
##
## data:  x
## t = 1.9867, df = 18, p-value = 0.0312
## alternative hypothesis: true mean is greater than 60
## 95 percent confidence interval:
##  60.29718      Inf
## sample estimates:
## mean of x
##  62.33684
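To see where these numbers come from, the one-sided test can be computed by hand from $$t=(\bar{x}-\mu_0)/(s/\sqrt{n})$$ with $$n-1$$ degrees of freedom. The following is a minimal sketch: the helper one_sample_t_greater and the simulated vector h are hypothetical names introduced here, and h stands in for the class heights so that the block runs without downloading the data.

```r
# Hypothetical helper: one sample t test with alternative 'greater', computed by hand
one_sample_t_greater <- function(x, mu0, conf_level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                          # estimated standard error of the mean
  t_stat <- (mean(x) - mu0) / se                 # t statistic
  df <- n - 1                                    # degrees of freedom
  p_value <- pt(t_stat, df, lower.tail = FALSE)  # P(T > t) under H0
  lower <- mean(x) - qt(conf_level, df) * se     # one-sided lower confidence bound
  list(statistic = t_stat, df = df, p.value = p_value, lower = lower)
}

# Simulated heights (an assumption, not the class data) to compare with t.test
set.seed(1)
h <- rnorm(19, mean = 62, sd = 5)
by_hand  <- one_sample_t_greater(h, mu0 = 60)
built_in <- t.test(h, mu = 60, alternative = 'greater')

all.equal(by_hand$statistic, unname(built_in$statistic))
all.equal(by_hand$p.value, built_in$p.value)
all.equal(by_hand$lower, built_in$conf.int[1])
```

Replacing h with the height vector x should reproduce the t statistic, p-value, and lower confidence bound shown in the output above.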

### 3. Two sample $$t$$ test

Let us use the class data to check whether boys and girls have the same average height. Let $$\mu_1$$ be the average height of the boys and $$\mu_2$$ be the average height of the girls. The hypotheses are $$H_0: \mu_1-\mu_2=0$$ vs. $$H_1: \mu_1-\mu_2 \neq 0$$. We first check the normality of both groups.

x1<-x[class$sex=='M']
x2<-x[class$sex=='F']
qqplot(qnorm(ppoints(length(x1)), mean=mean(x1), sd=sd(x1)), sort(x1), xlab='Theoretical Quantile', ylab='Sample Quantile', main='Boy heights', asp=1)
abline(0, 1, col=2, lwd=2)

qqplot(qnorm(ppoints(length(x2)), mean=mean(x2), sd=sd(x2)), sort(x2), xlab='Theoretical Quantile', ylab='Sample Quantile', main='Girl heights', asp=1)
abline(0, 1, col=2, lwd=2)
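The same Q-Q plot code has now appeared several times with only the data and title changing. One possible way to avoid the repetition is a small wrapper function; normal_qq below is a hypothetical helper written for this illustration, not a function from base R or TeachingDemos, and the example call uses simulated values rather than the class data.

```r
# Hypothetical helper wrapping the Q-Q plot code used throughout this document
normal_qq <- function(x, main = '') {
  # Theoretical quantiles from a normal with the sample mean and sd
  theo <- qnorm(ppoints(length(x)), mean = mean(x), sd = sd(x))
  qqplot(theo, sort(x), xlab = 'Theoretical Quantile',
         ylab = 'Sample Quantile', main = main, asp = 1)
  abline(0, 1, col = 2, lwd = 2)   # reference line y = x
  invisible(theo)                  # return the quantiles without printing them
}

# Example with simulated data (an assumption; substitute x1 or x2 from above)
set.seed(1)
normal_qq(rnorm(10, mean = 60, sd = 5), main = 'Simulated heights')
```

With the class data loaded, normal_qq(x1, main='Boy heights') and normal_qq(x2, main='Girl heights') would draw the two plots above.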