This page contains the R code for STAT 51100 Section 010. The data sets are available at http://www.stat.purdue.edu/~lingsong/teaching/2017fall/data/. We will use the class data set to illustrate the material in Chapter 1.

The following command reads the data directly from the web. You can also download the files to your local drive and read them from there.

``classdata<-read.table('http://www.stat.purdue.edu/~lingsong/teaching/2017fall/data/class.txt', header=TRUE, as.is=TRUE)``
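If you prefer to work from a local copy, the same call works with a file path in place of the URL. The sketch below is only an illustration: it writes three rows of the class data to a temporary file that stands in for a downloaded class.txt, then reads it back. The temporary file and the names `localfile` and `localdata` are assumptions made for the example, as is the whitespace-separated layout of the file.

```r
# Stand-in for a downloaded copy of class.txt: three rows of the class data
# written to a temporary file (hypothetical location, for illustration only).
localfile <- tempfile(fileext = ".txt")
writeLines(c("name sex age height weight",
             "Alice F 13 56.5 84.0",
             "Becka F 13 65.3 98.0",
             "Alfred M 14 69.0 112.5"),
           localfile)

# Same read.table call as for the URL, with a local path instead
localdata <- read.table(localfile, header = TRUE, as.is = TRUE)
localdata
```

With a real download, replace `localfile` with the path to your saved class.txt.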

You can view the class data set by typing its name:

``classdata``
``````##       name sex age height weight
## 1    Alice   F  13   56.5   84.0
## 2    Becka   F  13   65.3   98.0
## 3     Gail   F  14   64.3   90.0
## 4    Karen   F  12   56.3   77.0
## 5    Kathy   F  12   59.8   84.5
## 6     Mary   F  15   66.5  112.0
## 7    Sandy   F  11   51.3   50.5
## 8   Sharon   F  15   62.5  112.5
## 9    Tammy   F  14   62.8  102.5
## 10  Alfred   M  14   69.0  112.5
## 11    Duke   M  14   63.5  102.5
## 12   Guido   M  15   67.0  133.0
## 13   James   M  12   57.3   83.0
## 14 Jeffrey   M  13   62.5   84.0
## 15    John   M  12   59.0   99.5
## 16  Philip   M  16   72.0  150.0
## 17  Robert   M  12   64.8  128.0
## 18  Thomas   M  11   57.5   85.0
## 19 William   M  15   66.5  112.0``````

When a data set is large, we may want to look at only the first few rows. The function head does this; by default it returns the first six rows:

``head(classdata)``
``````##    name sex age height weight
## 1 Alice   F  13   56.5   84.0
## 2 Becka   F  13   65.3   98.0
## 3  Gail   F  14   64.3   90.0
## 4 Karen   F  12   56.3   77.0
## 5 Kathy   F  12   59.8   84.5
## 6  Mary   F  15   66.5  112.0``````
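A few related functions are also handy for a first look at a data set. The sketch below rebuilds the class data from the listing above so that it runs on its own, then uses the `n` argument of head, the companion function tail, and dim.

```r
# Class data copied from the listing above, so the example is self-contained
classdata <- read.table(text = "name sex age height weight
Alice F 13 56.5 84.0
Becka F 13 65.3 98.0
Gail F 14 64.3 90.0
Karen F 12 56.3 77.0
Kathy F 12 59.8 84.5
Mary F 15 66.5 112.0
Sandy F 11 51.3 50.5
Sharon F 15 62.5 112.5
Tammy F 14 62.8 102.5
Alfred M 14 69.0 112.5
Duke M 14 63.5 102.5
Guido M 15 67.0 133.0
James M 12 57.3 83.0
Jeffrey M 13 62.5 84.0
John M 12 59.0 99.5
Philip M 16 72.0 150.0
Robert M 12 64.8 128.0
Thomas M 11 57.5 85.0
William M 15 66.5 112.0", header = TRUE, as.is = TRUE)

head(classdata, n = 3)   # first three rows only
tail(classdata, n = 3)   # last three rows
dim(classdata)           # number of rows and number of columns
```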

Another important function in R is help, which shows how to use a function. For example,

``help(head)``

displays the documentation for the head function.

Now we turn to descriptive statistics and data visualization. The class data set contains five variables: name, sex, age, height, and weight. Here name is an identifier and sex is a categorical variable; the other three variables are numeric. We will draw a bar chart for sex and a histogram for height. We can also generate stem-and-leaf displays in R.

The following generates stem-and-leaf displays for the variables weight and height:

``stem(classdata$weight)``
``````##
##   The decimal point is 1 digit(s) to the right of the |
##
##    4 | 1
##    6 | 7
##    8 | 3445508
##   10 | 0332233
##   12 | 83
##   14 | 0``````
``stem(classdata$height)``
``````##
##   The decimal point is 1 digit(s) to the right of the |
##
##   5 | 1
##   5 | 67789
##   6 | 033344
##   6 | 557779
##   7 | 2``````
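If a display looks too compressed, the `scale` argument of stem controls the length of the plot. The sketch below, which copies the heights from the listing above so that it runs on its own, asks for a display roughly twice as long as the default; the value 2 is just an example.

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

# scale = 2 requests a plot about twice as long as the default (scale = 1)
stem(classdata$height, scale = 2)
```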

The following generates a histogram of height. Because freq=FALSE, the vertical axis shows density rather than frequency.

``hist(classdata$height, freq=FALSE, xlab='Height', main='Histogram of Height')``

You can specify the intervals with the breaks option. The simplest way is to give the desired number of intervals; R treats this number as a suggestion and uses a nearby value that gives round break points. Alternatively, you can supply the break points themselves as a vector.

The histogram function in the lattice package can also draw a histogram:
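Both ways of setting breaks can be sketched as follows. The heights are copied from the listing above so that the example runs on its own, and `plot=FALSE` is used only so that the chosen break points and counts can be inspected without drawing; the values 8 and `seq(50, 75, by = 5)` are arbitrary choices for illustration.

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

# A single number is only a suggestion for the number of intervals
h1 <- hist(classdata$height, breaks = 8, plot = FALSE)
h1$breaks   # the break points R actually chose

# A vector fixes the break points exactly: 50, 55, ..., 75
h2 <- hist(classdata$height, breaks = seq(50, 75, by = 5), plot = FALSE)
h2$counts   # number of observations in each interval
```

Dropping `plot = FALSE` from either call draws the corresponding histogram.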

``````library(lattice)
histogram(classdata$height, xlab='Height', main='Histogram of Height')``````

We can also draw a bar chart for sex. First, we use the table function to obtain a frequency table.

``table(classdata$sex)``
``````##
##  F  M
##  9 10``````
``barplot(table(classdata$sex), xlab='Sex', ylab='Frequency')``
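The frequency table can also be turned into relative frequencies with prop.table. A minimal sketch, rebuilding the sex variable from the counts shown above (9 F and 10 M) so that it runs on its own; the names `counts` and `props` are chosen for the example:

```r
# Sex variable rebuilt from the frequency table above: 9 F and 10 M
classdata <- data.frame(sex = rep(c('F', 'M'), times = c(9, 10)))

counts <- table(classdata$sex)   # frequencies
props <- prop.table(counts)      # relative frequencies, summing to 1
round(props, 3)
```

Passing `props` to barplot in place of the counts draws the chart on the relative-frequency scale.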

The barchart function in the lattice package can also draw a bar chart directly:

``barchart(classdata$sex, horizontal=FALSE, xlab='Sex', ylab='Frequency')``

Several functions return descriptive statistics:

Mean

``mean(classdata$height)``
``## [1] 62.33684``

Median

``median(classdata$height)``
``## [1] 62.8``

Trimmed mean

``mean(classdata$height, trim=0.1)``
``## [1] 62.41765``
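To see what `trim=0.1` does, the trimmed mean can be reproduced by hand: sort the data, drop the fraction 0.1 of the observations from each end (rounded down to a whole number), and average the rest. The sketch below copies the heights from the listing above; the names `x`, `n`, `k`, and `manual` are chosen for the example.

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

x <- sort(classdata$height)
n <- length(x)                       # 19 observations
k <- floor(n * 0.1)                  # observations dropped from each end: 1
manual <- mean(x[(k + 1):(n - k)])   # average of the remaining 17 values
manual
mean(classdata$height, trim = 0.1)   # same value from the built-in option
```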

First quartile, median, and third quartile

``quantile(classdata$height, c(.25, .5, .75))``
``````##   25%   50%   75%
## 58.25 62.80 65.90``````
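Two related built-in functions may also be useful here: summary reports the minimum, quartiles, mean, and maximum in one call, and IQR returns the interquartile range (third quartile minus first quartile). A short sketch, with the heights copied from the listing above:

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

summary(classdata$height)   # min, quartiles, mean, and max together

q <- quantile(classdata$height, c(.25, .75))
iqr <- IQR(classdata$height)
iqr                         # equals q[2] - q[1]
```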

Variance

``var(classdata$height)``
``## [1] 26.2869``

Standard deviation

``sd(classdata$height)``
``## [1] 5.127075``
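As a check on what var and sd compute, the sample variance can be written out from its definition, the sum of squared deviations from the mean divided by n - 1, and the standard deviation is its square root. A minimal sketch with the heights copied from the listing above; the names `x`, `n`, `v`, and `s` are chosen for the example.

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

x <- classdata$height
n <- length(x)
v <- sum((x - mean(x))^2) / (n - 1)   # sample variance from the definition
s <- sqrt(v)                          # sample standard deviation
c(variance = v, sd = s)
```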

We can then draw a boxplot directly:

``boxplot(classdata$height, main='Boxplot of Height')``
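The numbers behind the boxplot can be inspected with boxplot.stats, which returns the five values used to draw the box and whiskers and any points flagged as outliers (by default, points more than 1.5 times the box length beyond the box). A short sketch, with the heights copied from the listing above and `bs` as a name chosen for the example:

```r
# Heights copied from the class data listing above
classdata <- data.frame(height = c(56.5, 65.3, 64.3, 56.3, 59.8, 66.5, 51.3,
                                   62.5, 62.8, 69.0, 63.5, 67.0, 57.3, 62.5,
                                   59.0, 72.0, 64.8, 57.5, 66.5))

bs <- boxplot.stats(classdata$height)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points drawn individually as outliers (none for height)
```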