Introduction To Computing With Data
STAT 490M, Fall 2009

Lecture: MWF, 9:30 AM -- 10:20 AM, in REC 315
(STAT 49000-005; Banner CRN 37884)

Computer Laboratory: Tues, 9:30 AM -- 10:20 AM, in SC 183
(STAT 49000-006; Banner CRN 37891)

Professor: Mark Daniel Ward
Email: mdw@purdue.edu
Office: MATH 540
Phone: 765-496-9563
Office hours: MTWF, 7:30 AM -- 8:20 AM, in MATH 540

Grader: InKyung Choi
Email: ichoi@stat.purdue.edu
Office: MATH G171
Phone: 765-496-3049
Office hours: Tuesdays and Thursdays, 2:00 PM -- 2:50 PM, in MATH G171


This course is based on the topics advocated at the recent Workshop on Integrating Computing into the Statistics Curricula, organized by Mark Hansen (UCLA), Deborah Nolan (UC Berkeley), and Duncan Temple Lang (UC Davis), and sponsored by the National Science Foundation and CAUSE. Dr. Ward acknowledges and thanks the organizers for sharing their materials from previous courses they have taught on similar topics.

This semester we will use the R platform for visualizing data.

Some interesting websites for data are given here. Additional sources of data or data representations are very welcome.

Course description: click here

Course policy: click here

Homework: (subject to small changes)
Outline of Topics
Week 1: Mon, Aug 24
Day 1 lecture
An overview of the course;
how to get R software;
how to start using R software.
Tues, Aug 25 (Lab)
Assigned Project 1
Wed, Aug 26
Day 3 lecture
Help features in R,
getting familiar with
variables and functions.
Watched the video
called "Did You Know?"
Thu, Aug 27
Fri, Aug 28
Day 4 lecture
seq, rep, parameters,
Booleans, NA, NaN,
ways to index a vector
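
The Day 4 topics can be tried out in a few lines of R. This is only a minimal sketch, not the code used in lecture; the vector names (evens, pattern, x) and all values are made up for illustration.

```r
# Build vectors with seq and rep, using named parameters
evens <- seq(from = 2, to = 10, by = 2)   # 2 4 6 8 10
pattern <- rep(c("a", "b"), times = 3)    # "a" "b" "a" "b" "a" "b"

# NA marks a missing value; NaN is the result of an undefined computation
x <- c(5, NA, 0 / 0, 12)
is.na(x)    # NA and NaN both count as missing: FALSE TRUE TRUE FALSE
is.nan(x)   # only the NaN is flagged:          FALSE FALSE TRUE FALSE

# Three ways to index a vector
evens[c(1, 3)]     # by position: 2 6
evens[-1]          # by exclusion (drop the first element): 4 6 8 10
evens[evens > 5]   # by a Boolean (logical) vector: 6 8 10
```

Indexing with a logical vector is usually the most convenient of the three, because the condition can be written directly in terms of the data.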
Week 2: Mon, Aug 31
Day 5 lecture
Factors and tapply,
with biological examples.
Tues, Sep 1 (Lab)
Assigned Project 2
Wed, Sep 2
Day 7 lecture
More examples about
factors and tapply
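
As a rough illustration of factors and tapply, the sketch below groups some measurements by species. The data are invented for this example and are not the biological examples used in lecture.

```r
# Hypothetical data: body mass (grams) for six birds of three species
species <- factor(c("finch", "wren", "finch", "robin", "wren", "finch"))
mass <- c(18, 10, 22, 77, 12, 20)

levels(species)   # the distinct groups, in alphabetical order
table(species)    # how many observations fall in each group

# tapply splits mass by species and applies mean to each piece
mean_mass <- tapply(mass, species, mean)
mean_mass         # finch 20, robin 77, wren 11
```

The factor supplies the grouping, so the same call works unchanged if more species or more measurements are added.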
Thu, Sep 3
Fri, Sep 4
Day 8 lecture
Overview: matrix, array,
data.frame, and list.
Also made a map of Indiana
with colors for class residents:
indianamaps.R code
indiana.jpg picture
Week 3: Mon, Sep 7 (no lecture)
Labor Day
Tues, Sep 8 (Lab)
Assigned Project 3
Wed, Sep 9
(no class)
Thu, Sep 10
Fri, Sep 11
Day 10 lecture
Reading data from external files,
and managing graphics.
Sample files:
anotherdatafile.txt
mydatafile.txt
trythis.dat
trythis2.dat
Week 4: Mon, Sep 14
discussed pages 28-41 (from Ch 2)
and all of Chapters 3 and 4
in Creating More Effective Graphs
by Naomi B. Robbins (Wiley, 2005)
Tues, Sep 15 (Lab)
Assigned Project 4
Wed, Sep 16
discussed selections from
The Elements of Graphing Data
by William S. Cleveland
(Hobart Press, 2004)
Thu, Sep 17
Fri, Sep 18
discussed selections from
The Elements of Graphing Data
by William S. Cleveland
(Hobart Press, 2004)
Week 5: Mon, Sep 21
(no class)
Tues, Sep 22 (Lab)
Assigned Project 5
Wed, Sep 23
Day 16 lecture
Examples relevant to Project 5,
and examples of taking a subset of a data.frame.
Thu, Sep 24
Fri, Sep 25
Day 17 lecture
More examples relevant to Project 5.
We also introduce the apply function.
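
For readers new to apply, here is a small sketch of what it does with a matrix. The scores matrix is hypothetical and unrelated to the Project 5 data.

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column
scores <- matrix(c(90, 80, 70, 60, 85, 75), nrow = 2)

# The second argument picks the margin: 1 = rows, 2 = columns
row_totals <- apply(scores, 1, sum)    # 245 215
col_means  <- apply(scores, 2, mean)   # 85 65 80
```

For simple sums and means, rowSums and colMeans give the same answers; apply is useful because it accepts any function.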
Week 6: Mon, Sep 28
Day 18 lecture
We prepare to work with baby names.
We extract data from the web
directly in R. We also write our
first function, which is used to
remove commas from a string.
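
One possible way to write such a function is sketched below. The name remove_commas and the use of gsub are assumptions made for this example; the function written in lecture may look different.

```r
# Hypothetical helper: strip commas so "1,234,567" can be treated as a number
remove_commas <- function(s) {
  # fixed = TRUE matches the comma literally instead of as a regular expression
  gsub(",", "", s, fixed = TRUE)
}

remove_commas("1,234,567")               # "1234567"
as.numeric(remove_commas("1,234,567"))   # 1234567
```

Because gsub is vectorized, the same function can be applied to a whole column of counts at once.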
Tues, Sep 29 (Lab)
An extra day to work
on Project 5.
Wed, Sep 30
Day 20 lecture
We parse baby name data.
We also use grep and nchar to
explore the 2008 baby names.
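
The sketch below shows the kind of question grep and nchar can answer. The short baby_names vector is a hypothetical stand-in, not the actual 2008 data.

```r
# Hypothetical sample of names (not the real 2008 list)
baby_names <- c("Emma", "Isabella", "Jacob", "Michael", "Ava", "Ethan")

# grep returns the positions of the matches; value = TRUE returns the matches
starts_with_e <- grep("^E", baby_names)     # 1 6
grep("^E", baby_names, value = TRUE)        # "Emma" "Ethan"

# nchar counts the characters in each name
name_lengths <- nchar(baby_names)           # 4 8 5 7 3 5
longest <- baby_names[name_lengths == max(name_lengths)]   # "Isabella"
```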
Thu, Oct 1
dinner at
Ward family home
Fri, Oct 2
Day 21 lecture
Assigned Project 6
We download and parse all
129 years of baby name data,
and begin to explore it too.
We use the sapply, lapply, agrep, and unique
functions for the first time.
Week 7: Mon, Oct 5
Day 22 lecture
Some example functions: ISBN,
Faro shuffle, Monty Hall,
and a mystery problem
Tues, Oct 6 (Lab)
Wed, Oct 7
Day 24 lecture
An introduction to UNIX

Also see:
Learning the Unix Operating System
Fifth Edition by Jerry Peek,
Grace Todino-Gonguet, and John Strang
(O'Reilly, 2001)
Thu, Oct 8
Fri, Oct 9
(no class)
Week 8: Mon, Oct 12 (no lecture)
October Break
Tues, Oct 13 (no lecture)
October Break
Wed, Oct 14
Day 25 lecture
Some examples of UNIX utilities

See: List of UNIX utilities
on Wikipedia

Here are a few of these that
you might find particularly helpful:
short list
Thu, Oct 15
Fri, Oct 16
Day 26 lecture
We began to discuss XML,
which is covered in
Chapter 6 of the book:
Learning XML, 2nd Edition
by Erik T. Ray. We discussed how
to parse XML code using R,
with the XML library from
Duncan Temple Lang's
Omegahat project.

Some examples of how to
extract data from the web
Week 9: Mon, Oct 19
Day 27 lecture
More examples of how to extract data from the web

See also Chapter 6 of the book:
Learning XML, 2nd Edition
Tues, Oct 20 (Lab)
Assigned Project 7
Wed, Oct 21
Day 29 lecture
Even more examples of how to extract data from the web
We used two example files today:
cdcatalog.xml
bookexample.xml

See also Chapter 6 of the book:
Learning XML, 2nd Edition
Thu, Oct 22
Fri, Oct 23
Day 30 lecture part 1
(extracting results from the Presidential election), and
Day 30 lecture part 2
(extracting college ranking data)

See also Chapter 6 of the book:
Learning XML, 2nd Edition

Election, improved version 1
Election, improved version 2
Week 10: Mon, Oct 26
Day 31 lecture
Analyzing a large iTunes file
using Dr. Ward's iTunes
file as an example: iTunesMusicLibrary.xml

We also debugged some
student XML code in class
Tues, Oct 27 (Lab)
Wed, Oct 28
Introduced Dr. Ward's
MapFunction.txt and studied
some MapExamples.txt
(See also the discussion in Project 8.)

We also debugged more
student code in class.
Thu, Oct 29
Fri, Oct 30
Provided in-class debugging
for student questions about
Project 7 on XML
Week 11: Mon, Nov 2
Introduced Dr. Ward's
CartogramFunction.txt
and studied some
CartogramExamples.txt
(See also the discussion in Project 8.)
Tues, Nov 3 (Lab)
Assigned Project 8
Wed, Nov 4
Day 37 lecture
More examples of making maps
(not found in Project 8)
Thu, Nov 5
Fri, Nov 6
Day 38 lecture
Introduction to MySQL

using Sean Lahman's
baseball archive
Week 12: Mon, Nov 9
Day 39 lecture
Using MySQL from R

Here are some more examples
about MySQL from the
SQL book: examples
and from the
baseball database: examples
Tues, Nov 10 (Lab)
Wed, Nov 11
Dr. Ward meeting with
Ozgur Delemen,
Adriana Vars (in the afternoon)
Thu, Nov 12
Fri, Nov 13
Dr. Ward meeting with
Jake Libauskas,
Catherine Cao,
Jonathan Blair,
Xin Lu Tan
Week 13: Mon, Nov 16
Dr. Ward meeting with
Wee Poh Ee,
Miles Kopcke
Tues, Nov 17 (Lab)
Wed, Nov 18
Dr. Ward meeting with
Katie Poon,
Peter Cheng,
Dan Kinnett
Thu, Nov 19
Fri, Nov 20
final project plan
due at 5 PM
Week 14: Mon, Nov 23
(no class)
Tues, Nov 24
(no class)
Wed, Nov 25 (no lecture)
Thanksgiving Vacation
Thu, Nov 26 (no lecture)
Thanksgiving Vacation
Fri, Nov 27 (no lecture)
Thanksgiving Vacation
Week 15: Mon, Nov 30
final project presentation
by Miles Kopcke
Tues, Dec 1 (Lab)
Wed, Dec 2
final project presentation
by Ozgur Delemen,
Wee Poh Ee
Thu, Dec 3
Fri, Dec 4
final project presentation
by Xin Lu Tan,
Katie Poon
Week 16: Mon, Dec 7
final project presentation
by Jake Libauskas,
Jonathan Blair
Tues, Dec 8 (Lab)
Wed, Dec 9
final project presentation
by Peter Cheng,
Adriana Vars
Thu, Dec 10
Fri, Dec 11
final project presentation
by Dan Kinnett,
Catherine Cao
Final project submission due at 5 PM on Friday, December 18