# Past Topic Courses

**STAT 529K** Bayesian Applied Decision Theory (Banner Course Number: 52900)

**Semester: **Summer

**Prerequisites: **STAT 517 or equivalent

**Credits: **3

**Primary Audience: **Applied statisticians and practitioners in other disciplines who use data to make decisions.

**Description: **Offered Maymester. The Bayesian decision-theoretic model, various loss (utility) functions, and practical problems. Admissibility and minimax procedures. Selecting the prior and computing the posterior. Hierarchical Bayesian and empirical Bayesian models; Markov chain Monte Carlo (MCMC) techniques. Robust Bayesian methods; sequential Bayesian models. Throughout the course, practical examples will be introduced, with the emphasis on understanding how to apply the theoretical concepts.

Schedule and Textbook Information for Fall 2015

**STAT 598A** Introduction to Machine Learning (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **Background in linear algebra and the basics of probability. Programming ability in a high-level language such as C/C++ is a must. Familiarity with UNIX is assumed.

**Credits: **3

**Primary Audience: **Graduate students and advanced undergraduates with background in probability, linear algebra, and programming (see prerequisites).

**Description: **With the availability of cheap storage devices, our ability to collect and store large amounts of data is increasing exponentially. Machine learning is a branch of applied statistics which aims to bring to bear tools from statistics in the analysis of such large datasets. This course is a biased journey through some of the dominant concepts in machine learning. Topics to be covered include:

- Density Estimation
- Exponential families of distributions
- Directed and Undirected Graphical models
- Structured Learning
- Optimization for Machine Learning
- Kernels

Schedule and Textbook Information for Summer 2015

**STAT 598AE** Advanced Basic Statistics (Banner Course Number: 59800AE)

**Semester: **Spring

**Prerequisites: **STAT 528 or equivalent

**Credits: **3

**Primary Audience: **PhD students in statistics

**Description: **Sequential analysis and related procedures. Invariance, including highly reasonable examples where it can be shown that all invariant procedures are uniformly bad. Discussion of infinite-parametric problems, mistakenly called non-parametric in most of the literature, such as density estimation, regression function estimation, semi-parametric problems, etc. Some of these have reasonable solutions, but for others, the situation is questionable. There will also be a discussion of large data problems from a decision viewpoint. There will be a few other items discussed.

Any mathematical problems which arise will be discussed; they should not provide difficulties, although they may suggest that the student learn more about those subjects.

Schedule and Textbook Information for Fall 2015

**STAT 598AP** Computing For Big Data Analysis (Banner Course Number: 59800AP)

**Semester: **Fall

**Prerequisites: **Some experience in programming in R, C, and Java; knowledge of linear models and iterative algorithms, such as EM and Markov chain Monte Carlo.

**Credits: **3

**Primary Audience: **Restricted to graduate students in the Statistics Ph.D. Program

**Description: **This topic course is mainly for graduate students in Statistics who are interested in computing for big data analysis. It focuses on functional/R-style programming and object-oriented programming in Scala, basics of single-machine parallel computing in Scala, and multiple-machine distributed computing in Apache Spark. It also covers basics of Apache Hadoop, internals of Spark, and a modified version of Spark that allows for Distributed Interactive Statistical Computing (DISC). All the materials will be discussed in the context of computational statistics, including data visualization, statistical modeling, and statistical inference, when appropriate. Students are expected to work on relevant term projects.

Schedule and Textbook Information for Fall 2015

**STAT 598AQ** Advanced Topics in Bayesian Learning (CS 59000 BLN) (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **Computational statistics or Statistical Machine Learning or permission of instructor.

**Credits: **1-5

**Primary Audience: **Graduate students who are interested in Bayesian computation and modeling for big data analysis

**Description: **This seminar course will include lectures by the instructor, presentations of readings by students, and possibly guest lectures. This course covers selected topics on machine learning for big data analysis. Tentative topics include:

- Approximate inference: variational Bayes (VB), expectation propagation, online Bayesian inference
- Adaptive Monte Carlo methods: slice sampler, hybrid Monte Carlo, parallel computation and Monte Carlo methods, etc.
- Random projection methods
- Nonparametric Bayesian models for graph and multiway data analysis

Schedule and Textbook Information for Summer 2015

**STAT 598C** Statistical Methods for Bioinformatics and Computational Biology (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **STAT 512 and STAT 514. Prior experience with R is recommended.

**Credits: **3

**Primary Audience: **Graduate students in both Statistics and Life Sciences

**Description: **The course discusses statistical methods and algorithms for analysis of high-throughput experiments in molecular biology, such as transcriptomics, proteomics and metabolomics. The course is appropriate for graduate students with both statistics and life sciences background.

The course introduces relevant biological concepts, and describes the existing high-throughput technologies and the biological questions that these technologies can help answer. Then it discusses statistical methods and models used at various stages of analysis, as well as their implementation in the statistical software R and Bioconductor.

The course is project-driven and provides hands-on experience with data analysis, critical review of literature, and communication of the results. At the end of the course the students will be able to perform independent analysis of biological data in an interdisciplinary environment such as a pharmaceutical company or a research lab.

A related course was previously offered in Fall 2011.

Schedule and Textbook Information for Summer 2015

**STAT 598CC** Data Analysis Techniques in Earth and Atmospheric Sciences (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **

**Credits: **3

**Primary Audience: **Graduate

**Description: **Application of statistical techniques to analyze and interpret data containing substantial information about the dynamics of our planet Earth. Emphasis on fundamentals with elements of atmospheric/climate time series analysis and weather and climate extremes (necessary for understanding current research) interwoven with computer-intensive bootstrap methods (which work for complex data sets typical in geosciences).

Schedule and Textbook Information for Summer 2015

**STAT 598CL** Statistical Foundations and Inferential Models (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **

**Credits: **3

**Primary Audience: **

**Description: **Converting experience to knowledge is one of the most valuable intellectual activities of humankind. The field of statistics, concerning how and why such transitions should be made, has nonetheless been surprisingly strange, because it does not have a solid foundation. Most noticeable is the lack of a sound theory of reasoning with uncertainty.

Most of the materials for this course grew out of a recent attempt, by the instructor and his collaborators, to understand the foundations of statistical inference and its applications in scientific activities so that principled methods can be developed for very-high-dimensional statistical problems. Taking a new look at statistical inference at the foundational level is no doubt an ambitious task. The intention of this topics course is to share our experience in developing both deep intuition and scientific attitudes toward problem-solving.

This topic course consists of three parts:

- Review of the two dominant schools of thought, namely, Bayesian and frequentist, with full understanding of what they are and what they are not,
- Introduction to an alternative but promising way of thinking, called Inferential Models, towards development of statistics for scientific discovery, and
- Presentation of currently focused topics (and term projects) in the development of Inferential Models.

For more information on Inferential Models, see the two web pages:

http://www.stat.purdue.edu/~chuanhai/

http://homepages.math.uic.edu/~rgmartin/

Schedule and Textbook Information for Summer 2015

**STAT 598DG** Statistical Machine Learning (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **Calculus, basic linear algebra and probability, or permission of instructor.

**Credits: **

**Primary Audience: **Students who are interested in Bayesian learning, graphical models, or computational statistics.

**Description: **This introductory course will cover many concepts, models, and algorithms in machine learning. Topics include classical supervised learning (e.g., support vector machines), unsupervised learning, graphical models, and recent developments in the machine learning field such as deterministic approximate inference (e.g., variational Bayesian methods) and Gaussian processes. While this course will give students the basic ideas and intuition behind modern machine learning methods, the underlying theme in the course is probabilistic inference.

Schedule and Textbook Information for Summer 2015

**STAT 598F** Malliavin Calculus, Nourdin-Peccati Analysis, and Stein's Method (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **Preferably MA/STAT 638 and MA/STAT 639

**Credits: **3

**Primary Audience: **PhD students in Statistics and Mathematics with a good background in probability theory, at the level of MA/STAT 532 or MA/STAT 538.

**Description: **We will discover the basics of the Stochastic Calculus of Variations, also known as the Malliavin calculus. This will include auxiliary elementary material in functional analysis, as needed, as given in the textbook's appendix. The Malliavin calculus will then be applied to a new research direction (started by Ivan Nourdin and Giovanni Peccati in 2008) in which computations on Wiener space, and in specific Wiener chaos, enable sharp estimates of distances between probability laws, with use in proving new normal convergence theorems and other types of comparisons. Some of the techniques rely on extensions of Stein's method combined with the Malliavin calculus.

Schedule and Textbook Information for Summer 2015

**STAT 598HR1** Sequential Analysis & Variance (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **

**Credits: **

**Primary Audience: **

**Description: **

Schedule and Textbook Information for Summer 2015

**STAT 598HZ** Modern Applied Statistics (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **STAT 503 or STAT 512

**Credits: **3

**Primary Audience: **Graduate Students in Agriculture and Science

**Description: **This course covers a wide range of topics that are most useful in agricultural, ecological, environmental, and natural resources sciences. Some topics are: analysis of categorical data, linear mixed effects models, spatial experiments and spatial data analysis, resampling methods (bootstrap), Markov chain Monte Carlo, Bayesian analysis, and spline-smoothing.

This course exploits the power of the computing language R, and of BUGS for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods. Both are freely downloadable, at http://www.r-project.org and http://www.mrc-bsu.cam.ac.uk/software/bugs/. Emphasis is given not only to understanding the models and methods but also to using the computer to solve complex real problems.

Schedule and Textbook Information for Summer 2015

**STAT 598HZ1** Spatio-temporal Extremes and Point Processes (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **Instructor's permission

**Credits: **1

**Primary Audience: **Graduate students in Statistics who are interested in spatial statistics and applications in climate and environment.

**Description: **This is more like a reading course, though some faculty and students may lead the discussion by giving presentations. The seminar covers fundamental aspects of spatio-temporal statistics such as spatio-temporal covariance functions, and estimation and computing for large spatio-temporal data sets. Emphasis is given to inference on extremes, which is a very active research topic now given the extreme weather the world experiences. It is appropriate for any statistics graduate student who would like to get some exposure to this research area.

Schedule and Textbook Information for Summer 2015

**STAT 598I** Computational Finance Seminar (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **None

**Credits: **1

**Primary Audience: **1. MS or MBA students enrolled in the Computational Finance Program and 2. PhD students in Statistics/Mathematics interested in Mathematical/Computational Finance issues

**Description: **The weekly Computational Finance Seminar brings in academic and industrial leaders from the financial world, providing valuable contacts for students as well as exposure to current research problems in mathematical finance. The topics discussed at the seminar include numerical methods for option pricing, calibration and computational methods for fixed income instruments, local volatility models, calibration of jump-diffusion models, portfolio management, and risk management. It fosters exchanges among faculty and students from different departments at Purdue and also from different Universities in the area. The seminar, together with the Core Computational Finance courses, creates a strong sense of CF community at Purdue.

Schedule and Textbook Information for Summer 2015

**STAT 598JS1** Programming in R With C (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **STAT 512

**Credits: **3

**Primary Audience: **

**Description: **STAT 598 is an introductory course for non-Statistics majors and people who wish to learn programming in the popular statistical software package known as R; it includes supplemental programming in C. This introductory course will also cover the interface between R and C. In general, a program in C is much faster than a program in R alone. The key idea is to combine R and C efficiently so that computation time is saved and programs/analyses run fast enough to perform heavy computations.

R has many nice built-in functions that can be used very easily. This course covers R programming and data manipulation in R. It also covers basic C programming. Several computational algorithms, including numerical optimization, numerical integration, Monte Carlo simulation, and tree-based methods, will be covered.

Schedule and Textbook Information for Summer 2015

**STAT 598K** Climate Time Series Analysis (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **An introductory statistics course at the level of STAT 511.

**Credits: **3

**Primary Audience: **Graduate and strong upper-level undergraduate students from geosciences, statistics, physics, engineering, and finance.

**Description: **A course in time series analysis combining traditionally taught basics with topics of central importance in current weather and climate research: trends, long memory, extremes, nonlinear time series, chaos and complexity.

- Introduction
  - Chaos and complexity in atmospheric dynamics: deterministic and probabilistic descriptions
  - Random variables and processes
  - Statistical computing with R. Monte Carlo, bootstrap and subsampling methods
- Introductory time series analysis
  - Stationarity. Linear and nonlinear time series
  - Estimation and forecasting for ARMA and ARIMA models
  - Confidence and prediction intervals
  - Subsampling confidence intervals for nonlinear time series
- Spectral analysis
- Long-memory time series
- Trends
  - Deterministic vs stochastic trends
  - Smoothing in time series
  - Subsampling confidence bands for trends
- Extremes
  - Classical extreme value theory
  - Extremes of stationary time series
- Time series generated by chaotic dynamical systems
- Predictability of weather and climate

Schedule and Textbook Information for Summer 2015

**STAT 598M** Applied Statistics in Biomedical Literature (BME 695) (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **Graduate student standing in Statistics or BME. Credit by examination is not available for this course.

**Credits: ** 1

**Primary Audience: **

**Description: **Literature relating to current research topics in Biomedical Engineering is presented, reviewed, and critically analyzed. Papers will be selected to cover a diverse set of advanced applied statistical analysis techniques and may include linear and non-linear models, mixed models, and methods for the analysis of categorical data. Students will work in small groups that include both biomedical engineering and statistics graduate students to understand the rationale behind the experiment design and the selection of the appropriate statistical analysis method. Alternative experiment design and statistical data analysis methods will be considered for their appropriateness, limitations, and shortcomings by the student groups and presented to class for discussion.

**Course Objective:** Students will be able to consistently and critically review technical literature in biomedical engineering for the appropriateness of the experiment design and use of statistical methods.

**Composition of the Student Population:** It is anticipated that this course will have limited enrollment (max 10 students). We would like to have equal numbers of STAT and BME graduate students. There must be at least 3 students enrolled from each discipline for the course to be offered.

**Class format:** Meets for 100 minutes for 8 weeks. At least 8 different journal articles will be considered.

**Required text: ** Assigned current literature in biomedical engineering with applied statistics.

**Assessment:** Pass/Fail based on attendance, group analysis and oral presentation of article, and meaningful participation in discussions. Each student will be involved in a group that leads the analysis and discussion of at least 2 papers over the course of the 8 weeks.

**Faculty Support:** The first time the course will be taught, Ann Rundell will be the BME faculty and George McCabe will be the STAT faculty. If the course is successful, it would be best to have one BME and one STAT faculty jointly teach the course each time it is offered.

**Rationale for course:** This course is intended to provide a multi-disciplinary educational experience that will prepare our graduate students on the appropriate use of statistical methods in the context of biomedical engineering. It will encourage our students to better appreciate the complexities associated with designing and executing an effective experiment and selecting an appropriate statistical method, in addition to providing an opportunity for students from these two disciplines to work as a team.

In addition, the course will partially fulfill the mission of the HHMI grant awarded to Purdue to infuse statistical training within the life sciences at the graduate level.

Schedule and Textbook Information for Summer 2015

**STAT 598N** Various Forms of Bayes (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **STAT 528 or equivalent

**Credits: **3

**Primary Audience: **PhD students

**Description: **When one speaks of Bayesian analysis, it is far from clear what is meant. One clear case is the use of the well-known Bayes Theorem to get posterior probabilities from prior probabilities and the likelihood function of the information provided by the data. This can be extended to the use of loss as a multiplier of probability; the resulting weight measure is equally usable with the Bayes approach.

Even the most classical statistician will agree to the above, if one assumes expected loss is to be minimized, but there are major problems in practice. One is that it is rarely possible to pinpoint one's prior, or even one's loss. Another is that the computation of the optimal procedure may require the use of computers much larger than the size of the universe, operating far more rapidly than the laws of physics will let them, so approximations must be made.

This is most easily seen in high-dimensional problems, including the so-called "non-parametric" ones, which are really "infinite-parametric". There are formal priors which are easy to work with, and there are priors based on what might be considered reasonable models. The intersection of these two classes appears to be at best small, and in most cases even empty.

As for priors, much has been made of "objective" or "non-informative" priors. This is at best an approximation, as the user's prior, not the statistician's, should be used. But the situation is even worse; some of these "priors", which have infinite total measure, can be shown to be uniformly improvable. The use of infinite priors is technically incorrect, but the procedures may be good approximations. This is not the same as finitely additive priors, for which the needed interchange of integration rarely holds. These are sometimes the same as non-informative priors, but not always so.

These will be discussed at length, as well as the ways to get around the problems, sometimes by using procedures which are not really Bayes, even though they look it. This is not as crazy as it seems; we do not have the impossible computers described above.

So we will discuss various methods, including, but not limited to, prior Bayes, objective Bayes, empirical Bayes, hierarchical Bayes, robustness, by which I mean the lack of sensitivity to assumptions which one does not wish to make, etc. We will also discuss methods of computing, and what needs to be done to make them work well.

Schedule and Textbook Information for Summer 2015

**STAT 598R** Statistical Methods for Association Mapping (Banner Course Number: 59800)

**Semester: **Fall

**Prerequisites: **STAT 511, STAT 512 or equivalent

**Credits: **3

**Primary Audience: **Graduate students in Statistics and Biomedical Sciences

**Description: **This course focuses on the analysis of data from genetic association studies, including population-based and family-based studies. The basic biological concepts and the procedure for pre-processing data will be introduced, then both single-marker and multi-marker analysis methods will be discussed. In addition, some of the newly developed methods for association analysis of various types of data will be covered. The primary objective of this course is for students to be able to perform association analysis using appropriate statistical methods.

Schedule and Textbook Information for Summer 2015

**STAT 598SK** Probabilistic Graphical Models (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **A basic course on probability (e.g., STAT 516/STAT 519) is required. Programming experience (e.g., STAT 545) is required; each student must be comfortable carrying out a medium-size implementation (>500 lines of code) in a high-level programming language. A course in linear algebra is recommended. A course in machine learning (e.g., STAT 598A/STAT 598N/CS578) would be helpful but is not required.

**Credits: **3

**Primary Audience: **PhD and advanced MS or undergraduate students in quantitative sciences and engineering (e.g., Statistics, Computer Science, CS/E). Must have interest in analysis/understanding of high-dimensional data.

**Description: **Probabilistic graphical models provide a convenient framework for modeling joint distributions by utilizing graphs to represent the dependence among the variables. The course introduces several such frameworks, including Bayesian (belief) networks and Markov random fields, and covers topics related to representation, exact and approximate inference, and parameter and structure estimation in models for high-dimensional data.

Schedule and Textbook Information for Summer 2015

**STAT 598T** Applied Spatial Statistics (Banner Course Number: 59800)

**Semester: **Spring

**Prerequisites: **A graduate course in statistics or probability.

**Credits: **3

**Primary Audience: **Students who are interested in analyzing spatial data.

**Description: **This course covers a wide range of statistical models and methods for data that are collected at different spatial locations and perhaps at different times. These data are called spatial or spatio-temporal data, and they are prevalent in many scientific disciplines such as agronomy, plant pathology, forestry and natural resources, environmental and health studies, climatology, geology, biosecurity, etc. Due to advances in technology, massive spatial data sets are collected in various disciplines, and these require novel methods to process and analyze. Consequently, spatial statistics is currently one of the most active research areas in statistics. This course will introduce the classical methods as well as some newly developed ones, and will provide ample hands-on activities. The programming language R and a few packages for analyzing spatial data will be introduced. The primary objective is for students to be able to identify appropriate methods and analyze spatial data in their research.

Schedule and Textbook Information for Summer 2015

**STAT 598X** Introduction to Experimental Design (Banner Course Number: 59800)

**Semester: **Summer

**Prerequisites: **STAT 501 and STAT 502 or equivalent

**Credits: **3

**Primary Audience: **Education, Social Sciences, Behavioral Sciences and Life Sciences, but not Mathematical Sciences or Engineering

**Description: **This course focuses on teaching Statistics and Experimental Design topics using a conceptual approach, rather than through mathematical derivation. Major experimental designs will be covered, with particular emphasis on Completely Randomized, Randomized Complete Block, and Split Plot Designs. Students will be exposed to the entire experimental process: selecting the appropriate design, conducting proper randomization, analyzing data (using SAS), and interpreting the results. Students will read journal articles to see real-world examples of well-designed and poorly designed experiments. Students will then be asked to critique these experiments with their fellow classmates.

This course can serve as a lead-in to Design of Experiments (STAT 514), or as a substitute for those students who do not require the mathematical rigor presented in STAT 514. This course will be offered as a distance education course, yet it is still applicable to all on-campus students. All lectures and course material will be available online through the Purdue Blackboard Vista website. ANOVA and regression will be reviewed at the beginning of this course; however, it is assumed that all students have had at least one course on these topics and have a working knowledge of this material at the level of STAT 501 and STAT 502 or STAT 512.

Schedule and Textbook Information for Summer 2015

**STAT 690** Seminar (Banner Course Number: 69000)

**Semester: **Fall, Spring, Summer

**Prerequisites: **Designator Required Course: Students must contact the department office to obtain a two-digit instructor designator code

**Credits: **1-3

**Primary Audience: **

**Description: **Individual Study

Schedule and Textbook Information for Summer 2015

**STAT 690M** Mathematical Statistics Seminar (Banner Course Number: 69000)

**Semester: **Fall

**Prerequisites: **

**Credits: **1

**Primary Audience: **Graduate students and faculty

**Description: **The goal of these seminars is to provide a platform for faculty members and graduate students to discuss important problems in contemporary mathematical statistics as well as their on-going research. At the end of the seminars on a particular theme, groups of people with common interest in research problems in that topic are expected to work together. The form of the seminars is intended to be informal, so that attendees can interact with each other freely. Graduate students may have opportunities to give presentations.

Schedule and Textbook Information for Summer 2015

**STAT 695** Seminar in Mathematical Statistics (Banner Course Number: 69500)

**Semester: **Fall, Spring, Summer

**Prerequisites: **

**Credits: **1-3

**Primary Audience: **

**Description: **Individual Study that meets 3 times per week for 50 minutes per meeting for 16 weeks.

Schedule and Textbook Information for Summer 2015

**STAT 695AD** Regularization in High Dimensional Statistics (Banner Course Number: 69500)

**Semester: **Spring

**Prerequisites: **

- Probability equivalent to MA/STAT 519
- One prior course on statistics equivalent to any of STAT 517, STAT 528 or STAT 529.

**Credits: **3

**Primary Audience: **PhD students in Mathematics, Statistics, Computer Science, and Theoretical Engineering with knowledge of graduate probability.

**Description: **Meaning and motivation for regularization; Early examples of regularization; Bayesian connections; Regularization in modern forms: Aggregation; Boosting; Penalized estimation; Thresholding; Lasso; Covariance regularization; Sparse PCA; Learning; Model selection.

**Principal references:** Papers of Bickel, Buhlmann, Bunea, Cai, Candes, DeVore, Devroye, Donoho, Fan, Groeneboom, Johnstone, Juditsky, Picard, Ritov, Stein, Tibshirani, Tsybakov, van de Geer, van der Vaart, Yu, and their coauthors.

Schedule and Textbook Information for Summer 2015

**STAT 695H** High Dimensional Data Analysis (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 528 or equivalent

**Credits: **3

**Primary Audience: **Graduate Students

**Description: **High-throughput technologies generate massive and high-dimensional data in many different research fields. Many regularization/shrinkage approaches have been proposed to effectively analyze such data. This course will introduce the basic ideas behind the proposed methods, and discuss further methodology development. Lectures and discussions will cover the following topics:

- James-Stein Shrinkage Estimator and Its Generalization
- Threshold Estimator for Sparse Parameters
- Empirical Bayes Threshold and Its Generalization
- Principal Component Analysis (PCA) and Sparse PCA
- Variable Selection I: LASSO and Its Variants
- Variable Selection II: Supervised Dimension Reduction
- Statistical Test of Massive Hypotheses
- Variable Selection III: Evaluation of Predictor Significance
- Choice of Tuning Parameters
- Data Visualization

**Assignments:** There will be two types of assignments: (1) class presentations (each student is expected to present at least one paper in class); (2) class project (students can choose to compare different methods, empirically analyze real data, or work on their own newly proposed methods following course discussion).

**Final Grade:** Your final grade will depend on the following components with these proportions: class participation (30%), class presentation (35%) and class project (35%).

**In the Event of a Major Campus Emergency:** Course requirements, deadlines and grading percentages are subject to changes that may be necessitated by a revised semester calendar or other circumstances. Here are ways to get information about changes in this course: course web page, my email address and office phone.

Schedule and Textbook Information for Summer 2015

**STAT 695B** Dimension Reduction (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 528 or equivalent

**Credits: **3

**Primary Audience: **Graduate students.

**Description: **Dimension reduction plays an important role in analyzing massive and high-dimensional data in different research fields. This course will review different methods of dimension reduction, and further discuss recent methodology development. Lectures and discussions will cover the following topics:

- Unsupervised Dimension Reduction
- Supervised Dimension Reduction
- Sufficient Dimension Reduction
- Sliced Inverse Regression
- Kernel Dimension Reduction
- Sparsity in Dimension Reduction
- Data Visualization

**Assignments:** There will be two types of assignments: (1) class presentations (each student is expected to present at least one paper in class); (2) class project (students can choose to compare different methods, empirically analyze real data, or work on their own newly proposed methods following course discussion).

**Final Grade:** Your final grade will depend on the following components with these proportions: class participation (30%), class presentation (35%) and class project (35%).

**In the Event of a Major Campus Emergency:** Course requirements, deadlines and grading percentages are subject to changes that may be necessitated by a revised semester calendar or other circumstances. Here are ways to get information about changes in this course: course web page, my email address and office phone.

Schedule and Textbook Information for Summer 2015

**STAT 695C** Bayesian Statistics: Foundations, Methods, Modeling, Inference, Computing, and Applications (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **Knowledge of mathematics, probability, and statistics at the level of MA 174 Multivariable Calculus, STAT 311 Introductory Probability, and STAT 350 Introduction to Statistics.

**Credits: **3

**Primary Audience: **Graduate students in statistics, computer science, or any field where the analysis of data is important.

**Description: **Bayesian statistics has become the most practical general approach to complete statistical modeling and inference in today's world of increasingly large and complex datasets. In the 1950s, Bayesian methods were seldom used, but their use has risen steadily since then, and today they appear in vast numbers of applications.

The reason for the growth is that once a model is formulated, general principles provide methods for estimating unknown quantities and characterizing the accuracy by statistical distributions. Unlike the classical sampling theory approach to modeling and inference, there is no need to face immensely challenging problems of inventing new methods of estimation and determining sampling distributions for each type of model.

Bayesian methods, while straightforward, do present computational challenges, but 40 years of research have produced many successful computational methods, such as Markov chain Monte Carlo, that have made the Bayesian approach practical.

The course will have an emphasis on using Bayesian methods in practice. It covers the following:

- The foundations of Bayesian statistics
- History of its development
- Problems with sampling theory solved by the Bayesian approach
- Bayesian models (prior and likelihood)
- Methods associated with uses in practice such as model building, visualization, and posterior predictive model checking
- A Bayesian theory of model building
- Bayesian methods in machine learning
- Applications from the published literature
- Large complex datasets

Participants will be expected to attend all lectures, to solve assigned problems in the course text, and to present certain solutions to the problems in class. There will be no exams.

Schedule and Textbook Information for Summer 2015

**STAT 695F** R/RHIPE/Hadoop (Banner Course Number: 69500F)

**Semester: **Fall

**Prerequisites: **Knowledge of basic probability and statistics, and mathematics through calculus and linear algebra. No previous knowledge of R, Hadoop, or RHIPE is needed.

**Credits: **3

**Primary Audience: **Graduate students in university departments where data are analyzed.

**Description: **This course has two components: (1) The Divide and Recombine (D&R) statistical approach to large complex data; (2) The Tessera computational environment that implements D&R, allowing a data analyst to carry out deep analysis of big data using D&R. Deep analysis means that the data are analyzed in detail at their finest granularity, and the analyst has access to any of the 1000s of methods of statistics, machine learning, and visualization for use in the analysis.

Tessera has R at the front end. All analyst programming is in R. At the back end is the Hadoop distributed file system (HDFS) and parallel compute engine (MapReduce). Hadoop runs the analyst's R commands to carry out the D&R computations. Tessera software packages merge R and Hadoop, enabling communication between the two, and making programming D&R easy.
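As a rough illustration of the D&R idea described above, the following minimal sketch divides a dataset into subsets, applies the same analytic method (a least-squares line fit) to each subset, and recombines the per-subset results by averaging. It is written in plain Python rather than R, it runs on a single machine, and it does not use Tessera, RHIPE, or Hadoop; the function names `divide`, `fit_line`, and `recombine`, the simulated data, and the choice of averaging as the recombination method are all assumptions made for this example.

```python
import random


def divide(rows, n_subsets):
    """Divide: split the rows into n_subsets roughly equal subsets."""
    return [rows[i::n_subsets] for i in range(n_subsets)]


def fit_line(rows):
    """Per-subset analytic method: least-squares slope and intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def recombine(fits):
    """Recombine: average the per-subset estimates (one possible choice)."""
    k = len(fits)
    return sum(s for s, _ in fits) / k, sum(b for _, b in fits) / k


# Simulated data (hypothetical): y = 2x + 1 plus Gaussian noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(10_000)]
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 1)) for x in xs]

subsets = divide(data, n_subsets=20)
# Each subset is analyzed independently; here this is a plain sequential loop.
fits = [fit_line(subset) for subset in subsets]
slope, intercept = recombine(fits)
print(f"D&R estimate: slope={slope:.3f}, intercept={intercept:.3f}")
```

Because each subset is processed independently of the others, the per-subset step is the part that a distributed back end could run in parallel.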

Students will have access to a Hadoop cluster, provided by the Rosen Center for Advanced Computing, with the Tessera software stack installed. Reading materials and lectures will be provided electronically.

Schedule and Textbook Information for Fall 2015

Course Page for Fall 2014

**STAT 695G** Objective Bayes and Model Selection (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 519 and STAT 528.

**Credits: **3

**Primary Audience: **Doctoral students

**Description: **Evaluation is through three sets of assignments that give you a choice from a broad spectrum of problems: numerical computation and simulation, mathematical or theoretical problems, and applied or methodological questions.

There is no textbook for the course. However, I will use An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta (Springer 2006) as a reference. You don't need to buy the book. I will hand out notes as we go along and keep a copy of them in the course file in the Math Library.

**Objective Bayesian Analysis Syllabus**

- Decision-theoretic formulation of statistical inference. Predictive formulation of statistical inference. The three paradigms of statistics: classical statistics, Bayesian analysis, and data mining/machine learning/statistical learning. Paradoxes in classical statistics, the Likelihood Principle (LP), coherence. Birnbaum's theorem on LP, de Finetti's theorem on coherence. Rationality principles leading to Bayesian analysis. Subjective and objective Bayesian analysis.
- Choice of priors, de Finetti's representation theorem on exchangeable sequences, algorithmic constructions of objective priors, common criticisms of such priors and answers, elicitation of a subjective prior. Introduction to common inference problems, BIC, MCMC.
- Laplace approximation, Bayesian asymptotics, frequentist validation of Bayesian analysis via posterior consistency, Schwartz's theorem on posterior consistency (without proof), posterior normality, the Kadane-Kass-Tierney approximations to posterior calculations. Derivation of BIC via Laplace approximation. Choice of sample size for Bayesian testing problems.
- Bayesian testing for sharp and composite nulls. Comparison of P-values and posterior probabilities, the BBS (Bayarri-Berger-Sellke) calibration of P-values for sharp nulls, Bayesian P-values.
- Difficulties in objective Bayes testing, the Berger-Pericchi solution through intrinsic Bayes analysis, intrinsic priors, O'Hagan's fractional Bayes analysis. A comprehensive approach to these methods.
- High-dimensional, parametric, and hierarchical Bayesian problems (estimation, testing, model selection). High-dimensional Bayesian testing and MCMC.
- Three applications (disease mapping via spatial Bayesian analysis, Bayesian nonparametric regression via wavelets, and Dirichlet multinomial allocation).

Schedule and Textbook Information for Summer 2015

**STAT 695JFL** Stoch Mdl Wth Pt & JMP Process (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 519/MATH 519 or equivalent. STAT 532 recommended

**Credits: **3

**Primary Audience: **Graduate students in Mathematics and Statistics who are interested in stochastic modeling.

**Description: **In simple terms, a stochastic point process is a random distribution of points in a Euclidean space, which may or may not evolve through time (temporal/spatial point process). In recent years, there has been renewed attention to this type of process because of its natural applications in finance, environmental sciences, the analysis of extreme values, and other areas.

This course aims to cover, as formally as possible (depending on the background of the audience), the statistical and probabilistic tools for the construction and analysis of general point processes, from the familiar temporal/spatial Poisson point process to doubly stochastic models such as Cox processes, self-exciting processes such as Hawkes processes, and cluster processes. The connection of point processes with Lévy and renewal processes, as well as with queueing and extreme value theory, will also be explored. In the second part of the course, we will review some more advanced, yet important, problems related to point processes, such as filtering, optimal control, and likelihood estimation.

The course evaluation is through assignments and a final project, which will be chosen by the students (with the instructor's guidance and approval) according to their personal research interests and background.

Schedule and Textbook Information for Summer 2015

**STAT 695JG** Business Analytics, Causal Analysis, and Modeling (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 528 and STAT 519

**Credits: **

**Primary Audience: **

**Description: **This course introduces students to business analytics, causal analysis, and the modeling of human behavior, drawing on Kahneman's Nobel Prize-winning work on realistic models of human behavior in various transactions, including financial transactions. I think of this also as a sort of business analytics. Full notes will be given in class.

Schedule and Textbook Information for Summer 2015

**STAT 695M** Mathematical Statistics Seminar (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **

**Credits: **

**Primary Audience: **

**Description: **

Schedule and Textbook Information for Summer 2015

**STAT 695MZ** Nonparametric Statistics and Machine Learning (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **STAT 528

**Credits: **3

**Primary Audience: **Graduate students interested in nonparametric statistical methods and machine learning algorithms

**Description: **This is an introductory course in nonparametric statistical methods and machine learning algorithms.

Schedule and Textbook Information for Summer 2015

**STAT 695N** Nonparametric function estimation via penalty smoothing (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: ** STAT 516/STAT 519, STAT 517/STAT 528, STAT 525, STAT 526

**Credits: **3

**Primary Audience: **MS and PhD students

**Description: **Nonparametric function estimation, otherwise known as smoothing, has seen extensive development in the literature. Assisted by the ample desktop and laptop computing power we enjoy today, smoothing methods are now finding their way into everyday data analysis by practitioners.

In this course, we will explore a family of methods known as penalty smoothing; the methods are closely related to ridge regression/Bayes modeling, kernel-based learning, and Tikhonov regularization. Using a certain functional ANOVA decomposition, modular multivariate models can be constructed for regression, (conditional) density estimation, and hazard rate estimation problems, with the additive models being special cases. Software tools in R will be discussed, which implement model selection and data analytical techniques such as cross-validation, Kullback-Leibler projection, Bayesian confidence intervals, etc.

Students interested in the course should have a working knowledge of statistical inference (STAT 517/STAT 528), linear models (STAT 525), generalized linear models (STAT 526), and matrix algebra. The course work includes homework problems and projects, where the projects can be in a variety of forms depending on personal interests/preferences.

The course will be more theoretical than your typical applied course and more applied/computational than your typical theoretical course.

Schedule and Textbook Information for Summer 2015

**STAT 695O** Large Scale Data Analysis (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **Basic knowledge of statistical and computational methods

**Credits: **3

**Primary Audience: **Graduate students in statistics and other fields who are interested in data analysis

**Description: **This course covers several specific large-scale data problems and general statistical methods for analyzing large-scale data. These include:

- Missing data problems and multiple imputation methods in sample survey and multivariate time series,
- Image analysis and particle detection in Electron Microscopy,
- Large-scale multiple testing and identifying significantly expressed genes,
- Variable selection and the *p* >> *n* problem,
- Dynamic thresholding and online inference,
- Network traffic and long-range dependence models,
- Stock returns and model building,
- Large-scale multinomial inference and genome-wide association studies,
- Complex data structures, meta analysis, and hierarchical modeling,
- Accelerated lifetime data analysis,
- Object tracking and particle filter, and
- Observational study and causal inference.

Schedule and Textbook Information for Summer 2015

**STAT 695R** Asymptotic Statistics and Empirical Processes (Banner Course Number: 69500)

**Semester: **Fall, Spring

**Prerequisites: **STAT 519 and STAT 528.

**Credits: **3

**Primary Audience: **PhD students in Statistics. Other PhD students (in Econ, Math, etc.) who do not have the prerequisites but wish to take the course should first consult the instructor.

**Description: **The special topic class is composed of two sections: Asymptotic Statistics and Introduction to Empirical Processes.

Asymptotic statistics is the study of large sample properties and approximations of statistical tests, estimators and procedures. In general, the goal is to learn how well a statistical procedure will work in a variety of settings much more diverse than what we can even begin to simulate. Hence the most critical goal of asymptotic research is to verify the validity of statistical procedures in useful generality. Topics we will cover include the functional delta method, the bootstrap, semiparametric statistics, rates of convergence for nonparametric estimators, *M*-estimators, *Z*-estimators, Bayesian procedures, and many other areas.

The goal of the second section is to introduce students with a background in mathematical statistics to empirical processes. These powerful research techniques are surprisingly useful for studying large sample properties of statistical estimates from realistically complex models as well as for developing new and improved approaches to statistical inference. The course will develop in each student the technical skills needed to apply empirical process and semiparametric methods in statistics. Other areas to be covered include stochastic convergence in metric spaces, Brownian motion and Brownian bridges, Gaussian processes, Glivenko-Cantelli and Donsker theorems, and entropy calculations.

Schedule and Textbook Information for Summer 2015

**STAT 695S** Machine Learning Reading Group (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **A course in machine learning or data mining, permission of instructor

**Credits: **1

**Primary Audience: **PhD students in sciences or engineering with interest in Machine Learning

**Description: **Machine learning is an interdisciplinary research area concerned with the development of computationally efficient statistical techniques for understanding patterns in large data sets. The intent of the course is to discuss the most recent developments in this vibrant research field.

The course will be organized as a series of participants' presentations of the preselected recent papers from the top machine learning venues (e.g., ICML, UAI, NIPS, JMLR, MLJ). Each meeting will consist of a presentation of 1-2 papers followed by a group discussion of the main ideas.

Each participant is expected to present at least once during the semester. Students will be evaluated (P/NP) based on their presentation and overall participation.

Schedule and Textbook Information for Summer 2015

**STAT 695T** Data Visualization (Banner Course Number: 69500)

**Semester: **Spring

**Prerequisites: **Knowledge of basic probability, mathematics through calculus and linear algebra, and basic statistics including least-squares fitting of parametric functions to data. No previous knowledge of data visualization is needed.

**Credits: **3

**Primary Audience: **Graduate students in university departments where data are analyzed.

**Description: **The course content will focus fundamentally on how to analyze data. Through many case studies, it will present visualization methods, going through a number of standard numerical methods and models for statistical analysis, showing how visualization enhances these methods and models. This illustrates the use of the visualization methods, and demonstrates why they are essential to valid analyses that preserve the information in data. This material is largely based on the book "Visualizing Data" which is provided to participants by the instructor. In addition, lectures will cover the lattice graphics system in R, which can be used to carry out all methods discussed in the course. To support this, a certain number of classes will consist of labs in which participants will use lattice. More information can be found on http://ml.stat.purdue.edu/stat695t/.

**Background:** Visual displays allow us to explore data to see overall patterns and to see detailed behavior; no other approach can compete in revealing the structure of data so thoroughly. Analyses without visualization run the risk of using inappropriate methods and models for the data, which can result in missing important information in the data. This is amply illustrated in the course.

**Participant Responsibilities:** Participants are expected to attend class. Homework will consist of carrying out data visualizations using lattice graphics. There will be no tests. More information can be found at http://ml.stat.purdue.edu/stat695t/.

Schedule and Textbook Information for Summer 2015

**STAT 695V.F13** The R Language (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **Knowledge of basic probability and statistics, and mathematics through calculus and linear algebra. No previous knowledge of R is needed.

**Credits: **3

**Primary Audience: **Graduate students in university departments where data are analyzed.

**Description: **The course will follow the book Software for Data Analysis: Programming with R by John Chambers, a designer of S, and the winner of the ACM award described below. The book is available electronically at lib.purdue.edu. In the last two weeks of the course, lectures will cover RHIPE, the R package that allows, wholly from within R, the analysis of complex big data on a cluster.

**Background:** R is an interactive language for data analysis. Its superb design makes it a very effective environment for "programming with the data" and tailoring analyses to the data much more efficiently than can be done with lower-level languages. R is an open-source implementation of the S language, which won the ACM Software System Award in 1998 because it would "forever alter the way people analyze, visualize, and manipulate data". Other winners include Unix, the World Wide Web, and VisiCalc, so you get the idea of the company it keeps. R is very widely used, has a very effective core development group, and has a vast number of user-contributed packages that add up to, by far, the largest collection of numerical and visualization methods of any software environment for statistics and machine learning. The lattice graphics package, which implements the trellis display framework, is a very effective system for data visualization.

**Participant Responsibilities:** Participants are expected to attend class. Homework will consist of mastering a number of detailed matters about R capabilities and functions through reading the Chambers book and doing some exercises; this is necessary because it is not possible to learn a language without using it. Participants will also do exercises in class, which is held in a lab. There will be no tests. For more information on the course see http://ml.stat.purdue.edu/stat695v.

Schedule and Textbook Information for Summer 2015

**STAT 695W** Bayesian Nonparametrics (Banner Course Number: 69500)

**Semester: **Fall

**Prerequisites: **

**Credits: **

**Primary Audience: **

**Description: **
