### Topic Lectures for Spring 2016

**STAT 598LZ** Functional Data Analysis

**Semester:** Spring

**Prerequisites:** STAT 512 or permission of instructor

**Credits:** 3

**Primary Audience:** Ph.D. and Master's students in the Statistics department

**Description:** Functional Data Analysis can be viewed as an extension of multivariate statistics in that it is used to analyze samples of curves or other functional observations. There is a large amount of functional data (i.e. **big data**) emerging in many scientific fields, such as chemometrics, proteomics, remote sensing, brain imaging, etc. Because the dimensionality of these data is very high, proper analysis is very important.

This is an introductory course for graduate students (Master's and Ph.D.) who wish to learn modern data exploration techniques, nonparametric smoothing, and functional data analysis approaches. Functional data analysis, basic smoothing techniques (kernel smoothing, loess, smoothing splines, etc.), curve registration, visualization of functional data, mean estimation, functional principal component analysis, functional linear models, and functional classification approaches will be introduced. Imaging analysis methods along with big data challenges will be discussed in the course as well.

Schedule and Textbook Information for Spring 2016

**STAT 598RP** Introduction to Monte Carlo Methods

**Semester:** Spring

**Prerequisites:** Stochastic Processes (at the level of MA 532)

**Credits:** 3

**Primary Audience:** Graduate students and advanced undergraduates with a background in stochastic processes.

**Description:** This course covers basic Monte Carlo methodology. The primary motivating setting will be computational finance, but application to numerous other contexts will be evident. We will start with methods for random number, random variate, and random process generation. This will be followed by a treatment of variance reduction ideas (e.g., importance sampling, control variates, splitting), response surface methods, and bootstrapping. The second half of the course will cover simulation optimization, starting with derivative estimation and then a fairly detailed treatment of stochastic approximation and other gradient-based optimization methods. Certain stylized versions of the option pricing problem will be covered as motivation. The student will be expected to write a non-trivial paper by the conclusion of the semester's study.
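As a rough illustration of one of the variance reduction ideas named above, the sketch below applies a control variate to a toy problem: estimating E[exp(U)] for U uniform on (0, 1), whose true value is e - 1. The toy problem, the function name `estimate_with_control_variate`, and the choice of U itself as the control are assumptions made for this example and are not taken from the course materials. The coefficient is estimated from the same sample, which introduces a small bias that is usually negligible at this sample size.

```python
import math
import random
import statistics


def estimate_with_control_variate(n_samples, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n_samples)]
    y = [math.exp(v) for v in u]  # plain Monte Carlo draws
    mean_u = statistics.fmean(u)
    mean_y = statistics.fmean(y)

    # Estimated variance-minimizing coefficient: c = Cov(Y, U) / Var(U).
    cov_yu = sum((a - mean_y) * (b - mean_u) for a, b in zip(y, u)) / (n_samples - 1)
    c = cov_yu / statistics.variance(u)

    # U has known mean 0.5, so subtracting c * (U - 0.5) leaves the target mean unchanged.
    adjusted = [a - c * (b - 0.5) for a, b in zip(y, u)]
    return {
        "plain_mean": mean_y,
        "cv_mean": statistics.fmean(adjusted),
        "plain_var": statistics.variance(y),
        "cv_var": statistics.variance(adjusted),
    }


result = estimate_with_control_variate(10_000)
```

Comparing `result["cv_var"]` with `result["plain_var"]` shows how much the per-draw variance drops when the control is strongly correlated with the quantity being estimated.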

Schedule and Textbook Information for Spring 2016

**STAT 598W** Design and Analysis of Financial Algorithms (Banner Course Number: 59800)

**Semester:** Spring

**Prerequisites:** MA 516/STAT 541

**Credits:** 3

**Description:** Information technology (IT) has become a major function in the financial industry. The industry has been employing various software and programming languages to process and maintain data, to price equity and fixed income derivatives, and to predict stock and index movements. In this course, students will learn the basics of C/C++, R, and Excel VBA, which are some of the most useful programming tools in financial firms. They will apply their skills to coding and analyzing modern financial algorithms for pricing, hedging, or portfolio optimization.

Schedule and Textbook Information for Spring 2016

**STAT 598Z** Introduction to Computing for Statisticians (Banner Course Number: 59800)

**Semester:** Spring

**Prerequisites:** Only master's students in the Department of Statistics are allowed to enroll. All others require permission from the instructor.

**Credits:** 3

**Primary Audience:** MS students in statistics with little or no exposure to computing concepts and programming

**Description:** The objective of this course is to introduce concepts in computing to statisticians. The first part of the course will concentrate on introducing Python and teaching the students basic programming constructs. The next part will concentrate on working with these programming constructs for implementing statistical algorithms with emphasis on sampling and density estimation. The final part of the course will introduce students to convexity and convex optimization. Modern machine learning methods which use these concepts for analysis of large datasets will be briefly introduced.

The course will feature an equal number of lectures and hands-on laboratory sessions. The exams will be a mix of theory and hands-on programming. Students will also get valuable experience in practical data analysis via a course project.

Schedule and Textbook Information for Spring 2016

**STAT 695HP** High-Performance Computing for Analysis of Big Data and for High Computational Complexity of Analytic Methods. Banner Course Number: (STAT69500-WC1)

**Semester:** Spring 2016

**Prerequisites:** Knowledge of basic probability, basic statistics including least-squares fitting of parametric functions to data, and mathematics through calculus and linear algebra

**Credits:** 3

**Primary Audience:** Graduate students in university departments where data are analyzed.

**Description:** The much-used term "big data" carries with it a notion of computational performance for the deep analysis of big datasets. But for data analysis, another critical performance factor is the computational complexity of the analytic routines used in the analysis. Datasets whose analysis has high computational complexity span a wide range of sizes. Furthermore, the hardware power available to the data analyst is another critical factor. High-performance computing across data size, computational complexity, and hardware power is addressed by the Divide and Recombine (D&R) statistical approach and by the Tessera D&R software implementation, which makes programming D&R easy. Tessera has R at the front end, and the Hadoop distributed database and compute engine at the back end.

In this course, which is hands-on, participants will learn (1) basic concepts of D&R; (2) use of the Tessera software system; (3) basics of distributed parallel computational environments, and the Hadoop environment; (4) a rigorous, scientific framework for computational performance measurement and analysis of distributed parallel computational environments.

Students will have access to a Hadoop cluster; it is provided by the Rosen Center for Advanced Computing and has the Tessera software stack installed. Reading materials and lectures will be provided electronically.

**Participant Responsibilities:** Participants are expected to attend class and to carry out class project homework. There will be no tests. More information can be found at http://ml.stat.purdue.edu/STAT695DR/

STAT695HP is taught in the Spring term. There is a related course taught in the Fall term: STAT695DR Banner Course Number (STAT 69500 - WC2) Divide and Recombine (D&R) for the Analysis of Big Data and for High Computational Complexity of Analytic Methods. Both courses present the basics of D&R and the use of Tessera software. STAT695HP goes deeply into computation. STAT695DR goes deeply into analytic methods, for example, optimizing statistical understanding and modeling of the data, and optimizing statistical accuracy.

Schedule and Textbook Information for Spring 2016

**STAT 695SC** Sparse Representations and Signal Recovery (Banner Course Number: 69500B)

**Semester:** Spring (Even Years)

**Prerequisites:** Linear Algebra and Probability

**Credits:** 3

**Primary Audience:** Ph.D. students in statistics and engineering

**Description:** Sparse representation is a cornerstone in statistics and engineering. The heart of the model, however, is a simple and long-standing mathematical problem: given a full-rank matrix A with more columns than rows, how do we find a solution with the fewest non-zeros in an under-determined system Ax=b? Interestingly, the answer to this question has now become the core of many statistics and engineering problems, such as regression, prediction, denoising, restoration, interpolation and extrapolation, compression, sampling, detection, recognition, etc. In this course, we will explore the fundamentals of the theory behind sparse representations. We will study the uniqueness of sparse solutions, pursuit algorithms, and convex relaxation, and analyze their performance. Applications in signal and image processing will be discussed.
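In standard notation, the problem stated above and its usual convex relaxation can be written as follows. The labels $(P_0)$ and $(P_1)$ and the dimensions $m$ and $n$ are assumptions introduced here for illustration, not notation taken from the course description.

```latex
\begin{aligned}
(P_0):\quad & \min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{subject to } Ax = b, \\
(P_1):\quad & \min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } Ax = b,
\end{aligned}
\qquad A \in \mathbb{R}^{m \times n},\ m < n .
```

Here $\|x\|_0$ counts the non-zero entries of $x$. Replacing it with the $\ell_1$ norm $\|x\|_1 = \sum_i |x_i|$ gives a convex problem, which is one common form of the convex relaxation mentioned in the description.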

**STAT 695T** Data Visualization (Banner Course Number: 69500)

**Semester:** Spring

**Prerequisites:** Knowledge of basic probability, mathematics through calculus and linear algebra, and basic statistics including least-squares fitting of parametric functions to data. No previous knowledge of data visualization is needed.

**Credits:** 3

**Primary Audience:** Graduate students in university departments where data are analyzed.

**Description:** The course content will focus fundamentally on how to analyze data. Through many case studies, it will present visualization methods, going through a number of standard numerical methods and models for statistical analysis, showing how visualization enhances these methods and models. This illustrates the use of the visualization methods and demonstrates why they are essential to valid analyses that preserve the information in data. This material is largely based on the book "Visualizing Data", which is provided to participants by the instructor. In addition, lectures will cover the lattice graphics system in R, which can be used to carry out all methods discussed in the course. To support this, a certain number of classes will consist of labs in which participants will use lattice. More information can be found at http://ml.stat.purdue.edu/stat695t/.

Background: Visual displays allow us to explore data to see overall patterns and to see detailed behavior; no other approach can compete in revealing the structure of data so thoroughly. Analyses without visualization run the risk of using inappropriate methods and models for the data, which can result in missing important information in the data. This is amply illustrated in the course.

**Participant Responsibilities:** Participants are expected to attend class. Homework will consist of carrying out data visualizations using lattice graphics. There will be no tests.

### Topic Lectures for Fall 2016

**STAT 598MZ** Topics on Nonparametric Statistics, Deep Learning, and Artificial Intelligence (Banner Course Number: 59800)

**Semester:** Fall

**Prerequisites:** STAT 525 Intermediate Statistical Methodology, STAT 528 Introduction to Mathematical Statistics, and STAT 545 Introduction to Computational Statistics

**Credits:** 3

**Primary Audience:** Graduate students with a sufficient background in statistics.

**Description:** The course covers a wide range of topics related to learning from data. In addition to standard statistical methods, those in machine learning, deep learning, and data-driven artificial intelligence will be emphasized. The topics include nonparametric models and methods for high dimensional regression and classification, classification and regression trees and ensemble methods, support vector machines and kernel methods, convolutional and recurrent neural networks, and reinforcement learning. The lectures will focus on ideas, concepts, and methods, instead of technical details.

Schedule and Textbook Information for Fall 2016

**STAT 598TZ** Applied Spatial Statistics (Banner Course Number: 59800)

**Semester:** Fall

**Prerequisites:** STAT 511 or equivalent courses

**Credits:** 3

**Primary Audience:** Graduate students in Statistics, Ecology or Environmental Sciences, Economics, Agriculture, etc.

**Description:** The goal of this course is to introduce standard statistical methods for spatial (including spatiotemporal) data. In general, spatial data can be classified into (a) geostatistical data; (b) lattice (or aggregated unit) data; and (c) spatial point data. Geostatistical data are often used to describe variation in spatial random fields, with wide applications in environmental and earth sciences, agriculture, and many other related fields. Lattice data are used to describe occurrences of events that are aggregated over spatial units, with wide applications in public health and social sciences. Spatial point data are used to describe the spatial pattern of events that occur over a space, where each event is given by its space-time coordinates. Methods for spatial point data have wide applications in natural hazard studies. The course will focus on geostatistical data (about 40% of the effort), lattice data (about 30% of the effort), and spatial point data (about 30% of the effort), with potential applications to health and environmental sciences. A broad range of statistical methods and models for spatial data will be explored. Numerical algorithms will be provided based on standard R packages. The course is intended for graduate students in Statistics, Environmental Sciences, Social Sciences, and other related fields.

Schedule and Textbook Information for Fall 2016

**STAT 695DR** Divide and Recombine (D&R) for Analysis of Big Data and for High Computational Complexity of Analytic Methods (Banner Course Number 69500-WC2)

**Semester:** Fall 2016

**Prerequisites:** Knowledge of basic probability, basic statistics including least-squares fitting of parametric functions to data, and mathematics through calculus and linear algebra

**Credits:** 3

**Primary Audience:** Graduate students in university departments where data are analyzed.

**Description:** D&R is a statistical framework for big data and for the high computational complexity of data analytic methods that often occurs for data, big or small. D&R is designed to exploit distributed parallel computational environments to achieve feasible and practical computation. The data are divided into subsets, an analytic method is applied to each subset, and the subset outputs are recombined. Much of the computation is embarrassingly parallel: there is no communication between different processes, which is the simplest form of parallel computation. D&R statistics research seeks division and recombination methods with good statistical accuracy. Tessera D&R software has R at the front end, and the Hadoop distributed database and parallel compute engine at the back end executing the D&R R code.
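The divide, apply, and recombine steps described above can be sketched in a few lines. This is only an illustrative toy in plain Python, not the Tessera software or its API: the function names `divide`, `fit_line`, and `recombine`, the simulated data, and the choice of averaging per-subset least-squares coefficients as the recombination are all assumptions made for this example.

```python
import random
import statistics


def fit_line(points):
    """Analytic method: least-squares slope and intercept for (x, y) pairs."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def divide(points, n_subsets):
    """Divide: split the data into n_subsets roughly equal subsets."""
    return [points[i::n_subsets] for i in range(n_subsets)]


def recombine(fits):
    """Recombine: average the per-subset coefficient estimates."""
    slopes, intercepts = zip(*fits)
    return statistics.fmean(slopes), statistics.fmean(intercepts)


# Simulated data from y = 2x + 1 plus noise (hypothetical, for illustration only).
rng = random.Random(1)
data = []
for _ in range(2000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))

subsets = divide(data, n_subsets=8)
# Each subset is fitted independently, so this step needs no communication
# between workers and could be run in parallel.
subset_fits = [fit_line(s) for s in subsets]
dr_slope, dr_intercept = recombine(subset_fits)
full_slope, full_intercept = fit_line(data)
```

Comparing `dr_slope` with `full_slope` gives a feel for the statistical accuracy question: how close the recombined estimate comes to the estimate computed on all of the data at once.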

In this course, which is hands-on, participants will learn (1) basic concepts of D&R; (2) use of the Tessera software system; (3) statistical methods for division and recombination to optimize statistical understanding and modeling of the data and to optimize statistical accuracy; (4) methods of data visualization and use of the Trellis Display software system.

Students will have access to a Hadoop cluster; it is provided by the Rosen Center for Advanced Computing and has the Tessera software stack installed. Reading materials and lectures will be provided electronically.

**Participant Responsibilities:** Participants are expected to attend class and to carry out class project homework. There will be no tests. More information can be found at http://ml.stat.purdue.edu/STAT695DR/

STAT695DR is taught in the Fall term. There is a related course taught in the Spring term: STAT695HP Banner Course Number (STAT 69500 - WC1) High-Performance Computing for the Analysis of Big Data and for High Computational Complexity of Analytic Methods. Both courses present the basics of D&R and the use of Tessera software. STAT695DR goes deeply into analytic methods. STAT695HP goes deeply into computation, for example, basics of distributed parallel computational environments, and a rigorous, scientific framework for computational performance measurement and analysis of these environments.

Schedule and Textbook Information for Fall 2016