
Divide and Recombine in the Cloud


Description:

Computational environments for data analysis are among the most important components of data science. A language designed for programming with data, such as R or Python, together with an operating system (OS) constitutes an environment that, for many datasets, supports deep analysis: analysis of the data at their finest granularity, not just of summary statistics. When the data are big, when the analytic methods have high computational complexity, when the available hardware is inadequate, or under any combination of these, the language and an OS alone can be insufficient.

This experiential workshop teaches high-performance computing for the analysis of big, complex data. It emphasizes the Divide & Recombine (D&R) approach, in which the data are divided into subsets, analytic methods are applied to each subset, and the outputs are recombined. Participants will learn to use D&R and to program in state-of-the-art high-performance and cloud computing environments, enabled by software such as data.table (fast, multi-threaded data analysis), DeltaRho (an R-RHIPE-Hadoop MapReduce workflow for big data), and Torch (fast array computation with GPU acceleration and a neural networks library).
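
To make the divide, apply, and recombine steps concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the workshop's own material: the simulated data, the random division into 10 subsets, the straight-line fit, and the helper names `divide`, `fit_line`, and `recombine` are all hypothetical choices, and averaging the per-subset coefficients is only one possible recombination.

```python
import random
from statistics import mean


def divide(rows, n_subsets, seed=0):
    """Divide: shuffle the rows, then deal them into n_subsets subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def fit_line(rows):
    """Apply: ordinary least squares fit of y = a + b * x on one subset."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def recombine(fits):
    """Recombine: average the per-subset intercepts and slopes."""
    return mean([a for a, _ in fits]), mean([b for _, b in fits])


# Simulated data (an assumption for illustration): y = 2 + 0.5 * x + noise.
data_rng = random.Random(42)
data = []
for _ in range(10_000):
    x = data_rng.uniform(0, 10)
    data.append((x, 2.0 + 0.5 * x + data_rng.gauss(0, 1)))

subsets = divide(data, n_subsets=10)
fits = [fit_line(s) for s in subsets]  # each fit is independent of the others
a_dr, b_dr = recombine(fits)
print(f"D&R estimate: intercept={a_dr:.3f}, slope={b_dr:.3f}")
```

Because each subset is fitted independently, the list comprehension over `subsets` is the step that a parallel or distributed backend would spread across cores or cluster nodes.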

Through real-world examples and hands-on exercises derived from the instructors' research, participants will learn to 1) understand the principles of selecting a computational environment based on the data storage and the analytic methods and 2) apply D&R to data analysis tasks that constitute a complex workflow in the cloud.

Prerequisites: Working knowledge of basic probability and statistics, and of mathematics through multivariate calculus and linear algebra; completion of at least one college-level programming course. The workshop mainly uses R, with some Python scripts incorporated, so some familiarity with either R or Python is required.


Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558
