Title: "Workflow for a Statistical Data Analysis Project"
Speaker: Sanvesh Srivastava, Department of Statistics, Purdue University
Place: HORT 117; January 25, 2011, Tuesday, 4:30pm


Data analysis is an inseparable part of a statistician's career. A statistical data analysis is generally followed by reporting the results, interpretations, and conclusions in a scholarly article or a report. With the increasing complexity of data and statistical procedures, organizing this process into a workflow is of utmost importance. Doing so makes data analysis and the reporting of results transparent, reusable, reproducible, and efficient. A good workflow often results in coherent organization and greater productivity. In this presentation I will introduce the concepts of workflow in a data analysis project, organization of the involved programs into logical units or building blocks, keeping the project in a version control system, automatic backup of the project, and reproducibility of the data analysis. I will also point to useful open-source software that greatly simplifies these tasks. The presentation applies to all data analysis projects and is not specific to particular programming languages or platforms.

Associated Reading:

[1] Galili, T. 2010. Managing a statistical analysis project - guidelines and best practices, R-statistics blog.
[2] Smith, D.M. 2010. A workflow in R, Revolution Analytics Blog.
[3] Healy, K. 2011. Choosing Your Workflow Applications.
