Title: "Selecting Proper Degrees of Sparsity for Sparse CCA with Applications to Analyzing Multiple Data Sets in QTL Mapping"
Speaker: Tilman Achberger, Department of Statistics, Purdue University
Place: HORT 117
Time: Tuesday, April 26, 2011, 4:30pm


In recent years, several penalized variants of canonical correlation analysis (sparse CCA) have been proposed. These improve the interpretability of traditional CCA results by restricting each linear combination to a relatively small number of variables. A critical challenge in using sparse CCA is determining the proper degree of sparsity (the number of zero coefficients in the linear combination) for each data set under consideration. A clustering approach is proposed to improve the degree-of-sparsity estimates obtained by cross-validation, which is typically biased toward including too many variables with non-zero coefficients. A novel application of sparse CCA is proposed in the context of quantitative trait loci (QTL) mapping, where groups of variables (traits) sharing common sources of variation between two data sets are selected for further analysis. Simulation studies are presented to demonstrate the performance of the proposed approach, and results from experimental data in the model organism Arabidopsis thaliana are shown.

Associated Reading:

1. Parkhomenko, E., D. Tritchler, and J. Beyene (2009). Sparse canonical correlation analysis with application to genomic data integration. Statistical Applications in Genetics and Molecular Biology 8 (1), Article 1.

2. Buescher, E., T. Achberger, I. Amusan, A. Giannini, C. Ochsenfeld, A. Rus, B. Lahner, O. Hoekenga, E. Yakubova, J. F. Harper, M. L. Guerinot, M. Zhang, D. E. Salt, and I. R. Baxter (2010). Natural genetic variation in selected populations of Arabidopsis thaliana is associated with ionomic differences. PLoS ONE 5 (6), e11081.
