Title: "Multi-scale approaches for analyses of functional phenotypes arising from high-throughput sequencing assays"
Speaker: Heejung Shim, Department of Statistics, Purdue University

Place: LILLY Hall G126
Date: September 8, 2015; Tuesday
Time: 4:30pm

Abstract: Identifying differences between multiple groups in molecular and cellular phenotypes measured by high-throughput sequencing assays is a common task in genomics applications. Typical examples include testing for association between genetic variants and gene expression using RNA-seq data, and testing for differences in chromatin accessibility across tissues or conditions using DNase-seq or ATAC-seq data. These high-throughput sequencing data provide high-resolution measurements of how traits vary along the whole genome in each sample. However, typical analyses fail to exploit the full potential of these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes or windows of fixed length.

In this talk, I will present two multi-scale approaches that more fully exploit the high-resolution data. In the first part of my talk, I will introduce a wavelet-based approach and demonstrate that it has more power than simpler window-based approaches in identifying genetic variants associated with chromatin accessibility. I will also illustrate how the estimated shape of the genotype effect can help in understanding the potential mechanisms underlying the identified associations. In the second part, I will discuss potential limitations of the wavelet-based approach in analyses of data sets with small sample sizes or low sequencing depths. To address these limitations, I will present another approach that models the count nature of the sequencing data directly using multi-scale models, and demonstrate that these models have substantially more power than the wavelet-based approach in such settings. While we developed these methods with specific applications to sequencing data in mind, they apply naturally to the analysis of many other functional phenotypes.


Associated reading:
1. Shim, H. and Stephens, M. (2015). Wavelet-based genetic association analysis of functional phenotypes arising from high-throughput sequencing assays. Ann. Appl. Stat. 9, no. 2, 665-686

2. Degner, J. F., Pai, A. A., Pique-Regi, R., Veyrieras, J.-B., Gaffney, D. J., Pickrell, J. K., De Leon, S., Michelini, K., Lewellen, N., Crawford, G. E., Stephens, M., Gilad, Y. and Pritchard, J. K. (2012). DNaseI sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390-4



See www.stat.purdue.edu/~doerge/BIOINFORM.D/FALL15/sem.html for a full schedule of BIOINFORMATICS SEMINARS, past and present.