Session 16 - Department of Statistics - Purdue University

Statistical genomics data analysis

Speaker(s)

  • Xianran Li (Kansas State)
  • Gabriel Murillo (UC Riverside)
  • Yuehua Cui (Michigan State University)
  • Shreyartha Mukherjee (Iowa State University)
  • Stephen Stanhope (University of Chicago)
  • Shaoyu Li (St. Jude Children's Research Hospital)

Description: TBA 

Schedule

Sat, June 23 - Location: STEW 214

Time | Speaker | Title
1:30-1:55 PM | Xianran Li | Genome-wide association studies identify genic and non-genic contributions to quantitative trait variation in maize
Abstract: The genomic distribution of trait-associated SNPs (TASs) discovered in genome-wide association studies (GWAS) can provide insights into the genetic architecture of complex traits and inform the design of future studies. Here we report on a set of GWAS in maize that identified TASs underlying five quantitative traits (leaf length, leaf width, upper leaf angle, days to anthesis, and days to silking) measured across a large panel of samples, and we examined the characteristics of the discovered TASs. A set of mostly genic SNPs was generated by analyzing RNA-seq reads. This SNP set was complemented with a set of maize HapMap SNPs that contains approximately equal proportions of genic and non-genic SNPs. TASs were identified with a genome scan that controlled for polygenic background effects. The diverse functions of the candidate genes implicated by TASs indicate that complex genetic networks shape these traits. More importantly, we found that TASs were enriched in non-genic regions, particularly within a 5 kb window upstream of genes, but depleted at nonsynonymous sites, suggesting that alterations in protein sequence may be quantitatively less important than changes in gene regulation in shaping natural variation in these traits. Consistent with these findings, TASs collectively explained 44-59% of the total phenotypic variation across the five traits, and on average 79% of the explained variation could be attributed to TASs located in genes or within 5 kb upstream of genes (which together comprise only 13% of the genome).
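To make the genic / upstream / non-genic categories concrete, here is a minimal Python sketch that labels a SNP position relative to a list of gene annotations, plus the fold enrichment implied by the reported figures (79% of explained variation from 13% of the genome). The `Gene` record, 1-based inclusive coordinates, and the rule that a genic hit takes priority over an upstream hit are assumptions for illustration, not the authors' annotation pipeline.

```python
from dataclasses import dataclass

UPSTREAM_WINDOW = 5_000  # 5 kb upstream window, as described in the abstract


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene annotation record (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_snp(chrom, pos, genes, window=UPSTREAM_WINDOW):
    """Label a SNP 'genic', 'upstream' (within `window` bp 5' of a gene), or 'other'."""
    upstream = False
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos <= g.end:
            return "genic"  # assumption: genic takes priority over upstream
        # "Upstream" depends on strand: before the start on "+", after the end on "-".
        if g.strand == "+" and g.start - window <= pos < g.start:
            upstream = True
        elif g.strand == "-" and g.end < pos <= g.end + window:
            upstream = True
    return "upstream" if upstream else "other"


def fold_enrichment(share_of_explained, share_of_genome):
    """Ratio of a region's share of explained variation to its share of the genome."""
    return share_of_explained / share_of_genome
```

With the abstract's numbers, `fold_enrichment(0.79, 0.13)` is roughly 6.1, i.e. genic plus 5 kb upstream regions contribute about six times more explained variation than their share of the genome alone would suggest.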
2:00-2:25 PM | Gabriel Murillo | GeMS: HTS SNP calling which accounts for sample preparation errors
Abstract: A review of the available single nucleotide polymorphism (SNP) calling procedures for Illumina high-throughput sequencing (HTS) data reveals that most rely mainly on base-calling and mapping quality values as the sources of error when calling SNPs. Consequently, errors that arise outside base calling and alignment, such as those introduced during genomic sample preparation, are not accounted for. We present Genotype Model Selection (GeMS), a novel method for consensus and SNP calling that accounts for errors occurring during preparation of the genomic sample. Simulations and real data analyses indicate that, among the SNP callers tested, GeMS offers the best balance between sensitivity and positive predictive value. Future work on multiple-sample and mutation-library SNP calling will also be introduced. The GeMS software package can be downloaded from https://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software.
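The abstract does not give the GeMS likelihood, so the following is only a generic illustration of folding a sample-preparation error term into genotype calling: a diploid genotype likelihood in which each read's error probability combines its Phred base-call error with a single hypothetical `prep_err` rate (assumed independent of base-call error), and the maximum-likelihood genotype is called. GeMS itself performs model selection and may model preparation errors quite differently.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def phred_to_error(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def base_given_allele(observed, allele, err):
    """P(observed base | true allele); an error yields each other base with prob err/3."""
    return 1 - err if observed == allele else err / 3


def genotype_log_likelihoods(reads, prep_err):
    """reads: iterable of (base, phred_quality). Returns {genotype: log-likelihood}."""
    reads = list(reads)
    lls = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        ll = 0.0
        for base, q in reads:
            # Assumption: base-call and sample-preparation errors occur independently,
            # and either one produces a wrong base.
            err = 1 - (1 - phred_to_error(q)) * (1 - prep_err)
            # Each read samples one of the two chromosomes with probability 1/2.
            ll += math.log(0.5 * base_given_allele(base, a1, err)
                           + 0.5 * base_given_allele(base, a2, err))
        lls[a1 + a2] = ll
    return lls


def call_genotype(reads, prep_err=0.01):
    """Return the maximum-likelihood diploid genotype, e.g. 'AG'."""
    lls = genotype_log_likelihoods(reads, prep_err)
    return max(lls, key=lls.get)
```

For 16 `A` reads and 4 `G` reads at Q30, this sketch calls `AG` with `prep_err=0.01` but `AA` with `prep_err=0.2`: the larger the assumed preparation error, the more minority-allele reads are attributed to error rather than to a heterozygote.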
2:30-2:55 PM | Yuehua Cui | Varying coefficient model for nonlinear gene-environment interaction
Abstract: The genetic influences on complex diseases generally depend on the joint effects of disease variants, environmental factors, and their interplay. Gene-environment (G×E) interactions play vital roles in determining an individual's disease risk, but the underlying mechanisms are poorly understood. Genes can respond to environmental stimuli in a linear or non-linear fashion, leading to complicated interaction patterns. The linearity assumption commonly made in current regression-based frameworks for examining G×E interactions can therefore easily be violated when the interaction is non-linear. In this talk, I will introduce a generalized varying coefficient model for non-linear G×E interaction with continuous or binary disease traits, in which the varying coefficients are fitted with non-parametric regression functions. A group of statistical tests is proposed to elucidate the mechanisms of non-linear G×E interaction. The utility of the proposed method is illustrated via simulation and real data analysis.
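A varying coefficient model lets the genetic effect change smoothly with an environmental variable, for example y = α(x) + β(x)·g + ε, where g is a genotype code and x an environmental covariate. The sketch below estimates α(x₀) and β(x₀) at a single environment value by local linear, Gaussian-kernel-weighted least squares. This is one standard way to fit such a model, shown here only for a continuous trait; it is not necessarily the estimator or the tests used in the talk, which also covers binary traits.

```python
import numpy as np


def local_linear_vcm(x, g, y, x0, bandwidth):
    """Estimate (alpha(x0), beta(x0)) in y = alpha(x) + beta(x) * g + noise.

    alpha and beta are each approximated by a straight line around x0, and
    observations are weighted by a Gaussian kernel in (x - x0).
    """
    x, g, y = (np.asarray(a, dtype=float) for a in (x, g, y))
    d = x - x0
    w = np.exp(-0.5 * (d / bandwidth) ** 2)            # kernel weights
    design = np.column_stack([np.ones_like(d), g, d, g * d])
    sw = np.sqrt(w)                                     # weighted LS via row scaling
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]                             # alpha(x0), beta(x0)
```

Evaluating the estimate over a grid of x₀ values traces out β(x): a constant β(x) means no G×E interaction, a linear β(x) corresponds to the usual product-term model, and curvature points to a non-linear interaction.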
3:00-3:30 PM | Break
3:30-3:55 PM | Shreyartha Mukherjee | spliceR: Detecting and quantifying Allele-Specific Expression from RNA-seq data
Abstract: Allele-specific expression (ASE) is an important contributor to phenotypic variability and to the development of complex traits. Some genes display allelic disparity in expression that is transmitted by Mendelian or non-Mendelian inheritance, and this disparity may be associated with effects such as heterosis, yield variation, and uniformity in plants, and with complex traits and diseases in animals. It is of great interest to study how genetic and epigenetic modifications lead to transcriptional variation and how transcriptional variation affects the phenotype. Differential allele expression may be controlled by changes to the nucleotide sequence and to regulatory elements, such as single nucleotide polymorphisms (SNPs), insertions, and deletions, and studies indicate that such variants are widespread across genomes and tissues. Variants in the coding regions of genes may also alter the structure and function of the gene product. Recent studies have shown that preferential expression of alleles is widespread in mammals: non-imprinted autosomal genes exhibit allelic imbalance at the transcript level in mouse hybrids (Cowles et al., 2002) and in humans (Yan et al., 2002), and such expression produces proteins associated with diseases. Hence, a solid understanding of the classification and functional annotation of allele-specifically expressed genes is vital for recognizing the extent of functionally important regulatory variation. This will help us identify candidate haplotypes and the correlation between their genetic sequences and heterotic traits. The physiological vigor and general health of an organism are strongly associated with the extent of variation of parental gametes. In this study, we propose a novel approach (spliceR) to study allele-specific expression and to identify alleles that are preferentially expressed across genetic backgrounds and levels of inbreeding.
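The abstract does not describe how spliceR scores allelic imbalance. A common starting point for ASE at a heterozygous site is an exact binomial test of the reference and alternative read counts against an expected 0.5 allele ratio; the minimal sketch below implements that generic test with the standard library only. Real ASE analyses usually also address reference-mapping bias and overdispersion across replicates (for example with a beta-binomial model), which this sketch ignores.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def ase_binomial_test(ref_count, alt_count, p_ref=0.5):
    """Two-sided exact binomial test of allelic balance at one heterozygous site."""
    n = ref_count + alt_count
    observed = binom_pmf(ref_count, n, p_ref)
    probs = [binom_pmf(k, n, p_ref) for k in range(n + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    pval = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, pval)
```

For example, 90 reference versus 10 alternative reads gives a vanishingly small p-value, whereas a 50/50 split gives a p-value of essentially 1.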
4:00-4:25 PM | Stephen Stanhope | Mixed models and score tests for association studies of binary traits with risk covariates in populations of related individuals
Abstract: Although much progress has been made in developing effective and tractable methods for association studies of quantitative traits with risk covariates in populations of related individuals, substantially less attention has been paid to analogous methods for binary traits. This is likely due to the comparative difficulty of handling binary traits within the mixed-model regression frameworks typically used as the basis of such methods: parameter estimation requires significant effort, and associations must be evaluated with score-test-based techniques. Compounding these difficulties is the increasing focus on performing not only marker-based association studies but also gene enrichment, gene × environment, and marker × marker analyses. In this talk, we describe GLOMS (Genome-wide LOgistic mixed model / Multivariate Score test), a parallelized, computationally efficient, and powerful system for performing a wide variety of association studies of binary traits with risk covariates in populations of related individuals. We provide examples of its performance in controlled simulation studies and show how GLOMS can be applied to analyses of hypertension in the Hutterites, a population related through a dense 13-generation pedigree.
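As a rough illustration of the score-test side of this problem, the sketch below fits the null logistic regression (covariates only) by iteratively reweighted least squares and then computes a one-degree-of-freedom score statistic for a single marker, with its variance adjusted for the estimated covariate effects. It treats individuals as independent: the random effect for relatedness that a logistic mixed model such as GLOMS includes is deliberately omitted, so this is not the GLOMS method and would generally not control type I error in a related sample such as the Hutterites.

```python
import numpy as np
from math import erfc, sqrt


def fit_null_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logit(mu) = X @ beta by Newton-Raphson / IRLS; return fitted means mu."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))
        w = mu * (1 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1 / (1 + np.exp(-X @ beta))


def logistic_score_test(X, g, y):
    """Score test of marker g given covariates X (independent individuals only)."""
    mu = fit_null_logistic(X, y)
    w = mu * (1 - mu)
    u = g @ (y - mu)                                  # score for the marker effect
    xtwx = X.T @ (X * w[:, None])
    xtwg = X.T @ (w * g)
    v = g @ (w * g) - xtwg @ np.linalg.solve(xtwx, xtwg)  # variance adjusted for covariates
    t = u * u / v
    p = erfc(sqrt(t / 2))                             # upper tail of chi-square(1)
    return t, p
```

Only the null model needs fitting, and it is shared by every marker, which is one reason score tests are attractive for genome-wide scans.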
4:30-4:55 PM | Shaoyu Li | Identify Gene-Centric Gene-Gene Interactions underlying complex traits
Abstract: Genome-wide association studies (GWAS) have discovered thousands of genetic variants associated with complex human diseases and have dramatically improved our understanding of the genetic architecture of diseases over the past decade. However, the mystery of "missing heritability" has posed a new challenge for modern human genetics. Gene-gene interaction is widely considered an important contributor to the missing heritability. Despite currently available statistical and computational approaches, detecting gene-gene interactions underlying complex traits remains a major challenge. Although the merits of gene-centric analysis have been demonstrated, it has not yet been widely applied to epistasis detection. In this work, we propose a model-based kernel machine method that identifies gene-gene interactions by treating genes, rather than individual SNPs, as testing units, an approach termed gene-centric. Simulation studies were conducted to evaluate the computational feasibility and statistical power of the approach. The proposed method provides a conceptual framework for identifying gene-gene interactions, which could lead to novel biological insights into the genetic architecture of complex traits.
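One common way to build a gene-level interaction term in kernel machine models is to give each gene a kernel computed from its SNP genotypes and take the element-wise (Hadamard) product of the two gene kernels. The sketch below uses linear kernels on centred genotypes, an assumption chosen for simplicity; with that choice the product kernel equals a linear kernel on all pairwise products of one centred SNP from each gene, so it covers every SNP×SNP interaction between the two genes at once. The score-type statistic uses residuals from an intercept-only null model (a hypothetical simplification) and omits the main-effect kernels and the null distribution, commonly a mixture of chi-squares, that a full test would need.

```python
import numpy as np


def linear_kernel(G):
    """n x n linear kernel from an n x m genotype matrix (one column per SNP)."""
    Gc = G - G.mean(axis=0)          # centre each SNP
    return Gc @ Gc.T


def interaction_kernel(G1, G2):
    """Gene-gene interaction kernel: element-wise product of the two gene kernels."""
    return linear_kernel(G1) * linear_kernel(G2)


def score_statistic(K, y):
    """Q = r' K r with residuals r from an intercept-only null model (simplification)."""
    r = y - y.mean()
    return float(r @ K @ r)
```

Because the product of two positive semidefinite kernels is again positive semidefinite, the interaction kernel is a valid kernel and Q is never negative.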

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558

© 2023 Purdue University | An equal access/equal opportunity university