Session 014 - Department of Statistics - Purdue University

Recent Advances in Biostatistics

Organizer: Faming Liang, Distinguished Professor of Statistics, Purdue University

 

Speakers

  • Kai Yu, Senior Investigator, Biostatistics Branch, National Institutes of Health
  • Jian Zhang, Professor of Statistics, Programme Lead for Financial Mathematics, Statistics Research Group Lead, University of Kent
  • Nan Lin, Professor, Department of Mathematics and Statistics, Washington University in St. Louis
  • Bochao Jia, Senior Research Scientist, Eli Lilly and Company

Speaker: Title
Kai Yu: A general statistical framework for integrating individual-level and summary-level data

Abstract:  Statistical inferences are typically based on detailed individual-level data. As an increasing amount of summary data becomes available from published literature and disease registries, there is a need for a new statistical framework to integrate all relevant information from both summary-level and individual-level data. This talk introduces a framework that integrates individual-level data from an "internal" study with summary data from "external" studies. The framework can incorporate summary data from various working models adopted by multiple external studies. Theoretical and empirical results demonstrate the advantages of this procedure, providing a powerful tool for making more robust and generalizable statistical inferences by combining information from multiple sources.

Jian Zhang: Modelling Loss of Complexity in Intermittent Time Series

Abstract: Physiologic systems are regulated by interacting control mechanisms that operate in various spatial and temporal ways. The outputs of these systems often exhibit complex fluctuations that are not simply due to noise but contain structural information about the underlying dynamics. Quantifying complexity, the amount of fluctuation in a physiologic system, has attracted significant research attention over the past decades. The present study developed a nonparametric approach called nCp for modelling loss of complexity in intermittent time series. The nCp technique consists of two steps. First, a nonlinear autoregressive model and its complexity were obtained for each time series segment using the Bayesian Information Criterion (BIC). Next, change-points in complexity were detected in these models using the pruned exact linear time (PELT) method of Killick et al. (2012). In simulations, nCp was compared with the popular approximate entropy (ApEn) method and assessed for 1) its ability to localise complexity change-points in intermittent time series; 2) its ability to faithfully estimate underlying nonlinear models; and 3) its robustness to different signal-to-noise ratio (SNR) conditions. The proposed method was then applied to a real analysis of fatigue-induced changes in the complexity of human motor outputs during physical exercise. The results demonstrated that the proposed method substantially outperformed ApEn in accurately detecting complexity changes in intermittent time series segments. The results also highlighted the problem of distorted time-series segmentation in existing methods. This is joint work with Drs Li, Winters and Burnely.
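
For readers unfamiliar with the ApEn baseline mentioned in this abstract, the following is a minimal sketch of the textbook approximate entropy statistic, ApEn(m, r) = Phi(m) - Phi(m + 1). It is not the speaker's code and does not implement nCp. The function name, the default values of `m` and `r`, and the toy series are illustrative assumptions; in practice `r` is often chosen relative to the standard deviation of the series.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Textbook ApEn(m, r) = Phi(m) - Phi(m + 1), with self-matches counted.

    Assumption: r is an absolute tolerance here, not scaled by the series SD.
    """
    n = len(series)

    def phi(dim):
        # All overlapping windows ("templates") of length dim.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r of a (a itself included).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Hypothetical toy inputs: both series are highly regular, so ApEn is near zero.
constant = [1.0] * 20
alternating = [0.0, 1.0] * 10
print(approximate_entropy(constant), approximate_entropy(alternating))
```

Low values indicate a regular, predictable signal; a loss of complexity would show up as a drop in such a statistic over time.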

Nan Lin: Distributed quantile regression for longitudinal big data

Abstract: Weighted quantile regression (WQR) is an effective tool for analyzing longitudinal data with heterogeneity, especially for its mild distributional requirement on the data. For small or moderate data, the WQR estimation problem is traditionally solved by linear programming algorithms, such as the interior point (IP) method. However, when applied to big data, especially high-dimensional big data, the IP method is often computationally too expensive or infeasible due to its full matrix factorization in every iteration. We propose a parallel algorithm, WQR-ADMM, for WQR in distributed longitudinal big data based on the multi-block alternating direction method of multipliers, and establish its convergence property. Simulation studies demonstrate that WQR-ADMM is faster than IP in big data, particularly for the cases where the dimension p is large, and has favorable estimation accuracy in both non-distributed and distributed environments. We further illustrate the practical performance of WQR-ADMM by analyzing a Beijing air quality data set.
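
To illustrate why ADMM avoids the per-iteration matrix factorization of interior point methods, here is a minimal sketch of a standard single-block ADMM for ordinary (unweighted) quantile regression with one covariate. It is not the multi-block, distributed WQR-ADMM algorithm of the talk; the function names, the penalty `rho`, the iteration count, and the toy data are all assumptions made for illustration. Each iteration needs only a least-squares fit and an elementwise closed-form proximal step.

```python
def check_loss(residual, tau):
    """Quantile (check) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def prox_check(v, tau, rho):
    """Closed-form proximal operator of check_loss(., tau) / rho."""
    if v > tau / rho:
        return v - tau / rho
    if v < -(1.0 - tau) / rho:
        return v + (1.0 - tau) / rho
    return 0.0


def fit_line(x, t):
    """Ordinary least squares for t ~ a + b * x (used as the beta-update)."""
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxt = sum((xi - mx) * (ti - mt) for xi, ti in zip(x, t))
    b = sxt / sxx
    return mt - b * mx, b


def quantile_regression_admm(x, y, tau=0.5, rho=1.0, n_iter=500):
    """ADMM for: minimize sum_i check_loss(z_i) subject to z = y - a - b * x."""
    n = len(y)
    z = [0.0] * n  # auxiliary residual variables
    u = [0.0] * n  # scaled dual variables
    a = b = 0.0
    for _ in range(n_iter):
        # beta-update: least squares against the shifted response y - z + u.
        a, b = fit_line(x, [yi - zi + ui for yi, zi, ui in zip(y, z, u)])
        resid = [yi - a - b * xi for xi, yi in zip(x, y)]
        # z-update: elementwise proximal step on resid + u.
        z = [prox_check(ri + ui, tau, rho) for ri, ui in zip(resid, u)]
        # dual update: accumulate the constraint violation resid - z.
        u = [ui + ri - zi for ui, ri, zi in zip(u, resid, z)]
    return a, b


# Hypothetical toy data: a line y = 1 + 2x with one gross outlier at the end.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 40.0]
print(quantile_regression_admm(x, y, tau=0.5))
```

In a distributed setting, the least-squares step is the part that would be split across data blocks; how WQR-ADMM does this, and how it handles the weights, is specific to the talk and not shown here.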

Bochao Jia: A Graphic-Embedded Neural Network Method for Target Identification in Autoimmune Diseases

Abstract: High-throughput methods such as RNA-Seq have been shown to be useful for understanding disease mechanisms. However, such gene expression data are complex, and their high dimensionality typically limits the use of traditional modeling for identifying the target genes that are most important to the disease. Recent developments have allowed deep learning to overcome such limitations and enabled prediction modeling between gene expression and disease activity using all expressed genes simultaneously. However, the traditional deep learning approach ignores the biological basis in modeling. To tackle this problem, we propose a Graphic-Embedded Neural Network (GENN) approach to better understand disease-specific gene associations and to identify potential targets for the disease. The proposed GENN contains four steps of analysis. First, we apply a Graph Embedding Deep Feedforward Network (GEDFN) to integrate known gene regulatory network structures into the deep neural network architecture. The goal of GEDFN is to obtain the weight matrix between each pair of associated genes. Then, a pruning strategy is used to remove weak connections among the associated genes based on the weight matrix, in order to obtain a disease-specific gene network. Next, we employ the Random Walk with Restart (RWR) method on the disease-specific gene network structure to calculate the proximities of the target gene to all other genes in the genome. Finally, a meta-analysis is conducted using the weighted sum of the proximities and known gene utilities across the whole genome. The final output of this step is a ranking of all genes for a particular disease.
A simulation study evaluates the performance of the proposed algorithm, and a real example using lupus trial data demonstrates the feasibility of applying GENN to gene expression data and baseline disease activities for identifying systemic lupus erythematosus (SLE) targets.
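
The Random Walk with Restart step can be sketched as follows. This is a generic, unweighted RWR on a small undirected graph, not the GENN implementation: the function name, the restart probability of 0.5, and the four-gene network with labels G1 to G4 are hypothetical. The iteration is p_next = (1 - restart) * W * p + restart * e, where W spreads each node's probability equally among its neighbors and e puts all mass on the seed gene.

```python
def random_walk_with_restart(neighbors, seed, restart=0.5, tol=1e-12, max_iter=1000):
    """Return the stationary visiting probabilities of a walk restarting at seed.

    neighbors maps each node to a non-empty list of adjacent nodes.
    """
    nodes = list(neighbors)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        # Restart mass goes back to the seed at every step.
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        # The remaining mass of each node is split equally among its neighbors.
        for v in nodes:
            share = (1.0 - restart) * p[v] / len(neighbors[v])
            for w in neighbors[v]:
                nxt[w] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical disease-specific network: a chain G1 - G2 - G3 - G4.
network = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
}
proximity = random_walk_with_restart(network, seed="G1")
ranking = sorted(proximity, key=proximity.get, reverse=True)
print(ranking)
```

Genes closer to the seed in the network receive higher proximity. In the workflow described above, these proximities would then be combined with known gene utilities in the final ranking step, which is not shown here.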

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

© 2023 Purdue University