Session 09 - Department of Statistics - Purdue University

Recent Developments in Machine Learning

Organizer: Faming Liang

Speakers

  • Nianqiao (Phyllis) Ju, Assistant Professor of Statistics, Purdue University
  • Yiyuan She, Professor of Statistics, Florida State University
  • Duchwan Ryu, Associate Professor and Director of Graduate and Undergraduate Studies, Northern Illinois University
  • Christian Robert, Professor, Université Paris Dauphine and University of Warwick

Nianqiao (Phyllis) Ju

Data Augmentation MCMC for Bayesian Inference from Privatized Data

Abstract: Differentially private mechanisms protect privacy by introducing additional randomness into the data. When the data analyst has access only to the privatized data, it is a challenge to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms. Our MCMC algorithm augments the model parameters with the unobserved confidential data, and alternately updates each one conditional on the other. For the potentially challenging step of updating the confidential data, we propose a generic approach that exploits the privacy guarantee of the mechanism to ensure efficiency. We give results on computational complexity, acceptance rate, and mixing properties of our MCMC.

This talk is based on joint work with Jordan Awan, Robin Gong, and Vinayak Rao.
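As a rough illustration of the data-augmentation idea (a sketch, not the authors' implementation), the code below runs the two alternating updates on a hypothetical toy model: confidential draws x_i ~ N(theta, 1) released through an additive Laplace mechanism. The model, the noise scale b, and the proposal distribution are all assumptions made for the sketch; the confidential-data step is an independence Metropolis-Hastings update whose acceptance ratio involves only the bounded mechanism density, mirroring how the talk's generic approach exploits the privacy guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (an assumption, not the talk's example):
# confidential x_i ~ N(theta, 1), released z_i = x_i + Laplace(0, b).
n, b = 50, 1.0
theta_true = 2.0
x_conf = rng.normal(theta_true, 1.0, n)   # confidential data (unobserved)
z = x_conf + rng.laplace(0.0, b, n)       # privatized release (observed)

def log_mech(z, x):
    """Log-density of the Laplace privacy mechanism (up to a constant)."""
    return -np.abs(z - x) / b

# Data augmentation MCMC: alternate theta | x and x | theta, z.
theta, x = 0.0, z.copy()
draws = []
for _ in range(5000):
    # 1) theta | x: conjugate normal update under a flat prior on theta.
    theta = rng.normal(x.mean(), 1.0 / np.sqrt(n))
    # 2) x_i | theta, z_i: independence Metropolis-Hastings proposing from
    #    p(x | theta), so the prior term cancels and the acceptance ratio
    #    involves only the mechanism density, which privacy keeps bounded.
    x_prop = rng.normal(theta, 1.0, n)
    log_acc = log_mech(z, x_prop) - log_mech(z, x)
    accept = np.log(rng.uniform(size=n)) < log_acc
    x = np.where(accept, x_prop, x)
    draws.append(theta)

post_mean = np.mean(draws[1000:])   # posterior mean estimate for theta
```

Because the proposal is the confidential-data model itself, the acceptance probability depends only on how much the privatized observation discriminates between candidate confidential values, which a strong privacy guarantee limits by design.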

Yiyuan She

Slow Kill for Big Data Learning

Abstract: Big-data applications often involve huge numbers of observations and features, which creates new challenges for variable selection and parameter estimation. This paper presents a novel "slow kill" technique that uses nonconvex constrained optimization, adaptive ℓ2-shrinkage, and increasing learning rates. Because the problem size can decrease during the slow kill iterations, the technique is particularly effective for large-scale variable screening. The interaction between statistics and optimization yields useful insights into how to control quantiles, step size, and shrinkage parameters so as to relax the regularity conditions required for the desired level of statistical accuracy. Experimental results on real and synthetic data show that slow kill outperforms state-of-the-art algorithms in various situations while remaining computationally efficient for large-scale data. This is joint work with Jiahui Shen and Adrian Barbu.
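The abstract does not specify the algorithm, so as a generic stand-in the sketch below illustrates the broad ingredients it names — quantile-based screening that removes a fraction of the smallest coefficients per pass, ℓ2-shrinkage, and an increasing learning rate on a shrinking problem — on synthetic sparse regression. Every constant and the update rule are assumptions for illustration, not the authors' slow kill procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic sparse regression: 10 true signals among p = 1000 features.
n, p, s = 200, 1000, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0
y = X @ beta_true + rng.normal(size=n)

# Generic screening loop (assumed, not the paper's algorithm): gradient
# steps with ell_2 shrinkage, killing the smallest coefficients slowly.
beta = np.zeros(p)
active = np.arange(p)
eta, lam = 5e-4, 0.1     # learning rate and ell_2 shrinkage (assumed)
keep = p
for _ in range(100):
    resid = y - X[:, active] @ beta[active]
    grad = -X[:, active].T @ resid + lam * beta[active]
    beta[active] -= eta * grad
    # "Slow kill": drop 10% of the active set per pass, never below s.
    keep = max(s, int(keep * 0.9))
    order = np.argsort(-np.abs(beta[active]))
    survivors = active[order[:keep]]
    beta[np.setdiff1d(active, survivors)] = 0.0
    active = survivors
    eta *= 1.02          # growing step size as the problem shrinks

selected = np.flatnonzero(beta)   # surviving feature indices
```

Killing slowly, rather than thresholding to the target size at once, gives weak but genuine signals several passes to grow before facing elimination, which is the intuition behind using such loops for large-scale screening.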

Duchwan Ryu

Bayesian Functional Data Analysis over Dependent Windows to Identify Differentially Methylated Regions

Abstract: We consider Bayesian functional data analysis for observations measured as extremely long sequences. When the sequence is segmented into a number of small windows of manageable length, the windows may not be independent, especially when they neighbor each other. We propose using Bayesian smoothing splines to estimate the individual functional pattern within each window. To address the dependence between windows, we establish transition models for the parameters involved in each window and consider a dynamic model for the functional pattern. Applying the dynamically weighted particle filter, we estimate the functional patterns over all windows. Based on the estimated functional patterns, we evaluate the functional difference between groups of individuals at each window with a Bayes factor and identify windows whose functional patterns differ by group. We examine the proposed method through simulation studies and apply it to identify differentially methylated genetic regions in TCGA lung adenocarcinoma data.
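The talk's dynamically weighted particle filter propagates spline parameters across dependent windows; as a simplified stand-in, the sketch below runs a plain bootstrap particle filter on an assumed scalar autoregressive transition model between windows, which conveys the propagate-weight-resample structure without the spline or dynamic-weighting machinery.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy transition model between windows (illustrative stand-in
# for the talk's transition models on per-window spline parameters).
T, N = 40, 500                       # number of windows, particles
phi, q, r = 0.9, 0.5, 1.0            # AR coefficient, noise scales (assumed)

# Simulate a latent per-window level and a noisy summary per window.
state = np.zeros(T)
state[0] = rng.normal()
for t in range(1, T):
    state[t] = phi * state[t - 1] + rng.normal(0, q)
obs = state + rng.normal(0, r, T)

# Bootstrap particle filter over the sequence of windows.
particles = rng.normal(0, 1, N)
est = np.empty(T)
for t in range(T):
    if t > 0:                        # propagate through the transition model
        particles = phi * particles + rng.normal(0, q, N)
    logw = -0.5 * ((obs[t] - particles) / r) ** 2   # Gaussian likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)   # filtered mean for this window
    particles = rng.choice(particles, N, p=w)       # multinomial resample

rmse = np.sqrt(np.mean((est - state) ** 2))
```

Sharing information across windows through the transition model is what lets the filtered estimate beat the raw per-window observation, which is the same reason the talk models windows as dependent rather than analyzing each in isolation.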

Christian Robert

Evidence estimation in finite and infinite mixture models and applications

Abstract: Estimating the model evidence - or marginal likelihood of the data - is a notoriously difficult task for finite and infinite mixture models. We reexamine here different Monte Carlo techniques advocated in the recent literature, as well as novel approaches based on Geyer's (1994) reverse logistic regression technique, Chib's (1995) algorithm, and sequential Monte Carlo (SMC). Applications are numerous. In particular, testing for the number of components in a finite mixture model, or testing the fit of a finite mixture model for a given dataset, has long been an issue of much interest that still lacks a fully satisfactory resolution. Using a Bayes factor to find the right number of components K in a finite mixture model is known to provide a consistent procedure. We furthermore establish the consistency of the Bayes factor when comparing a parametric family of finite mixtures against the nonparametric 'strongly identifiable' Dirichlet process mixture (DPM) model. Joint work with Adrien Hairault (Paris Dauphine) and Judith Rousseau (Oxford).
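Of the techniques named, SMC lends itself to a compact illustration: in a likelihood-tempered SMC sampler, the product of the average incremental weights across temperature steps estimates the evidence. The sketch below checks this on an assumed conjugate Gaussian toy model (not a mixture), chosen because its marginal likelihood has a closed form to compare against; the temperature ladder and move kernel are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed conjugate toy model with known evidence: y_i ~ N(theta, 1),
# theta ~ N(0, 1). (The talk treats mixtures, which have no closed form.)
n = 20
y = rng.normal(1.0, 1.0, n)

def loglik(theta):
    """Log-likelihood of y given theta, vectorized over particles."""
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1))

# Exact log-evidence: marginally y ~ N(0, I + 11^T).
log_Z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
               - 0.5 * (np.sum(y ** 2) - y.sum() ** 2 / (1 + n)))

# Tempered SMC: each average incremental weight estimates a ratio of
# normalizing constants; their log-sum estimates the log-evidence.
N = 2000
temps = np.linspace(0, 1, 51)
theta = rng.normal(0, 1, N)          # start from the prior
log_Z = 0.0
for lam0, lam1 in zip(temps[:-1], temps[1:]):
    logw = (lam1 - lam0) * loglik(theta)
    log_Z += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
    w = np.exp(logw - logw.max())
    w /= w.sum()
    theta = rng.choice(theta, N, p=w)            # multinomial resample
    # One random-walk MH move per particle, targeting p(theta) L^lam1.
    prop = theta + rng.normal(0, 0.3, N)
    log_acc = ((lam1 * loglik(prop) - 0.5 * prop ** 2)
               - (lam1 * loglik(theta) - 0.5 * theta ** 2))
    theta = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, theta)
```

Bridging from prior to posterior through intermediate temperatures keeps each importance-weighting step mild, which is why SMC remains usable in multimodal settings such as mixtures where a single importance-sampling jump would collapse.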

Purdue Department of Statistics, 150 N. University St, West Lafayette, IN 47907

Phone: (765) 494-6030, Fax: (765) 494-0558

© 2023 Purdue University | An equal access/equal opportunity university | Copyright Complaints
