Exploring Statistical Sciences Research, Fall 2009
STAT 598V: Exploring Statistical Sciences Research
Schedule
Wednesday, August 26, 2009, 04:30 PM in REC 315
Professor Mary Ellen Bock, Department of Statistics, Purdue University
Organizational Meeting
Wednesday, September 2, 2009, 04:30 PM in REC 315
Professor Lingsong Zhang, Department of Statistics, Purdue University
Some Considerations of High Dimensional Classification Problems
Classification is one of the classical problems in statistics and machine learning. High-dimensional data sets are now common in many fields, and classical classification methods encounter many challenges when applied to them. In this talk, we will discuss some considerations in applying classification methods in this context. In particular, we will use the support vector machine and the distance weighted discrimination method for illustration. Some improved variations of these methods will also be discussed as potential solutions to some of these issues.
Wednesday, September 9, 2009, 04:30 PM in REC 315
Assistant Professor Xiao Wang, Department of Statistics, Purdue University
Nonparametric and Semiparametric Inference with Applications in Astronomy and Reliability
This talk deals with nonparametric and semiparametric methods that arise in two major areas of application: astronomy and reliability engineering. In both of these areas, extensive amounts of data are now routinely being collected. Nonparametric and semiparametric methods are especially useful in such environments.
The first part of this talk focuses on mapping the distribution of dark matter in galaxies close to the Milky Way. Current estimates are that the majority of the matter in the universe is dark, and its physical constitution remains a matter of controversy among astronomers. The problem of dark matter raises many philosophical and methodological questions about the process of confirming scientific hypotheses in contexts where existing theory generates a wide range of alternative explanations of the available empirical data. To address these and related questions, I will introduce a nonparametric method to estimate the dark matter distributions.
The second part of this talk focuses on degradation modeling in reliability. Traditional analysis in reliability focuses on collecting and modeling time-to-failure data. This poses difficulties in high-reliability applications where there are few failures and high degrees of censoring. Fortunately, advances in sensing technologies are making it possible to collect extensive amounts of data on degradation and performance-related measures associated with systems and components. I will introduce a semiparametric likelihood method to study different types of Lévy processes for degradation data.
Some other interesting theoretical and applied problems will also be discussed if I have time.
Wednesday, September 16, 2009, 04:30 PM in REC 315
Professor Bruce Craig, Department of Statistics, Purdue University
The calibration of two susceptibility tests based on interval censored data with measurement error
Drug dilution (MIC) and disk diffusion (DIA) are the two common tests used by clinicians to determine pathogen susceptibility to antibiotics. For each of these tests, two drug-specific breakpoints classify the unknown pathogen as susceptible, intermediate, or resistant to the drug. While MIC breakpoints are largely based on the pharmacokinetics and pharmacodynamics of the drug, comparable DIA breakpoints are not as straightforward to calculate. Current Clinical and Laboratory Standards Institute (CLSI) guidelines require a scattergram of test results for numerous pathogens, and the DIA breakpoints are based on limiting the classification discrepancies. This approach, however, is limited by the fact that certain experimental errors are ignored. I will discuss this errors-in-variables problem and then describe a hierarchical model, which factors in the uncertainty of both tests, the drug-specific relationship between the two tests, and the underlying distribution of pathogens. For the drug-specific relationship between these two tests, I propose both a parametric and a nonparametric model. A loss function is then used to determine the DIA breakpoints.
This is joint work with the CLSI Subcommittee on Antimicrobial Susceptibility Testing.
Wednesday, September 23, 2009, 04:30 PM in REC 315
Assistant Professor SVN Vishwanathan, Department of Statistics, Purdue University
Bundle Methods for Regularized Risk Minimization: Upper and lower bounds
Machine learning poses data driven optimization problems. Computing the function value and gradients for these problems is challenging because they often involve thousands of variables and millions of training data points. Therefore, a lot of recent research has focused on designing specialized optimization algorithms for such problems. In this talk, I will present a high level overview of one such algorithm that we recently developed. Our algorithm BMRM (Bundle Methods for Regularized Risk Minimization) not only has good practical performance but also sports strong theoretical guarantees, which I will discuss. The talk will be broadly accessible and will have plenty of fun pictures and illustrations!
Wednesday, September 30, 2009, 04:30 PM in REC 315
Professor R. W. Doerge, Department of Statistics and Department of Agronomy, Purdue University
Statistics, Epigenomics, and Cancer
Epigenomics is the study of heritable changes in genome function that occur in the absence of changes to the DNA sequence itself, and is considered by many to be the second code of instruction that affects gene activity. Although this change in the behavior of DNA is heritable, it is not well understood by the scientific and medical communities. To date it has been well established that some genes have methyl (chemical) groups attached to their DNA complex, while others do not. A change in methylation status may modify the behavior of a gene, and as a result it may modify the proteins that a gene is responsible for producing. These methylation changes have been linked to known cancer events.
Initially, we studied epigenomic changes in the model plant organism Arabidopsis (Lippman et al., 2004) using DNA microarray technology. Proof of concept for this approach was established analytically using statistical methods based on linear models that tested hypotheses of both methylation changes and histone modification changes between two Arabidopsis mutants known to be different in their heterochromatic structure. Although there are similarities (e.g., array, dye, treatment effects) in the statistical models that are used to test differential (gene) expression changes, testing for epigenomic modifications is quite different, and is a good example of when testing the incorrect hypothesis will lead to the wrong conclusion.
Based on the promising results from our studies in plants, we have also investigated epigenomic changes in humans (Ordway et al., 2006) as related to cancer. Using a unique microarray platform for cytosine methylation profiling, the DNA methylation landscape of the human genome was monitored at numerous sites. Specific targets displayed cell line and tumor specific differential methylation when compared with normal brain samples, suggesting they may have utility as biomarkers. Both novel and well established statistical analyses have enabled epigenomic investigations by identifying loci associated with carcinogenesis.
References:
- Z. Lippman, A.-V. Gendrel, M. Black, M. Vaughn, N. Dedhia, W.R. McCombie, K. Lavine, V. Mittal, B. May, K. Kasschau, J.C. Carrington, R.W. Doerge, V. Colot, and R. Martienssen. 2004. Transposable elements mediate heterochromatin and epigenetic control. Nature. 430:471-476.
- J. M. Ordway, J.A. Bedell, R.W. Citek, A. Nunberg, A. Garrido, R. Kendall, J.R. Stevens, D. Cao, R.W. Doerge, Y. Korshunova, H. Holemon, J. D. McPherson, N. Lakey, J. Leon, R.A. Martienssen and J.A. Jeddeloh. 2006. Comprehensive DNA methylation profiling in a human cancer genome identifies novel epigenetic targets. Carcinogenesis. 27(12):2409-2423.
Wednesday, October 7, 2009, 04:30 PM in REC 315
Stephen J. Ruberg, PhD, Adjunct Professor of Statistics, Purdue University, and Senior Research Fellow, Eli Lilly & Company
Where’s Waldo: Finding the Right Patients for Drug Therapy
The genomics revolution is still in its infancy, and there is much to learn about how and why individual patients respond to different drug treatments in different ways. In addition to genetic mechanisms, there are many clinical markers (disease history, standard laboratory measures, etc.) as well as social/environmental factors (smoking habits, marital status, etc.) that can be used to help identify who may or may not respond to a particular treatment. This issue has considerable statistical complexity, and different approaches to the design and analysis of clinical trials may yield more interesting insights into the problem. Several novel approaches or applications of statistical methods will be discussed, and real-life examples will be used to demonstrate various aspects of tailoring.
Wednesday, October 14, 2009, 04:30 PM in REC 315
Assistant Professor Sergey Kirshner, Department of Statistics, Purdue University
Copulas for Learning from High-Dimensional Data
In many applications, data sets consist of multivariate real-valued observations (e.g., financial time series, atmospheric observations). Whether the task is to build a generative model for such data or to estimate the dependence between variables, one of the common problems is that the functional form of the dependence is not known. Copulas, multivariate distributions whose marginals are uniform on [0,1], provide a flexible framework for dealing with such problems.
In the seminar, I will give a brief overview of my research while focusing on several projects related to copulas. The applications range from separation of speech signals to modeling of rainfall time series to characterization of droughts.
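To make the definition of a copula concrete, here is a minimal sketch that samples from a bivariate Gaussian copula using only the Python standard library. The choice of the Gaussian copula, the correlation value 0.8, and the function names are assumptions made for illustration; they are not taken from the talk.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u, v) from a bivariate Gaussian copula with correlation rho.

    Each pair starts as a correlated standard normal vector (z1, z2);
    applying the normal CDF to each coordinate makes both marginals
    uniform on [0, 1] while keeping the dependence between them.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


# Hypothetical illustration values: rho = 0.8, 5000 draws.
pairs = sample_gaussian_copula(rho=0.8, n=5000)
mean_u = sum(u for u, _ in pairs) / len(pairs)
mean_v = sum(v for _, v in pairs) / len(pairs)
# Sample covariance of (u, v); positive when rho > 0.
cov_uv = sum((u - mean_u) * (v - mean_v) for u, v in pairs) / len(pairs)
```

Each coordinate on its own looks like a uniform sample on [0,1], so all of the dependence between the two variables is carried by the copula rather than by the marginals.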
Wednesday, October 21, 2009, 04:30 PM in REC 315
Assistant Professor Jian Zhang, Department of Statistics, Purdue University
An Introduction to Statistical Learning Theory
The goal of statistical learning theory is to study the statistical properties of learning algorithms. In particular, most results are in the form of generalization error bounds, from which consistency and convergence rates might be derived. I will first introduce the basic techniques and concepts and then show how to develop both data-independent and data-dependent generalization error bounds of learning algorithms. The results will also be extended to study regression problems. In the later part of the talk I will give a brief overview of my recent research on statistical machine learning, in particular some specialized learning problems and their applications.
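As a minimal illustration of what a data-independent generalization error bound looks like, consider a standard textbook case (not necessarily the one presented in the talk). The notation is an assumption made here: a finite hypothesis class $\mathcal{H}$, a loss taking values in $[0,1]$, true risk $R(h)$, and empirical risk $\hat{R}_n(h)$ computed on $n$ i.i.d. samples. Hoeffding's inequality combined with a union bound over $\mathcal{H}$ gives, with probability at least $1-\delta$:

```latex
\[
  R(h) \;\le\; \hat{R}_n(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2n}}
  \qquad \text{for all } h \in \mathcal{H}.
\]
```

The bound shrinks at rate $n^{-1/2}$ and depends on the data only through $\hat{R}_n(h)$; data-dependent bounds replace the $\ln|\mathcal{H}|$ term with a complexity measure estimated from the sample.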
Wednesday, November 4, 2009, 04:30 PM in REC 315
Assistant Professor Guang Cheng, Department of Statistics, Purdue University
On Semiparametric Inference
Semiparametric modelling is an excellent framework because of its flexibility to model some features parametrically without making assumptions about the others. The infinite-dimensional nuisance parameter in semiparametric models generally poses several challenges, both theoretical and methodological, for maximum likelihood inference on the parameter of interest. In this talk, to avoid those challenges, I will first discuss several different approaches to semiparametric inference, namely MCMC, the bootstrap, and numerical methods, along with their higher-order theoretical properties. Next I will focus on some topics of special interest, namely variable selection and isotonic regression, in semiparametric models. Related directions for future research will also be mentioned.
Wednesday, November 11, 2009, 04:30 PM in REC 315
Assistant Professor Mark Ward, Department of Statistics, Purdue University
Don't Miss The Forest For The Trees
We discuss why and how stochastic sequences are often changed into trees for applications and for theoretical analysis. In particular, we look at the relationships between patterns in sequences and subtrees. The methods for the analysis of trees and sequences usually include generating functions, singularity analysis, and several types of transforms. The analysis also usually requires a precise description of correlations (including autocorrelations) in sequences.
The speaker will also give a very brief overview of his other research interests, and he will briefly trace his path to his current position at Purdue.
Wednesday, November 18, 2009, 04:30 PM in REC 315
Assistant Professor Olga Vitek, Department of Statistics, Purdue University
An Introduction to Statistical Methods for Proteomic Profiling of Disease
The talk will introduce the general area of disease profiling with mass spectrometry-based proteomic experiments. Multiple steps of these investigations require statistical analysis and expertise. These include experimental design, identification and quantification of spectral features, protein quantification from multiple features, integration of spectral data with patients' clinical characteristics, and planning subsequent experiments.
I'll describe statistical approaches currently used with these experiments, and will give examples of research in these areas by students in my lab.
Associated reading:
O. Vitek. Getting started in computational mass spectrometry-based proteomics. PLoS Computational Biology, 5(5), 2009.
Wednesday, November 25, 2009
No Exploring Statistical Sciences Seminar
Wednesday, December 2, 2009, 04:30 PM in REC 315
Assistant Professor Bo Li, Department of Statistics, Purdue University
Statistics in Environmental and Atmospheric Sciences
Statistical methods have been widely used in environmental and atmospheric science problems. I will discuss four such examples:
- Calibrating NexRad data using rain gauge data
- Climate regionalization by clustering the temperature time series
- Paleoclimate Reconstruction using Bayesian hierarchical models
- Attenuation effect of measurement error in temperature reconstruction
Wednesday, December 9, 2009, 04:30 PM in REC 315
Assistant Professor Jose Figueroa-Lopez, Department of Statistics, Purdue University
Accurate asset price modeling and related statistical problems under microstructure noise