Research Colloquia, Fall 2009
Schedule
Thursday, September 3, 2009, 04:30 PM in MATH 175
Professor Dabao Zhang, Department of Statistics, Purdue University
Penalized orthogonal-components regression for large p small n data
We propose a penalized orthogonal-components regression (POCRE) for large p small n data. Orthogonal components are sequentially constructed to maximize, upon standardization, their correlation to the response residuals. A new penalization framework, implemented via empirical Bayes thresholding, is presented to effectively identify sparse predictors of each component. POCRE is computationally efficient owing to its sequential construction of leading sparse principal components. In addition, such construction offers other properties such as grouping highly correlated predictors and allowing for collinear or nearly collinear predictors. With multivariate responses, POCRE can construct common components and thus build up latent-variable models for large p small n data.
This is joint work with Yanzhu Lin and Min Zhang.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, September 10, 2009, 04:30 PM in MATH 175
Professor Kiseop Lee, University of Louisville
A mathematical model for multi-name credit
We present a new mathematical model for multi-name credit that employs stochastic flocking. Flocking mechanisms have been used to model a variety of biological, sociological and physical aggregation phenomena. As a direct application of flocking mechanisms, we introduce a credit risk model based on community flocking for a credit worthiness index (CWI). Correlations between different credit worthiness indices are explained in terms of the interaction rate, as in the flocking system. Based on the flocking model for CWI, we provide a credit curve for individual names and a default time distribution. We study how to price credit derivatives such as a credit default swap (CDS) and a collateralized debt obligation (CDO) with the proposed model.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, September 17, 2009, 04:30 PM in MATH 175
Professor Bowei Xi, Department of Statistics, Purdue University
VoIP: Analysis and Modeling
Network engineering for quality-of-service (QoS) of Internet voice communication (VoIP) can benefit substantially from simulation study of VoIP packet traffic queueing, which requires accurate statistical models for packet arrivals. This article describes statistical analyses that result in validated models. Work began with a 48-hr collection of VoIP arrival times and headers of 1.315 billion packets from 332,018 calls on a link of the Global Crossing network. Modeling is based on comprehensive analysis of the marginal distributions and time dependencies of call-level properties (arrivals, durations, bit-rates, and transmission and silence intervals), and packet-level properties (timestamp accuracy, jitter, and 20-ms packet-counts). Two models result that generate packet-level traffic in one direction of a link. A semi-empirical model first generates a Poisson call-arrival process which provides the times of the first packets of the calls; the packet interarrival times for each call are those of a random sample from an empirical database of 277,540 semi-calls (packets in one direction of a call). Validation of the representativeness of the semi-calls is complex. A parametric model replaces the semi-call sampling by generation of modeled call durations, generation of modeled transmission and silence intervals, and inserting 20-ms packet arrivals within transmission intervals.
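As a rough illustration of two ingredients named in the abstract, the sketch below draws call start times from a Poisson process and places packets 20 ms apart inside a single transmission interval. It is not the speaker's model: the function names, the arrival rate, the time horizon, and the example interval are all hypothetical, and the fitted distributions for call durations and for transmission and silence intervals are not reproduced here.

```python
import random

PACKET_SPACING_MS = 20  # the 20-ms packet spacing named in the abstract


def poisson_call_starts(rate_per_s, horizon_s, rng):
    """Start times (seconds) of a homogeneous Poisson process on [0, horizon_s)."""
    starts = []
    t = rng.expovariate(rate_per_s)  # exponential gaps give Poisson arrivals
    while t < horizon_s:
        starts.append(t)
        t += rng.expovariate(rate_per_s)
    return starts


def packet_times_ms(start_ms, duration_ms):
    """Packet arrival times (ms) spaced 20 ms apart within one transmission interval."""
    return list(range(start_ms, start_ms + duration_ms, PACKET_SPACING_MS))


# Hypothetical parameters, chosen only for illustration.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
call_starts = poisson_call_starts(rate_per_s=2.0, horizon_s=10.0, rng=rng)
first_burst = packet_times_ms(start_ms=100, duration_ms=60)  # [100, 120, 140]
```

Working in integer milliseconds keeps the 20-ms grid exact and avoids floating-point drift in the packet times.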
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, September 24, 2009, 04:30 PM in MATH 175
Professor Peter Z. G. Qian, Department of Statistics, University of Wisconsin-Madison
Nested Space-Filling Designs
We introduce a new type of design, called nested space-filling design, for sequential integration and multi-fidelity computer modeling. Such designs are constructed by exploiting nested structures in permutations, orthogonal arrays and difference matrices. The constructed designs are also useful for solving stochastic optimization problems, including stochastic programs, the Monte Carlo EM algorithm and chance-constraint problems.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, October 1, 2009, 04:30 PM in MATH 175
Professor S V N Vishwanathan, Department of Statistics, Purdue University
Boosting from an Optimization Perspective: Upper and Lower Bounds
Boosting has become a well known ensemble method. The algorithm maintains a distribution on the binary labeled examples and a new base learner is added in a greedy fashion. The goal is to obtain a small linear combination of base learners that clearly separates the examples.
We focus on a recent view of Boosting where the update algorithm for distribution on the examples is characterized by a minimization problem that uses a relative entropy as a regularization. In particular, we concentrate on algorithms that provably maximize the soft margin when the data is noisy.
By borrowing results from optimization we show how one can prove upper and lower bounds on the number of iterations needed by these modern algorithms. We will also show how to solve large scale problems based on state of the art optimization techniques.
Joint work with Manfred K Warmuth.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, October 8, 2009, 04:30 PM in MATH 175
Karen Kafadar, Rudy Professor of Statistics in the College of Arts and Sciences, Indiana University, Bloomington
Measuring the effect of length biased sampling in randomized cancer screening trials
(with Philip C. Prorok, National Cancer Institute)
Length biased sampling (LBS) arises when items are sampled in proportion to their values on a random variable of interest. For example, older units may be more likely to be sampled simply because they have been in service for a longer period of time. The effect of this sampling bias on the mean is well known when the length-biased-sampled random variable, say X, is observable.
A more difficult situation arises when X is not observed, but the outcome of another random variable, say Y, is observed and is known to be correlated with X. This context arises in assessing the value of a cancer screening program: cases are identified from the screening during the preclinical phase, the duration of which is likely to be positively correlated with the clinical duration of the disease. Longer preclinical durations are more likely to be screen-detected than shorter ones, but may also have better prognosis, irrespective of screening. This situation arises with any periodic routine inspection program. We demonstrate theoretical implications and illustrate practical consequences of the LBS effect.
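For the observable case mentioned above, the standard result is that sampling X with probability proportional to its value shifts the mean from E[X] to E[X^2]/E[X], which equals mu + sigma^2/mu. The sketch below checks this on a small discrete distribution using exact fractions. The lifetimes and the function name are hypothetical and serve only as an illustration; the unobserved-X screening setting of the talk is not modeled.

```python
from fractions import Fraction


def length_biased_mean(values, probs):
    """Mean of X when value x is drawn with probability proportional to x * P(X = x)."""
    mean = sum(Fraction(x) * p for x, p in zip(values, probs))
    second_moment = sum(Fraction(x) ** 2 * p for x, p in zip(values, probs))
    return second_moment / mean  # E[X^2] / E[X]


# Hypothetical service lifetimes (years), equally likely in the population.
values = [1, 2, 3]
probs = [Fraction(1, 3)] * 3

population_mean = sum(Fraction(x) * p for x, p in zip(values, probs))
variance = sum((Fraction(x) - population_mean) ** 2 * p for x, p in zip(values, probs))
biased_mean = length_biased_mean(values, probs)

print(population_mean)  # 2
print(biased_mean)      # 7/3
```

Here the length-biased mean 7/3 exceeds the population mean 2 by sigma^2/mu = (2/3)/2 = 1/3, so longer-lived units are over-represented in the sample.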
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, October 15, 2009, 04:30 PM in MATH 175
Juan Carlos Escanciano, Department of Economics, Indiana University, Bloomington
ASYMPTOTIC DISTRIBUTION-FREE DIAGNOSTIC TESTS FOR HETEROSKEDASTIC TIME SERIES MODELS
This article investigates model checks for a class of possibly nonlinear heteroskedastic time series models, including but not restricted to ARMA-GARCH models. We propose omnibus tests based on functionals of certain weighted standardized residual empirical processes. The new tests are asymptotically distribution-free, suitable when the conditioning set is infinite-dimensional, and consistent against a class of Pitman's local alternatives converging at the parametric rate n^{-1/2}, with n the sample size. A Monte Carlo study shows that the simulated level of the proposed tests is close to the asymptotic level already for moderate sample sizes and that tests have a satisfactory power performance. Finally, we illustrate our methodology with an application to the well-known S&P 500 daily stock index. The paper also contains an asymptotic uniform expansion for weighted residual empirical processes when initial conditions are considered, a result of independent interest.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, October 22, 2009, 04:30 PM at UIUC
Professor Guang Cheng, Department of Statistics, Purdue University
Bootstrap Consistency for General Semiparametric M-estimation
Consider M-estimation in a semiparametric model that is characterized by a Euclidean parameter of interest and a nuisance function parameter. The bootstrap is a widely used resampling method applied to draw inferences in the context of semiparametric M-estimation. We show that, under general conditions, the bootstrap is asymptotically consistent in estimating the distribution of the M-estimate of the Euclidean parameter; that is, the bootstrap distribution asymptotically imitates the distribution of the M-estimate. We also show that the bootstrap confidence set has the asymptotically correct coverage probability. These general conclusions hold, in particular, when the nuisance parameter is not estimable at root-n rate. Our results provide a theoretical justification for the use of the bootstrap as an inference tool in semiparametric modelling and apply to a broad class of bootstrap methods with exchangeable bootstrap weights. In this paper, we will also apply this general theory to several popular semiparametric models, e.g. the Cox regression model with survival data.
Thursday, October 29, 2009, 04:30 PM in MATH 175
Doug Nychka, Director of the Institute for Mathematics Applied to Geosciences, National Center for Atmospheric Research (NCAR), Boulder, Colorado
The World of Large Spatial Data Sets
Many statistical problems, such as analyzing the Earth's climate or determining the influence of geographic location on health, depend on large spatial data sets. These problems typically break standard and exact methods for spatial statistics, and approximate approaches are needed. This talk will give some examples based on a suite of regional climate simulations and dense observational data sets to motivate new spatial statistics. These include some techniques for introducing sparsity into covariance models, approximate covariance models, and the use of conditional simulation for inference.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, November 5, 2009, 04:30 PM in MATH 175
Glenn Shafer, Board of Governors Professor at the Rutgers Business School–Newark and New Brunswick and Professor in the Computer Learning Research Centre, Royal Holloway College, University of London
Three Betting Interpretations of Probability
There are three important ways of interpreting the betting odds given by probabilities:
- The classical interpretation: they are the correct odds.
- The subjective interpretation: they are your odds.
- The Ville interpretation: a betting strategy that uses them will not multiply the capital risked by a large factor.
The Ville interpretation clarifies the domain of application of Bayesian conditioning and extends to Walley's upper and lower probabilities and to Dempster-Shafer belief functions.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, November 12, 2009, 04:30 PM in MATH 175
David Hunter, Department of Statistics, Pennsylvania State University
Estimation for Nonparametric Mixture Models
We present an algorithm for estimation in finite mixture models where the observations are multivariate with conditionally independent coordinates but their distributions are otherwise completely unspecified. This algorithm is an extension and modification of the recent EM-like algorithm of Bordes, Chauveau, and Vandekerkhove (2007) for univariate mixture models with symmetric components, which we will also discuss. Unlike in the univariate case, the multivariate algorithm does not necessarily require an assumption on the component density functions for the model parameters to be identifiable. We explain what is known about identifiability, show why our algorithm is more appealing than other algorithms for this problem, and discuss some remaining open questions.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, November 19, 2009, 04:30 PM in MATH 175
Professor Lingsong Zhang, Department of Statistics, Purdue University
Sparse Distance Weighted Discrimination Method
Distance Weighted Discrimination (DWD) has recently been proposed as an impressive classification method (Marron et al., 2007), and has been shown to outperform the Support Vector Machine (SVM). In this paper, we first show Fisher consistency of the DWD method, which justifies its use when there are sufficient data. However, the DWD classifier is not sparse, which makes its interpretation and prediction performance less attractive. We propose a sparse DWD method, which incorporates variable selection techniques in classification using penalized loss functions to estimate the true hyperplane. We show that when an appropriate penalty is used, the sparse DWD method is consistent and the estimated normal vector of the separating hyperplane has the oracle property under suitable conditions. We evaluate the finite sample performance of the proposed methods using simulations and illustrate the methods with an application to the Faroe island proteomic biomarker data.
This is joint work with Dr. Xihong Lin.
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, November 26, 2009
No Research Colloquium
Thanksgiving Break
Thursday, December 3, 2009, 04:30 PM in MATH 175
Andreas Argyriou, Research Assistant Professor, Toyota Technological Institute at Chicago
Multi-Task Learning and Matrix Regularization
Multi-task learning extends the standard paradigm of supervised learning. In multi-task learning, samples for multiple related tasks are given and the goal is to learn a function for each task and also to generalize well (transfer learned knowledge) on new tasks. The applications of this paradigm are numerous and range from computer vision to collaborative filtering to matrix completion to bioinformatics, while it also relates to vector valued problems, multiclass, multiview learning etc. I will present a framework for multi-task learning which is based on learning a common kernel for all tasks. I will also show how this formulation connects to the trace norm and group Lasso approaches. Moreover, the proposed optimization problem can be solved using an alternating minimization algorithm which is simple and efficient. It can also be "kernelized" and I will present a necessary and sufficient condition for such a kernelization (as well as for the classical representer theorem).
Refreshments will be served at 4:00 PM in HAAS 111.
Thursday, December 10, 2009, 04:30 PM in MATH 175
Professor Yong Bao, Department of Economics, Purdue University
Refreshments will be served at 4:00 PM in HAAS 111.