Plenary Speakers - Department of Statistics - Purdue University

Plenary Speakers | Jun 20-23


June 20th
(Wednesday)
6:00-7:00 p.m.

Fowler Hall, Stewart Center

Michael Newton
(University of Wisconsin, Madison)

Why Don't We Agree? Studying Influenza With RNA Interference
[PDF Slides]

Abstract

Recent genome-wide RNAi studies of influenza exhibit very little agreement in terms of their reported lists of genes whose inactivation changes a cell's ability to reproduce virus. I will report on an investigation of the factors that affect among-study agreement, and will focus on results of a model-based statistical analysis that measures the relative contributions of false positive factors (e.g. off-target effects) and false negative factors (e.g. inaccessibility). I will also discuss evidence from gene set analysis and the analysis of protein-protein interactions.

June 21st
(Thursday)
6:00-7:00 p.m.

Fowler Hall, Stewart Center

Jim Berger
(Duke University)

Reproducibility of Science: P-values and Multiplicity
[PDF Slides]

Abstract

Published scientific findings seem to be increasingly failing efforts at replication. This is undoubtedly due to many sources, including specifics of individual scientific cultures and overall scientific biases such as publication bias. While these will be briefly discussed, the talk will focus on the now near-ubiquitous use of p-values and the failure to properly take multiplicities into account as two likely factors causing the lack of reproducibility. Bayesian approaches to both testing and multiplicity will be highlighted as possible general solutions to the problem.
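
As a toy illustration of the multiplicity point (a sketch added for this page, not material from the talk), testing many true null hypotheses at the conventional 0.05 level manufactures spurious "findings" at a predictable rate, while even a crude familywise correction such as Bonferroni suppresses them:

    # Simulate 1,000 two-sample t-tests in which every null hypothesis is true.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_tests, n_per_group, alpha = 1000, 30, 0.05
    x = rng.normal(size=(n_tests, n_per_group))
    y = rng.normal(size=(n_tests, n_per_group))
    pvals = stats.ttest_ind(x, y, axis=1).pvalue

    # Without adjustment, about alpha * n_tests = 50 tests reject by chance.
    print("unadjusted rejections:", int((pvals < alpha).sum()))
    # Bonferroni controls the familywise error rate; typically none reject.
    print("Bonferroni rejections:", int((pvals < alpha / n_tests).sum()))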

June 22nd
(Friday)
12:15-1:15 p.m.

STEW 206

John Lafferty
(University of Chicago)

Variations on Nonparametric Additive Modeling: Computational and Statistical Aspects
[PDF Slides]

Abstract

Research over the last several years on sparse linear models has led to rich statistical theory and new computational methods. We have been studying extensions to nonparametric additive models for high-dimensional data. For regression, graphical modeling, reduced-rank regression, and canonical correlation analysis, variants of nonparametric additive models have been developed that scale to fairly high dimensions, with little loss in statistical or computational efficiency. In many cases the one-dimensional nonparametric rates of convergence are achieved, with even faster rates obtainable for some structure estimation problems. In terms of computation, convex optimization can sometimes be used, but often does not scale well. Infinite-dimensional optimization with variants of backfitting yields the most flexible and scalable algorithms. We give an overview of these developments, together with some of the gaps in our current understanding.
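
As a rough sketch of the backfitting idea mentioned above (written for this page, with a naive Gaussian kernel smoother standing in for the smoothers used in this line of work), the algorithm cycles through the coordinates, smoothing the partial residuals of each component against its covariate until the fits stabilize:

    # Backfitting for an additive model y = f1(x1) + f2(x2) + noise.
    import numpy as np

    def kernel_smooth(x, r, bandwidth=0.2):
        # Nadaraya-Watson estimate of r given x, with a Gaussian kernel.
        w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / bandwidth) ** 2)
        return (w @ r) / w.sum(axis=1)

    rng = np.random.default_rng(0)
    n = 300
    X = rng.uniform(-1, 1, size=(n, 2))
    y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

    f = np.zeros((2, n))                 # component estimates at the data points
    for _ in range(20):                  # backfitting sweeps
        for j in range(2):
            partial = y - y.mean() - (f.sum(axis=0) - f[j])  # strip other components
            f[j] = kernel_smooth(X[:, j], partial)
            f[j] -= f[j].mean()          # center each component for identifiability

    fitted = y.mean() + f.sum(axis=0)
    print("residual std:", np.std(y - fitted))  # should approach the noise level, 0.1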

June 22nd
(Friday)
6:00-7:00 p.m.

Fowler Hall, Stewart Center

Maria Eulália Vares
(Universidade Federal do Rio de Janeiro)

Dependent Percolation: Some Examples and Multi-scale Tools
[PDF Slides]

Abstract

Percolation models with long-range dependence appear naturally in a variety of contexts that include statistical-mechanics spin systems and growth processes in random environment. In this lecture I plan to discuss some recent developments, with emphasis on the role of multi-scale tools in this study.

June 23rd
(Saturday)
12:15-1:15 p.m.

STEW 206

Peter Hall
(University of Melbourne, Australia)

Distribution Approximation, Roth's Theorem, and Looking for Insects in Shipping Containers
[PDF Slides]

Abstract

Methods for distribution approximation, including the bootstrap, do not perform well when applied to lattice-valued data. For example, the inherent discreteness of lattice distributions confounds both the conventional normal approximation and the standard bootstrap when used to construct confidence intervals. However, in certain problems involving lattice-valued random variables, where more than one sample is involved, this difficulty can be overcome by ensuring that the ratios of sample sizes are quite irregular. For example, at least one of the ratios of sample sizes could be a reasonably good rational approximation to an irrational number. Results from number theory, in particular Roth's theorem (which applies to irrational numbers that are the roots of polynomials with rational coefficients), can be used to demonstrate theoretically the advantages of this approach. This project was motivated by a problem in risk analysis involving quarantine searches of shipping containers for insects and other environmental hazards, where confidence intervals for the sum of two binomial proportions are required.
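
A crude coverage experiment (an illustration added for this page, not the speaker's analysis) conveys the flavor of the problem: with equal sample sizes the estimate of p1 + p2 is confined to the lattice {k/n}, and the coverage of the usual normal-approximation interval tends to be more erratic than when the sample-size ratio is irregular (here 71/50, a decent rational approximation to sqrt(2)):

    # Coverage of the Wald interval for p1 + p2 from two binomial samples.
    import numpy as np

    rng = np.random.default_rng(0)
    p1, p2, reps, z = 0.3, 0.4, 200_000, 1.96

    def coverage(n1, n2):
        ph1 = rng.binomial(n1, p1, reps) / n1
        ph2 = rng.binomial(n2, p2, reps) / n2
        se = np.sqrt(ph1 * (1 - ph1) / n1 + ph2 * (1 - ph2) / n2)
        return np.mean(np.abs(ph1 + ph2 - (p1 + p2)) <= z * se)

    print("equal sizes (n1 = n2 = 50):    ", coverage(50, 50))
    print("irregular ratio (n1=50, n2=71):", coverage(50, 71))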

June 23rd
(Saturday)
6:00-7:00 p.m.

Fowler Hall, Stewart Center

Larry Brown
(University of Pennsylvania)

Variable Selection Insurance
[PDF Slides]

Abstract

Among statisticians, variable selection is a common and very dangerous activity. This talk will survey the dangers and then propose two forms of insurance to guard against the damage from this activity.

Conventional statistical inference requires that a specific model of how the data were generated be specified before the data are examined and analyzed. Yet it is common in applications for a variety of variable selection procedures to be undertaken to determine a preferred model, followed by statistical tests and confidence intervals computed for this “final” model. Such practices are typically misguided. The parameters being estimated depend on this final model, and post-model-selection sampling distributions may have unexpected properties that are very different from what is conventionally assumed. Confidence intervals and statistical tests do not perform as they should.

We address this dilemma within a standard linear-model framework. There is a numerical response of interest (Y) and a suite of possible explanatory variables, X1,…,Xp, to be used in a multiple linear regression. The data are gathered, a multivariate linear model is constructed using a selected subset of the potential X variables, and inference (estimates, confidence intervals, tests) is performed for the selected slope parameters.

We propose two types of insurance to guard against the deleterious effects of this type of variable selection. The first provides valid confidence intervals and tests based on the design matrix of the observed variables. It does not require adherence to a pre-specified variable selection algorithm. This insurance may involve overly conservative procedures; on the other hand, no less conservative procedure of this type will provide the desired guarantee. The second type of insurance is purchased through use of a properly specified split-sample bootstrap. These intervals may be less conservative, but are not always so, and part of their price lies in the split-sample scheme, which effectively sacrifices a portion of the data.

This is joint work with R. Berk, A. Buja, E. George, E. Pitkin, M. Traskin, K. Zhang and L. Zhao.
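
In the spirit of the second type of insurance, here is a minimal sample-splitting sketch (written for this page; the authors' actual proposal is a properly specified split-sample bootstrap, which this does not reproduce). Variables are selected on one half of the data and intervals are computed on the held-out half, so that selection and inference use independent samples:

    # Split-sample post-selection inference: select on half 1, infer on half 2.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)

    half = n // 2
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]

    # Selection half: keep the three predictors most correlated with y.
    score = np.abs(X1.T @ (y1 - y1.mean()))
    selected = np.argsort(score)[-3:]

    # Inference half: ordinary least squares on the selected columns,
    # with t-based 95% confidence intervals for the selected slopes.
    Z = np.column_stack([np.ones(n - half), X2[:, selected]])
    beta = np.linalg.lstsq(Z, y2, rcond=None)[0]
    resid = y2 - Z @ beta
    df = Z.shape[0] - Z.shape[1]
    se = np.sqrt((resid @ resid / df) * np.diag(np.linalg.inv(Z.T @ Z)))
    tcrit = stats.t.ppf(0.975, df)
    for j, k in enumerate(selected, start=1):
        print(f"X{k}: {beta[j]:+.2f} +/- {tcrit * se[j]:.2f}")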

June 23rd
(Saturday)
7:00-10:00 p.m. (Banquet)

South Ballroom, PMU

Sastry Pantula
(Director, Division of Mathematical Sciences, National Science Foundation)

Celebrating Unity in Diversity
