Chuanhai Liu

Written by: Allison Cummins, M.S. candidate in Statistics

Professor Chuanhai Liu enjoys working in many different areas of statistics, including the foundations of statistical inference, model building, and computational statistics.

Many statisticians agree that inference about the unknown mean µ of the Gaussian model N(µ, 1) with unit variance, based on a single observation X, is straightforward. The problem becomes more interesting and challenging when additional information is available, such as 0 ≤ µ ≤ 1 or µ ~ N(0, τ²) with τ² unknown and to be inferred. A careful look at these and related problems will be available soon in a technical report prepared by Professor Liu and his collaborators, Ph.D. candidate Duncan Leaf and Jun Hui. Professor Liu believes that a sound inferential theory is needed to resolve contemporary statistical problems that arise when working with massive data.
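To see why the constrained case is still tractable at this simple level: the N(µ, 1) likelihood in µ is unimodal at the observation X, so the MLE under 0 ≤ µ ≤ 1 is just X clipped to the interval. A minimal sketch (the function name is ours, for illustration):

```python
def constrained_mle(x, lo=0.0, hi=1.0):
    """MLE of mu in N(mu, 1) from a single observation x, subject to
    lo <= mu <= hi.  The unconstrained MLE is x itself, and the likelihood
    is unimodal in mu, so the constrained MLE simply clips x."""
    return min(max(x, lo), hi)

constrained_mle(1.7)   # -> 1.0 (observation above the interval)
constrained_mle(0.3)   # -> 0.3 (observation inside the interval)
constrained_mle(-0.5)  # -> 0.0 (observation below the interval)
```

The genuinely hard part, which the report addresses, is attaching a meaningful measure of uncertainty to such a constrained estimate.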

Since joining the Department of Statistics at Purdue in August 2005, Professor Liu has been collaborating with Professor Arthur P. Dempster at Harvard University on the Dempster-Shafer (DS) theory of belief functions, a successor to Fisher's fiducial inference. During the past three years, he and his associates at Purdue proposed the methods of Weak Belief (WB) and Maximal Belief (MB) for DS inference. The current versions of these methods are described in two technical reports: Dempster-Shafer Inference with Weak Beliefs by Liu and Ph.D. candidate Jianchun Zhang (2008), and Situation-Specific Inference using Dempster-Shafer Theory by Liu and Ph.D. candidates Ryan Martin and Jianchun Zhang (2008). Professor Liu and his collaborators are considering MB approaches to a number of classical problems, such as the nonparametric one-sample significance test, linear regression and variable selection, inference with multivariate normal models for continuous data and multinomial models for discrete data, and constrained statistical inference.
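For readers new to DS theory, its basic object is a mass function assigning belief to subsets of possibilities, and independent bodies of evidence are pooled with Dempster's rule of combination. The sketch below shows that classical rule only (not the WB or MB methods, which are developed in the reports above); the example mass functions are our own:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozensets of possibilities
    to masses) by Dempster's rule: multiply masses over pairs of focal
    elements, intersect the elements, and renormalize by 1 - K, where K is
    the total mass landing on empty (conflicting) intersections."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + wa * wb
        else:
            conflict += wa * wb
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}

# Two independent bodies of evidence about a state in {"low", "high"}.
m1 = {frozenset({"low"}): 0.6, frozenset({"low", "high"}): 0.4}
m2 = {frozenset({"high"}): 0.5, frozenset({"low", "high"}): 0.5}
m = dempster_combine(m1, m2)
# m assigns mass 3/7 to {"low"}, 2/7 to {"high"}, 2/7 to {"low", "high"}
```

Mass on non-singleton sets (here {"low", "high"}) is what distinguishes DS belief from an ordinary probability assignment: it represents evidence that does not discriminate between the possibilities.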

Professor Liu has a tremendous interest in statistical computation, including optimization and stochastic simulation, which are key to carrying out statistical inference. He and his collaborators developed two major extensions of the famous Expectation-Maximization (EM) algorithm, namely the ECME algorithm (Liu and Rubin, 1994) and the PX-EM algorithm (Liu, Rubin, and Wu, 1998). Additionally, Liu and his collaborator Scott Vander Wiel of Los Alamos National Laboratory (Liu and Vander Wiel, 2007) contributed to resolving two fundamental problems in Quasi-Newton algorithms. Their statistical treatment of the optimization problem led to a new Quasi-Newton algorithm, called Statistical Quasi-Newton, which outperforms the well-known BFGS algorithm.
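For context on the baseline being improved upon: a Quasi-Newton method maintains an approximation to the (inverse) Hessian, updated from successive gradient differences rather than second derivatives. The sketch below is textbook BFGS on a small quadratic, not the Statistical Quasi-Newton method itself:

```python
import numpy as np

def bfgs_quadratic(A, b, x0, iters=20):
    """Minimize f(x) = 0.5 x'Ax - b'x (A symmetric positive definite)
    with standard BFGS updates to the inverse-Hessian approximation H.
    An exact line search is available because f is quadratic."""
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                    # initial inverse-Hessian guess
    g = A @ x - b                    # gradient of the quadratic
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break                    # converged
        p = -H @ g                   # quasi-Newton search direction
        alpha = -(g @ p) / (p @ A @ p)   # exact line search step
        s = alpha * p                # step taken
        x_new = x + s
        g_new = A @ x_new - b
        y = g_new - g                # change in gradient
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # BFGS "least change" update of the inverse Hessian
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = bfgs_quadratic(A, b, np.zeros(2))   # approximately solves A x = b
```

The update is the "least change" to H consistent with the secant condition H y = s; the Liu and Vander Wiel paper revisits what "least change" should mean from a statistical point of view.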

EM algorithms use iterations of "imputed" complete data to find the Maximum Likelihood Estimate (MLE) of an unknown parameter in the sampling model for the observed data. When this optimization problem has a closed-form solution, the unknown parameter can be estimated directly; for models without a closed form, optimization is not as straightforward. For example, the MLE of the parameter µ in N(µ, 1) is the sample mean when the sample x1, ..., xn is fully observed. When some of the data points are not fully observed, e.g., when an observation is interval-censored, the MLE of µ does not have a closed-form solution. In this case, the EM algorithm can be used to iteratively "impute" the incomplete data and update the estimate of µ.
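To make this concrete, here is a minimal EM sketch for the mean of N(µ, 1) when some observations are right-censored, i.e., known only to exceed a threshold (the censoring scheme and function names are our illustration): the E-step replaces each censored value by its conditional expectation given the current µ, and the M-step averages the completed data.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def em_censored_mean(observed, censor_points, iters=100):
    """EM for mu in N(mu, 1): `observed` holds fully observed values;
    each c in `censor_points` is a threshold with the value known only
    to satisfy X > c (right-censored)."""
    mu = sum(observed) / len(observed)       # start from the observed mean
    n = len(observed) + len(censor_points)
    for _ in range(iters):
        # E-step: E[X | X > c, mu] = mu + phi(c - mu) / (1 - Phi(c - mu))
        imputed = [mu + phi(c - mu) / (1.0 - Phi(c - mu))
                   for c in censor_points]
        # M-step: MLE of mu from the "completed" sample is its mean
        mu = (sum(observed) + sum(imputed)) / n
    return mu

mu_hat = em_censored_mean([0.5, 1.2, -0.3], censor_points=[1.0])
```

Each iteration increases the observed-data likelihood, and the estimate settles at a fixed point; the censored observation pulls µ above the mean of the fully observed values, as it should.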

In each step the model is made simpler by recovering more missing data; however, convergence is slower when larger amounts of data are missing. The variations of the EM algorithm developed by Professor Liu and his collaborators aim to make the iterative process converge faster. After joining Purdue's Department of Statistics, Professor Liu continued to work in the EM area with one of his Ph.D. students, Yunxiao He. They extended (1) Liu's method (Liu, 1998) for computing the variance-covariance matrix associated with the MLE of the parameters, and (2) the ECME algorithm to dynamically construct ECME partitions for faster convergence.

Technical advances in Markov Chain Monte Carlo (MCMC) methods have made Bayesian methods more popular than ever. Given a sensible prior distribution for the unknown parameter in a sampling model for the observed data, Bayesian inference is carried out by first computing the posterior distribution of the unknown parameter given the observed data. Typically, inference amounts to looking at lower-dimensional spaces, specifically the marginal distributions, which are found by integration. Because calculating the integral is not always straightforward, the Monte Carlo method, which takes random draws from the distribution in question, is used to approximate the integral. Professor Liu's earlier work on MCMC focused on variations of the Gibbs sampler, which can be viewed as stochastic versions of variations of the EM algorithm. For example, motivated by the PX-EM algorithm, Professor Liu proposed the Alternating Subspace-Spanning Resampling algorithm (Liu, 2003) to accelerate MCMC simulation.
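As a small illustration of the Gibbs sampler (a toy example of our own, not the Alternating Subspace-Spanning Resampling algorithm), the sketch below draws from a bivariate normal by alternating between the two full conditional distributions, each of which is a simple univariate normal:

```python
import random

def gibbs_bivariate_normal(rho, n_draws, burn_in=500, seed=0):
    """Gibbs sampler for a bivariate normal with zero means, unit
    variances, and correlation rho.  The full conditionals are
    X | Y = y ~ N(rho*y, 1 - rho^2) and symmetrically for Y."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    draws = []
    for i in range(n_draws + burn_in):
        x = rng.gauss(rho * y, sd)   # draw X from its full conditional
        y = rng.gauss(rho * x, sd)   # draw Y from its full conditional
        if i >= burn_in:             # discard the warm-up draws
            draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
# The empirical means should be near 0 and the empirical E[XY] near 0.8.
```

The chain never needs the joint density in closed form, only the two conditionals; this is the feature that makes Gibbs sampling attractive for high-dimensional posteriors.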

Ideally, one would take independent draws from the distribution of interest; because this is often difficult, a Markov Chain Monte Carlo method instead simulates draws by walking through the parameter space, producing dependent points. Because these points are dependent, more draws are needed than if they were independent; however, both kinds of draws can be used to estimate the integral.
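The "walk" can be sketched with a textbook random-walk Metropolis sampler targeting a standard normal (function and parameter names are our own): propose a Gaussian step from the current point and accept it with probability min(1, π(x′)/π(x)), otherwise stay put. The resulting draws are dependent, yet their averages still estimate the target's moments.

```python
import math
import random

def metropolis_standard_normal(n_draws, step=1.0, seed=1):
    """Random-walk Metropolis chain targeting N(0, 1).  Each draw
    depends on the previous one: a rejected proposal repeats the
    current point, creating the correlation discussed above."""
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n_draws):
        prop = x + rng.gauss(0.0, step)
        # log of the acceptance ratio pi(prop)/pi(x) for the N(0,1) target
        if math.log(rng.random()) < 0.5 * (x * x - prop * prop):
            x = prop
        draws.append(x)
    return draws

draws = metropolis_standard_normal(50000)
# sum(draws)/len(draws) estimates E[X] = 0;
# sum(x*x for x in draws)/len(draws) estimates E[X^2] = 1.
```

Only density ratios appear in the acceptance step, so the target's normalizing constant is never needed, which is exactly why such chains suit posterior distributions known only up to proportionality.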

Recently, Professor Liu has been working with Ph.D. candidate Andrew Lewandowski to investigate MCMC methods. When independent draws are taken from a distribution, one can analyze how accurate the approximation is and can easily compute the expected value and variance. When dependent draws are taken, the variance is more difficult to compute because the draws are correlated. Currently, most algorithms look only at the last iteration to find the next observation and fail to use the past sequence of iterations. Professor Liu and Andrew Lewandowski are working on a faster, more efficient, and more reliable approximation to the integration. They are striving to use previously collected data points to obtain a better observation in the next iteration of the algorithm, and to create an i.i.d. sample from the target distribution at the point where the algorithm converges.
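One standard way to quantify what correlation costs is an effective sample size: the number of independent draws that would give the same variance for the chain's mean. The crude sketch below (our illustration of the standard diagnostic, not the method under development) truncates the autocorrelation sum once the correlation dies out:

```python
import random

def effective_sample_size(draws, max_lag=200):
    """Crude effective sample size n / (1 + 2 * sum of autocorrelations).
    The variance of the chain's mean is roughly the i.i.d. variance
    inflated by the factor n / ESS."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    tau = 1.0
    for lag in range(1, max_lag):
        acf = sum((draws[i] - mean) * (draws[i + lag] - mean)
                  for i in range(n - lag)) / (n * var)
        if acf < 0.05:          # truncate once correlation is negligible
            break
        tau += 2.0 * acf
    return n / tau

rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]
sticky = [x for x in iid[:1000] for _ in range(5)]  # each value repeated 5x
# effective_sample_size(iid) is close to 5000, while the "sticky" chain of
# the same length is worth only about a fifth as many independent draws.
```

An i.i.d. sample, as the research described above aims to produce, is simply the case where the effective and nominal sample sizes coincide.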

Professor Liu obtained his M.S. degree in Probability and Statistics from Wuhan University, China, and his M.A. and Ph.D. in Statistics from Harvard University. He joined Purdue University in the fall of 2005 after working for 10 years in the statistics and data mining research group at Bell Labs, Murray Hill, NJ. While continuing his research at Purdue, he has enjoyed the opportunity to teach statistics to students from different backgrounds. He currently teaches Computational Statistics (STAT 598D) and Theory of Linear Models and Analysis of Experimental Design (STAT 553). He has also formed an informal center for collaborative research on the foundations of statistical inference with his students, Prof. Glenn Shafer at Rutgers University, and Prof. Arthur P. Dempster at Harvard University. For more information about Professor Liu, please visit his homepage.

The Dempster-Shafer (DS) theory

  1. Dempster, A. P. (2008). The Dempster-Shafer calculus for statisticians. International Journal of Approximate Reasoning 48, 365-377. [Preprint]

  2. Edlefsen, P. T., Liu, C., and Dempster, A. P. (2008). Estimating the mass of the Higgs particle using Dempster-Shafer analysis.

Weak and Maximal Beliefs

  1. Liu, C. and Zhang, J. (2008). Dempster-Shafer Inference with Weak Beliefs. [Preprint]

  2. Liu, C., Martin, R., and Zhang, J. (2008). Situation-Specific Inference using Dempster-Shafer Theory. [Preprint]

Markov Chain Monte Carlo Methods

  1. Liang, F., Liu, C., and Carroll, R. J. (2007). Stochastic approximation in Monte Carlo computation. JASA, 102, 305-320.

  2. Lewandowski, A. and Liu, C. (2008). The sample Metropolis-Hastings algorithm. JSM 2008, Denver, CO.

Optimization Methods

  1. Liu, C. and Rubin, D. B. (1994). The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81, 633-648.

  2. Liu, C., Rubin, D. B., and Wu, Y. (1998). Parameter expansion to accelerate EM: the PX-EM algorithm. Biometrika 85, 755-770.

  3. Liu, C. and Vander Wiel, S. (2007). Statistical Quasi-Newton: a new look at least change. SIAM Journal on Optimization, 18, 1266-1285.

February 2009