Hao Zhang

Written by: Andrea Rau, Ph.D. candidate in Statistics

Spatial statistics is the branch of statistics that addresses the issues arising in the analysis of data with a spatial or temporal component, such as measurements collected at different locations. It is one of the most rapidly growing areas of statistics, with applications in many fields, including climatology, geophysics, geology, natural resources, agriculture, health sciences, economics, and marketing. Because spatial data are correlated and the data sets are often very large, working with them raises many interesting statistical problems. To address some of these problems, Professor Hao Zhang is researching innovative and efficient methodological and computational approaches to spatial data analysis.

Under a three-year National Science Foundation (NSF) grant ("Spatial and spatio-temporal processes: Asymptotics, misspecification and multivariate extension"), Professor Zhang is working to address the challenge of statistical inference for spatial data by developing computationally feasible and statistically efficient estimation procedures for both univariate and multivariate spatial data.

In particular, Professor Zhang's work involves developing appropriate infill (fixed-domain) asymptotic results for evaluating approximation methods for spatial data. Such results are useful because likelihood-based inference, whether Bayesian or by maximum likelihood estimation (MLE), requires the inverse of a large covariance matrix, which is hard or even impractical to compute when the sample size is extremely large. Professor Zhang expects his results to "make more accessible and feasible the analysis of huge spatial and spatial-temporal data to scientists in broader disciplines", which will better enable scientists to retrieve significant information from this type of data. He is currently developing methods for spatial interpolation that avoid operations on large covariance matrices but still yield nearly optimal linear unbiased prediction.
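To make the computational bottleneck concrete, the following minimal sketch performs ordinary textbook spatial interpolation (simple kriging with a known zero mean) by solving a linear system in the full n-by-n covariance matrix. It is not Professor Zhang's approximation method. The exponential covariance model, the parameter values, and the function names `exp_cov`, `cholesky`, `solve_spd`, and `simple_kriging` are all assumptions chosen for illustration, and plain Python is used only to keep the example self-contained.

```python
import math

def exp_cov(p, q, sigma2=1.0, range_=1.0):
    """Exponential covariance between two 2-D points (an assumed model)."""
    return sigma2 * math.exp(-math.dist(p, q) / range_)

def cholesky(a):
    """Return lower-triangular L with L L^T = a; costs O(n^3) operations."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_spd(a, b):
    """Solve a x = b for a symmetric positive definite matrix a."""
    n = len(b)
    low = cholesky(a)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = z
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def simple_kriging(coords, y, target, sigma2=1.0, range_=1.0):
    """Zero-mean simple kriging prediction k0^T K^{-1} y at `target`."""
    n = len(coords)
    big_k = [[exp_cov(coords[i], coords[j], sigma2, range_) for j in range(n)]
             for i in range(n)]
    k_inv_y = solve_spd(big_k, y)  # the step that scales as O(n^3)
    k0 = [exp_cov(target, c, sigma2, range_) for c in coords]
    return sum(k * w for k, w in zip(k0, k_inv_y))

# Hypothetical station coordinates and centered measurements.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 0.5, -1.0]
prediction = simple_kriging(stations, values, (0.5, 0.5))
```

With four stations the solve is instantaneous, but the factorization cost grows with the cube of the number of observations and the matrix storage with its square, which is why exact computation becomes impractical for very large spatial data sets.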

Professor Zhang eventually hopes to apply this methodology to real-world problems, such as his work with the Public Agricultural Weather System (PAWS) in Washington state. The PAWS project was the first true real-time weather station network in the nation, and it involves 60 weather stations throughout the state. Each station records 33 variables every hour, including solar radiation, humidity, wind speed, air temperature, and soil temperature. Analyzing the resulting spatial-temporal data is very complicated because of both the sheer size of the data and its correlation structure. The data are spatially auto-correlated (the relationship between one variable measured at two different stations), cross-spatially correlated (the relationship between two different variables at two different stations), linearly correlated (the relationship between two different variables at the same station), and temporally correlated (the relationship between measurements at different time points). As such, this project provides "desirable multivariate space-time data to employ the multivariate spatio-temporal models and the approximate inferences" that Professor Zhang is currently developing.
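The four kinds of correlation described above can be illustrated with a small sketch that computes a sample Pearson correlation for each case. The station labels, variable names, and readings below are made up for illustration and are not PAWS data. The helper names `pearson` and `lag_corr` are likewise assumptions, and the lagged-pair correlation is only one simple way to summarize temporal dependence.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag_corr(x, lag=1):
    """Pearson correlation between a series and itself shifted by `lag`."""
    return pearson(x[:-lag], x[lag:])

# Hypothetical hourly readings: series[station][variable] -> list of values.
series = {
    "A": {"air_temp": [10.0, 12.0, 15.0, 14.0, 11.0, 9.0],
          "soil_temp": [8.0, 8.5, 10.0, 10.5, 9.5, 8.5]},
    "B": {"air_temp": [11.0, 13.5, 16.0, 14.5, 12.0, 10.5],
          "soil_temp": [9.0, 9.5, 11.0, 11.5, 10.0, 9.5]},
}

# Same variable at two different stations (spatial auto-correlation).
spatial_auto = pearson(series["A"]["air_temp"], series["B"]["air_temp"])
# Two different variables at two different stations (cross-spatial).
cross_spatial = pearson(series["A"]["air_temp"], series["B"]["soil_temp"])
# Two different variables at the same station.
same_station = pearson(series["A"]["air_temp"], series["A"]["soil_temp"])
# One variable at one station, one hour apart (temporal).
temporal = lag_corr(series["A"]["air_temp"], lag=1)
```

A full multivariate spatio-temporal model has to account for all of these dependencies jointly rather than one pair at a time, which is part of what makes the analysis difficult.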

Professor Zhang joined the Department of Statistics in the fall of 2007. His research interests include spatial generalized linear mixed models, infill asymptotics, multivariate spatial statistics, and applications of spatial statistics to the environmental, agricultural, and natural resource sciences. He has taken part in many collaborative efforts with agricultural scientists on projects ranging from crop insurance to plant pathology. He is currently developing collaborative relationships with faculty from multiple units at Purdue, including the Department of Forestry and Natural Resources, the Department of Agricultural Economics, the Center for the Environment, and the Purdue Climate Change Research Center. For more information on Professor Zhang, please visit his homepage.

Statistical Geospatial Modeling Enables Precision Agriculture

Clockwise: (a) Weather stations across WA. The area inside the green rectangle is the study area in this work. (b) Scatter plot of soil temperature versus air temperature. (c) Contour map of interpolated air temperature. (d) Contour map of interpolated soil temperature.