Research Profile

Dabao Zhang

My current research mainly focuses on (1) developing supervised dimension reduction methods that help explore and visualize high-dimensional data; (2) building directed graphical models based on structural equations; and (3) defining R² for models beyond (homoscedastic) linear regression models. Although I am interested in addressing statistical issues in general data science, most of my current research is motivated by analyzing data from whole-genome/sequencing-based association studies, whole-genome/sequencing-based animal/plant selection, eQTL mapping, and gene-gene/gene-environment interaction studies.

Statistical Methodologies

Bayesian Analysis, Empirical Likelihood Approach, Exploratory Data Analysis, Graphical Models, Multivariate Extreme Values, Multivariate Statistics, Supervised Dimension Reduction, Variable Selection for Large p Small n Data

1.    M. Ren and D. Zhang (2018). Differential Analysis of Directed Networks. Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence (UAI).

2.    C. Chen, M. Ren, M. Zhang and D. Zhang (2017). Two-stage penalized least squares method for constructing large systems of structural equations. Accepted by the Journal of Machine Learning Research. arXiv:1511.00370. R Package: BigSEM.

3.    D. Zhang (2017). A coefficient of determination for generalized linear models. The American Statistician, 71(4): 310-316. R Package: rsq. SAS Macro: RsquareV.

4.    V. Pungpapong, M. Zhang and D. Zhang (2015). Selecting massive variables using an iterative conditional modes/medians algorithm. Electronic Journal of Statistics, 9: 1243-1266.

5.    Y. Lin, M. Zhang and D. Zhang (2015). Generalized orthogonal components regression for high dimensional generalized linear models. Computational Statistics & Data Analysis, 88: 119-127.

6.    M. T. Wells and D. Zhang (2011). Graphical models for clustered binary and continuous responses. In Advances in Directional and Linear Statistics (edited by M.T. Wells and A. SenGupta), 305-321, Springer-Verlag Berlin Heidelberg.

7.    N.-H. Chan, L. Peng and D. Zhang (2010). Empirical-likelihood-based confidence intervals for conditional variance in heteroscedastic regression models. Econometric Theory, 27: 1-24.

8.    M. Zhang, D. Zhang and M.T. Wells (2010). Generalized thresholding estimators for high-dimensional location parameters. Statistica Sinica, 20: 911-926.

9.    D. Zhang, Y. Lin and M. Zhang (2009). Penalized orthogonal-components regression for large p small n data. Electronic Journal of Statistics, 3: 781-796. R Package: POCRE.

10.  D. Zhang, M. T. Wells and L. Peng (2008). Nonparametric estimation of the dependence function for a multivariate extreme value distribution. Journal of Multivariate Analysis, 99: 577-588.

11.  D. Zhang, M. T. Wells, B. W. Turnbull, D. Sparrow and P. A. Cassano (2005). Hierarchical Graphical Models: An Application to Pulmonary Function and Cholesterol Levels in the Normative Aging Study. Journal of the American Statistical Association, 100: 719-727.

12.  D. Zhang, S. He and Z. Xie (1993). Outlier Detection and Intervention for ARIMA(p,d,0). Proceedings of the First Asian Conference on Statistical Computation.


Statistical Genetics and Bioinformatics

Analysis of Gene Expression Data, Analysis of Mass Spectrometry Data, Comparative Proteomics/Metabolomics Study, Quantitative Trait Loci Mapping, Whole-Genome/Sequencing-Based Association Study

1.    L. Guan, Q. Wang, L. Wang, B. Wu, Y. Chen, F. Liu, F. Ye, T. Zhang, K. Li, B. Yan, C. Lu, L. Su, G. Jin, H. Wang, H. Tian, L. Wang, Z. Chen, Y. Wang, J. Chen, Y. Yuan, W. Cong, J. Zheng, J. Wang, X. Xu, H. Liu, W. Xiao, C. Han, Y. Zhang, F. Jia, X. Qiao, Genetic REsearch on schizophreniA neTwork-China and Netherland (GREAT-CN), D. Zhang, M. Zhang, H. Ma (2016). Common Variants on 17q25 and Gene-Gene Interactions Conferring risk of Schizophrenia in Han Chinese Population and Regulating Gene Expression in Human Brain. Molecular Psychiatry, 1-7.

2.    C. Chen, L. Deng, S. Wei, G. A. N. Gowda, H. Gu, G. Chiorean, M. Zaid, M. Harrison, J. Pekny, P. Loehrer, D. Zhang, M. Zhang, D. Raftery (2015). Exploring Metabolic Profile Differences between Colorectal Polyp Patients and Controls Using Seemingly Unrelated Regression. Journal of Proteome Research, 14: 2492-2499.

3.    H. T. Zhang, D. Zhang, Z. G. Zha, C. D. Hu (2014). Transcriptional activation of PRMT5 by NF-Y is required for cell growth and negatively regulated by the PKC/c-Fos signaling in prostate cancer cells. BBA - Gene Regulatory Mechanisms, 1839: 1330-1340.

4.    H. Li, Y. J. Wang, L. Hua, Y. T. Yang, M. Zhang, D. Zhang, C. Y. Wang, and Z. Q. Xu (2013). Lack of association between dendritic cell nuclear protein-1 gene and major depressive disorder in the Han Chinese population. Progress in Neuro-Psychopharmacology & Biological Psychiatry, 45: 7-10.

5.    V. Pungpapong, W. M. Muir, X. Li, D. Zhang, and M. Zhang (2012). A fast and efficient approach for genomic selection with high density markers. G3: Genes, Genomes, Genetics, 2: 1179-1184.

6.    V. Pungpapong, L. Wang, Y. Lin, D. Zhang, and M. Zhang (2011). Genome-wide association analysis of GAW17 data using an empirical Bayes variable selection. BMC Proceedings, 5 (Suppl 9): S5.

7.    L. Wang, V. Pungpapong, Y. Lin, M. Zhang, and D. Zhang (2011). Genome-wide case-control study in GAW17 using coalesced rare variants. BMC Proceedings, 5 (Suppl 9): S110.

8.    X. Li, C. Zhu, Z. Lin, Y. Wu, D. Zhang, G. Bai, W. Song, J. Ma, G.J. Muehlbauer, M.J. Scanlon, M. Zhang, and J. Yu (2011). Chromosome size in diploid eukaryotic species centers on the average length with a conserved boundary. Molecular Biology and Evolution, 28: 1901-1911.

9.    D. Zhang (2010). Bayes and empirical Bayes methods for spotted microarray data analysis. In Bayesian Modeling in Bioinformatics (edited by Dey, Ghosh, and Mallick).

10.  Y. Lin, M. Zhang, L. Wang, V. Pungpapong, J.C. Fleet, and D. Zhang (2009). Simultaneous genome-wide association studies of anti-CCP in rheumatoid arthritis using penalized orthogonal-components regression. BMC Proceedings, 3 (Suppl 7): S20.

11.  M. Zhang, Y. Lin, L. Wang, V. Pungpapong, J.C. Fleet, and D. Zhang (2009). Case-control genome-wide association study of rheumatoid arthritis from GAW16 using POCRE-LDA. BMC Proceedings, 3 (Suppl 7): S17.

12.  N. Liu, D. Zhang, and H. Zhao (2009). Genotyping error detection in samples of unrelated individuals without replicate genotyping. Human Heredity, 67: 154-162 (DOI: 10.1159/000181153).

13.  D. Zhang, X. Huang, F.E. Regnier, and M. Zhang (2008). Two-dimensional correlation optimized warping algorithm for aligning GC×GC-MS data. Analytical Chemistry, 80 (8): 2664-2671.

14.  M. Zhang, D. Zhang and M. T. Wells (2008). Variable selection with large p small n regression models: mapping QTL with epistasis. BMC Bioinformatics, 9: 251.

15.  D. Zhang and M. Zhang (2007). Bayesian profiling of molecular signatures to predict event times. Theoretical Biology & Medical Modelling, 4: 3, doi:10.1186/1742-4682-4-3.

16.  D. Zhang, M. Zhang, and M. T. Wells (2006). Multiplicative Background Correction for Spotted Microarrays to Improve Reproducibility. Genetical Research, 87: 195-206.

17.  M. Zhang, K. L. Montooth, M. T. Wells, A. G. Clark and D. Zhang (2005). Mapping Multiple Quantitative Trait Loci by Bayesian Classification. Genetics, 169: 2305-2318.

18.  D. Zhang, M. T. Wells, C. D. Smart, and W. E. Fry (2005). Bayesian Normalization and Inference for Differential Gene Expression Data. Journal of Computational Biology, 12: 391-406.

19.  Complex Traits Consortium (2004). The Collaborative Cross: A Community Resource for the Genetic Analysis of Complex Traits. Nature Genetics, 36: 1133-1137.



Applied Statistics

Analysis of Diverse Biomedical Data

1.    J. E. Huber, M. Darling, E. J. Francis, and D. Zhang (2012). Impact of typical aging and Parkinson's disease on the relationship among breath pausing, syntax, and punctuation. American Journal of Speech-Language Pathology, 21: 368-379.

2.    T.R. Mhyre, R. Loy, P.N. Tariot, L.A. Profenno, K.A. Maguire-Zeiss, D. Zhang, P.D. Coleman and H.J. Federoff (2008). Proteomic analysis of peripheral leukocytes in Alzheimer's disease patients treated with divalproex sodium. Neurobiology of Aging, 29: 1631-1643.

3.    S. W. Perry, J. P. Norman, A. Litzburg, D. Zhang, S. Dewhurst and H. A. Gelbard (2005). HIV-1 Transactivator of Transcription Protein Induces Mitochondrial Hyperpolarization and Synaptic Stress Leading to Apoptosis. Journal of Immunology, 174: 4333-4344.

4.    M. Zhang, X. Wang, D. Zhang, G. Xu, H. Dong, Y. Yu and J. Han (2004). Orphanin FQ Antagonizes the Inhibition of Ca²⁺ Currents Induced by Mu-opioid Receptors. Journal of Molecular Neuroscience, 25: 21-27.


Statistical Packages

Most are developed in MATLAB. All copyrights are retained by Dabao Zhang unless stated otherwise. They are free to use for academic purposes with proper citation. Please contact me to report bugs or application issues.

        GOCRE: Implements the generalized orthogonal-component regression (GOCRE) algorithm proposed in Lin, Zhang and Zhang (2015).

        POCRE: Implements the penalized orthogonal-component regression (POCRE) algorithm proposed in Zhang, Lin and Zhang (2009), and also adds new functions for variable screening, tuning parameter selection, and plotting.

        2DCOW: Implements the two-dimensional correlation optimized warping algorithm proposed in Zhang, Huang, Regnier and Zhang (2008).

        MicroBayes: Implements the approach proposed in Zhang, Wells, Smart, and Fry (2005).

        GEBCauchy: Implements the generalized empirical Bayes thresholding with Cauchy priors proposed in Zhang, Zhang and Wells (2010).

        GEBLaplace: Implements the generalized empirical Bayes thresholding with Laplace priors, developed in a paper in preparation (see Zhang, Zhang and Wells, 2010, for generalized empirical Bayes thresholding).

        QTLBayes: Implements the Bayesian approach for QTL mapping proposed in Zhang, Montooth, Wells, Clark and Zhang (2005), which is extended in Zhang, Zhang and Wells (2008) and in another paper in preparation.

        SemMix: Implements the EM algorithm for mixed graphical models as described in Zhang, Wells, Turnbull, Sparrow and Cassano (2005).