Olga Vitek

Written by: Andrea Rau, Ph.D. candidate in Statistics

Pictured: Olga Vitek, Melissa Key, Zuoyi Zhang, and Cheng Zheng

Understanding the mechanisms at work within a living organism is a complicated task. To tackle it, many scientists have focused on the genetic makeup of an organism. However, to gain a full understanding of these complex processes, it is equally important to characterize all the molecules present in the organism, and in particular the proteins expressed by its genes. Proteomics refers to the study of the structures and functions of the proteins produced by an organism during its life.

In recent years, many technological advances have been made in the area of proteomics, and the field of mass spectrometry-based proteomics has gained great visibility. However, statistical analysis of the massive amount of data generated by proteomic experiments is currently a bottleneck to many projects. Although many chemists and computer scientists work with proteomic data, few statisticians have become involved in the field, and there is still a great need for capable statisticians to resolve the existing problems. To this end, Professor Olga Vitek, with her graduate students Melissa Key, Zuoyi Zhang, and Cheng Zheng, has focused her research on resolving some of the challenging statistical issues that are present in this promising field.

Professor Vitek and her students work in close collaboration with Dr. Susanne Ragg at the Indiana University School of Medicine, under a pilot grant for IUSM/PU Collaboration in Biomedical Research, to apply clinical proteomics to the study of cardiovascular disease in people. A genomic approach to such an analysis would require a tissue sample from each patient, which cannot be obtained for a condition like heart disease. In contrast, the proteomic approach allows researchers to take a drop of blood from each patient repeatedly and to monitor disease progression continuously, without waiting for major manifestations of a problem. The ultimate goal of such research is "personalized medicine", where treatment could be targeted specifically to each patient.

In proteomic studies such as this, researchers wish to address two primary questions: the identification of expressed proteins and the quantification of their abundance. Because proteins are not always abundant, it is particularly difficult to determine the identity and the amount of low-abundance proteins. Statistical methods that control for multiple comparisons and allow sensitive detection of changes in protein abundance have helped resolve some of these issues, but a great deal of work remains to be done. In addition, partitioning the patients into smaller and more homogeneous groups is often problematic, because many patient characteristics, in addition to molecular profiles, affect the response to treatment. As such, statisticians must carefully consider how best to stratify patients. For more information about the critical issues involved in the statistical analysis of data generated in mass spectrometry-based proteomics experiments, please see Nature Methods 4, 787-797 (2007), published online 27 September 2007, doi:10.1038/nmeth1088.

Figure 2 from Nature Methods 4, 787-797 (2007), published online 27 September 2007, doi:10.1038/nmeth1088
Statistical analysis of large-scale datasets of peptide assignments. In the target-decoy strategy (left), all spectra from the entire experiment are searched against a composite target plus decoy database, and then the numbers of matches to decoy peptides are used to estimate the false discovery rate (FDR) resulting from filtering the data using various score thresholds. In the probabilistic mixture-modeling approach (right), the most likely distributions among correct (red curve) and incorrect (blue curve) peptide assignments are fitted to the observed data (histogram). A probability is computed for each peptide assignment in the dataset, which can then be used to estimate the FDR.
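The two strategies in the caption can be illustrated with a minimal Python sketch. It is not the method used in the cited paper or by Professor Vitek's group. The function names, the input layout, and the particular estimator (decoy matches divided by target matches above the threshold, which assumes higher scores are better and a decoy database of the same size as the target database) are assumptions made for illustration. The second function assumes that a probability of being correct has already been computed for each peptide assignment, for example by a fitted mixture model, and does not fit that model itself.

```python
from typing import List, Tuple


def target_decoy_fdr(matches: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `matches` holds hypothetical (score, is_decoy) pairs from one search
    against a composite target-plus-decoy database. Assumption: decoy hits
    above the threshold estimate the number of incorrect target hits.
    """
    targets = sum(1 for score, is_decoy in matches if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, decoys / targets)


def probability_based_fdr(probabilities: List[float], cutoff: float) -> float:
    """Estimate the FDR from per-assignment probabilities of being correct.

    Among assignments accepted at `cutoff`, the expected fraction of
    incorrect ones is the average of (1 - probability).
    """
    accepted = [p for p in probabilities if p >= cutoff]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


# Made-up example data, for illustration only.
example_matches = [(9.0, False), (8.0, False), (7.0, True),
                   (6.0, False), (5.0, False), (4.0, True)]
example_probabilities = [0.9, 0.8, 0.5, 0.2]

fdr_from_decoys = target_decoy_fdr(example_matches, threshold=5.0)
fdr_from_probabilities = probability_based_fdr(example_probabilities, cutoff=0.5)
```

Raising the score threshold or the probability cutoff generally lowers the estimated FDR at the cost of accepting fewer peptide assignments, which is the trade-off that filtering "using various score thresholds" explores.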

Professor Vitek obtained her Ph.D. from the Department of Statistics at Purdue in 2005, and she joined the Department of Statistics in 2006 as an Assistant Professor. Her research interests are in statistical and computational methods for molecular biology, in particular in computational mass spectrometry-based proteomics. Much of her work involves close collaborations with biologists, chemists, clinicians and computer scientists. For more information on Professor Vitek, please visit her homepage.

November 2007