Title: "Methods for protein identification and quantification from tandem mass spectrometry data"
Speaker: Predrag Radivojac, School of Informatics, Indiana University
Place: Mechanical Engineering (ME) 161; March 25, 2008, Tuesday, 4:30pm


Shotgun proteomics refers to the use of bottom-up proteomics techniques in which the protein content of a biological sample mixture is digested prior to separation and mass spectrometry analysis. In this talk, I will address our approaches to two of the major challenges in shotgun proteomics: protein identification and label-free protein quantification. We proposed a new concept, peptide detectability, and showed that it could be an important factor in explaining the relationship between a protein's quantity and the peptides identified from it in a high-throughput proteomics experiment. We define peptide detectability as the probability of observing a peptide in a standard sample analyzed by a standard proteomics routine, and argue that it is an intrinsic property of the peptide sequence and the neighboring regions in the parent protein. To test this hypothesis, we first used publicly available data and data from our own synthetic samples in which the quantities of model proteins were controlled. We then applied machine learning approaches to demonstrate that a peptide's detectability can be predicted from its sequence and the neighboring regions in the parent protein with satisfactory accuracy. The utility of this approach for protein quantification is demonstrated by the observation that, in the synthetic protein mixtures, peptides with higher detectability were generally identified at lower concentrations than those with lower detectability. These results establish a direct link between protein concentration and peptide detectability. In our second approach, we used peptide detectability to address another major challenge in shotgun proteomics: assigning identified peptides to the proteins from which they originate, referred to as the protein inference problem. Redundant and homologous protein sequences are difficult to identify correctly, because a given set of peptides may in many cases represent multiple proteins.
One simple solution to this problem is to assign the smallest number of proteins that explains the identified peptides. However, it is not certain that a natural system is accurately represented by this minimalist approach. We propose a reformulation of the protein inference problem that utilizes peptide detectability. We also propose a heuristic algorithm to solve this problem and evaluate its performance on synthetic and real proteomics data. Compared with a greedy implementation of the minimum protein set algorithm, our solution, which incorporates peptide detectability, performs favorably.
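
For readers unfamiliar with the minimum protein set baseline mentioned above, the following is a minimal sketch of the generic greedy set-cover heuristic, written in Python. It is not the speaker's implementation and does not use peptide detectability; the function name, the protein names, and the peptide labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def greedy_minimum_protein_set(
    protein_to_peptides: Dict[str, Set[str]],
    identified_peptides: Set[str],
) -> List[str]:
    """Greedy set cover: repeatedly pick the protein that explains the
    most identified peptides not yet explained by an earlier pick."""
    uncovered = set(identified_peptides)
    chosen: List[str] = []
    while uncovered:
        # Iterate over sorted names so that ties are broken alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & uncovered),
            default=None,
        )
        if best is None:
            break  # empty database: nothing can be explained
        gained = protein_to_peptides[best] & uncovered
        if not gained:
            break  # remaining peptides match no protein in the database
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy data (made-up protein names and peptide labels).
toy_database = {
    "protA": {"pep1", "pep2", "pep3"},
    "protB": {"pep3", "pep4"},
    "protC": {"pep4"},
}
toy_identified = {"pep1", "pep2", "pep3", "pep4"}

result = greedy_minimum_protein_set(toy_database, toy_identified)
print(result)  # ['protA', 'protB']
```

In this toy case, protB and protC explain the remaining peptide equally well, and the greedy heuristic picks between them by an arbitrary tie-break. Ambiguity of this kind is what the abstract describes as the motivation for bringing additional information, such as peptide detectability, into protein inference.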

Associated Reading:

Tang H, Arnold RJ, Alves P, Xun Z, Clemmer DE, Novotny MV, Reilly JP, Radivojac P. A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics. 2006 Jul 15;22(14):e481-8.

Alves et al. Advancement in protein inference from shotgun proteomics using peptide detectability. Pacific Symposium on Biocomputing 12:409-420 (2007).
