Database Construction and Peptide Identification Strategies for Proteogenomic Studies on Sequenced Genomes

Author(s): Celine Hernandez, Patrice Waridel, Manfredo Quadroni

Journal Name: Current Topics in Medicinal Chemistry

Volume 14, Issue 3, 2014


Since the advent of high-throughput DNA sequencing technologies, the ever-increasing rate at which genomes are published has generated new challenges, notably at the level of genome annotation. Although gene predictors and annotation software are increasingly efficient, the ultimate validation still lies in the observation of the predicted gene product(s). Mass spectrometry-based proteomics provides the necessary high-throughput technology to show evidence of protein presence and, from the identified sequences, to confirm or invalidate predicted annotations. We review here different strategies used to perform an MS-based proteogenomics experiment with a bottom-up approach. We start from the strengths and weaknesses of the different database construction strategies, based on different genomic information (whole genome, ORF, cDNA, EST or RNA-Seq data), which are then used for matching mass spectra to peptides and proteins. We also review the important points to be considered for a correct statistical assessment of the peptide identifications. Finally, we provide references for tools used to map and visualize the peptide identifications back to the original genomic information.
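
As an illustration of the whole-genome database strategy mentioned above, the sketch below translates a DNA sequence in all six reading frames and keeps the stop-to-stop segments as candidate protein entries. It is a minimal example under stated assumptions, not the authors' method: the function names (`reverse_complement`, `translate`, `six_frame_orfs`) and the `min_length` cutoff are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is translated as `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna, min_length=3):
    """List (strand, frame, sequence) for every stop-to-stop segment.

    min_length is a hypothetical cutoff (in residues) used to discard
    segments too short to be useful in a search database.
    """
    dna = dna.upper()
    entries = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for segment in protein.split("*"):
                if len(segment) >= min_length:
                    entries.append((strand, frame, segment))
    return entries


if __name__ == "__main__":
    for strand, frame, segment in six_frame_orfs("ATGGCCAAATAGGGC"):
        print(f">frame_{strand}{frame}\n{segment}")
```

A database built this way makes no use of gene predictions, which is why it can reveal unannotated coding regions; the price is a much larger search space, which is one reason the statistical assessment of peptide identifications needs particular care.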

Keywords: Proteogenomics, databases, bioinformatics, gene annotation, mass-spectrometry, proteomics.

Article Details

Year: 2014
Page: [425 - 434]
Pages: 10
DOI: 10.2174/1568026613666131204105652