Since the advent of high-throughput DNA sequencing technologies, the ever-increasing rate at which genomes
are published has created new challenges, notably for genome annotation. Although gene predictors and
annotation software are increasingly efficient, the ultimate validation still lies in observing the predicted gene product(s).
Mass-spectrometry-based proteomics provides the high-throughput technology needed to obtain evidence of protein
presence and, from the identified sequences, to confirm or invalidate predicted annotations. We review here
different strategies used to perform an MS-based proteogenomics experiment with a bottom-up approach. We start with the
strengths and weaknesses of the different strategies for constructing databases from genomic information (whole
genome, ORF, cDNA, EST or RNA-Seq data); these databases are then used to match mass spectra to peptides and proteins.
We also review the important points to be considered for a correct statistical assessment of the peptide identifications. Finally,
we provide references for tools used to map and visualize the peptide identifications back to the original genomic
Keywords: Proteogenomics, databases, bioinformatics, gene annotation, mass-spectrometry, proteomics.