Mass spectrometry-based proteomics allows us to analyze complex mixtures of proteins from various biological
samples in a high-throughput manner, in order to identify important proteomic patterns and hopefully novel disease biomarkers.
However, like most omics data, mass spectrometry proteomics data are complex, noisy and incomplete. Additionally,
the data are usually represented by relatively few samples and a very large number of predictor variables, i.e., m/z
peaks. These characteristics pose a significant challenge for most computational analysis methods, and various
alternatives have been proposed in the recent literature.
A typical mass spectrometry proteomics data analysis workflow consists of two major steps: preprocessing and
higher-level analysis. In recent years, a wide range of algorithms has been proposed for both, varying from classical approaches
to second-generation algorithms. Many of the proposed algorithms have been reported to produce encouraging
results. However, no common strategy has emerged as the method of choice, and different algorithms produce
different results on each dataset, making the evaluation of the algorithms practically impossible.
This work provides a critical review of the recent approaches for both preprocessing and higher-level analysis of proteomics
data. The strengths and limitations of each method are also presented, and emphasis is placed on describing the
most common and serious mistakes recorded in published differential proteomics studies. Moreover, the review provides
guidance, based on our experience, for choosing and correctly applying the appropriate algorithms, and hints for the design
of novel algorithms that will more effectively handle the specific characteristics and constraints of differential proteomics