Protein phosphorylation is one of the most important post-translational modifications.
Almost all processes that regulate the life activities of an organism, as well as almost all physiological
and pathological processes, involve protein phosphorylation. In this paper, we summarize the
implementation and application of methods used for protein phosphorylation site prediction,
such as the support vector machine algorithm, random forest, Jensen-Shannon divergence combined
with quadratic discriminant analysis, the AdaBoost algorithm, increment of diversity with quadratic
discriminant analysis, a modified CKSAAP algorithm, a Bayes classifier combined with phosphorylation
sequence enrichment analysis, the least absolute shrinkage and selection operator, stochastic search variable
selection, partial least squares and deep learning. Building on these methods, we use a k-nearest
neighbor (KNN) algorithm with the BLOSUM80 substitution matrix to predict phosphorylation sites. First, we
construct a dataset and remove redundant positive and negative samples, that is, protein sequences sharing
more than 30% similarity. Next, the proposed method is evaluated with four metrics: sensitivity (Sn),
specificity (Sp), accuracy (ACC) and Matthews correlation coefficient (MCC).
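The four metrics follow the standard confusion-matrix definitions, where TP, TN, FP and FN count correctly predicted sites, correctly predicted non-sites, non-sites predicted as sites and sites predicted as non-sites. The short Python sketch below computes them; the helper name `evaluate` and the counts in the usage line are hypothetical and only illustrate the formulas.

```python
import math


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Return Sn, Sp, ACC and MCC from confusion-matrix counts (hypothetical helper)."""
    sn = tp / (tp + fn)                    # sensitivity: fraction of true sites recovered
    sp = tn / (tn + fp)                    # specificity: fraction of non-sites rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall fraction classified correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if a margin is empty
    return {"Sn": sn, "Sp": sp, "ACC": acc, "MCC": mcc}


# Made-up counts for illustration only.
print(evaluate(tp=90, tn=85, fp=15, fn=10))  # Sn = 0.9, Sp = 0.85, ACC = 0.875, MCC ≈ 0.751
```

Sn, Sp and ACC lie in [0, 1] and are usually quoted as percentages, whereas MCC lies in [-1, 1] and is quoted as a plain number, which is why the results below report MCC as 0.7742 rather than as a percentage.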
Finally, tenfold cross-validation is employed to evaluate the method. Averaged over the three types of
amino acid residues (serine, threonine, and tyrosine), the tenfold cross-validation results give Sn, Sp,
ACC and MCC values of 90.44%, 86.95%, 88.74% and 0.7742, respectively. A comparison with PhosphoSVM
and Musite shows that the proposed method achieves better predictive performance and offers the
advantages of simplicity, practicality and low time complexity in classification.
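A KNN classifier driven by a substitution matrix, evaluated with tenfold cross-validation, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the peptide window length, the value of k, or how BLOSUM80 scores are turned into a distance, so the min-max normalised distance, `k=3` and the helper names (`normalise`, `window_distance`, `knn_predict`, `cross_validate`) are all hypothetical. The real BLOSUM80 scores could be loaded, for example, with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM80")`. The toy three-letter matrix and toy windows below are invented for demonstration only.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Matrix = Dict[Tuple[str, str], float]
Sample = Tuple[str, int]  # (peptide window centred on S/T/Y, label: 1 = site, 0 = non-site)


def normalise(matrix: Matrix) -> Matrix:
    """Min-max scale substitution scores to [0, 1] (assumed normalisation)."""
    lo, hi = min(matrix.values()), max(matrix.values())
    return {pair: (s - lo) / (hi - lo) for pair, s in matrix.items()}


def window_distance(a: str, b: str, norm: Matrix) -> float:
    """Assumed distance: 1 minus the mean normalised score over aligned positions."""
    return 1.0 - sum(norm[(x, y)] for x, y in zip(a, b)) / len(a)


def knn_predict(query: str, train: Sequence[Sample], k: int, norm: Matrix) -> int:
    """Majority vote among the k training windows closest to the query."""
    nearest = sorted(train, key=lambda s: window_distance(query, s[0], norm))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(data: Sequence[Sample], k: int, matrix: Matrix,
                   folds: int = 10, seed: int = 0) -> Tuple[int, int, int, int]:
    """K-fold cross-validation (tenfold by default); returns pooled (TP, TN, FP, FN)."""
    norm = normalise(matrix)
    items: List[Sample] = list(data)
    random.Random(seed).shuffle(items)
    tp = tn = fp = fn = 0
    for f in range(folds):
        test = items[f::folds]  # samples whose index i satisfies i % folds == f
        train = [s for i, s in enumerate(items) if i % folds != f]
        for window, label in test:
            pred = knn_predict(window, train, k, norm)
            if label == 1 and pred == 1:
                tp += 1
            elif label == 1:
                fn += 1
            elif pred == 0:
                tn += 1
            else:
                fp += 1
    return tp, tn, fp, fn


# Hypothetical toy matrix (NOT BLOSUM80): +4 for identity, -1 otherwise.
TOY = {(x, y): (4 if x == y else -1) for x in "AKS" for y in "AKS"}
# Hypothetical 7-residue windows centred on serine.
DATA = [(w, 1) for w in ["KKKSKKK", "AKKSKKK", "KKKSKKA", "KAKSKKK", "KKKSAKK"]] + \
       [(w, 0) for w in ["AAASAAA", "KAASAAA", "AAASAAK", "AKASAAA", "AAASKAA"]]
print(cross_validate(DATA, k=3, matrix=TOY))  # (5, 5, 0, 0)
```

Here the confusion counts are pooled over all ten folds before computing metrics. Averaging per-fold metrics is an equally common choice, and the abstract does not state which one was used.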