Background: Accurately recognizing nitrated tyrosine residues from protein sequences
would pave the way for understanding the mechanism of nitration and for screening tyrosine
residues in sequences.
Results: In this study, we proposed a prediction model that used the extreme learning machine
(ELM) algorithm as the prediction engine to identify nitrated tyrosine residues. To encode each
tyrosine residue, a sliding window technique was used to extract a peptide segment around it,
from which a number of features were derived. These features were analyzed with a popular
feature selection method, Minimum Redundancy Maximum Relevance (mRMR), which produced a
ranked feature list. Then, the Incremental Feature Selection (IFS) method was used to
identify the optimal features, on which the optimal
ELM-based prediction model was built. This model produced satisfactory results on the training
dataset with a Matthews correlation coefficient of 0.757. The model was also evaluated on an
independent test dataset that contained only positive samples, yielding a sensitivity of 0.938.
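As an illustration of the encoding and evaluation steps described in the Results, the following is a minimal sketch, not the implementation used in this study. It extracts a fixed-length peptide segment around each tyrosine and computes the Matthews correlation coefficient and sensitivity from confusion-matrix counts. The function names, the window half-width, and the padding character "X" are assumptions made for illustration; the window length actually used is not stated here.

```python
import math


def extract_windows(sequence, half_width=5, pad="X"):
    """Return (position, peptide) pairs, one per tyrosine (Y) in the sequence.

    half_width and pad are illustrative assumptions: each peptide has length
    2 * half_width + 1, and positions beyond the sequence ends are padded.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [
        (i, padded[i:i + size])
        for i, residue in enumerate(sequence)
        if residue == "Y"
    ]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy sequence (hypothetical), window of 5 residues around each tyrosine.
print(extract_windows("MKYAAYG", half_width=2))
# [(2, 'MKYAA'), (5, 'AAYGX')]
```

Sensitivity is the natural measure for a test set that contains only positive samples, because true negatives and false positives are undefined there and the Matthews correlation coefficient cannot be computed.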
Conclusion: Compared with other prediction models that used classic machine learning algorithms as
prediction engines on the same datasets, each with its own optimal features, the optimal ELM-based
prediction model produced much better results, indicating the superiority of the proposed model for
the identification of nitrated tyrosine residues from protein sequences.