Background: Citrullination, an important post-translational modification of proteins,
alters the molecular weight and electrostatic charge of protein side chains. The conversion of
arginine residues to citrulline in protein sequences is catalyzed by a family of Ca2+-dependent
enzymes, the Peptidyl Arginine Deiminases (PADs), which comprise five isozymes: PAD 1, 2, 3, 4/5,
and 6. Citrullinated proteins have been identified in many biological and pathological processes.
In particular, abnormal protein citrullination can lead to serious human diseases, including
multiple sclerosis and rheumatoid arthritis.
Objective: Identifying citrullination sites in protein sequences is important, because accurate
identification of these sites may contribute to studies of the molecular functions and
pathological mechanisms of related diseases.
Methods and Results: In this study, a training set (containing 116 positive and 348 negative
samples) was encoded into a feature matrix, and the mRMR (minimum redundancy maximum relevance)
method was used to rank the 941-dimensional features by importance. A predictive model based on
a self-normalizing neural network (SNN) was then proposed to predict the citrullination sites in
protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used to
evaluate the model. Three classical machine learning models, namely random forest, support vector
machine, and k-nearest neighbor, were evaluated with the same methods and compared with the SNN
prediction model. The optimal SNN classifier reached a maximum Matthews Correlation Coefficient
(MCC) of 0.672404, suggesting that SNN may be the best of the four tools for citrullination site
prediction.
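The evaluation loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the plain (non-stratified) fold split, and the toy scoring function are assumptions made for illustration. In practice, `score_subset` would train a classifier on the selected features and return its mean MCC over the 10 folds.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for a plain k-fold split.

    Assumption: samples are already shuffled; the split is not stratified.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def incremental_feature_selection(
    ranked_features: Sequence[int],
    score_subset: Callable[[List[int]], float],
    step: int = 1,
) -> Tuple[List[int], float]:
    """Score nested prefixes of a ranked feature list and keep the best one."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        subset = list(ranked_features[:k])
        score = score_subset(subset)
        # Strict ">" keeps the smallest subset when scores tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy stand-in for "mean cross-validated MCC using only these features".
def toy_score(subset: List[int]) -> float:
    return float(-abs(len(subset) - 2))


ranked = [3, 1, 2, 0]  # hypothetical mRMR ranking, most important first
best_subset, best_score = incremental_feature_selection(ranked, toy_score)
splits = k_fold_indices(116 + 348, k=10)  # sample counts taken from the abstract
```

Because IFS only evaluates prefixes of the ranked list, the number of model fits grows linearly with the number of features rather than exponentially.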
Conclusion: The results showed that the SNN-based prediction method performed better than the
three classical models when evaluated by common metrics such as MCC, accuracy, and F1-measure.
The SNN prediction model also achieved a better balance in classifying the positive and negative
samples than the other three models.
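For reference, the three metrics named above can be computed from confusion-matrix counts, as in the short sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions, and the example counts are hypothetical apart from the class sizes (116 positive, 348 negative).

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1-measure, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0.0 when the denominator is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical classifier that labels every sample as negative.
all_negative = classification_metrics(tp=0, tn=348, fp=0, fn=116)

# Hypothetical classifier that labels every sample correctly.
perfect = classification_metrics(tp=116, tn=348, fp=0, fn=0)
```

With a 1:3 class ratio, the all-negative classifier still scores 0.75 accuracy while its MCC and F1 are both 0, which is why MCC is a useful indicator of balance between the positive and negative classes.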