Background: Carboxylation is one of the most biologically important post-translational
modifications and occurs on lysine, arginine, and glutamine residues of a protein. Of these
three, covalent attachment of a carboxyl group to the lysine side chain is the most frequent
and biologically important type of carboxylation. To study the biological functions of this
modification, it is essential to correctly identify the lysine sites susceptible to carboxylation.
Objective: Herein, we present a machine learning-based computational model for predicting
carboxylysine sites.
Methods: Various position- and composition-relative features are incorporated into the pseudo
amino acid composition (PseAAC) to construct feature vectors, and a neural network is employed
as the classifier. The model is validated by jackknife, cross-validation, self-consistency, and
independent testing.
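The two resampling schemes named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the jackknife test is the usual leave-one-out procedure, and the function names `k_fold_indices` and `jackknife_indices` are hypothetical. Shuffling and stratification, which a real pipeline would normally apply first, are omitted.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Leave-one-out: every sample is held out exactly once (k = n_samples)."""
    return k_fold_indices(n_samples, n_samples)


if __name__ == "__main__":
    # Hypothetical dataset of 10 samples, split 5 ways.
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

In both schemes the classifier is retrained on each `train` split and scored on the matching `test` split. The jackknife variant is the limiting case in which each test split holds a single sample.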
Results: In the self-consistency test, the model achieved 99.76% Acc, 99.76% Sn, 99.76% Sp,
and 0.99 MCC. Jackknife validation gave 97.07% Acc, and 10-fold cross-validation gave
95.16% Acc.
Conclusion: Independent dataset testing yielded 94.3% accuracy, which indicates that the proposed
model performs better than the existing model PreLysCar. However, the accuracy can be improved
further in the future as the number of identified carboxylysine sites in proteins increases.
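For reference, the metrics reported above (Acc, Sp, MCC), together with sensitivity (Sn), are conventionally computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
import math
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute Acc, Sn, Sp and MCC from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    acc = (tp + tn) / total                      # overall accuracy
    sn = tp / (tp + fn) if (tp + fn) else 0.0    # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0    # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is taken as 0 by convention when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(classification_metrics(tp=45, tn=40, fp=10, fn=5))
```

Unlike Acc, MCC uses all four cells of the confusion matrix, so it remains informative when positive and negative sites are imbalanced.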