Background: The post-translational modifications (PTMs) on the side chains of
conserved lysine (Lys) residues play important roles in myriad cellular processes, such as
modification of the structures and activities of histones, protein degradation and turnover, and the
regulation of DNA damage responses. To date, several computational methods have been developed
to identify different PTMs on Lys residues. However, most of these methods focus on identifying
one particular PTM without considering other types of PTMs.
Method: In this study, we first conducted a computational investigation of three types of PTMs
(acetylation, sumoylation, and ubiquitination) at the same time by analyzing the protein structure and
sequence factors surrounding the substrate Lys residues in these types of PTMs. To fully extract the
structural and sequence information around the Lys residues, six types of features were used to
encode the peptide segments containing the substrates. Next, a feature selection method,
maximum relevance minimum redundancy (mRMR), was applied to obtain two feature lists: the MaxRel
feature list and the mRMR feature list. The mRMR feature list was used to extract the optimal
features for the random forest algorithm to distinguish the three types of PTMs.
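The difference between the two feature lists can be illustrated with a minimal sketch. It assumes discrete features and a simple greedy criterion (relevance minus mean redundancy, both measured by mutual information); the feature names and toy data are hypothetical and are not taken from the study, and the random forest step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_features(features, labels):
    """Return (maxrel_list, mrmr_list) of feature names.

    features: dict mapping a feature name to its list of discrete values.
    MaxRel ranks by relevance to the labels alone; mRMR greedily picks the
    feature with the highest relevance minus mean redundancy with the
    features already selected.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    maxrel_list = sorted(relevance, key=relevance.get, reverse=True)

    selected, remaining = [], list(maxrel_list)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return maxrel_list, selected


# Hypothetical toy data: 12 Lys sites, 4 per PTM type.
labels = ["acetyl"] * 4 + ["sumo"] * 4 + ["ubiq"] * 4
features = {
    "disorder":     [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "disorder_dup": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # fully redundant copy
    "flank":        [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],  # weaker but complementary
    "noise":        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
}
maxrel_list, mrmr_list = rank_features(features, labels)
print("MaxRel:", maxrel_list)
print("mRMR:  ", mrmr_list)
```

In this toy case the MaxRel list keeps the duplicated feature in second place, whereas the mRMR list promotes the less relevant but non-redundant feature ahead of it.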
Results: An optimal classification model with an overall accuracy of 0.989 was built. For the
MaxRel feature list, we investigated the top-ranked features to uncover the site-preference and
residue-preference of Lys residues.
Conclusion: The results suggested that the disordered structure and the preference of flanking residues
were the most important attributes for distinguishing the three types of PTMs, a finding consistent
with the results reported in previous studies.