Aim and Objective: Protein malonylation is a newly discovered post-translational
modification. Malonylation is known to be closely associated with type 2 diabetes and to play a
regulatory role in fatty acid oxidation and the associated genetic disease. Identifying protein
malonylation sites could lay a solid foundation for exploring the function of malonylation. Owing to the
limitations of experimental techniques, identifying malonylation sites quickly and accurately remains a great challenge.
Methods: We proposed a computational method to predict malonylation sites and to analyze
malonylation patterns. We first extracted protein segments such that a lysine lies at the center of each
segment. Each segment was then encoded by its pseudo amino acid composition. A support
vector machine classifier was trained on a training dataset to distinguish malonylation sites
from non-malonylation sites.
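The first two steps of the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the flank width, the padding symbol `X`, and the function names are hypothetical, and the encoding shown is only the plain amino acid composition. The sequence-order terms that a full pseudo amino acid composition adds are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def extract_lysine_segments(sequence, flank=3):
    """Return (position, segment) pairs, one per lysine, with K at the center.

    `flank` (residues kept on each side) is an assumed parameter; the
    abstract does not state the segment length.
    """
    padded = PAD * flank + sequence + PAD * flank
    segments = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Position i in `sequence` is position i + flank in `padded`,
            # so this slice spans `flank` residues on either side of the K.
            segments.append((i, padded[i:i + 2 * flank + 1]))
    return segments


def amino_acid_composition(segment):
    """Frequency of each standard amino acid in the segment (padding ignored).

    Simplified stand-in for the pseudo amino acid composition: only the
    conventional composition part is computed.
    """
    counts = Counter(r for r in segment if r in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]
```

For a toy sequence such as `"MGKAAK"` with `flank=2`, each lysine yields one five-residue segment, and each segment maps to a 20-dimensional feature vector that a classifier could take as input.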
Results: The leave-one-out test on the training dataset reached an accuracy of 0.7733, and the
independent test on the testing dataset reached 0.8889. Furthermore, the classifier correctly
identified 144 of 160 (90%) putative malonylation sites. Analysis of the differences between malonylation
and non-malonylation segments suggested that lysine malonylation follows a specific pattern;
for example, a lysine whose neighbors are glycine and alanine might be more likely to be malonylated.
The proposed method is therefore expected to be a promising tool for identifying malonylation sites.
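The leave-one-out test reported in the Results can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a nearest-centroid learner stands in for the support vector machine so that the example needs only the standard library, and all function names are hypothetical.

```python
def leave_one_out_accuracy(features, labels, train_fn, predict_fn):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)
        if predict_fn(model, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def train_centroids(xs, ys):
    """Stand-in learner (not an SVM): mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_centroid(model, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda y: sq_dist(model[y]))
```

Calling `leave_one_out_accuracy(features, labels, train_centroids, predict_centroid)` returns the fraction of held-out samples that were classified correctly, which is the quantity the abstract reports for the training dataset. Passing an SVM's training and prediction routines in place of the stand-in gives the same protocol with the classifier named in the Methods.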