Background: Myristoylation is an important hydrophobic post-translational modification
in which a myristoyl group is covalently attached to the amino group of Gly residues at the N-terminus of proteins. The many
diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway
regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular
myristoylation on proteins can lead to several pathological changes in the cell.
Objective: To better understand the function of myristoylated sites and to identify them correctly in
protein sequences, this study developed a computational method for predicting
myristoylation sites in protein sequences.
Materials and Methods: A training dataset with 196 positive and 84 negative peptide segments was
obtained. Four types of features derived from the peptide segments following the myristoylation sites
were used to characterize myristoylated and non-myristoylated sites. Then, two feature selection methods,
maximum relevance minimum redundancy (mRMR) and incremental feature selection
(IFS), were combined with a machine learning algorithm, the extreme learning machine (ELM), to extract
the optimal features for identifying myristoylation sites in protein sequences, thereby
building an optimal prediction model.
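
The mRMR, IFS, and extreme learning machine steps can be illustrated with a minimal sketch. This is not the implementation used in the study. It assumes absolute Pearson correlation as a stand-in for the mutual information normally used in mRMR, a single hold-out split instead of cross-validation, and a hidden layer of 20 tanh units. All function names (`mrmr_rank`, `elm_fit`, `elm_predict`, `incremental_feature_selection`) are hypothetical.

```python
import numpy as np

def mrmr_rank(X, y):
    """Greedy mRMR-style ranking of feature indices.

    Assumption: |Pearson correlation| replaces mutual information.
    Constant feature columns are not handled (their correlation is undefined).
    """
    n_feat = X.shape[1]
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    relevance = corr[:n_feat, n_feat]     # feature vs. label
    redundancy = corr[:n_feat, :n_feat]   # feature vs. feature
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_feat:
        rest = [j for j in range(n_feat) if j not in selected]
        # Score = relevance minus mean redundancy with already selected features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in rest]
        selected.append(rest[int(np.argmax(scores))])
    return selected

def elm_fit(X, y, n_hidden=20, seed=0):
    """Basic extreme learning machine: random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(model, X):
    """Return 0/1 labels by thresholding the ELM output at 0.5."""
    W, b, beta = model
    return (np.tanh(X @ W + b) @ beta >= 0.5).astype(int)

def incremental_feature_selection(X_tr, y_tr, X_va, y_va, ranked):
    """Evaluate the nested feature sets ranked[:1], ranked[:2], ... and keep the best one."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        model = elm_fit(X_tr[:, cols], y_tr)
        acc = float(np.mean(elm_predict(model, X_va[:, cols]) == y_va))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc
```

In this sketch the ranking is computed once on the training data, and IFS then selects the prefix of the ranked list that gives the highest hold-out accuracy.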
Results: In total, 41 key features were extracted and used to build an optimal prediction model.
The effectiveness of the optimal prediction model was further validated by its performance on a test
dataset. Furthermore, detailed analyses were performed on the 41 extracted features to gain
insight into the mechanism of myristoylation modification.
Conclusion: This study provided a new computational method for identifying myristoylation sites in
protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein