Background: Hypusination is a unique modification of lysine residues in eukaryotic translation
initiation factor 5A (eIF5A), which is essential and highly conserved across eukaryotes. However, the
mechanism by which this particular hypusination site is recognized remains unclear. In this study, we made a first
attempt to uncover the characteristics of hypusination sites using computational methods.
Method: Hypusination sites retrieved from the UniProt database, either validated experimentally or predicted by
sequence similarity, were selected for investigation. Each site was transformed
into a peptide segment that contained the modification site and the residues around it. Four types of
features were extracted from the peptide segments. Because hypusination sites are far fewer than
non-hypusination sites, the synthetic minority over-sampling technique (SMOTE) was applied to
balance the dataset. Then, feature selection methods, including maximum
relevance minimum redundancy (mRMR) and incremental feature selection (IFS), were used to
analyze the four types of features and build an optimal classifier that used a support vector machine (SVM)
as the prediction engine.
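The first two steps of the method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the flank size, the padding character `X`, the toy sequence, and the function names `peptide_segment` and `smote_sample` are all assumptions made for illustration, and the oversampling routine is a simplified version of the SMOTE interpolation idea rather than a full SMOTE implementation.

```python
import random


def peptide_segment(sequence, site, flank=3, pad="X"):
    """Return the 2*flank+1 residue window centred on 0-based index `site`.

    Positions beyond the sequence ends are filled with `pad`
    (padding character and flank size are assumptions).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site:site + 2 * flank + 1]


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create `n_new` synthetic minority samples (simplified SMOTE idea).

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy usage with a made-up sequence; the lysine (K) sits at index 1.
window = peptide_segment("MKTAYIAKQR", site=1, flank=3)  # "XXMKTAY"
new_points = smote_sample([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

In this sketch the candidate lysine always sits at the centre of the window, so every site yields a segment of equal length regardless of its distance from the protein termini.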
Results: The optimal SVM classifier, built on four amino acid features, yielded a perfect
Matthews correlation coefficient (MCC) of 1.000 on both the training and testing sets, indicating
that these four features are hypusination-specific characteristics.
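For reference, the MCC is computed from the four confusion-matrix counts as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The short Python sketch below evaluates this standard formula; the function name `mcc` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts: no false positives and no false negatives.
perfect = mcc(tp=5, tn=95, fp=0, fn=0)
# Hypothetical counts: every prediction is wrong.
inverted = mcc(tp=0, tn=0, fp=95, fn=5)
```

An MCC of 1.000 therefore corresponds to a confusion matrix with no false positives and no false negatives, while a value near 0 indicates performance no better than chance.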
Conclusions: As a pioneering work, our analysis provides insight that improves the understanding
of hypusination mechanisms.