Background: Lantibiotics, which are usually produced by Gram-positive bacteria, are regarded as a
special class of bacteriocins. Lantibiotics contain lanthionine (Lan) and β-methyllanthionine
(MeLan) residues, which form ring structures in the peptide. These residues are derived from
serine and threonine residues and are essential for inhibiting the growth of closely related
strains.
Method: In this work, we proposed the first machine learning method to recognize and predict
Lan and MeLan residues in the protein sequences of lantibiotics. We adopted maximal relevance
minimal redundancy (mRMR) and incremental feature selection (IFS) to select optimal features and
random forest (RF) to build classifiers that identify Lan and MeLan residues. The predictive
performance of the classifiers was evaluated by 10-fold cross-validation.
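
The IFS step with 10-fold cross-validation can be illustrated with a minimal sketch. Several assumptions apply: the ranked feature list is taken as given (in the study it would come from mRMR, which is not implemented here), a 1-nearest-neighbor classifier stands in for RF to keep the example dependency-free, plain accuracy is used as the selection criterion, and all function names are hypothetical.

```python
import random


def nearest_neighbor_predict(train_X, train_y, x):
    """Predict with 1-NN (squared Euclidean distance); a stand-in for RF."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_accuracy(X, y, features, k=10, seed=0):
    """k-fold cross-validated accuracy using only the given feature columns."""
    Xs = [[row[f] for f in features] for row in X]
    correct = 0
    for fold in k_fold_indices(len(Xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(Xs)) if i not in held_out]
        train_X = [Xs[i] for i in train]
        train_y = [y[i] for i in train]
        for i in fold:
            if nearest_neighbor_predict(train_X, train_y, Xs[i]) == y[i]:
                correct += 1
    return correct / len(Xs)


def incremental_feature_selection(X, y, ranked_features, k=10):
    """Score the top-1, top-2, ... feature subsets and return the best one."""
    best_score, best_subset = -1.0, []
    for n in range(1, len(ranked_features) + 1):
        subset = ranked_features[:n]
        score = cv_accuracy(X, y, subset, k)
        if score > best_score:  # strict comparison: ties keep the smaller subset
            best_score, best_subset = score, subset
    return best_subset, best_score


def make_toy_data(n=40, seed=1):
    """Toy data (not from the study): feature 0 is informative, 1 and 2 are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[y[i] + 0.01 * (i % 5), rng.uniform(-5, 5), rng.uniform(-5, 5)]
         for i in range(n)]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    # The ranking [0, 1, 2] is assumed here; in practice it would come from mRMR.
    print(incremental_feature_selection(X, y, ranked_features=[0, 1, 2]))
```

The loop grows the candidate feature set one ranked feature at a time and keeps the subset with the best cross-validated score, which is the core idea of IFS; replacing the stand-in classifier or the scoring function does not change that structure.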
Results: The Matthews correlation coefficient (MCC) values for predicting Lan and MeLan
residues were 0.813 and 0.769, respectively, indicating that the constructed RF classifiers can
reliably recognize Lan and MeLan residues in lantibiotic sequences. For comparison, three other
methods, Dagging, the nearest neighbor algorithm (NNA) and sequential minimal optimization
(SMO), were also used to build classifiers to predict Lan and MeLan residues. The optimal
features were also analyzed, and the relationships between these features and their biological
importance are discussed.
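
For reference, the MCC reported above is computed from the counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch of that standard formula; the function names are hypothetical, and returning 0 when the denominator is zero is a common convention assumed here, not something stated in this work.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative labels only, not data from the study.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(mcc(*confusion_counts(y_true, y_pred)))
```

MCC ranges from −1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and it remains informative when the positive and negative classes differ greatly in size.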
Conclusion: The selected optimal features and the analysis in this work will contribute to a better
understanding of the sequence and structural features around Lan and MeLan residues. They could
provide useful information and practical suggestions for experimental and computational methods
aimed at exploring the biological features of these special residues in lantibiotics.