Introduction: N6-methyladenosine (m6A) is one of the most common post-transcriptional
modifications in RNA and has been linked to several biological processes. Accurately predicting
m6A sites from RNA sequences remains a challenging task in computational biology. Several computational
methods based on machine-learning algorithms have been proposed to accelerate in silico
screening of m6A sites, thereby drastically reducing the experimental time and labor costs involved.
Methodology: In this study, we propose a novel computational predictor, termed ERT-m6Apred, for
the accurate prediction of m6A sites. To identify the feature encodings with the greatest discriminative
capability, we applied a two-step feature selection technique to seven different feature encodings and
identified the optimal feature set for each encoding.
Results: A performance comparison of extremely randomized tree (ERT) models, each trained on the
optimal feature set of one encoding, showed that the pseudo k-tuple composition encoding, which
incorporates 14 physicochemical properties, significantly outperformed the other encodings. Moreover,
ERT-m6Apred achieved an accuracy of 78.84% in cross-validation analysis, which is higher than that of
recently reported predictors.
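
For readers unfamiliar with composition-based encodings, the following is a minimal sketch of a plain k-tuple (k-mer) frequency vector, the base component on which pseudo k-tuple composition builds. It is an illustration under stated assumptions, not the encoding used by ERT-m6Apred: it omits the 14 physicochemical properties and the pseudo components derived from them, and the function name, default k, and example sequence are hypothetical.

```python
from itertools import product


def k_tuple_composition(seq, k=2, alphabet="ACGU"):
    """Return the frequency of every possible k-tuple in an RNA sequence.

    Hypothetical helper for illustration only: this is plain k-tuple
    composition, without the physicochemical (pseudo) components.
    """
    # Treat DNA-style input as RNA so that T and U are counted together.
    seq = seq.upper().replace("T", "U")

    # Enumerate all len(alphabet)**k tuples in a fixed, reproducible order.
    tuples = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(tuples, 0)

    windows = len(seq) - k + 1
    if windows <= 0:
        raise ValueError("sequence is shorter than k")

    # Slide a window of width k along the sequence and count each tuple.
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing other symbols (e.g. N) are skipped
            counts[kmer] += 1

    # Normalize by the total number of windows.
    return [counts[t] / windows for t in tuples]


features = k_tuple_composition("GGACU", k=2)
print(len(features))
```

With k = 2 and a four-letter alphabet the vector has 16 entries. A fixed-length vector of this kind is what a tree-based classifier would take as input for each candidate site.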
Conclusion: In summary, ERT-m6Apred predicts Saccharomyces cerevisiae m6A sites with higher
accuracy than recently reported predictors, which should facilitate biological hypothesis generation
and experimental validation.