Background: MicroRNAs (miRNAs) are a class of short (approximately 21 nt) non-coding
RNAs that play an important role as regulators of biological processes in cells. Identifying
and discovering pre-miRNAs helps in understanding regulatory processes, the functions of
miRNAs and other genes, and biological evolution.
Methods: Machine learning methods have been powerful tools for distinguishing real pre-miRNAs
from other hairpin-like sequences (pseudo pre-miRNAs). However, most commonly used classifiers
do not perform well on independent test sets. To overcome this, we propose BRAda, a novel
algorithm that integrates a BP (back-propagation) neural network and a random forest classifier
trained on two balanced training sets. By assigning weights to these classifiers, which use the
proposed 98-dimensional features, we obtained a strong classifier with high accuracy and stability.
Furthermore, two independent test sets (updated human and non-human pre-miRNAs) were used to
evaluate the prediction performance of the proposed classifier.
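The abstract does not state how the classifier weights are computed. The name BRAda suggests an AdaBoost-style weighting, but that is an assumption here, not the authors' confirmed method. The following minimal sketch weights two already-trained base classifiers by their error rate on a validation set, using the AdaBoost formula alpha = 0.5 * ln((1 - err) / err), and combines them by weighted vote. It does not train a BP network or random forest, nor build the 98-dimensional features; `bp_stand_in`, `rf_stand_in`, and the toy data are hypothetical placeholders.

```python
import math
from typing import Callable, List, Sequence

# Assumed interface: a classifier maps a feature vector to +1 (real pre-miRNA)
# or -1 (pseudo pre-miRNA). In practice the vectors would be the 98-dim features.
Classifier = Callable[[Sequence[float]], int]


def classifier_weight(error: float, eps: float = 1e-10) -> float:
    """AdaBoost-style weight; clamps error to avoid log(0) and division by zero."""
    error = min(max(error, eps), 1 - eps)
    return 0.5 * math.log((1 - error) / error)


def error_rate(clf: Classifier, X: List[Sequence[float]], y: List[int]) -> float:
    """Fraction of validation samples the classifier gets wrong."""
    return sum(1 for x, t in zip(X, y) if clf(x) != t) / len(y)


def fit_weights(classifiers: List[Classifier], X: List[Sequence[float]], y: List[int]) -> List[float]:
    """One weight per base classifier, from its validation error rate."""
    return [classifier_weight(error_rate(c, X, y)) for c in classifiers]


def predict(classifiers: List[Classifier], weights: List[float], x: Sequence[float]) -> int:
    """Weighted vote: sign of the weighted sum of base predictions."""
    score = sum(w * c(x) for c, w in zip(classifiers, weights))
    return 1 if score >= 0 else -1


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """ACC = fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stand-ins for the trained BP network and random forest.
def bp_stand_in(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else -1


def rf_stand_in(x: Sequence[float]) -> int:
    return 1 if x[1] > 0.5 else -1


# Toy validation data (not from the paper).
X_val = [[0.9, 0.8], [0.7, 0.2], [0.2, 0.9], [0.1, 0.1]]
y_val = [1, 1, -1, -1]

models = [bp_stand_in, rf_stand_in]
weights = fit_weights(models, X_val, y_val)
preds = [predict(models, weights, x) for x in X_val]
print(weights, accuracy(y_val, preds))
```

On this toy data the first stand-in makes no errors and receives a large weight, while the second is right only half the time and receives a weight of zero, so the ensemble follows the more reliable model. A classifier with error above 0.5 would get a negative weight, which effectively inverts its vote.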
Results: BRAda significantly outperformed the other methods in identifying
both human and non-human pre-miRNAs.
Conclusion: The novel algorithm integrates a BP neural network and a random forest classifier trained on
two balanced training sets. Compared with other state-of-the-art machine learning methods, BRAda
performed best in validation, with an accuracy (ACC) above 99%. Moreover, although the
algorithm was trained on human sequences, its prediction performance on the non-human test sets was
also excellent (average ACC above 97%), indicating that the method is both stable and
robust. Through experiments and validation, the authors showed that the method is an effective tool for