CSBPI_Site: Multi-Information Sources of Features to RNA Binding Sites Prediction
Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the
maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can
identify interactions between RNAs and proteins, but they are limited by the experimental environment and materials.
Therefore, it is essential to construct computational models to identify the functional sites.
Objective: Although some computational methods have been proposed to predict RNA binding sites, the accuracy could
be further improved. Moreover, it is necessary to construct a dataset with more samples to design a reliable model. Here
we present a computational model based on multi-information sources to identify RNA binding sites.
Method: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The
specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond,
chemical properties and position information).
Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross validation. Meanwhile, the
accuracies differed only slightly (ranging from 0.83 to 0.85) among three classifier algorithms, which shows that the novel
features are stable and suit multiple classifiers. These results show that the proposed method is effective and robust for
noncoding RNA binding site identification.
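The evaluation protocol named above, leave-one-out cross validation scored by accuracy and AUC, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic 15-dimensional vectors rather than the CSBPI_Site features, and a simple nearest-centroid scorer stands in for the extreme gradient boosting classifier so that the example runs with the standard library alone. The function names are hypothetical.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(train_x, train_y, x):
    """Stand-in scorer (NOT gradient boosting): positive when x lies
    closer to the positive-class centroid than to the negative one."""
    pos = centroid([r for r, y in zip(train_x, train_y) if y == 1])
    neg = centroid([r for r, y in zip(train_x, train_y) if y == 0])
    return sq_dist(x, neg) - sq_dist(x, pos)


def leave_one_out(xs, ys):
    """Hold out each sample once, fit on the rest, score the held-out one."""
    scores = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        scores.append(score(train_x, train_y, xs[i]))
    return scores


def accuracy(scores, ys):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s > 0 else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def auc(scores, ys):
    """AUC as the probability that a positive outscores a negative
    (ties count one half)."""
    pos = [s for s, y in zip(scores, ys) if y == 1]
    neg = [s for s, y in zip(scores, ys) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class=30, dim=15, seed=0):
    """Synthetic two-class data; a placeholder for real site features."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label, shift in ((0, 0.0), (1, 1.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(shift, 1.0) for _ in range(dim)])
            ys.append(label)
    return xs, ys


xs, ys = make_toy_data()
loo_scores = leave_one_out(xs, ys)
print("LOO accuracy:", accuracy(loo_scores, ys))
print("LOO AUC:", auc(loo_scores, ys))
```

Because every sample is scored by a model that never saw it, the resulting accuracy and AUC are computed entirely on held-out predictions. Swapping the stand-in scorer for a gradient boosting classifier would leave the evaluation loop unchanged.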
Conclusion: Our method based on multi-information sources effectively represents the binding site information of
ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme obtained with CSBPI_Site indicate that our model is
valuable for identifying functional sites.
Journal Title: Current Bioinformatics