Background: RNA-protein interactions (RPIs) play an important role in many cellular
processes. In particular, noncoding RNA-protein interactions (ncRPIs) are involved in many
gene regulatory processes and complex human diseases. High-throughput experiments have provided a
large amount of valuable information about ncRPIs, but these experiments are expensive and time-consuming.
Therefore, several computational approaches have been developed to predict ncRPIs efficiently.
Methods: In this work, we describe recent advances in predicting ncRPIs from the following
aspects: i) dataset construction; ii) sequence and structural feature representation; and
iii) machine learning algorithms.
Results: Current methods have successfully predicted ncRPIs, but most of them were trained and
tested on small benchmark datasets derived from ncRNA-protein complexes in the PDB database.
The generalization performance and robustness of these existing methods need to be further improved.
Conclusion: Given the large numbers of ncRPIs generated by high-throughput technologies,
three future directions for predicting ncRPIs with machine learning deserve attention.
The first is how to effectively construct the negative sample set. The second is the selection
of novel and effective features from the sequences and structures of ncRNAs and proteins.
The third is the design of more powerful predictors.
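
To make the negative sample problem concrete, the following minimal sketch shows one simple baseline, random pairing: ncRNAs and proteins from the known positive set are paired at random, and any pair not observed to interact is treated as a negative. This is an illustration under stated assumptions, not a method endorsed here; the function name, the identifiers and the toy data are hypothetical, and unobserved pairs may in fact interact, which is one reason negative set construction remains an open question.

```python
import random

def sample_negative_pairs(positive_pairs, n_negatives, seed=0):
    """Draw RNA-protein pairs that are absent from the known positive set.

    Assumption: every unobserved pair is treated as non-interacting,
    which is only an approximation of a true negative.
    """
    positives = set(positive_pairs)
    rnas = sorted({rna for rna, _ in positives})
    proteins = sorted({protein for _, protein in positives})
    # Every RNA-protein combination not seen as a positive is a candidate.
    candidates = [(r, p) for r in rnas for p in proteins if (r, p) not in positives]
    if n_negatives > len(candidates):
        raise ValueError("not enough unobserved pairs to sample from")
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    return rng.sample(candidates, n_negatives)

# Hypothetical toy data: three known interacting pairs.
positive_pairs = [("rna1", "protA"), ("rna2", "protB"), ("rna3", "protA")]
negative_pairs = sample_negative_pairs(positive_pairs, n_negatives=3)
```

A balanced set (as many negatives as positives) is a common choice, but the appropriate ratio depends on the dataset and the predictor.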