The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. Identifying and predicting RNA-binding sites is important for understanding the RNA-binding mechanism. Computational approaches are being developed to predict RNA-binding residues from sequence- or structure-derived features, but the accuracy of current prediction methods still needs improvement. We found that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary information (PSSM) and other structural information (secondary structure and solvent accessibility) significantly improves predictions over existing methods. Using a multiple linear regression approach and 6-fold cross-validation, our best model achieves an overall correct rate of 87.8% and an MCC of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues in a dataset of 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to a clear improvement. A web server for predicting RNA-binding residues from a protein sequence (or structure) is available at http://jeele.go.3322.org/RNA/.
Keywords: Protein-RNA interaction, protein-RNA binding prediction, structural neighbours, multiple linear regression
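The performance measures quoted above (overall correct rate, specificity, fraction of binding residues recovered, and MCC) all follow from the per-residue confusion counts. The sketch below shows one way to compute them; the function name `binding_metrics` and the example counts are hypothetical and chosen only for illustration. They are not the counts from the 107-protein dataset.

```python
import math


def binding_metrics(tp, fp, tn, fn):
    """Compute per-residue metrics from confusion counts.

    tp: binding residues predicted as binding
    fp: non-binding residues predicted as binding
    tn: non-binding residues predicted as non-binding
    fn: binding residues predicted as non-binding
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one residue is required")

    accuracy = (tp + tn) / total  # overall correct rate
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the paper's data).
    metrics = binding_metrics(tp=60, fp=40, tn=860, fn=40)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Because binding residues are a small minority of all residues, the overall correct rate alone can look high even for a weak predictor, which is why MCC is reported alongside it.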