Identification and prediction of RNA-binding residues (RBRs) provides valuable insights into the mechanisms of protein-RNA interactions. We analyzed the contributions of a wide range of factors including amino acid sequence, evolutionary conservation, secondary structure and solvent accessibility, to the prediction/characterization of RBRs. Five feature sets were designed and feature selection was performed to find and investigate relevant features. We demonstrate that (1) interactions with positively charged amino acids Arg and Lys are preferred by the negatively charged nucleotides; (2) Gly provides flexibility for the RNA binding sites; (3) Glu with negatively charged side chain and several hydrophobic residues such as Leu, Val, Ala and Phe are disfavored in the RNA-binding sites; (4) coil residues, especially in long segments, are more flexible (than other secondary structures) and more likely to interact with RNA; (5) helical residues are more rigid and consequently they are less likely to bind RNA; and (6) residues partially exposed to the solvent are more likely to form RNA-binding sites. We introduce a novel sequence-based predictor of RBRs, RBRpred, which utilizes the selected features. RBRpred is comprehensively tested on three datasets with varied atom distance cutoffs by performing both five-fold cross validation and jackknife tests and achieves Matthews correlation coefficient (MCC) of 0.51, 0.48 and 0.42, respectively. The quality is comparable to or better than that for state-of-the-art predictors that apply the distancebased cutoff definition. We show that the most important factor for RBRs prediction is evolutionary conservation, followed by the amino acid sequence, predicted secondary structure and predicted solvent accessibility. We also investigate the impact of using native vs. predicted secondary structure and solvent accessibility. The predictions are sufficient for the RBR prediction and the knowledge of the actual solvent accessibility helps in predictions for lower distance cutoffs.
Keywords: Protein-RNA binding, RNA-binding residues, prediction of RNA-binding residues, Predicted Secondary Structure, Solvent Accessibility, five-fold cross validation, jackknife tests, Matthew's correlation coefficient, protein-RNA interactions, nucleotides, DNA, RNA, Protein Data Bank, neural network, Sequence-Based Features, 20-dimensional binary vector, Support Vector Machine (SVM)
Rights & PermissionsPrintExport