Protein & Peptide Letters

ISSN (Print): 0929-8665
ISSN (Online): 1875-5305

Robust Prediction of B-Factor Profile from Sequence Using Two-Stage SVR Based on Random Forest Feature Selection

Author(s): Xiao-Yong Pan and Hong-Bin Shen

Volume 16, Issue 12, 2009

Pages: 1447-1454 (8 pages)

DOI: 10.2174/092986609789839250

Abstract

The B-factor, which measures the uncertainty in the position of an atom within a crystal structure, is highly correlated with protein internal motion. Although rapid progress in structural biology in recent years has made more accurate protein structures available than ever before, the avalanche of new protein sequences emerging in the post-genomic era keeps widening the gap between the number of known protein sequences and the number of known protein structures. It is therefore urgent to develop automated methods that predict the B-factor profile directly from amino acid sequences, so that this information can be used promptly in basic research. In this article, we propose a novel approach, called PredBF, to predict the real value of the B-factor. We first extract both global and local features from the protein sequences, together with their evolutionary information. Random forest feature selection is then applied to rank the importance of these features, and the most important ones are fed into a two-stage support vector regression (SVR) model, in which the initial predictions of the first-layer SVR are passed to a second-layer SVR for final refinement. Our results show that a systematic analysis of feature importance gives deep insight into the different contributions of the features and is necessary for developing effective B-factor prediction tools. The two-layer SVR model designed in this study further improves the robustness of B-factor profile prediction. PredBF is freely available for academic use as a web server at: http://www.csbio.sjtu.edu.cn/bioinf/PredBF.
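The pipeline described above (rank features, keep the top ones, fit a first SVR, then refine its outputs with a second SVR) can be illustrated with a small, dependency-free sketch. It is not the PredBF implementation: the abstract does not state the kernel, the hyperparameters, or how first-stage outputs are arranged for the second stage. Everything below is an assumption for illustration. The feature-importance scores are taken as given (in PredBF they come from a random forest, which is not reimplemented here). A linear epsilon-insensitive SVR trained by stochastic subgradient descent stands in for a kernel SVR. The second stage sees a window of neighbouring first-stage predictions, zero-padded at the termini. The names `select_top_k`, `LinearSVR`, `window_features` and `two_stage_fit_predict` are hypothetical.

```python
import random


def select_top_k(importances, k):
    """Indices of the k highest-scoring features, returned in ascending order.
    Scores are assumed to be precomputed (e.g. random forest importances)."""
    order = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    return sorted(order[:k])


def project(rows, idx):
    """Keep only the selected feature columns of each residue's feature vector."""
    return [[r[i] for i in idx] for r in rows]


class LinearSVR:
    """Hypothetical stand-in for a kernel SVR: a linear epsilon-SVR trained by
    stochastic subgradient descent on the primal objective
    0.5*||w||^2 + C * sum_i max(0, |y_i - (w.x_i + b)| - eps)."""

    def __init__(self, C=1.0, eps=0.1, lr=0.01, epochs=200, seed=0):
        self.C, self.eps, self.lr, self.epochs, self.seed = C, eps, lr, epochs, seed

    def predict_one(self, x):
        return sum(wj * xj for wj, xj in zip(self.w, x)) + self.b

    def predict(self, X):
        return [self.predict_one(x) for x in X]

    def fit(self, X, y):
        n = len(X)
        self.w, self.b = [0.0] * len(X[0]), 0.0
        rng = random.Random(self.seed)
        order = list(range(n))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                r = y[i] - self.predict_one(X[i])
                # Regularizer gradient, split evenly across the n samples.
                gw, gb = [wj / n for wj in self.w], 0.0
                if r > self.eps:  # prediction too low: outside the eps-tube
                    gw = [g - self.C * xj for g, xj in zip(gw, X[i])]
                    gb = -self.C
                elif r < -self.eps:  # prediction too high: outside the eps-tube
                    gw = [g + self.C * xj for g, xj in zip(gw, X[i])]
                    gb = self.C
                self.w = [wj - self.lr * g for wj, g in zip(self.w, gw)]
                self.b -= self.lr * gb
        return self


def window_features(preds, half):
    """For residue i, gather stage-1 predictions of residues i-half..i+half.
    Positions beyond either terminus are padded with 0.0 (an assumption)."""
    n = len(preds)
    return [[preds[j] if 0 <= j < n else 0.0 for j in range(i - half, i + half + 1)]
            for i in range(n)]


def two_stage_fit_predict(X, y, importances, k, half=1):
    """Feature selection -> stage-1 SVR -> windowed stage-2 SVR (in-sample sketch)."""
    idx = select_top_k(importances, k)
    Xs = project(X, idx)
    p1 = LinearSVR().fit(Xs, y).predict(Xs)  # initial per-residue predictions
    W = window_features(p1, half)
    p2 = LinearSVR().fit(W, y).predict(W)    # refined predictions
    return idx, p1, p2


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(60)]  # synthetic residues
    y = [r[0] - 0.5 * r[2] for r in X]       # synthetic "B-factor" profile
    importances = [0.5, 0.05, 0.4, 0.05]     # pretend random forest scores
    idx, p1, p2 = two_stage_fit_predict(X, y, importances, k=2)
    mae = sum(abs(a - b) for a, b in zip(p2, y)) / len(y)
    print(idx, round(mae, 3))
```

The synthetic data, window size and hyperparameters are arbitrary. The sketch also trains and predicts on the same residues. A more careful version would feed the second stage with cross-validated (out-of-fold) first-stage predictions, so that it does not learn from overly optimistic inputs.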

Keywords: B-factor, residue flexibility, random forest, two-layer SVR, protein effective length, packing density, sequence evolution, PSSM, PredBF


Rights & Permissions Print Cite
© 2024 Bentham Science Publishers