A Sequence-segment Neighbor Encoding Schema for Protein Hotspot Residue Prediction

Peng Chen; Tong Shen; Youzhi Zhang; Bing Wang

Abstract

Background: Hotspots are the residues that contribute most of the binding free energy in protein-protein interactions. Protein function frequently depends on hotspot residues. At present, hotspot residues are typically identified by alanine scanning mutagenesis, which is costly, time-consuming, and laborious.

Objective: More accurate and efficient methods therefore need to be developed to identify protein hotspot residues.

Methods: This paper proposed a novel sequence-segment neighbor encoding schema and constructed a random forest-based model to identify hotspots in protein interaction interfaces. First, 10 amino acid physicochemical properties, 16 features related to the protrusion index (PI) and depth index (DI), and 25 features related to accessible surface area (ASA) were extracted. Unlike previous residue encoding schemas, such as the autocorrelation descriptor or triplet combination information, the proposed schema accounts for the influence on a hotspot residue of its neighboring amino acids and of amino acids at a certain sequence distance from it.
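The general idea of encoding a residue together with its sequence neighbors can be illustrated with a minimal sliding-window sketch. This is not the authors' exact schema, which the abstract does not specify: the function names `residue_features` and `window_features`, the zero padding at the sequence ends, and the use of a single Kyte-Doolittle hydropathy value in place of the 10 physicochemical, 16 PI/DI, and 25 ASA features are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumption: one Kyte-Doolittle hydropathy value per residue stands in for
# the paper's full per-residue feature set, which is not reproduced here.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_features(sequence: str) -> List[List[float]]:
    """Map each residue to its (here one-dimensional) feature vector."""
    return [[HYDROPATHY[aa]] for aa in sequence]


def window_features(
    features: Sequence[Sequence[float]],
    center: int,
    half_width: int,
    pad: float = 0.0,
) -> List[float]:
    """Concatenate the feature vectors of the residues in the window
    center - half_width .. center + half_width.

    Positions that fall outside the sequence are filled with `pad`
    (a hypothetical choice; other padding strategies are possible).
    """
    if not 0 <= center < len(features):
        raise IndexError("center is outside the sequence")
    dim = len(features[0])
    encoded: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(features):
            encoded.extend(features[pos])
        else:
            encoded.extend([pad] * dim)
    return encoded


if __name__ == "__main__":
    feats = residue_features("ACDK")
    # Window of width 3 around the first residue; the left neighbor is padding.
    print(window_features(feats, center=0, half_width=1))  # [0.0, 1.8, 2.5]
```

Each window vector has length (2 × half_width + 1) × feature dimension, so vectors for all candidate residues can be stacked into a matrix and passed to a random forest classifier such as scikit-learn's `RandomForestClassifier`.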

Results: The proposed model was compared with other hotspot prediction methods, including APIS, Robetta, FOLDEF, KFC, and MINERVA.

Conclusion: The experimental results showed that the proposed model improves the prediction of protein hotspot residues over these methods on the same test set.

Keywords: Protein interaction, hotspots, encoding of sequence-segment neighbors, sliding window, random forest, schema.




DOI: https://dx.doi.org/10.2174/1574893615666200106115421
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher: Bentham Science Publisher

Current Bioinformatics
