A Study on Host Tropism Determinants of Influenza Virus Using Machine Learning

Author(s): Eunmi Kwon, Myeongji Cho, Hayeon Kim, Hyeon S. Son*.

Journal Name: Current Bioinformatics

Volume 15, Issue 2, 2020

Background: The host tropism determinants of influenza virus, which cause changes in the host range and increase the likelihood of interaction with specific hosts, are critical for understanding the infection and propagation of the virus in diverse host species.

Methods: Six types of protein sequences from influenza virus strains isolated from three classes of hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes, and k-nearest neighbor algorithms were used for host classification. Sequence analysis and the identification of host-specific position markers were programmed in Java.
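As a rough illustration of the classification step, the sketch below encodes each sequence by the fraction of residues in three hydrophobicity classes and predicts a host by k-nearest neighbor voting. It is written in Python for brevity and is not the authors' Java code. The grouping, the function names (`composition`, `knn_predict`), and the toy sequences are all assumptions made for the example, and a real analysis would use many more descriptors and real sequence data.

```python
import math
from collections import Counter

# Illustrative three-class hydrophobicity grouping (assumed for this sketch,
# not taken from the paper).
GROUPS = {
    "polar": set("RKEDQN"),
    "neutral": set("GASTPHY"),
    "hydrophobic": set("CLVIMFW"),
}

def composition(seq):
    """Return the fraction of residues in each group, in GROUPS order."""
    seq = seq.upper()
    counts = [sum(1 for aa in seq if aa in members) for members in GROUPS.values()]
    total = sum(counts)  # residues outside the 20 standard amino acids are ignored
    if total == 0:
        return [0.0] * len(GROUPS)
    return [c / total for c in counts]

def knn_predict(train, query, k=3):
    """Majority vote among the k training sequences closest to the query.

    train is a list of (sequence, host) pairs; distance is Euclidean
    distance between composition vectors.
    """
    q = composition(query)
    ranked = sorted(train, key=lambda item: math.dist(composition(item[0]), q))
    votes = Counter(host for _, host in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up toy sequences, for demonstration only (not real viral proteins).
toy_train = [
    ("KKRRDDEE", "avian"),
    ("KRDEQNKR", "avian"),
    ("LLVVIIMM", "human"),
    ("CLVIMFWL", "human"),
    ("GASTPHYG", "swine"),
    ("GGAASSTT", "swine"),
]

print(knn_predict(toy_train, "RKEDQNRK", k=3))
```

Random forest and naïve Bayes classifiers would consume the same kind of feature vectors; k-nearest neighbor is shown here only because it fits in a few lines without external libraries.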

Results: A machine learning technique was explored to derive the physicochemical properties of amino acids used in host classification and prediction. The HA protein was found to play the most important role in determining the host tropism of the influenza virus, and the random forest method yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific differences were also selected and verified, and they were found to be useful position markers for host classification. Finally, ANOVA and post-hoc testing revealed that the physicochemical properties of the amino acids comprising the protein sequences, combined with position markers, differed significantly among hosts.
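The two ideas in these results, host-specific position markers and a one-way ANOVA across hosts, can be sketched as follows. This Python sketch assumes the sequences are already aligned to equal length and treats a column as a marker when every host is conserved at that column but the hosts do not all share the same residue. The 0.9 conservation threshold, the function names, and the toy alignment are hypothetical choices for illustration, not the criteria used in the study.

```python
import math
from collections import Counter

def consensus(seqs, pos, min_frac=0.9):
    """Return the dominant residue at a column, or None if not conserved."""
    residue, n = Counter(s[pos] for s in seqs).most_common(1)[0]
    if residue == "-" or n / len(seqs) < min_frac:
        return None
    return residue

def position_markers(aligned_by_host, min_frac=0.9):
    """List (1-based position, {host: residue}) for host-distinguishing columns."""
    length = len(next(iter(aligned_by_host.values()))[0])
    markers = []
    for pos in range(length):
        cons = {host: consensus(seqs, pos, min_frac)
                for host, seqs in aligned_by_host.items()}
        if None not in cons.values() and len(set(cons.values())) > 1:
            markers.append((pos + 1, cons))
    return markers

def anova_f(groups):
    """One-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return math.inf if ms_within == 0 else ms_between / ms_within

# Made-up toy alignment, for demonstration only.
toy_alignment = {
    "avian": ["MKTA", "MKTA"],
    "human": ["MRTA", "MRTA"],
    "swine": ["MKTS", "MKTS"],
}
for position, residues in position_markers(toy_alignment):
    print(position, residues)
```

In practice the F statistic would be computed on a physicochemical descriptor measured per sequence and grouped by host, and a significant result would be followed by a post-hoc test to find which host pairs differ.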

Conclusion: The host tropism determinants and position markers described in this study can be used in related research to classify, identify, and predict the hosts that are currently susceptible to influenza viruses or are likely to be infected in the future.

Keywords: Amino acid properties, bioinformatics, hemagglutinin, host tropism, influenza virus, machine learning, random forest.



Article Details

Year: 2020
Page: [121 - 134]
Pages: 14
DOI: 10.2174/1574893614666191104160927