Background: The host tropism determinants of influenza virus, which cause changes in
the host range and increase the likelihood of interaction with specific hosts, are critical for
understanding the infection and propagation of the virus in diverse host species.
Methods: Six types of protein sequences of influenza viral strains isolated from three classes of
hosts (avian, human, and swine) were obtained. Random forest, naïve Bayes, and k-nearest
neighbor algorithms were used for host classification. Programs for sequence analysis and for
identifying host-specific position markers were written in Java.
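As an illustration of what identifying host-specific position markers might involve, the following is a minimal Java sketch and not the study's actual program. It assumes the sequences of each host group are already aligned to equal length, treats a column as conserved when one residue reaches a chosen frequency threshold (an assumed value above 0.5), and reports columns where every host is conserved but the hosts do not all share the same residue. The class name, method names, threshold, and toy sequences are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the study's program: finds alignment columns where
// each host group is conserved but the groups do not all share one residue.
class PositionMarkerSketch {

    // Returns the most frequent residue in one column of a host group, or '-'
    // if no residue reaches the threshold (a dominant gap also yields '-').
    static char conservedResidue(List<String> seqs, int pos, double threshold) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String s : seqs) {
            counts.merge(s.charAt(pos), 1, Integer::sum);
        }
        char best = '-';
        int bestCount = 0;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return (bestCount >= threshold * seqs.size()) ? best : '-';
    }

    // Returns 1-based positions that are conserved within every host group
    // and carry at least two different conserved residues across groups.
    static List<Integer> findMarkers(Map<String, List<String>> byHost, double threshold) {
        int length = byHost.values().iterator().next().get(0).length();
        List<Integer> markers = new ArrayList<>();
        for (int pos = 0; pos < length; pos++) {
            Set<Character> residues = new HashSet<>();
            boolean allConserved = true;
            for (List<String> seqs : byHost.values()) {
                char c = conservedResidue(seqs, pos, threshold);
                if (c == '-') {
                    allConserved = false;
                    break;
                }
                residues.add(c);
            }
            if (allConserved && residues.size() > 1) {
                markers.add(pos + 1);
            }
        }
        return markers;
    }

    public static void main(String[] args) {
        // Toy, made-up aligned fragments; real input would be full protein alignments.
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        byHost.put("avian", Arrays.asList("MKTA", "MKTA"));
        byHost.put("human", Arrays.asList("MRTA", "MRTA"));
        byHost.put("swine", Arrays.asList("MKSA", "MKSA"));
        System.out.println(findMarkers(byHost, 0.9)); // prints [2, 3]
    }
}
```

In this toy input, position 2 separates human (R) from avian and swine (K), and position 3 separates swine (S) from the other two, so both are reported as candidate markers.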
Results: A machine learning technique was explored to derive the physicochemical properties of
amino acids used in host classification and prediction. The hemagglutinin (HA) protein was found to play the most
important role in determining host tropism of the influenza virus, and the random forest method
yielded the highest accuracy in host prediction. Conserved amino acids that exhibited host-specific
differences were also selected and verified, and they were found to be useful position markers for
host classification. Finally, ANOVA and post-hoc testing revealed that the
physicochemical properties of the amino acids making up the protein sequences, combined with
position markers, differed significantly among hosts.
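For readers unfamiliar with the test, the comparison of a physicochemical property across host groups can be sketched as a one-way ANOVA F statistic. The Java snippet below is a generic textbook computation under assumed inputs, not the study's analysis: the class name is hypothetical, the numbers are made-up property values, and it omits both the p-value lookup and the post-hoc step.

```java
// Hypothetical sketch of a one-way ANOVA F statistic; not the study's code.
class OneWayAnovaSketch {

    // groups[i] holds the property values observed for host group i.
    static double fStatistic(double[][] groups) {
        int k = groups.length; // number of host groups
        int n = 0;             // total number of observations
        double grandSum = 0.0;
        for (double[] g : groups) {
            for (double v : g) {
                grandSum += v;
                n++;
            }
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0; // variation of group means around the grand mean
        double ssWithin = 0.0;  // variation of values around their own group mean
        for (double[] g : groups) {
            double sum = 0.0;
            for (double v : g) {
                sum += v;
            }
            double mean = sum / g.length;
            ssBetween += g.length * (mean - grandMean) * (mean - grandMean);
            for (double v : g) {
                ssWithin += (v - mean) * (v - mean);
            }
        }
        // Mean square between groups divided by mean square within groups.
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        // Made-up values standing in for one property in avian, human, and swine groups.
        double[][] groups = {
            {1.0, 2.0, 3.0},
            {2.0, 3.0, 4.0},
            {5.0, 6.0, 7.0}
        };
        System.out.println(fStatistic(groups));
    }
}
```

A large F value relative to the F distribution with (k - 1, n - k) degrees of freedom indicates that at least one host group differs; a post-hoc test is then needed to tell which pairs of hosts differ.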
Conclusion: The host tropism determinants and position markers described in this study can be
used in related research to classify, identify, and predict the hosts that are currently susceptible to
influenza viruses or likely to be infected in the future.