Chen, W., Pei, T., Zhang, Z. et al. Predicting host tropism in influenza A viruses: insights from multi-segment nucleotide signatures. J Transl Med (2025)
Background: Influenza A virus (IAV) poses a significant public health threat because of its cross-species transmission and complex host adaptation mechanisms. This study integrated whole-genome data from avian, human, swine, and bovine IAV strains. It used machine learning to predict viral host tropism from nucleotide site features, to identify the key sites driving host adaptation, and to characterize the synergistic effects among those sites.
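The abstract does not say how nucleotide sites were converted into model features. One common choice, assumed here purely for illustration, is to align each segment and one-hot encode every alignment position, giving one binary feature per (site, base) pair. The helper name `one_hot_sites` and the treatment of gaps and ambiguous bases are hypothetical, not the authors' method.

```python
BASES = "ACGT"


def one_hot_sites(aligned_seqs):
    """Encode equal-length aligned sequences as one binary feature per (site, base).

    Hypothetical helper. Gaps and ambiguous bases (anything other than A/C/G/T)
    become all-zero columns at that site. Sites are numbered from 1, e.g. "site46_G".
    """
    if not aligned_seqs:
        return [], []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Feature names: four columns per alignment position.
    names = [f"site{pos + 1}_{b}" for pos in range(length) for b in BASES]
    rows = []
    for seq in aligned_seqs:
        row = []
        for ch in seq.upper():
            # Exactly one 1 for A/C/G/T; all zeros for "-", "N", etc.
            row.extend(1 if ch == b else 0 for b in BASES)
        rows.append(row)
    return names, rows


names, X = one_hot_sites(["ACG-", "AGGT"])
print(names[:4])  # ['site1_A', 'site1_C', 'site1_G', 'site1_T']
print(X[0])       # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Under this assumed encoding a single site contributes four columns, so per-column importances would need to be aggregated per site before sites can be ranked.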
Methods: A total of 64,000 IAV sequences from avian, human, swine, and bovine hosts were analyzed to build host-prediction models. A four-class classification framework (avian, human, swine, bovine) was constructed using nucleotide site features from all eight genomic segments (PB2, PB1, PA, HA, NP, NA, MP, NS). Eight machine learning algorithms (logistic regression, decision tree, random forest, SVM, KNN, gradient boosting, XGBoost, LightGBM) were benchmarked using stratified 10-fold cross-validation. Model performance was evaluated using accuracy, precision, recall, F1-score, AUPRC, and AUC. SHAP (SHapley Additive exPlanations) analysis prioritized critical nucleotide sites, while bivariate association tests identified synergistic or antagonistic interactions between sites. Nucleotide composition profiles were compared across host groups using hierarchical clustering and heatmap visualization.
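Stratified cross-validation keeps the host-class proportions of the full dataset in every fold, which matters when some host classes are much smaller than others. The following standard-library sketch shows the idea. It is similar in spirit to scikit-learn's `StratifiedKFold` but is not a reproduction of it or of the authors' setup; `stratified_kfold` is a hypothetical helper and the toy labels are invented.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions.

    Hypothetical helper. Indices of each class are shuffled, then dealt
    round-robin into k folds, so every fold receives about 1/k of every class.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):  # sorted class order for reproducibility
        idx = by_class[y]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)  # next class continues dealing where this one stopped

    for f in range(k):
        test = sorted(folds[f])
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, test


# Toy, unbalanced label set (not the study's data).
labels = ["avian"] * 50 + ["human"] * 30 + ["swine"] * 15 + ["bovine"] * 5
for train, test in stratified_kfold(labels, k=5):
    # every test fold holds 10 avian, 6 human, 3 swine, 1 bovine
    print(len(train), len(test), Counter(labels[i] for i in test))
```

With k=10, a class containing fewer than 10 sequences cannot appear in every test fold; this round-robin scheme simply leaves some folds without it.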
Results: The XGBoost algorithm showed the best and most stable performance, achieving an AUC above 0.95 in distinguishing human-derived sequences from non-human ones. SHAP analysis identified the top 20 critical nucleotide sites for each gene segment, such as sites 46 and 698 in the NS segment. Nucleotide composition analysis revealed high similarity between human and swine sequences in the HA and PB2 segments, and between avian and bovine sequences. Human and swine strains were particularly difficult to distinguish using the HA segment. Bivariate site association analysis uncovered significant synergistic or antagonistic effects between key sites within gene segments, forming complex interaction networks. For instance, in the NS segment, a positive contribution to the prediction was observed when sites 371, 698, and 419 were all G.
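The human-versus-non-human AUC reported above treats the four-class problem as one-vs-rest: human sequences are positives and avian, swine, and bovine sequences are negatives. AUC then equals the probability that a randomly chosen human sequence receives a higher predicted human score than a randomly chosen non-human one, which can be computed from ranks via the Mann-Whitney U identity. This is a minimal sketch. The abstract does not say how the authors computed AUC, and `one_vs_rest_auc` and the toy scores are hypothetical.

```python
def one_vs_rest_auc(labels, scores, positive="human"):
    """One-vs-rest ROC AUC via the rank-sum (Mann-Whitney U) identity.

    Hypothetical helper.
    labels: host label per sequence, e.g. "human", "avian", "swine", "bovine".
    scores: predicted probability of `positive` for each sequence.
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Assign 1-based ranks in ascending score order, averaging over tied blocks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2; AUC = U / (n_pos * n_neg).
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == positive)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


labels = ["human", "swine", "human", "avian", "bovine"]
scores = [0.9, 0.4, 0.7, 0.2, 0.7]
print(round(one_vs_rest_auc(labels, scores), 4))  # 0.9167
```

In the toy example one human sequence ties a bovine sequence at 0.7, so that pair counts as half a win: 5.5 of the 6 human/non-human pairs are ordered correctly. Sorting keeps the cost at O(n log n), which matters at the scale of tens of thousands of sequences.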
Conclusions: This study advances the mechanistic understanding of IAV host adaptation, identifies molecular determinants for zoonotic risk stratification, and establishes a scalable machine learning framework for predicting viral host tropism from nucleotide signatures. These results can strengthen surveillance strategies and inform preventive measures against emerging viral threats.