Tran H, Berke O, Ricker N, Poljak Z. Evaluating machine learning approaches for host prediction using H3 influenza genomic data. PLoS One. 2025 Nov 5;20(11):e0336142
Background: H3 influenza A viruses (IAV) have been shown to frequently cross the species barrier, which can be an important factor in sustained transmission and spread. Machine learning methods have been widely explored for host prediction of IAV using genomic data; however, these studies often use data from only one of the eight IAV segments, or use all available IAV data to predict only broad categories of hosts.
Objective: The objective of this study was to train machine learning models on H3 IAV sequence data from all eight segments for distinct host prediction, and to validate model performance.
Methods: For each of the eight IAV genome segments, models were trained on features derived from both k-mers and amino acid properties, using machine learning algorithms that included random forest and XGBoost. Models were then validated on a test dataset by analysing the predicted class probabilities, and were subsequently used to investigate between-species transmission patterns in case studies including canine H3N8, swine H3N2 2010.2, and duck H3 sequences.
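As an illustration of the k-mer features mentioned in the Methods, the following is a minimal sketch of how a sequence can be turned into a fixed-length count vector that a classifier such as random forest or XGBoost could consume. The function name `kmer_counts`, the nucleotide alphabet, and the choice of `k` are assumptions made for this example; the abstract does not state the k-mer length or the exact encoding the authors used.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k=3, alphabet="ACGT"):
    """Count overlapping k-mers and return them as a fixed-order vector.

    Hypothetical helper: k and the alphabet are illustrative choices,
    not values taken from the study.
    """
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one position at a time.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate every possible k-mer so all sequences share one feature order.
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing symbols outside the alphabet (e.g. N) are ignored.
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Toy example on a made-up fragment (not real IAV data).
features = kmer_counts("ATGAAGACTATC", k=2)
```

Each segment would yield its own feature matrix under this scheme, with one row per sequence and one column per possible k-mer.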
Results: Models demonstrated strong performance in host prediction across all eight segments on the test dataset, with overall accuracies ranging from 0.995 to 0.997 and κ (kappa) values ranging from 0.984 to 0.990. Misclassified test dataset sequences with high predicted probabilities (> 90%) were checked against the available literature and were frequently found to be associated with between-species transmission events. Between-species transmission patterns in the predicted class probabilities for the case studies were also consistent with the literature, in cases of both correct and incorrect classification.
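The two summary metrics reported in the Results, overall accuracy and Cohen's κ, and the filtering of misclassified sequences with predicted probability above 90%, can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the host labels in the toy data are hypothetical, and the study's own evaluation code may differ.

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (overall accuracy, Cohen's kappa) for two label lists."""
    n = len(true_labels)
    # Observed agreement: the fraction of predictions matching the true host.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Chance agreement: from the marginal class frequencies of both lists.
    true_freq = Counter(true_labels)
    pred_freq = Counter(predicted_labels)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1:
        # Kappa is undefined when both lists contain a single shared class.
        return observed, float("nan")
    return observed, (observed - expected) / (1 - expected)


def confident_misclassifications(true_labels, predicted_labels,
                                 probabilities, threshold=0.90):
    """Indices of wrong predictions whose top probability exceeds threshold."""
    rows = zip(true_labels, predicted_labels, probabilities)
    return [i for i, (t, p, prob) in enumerate(rows)
            if t != p and prob > threshold]


# Toy data with hypothetical host labels (not results from the study).
true_hosts = ["avian", "avian", "swine", "swine"]
pred_hosts = ["avian", "avian", "swine", "avian"]
top_probs = [0.99, 0.80, 0.95, 0.97]

acc, kappa = accuracy_and_kappa(true_hosts, pred_hosts)
flagged = confident_misclassifications(true_hosts, pred_hosts, top_probs)
```

Because κ corrects for agreement expected by chance, it is more informative than accuracy alone when host classes are imbalanced. Sequences returned by `confident_misclassifications` are the kind of candidates that would then be checked against the literature for between-species transmission.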
Conclusions: These models allow for rapid and accurate host prediction of H3 IAV datasets from any of the eight IAV segments and provide a solid framework for identifying variants with higher-than-typical between-species transmission potential. However, results obtained on selected case studies suggest that further improvements to the training and validation processes should be considered.


