Heterogeneity of genetic sequence within quasi-species of influenza virus revealed by single-molecule sequencing

Influenza viruses exhibit high mutation rates and extensive genetic diversity, which hinder effective vaccine development and facilitate immune evasion (Taubenberger and Morens, 2006; Barr et al., 2010). These mutations arise from the error-prone viral RNA-dependent RNA polymerase, generating highly heterogeneous viral populations within individual hosts that conform to the quasi-species model of a cloud of related genomes evolving under selection (Domingo et al., 2012). Accurate characterization of this intra-host diversity is crucial for understanding viral evolution and improving vaccine design, yet conventional RNA sequencing often fails to detect low-frequency variants because of technical errors during sample preparation and sequencing. Here, we implement a single unique molecular identifier strategy that reduces sequencing artifacts and achieves an error rate of ~10??, enabling single-particle-level quantification of quasi-species diversity. Mutation frequencies greatly exceeding background error confirm their biological origin, while information-theoretic metrics such as Shannon entropy and Jensen-Shannon divergence reveal non-random mutation distributions under selective constraints. This framework supports detailed studies of intra-host viral evolution and may inform artificial intelligence-driven prediction of mutational trajectories and more effective influenza vaccine strategies.