A stacked ensemble model with NNLS-based weighting for influenza forecasting: a case study of Anhui Province, China

Background: Influenza poses a significant global public health threat, and its pandemic potential and seasonal variability make accurate forecasting difficult. This study used high-quality weekly data (incidence rates, viral subtypes, and meteorological indicators) from the provincial influenza surveillance system in Anhui Province, eastern China, spanning 2015-2025. A multi-source data fusion model was developed to overcome the limitations of traditional methods in modeling nonlinear transmission dynamics and multi-factor synergistic effects.

Methods: Individual base models were constructed using ARIMA, Prophet, and XGBoost, and their predictions were then combined into an interpretable stacked ensemble (Stacked-NNLS) with weights estimated by non-negative least squares (NNLS). Performance was evaluated using R² (coefficient of determination), RMSE (root mean square error), MAE (mean absolute error), and MAPE (mean absolute percentage error).
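The NNLS stacking step and the four metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic weekly series, the noisy stand-ins for the ARIMA, Prophet, and XGBoost predictions, the 150/50 split, and the helper names `fit_nnls_weights`, `stack_predict`, and `evaluate` are all assumptions made for illustration. Only `scipy.optimize.nnls` is a real library routine.

```python
import numpy as np
from scipy.optimize import nnls


def fit_nnls_weights(base_preds, y):
    """Solve min ||base_preds @ w - y||_2 subject to w >= 0.

    base_preds has shape (n_samples, n_models), one column per base model.
    """
    weights, _residual_norm = nnls(base_preds, y)
    return weights


def stack_predict(base_preds, weights):
    """Ensemble prediction: non-negative weighted sum of base predictions."""
    return base_preds @ weights


def evaluate(y_true, y_pred):
    """R2, RMSE, MAE, and MAPE (in percent); assumes y_true has no zeros."""
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
    }


# Synthetic weekly series with a 52-week cycle (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(200)
y = 10 + 5 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.5, t.size)

# Hypothetical base-model predictions: the target plus noise of varying size.
base_preds = np.column_stack([
    y + rng.normal(0, 3.0, t.size),  # stand-in for ARIMA
    y + rng.normal(0, 1.5, t.size),  # stand-in for Prophet
    y + rng.normal(0, 1.0, t.size),  # stand-in for XGBoost
])

train, test = slice(0, 150), slice(150, 200)
weights = fit_nnls_weights(base_preds[train], y[train])
stacked = stack_predict(base_preds[test], weights)
metrics = evaluate(y[test], stacked)

print("weights:", weights)
print("test metrics:", metrics)
```

Because the weights are constrained to be non-negative, each one can be read as the contribution of its base model, which is one way to interpret the ensemble. In practice the weights are often fitted on out-of-sample (for example, rolling-origin) base predictions to avoid rewarding an overfit base model; whether the study did so is not stated here.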

Results: ARIMA fit the non-stationary series poorly (training set R² = -3.66; test set R² = 0.03). Prophet captured long-term trends effectively (training/test set R² = 0.38/0.88). XGBoost overfit the training data (training/test set R² = 0.99/0.74). The Stacked-NNLS model was the most robust, performing consistently on both sets (training/test R² = 0.94/0.94) and outperforming the baseline models across all metrics.
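A negative R² such as the one reported for ARIMA on the training set is possible under the usual definition of the coefficient of determination, which is assumed here because the formula used in the study is not stated. With y_t the observed value, ŷ_t the prediction, and ȳ the mean of the observations:

```latex
R^2 = 1 - \frac{\sum_t (y_t - \hat{y}_t)^2}{\sum_t (y_t - \bar{y})^2},
\qquad
R^2 < 0 \iff \sum_t (y_t - \hat{y}_t)^2 > \sum_t (y_t - \bar{y})^2 .
```

Under this definition, R² = -3.66 would mean the residual sum of squares is 1 - (-3.66) = 4.66 times the total sum of squares, that is, the model's squared error is larger than that of simply predicting the mean.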

Conclusion: By integrating statistical, seasonal, and nonlinear modeling approaches, Stacked-NNLS demonstrated robust predictive performance in capturing influenza trends, seasonal fluctuations, and complex interactions among multiple factors, suggesting its potential utility for infectious disease early warning and public health decision-making.