Risk prediction system for dengue transmission based on high resolution weather data
Background
Dengue is the fastest spreading vector-borne viral disease, resulting in an estimated 390 million infections annually. Precise prediction of many attributes related to dengue remains a challenge because of the complex dynamics of the disease. Important attributes to predict include the risk of, and risk factors for, infection; infection severity; and the timing and magnitude of outbreaks. In this work, we build a model for predicting the risk of dengue transmission using high-resolution weather data. The level of dengue transmission risk depends on vector density, so we predict risk by predicting the vector.

Methods and findings
We use surveillance data on Aedes aegypti larvae collected by the Taiwan Centers for Disease Control as part of the national routine entomological surveillance of dengue, together with weather data simulated using IBM's Containerized Forecasting Workflow, a forecasting system with high spatial and temporal resolution. We propose a two-stage risk prediction system for assessing dengue transmission via Aedes aegypti mosquitoes. In stage one, we fit a logistic regression, with weather attributes as the explanatory variables, to determine whether larvae are present or absent at the locations of interest. The results are then aggregated to an administrative division, with presence in the division determined by a threshold percentage of larvae-positive locations obtained from a bootstrap approach. In stage two, larvae counts are estimated for the divisions predicted to be larvae-positive in stage one, using a zero-inflated negative binomial model. This model identifies the larvae-positive locations with 71% accuracy and predicts larvae numbers with a coverage probability of 98% for nominal 95% prediction intervals.
Compared with a single-stage approach based on zero-inflated binomial regression, the two-stage model improves the overall accuracy of identifying larvae-positive locations by 29% and reduces the mean squared error of predicted larvae numbers by 9.6%.

Conclusions
We demonstrate that a risk prediction system using high-resolution weather data can provide valuable insight into the distribution of risk over a geographical region. The work also shows that a two-stage approach is beneficial in predicting risk in non-homogeneous regions, where the risk is localised.
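
To make the two-stage structure concrete, the following is a minimal Python sketch of the prediction side only, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the weather variable names, the coefficients, the 0.5 location cutoff and the 0.3 division threshold (in the study the division threshold comes from a bootstrap approach and the coefficients from fitted models). The zero-inflated negative binomial is written in the common mean and dispersion form, where the variance of the count component is mu + alpha * mu^2.

```python
import math
from collections import defaultdict


def presence_probability(weather, coef, intercept):
    """Stage one: logistic model for larvae presence at one location.

    `weather` and `coef` are dicts keyed by (hypothetical) variable names.
    """
    z = intercept + sum(coef[name] * weather[name] for name in coef)
    return 1.0 / (1.0 + math.exp(-z))


def positive_divisions(location_probs, cutoff=0.5, division_threshold=0.3):
    """Aggregate location-level predictions to administrative divisions.

    `location_probs` is an iterable of (division, probability) pairs.
    A division is flagged when its share of predicted-positive locations
    reaches `division_threshold` (a placeholder value, assumed here).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for division, prob in location_probs:
        totals[division] += 1
        if prob >= cutoff:
            positives[division] += 1
    return {d for d in totals if positives[d] / totals[d] >= division_threshold}


def zinb_pmf(k, pi, mu, alpha):
    """Stage two: zero-inflated negative binomial probability of count k.

    pi    : probability of a structural zero
    mu    : mean of the negative binomial component
    alpha : dispersion (variance of the component = mu + alpha * mu**2)
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    log_nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(p) + k * math.log(1.0 - p))
    nb = math.exp(log_nb)
    return (pi if k == 0 else 0.0) + (1.0 - pi) * nb


def zinb_mean(pi, mu):
    """Expected larvae count under the zero-inflated model."""
    return (1.0 - pi) * mu


if __name__ == "__main__":
    # Hypothetical coefficients and weather values, for illustration only.
    coef = {"temperature_c": 0.15, "rainfall_mm": 0.02}
    prob = presence_probability(
        {"temperature_c": 28.0, "rainfall_mm": 12.0}, coef, intercept=-4.0
    )
    flagged = positive_divisions([("A", prob), ("A", 0.2), ("B", 0.1)])
    print(f"presence probability: {prob:.3f}, flagged divisions: {sorted(flagged)}")
    print(f"expected count in a flagged division: {zinb_mean(0.3, 3.0):.2f}")
```

Stage two is applied only to divisions returned by `positive_divisions`, which is what separates this design from a single-stage count model fitted to all divisions at once.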