Wastewater-Based Prediction of COVID-19 Cases Using a Random Forest Algorithm with Strain Prevalence Data: a Case Study of Five Municipalities in Latvia
Science of the Total Environment 2023
Brigita Dejus, Pavels Cacivkins, Dita Gudra, Sandis Dejus, Maija Ustinova, Ance Roga, Mārtiņš Strods, Juris Kibilds, Guntis Boikmanis, Karina Ortlova, Laura Krivko, Liga Birzniece, Edmunds Skinderskis, Aivars Berzins, Davids Fridmanis, Tālis Juhna

Wastewater-based epidemiology (WBE) is a rapid and cost-effective method that can detect SARS-CoV-2 genomic components in wastewater and can provide an early warning of possible COVID-19 outbreaks one to two weeks in advance. However, the quantitative relationship between the intensity of the epidemic and the possible progression of the pandemic remains unclear, and further research is needed. This study investigates the use of WBE to rapidly monitor SARS-CoV-2 in wastewater from five municipal wastewater treatment plants in Latvia and to forecast cumulative COVID-19 cases two weeks in advance. For this purpose, real-time quantitative PCR was used to monitor the SARS-CoV-2 nucleocapsid 1 (N1), nucleocapsid 2 (N2), and envelope (E) gene targets in municipal wastewater. The RNA signals in the wastewater were compared with the reported COVID-19 cases, and strain prevalence data for SARS-CoV-2 were obtained by targeted next-generation sequencing of the receptor binding domain (RBD) and furin cleavage site (FCS) regions. A linear model and a random forest model were designed and applied to ascertain the correlation between cumulative cases, strain prevalence data, and wastewater RNA concentration in order to predict COVID-19 outbreaks and their scale. Additionally, the factors that affect prediction accuracy were investigated and compared between the linear and random forest models. Cross-validated model metrics showed that the random forest model is more effective in predicting cumulative COVID-19 cases two weeks in advance when strain prevalence data are included. These results help inform WBE and public health recommendations by providing insights into the impact of environmental exposures on health outcomes.
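The model comparison summarized above (a linear model versus a random forest, each scored by cross-validation with and without strain prevalence as an input) can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on several assumptions that the abstract does not state: it uses scikit-learn and NumPy; all data are synthetic for one hypothetical treatment plant; the target is taken to be the number of cases reported over the 14 days after each sampling day; wastewater RNA is assumed to lead reported cases by 7 days; a single hypothetical variant's share stands in for the sequencing-derived strain prevalence; and forward-chaining `TimeSeriesSplit` is used as the cross-validation scheme. The helper names (`expected_cases`, `make_synthetic_data`, `build_features`, `compare_models`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

HORIZON = 14  # forecast lead time in days (two weeks, as in the abstract)
RNA_LEAD = 7  # ASSUMPTION: wastewater RNA leads reported cases by 7 days
FEATURE_NAMES = ["log10 N1", "log10 N2", "log10 E", "strain prevalence"]


def expected_cases(t):
    """Hypothetical two-wave epidemic curve (expected daily reported cases)."""
    return 50 * np.exp(-(((t - 60) / 20) ** 2)) + 80 * np.exp(-(((t - 150) / 15) ** 2))


def make_synthetic_data(n_days=200, seed=0):
    """Synthetic stand-in for one treatment plant; NOT data from the study."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    daily_cases = rng.poisson(expected_cases(t))  # reported cases per day
    # Hypothetical share of a newer variant, rising logistically around day 120.
    prevalence = 1 / (1 + np.exp(-(t - 120) / 8))
    # ASSUMPTION: the newer variant sheds twice as much RNA per infection.
    signal = expected_cases(t + RNA_LEAD) * (1 + prevalence)
    n1 = signal * rng.lognormal(0.0, 0.3, n_days)
    n2 = signal * rng.lognormal(0.0, 0.3, n_days)
    e_gene = signal * rng.lognormal(0.0, 0.4, n_days)
    return n1, n2, e_gene, prevalence, daily_cases


def build_features(n1, n2, e_gene, prevalence, daily_cases, horizon=HORIZON):
    """Pair each sampling day t with the cases reported on days t+1..t+horizon."""
    cumulative = np.cumsum(daily_cases)
    n = len(daily_cases) - horizon  # the last `horizon` days have no complete target
    X = np.column_stack([
        np.log10(n1[:n] + 1),
        np.log10(n2[:n] + 1),
        np.log10(e_gene[:n] + 1),
        prevalence[:n],
    ])
    y = cumulative[horizon:horizon + n] - cumulative[:n]
    return X, y


def compare_models(X, y, n_splits=5, seed=0):
    """Mean cross-validated MAE for each model, with and without strain prevalence."""
    cv = TimeSeriesSplit(n_splits=n_splits)  # train on the past, test on the future
    feature_sets = {"RNA only": [0, 1, 2], "RNA + prevalence": [0, 1, 2, 3]}
    models = {
        "linear": LinearRegression(),
        "random forest": RandomForestRegressor(n_estimators=200, random_state=seed),
    }
    results = {}
    for model_name, model in models.items():
        for set_name, cols in feature_sets.items():
            # cross_val_score clones the estimator, so each fold starts untrained.
            scores = cross_val_score(model, X[:, cols], y, cv=cv,
                                     scoring="neg_mean_absolute_error")
            results[(model_name, set_name)] = -scores.mean()
    return results


if __name__ == "__main__":
    n1, n2, e_gene, prevalence, daily_cases = make_synthetic_data()
    X, y = build_features(n1, n2, e_gene, prevalence, daily_cases)
    for (model_name, set_name), mae in compare_models(X, y).items():
        print(f"{model_name:13s} | {set_name:16s} | CV MAE = {mae:.1f} cases")
    # Impurity-based importances: one possible way to rank the input parameters.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    for name, importance in zip(FEATURE_NAMES, rf.feature_importances_):
        print(f"{name:18s} {importance:.3f}")
```

Because the data are synthetic, the printed errors say nothing about the study's findings; the sketch only shows the structure of the comparison. Impurity-based `feature_importances_` is one possible way to obtain the parameter importance named in the keywords, and the paper's exact method may differ. Framing the target as a bounded 14-day count rather than an ever-growing running total matters for tree ensembles, which cannot predict values outside the range seen in training. Forward-chaining splits also avoid training on days that fall after the test window, which a shuffled k-fold split would allow.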


Keywords
Parameter importance; Random forest model; SARS-CoV-2; Wastewater-based epidemiology
DOI
10.1016/j.scitotenv.2023.164519
Hyperlink
https://www.sciencedirect.com/science/article/pii/S0048969723031406?via%3Dihub

Dejus, B., Cacivkins, P., Gudra, D., Dejus, S., Ustinova, M., Roga, A., Strods, M., Kibilds, J., Boikmanis, G., Ortlova, K., Krivko, L., Birzniece, L., Skinderskis, E., Berzins, A., Fridmanis, D., Juhna, T. Wastewater-Based Prediction of COVID-19 Cases Using a Random Forest Algorithm with Strain Prevalence Data: a Case Study of Five Municipalities in Latvia. Science of the Total Environment, 2023, Vol. 891, Article number 164519. ISSN 0048-9697. Available: doi:10.1016/j.scitotenv.2023.164519

Publication language
English (en)