Predicting COVID-19 cases using Reddit posts and other online resources


This paper evaluates how well COVID-19 caseloads in local areas can be predicted from the text of geographically specific subreddits, in conjunction with other features. The problem is framed as a binary classification task: whether or not the change in caseload exceeds a threshold. We find that including Reddit features alongside other informative resources improves the models’ performance in predicting COVID-19 cases. We also show that Reddit features used on their own can act as a strong alternative data source for predicting a short-term rise in caseload, because they perform well, are readily available, and update instantaneously.
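
The binary framing described above can be illustrated with a minimal sketch of how such labels might be built from a case-count series. This is not the paper's implementation: the function name `label_caseload_change`, the use of relative change, the 7-step horizon, and the 10% threshold are all assumptions chosen for illustration.

```python
from typing import List, Optional


def label_caseload_change(
    cases: List[float],
    horizon: int = 7,        # assumed look-ahead, in time steps
    threshold: float = 0.1,  # assumed relative-change threshold (10%)
) -> List[Optional[int]]:
    """Label each time step 1 if the caseload `horizon` steps ahead
    exceeds the current caseload by more than `threshold` (relative
    change), else 0. Steps with no future value or a non-positive
    baseline are labelled None."""
    labels: List[Optional[int]] = []
    for t, current in enumerate(cases):
        future_idx = t + horizon
        if future_idx >= len(cases) or current <= 0:
            # No target available, or relative change is undefined.
            labels.append(None)
            continue
        relative_change = (cases[future_idx] - current) / current
        labels.append(1 if relative_change > threshold else 0)
    return labels


if __name__ == "__main__":
    # Toy series, not real data.
    print(label_caseload_change([100, 100, 120, 105], horizon=2))
```

Labels built this way could then serve as targets for any standard classifier trained on Reddit-derived and other features; the choice of threshold and horizon determines how balanced the two classes are.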

In SwissText 2021
Felix Drinkall
PhD Candidate and British Rowing Athlete

My main research interest is the intersection between Natural Language Processing and Time Series Forecasting.