Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts

Start: 2022-07-10T00:00:00Z
End: 2022-07-15T00:00:00Z
Location: Hyatt Regency Seattle

Abstract

We present a novel approach that incorporates transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states' COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
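
As a rough illustration of the feature construction described in the abstract, the sketch below turns post embeddings into weekly per-cluster shares and derives binary upward-trend labels from a caseload series. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D vectors stand in for real sentence embeddings, the cluster centroids are assumed to come from an earlier density-based clustering step that is not shown, and the function names and the 10% relative threshold are hypothetical.

```python
from collections import Counter, defaultdict
from math import dist


def nearest_cluster(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))


def cluster_share_features(posts, centroids):
    """Map each week to the fraction of its posts assigned to each cluster.

    posts: iterable of (week, embedding) pairs.
    """
    counts = defaultdict(Counter)
    for week, vec in posts:
        counts[week][nearest_cluster(vec, centroids)] += 1
    features = {}
    for week, c in counts.items():
        total = sum(c.values())
        features[week] = [c[i] / total for i in range(len(centroids))]
    return features


def upward_trend_labels(cases, threshold=0.1):
    """Label week t as 1 if cases at t+1 exceed cases at t by more than
    the relative threshold (hypothetical labelling rule), else 0."""
    return [
        int(cases[t + 1] > cases[t] * (1 + threshold))
        for t in range(len(cases) - 1)
    ]


# Toy data: two assumed cluster centroids and four posts over two weeks.
centroids = [(0.0, 0.0), (1.0, 1.0)]
posts = [
    (0, (0.1, 0.0)),
    (0, (0.9, 1.1)),
    (1, (1.0, 0.9)),
    (1, (0.8, 1.0)),
]
features = cluster_share_features(posts, centroids)
labels = upward_trend_labels([100, 130, 125])
```

In a setup like this, each week's share vector would be the input to a classifier and the matching label its target. The embedding model, the clustering method, and the threshold definition used in the paper are not given on this page.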

Date

Jul 10, 2022 12:00 AM — Jul 15, 2022 12:00 AM

Event

NAACL 2022

Location

Hyatt Regency Seattle

808 Howell Street, Seattle, Washington 98101

Felix Drinkall

Oxford PhD Student and ex-GB Athlete