Social media activity in a given region, which reflects both official figures and unofficial local knowledge, may help predict an impending novel coronavirus disease (COVID-19) outbreak, according to a recent study.
“Our study shows statistically significant evidence that COVID-19 related mean tweet intensity (TI) per region, at the first endogenous attention spike, is able to significantly forecast the spreading of COVID-19 severity, as measured by number of deaths, 1 month forward,” the researchers said.
To quantify the predictive power of the wisdom of the crowds, the researchers looked at the initial TI in three critically affected countries (Italy, Spain, and the US) and assessed its correlation with mortality a month later.
In Italy and Spain, TI peaks occurred on 21–24 and 24–26 February 2020, respectively; the US, in comparison, saw a later peak on 3–4 March 2020. Notably, in Italy, the first official regional data for the pandemic were released on 24 February, suggesting that the online crowds reacted to both national and global news, in the absence of regional figures, and accurately anticipated the index case in their locality. [Sci Rep 2021;11:13678]
In terms of outbreak severity, the researchers looked at the link between TI and mortality a month forward. They found not only a clear correlation but also that the reaction on social media could rank COVID-19’s impact on specific regions ahead of official data. For instance, in Italy, initial TI was highest in Lombardy, which would see one of the country’s largest outbreaks a month later.
In Spain, the social media clamour was strongest in Madrid, predicting a large outbreak in the city one month later. In the US, New York saw the most severe epidemic in terms of deaths, which was preceded by a spike in TI one month earlier.
For this analysis, the researchers chose to use mortality rather than case counts, as the latter were too dependent on each country’s testing strategy and could introduce more confounders into the analysis.
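As a rough illustration of how such a regional comparison could be set up, the sketch below computes a Spearman rank correlation between initial tweet intensity and deaths one month later. This is not the study's own method, which is not detailed here; the choice of Spearman correlation is an assumption, and the region names and all numbers are hypothetical placeholders rather than data from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder values, NOT figures from the study.
tweet_intensity = {"region_a": 0.9, "region_b": 0.4, "region_c": 0.6, "region_d": 0.1}
deaths_one_month_later = {"region_a": 120, "region_b": 45, "region_c": 30, "region_d": 5}

regions = sorted(tweet_intensity)
rho = spearman(
    [tweet_intensity[r] for r in regions],
    [deaths_one_month_later[r] for r in regions],
)
print(f"Spearman rank correlation: {rho:.2f}")
```

A rank-based measure is used here because the article describes social media reaction as ranking regions by later impact; a real analysis would also need significance testing and many more regions than this toy input.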
“The crowd’s reaction to COVID-19 spreading measured through tweet intensity on a regional basis is a complex quantity rich of information. People react to both official information and to local knowledge gathered at personal level,” the researchers explained. “Tweets are a process involving both sharing and comparing such information which includes a level of collective processing and assessing of the reliability of the source.”
For Italy and Spain, in particular, tweets provided a source of information largely free from biases, as these countries experienced their worst outbreaks very early on in the pandemic, and their online discussions were less influenced by the general global status of COVID-19.
“At the beginning of an epidemic when it is extremely difficult to have precise information … modelers and public officials are obliged to use general statistical quantities to produce forecasts and consequently implement decisions,” the researchers said.
“Our work shows that the information locally available to the population permeates through Twitter and social media and can be made available to modelers and policymakers at the early stages of a crisis when it is most needed,” they added.