COVID-19 and the Weather: Data from Italy

Slater Stich
Published in Insight
Apr 6, 2020

Do temperature and humidity affect the spread of coronavirus? We looked for correlations between weather and SARS-CoV-2 spread across the 21 regions of Italy, since these different regions have had different weather over the last few weeks. We didn’t find any strong relationships in the data we have so far. Below, we explain where we looked, and what we found.

Regional Regression Analysis

Italy is divided into 21 administrative regions, and these regions have had some weather variation over the last few weeks. For example, from March 1st to March 23rd, the average hourly temperature in Aosta Valley was only 3.4°C, but the average temperature in Sicily was 13.6°C.

We ran a regression of growth in cumulative case counts against average temperature and relative humidity for the period from March 1st to March 23rd. We attempted to control for population density and GDP per capita, though it's worth noting that these numbers can vary substantially within each region. We didn't find any significant coefficients.

Different regions of Italy had outbreaks that started at different times, so another way to conduct this analysis is to index each region to the date it crossed an initial threshold, such as 100 confirmed cases. We ran another regression, looking at the weather from 7 days before to 7 days after each region first reported a cumulative case count above 100. Again, we didn't find any significant coefficients.
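Indexing off a threshold takes two small steps: find the first day a region's cumulative count crosses the threshold, then average the weather over a window around that day. Here's a minimal sketch, assuming daily series stored as dicts keyed by date, "above 100" meaning strictly greater than 100, and a window that includes the crossing day itself (the post doesn't specify either detail). The function names are hypothetical.

```python
from datetime import date, timedelta

def first_day_above(cumulative_cases, threshold):
    """Return the first date whose cumulative case count exceeds `threshold`, or None."""
    for day in sorted(cumulative_cases):
        if cumulative_cases[day] > threshold:
            return day
    return None

def window_mean(daily_values, center, days_before, days_after):
    """Average a daily series over [center - days_before, center + days_after].

    Days missing from `daily_values` are skipped; returns None if none are present.
    """
    values = [
        daily_values[center + timedelta(days=offset)]
        for offset in range(-days_before, days_after + 1)
        if center + timedelta(days=offset) in daily_values
    ]
    return sum(values) / len(values) if values else None
```

For the regression described above, the call would be `window_mean(temps, first_day_above(cases, 100), 7, 7)` per region; the same pair of functions covers the later "two weeks before 250+ cases" view with `days_before=14, days_after=0` and a threshold of 249.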

A problem with both these regressions is that we don't have much data: there's only one row per region, so 21 rows in total. And of course, case count data is inherently messy and incomplete. For interested readers, we're sharing the summary data for our regression here.
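To make the "21 rows" problem concrete, here's a minimal ordinary least squares sketch with one row per region. The post doesn't say which estimator or software was used, so plain OLS with an intercept is our assumption, and the function names are hypothetical; it's written in pure Python only so it runs without dependencies (a statistics package would do the same job).

```python
import math

def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(rows, y):
    """OLS with an intercept.

    rows: one list of predictors per region, e.g. [temp, humidity, density, gdp]
    y:    one response value per region, e.g. case growth
    Returns (coefficients, standard_errors, t_statistics, residual_df).
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    residuals = [y[k] - sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    df = n - p                                   # e.g. 21 regions - 5 parameters = 16
    sigma2 = sum(e * e for e in residuals) / df  # residual variance estimate
    se = [math.sqrt(max(sigma2 * xtx_inv[i][i], 0.0)) for i in range(p)]
    # A perfect fit has zero standard error; report t as infinite in that case.
    t = [b / s if s > 0 else math.inf for b, s in zip(beta, se)]
    return beta, se, t, df
```

With four predictors plus an intercept, 21 regions leave only 16 residual degrees of freedom, so a coefficient needs |t| above roughly 2.12 to be significant at the 5% level (two-sided). With so few rows, standard errors are wide and only large effects would clear that bar.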

Regional Scatterplots

We also drew some scatter plots of case counts against temperature and humidity. This is effectively the same data that underlies the first regression above. The box in the middle represents the temperature range between 5°C and 11°C and the relative humidity range between 47% and 79%. We looked at this box in particular because one early preprint hypothesized that SARS-CoV-2 might only be able to achieve significant community spread within these ranges.
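Checking whether a region lands in the box is just two range checks. A minimal sketch, assuming each region is summarized by its mean temperature and mean relative humidity over the period, and that the bounds are inclusive (the post doesn't say). Function names are hypothetical, and the region labels in the usage note are made up.

```python
TEMP_RANGE_C = (5.0, 11.0)          # temperature bounds of the box, in °C
HUMIDITY_RANGE_PCT = (47.0, 79.0)   # relative-humidity bounds of the box, in %

def in_box(temp_c, rel_humidity_pct):
    """True if a (temperature, relative humidity) pair lies inside the box, bounds inclusive."""
    lo_t, hi_t = TEMP_RANGE_C
    lo_h, hi_h = HUMIDITY_RANGE_PCT
    return lo_t <= temp_c <= hi_t and lo_h <= rel_humidity_pct <= hi_h

def regions_in_box(region_weather):
    """Sorted names of regions whose (mean temp, mean RH) pair falls inside the box."""
    return sorted(name for name, (t, h) in region_weather.items() if in_box(t, h))
```

For example, `regions_in_box({"region_a": (8.2, 70.0), "region_b": (14.0, 72.0)})` returns `["region_a"]`, because region_b is too warm.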

While it’s true that the highest case-count regions are inside the box, so is most of the country. And significant outbreaks (1000+ cases) occurred outside the given temperature and humidity ranges.

As with the regressions, we can also index this data to the date each region reached a threshold case count, rather than looking at a fixed time period. Here's what we see if we look at temperature and humidity in the two weeks before each region reached 250+ cases.

Again, we don't see a strong pattern here; about half the regions are inside the box. We can also look specifically at the amount of time each region spent "in band" (i.e., within the temperature and relative humidity ranges above).
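Time "in band" can be counted directly from hourly observations. A minimal sketch, assuming one (temperature °C, relative humidity %) reading per hour for each region, as hourly weather data provides, and inclusive bounds; the function name is hypothetical and the default ranges are the box described above.

```python
def hours_in_band(hourly, temp_range=(5.0, 11.0), humidity_range=(47.0, 79.0)):
    """Count hourly (temp °C, RH %) readings inside both ranges, bounds inclusive.

    Returns (hours_in_band, fraction_of_all_hours); the fraction is 0.0 for empty input.
    """
    inside = sum(
        1 for t, h in hourly
        if temp_range[0] <= t <= temp_range[1] and humidity_range[0] <= h <= humidity_range[1]
    )
    fraction = inside / len(hourly) if hourly else 0.0
    return inside, fraction
```

Counting hours rather than comparing period averages matters here: a region can average inside the box while spending much of its time outside it, and vice versa.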

It's tempting to draw a regression line on this plot, but we should remember that regional scatterplots like this one can be misleading. In this case, regions that are close together are likely to have similar weather and are also likely to have correlated outbreaks (because people are more likely to travel to adjacent regions). Veneto, Emilia-Romagna, and Piedmont all border Lombardy, the hardest-hit region.

Time Series

As a final note, we thought it would be interesting to look at the time series for individual regions. The naive idea is that if temperature and humidity are strong drivers of transmissibility, then whenever temperature and humidity are inhospitable to SARS-CoV-2, we might see lower growth rates in case counts approximately 5 days later. (We say "naive" because it would be hard to see this effect directly in the time series even if it existed: there are a lot of confounders, and incubation periods range from 2 to 14 days.) Here's the "heartbeat chart" for Sicily, for example:
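The naive idea can be turned into a single number by correlating weather on day t with case growth on day t + 5. This is our own illustration, not something the post computed (the charts were inspected visually). It assumes daily growth is defined as c[t] / c[t-1] - 1 and uses Pearson correlation; both choices, and the function names, are hypothetical.

```python
import math

def growth_rates(cumulative):
    """Day-over-day growth c[t]/c[t-1] - 1, aligned by day; None where undefined."""
    rates = [None]                                   # no growth rate for the first day
    for prev, cur in zip(cumulative, cumulative[1:]):
        rates.append(cur / prev - 1.0 if prev > 0 else None)
    return rates

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")

def lagged_correlation(weather, cumulative, lag=5):
    """Correlate weather on day t with case growth on day t + lag.

    Both lists are indexed by day and must start on the same date.
    """
    growth = growth_rates(cumulative)
    pairs = [(weather[t], growth[t + lag])
             for t in range(len(weather))
             if t + lag < len(growth) and growth[t + lag] is not None]
    if len(pairs) < 2:
        return float("nan")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

A single fixed lag is crude given the 2 to 14 day incubation range, and a correlation computed this way inherits every confounder in the case data (testing changes, reporting delays, interventions), so even a strong value would not isolate a weather effect.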

(If you want to see the heartbeat graphs for other regions, you can get them here.) Again, we weren’t able to see meaningful patterns here.

Other Work & Conclusions

Even though we weren't able to see the impact of temperature and humidity in our data set, others have. Several epidemiological teams have researched this, and the tentative conclusion is that temperature and humidity probably do matter, though the effect size may be small. We think the Oxford Centre for Evidence-Based Medicine (CEBM) has a good summary of this emerging research.

Data

We decided to look at Italy alone because different countries test for COVID-19 very differently; by looking at a single country, we hoped to partially control for this. Our weather dataset comes from Dark Sky, collected at the lat/long pair that Google Maps assigns to each region in Italy, and our case counts come from the Italian COVID-19 GitHub repo.

Article by Slater Stich & Adam Azzam. Thanks to Jake Klamka and Paul Graham for feedback on drafts of this.

