Unified dataset for a better understanding of COVID-19

License

GPL-3.0, Unknown licenses found

Licenses found

GPL-3.0
LICENSE
Unknown
LICENSE.md

estellad/COVID19

 
 


COVID-19 Data Hub

This repository aggregates COVID-19 data at a fine-grained spatial resolution from several sources and makes it available as ready-to-use CSV files at https://covid19datahub.io

What's included

Variable                 Description
confirmed                Cumulative number of confirmed cases.
deaths                   Cumulative number of deaths.
recovered                Cumulative number of patients released from hospitals or reported recovered.
tests                    Cumulative number of tests.
vaccines                 Cumulative number of total doses administered.
people_vaccinated        Cumulative number of people who received at least one vaccine dose.
people_fully_vaccinated  Cumulative number of people who received all doses prescribed by the vaccination protocol.
hosp                     Number of hospitalized patients on date.
icu                      Number of hospitalized patients in intensive therapy on date.
vent                     Number of patients requiring invasive ventilation on date.
population               Total population.

The dataset also includes policy measures from Oxford's government response tracker, and a set of keys to match the data with Google and Apple mobility reports, the Hydromet dataset, and spatial databases such as Eurostat (Europe) or GADM (worldwide).
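
Because most of the epidemiological variables listed above are cumulative, daily figures have to be derived by differencing consecutive observations. The following is a minimal Python sketch of that step, not code from this repository. The `date` column name, the sample values, and the helper `daily_from_cumulative` are assumptions made for illustration; only `confirmed` and `deaths` come from the variable list.

```python
import csv
import io

# Illustrative sample only: the `date` column name and all values are assumptions.
SAMPLE_CSV = """date,confirmed,deaths
2020-03-01,100,2
2020-03-02,130,3
2020-03-03,,3
2020-03-04,190,5
"""


def daily_from_cumulative(rows, column):
    """Return (date, daily increase) pairs for a cumulative column.

    Rows with a missing value are skipped, so the next increase spans
    the gap instead of being reported as a spurious drop.
    """
    daily = []
    previous = None
    for row in rows:
        raw = row[column]
        if raw == "":
            continue  # missing observation on this date
        value = int(raw)
        if previous is not None:
            daily.append((row["date"], value - previous))
        previous = value
    return daily


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
new_cases = daily_from_cumulative(rows, "confirmed")
print(new_cases)
```

Skipping missing values is one possible design choice; depending on the analysis, it may be preferable to interpolate or to flag the gap explicitly.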

Download the data

All the data can be downloaded from the download centre.

Interactive visualization

Interactive visualization of the latest data is available here

How it works

COVID-19 Data Hub is built around two concepts:

  • data sources
  • countries

To extract the data for one country, different data sources may be required. For this reason, it is important to keep the two concepts distinct. The code in the R folder is organized in two main types of files:

  • files representing a data source (prefix ds_)
  • files representing a country (prefix iso_)

The ds_ files implement a wrapper that pulls the data from a provider and imports them into an R data.frame with standardized column names. The iso_ files merge all the data sources needed for one country and map the identifiers used by the provider to the id listed in the CSV files. Finally, the function covid19 downloads the data for all countries at all levels.
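
The separation between data sources and countries can be illustrated with a small sketch. The repository's code is written in R; the Python below is only a simplified illustration of the pattern, and every function name, identifier, and value in it is hypothetical. `build_all` plays a role loosely analogous to the covid19 function described above.

```python
# Hypothetical sketch: every name and value below is illustrative,
# not part of the repository.


def ds_example_provider():
    """Data source: fetch provider data and standardize the column names."""
    raw = [  # stand-in for a download from the provider
        {"Date": "2020-03-01", "Region": "R1", "Cases": 10},
        {"Date": "2020-03-01", "Region": "R2", "Cases": 4},
    ]
    return [
        {"date": r["Date"], "provider_id": r["Region"], "confirmed": r["Cases"]}
        for r in raw
    ]


def ds_other_provider():
    """Second data source with a different raw layout."""
    raw = [{"day": "2020-03-01", "area": "R1", "tested": 200}]
    return [
        {"date": r["day"], "provider_id": r["area"], "tests": r["tested"]}
        for r in raw
    ]


def iso_example_country():
    """Country: merge its data sources and map provider ids to hub ids."""
    id_map = {"R1": "id-001", "R2": "id-002"}  # provider id -> id in the CSV files
    merged = {}
    for source in (ds_example_provider, ds_other_provider):
        for rec in source():
            key = (id_map[rec["provider_id"]], rec["date"])
            row = merged.setdefault(key, {"id": key[0], "date": key[1]})
            # Copy only the standardized variables into the merged row.
            row.update(
                {k: v for k, v in rec.items() if k not in ("provider_id", "date")}
            )
    return [merged[k] for k in sorted(merged)]


def build_all(countries):
    """Collect the merged rows of every country into one table."""
    data = []
    for country in countries:
        data.extend(country())
    return data


table = build_all([iso_example_country])
for row in table:
    print(row)
```

Keeping the provider-specific parsing inside the data source functions means a country function only deals with standardized columns, so the same source can be reused by several countries.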

The code runs continuously on a dedicated Linux server to process the data from the providers. In principle, the function covid19 from the repository can be used to generate the same data we provide at the download centre. However, this takes between 1 and 2 hours, so downloading the pre-computed files is typically more convenient.

Contribute

If you find issues with the data, please report a bug. Suggestions about where to find data that we do not currently provide are also very welcome! Help our project grow: star the repo!

Academic publications

See the publications that use COVID-19 Data Hub.

Terms of use

By using COVID-19 Data Hub, you agree to our terms of use.

Cite as

We have invested a lot of time and effort in creating COVID-19 Data Hub. Please cite the following reference when using it:

Guidotti, E., Ardia, D. (2020), "COVID-19 Data Hub", Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376.

A BibTeX entry for LaTeX users is:

@Article{,
    title = {COVID-19 Data Hub},
    year = {2020},
    doi = {10.21105/joss.02376},
    author = {Emanuele Guidotti and David Ardia},
    journal = {Journal of Open Source Software},
    volume = {5},
    number = {51},
    pages = {2376}
}

Supported by

R Consortium, IVADO, HEC Montréal, Hack Zurich, Università degli Studi di Milano
