The repository aims to unify COVID-19 datasets across different sources in order to simplify the data acquisition process and the subsequent analysis. You are welcome to join and contribute by extending the number of supported data sources as a joint effort against COVID-19.
The data are available to the end user via the R package COVID19 or in CSV format (see below or on Kaggle).
Provide the research community with a unified data hub by collecting worldwide fine-grained data merged with demographics, air pollution, and other exogenous variables helpful for a better understanding of COVID-19.
The data are collected with the R package COVID19. For R users, the COVID19 package is the recommended way to interact with the dataset. For non-R users, the data are provided in CSV format and updated regularly (see below or on Kaggle).
Whether or not you are an R user... take part in the data collection! Your contribution will be gratefully acknowledged. See how to contribute.
A simple yet effective R package for acquiring tidy-format datasets of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The data are downloaded in real time, cleaned, and matched with exogenous variables.
# Install COVID19
install.packages("COVID19")
# Load COVID19
require("COVID19")
covid19(ISO = NULL, level = 1, start = "2019-01-01", end = Sys.Date(), vintage = FALSE, raw = FALSE, cache = TRUE)
Argument | Description |
---|---|
ISO | vector of ISO codes to retrieve (alpha-2, alpha-3 or numeric). Each country is identified by one of its ISO codes. |
level | integer. Granularity level. 1: country-level data. 2: state-level data. 3: city-level data. |
start | the start date of the period of interest. |
end | the end date of the period of interest. |
vintage | logical. Retrieve the snapshot of the dataset at the end date instead of using the latest version? Default FALSE. |
raw | logical. Skip data cleaning? Default FALSE. |
cache | logical. Memory caching? Significantly improves performance on successive calls. Default TRUE. |
The raw data are cleaned by filling missing dates with NA values. This ensures that all countries share the same grid of dates and no single day is skipped. Then, NA values are replaced with the previous non-NA value or 0 for the following variables:
- deaths, confirmed, tests, recovered, icu, hosp, vent
- driving, walking, transit
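The cleaning steps described above can be illustrated with a minimal base R sketch. It is not the package's actual implementation: the toy data frame, the location id "AAA", and the helper `fill_forward` are hypothetical and serve only to show the idea of a complete date grid followed by a forward fill that falls back to 0.

```r
# Toy raw data (hypothetical): 2020-03-03 is skipped and 2020-03-01 is NA
raw <- data.frame(
  id        = "AAA",
  date      = as.Date(c("2020-03-01", "2020-03-02", "2020-03-04")),
  confirmed = c(NA, 5, 9)
)

# Step 1: build a complete daily grid and merge, so skipped days appear as NA
grid <- data.frame(
  id   = "AAA",
  date = seq(min(raw$date), max(raw$date), by = "day")
)
x <- merge(grid, raw, by = c("id", "date"), all.x = TRUE)
x <- x[order(x$date), ]

# Step 2: replace NA with the previous non-NA value, or 0 if there is none
fill_forward <- function(v) {
  seen <- !is.na(v)
  idx  <- cumsum(seen)       # position of the last non-NA value seen so far
  c(0, v[seen])[idx + 1]     # idx == 0 means nothing seen yet, so use 0
}
x$confirmed <- fill_forward(x$confirmed)

print(x)
```

With several locations, the same helper could be applied per `id`, for example with `ave(x$confirmed, x$id, FUN = fill_forward)`.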
If no data is available at a granularity level (country/state) but is available at a lower level (state/city), the higher level data are obtained by aggregating the lower level data.
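As a rough illustration of this aggregation, the sketch below sums toy state-level counts into country-level counts with base R. The data, the state names "S1" and "S2", and the choice of `sum` are assumptions made for the example; the package may aggregate differently, and summation only makes sense for count variables, not for percentages or rates.

```r
# Toy state-level data (hypothetical values)
states <- data.frame(
  country   = c("AAA", "AAA", "AAA", "AAA"),
  state     = c("S1", "S2", "S1", "S2"),
  date      = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02", "2020-03-02")),
  confirmed = c(10, 4, 12, 7),
  deaths    = c(1, 0, 1, 2)
)

# Sum the lower-level counts by country and date to obtain higher-level data
country <- aggregate(
  cbind(confirmed, deaths) ~ country + date,
  data = states,
  FUN  = sum
)
country <- country[order(country$country, country$date), ]

print(country)
```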
# Worldwide data by country
covid19()
# Worldwide data by state
covid19(level = 2)
# US data by state
covid19("USA", level = 2)
# Swiss data by state (cantons)
covid19("CHE", level = 2)
# Italian data by state (regions)
covid19("ITA", level = 2)
# Italian and US data by city
covid19(c("ITA","USA"), level = 3)
Variable | Description |
---|---|
id | location identifier. |
date | observation time. |
deaths | cumulative number of deaths. |
confirmed | cumulative number of confirmed cases. |
tests | cumulative number of tests. |
recovered | cumulative number of patients released from hospitals or reported recovered. |
hosp | number of hospitalized patients on date. |
icu | number of hospitalized patients in ICUs on date. |
vent | number of patients requiring invasive ventilation on date. |
driving | relative volume of (driving) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility |
walking | relative volume of (walking) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility |
transit | relative volume of (transit) directions requests compared to a baseline volume on January 13th, 2020. https://www.apple.com/covid19/mobility |
country | administrative area of top level. |
state | administrative area of a lower level, usually states, regions or cantons. |
city | administrative area of a lower level, usually cities or municipalities. |
lat | latitude. |
lng | longitude. |
pop | total population. |
pop_14 | population ages 0-14 (% of total population).* |
pop_15_64 | population ages 15-64 (% of total population).** |
pop_65 | population ages 65+ (% of total population). |
pop_age | median age of population. |
pop_density | population density per km2. |
pop_death_rate | population mortality rate. |
* Switzerland: ages 0-19
** Switzerland: ages 20-64
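Because deaths, confirmed, tests and recovered are cumulative, daily figures have to be derived by differencing. The sketch below shows one way to do this in base R on a toy data frame; the values and the column names `new_confirmed` and `confirmed_per_100k` are hypothetical and are not part of the dataset.

```r
# Toy cleaned data (hypothetical): cumulative confirmed cases and population
x <- data.frame(
  id        = "AAA",
  date      = as.Date("2020-03-01") + 0:3,
  confirmed = c(0, 5, 5, 9),
  pop       = 200000
)
x <- x[order(x$id, x$date), ]

# Daily new cases: difference of the cumulative series within each location
x$new_confirmed <- ave(x$confirmed, x$id, FUN = function(v) c(NA, diff(v)))

# Cumulative cases per 100,000 inhabitants, using the pop column
x$confirmed_per_100k <- x$confirmed / x$pop * 1e5

print(x)
```

The first observation of each location has no previous day to compare with, so its daily value is left as NA rather than guessed.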
CSV datasets of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The files are generated with the R package COVID19 and updated daily.
Clean data
- https://storage.guidotti.dev/covid19/data-1.csv (country-level)
- https://storage.guidotti.dev/covid19/data-2.csv (state-level)
- https://storage.guidotti.dev/covid19/data-3.csv (city-level)
Raw data
- https://storage.guidotti.dev/covid19/rawdata-1.csv (country-level)
- https://storage.guidotti.dev/covid19/rawdata-2.csv (state-level)
- https://storage.guidotti.dev/covid19/rawdata-3.csv (city-level)
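For users who prefer the CSV files, a minimal base R sketch for loading one of them is shown below. The helper `read_covid_csv` is hypothetical, and the sketch assumes the file has a `date` column in ISO format (YYYY-MM-DD) as suggested by the variable table; the availability of the URLs above has not been checked here, so the demonstration uses a small local file instead.

```r
# Hypothetical helper: read a COVID19 CSV file and parse the date column
read_covid_csv <- function(path) {
  x <- read.csv(path, stringsAsFactors = FALSE)
  x$date <- as.Date(x$date)   # assumes ISO-formatted dates
  x
}

# Offline demonstration with a tiny local file that mimics the layout
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("id,date,confirmed",
    "AAA,2020-03-01,5",
    "AAA,2020-03-02,9"),
  tmp
)
demo <- read_covid_csv(tmp)
print(demo)

# With network access, the same helper could be pointed at a listed URL:
# x <- read_covid_csv("https://storage.guidotti.dev/covid19/data-1.csv")
```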
Help improve the data coverage and add new countries and variables. See how to contribute.
The following sources are gratefully acknowledged for making the data available to the public.
The following people have contributed to the data collection as a joint effort against COVID-19.
deaths | confirmed | tests | recovered | hosp | icu | vent | driving | walking | transit | lat | lng | pop | pop_14 | pop_15_64 | pop_65 | pop_age | pop_density | pop_death_rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
World | |||||||||||||||||||
by country | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | ||||
US | |||||||||||||||||||
by state | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | ||||||||||||||
by city | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | ||||||||||||||
Italy | |||||||||||||||||||
by region | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | ||||
by city | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | |||||||||
Switzerland | |||||||||||||||||||
by canton | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti | E.Guidotti |
- Monitoring the advancement of the COVID–19 contagion in the regions of Italy
- Covid19 Incidence History
Emanuele Guidotti, “Coronavirus COVID-19 (2019-nCoV) Epidemic Datasets.” Kaggle, doi: 10.34740/KAGGLE/DS/574488.