The downloadable data files contain information about hospitalisation and Intensive Care Unit (ICU) admission rates and current occupancy for COVID-19 by date and country. Each row contains the data for one country on a given date (day or week).
Install the libraries from CRAN as follows:
install.packages('tidyverse')
install.packages('readxl')
install.packages('ggplot2')
install.packages('readr')
Or install from GitHub with the devtools package:
devtools::install_github("tidyverse/tidyverse")
- Downloadable data from the European Centre for Disease Prevention and Control
Project Phases
- Data Collection
  - Data have been downloaded from the European Centre for Disease Prevention and Control.
- Data Cleaning
  - This process includes all the edits needed to make the data tidy and put it in a format that can be analyzed to produce insights, such as:
    i) Converting dates ii) Changing the format of some variables iii) Deleting columns iv) Detecting outliers and NA values
- Exploratory Data Analysis
  - In this stage of the project we produce insights and key visualizations that help us understand the data and spot trends in it. For this we rely on bar plots, histograms and box plots.
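The four cleaning steps listed above can be sketched roughly as follows. This is only an illustration on a small invented data frame: the column names (`date`, `value`, `url`), the values, and the choice of the 1.5 * IQR rule for outliers are assumptions, not the project's actual data or method. Base R is used so the sketch runs without extra packages.

```r
# Hypothetical toy data; column names and values are invented for illustration.
df <- data.frame(
  country = c("Austria", "Austria", "Belgium", "Belgium", "Belgium"),
  date    = c("2021-01-04", "2021-01-05", "2021-01-04", "2021-01-05", "2021-01-06"),
  value   = c("120", "125", "n/a", "130", "9000"),
  url     = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# i) Convert dates from text to Date
df$date <- as.Date(df$date, format = "%Y-%m-%d")

# ii) Change the format of a variable: text to numeric.
#     Text that is not a number becomes NA (the warning is silenced here).
df$value <- suppressWarnings(as.numeric(df$value))

# iii) Delete a column that is not needed
df$url <- NULL

# iv) Detect NA values and outliers (1.5 * IQR rule, chosen as an assumption)
na_per_column <- colSums(is.na(df))
q <- unname(quantile(df$value, probs = c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- !is.na(df$value) &
  (df$value < q[1] - fence | df$value > q[2] + fence)

print(na_per_column)      # number of NA values in each column
print(df[is_outlier, ])   # rows flagged as outliers
```

Whether a flagged row is a real error or a genuine peak still has to be judged by looking at the data; the rule only points at candidates.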
- Importing the dataset
library(readxl)  # provides read_excel()
df <- read_excel("TestingForCovid.xlsx")
- Changing format in some variables
df$year <- as.numeric(df$year)
df$new_cases <- as.numeric(df$new_cases)
- Extracting the final dataset into excel
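# write.xlsx() is not in the packages installed above; it is assumed here to come from openxlsx (the xlsx package has a function of the same name)
write.xlsx(df, file = 'TestCovid.xlsx')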
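Because `as.numeric()` turns text that is not a number into `NA`, it can be worth checking which values were lost before exporting. The following is a minimal sketch on invented values; the vector `raw` is hypothetical and stands in for a column such as `new_cases`.

```r
# Hypothetical raw values as they might arrive from a spreadsheet (invented).
raw <- c("15", "27", "", "1e3", "unknown")

# Convert to numeric; the coercion warning is silenced because failures
# are inspected explicitly below.
converted <- suppressWarnings(as.numeric(raw))

# Entries that held some text before the conversion but are NA afterwards
failed <- !is.na(raw) & raw != "" & is.na(converted)

print(raw[failed])
```

Empty cells are treated as ordinary missing values here, so only text that could not be parsed is reported.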
This project aims to analyze information about hospitalization, Intensive Care Unit (ICU) admissions and current occupancy for COVID-19.
- While analyzing this dataset I came across various problems, such as:
1. The column names must be replaced
2. Columns that are of no use in the dataset must be excluded from the data frame
3. The year_week variable does not follow the third normal form
4. The week variable must be moved into a new column
5. Some numeric values are shown in scientific notation
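One possible way to address the problems listed above is sketched below on an invented data frame. The column names other than `year_week`, the values, and the `YYYY-Www` layout of `year_week` (for example `2020-W15`) are assumptions made for illustration.

```r
# Hypothetical toy data; the "YYYY-Www" layout of year_week is an assumption.
df <- data.frame(
  Country.Name = c("Austria", "Belgium"),
  year_week    = c("2020-W15", "2021-W03"),
  value        = c(1234567.5, 0.00001),
  stringsAsFactors = FALSE
)

# 1. Replace a column name
names(df)[names(df) == "Country.Name"] <- "country"

# 3./4. Split year_week into separate numeric year and week columns
parts <- strsplit(df$year_week, "-W", fixed = TRUE)
df$year <- as.numeric(vapply(parts, `[`, character(1), 1))
df$week <- as.numeric(vapply(parts, `[`, character(1), 2))

# 2. Exclude the column that is no longer needed
df$year_week <- NULL

# 5. Show numbers in fixed notation instead of scientific notation.
#    (options(scipen = 999) is an alternative that changes printing globally.)
value_text <- format(df$value, scientific = FALSE)
print(value_text)
print(df)
```

Note that `format()` returns text, so it is best applied only for display or export while the numeric column itself stays unchanged.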
Tableau Public
Microsoft Excel
R Programming language
RStudio