2020-03-19 | Watch the video | This folder contains the presentation and sample notebooks
Hi to all data scientists (amateur or professional):
My name is Denny Lee and I’m a Developer Advocate at Databricks. Before this, I was a biostatistician working on HIV/AIDS research at the Fred Hutchinson Cancer Research Center and the University of Washington Virology Lab in the Seattle area. Watching my friends and colleagues work the front lines of the current pandemic has inspired me to see if we, as the data science community, can help with “flattening the curve”. But before we dive into data science, remember: the most important things you can do are to wash your hands and practice social distancing! A great reference is How to Protect Yourself.
With the current concerns over SARS-CoV-2 and COVID-19, various COVID-19 datasets are now available on Kaggle and GitHub, along with competitions such as the COVID-19 Open Research Dataset Challenge (CORD-19). Whether you are a student or a professional data scientist, we thought we could help out by providing a primer session with notebooks on how to start analyzing these datasets.
For this primer session we will review (and publish shortly thereafter) IPython notebooks that use Apache Spark, pandas, or both to work with the following datasets:
- South Korea COVID-19 Dataset
- 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
- COVID-19 Open Research Dataset Challenge (CORD-19)
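As a rough idea of what "getting started" with time series data like the Johns Hopkins repository can look like in pandas, here is a minimal sketch. It is not taken from the session notebooks. It assumes a wide CSV layout with one column per date (the column names below are an assumption about that layout), and the inline sample rows and counts are made-up placeholders, not real case numbers. The helper names `to_long_format` and `daily_totals_by_country` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample in a wide, one-column-per-date layout (assumed here to
# resemble the Johns Hopkins CSSE time series files). The counts are made-up
# placeholders, not real case numbers.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20
,Country A,10.0,20.0,1,2,4
Region X,Country B,30.0,40.0,0,3,3
Region Y,Country B,31.0,41.0,5,5,8
"""


def to_long_format(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape one-column-per-date data into one row per location and date."""
    id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
    long_df = wide.melt(id_vars=id_cols, var_name="date", value_name="confirmed")
    # Date columns are assumed to look like "1/22/20" (month/day/two-digit year).
    long_df["date"] = pd.to_datetime(long_df["date"], format="%m/%d/%y")
    return long_df


def daily_totals_by_country(long_df: pd.DataFrame) -> pd.DataFrame:
    """Sum cumulative counts per country and date, then derive daily changes."""
    totals = long_df.groupby(["Country/Region", "date"], as_index=False)["confirmed"].sum()
    totals = totals.sort_values(["Country/Region", "date"]).reset_index(drop=True)
    # The first date for each country has no previous day, so its change is set to 0.
    totals["new_cases"] = (
        totals.groupby("Country/Region")["confirmed"].diff().fillna(0).astype(int)
    )
    return totals


wide_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
long_df = to_long_format(wide_df)
totals = daily_totals_by_country(long_df)
print(totals)
```

To try this on a real file, replace `io.StringIO(SAMPLE_CSV)` with the path to a downloaded CSV and adjust the identifier columns to match what the file actually contains.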
If there is interest, we will also follow up with deeper-dive workshop sessions for each of these notebooks and datasets to help you jump-start your analysis. In this uncertain time, let's see what we can do to make sense of this data and help each other out!
And don't forget that washing your hands and social distancing are the most important things you can do to help!
Thanks!
Denny Lee (@dennylee)