Hi everyone! Welcome to the AI in Medicine Student Society's Intro to Data Pre-Processing workshop.
This repository stores the data we are going to be using and the steps to set up the Colab notebook.
We need somewhere to keep the data and the code:
- Visit https://drive.google.com/drive/my-drive
- Create a folder called aimss_data_workshop
Because we're going to be using many different datasets, I made them available in a .zip file.
- Above, click ⭳ Code > Download ZIP and save the .zip file to your local machine
- Unzip the file to extract the data
- Upload all of the extracted files to aimss_data_workshop in your Google Drive
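If you'd rather do the unzip step from Python on your own machine instead of your operating system's archive tool, here is a minimal sketch using the standard-library zipfile module. The zip file name in the commented usage line is hypothetical: GitHub names the download after the repository and branch, so check what your browser actually saved.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the destination folder if needed
    with zipfile.ZipFile(zip_path) as zf:
        # Only extract archives you trust, such as the one you just downloaded yourself.
        zf.extractall(dest)
        return sorted(zf.namelist())


# Usage (hypothetical file name; replace it with the zip you downloaded):
# print(extract_zip("repository-main.zip", "aimss_data_workshop"))
```

The extracted folder can then be uploaded to aimss_data_workshop in Google Drive as described above.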
To edit the demo notebook, you must save your own copy to your Drive.
- Visit https://colab.research.google.com/drive/1J9osIwMhiFeuf_XfQnK06NScdhTvysL0?usp=sharing
- Select File > Save a copy in Drive
- OPTIONAL: On your copy, select File > Locate in Drive and move the notebook to aimss_data_workshop. (Otherwise, it will stay in a folder in your Drive called Colab Notebooks.)
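Before the notebook can read the files, Google Drive has to be mounted in the Colab runtime. A minimal sketch, assuming aimss_data_workshop sits at the top level of My Drive (current Colab mounts it at /content/drive/MyDrive); the demo notebook may handle this differently, and the local fallback path is a hypothetical convenience for running outside Colab.

```python
import os

try:
    from google.colab import drive  # only importable inside a Colab runtime
    drive.mount("/content/drive")  # prompts you to authorize access to your Drive
    # Assumed location of the workshop folder inside the mounted Drive.
    DATA_DIR = "/content/drive/MyDrive/aimss_data_workshop"
except ImportError:
    # Hypothetical fallback when running locally: a folder in the current directory.
    DATA_DIR = os.path.join(os.getcwd(), "aimss_data_workshop")


def list_data_files(data_dir=DATA_DIR):
    """Return the sorted names of regular files in data_dir, or [] if the folder is missing."""
    if not os.path.isdir(data_dir):
        return []
    return sorted(
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )


# Quick check that the uploaded files are visible to the notebook.
print(list_data_files())
```

If the list comes back empty, double-check the folder name and that the upload to Drive finished.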
The slides that are shared during the presentation can be found here for your reference: https://docs.google.com/presentation/d/1--nUUAL-1UUa9Ecr4y_hpF0LBME3B0EmmLTx60_oqtE/edit?usp=sharing
Here are the original sources for all of the datasets (plus some extras!):
- https://www.kaggle.com/mirichoi0218/insurance
- https://www.kaggle.com/kmader/siim-medical-images
- https://www.kaggle.com/tboyle10/medicaltranscriptions
- https://www.kaggle.com/jboysen/mri-and-alzheimers
- https://gdc.cancer.gov/about-data/publications/pancanatlas
- https://data.world/associatedpress/covid-impact-survey-public-data
- https://www.kaggle.com/ymirsky/medical-deepfakes-lung-cancer
- http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29