uchicago-msca-club/DD_PredictingHeartDisease

Driven Data Competition - Warm Up: Machine Learning with a Heart

Link to the dataset: https://www.drivendata.org/competitions/54/machine-learning-with-a-heart/

The file SamplePipeline.ipynb contains a baseline pipeline for working with the dataset.

Getting access to this notebook -

  1. Fork this repo into your personal GitHub profile
  2. Ensure you can view the notebook, either by opening it on GitHub or by using https://nbviewer.jupyter.org/

    NOTE: NBViewer requires the repo to be public

  3. Clone the repo to your local machine and run the IPYNB file using Jupyter

    NOTE: Make sure the dataset location matches the file paths used in the notebook

Making changes to the notebook (example) -

  1. Uncomment the lines in the EDA section of the notebook, then run the whole notebook again

    NOTE: The pairplot will take some time

  2. Commit the changes with a commit message (mandatory) and push them to your personal repo
  3. Check your GitHub profile online to confirm the notebook renders the changes (or use nbviewer)

Required improvements -

  1. Code documentation for steps such as "Are there any missing data points?"
  2. The train/test split MUST be done BEFORE encoding and scaling, so that the encoder and scaler are fit on the training data only
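
The two improvements above could look roughly like the sketch below. It is not taken from SamplePipeline.ipynb: the data is synthetic, the column names (resting_bp, chest_pain, target) are hypothetical stand-ins, and scikit-learn's ColumnTransformer is just one reasonable way to keep preprocessing fit on the training rows only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; these column names are hypothetical,
# not the actual columns of the competition dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "resting_bp": rng.normal(130, 15, size=100),
    "chest_pain": rng.choice(["typical", "atypical", "none"], size=100),
    "target": rng.integers(0, 2, size=100),
})

# Are there any missing data points? Count NaN values per column.
missing_per_column = df.isna().sum()

X = df.drop(columns="target")
y = df["target"]

# 1. Split first, so the test rows stay unseen during preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 2. Fit the scaler and encoder on the training rows only.
#    sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("scale", StandardScaler(), ["resting_bp"]),
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["chest_pain"]),
    ],
    sparse_threshold=0,
)
X_train_prepared = preprocess.fit_transform(X_train)

# 3. Reuse the fitted transformers on the test rows
#    (transform, not fit_transform).
X_test_prepared = preprocess.transform(X_test)
```

Because the scaler's mean and standard deviation come from the training rows alone, no information from the test rows leaks into the features the model is trained on.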

Merging changes from this repo to your forked repo

I learnt this from the following article: https://digitaldrummerj.me/git-sync-fork-to-master/

In a nutshell, open a terminal in your working folder (the folder that contains your forked repo), then run:

  1. git remote add upstream [original repo path].git
  2. git fetch upstream
  3. git merge upstream/master
    • Resolve merge conflicts if any
  4. git push

About

Analysis pipeline for a sample data set from DrivenData
