X-Datascience Datacamp

Datacamp class for master's students - 5 days

The aim of this course is to learn data science by doing. It covers every stage of a data science pipeline, from exploratory data analysis (EDA) and feature engineering to parameter optimization and advanced learning algorithms. You will also need to set up your own challenge!

Your grade combines your performance on the data challenge offered to the class with the challenge you will set up yourself.

Each day is split evenly between lectures and work on the competitive challenge using the RAMP website.

Instructors:

Location

The course runs from Jan 10 to Jan 14, in person and on Slack.

To join the Slack channel, use this URL

Teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp

You must have a GitHub account to complete the course.

Day 1: Data wrangling

  • Advanced course on Pandas
  • Introduction to the workflow (VS Code, Git, GitHub, tests, ...)
  • GitHub assignments: NumPy and pandas

Day 2: ML Pipelines and model evaluation

  • Advanced scikit-learn: Column transformer and pipelines
  • Parallel processing with joblib
  • Generalization and cross-validation
  • Assignment: scikit-learn
  • Getting started on RAMP & Introduction to the challenges (Isotopic inventory of a nuclear reactor core in operation, Follicle Detection and Classification)
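
As a rough illustration of the cross-validation topic listed above, the sketch below splits a dataset into k folds and scores a toy model on each held-out fold. It uses only the standard library so the mechanics stay visible; the helper names (`k_fold_indices`, `fit_mean_predictor`, `cross_val_mae`) and the data are hypothetical and are not taken from the course materials, which rely on scikit-learn for this.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


def fit_mean_predictor(y_train):
    """Toy 'model' (an assumption for illustration): predicts the training mean."""
    return mean(y_train)


def cross_val_mae(y, n_splits=5):
    """Return the mean absolute error of the toy model on each held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), n_splits):
        prediction = fit_mean_predictor([y[j] for j in train_idx])
        scores.append(mean(abs(y[j] - prediction) for j in test_idx))
    return scores


y = [float(v) for v in range(20)]  # made-up targets for illustration
fold_scores = cross_val_mae(y, n_splits=5)
print(len(fold_scores), round(mean(fold_scores), 3))
```

Each sample lands in exactly one test fold, so the averaged score estimates performance on data the model did not see during fitting.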

Day 3: Metrics and dealing with imbalanced data

  • Presentation of the different ML metrics
  • Pitfalls of standard metrics on imbalanced data
  • ML approaches to deal with imbalanced data
  • Working on data challenges
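
To make the metric problem with imbalanced data concrete, here is a minimal sketch comparing plain accuracy with balanced accuracy (the mean of per-class recall). The labels and the always-predict-majority "classifier" are made up for illustration, and the function names are hypothetical rather than the course's own code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Recall for each class: correct predictions divided by class size."""
    recalls = {}
    for label in sorted(set(y_true)):
        preds = [p for t, p in zip(y_true, y_pred) if t == label]
        recalls[label] = sum(p == label for p in preds) / len(preds)
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Made-up data: 95 negatives, 5 positives, and a model that always predicts 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The trivial model looks strong on accuracy while never detecting the minority class, which balanced accuracy exposes.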

Day 4: Feature engineering and model inspection

  • Feature engineering and advanced encoding of categorical features
  • Model inspection: Partial dependence plots, Feature importance
  • Working on data challenges

Day 5: Ensemble methods and hyperparameter optimization

  • From trees to gradient boosting
  • Profiling with snakeviz
  • Hyperparameter optimization
  • Working on data challenges
