Datacamp class for master students - 5 days
The aim of this course is to learn data science by doing. All aspects of a data science pipeline will be covered, from exploratory data analysis (EDA) and feature engineering to parameter optimization and advanced learning algorithms. You will also need to set up your own challenge!
Your grade combines your performance on the data challenge offered to the class and the quality of the challenge you set up.
Each day is split evenly between lectures and work on the competitive challenge on the RAMP website.
- Alexandre Gramfort
- Thomas Moreau
The course takes place from Jan 10 to Jan 14, in person and on Slack.
To join the Slack channel, use this URL.
Teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp
You must have a GitHub account to complete the course.
- Advanced course on Pandas
- Introduction to the workflow (VS Code, git, GitHub, tests, ...)
- GitHub assignments: NumPy and pandas
- Advanced scikit-learn: ColumnTransformer and pipelines
- Parallel processing with joblib
- Generalization and Cross Validation
- scikit-learn assignment
- Getting started on RAMP & Introduction to the challenges (Isotopic inventory of a nuclear reactor core in operation, Follicle Detection and Classification)
- Overview of the different ML metrics
- Pitfalls of metrics on imbalanced data
- ML approaches to deal with imbalanced data
- Working on data challenges
- Feature engineering and advanced encoding of categorical features
- Model inspection: Partial dependence plots, Feature importance
- Working on data challenges
- From trees to gradient boosting
- Profiling with snakeviz
- Hyperparameter optimization
- Working on data challenges
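As a quick illustration of the metric problem with imbalanced data, here is a minimal sketch (not taken from the course materials) on hypothetical labels: a classifier that always predicts the majority class gets a high accuracy but only a chance-level balanced accuracy. The helper functions are written from scratch for clarity; for real work, scikit-learn provides `accuracy_score` and `balanced_accuracy_score` in `sklearn.metrics`.

```python
# Why accuracy can mislead on imbalanced data.
# Hypothetical labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of samples of class `positive` that are predicted as that class."""
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    return sum(hits) / len(hits)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class weighs the same."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor looks good on accuracy (0.95) while never detecting a single positive, which balanced accuracy exposes (0.5, the score of random guessing on two classes).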