Datacamp class for master's students - 5 days
The aim of this course is to learn data science by doing. All aspects of a data science pipeline will be covered, from exploratory data analysis (EDA) and feature engineering to parameter optimization and advanced learning algorithms. You will also need to set up your own challenge!
Your grade combines your performance on the data challenge offered to the class and the quality of the challenge you set up yourself.
Each day is split evenly: 50% lectures and 50% work on the competitive challenge using the RAMP website.
- Alexandre Gramfort
- Thomas Moreau
The course takes place in person during the week of Jan 9 to Jan 13.
To join the Discord channel, use this URL
Teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp
You must have a GitHub account to complete the course.
- Advanced course on Pandas
- Introduction to the workflow (VSCode, Git, GitHub, tests, ...)
- GitHub assignments: NumPy and pandas
- Advanced scikit-learn: column transformers and pipelines
- Parallel processing with joblib
- Generalization and cross-validation
- scikit-learn assignment
- Getting started on RAMP and introduction to the challenges
- Presentation of the different ML metrics
- Pitfalls of standard metrics with imbalanced data
- ML approaches to deal with imbalanced data
- Working on data challenges
- Feature engineering and advanced encoding of categorical features
- Model inspection: partial dependence plots, feature importance
- Working on data challenges
- From trees to gradient boosting
- Profiling with snakeviz
- Hyperparameter optimization
- Working on data challenges
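
To give a flavour of the metrics topic listed in the program above, here is a minimal sketch of why plain accuracy can mislead on imbalanced data. The toy labels are made up for illustration, and the two helper functions are hand-written stand-ins, not course material; in practice scikit-learn's `accuracy_score` and `balanced_accuracy_score` serve the same purpose.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class weighs the same."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class.
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy imbalanced dataset (hypothetical): 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_majority = [0] * 100

acc = accuracy(y_true, y_majority)
bal_acc = balanced_accuracy(y_true, y_majority)
print(f"accuracy: {acc:.2f}")           # accuracy: 0.95
print(f"balanced accuracy: {bal_acc:.2f}")  # balanced accuracy: 0.50
```

The constant predictor never finds a single positive sample, yet its accuracy looks excellent. Balanced accuracy averages the recall of each class, so it drops to chance level and exposes the problem.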