Datacamp class for master's students - 4 days
The aim of this course is to learn data science by doing. All aspects of building a data science pipeline will be covered, from exploratory data analysis (EDA) and feature engineering to parameter optimization and advanced learning algorithms. You will also need to set up your own challenge!
The grade is a mix of your performance on the data challenge offered to the class and the quality of the challenge you set up.
Each day will consist of 50% lectures and 50% work on the competitive challenge using the RAMP website.
- Alexandre Gramfort
- Thomas Moreau
- Introduction to the workflow (VSCode, git, GitHub, tests, ...)
- Advanced course on Pandas
- GitHub assignments: NumPy and pandas
- Advanced scikit-learn: Column transformer and pipelines
- Generalization and Cross Validation
- Getting started on RAMP: Challenge.0 - Brevet des colleges
- Presentation of the different ML metrics
- Pitfalls of common metrics on imbalanced data
- ML approaches to deal with imbalanced data
- Introduction of the challenges:
- Feature engineering and dealing with categorical features
- Model inspection: Partial dependence plots, Feature importance, SHAP
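
Several of the topics listed above (column transformers, pipelines, cross-validation, and metrics for imbalanced data) can be combined in one short sketch. The example below is only an illustration and is not taken from the course material: the toy dataset, its column names (`age`, `income`, `city`), the choice of logistic regression, and the use of balanced accuracy are all assumptions made for the purpose of the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names and values are made up for illustration.
rng = np.random.default_rng(0)
n_samples = 500
X = pd.DataFrame({
    "age": rng.normal(40, 10, size=n_samples),
    "income": rng.normal(30_000, 8_000, size=n_samples),
    "city": rng.choice(["Paris", "Lyon", "Lille"], size=n_samples),
})
# Imbalanced binary target: roughly 10% positives, unrelated to the features.
y = (rng.random(n_samples) < 0.1).astype(int)

# Scale the numerical columns and one-hot encode the categorical one.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Chain preprocessing and classifier so that both are fitted inside each fold.
# class_weight="balanced" reweights samples inversely to class frequency.
model = make_pipeline(
    preprocessor,
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified folds keep the class ratio similar across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="balanced_accuracy")
print(scores.mean())
```

Because the target here is random noise, the balanced accuracy should hover around chance level. Plain accuracy on the same data would look deceptively high for a model that always predicts the majority class, which is one reason a metric such as balanced accuracy is often preferred on imbalanced problems.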