X-Datascience Datacamp

Datacamp class for master's students - 5 days

The aim of this course is to learn data science by doing. It covers every stage of a data science pipeline, from exploratory data analysis (EDA), feature engineering, and parameter optimization to advanced learning algorithms. You will also need to set up your own challenge!

Your grade combines your performance on the data challenge offered to the class and the quality of the challenge you set up yourself.

Each day is split evenly between lectures and work on the competitive challenge using the RAMP website.

The slides used in some of the lectures are available here.

Instructors:

Location

The course takes place in person during the week of Dec 16 to Dec 20, 2024.

To join the Discord channel use this URL.

Some of the teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp

You must have a GitHub account to complete the course.

Setup:

We will use many Python packages in this course, such as pandas, scikit-learn (imported as sklearn), and matplotlib; all of them can be downloaded and installed with a package-management system. We recommend mamba, but conda works just as well if you already have it installed on your computer.

NB: Windows users should closely follow the instructions for installing mamba and conda, since many common problems come from the system PATH variable not being set up properly.
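
Once the environment is installed, a quick check that the packages named above can be imported may save time on the first day. The following is a minimal sketch, not part of the official course material; the helper name `check_environment` is hypothetical.

```python
import importlib

# Packages named in the setup paragraph. Note that scikit-learn is
# imported under the name "sklearn".
REQUIRED = ["pandas", "sklearn", "matplotlib"]


def check_environment(packages=REQUIRED):
    """Return a dict mapping each package name to its version string,
    or to None when the package cannot be imported."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```

If a package is reported as missing, the usual cause is that the wrong environment is active or, on Windows, that PATH does not point to the mamba or conda installation.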

Day 1: Data wrangling

Introduction to the workflow (VSCode, Python distribution, Git, GitHub, tests, ...)
Advanced course on pandas
GitHub assignments: NumPy and pandas
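
As a flavour of the data wrangling covered on this day, here is a minimal sketch of a split-apply-combine aggregation followed by a join in pandas. The toy tables and column names are invented for illustration and are not taken from the course assignments.

```python
import pandas as pd

# Toy data invented for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "product": ["x", "y", "x", "x", "y"],
    "amount": [10.0, 5.0, 7.0, 3.0, 8.0],
})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Paris", "Lyon"]})

# Split-apply-combine: total and mean amount per store (named aggregation).
per_store = (
    sales.groupby("store", as_index=False)
    .agg(total=("amount", "sum"), mean=("amount", "mean"))
)

# Left join the aggregated table with the store metadata.
summary = per_store.merge(stores, on="store", how="left")
print(summary)
```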

Day 2: ML Pipelines and model evaluation

Advanced scikit-learn: Column transformer and pipelines
Parallel processing with joblib
Generalization and Cross Validation
Assignment: scikit-learn
Getting started on RAMP & Introduction to the challenges.
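
The topics of this day fit together naturally: a column transformer preprocesses numeric and categorical columns differently, a pipeline chains it with a model, and cross-validation estimates generalization. The sketch below uses synthetic data and invented column names, so it only illustrates the pattern, not the course assignment.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data (assumption): the target depends mostly on "age".
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(30_000, 5_000, n),
    "city": rng.choice(["Paris", "Lyon", "Lille"], n),
})
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

# Scale numeric columns, one-hot encode the categorical one.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# 5-fold cross-validation; the preprocessing is refit inside each fold,
# which avoids leaking test-fold statistics into training.
# Passing n_jobs=-1 would run the folds in parallel through joblib.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```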

Day 3: Metrics and dealing with unbalanced data

Presentation of the different ML metrics
Pitfalls of standard metrics with imbalanced data
ML approaches to deal with imbalanced data
Working on data challenges
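
To see why accuracy can mislead on imbalanced data, consider a classifier that always predicts the majority class. The following minimal sketch uses an invented 95/5 class split and compares accuracy with balanced accuracy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Invented labels: 95 negatives and 5 positives.
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant for this baseline

# Baseline that always predicts the most frequent class (here, 0).
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

acc = accuracy_score(y, y_pred)           # 0.95: looks good
bal = balanced_accuracy_score(y, y_pred)  # 0.5: no better than chance
print(acc, bal)
```

Balanced accuracy averages the recall of each class, so a model that never detects the minority class scores 0.5 regardless of how rare that class is.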

Day 4: Working with complex data

Feature engineering and advanced encoding of categorical features
Working with signals and time series
Model inspection: Partial dependence plots, Feature importance
Working on data challenges
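
As one example of model inspection, permutation importance measures how much a model's score drops when a single feature is shuffled. The sketch below uses synthetic data in which only the first feature carries signal; it is an illustration under that assumption, not the course notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): only feature 0 influences the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature on held-out data and record the drop in score.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)

# Feature indices sorted from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
print(ranking, result.importances_mean)
```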

Day 5: Ensemble methods and hyperparameter optimization

From trees to gradient boosting
Profiling with snakeviz
Hyperparameter optimization
Working on data challenges
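
Gradient boosting and hyperparameter optimization can be combined in a few lines. The following is a minimal sketch using an exhaustive grid search on synthetic data; the grid values are arbitrary choices for illustration, and other search strategies may be used in the course.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification problem (assumption, for illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Small, arbitrary grid over two common boosting hyperparameters.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,  # each of the 4 candidates is scored by 3-fold cross-validation
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```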

Institutional information

This class is taught in the context of the Master Data Science at Institut Polytechnique de Paris.
It receives support from Hi!Paris and DataIA.

