X-Datascience Datacamp

Datacamp class for master's students - 5 days

The aim of this course is to learn data science by doing. It covers every stage of a data science pipeline, from exploratory data analysis (EDA) through feature engineering and parameter optimization to advanced learning algorithms. You will also need to set up your own challenge!

Your grade combines your performance on the data challenge offered to the class with the quality of the challenge you set up yourself.

Each day is split evenly between lectures (50%) and work on the competitive challenge using the RAMP website (50%).

Instructors:

Alexandre Gramfort ([email protected])
Thomas Moreau ([email protected])

Location

The course takes place in person during the week of Jan 9 to Jan 13.

To join the Discord channel, use this URL

Teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp

You must have a GitHub account to complete the course.

Day 1: Data wrangling

Advanced course on Pandas
Introduction to the workflow (VSCode, Git, GitHub, tests, ...)
GitHub assignments: NumPy and pandas
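
As a flavor of the pandas material, here is a minimal sketch of a group-wise summary using named aggregation. The column names and values are hypothetical toy data, not taken from the course assignments.

```python
import pandas as pd

# Hypothetical toy data: temperature readings from two stations.
df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "B"],
    "temp": [10.0, 14.0, 20.0, 22.0, 24.0],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function).
summary = df.groupby("station").agg(
    mean_temp=("temp", "mean"),
    n_obs=("temp", "count"),
)
print(summary)
```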

Day 2: ML Pipelines and model evaluation

Advanced scikit-learn: Column transformer and pipelines
Parallel processing with joblib
Generalization and Cross Validation
Assignment: scikit-learn
Getting started on RAMP and introduction to the challenges
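
The Day 2 topics fit together roughly as in the sketch below: a column transformer preprocesses numeric and categorical columns separately, a pipeline chains it with a model, and cross-validation estimates generalization. The data, column names, and choice of estimator are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical column.
rng = np.random.RandomState(0)
X = pd.DataFrame({
    "age": rng.randint(18, 70, size=100),
    "city": rng.choice(["Paris", "Lyon", "Nice"], size=100),
})
y = (X["age"] > 40).astype(int)

# Scale the numeric column, one-hot encode the categorical one.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Preprocessing and model are fitted together inside each CV fold,
# which avoids leaking test-fold statistics into training.
model = make_pipeline(preprocessor, LogisticRegression())

scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Passing `n_jobs` to `cross_val_score` would run the folds in parallel; scikit-learn relies on joblib for this.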

Day 3: Metrics and dealing with imbalanced data

Presentation of the different ML metrics
Pitfalls of standard metrics on imbalanced data
ML approaches to deal with imbalanced data
Working on data challenges
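
To illustrate why the choice of metric matters on imbalanced data, the sketch below scores a trivial classifier that always predicts the majority class. The labels are made up for the example: 95 negatives and 5 positives.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)

# Accuracy looks good, but balanced accuracy (mean per-class recall)
# shows the minority class is never detected.
print(acc, bal_acc)
```

Here accuracy is 0.95 while balanced accuracy is 0.5, the same value a random guess would obtain.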

Day 4: Feature engineering and model inspection

Feature engineering and advanced encoding of categorical features
Model inspection: Partial dependence plots, Feature importance
Working on data challenges
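
One possible way to inspect a fitted model is permutation importance, sketched below on synthetic data. This is an assumption about the kind of tool covered, not the course's own material: the target depends only on the first feature, so shuffling that feature should hurt the score far more than shuffling the second.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: only feature 0 carries signal, feature 1 is noise.
rng = np.random.RandomState(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and measure the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)
print(result.importances_mean)
```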

Day 5: Ensemble methods and hyperparameter optimization

From trees to gradient boosting
Profiling with snakeviz
Hyperparameter optimization
Working on data challenges
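
As a minimal sketch of hyperparameter optimization applied to gradient boosting, the example below runs a small grid search with cross-validation. The synthetic dataset and the parameter grid are arbitrary choices for illustration; other search strategies may be preferable in practice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Small, arbitrary grid over two common boosting hyperparameters.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)

# Best combination found and its mean cross-validated accuracy.
print(search.best_params_, search.best_score_)
```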
