This was a competition between universities and research institutes around the globe, hosted by datasciencegame as an in-class Kaggle challenge. The task, based on Deezer's music streaming data, was to predict the probability that a user listens to a recommended song. I led a team of three fellow data science students, and our final submission, a blend of gradient boosting models, placed us ahead of teams from other top universities such as the University of Cambridge, Imperial College, Berkeley, and LSE.
The main challenge of this dataset was that the training and test sets came from different distributions, which made local cross-validation results inconsistent with the public leaderboard scores on Kaggle. To mitigate this, we used "adversarial validation" (http://fastml.com/adversarial-validation-part-one/): a classifier is trained to distinguish training from test samples, and the training data is then sorted by its predicted probability of belonging to the test distribution. From this ranking we built a validation set out of the training samples that resembled the test set most, which gave us a consistent evaluation process. We also reduced the dataset (8+ million samples) considerably by deleting 75% of the provided samples, leading to a huge speed-up and even a slight performance gain. Engineered features such as the number of days between the listening and release dates also proved powerful. The final predictions consisted of a blend of different gradient boosting models from the xgboost and lightgbm Python libraries.