foxchopin/Datasciencegame-

Datasciencegame

This was a competition between universities and research institutes around the globe, hosted by Datasciencegame as an in-class Kaggle challenge. The challenge was based on Deezer's music streaming data, and the task was to predict the probability that a user would listen to a recommended song. I led a team of three fellow data science students, and our final predictions were a blend of gradient boosting models. This placed us ahead of other top universities such as the University of Cambridge, Imperial College, Berkeley and LSE.

The main challenge of this dataset was that the training and test sets came from different distributions, so local cross-validation results were inconsistent with the public leaderboard scores on Kaggle. To mitigate this, we used "adversarial validation" (http://fastml.com/adversarial-validation-part-one/) to sort the training data by its probability of being different from the test data samples. From this ordering we took a simple validation set made up of the training samples that most resembled the test set, which gave us a consistent evaluation process. We also reduced the amount of data (more than 8 million samples) considerably by discarding 75% of the provided samples, which led to a large speed-up and even a slight performance gain. Engineered features, such as the number of days between the listening date and the release date, also proved powerful. The final predictions were a blend of different gradient boosting models from the xgboost and lightgbm Python libraries.
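The adversarial validation step described above can be sketched roughly as follows. This is a minimal illustration and not the code used in the competition: it assumes the features are already numeric arrays, uses scikit-learn's `GradientBoostingClassifier` as a stand-in for the xgboost and lightgbm models, and the function name `adversarial_validation_split` and the `valid_fraction` parameter are hypothetical. The idea is to train a classifier to tell training rows from test rows, then hold out the training rows it scores as most test-like.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict


def adversarial_validation_split(X_train, X_test, valid_fraction=0.2, seed=0):
    """Return (fit_idx, valid_idx), both row indices into X_train.

    valid_idx holds the training rows that a train-vs-test classifier
    scores as most test-like; fit_idx holds the remaining rows.
    The name and the valid_fraction default are illustrative assumptions.
    """
    # Label every row by origin: 0 = training set, 1 = test set.
    X_all = np.vstack([X_train, X_test])
    is_test = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

    # Out-of-fold probabilities, so no row is scored by a model that saw it.
    clf = GradientBoostingClassifier(random_state=seed)
    p_test = cross_val_predict(
        clf, X_all, is_test, cv=5, method="predict_proba"
    )[:, 1]

    # Sort training rows from most to least test-like.
    p_train = p_test[: len(X_train)]
    order = np.argsort(-p_train)
    n_valid = int(len(X_train) * valid_fraction)
    return order[n_valid:], order[:n_valid]


if __name__ == "__main__":
    # Synthetic data only: the test set is shifted relative to the training set.
    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(1000, 3))
    X_test = rng.normal(1.0, 1.0, size=(500, 3))
    fit_idx, valid_idx = adversarial_validation_split(X_train, X_test)
    print(len(fit_idx), len(valid_idx))
```

Models would then be fitted on the rows in `fit_idx` and scored on the rows in `valid_idx`, so that the local score is measured on data that looks like the test set.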
