Kung-hsiang (Steeve), Huang (Rosetta.ai); Yi-fu, Fu; Yi-ting, Lee; Tzong-hann, Lee; Yao-chun, Chan (National Taiwan University); Yi-hui, Lee (University of Texas, Dallas); Shou-de, Lin (National Taiwan University)
Contact: [email protected]
This repository contains RosettaAI's approach to the 2019 ACM RecSys Challenge (paper, writeup). Instead of treating the task as a ranking problem, we frame it as binary classification and use binary cross-entropy as the loss function. Three different models were implemented:
- Neural network (based on DeepFM and this YouTube paper)
- LightGBM
- XGBoost
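
To illustrate the pointwise formulation, here is a minimal sketch in pure Python, not the repository's code. It assumes each impressed item in a session is labeled 1 if it was clicked and 0 otherwise. The model's raw scores are trained against those labels with binary cross-entropy, and at inference time the items are ranked by predicted click probability. The function names, item IDs, and scores are hypothetical. In PyTorch, the same loss is available as `torch.nn.BCEWithLogitsLoss`.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw model score (logit) to a click probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean BCE over impressed items; label 1 = clicked, 0 = not clicked."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


def rank_items(items: Sequence[str], logits: Sequence[float]) -> List[str]:
    """At inference, sort a session's impressions by predicted click probability."""
    order = sorted(range(len(items)), key=lambda i: sigmoid(logits[i]), reverse=True)
    return [items[i] for i in order]


# Hypothetical session: three impressed items, the second one was clicked.
items = ["hotel_a", "hotel_b", "hotel_c"]
labels = [0, 1, 0]
logits = [0.2, 1.5, -0.7]

print(binary_cross_entropy(logits, labels))
print(rank_items(items, logits))  # ['hotel_b', 'hotel_a', 'hotel_c']
```

Because the sigmoid is monotonic, ranking by probability gives the same order as ranking by raw score. The classification loss only shapes how well those scores separate clicked items from the others.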

Environment and dependencies:

- Ubuntu 16.04
- CUDA 9.0
- Python==3.6.8
- NumPy==1.16
- pandas==0.24.2
- PyTorch==1.1.0
- scikit-learn==0.21.2
- SciPy==1.3.0
- LightGBM==2.2.4
- XGBoost==0.90
- timezonefinder==4.0.3
- geopy==1.20.0

Project structure:

```
├── input
├── output
├── src
└── weights
```
Run the following command to create the directories that make up this structure, then place the unzipped challenge data in the `input` directory:

```bash
. setup.sh
```
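
The contents of `setup.sh` are not reproduced here. Assuming it only needs to create the four top-level directories in the tree above, a Python equivalent might look like the following sketch. The function name `create_project_dirs` is hypothetical.

```python
import os
from typing import Iterable, List

# Hypothetical stand-in for setup.sh: directory names come from the project tree.
PROJECT_DIRS = ("input", "output", "src", "weights")


def create_project_dirs(root: str = ".", dirs: Iterable[str] = PROJECT_DIRS) -> List[str]:
    """Create each directory under `root` if missing and return the resulting paths."""
    paths = []
    for name in dirs:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        paths.append(path)
    return paths
```

Under that assumption, calling `create_project_dirs()` from the repository root would leave the same layout as `. setup.sh`. Running it twice is harmless because `exist_ok=True` skips directories that already exist.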
Run the two Python scripts below to pickle the input data and to obtain the UTC offset of each country:

```bash
cd src
python picklization.py
python country2utc.py
```
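
The output format of `country2utc.py` is not described here. Assuming it maps each country to a UTC offset in hours, the offset can be applied to session timestamps (Unix seconds, UTC) to derive local-time features such as the hour of day. The sketch below uses only the standard library. The mapping values and the function name `local_hour` are hypothetical, and a fixed offset ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical country -> standard UTC offset in hours (fixed, so no daylight saving time).
UTC_OFFSETS = {"Germany": 1.0, "India": 5.5, "Brazil": -3.0}


def local_hour(unix_ts: int, country: str) -> int:
    """Convert a UTC Unix timestamp to the local hour of day for `country`."""
    offset = UTC_OFFSETS.get(country, 0.0)  # unknown country -> fall back to UTC
    tz = timezone(timedelta(hours=offset))
    return datetime.fromtimestamp(unix_ts, tz=tz).hour


ts = 1541066400  # 2018-11-01 10:00:00 UTC
print(local_hour(ts, "Germany"))  # 11
print(local_hour(ts, "India"))    # 15
print(local_hour(ts, "Brazil"))   # 7
```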
To train the models on the full dataset, set `debug` and `sub_sample` to `False` in `config.py`:
```python
class Configuration(object):
    def __init__(self):
        ...
        self.debug = False
        self.sub_sample = False
        ...
```
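
How the `run_*.py` scripts consume these two flags is not shown here. One common pattern, sketched below under that assumption, is to cap the number of rows in debug mode and to keep only a random subset of whole sessions when subsampling. The helper `select_rows`, the row cap, the sampling fraction, and the seed are hypothetical. The class is a trimmed stand-in for the real `Configuration`.

```python
import random
from typing import Dict, List


class Configuration(object):
    """Trimmed stand-in for config.py with only the two flags discussed above."""

    def __init__(self, debug: bool = False, sub_sample: bool = False):
        self.debug = debug
        self.sub_sample = sub_sample


def select_rows(rows: List[Dict], config: Configuration, debug_rows: int = 1000,
                session_frac: float = 0.1, seed: int = 42) -> List[Dict]:
    """Shrink the data according to the debug / sub_sample flags (hypothetical helper)."""
    if config.debug:
        rows = rows[:debug_rows]  # quick smoke test on the first rows only
    if config.sub_sample:
        sessions = sorted({r["session_id"] for r in rows})
        k = min(len(sessions), max(1, int(len(sessions) * session_frac)))
        keep = set(random.Random(seed).sample(sessions, k))  # whole sessions, not single rows
        rows = [r for r in rows if r["session_id"] in keep]
    return rows


data = [{"session_id": i // 5, "step": i % 5} for i in range(100)]  # 20 sessions x 5 steps
print(len(select_rows(data, Configuration())))                           # 100
print(len(select_rows(data, Configuration(debug=True), debug_rows=10)))  # 10
```

Sampling by session rather than by row keeps each session's steps together. This matters when features are computed over a session's history.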
Each model is trained in an end-to-end fashion. To train each of the three models and generate its predictions, run:

```bash
python run_nn.py
python run_lgb.py
python run_xgb.py
```
The submission files are written to the `output` directory.
The LightGBM predictions alone would place us 5th on the public leaderboard. To ensemble the three models, set the output filename of each model in `Merge.ipynb` and run the notebook.
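
`Merge.ipynb` is not reproduced here, so its exact blending rule is an assumption. A simple way to ensemble pointwise models works in two steps. First, take a weighted average of each model's predicted click probability for every (session, item) pair. Second, re-rank each session's items by the blended score. The sketch below does this in pure Python. The weights are placeholders, not tuned values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (session_id, item_id)


def blend(predictions: Dict[str, Dict[Key, float]], weights: Dict[str, float]) -> Dict[Key, float]:
    """Weighted average of per-(session, item) probabilities across models.

    Assumes every model scores the same (session, item) pairs.
    """
    total_w = sum(weights[m] for m in predictions)
    blended: Dict[Key, float] = defaultdict(float)
    for model, preds in predictions.items():
        for key, prob in preds.items():
            blended[key] += weights[model] * prob / total_w
    return dict(blended)


def rerank(scores: Dict[Key, float]) -> Dict[str, List[str]]:
    """Group items by session and sort them by blended score, best first."""
    by_session: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (session, item), score in scores.items():
        by_session[session].append((score, item))
    return {s: [item for _, item in sorted(pairs, key=lambda p: p[0], reverse=True)]
            for s, pairs in by_session.items()}


# Hypothetical predictions for one session with three impressed items.
preds = {
    "lgb": {("s1", "a"): 0.9, ("s1", "b"): 0.2, ("s1", "c"): 0.4},
    "xgb": {("s1", "a"): 0.6, ("s1", "b"): 0.3, ("s1", "c"): 0.7},
    "nn":  {("s1", "a"): 0.5, ("s1", "b"): 0.1, ("s1", "c"): 0.8},
}
weights = {"lgb": 0.5, "xgb": 0.3, "nn": 0.2}
print(rerank(blend(preds, weights)))  # {'s1': ['a', 'c', 'b']}
```

Averaging probabilities assumes the three models' outputs are on comparable scales. If they are not, averaging per-session ranks is a common alternative.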

| Model | Local Validation MRR | Public Leaderboard MRR |
|---|---|---|
| LightGBM | 0.685787 | N/A |
| XGBoost | 0.684521 | 0.681128 |
| NN | 0.675206 | 0.672117 |
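
Both columns report mean reciprocal rank (MRR). For each evaluated session, take the reciprocal of the 1-based position of the clicked item in the submitted list, then average over sessions. The sketch below computes it in pure Python. Scoring a session as 0 when its clicked item is missing from the list is an assumption made for illustration.

```python
from typing import Dict, List


def mean_reciprocal_rank(rankings: Dict[str, List[str]], clicked: Dict[str, str]) -> float:
    """Average of 1 / (1-based position of the clicked item) over evaluated sessions."""
    if not clicked:
        return 0.0
    total = 0.0
    for session, item in clicked.items():
        ranked = rankings.get(session, [])
        # Assumption: a session whose clicked item was not ranked contributes 0.
        total += 1.0 / (ranked.index(item) + 1) if item in ranked else 0.0
    return total / len(clicked)


rankings = {"s1": ["a", "b", "c"], "s2": ["x", "y"], "s3": ["p", "q", "r", "s"]}
clicked = {"s1": "a", "s2": "y", "s3": "r"}
print(round(mean_reciprocal_rank(rankings, clicked), 4))  # 0.6111
```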