This is the code for TRD, a Top-aware Recommender Distillation framework built on deep reinforcement learning. TRD distills knowledge from state-of-the-art recommenders to further improve recommendation performance at the top positions.
For clarity, we use the MovieLens-100k dataset as an example and treat the BPRMF method as the teacher model in the TRD framework.
-
- Filter the dataset and split it into a training set, a validation set, and a test set.
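
As a rough illustration of this step, the sketch below drops users with too few interactions and splits the remaining (user, item) pairs at random. The helper name `filter_and_split`, the `min_interactions` threshold, and the 80/10/10 ratio are assumptions made for this example; the actual logic lives in data_generator.py and may differ.

```python
import random
from collections import Counter


def filter_and_split(interactions, min_interactions=5,
                     ratios=(0.8, 0.1, 0.1), seed=0):
    """Filter sparse users, then split (user, item) pairs at random.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    # Count interactions per user and keep only sufficiently active users.
    user_counts = Counter(user for user, _ in interactions)
    kept = [(u, i) for u, i in interactions
            if user_counts[u] >= min_interactions]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(kept)

    n_train = int(len(kept) * ratios[0])
    n_valid = int(len(kept) * ratios[1])
    train = kept[:n_train]
    valid = kept[n_train:n_train + n_valid]
    test = kept[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test


# Toy data: user 0 has 10 interactions; user 1 has only 2 and is filtered out.
toy = [(0, i) for i in range(10)] + [(1, 0), (1, 1)]
train, valid, test = filter_and_split(toy)
print(len(train), len(valid), len(test))
```

A random split is only one option; a time-aware or leave-one-out split would follow the same shape with a different slicing rule.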
-
- The teacher model is trained on the historical user-item interaction data. After training, we obtain the distilled knowledge (i.e., the user and item embeddings as well as the basic recommendation list) from the teacher model.
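
To make the notion of distilled knowledge concrete, here is a minimal pure-Python sketch of a BPR-style matrix factorization teacher: it learns user and item embeddings from (user, item) pairs and then ranks unseen items to form a basic recommendation list. The function names `train_bprmf` and `basic_list` and all hyperparameters are hypothetical, and the sketch assumes every user has at least one item they have not interacted with; the repository's teacher is trained by run_pair_mf_train.py and is not reproduced here.

```python
import math
import random


def train_bprmf(train_pairs, n_users, n_items, dim=8, lr=0.05,
                reg=0.01, epochs=200, seed=0):
    """Learn embeddings with the BPR pairwise loss via plain SGD (illustrative)."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(n_items)]

    # Items each user has interacted with, used for negative sampling.
    seen = {}
    for u, i in train_pairs:
        seen.setdefault(u, set()).add(i)

    for _ in range(epochs):
        for u, i in train_pairs:
            # Sample a negative item j the user has not interacted with.
            j = rng.randrange(n_items)
            while j in seen[u]:
                j = rng.randrange(n_items)
            # Score difference between the positive and the negative item.
            x_uij = sum(P[u][k] * (Q[i][k] - Q[j][k]) for k in range(dim))
            g = 1.0 / (1.0 + math.exp(x_uij))  # sigmoid(-x_uij)
            for k in range(dim):
                pu, qi, qj = P[u][k], Q[i][k], Q[j][k]
                P[u][k] += lr * (g * (qi - qj) - reg * pu)
                Q[i][k] += lr * (g * pu - reg * qi)
                Q[j][k] += lr * (-g * pu - reg * qj)
    return P, Q, seen


def basic_list(P, Q, user, seen_items, k):
    """Rank unseen items by inner-product score and return the top k."""
    scores = {i: sum(p * q for p, q in zip(P[user], Q[i]))
              for i in range(len(Q)) if i not in seen_items}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: user 0 interacted with items 0-2, user 1 with items 3-5.
train_pairs = [(0, 0), (0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
P, Q, seen = train_bprmf(train_pairs, n_users=2, n_items=6)
top2 = basic_list(P, Q, user=0, seen_items=seen[0], k=2)
print(top2)
```

The embeddings `P` and `Q` together with the ranked list correspond to what the student model would consume as input.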
-
- We treat the distilled knowledge as the input and adopt a Deep Q-Network (DQN) [1] as the student model. The student model aims to reinforce and refine the basic recommendation list.
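
The two core ingredients of DQN-style learning, epsilon-greedy action selection and the temporal-difference target, can be sketched without any deep learning library. In the sketch below, Q-values are plain lists standing in for the output of a Q-network, and each action is assumed to pick one candidate item; the function names, the reward value, and the discount factor are illustrative assumptions. The DQN of [1] additionally uses a neural network, a replay buffer, and a separate target network, none of which are shown, and the state, action, and reward design in run_trd.py may differ.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma=0.9, done=False):
    """One-step target: r if the episode ended, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


rng = random.Random(0)

# Hypothetical Q-values for three candidate items in the current state.
q_values = [0.1, 0.7, 0.3]
action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)  # purely greedy choice

# Hypothetical reward of 1.0 and Q-values of the next state.
target = td_target(reward=1.0, next_q_values=[0.2, 0.5], gamma=0.9)

# Squared TD error that a Q-network would be trained to minimize.
loss = (target - q_values[action]) ** 2
print(action, target, loss)
```

With `epsilon=0.0` the selection is deterministic; during training a positive epsilon would keep some exploration over the candidate items.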
-
First, we need to build the extensions that the code depends on.
python setup.py build_ext --inplace
-
Then we run the code to load the dataset and produce the experiment data. To use another dataset, you need to modify the code in data_generator.py.
python data_generator.py
-
Next, we run the code to get the distilled knowledge from the teacher model. The arguments are described in the help message:
python run_pair_mf_train.py --help
python run_pair_mf_train.py --dataset=ml-100k --prepro=origin
-
Finally, we train the student model and generate the refined recommendation lists on the test set. The arguments are described in the help message:
python run_trd.py --help
python run_trd.py --dataset=ml-100k --prepro=origin --method=bprmf --n_actions=20 --pred_score=0
- Python 3.6
- Torch (>=1.1.0)
- Numpy (>=1.18.0)
- Pandas (>=0.24.0)
Our code builds on the following repositories:
- State-of-the-art recommendation algorithms: daisyRec [2]
- DDPG part: RL_DDPG_Recommendation
[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529-533.
[2] Z. Sun, D. Yu, H. Fang, J. Yang, X. Qu, J. Zhang, C. Geng, Are we evaluating rigorously? Benchmarking recommendation for reproducible evaluation and fair comparison, ACM RecSys, 2020.