This repo contains the source code for my final project in the class "Introduction to Deep Learning System".
The project is a TensorFlow 2.4.0-rc4 implementation of the GAN-based user models proposed in the paper *Generative Adversarial User Model for Reinforcement Learning Based Recommendation System* by Chen et al. The implementation is tested on the Retailrocket recommender system dataset.
To reproduce the results of this project:

- Make sure TensorFlow >= v2.4.0-rc4 is installed.
- Install the user model package `ganrl-tfv2` with `pip install -e .`
- Download the dataset from this link.
- Download the notebook `data_processing.ipynb` and extract the data needed (the experiment did not use the complete dataset due to hardware constraints).
- Place the selected data under the folder `./dropbox` and run `./process_data.sh` to get the serialized pickle files (a quick sanity check is sketched below).
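As an optional sanity check after the last step, a short snippet like the following can verify that the serialized files deserialize. The glob pattern is an assumption; adjust it to wherever `process_data.sh` actually writes its output:

```python
# Optional sanity check; the path pattern below is an assumption,
# adjust it to where process_data.sh writes its output.
import glob
import pickle

for path in glob.glob("./dropbox/**/*.pkl", recursive=True):
    with open(path, "rb") as f:
        obj = pickle.load(f)
    print(path, type(obj))  # confirm each file deserializes cleanly
```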
Simply run in a terminal:

```bash
cd ganrl-tfv2/experiment_user_model/
./run_gan_user_model.sh
```
For more details regarding model configuration options, check `./common/cmd_args.py`.
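The options in that file follow the standard `argparse` pattern; a minimal sketch of the idea is below. The flag names here are hypothetical placeholders, and the real options are defined in `cmd_args.py`:

```python
import argparse

# Hypothetical flags for illustration only; see
# ganrl-tfv2/common/cmd_args.py for the actual options.
parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=1e-3)
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--num_epochs", type=int, default=50)
cmd_args, _ = parser.parse_known_args()
```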
Fine-tuning the reward model in an adversarial way helps increase the generalizability of the resulting model.
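To make the idea concrete, here is a minimal, generic TF2 sketch of one adversarial round, with the user model as generator and the reward model as discriminator. Everything here (model shapes, layer sizes, loss form) is an assumption for illustration, not the repo's actual implementation:

```python
import tensorflow as tf

N_ITEMS, ITEM_DIM = 10, 16  # assumed slate size and item feature size

# Toy stand-ins for the real models: both score per-item feature vectors.
user_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # click logit per displayed item
])
reward_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),  # scalar reward per displayed item
])
user_opt = tf.keras.optimizers.Adam(1e-4)
reward_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def adversarial_step(displayed, clicked_idx):
    """displayed: [batch, N_ITEMS, ITEM_DIM] floats; clicked_idx: [batch] ints."""
    with tf.GradientTape(persistent=True) as tape:
        logits = tf.squeeze(user_model(displayed), -1)     # [batch, N_ITEMS]
        rewards = tf.squeeze(reward_model(displayed), -1)  # [batch, N_ITEMS]
        probs = tf.nn.softmax(logits)
        # Expected reward of clicks sampled from the user model ("fake")
        # versus the reward of the items users actually clicked ("real").
        fake_r = tf.reduce_sum(probs * rewards, axis=-1)
        real_r = tf.gather(rewards, clicked_idx, batch_dims=1)
        reward_loss = tf.reduce_mean(fake_r - real_r)  # discriminator: real above fake
        user_loss = -tf.reduce_mean(fake_r)            # generator: chase high reward
    reward_opt.apply_gradients(zip(
        tape.gradient(reward_loss, reward_model.trainable_variables),
        reward_model.trainable_variables))
    user_opt.apply_gradients(zip(
        tape.gradient(user_loss, user_model.trainable_variables),
        user_model.trainable_variables))
    del tape
    return reward_loss, user_loss
```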
Also, we notice that pre-training with Shannon Entropy is essential for model convergence: the results deteriorated dramatically with the "no pretrain" model.
Skipping pre-training hurts not only generalizability but also convergence.
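For reference, a hedged sketch of what an entropy-regularized pre-training objective can look like (the function name and coefficient are assumptions, not the repo's exact code): the usual negative log-likelihood of the observed click plus a Shannon entropy bonus on the predicted click distribution.

```python
import tensorflow as tf

def pretrain_loss(logits, clicked_idx, entropy_coef=0.1):
    """NLL of observed clicks minus a Shannon entropy bonus.
    logits: [batch, n_items]; clicked_idx: [batch] ints.
    entropy_coef is an illustrative value, not the repo's setting."""
    log_probs = tf.nn.log_softmax(logits)
    nll = -tf.gather(log_probs, clicked_idx, batch_dims=1)  # [batch]
    probs = tf.nn.softmax(logits)
    entropy = -tf.reduce_sum(probs * log_probs, axis=-1)    # Shannon entropy
    # Encouraging high entropy keeps the click distribution from
    # collapsing early, which stabilizes later adversarial training.
    return tf.reduce_mean(nll - entropy_coef * entropy)
```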
Adding batch normalization layers worsens convergence.
Finally, let's compare the final test results of all models:
- Pre-training with Shannon Entropy helps reduce instability in the training process of our GAN user models and leads to better validation scores.
- Directly training the user models adversarially leads to instability in the training process, potential divergence, and poor generalization.
- Models trained using Shannon Entropy improve slowly; however, fine-tuning them adversarially afterwards restarts the progress and yields better results.