Amazon Reviews 2023

[🌐 Website] · [🤗 Huggingface Datasets] · [📑 Paper] · [🔬 McAuley Lab]

This repository contains:

Scripts for processing the Amazon Reviews 2023 dataset into recommendation benchmarks;
Checkpoints & implementations for BLaIR: "Bridging Language and Items for Retrieval and Recommendation";
Scripts for constructing Amazon-C4, a new dataset for evaluating product search performance under complex contexts.

Recommendation Benchmarks

Based on the released Amazon Reviews 2023 dataset, we provide scripts that preprocess the raw data into standard train/validation/test splits, so that recommendation models can be benchmarked on a common footing.

More details here -> [datasets & processing scripts]
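To make the idea of a train/validation/test split concrete, here is a minimal sketch of one common scheme, a per-user leave-last-out split. This is an assumption for illustration only: the README does not say which splitting strategy the scripts use, and the function name `leave_last_out_split` and the toy records are hypothetical, not part of the repository.

```python
from collections import defaultdict


def leave_last_out_split(interactions):
    """Split (user, item, timestamp) records into train/valid/test per user.

    Assumed scheme (not confirmed by this README): each user's most recent
    interaction goes to test, the second most recent to validation, and
    everything earlier to train.
    """
    by_user = defaultdict(list)
    for user, item, timestamp in interactions:
        by_user[user].append((timestamp, item))

    train, valid, test = {}, {}, {}
    for user, events in by_user.items():
        events.sort()  # chronological order (ties broken by item id)
        items = [item for _, item in events]
        if len(items) < 3:
            # Too few interactions to hold anything out; keep them in train.
            train[user] = items
            continue
        train[user] = items[:-2]
        valid[user] = items[-2]
        test[user] = items[-1]
    return train, valid, test


# Hypothetical toy interactions: (user_id, item_id, timestamp)
toy = [
    ("u1", "A", 1), ("u1", "B", 2), ("u1", "C", 3), ("u1", "D", 4),
    ("u2", "X", 5), ("u2", "Y", 6),
]
train, valid, test = leave_last_out_split(toy)
print(train, valid, test)
```

Users with fewer than three interactions are kept in train only in this sketch; a real pipeline might instead filter them out.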

BLaIR

BLaIR, which is short for "Bridging Language and Items for Retrieval and Recommendation", is a series of language models pre-trained on the Amazon Reviews 2023 dataset.

BLaIR is grounded on pairs of (item metadata, language context), enabling the models to:

derive strong item text representations, for both recommendation and retrieval;
predict the most relevant item given simple / complex language context.

More details here -> [checkpoints & code]
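As a rough illustration of "predict the most relevant item given a language context", the sketch below ranks items by cosine similarity between a context embedding and item embeddings. It assumes the embeddings have already been produced by some text encoder; the vectors, item names, and the helper `rank_items` are made up for this example and do not come from the BLaIR code or checkpoints.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_items(context_vec, item_vecs):
    """Return item ids sorted by decreasing similarity to the context."""
    scored = [(cosine(context_vec, vec), item_id)
              for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]


# Hypothetical embeddings; in practice these would come from an encoder
# applied to item metadata and to the user's language context.
item_vecs = {
    "tent": [0.9, 0.1, 0.0],
    "kettle": [0.1, 0.9, 0.1],
    "novel": [0.0, 0.1, 0.9],
}
context_vec = [0.8, 0.3, 0.0]

ranking = rank_items(context_vec, item_vecs)
print(ranking)
```

The same ranking step can serve either use case: the item vectors alone can feed a recommender as text features, while scoring them against a context vector gives a retrieval result.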

Amazon-C4

Amazon-C4, which is short for "Complex Contexts Created by ChatGPT", is a new dataset for the complex product search task.

Amazon-C4 is designed to assess a model's ability to comprehend complex language contexts and retrieve relevant items.

More details here -> [datasets & code]
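Retrieval quality on a benchmark of this kind is typically summarized with ranking metrics. The sketch below computes Recall@k and NDCG@k under the assumption that each query has exactly one ground-truth item; that assumption, the function names, and the toy rankings are illustrative and are not taken from the Amazon-C4 evaluation code.

```python
import math


def recall_at_k(ranked, target, k):
    """1.0 if the target item appears in the top-k results, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG@k with a single relevant item.

    The ideal DCG is 1, so the score reduces to 1 / log2(rank + 1).
    """
    for position, item in enumerate(ranked[:k], start=1):
        if item == target:
            return 1.0 / math.log2(position + 1)
    return 0.0


def evaluate(rankings, targets, k):
    """Average both metrics over all queries that have a target."""
    n = len(targets)
    recall = sum(recall_at_k(rankings[q], targets[q], k) for q in targets) / n
    ndcg = sum(ndcg_at_k(rankings[q], targets[q], k) for q in targets) / n
    return {"recall": recall, "ndcg": ndcg}


# Hypothetical retrieval output: query id -> ranked item ids
rankings = {"q1": ["A", "B", "C"], "q2": ["B", "C", "A"]}
targets = {"q1": "A", "q2": "A"}

metrics = evaluate(rankings, targets, k=2)
print(metrics)
```

With k=2, the first query finds its target at rank 1 and the second misses it, so both averages come out to one half.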

Contact

If you encounter a bug or have any suggestions or questions, please let us know by filing an issue or emailing Yupeng Hou (@hyp1231) at [email protected].

Acknowledgement

If you find the Amazon Reviews 2023 dataset, the BLaIR checkpoints, the Amazon-C4 dataset, or our scripts/code helpful, please cite the following paper.

@article{hou2024bridging,
  title={Bridging Language and Items for Retrieval and Recommendation},
  author={Hou, Yupeng and Li, Jiacheng and He, Zhankui and Yan, An and Chen, Xiusi and McAuley, Julian},
  journal={arXiv preprint arXiv:2403.03952},
  year={2024}
}

The recommendation experiments in the BLaIR paper are implemented using the open-source recommendation library RecBole.

The pre-training scripts draw heavily on the Hugging Face language-modeling examples and SimCSE.
