Stars
LLM-dataset
3 repositories
A curated list of human preference datasets for LLM fine-tuning, RLHF, and evaluation.
[ICML 2024] Selecting High-Quality Data for Training Language Models
Easily turn large sets of image URLs into an image dataset. It can download, resize, and package 100M URLs in 20 hours on one machine.