Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python · 2,330 stars · 168 forks · Updated Feb 4, 2025

argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Python · 4,300 stars · 405 forks · Updated Feb 6, 2025

huggingface / text-clustering

Easily embed, cluster and semantically label text datasets

Python · 499 stars · 40 forks · Updated Mar 28, 2024

young_chao young-chao

Data processing

bigcode-project / bigcode-dataset

modelscope / data-juicer

bigcode-project / bigcode-analysis

EleutherAI / github-downloader

src-d / go-license-detector

ekzhu / datasketch

allenai / wimbd

NVIDIA / NeMo-Curator

argilla-io / distilabel

argilla-io / argilla

huggingface / text-clustering