young-chao

Stars

wdndev / llm_interview_note

Notes covering knowledge and interview questions for large language model (LLM) algorithm (application) engineers

HTML 4,940 571 Updated Oct 22, 2024

ADaM-BJTU / O1-CODER

An O1 Replication for Coding

Python 314 20 Updated Dec 11, 2024

Tebmer / Awesome-Knowledge-Distillation-of-LLMs

This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…

768 46 Updated Oct 22, 2024

GAIR-NLP / O1-Journey

O1 Replication Journey

1,912 59 Updated Jan 14, 2025

arcee-ai / EvolKit

EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language Models (LLMs).

Jupyter Notebook 200 23 Updated Oct 30, 2024

facebookresearch / generative-recommenders

Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 879 161 Updated Dec 16, 2024

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,345 353 Updated Feb 3, 2025

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python 12,615 772 Updated Feb 1, 2025

wasiahmad / Awesome-LLM-Synthetic-Data

A reading list on LLM based Synthetic Data Generation 🔥

1,003 55 Updated Nov 5, 2024

huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Python 1,410 142 Updated Jan 31, 2025

huggingface / text-clustering

Easily embed, cluster and semantically label text datasets

Python 495 39 Updated Mar 28, 2024

huggingface / cosmopedia

Python 490 45 Updated Nov 20, 2024

LearningOpt / pie

Python 43 4 Updated Jul 18, 2024

microsoft / PythonProgrammingPuzzles

A Dataset of Python Challenges for AI Research

Python 972 93 Updated Apr 24, 2024

HqWu-HITCS / Awesome-Chinese-LLM

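A curated collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost, including base models, vertical-domain fine-tuned models and applications, datasets, and tutorials.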

17,948 1,721 Updated Sep 19, 2024

williamliujl / CMExam

A Chinese National Medical Licensing Examination dataset and large language model benchmarks

Python 54 8 Updated Dec 2, 2023

argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

Python 4,252 400 Updated Jan 29, 2025

315386775 / DeepLearing-Interview-Awesome-2024

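A collection of interview questions and answers for AIGC-interview/CV-interview/LLMs-interview, also including new ideas, new questions, new resources, and new projects arising from work and research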

1,969 187 Updated Jan 13, 2025

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 2,195 160 Updated Feb 3, 2025