- Peking University
- Oxford, UK
- https://sites.google.com/view/jzfeng/home
Stars
verl: Volcano Engine Reinforcement Learning for LLMs
CYaRon: Yet Another Random Olympic-iNformatics test data generator
Fully open data curation for reasoning models
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Sky-T1: Train your own O1 preview model within $450
Fully open reproduction of DeepSeek-R1
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
A bibliography and survey of the papers surrounding o1
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
Daily updated LLM papers. Subscriptions are welcome 👏 If you like it, please give it a star 🌟
A series of technical reports on Slow Thinking with LLM
Let your Claude be able to think
A reading list on LLM based Synthetic Data Generation 🔥
Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥
The RedStone repository includes code for preparing extensive datasets used in training large language models.
SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?