Stars
Making large AI models cheaper, faster and more accessible
Democratizing Reinforcement Learning for LLMs
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Fully open reproduction of DeepSeek-R1
DeepSeek R1 distilled into smaller OSS models
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
OpenRLHF with the SimPO method added (a personal modification). Original repository: https://github.com/OpenLLMAI/OpenRLHF
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
A framework for few-shot evaluation of autoregressive language models.
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
A library for advanced large language model reasoning
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
Implements reinforcement learning (RL) and chain of thought (CoT) like o1.
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
The Open Assistant API is a ready-to-use, open-source, self-hosted agent/gpts orchestration creation framework, supporting customized extensions for LLM, RAG, function call, and tools capabilities.…
RLHF implementation details of OAI's 2019 codebase
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]