Making large AI models cheaper, faster and more accessible

Python 40,537 4,478 Updated Mar 4, 2025

Democratizing Reinforcement Learning for LLMs

Python 1,897 166 Updated Feb 16, 2025

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Python 320 24 Updated Feb 17, 2025

A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,057 225 Updated Feb 19, 2025

Fully open reproduction of DeepSeek-R1

Python 22,114 1,983 Updated Mar 4, 2025

DeepSeek R1 distilled into smaller OSS models

Python 12 4 Updated Jan 21, 2025

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 110 2 Updated Dec 24, 2024
Python 42 3 Updated Dec 17, 2024

OpenRLHF with the SimPO method added (a personal modification). Original repository: https://github.com/OpenLLMAI/OpenRLHF

Python 8 Updated Jun 19, 2024

Source code for Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 291 29 Updated Aug 6, 2024

An o1 Replication for Coding

Python 327 21 Updated Dec 11, 2024

Awesome Reinforcement Fine Tuning

3 Updated Dec 8, 2024

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Python 218 19 Updated Nov 5, 2024

A framework for few-shot evaluation of autoregressive language models.

Python 148 48 Updated Sep 13, 2024

Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

Python 127 12 Updated Nov 11, 2024

A library for advanced large language model reasoning

Python 2,007 177 Updated Feb 21, 2025

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,538 365 Updated Feb 26, 2025

Implements reinforcement learning (RL) and chain of thought (CoT) like o1.

Python 1 Updated Oct 6, 2024

Large Reasoning Models

Python 800 45 Updated Dec 3, 2024

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Python 41 2 Updated Oct 23, 2024

O1 Replication Journey

1,965 65 Updated Jan 14, 2025

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,702 129 Updated Jan 17, 2025

SFT/Reward Model/DPO/SPO

Python 1 1 Updated May 30, 2024
Python 9 Updated Jan 4, 2024

The Open Assistant API is a ready-to-use, open-source, self-hosted agent/gpts orchestration creation framework, supporting customized extensions for LLM, RAG, function call, and tools capabilities.…

Python 321 70 Updated Dec 14, 2024

LaTeX template for the Peking University postdoctoral research work report

TeX 13 Updated Mar 13, 2023

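Template for the Dalian Maritime University postdoctoral research work report, adapted from the University of Science and Technology of China (USTC) thesis LaTeX template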

TeX 1 Updated Nov 14, 2023

RLHF implementation details of OAI's 2019 codebase

Python 183 9 Updated Jan 14, 2024

Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]

Python 194 9 Updated Jun 24, 2024

A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)

Python 364 27 Updated Aug 16, 2024