huoliangyu

Xuntian huoliangyu

3 followers · 5 following

Lists (15)

clean code

dataset

evaluation LLM

latex2word

Tools to convert *.tex to MS Word *.doc

LLM agent

NLP

other tools

student resources with edu email

10 repositories

RL codebase

RLHF

40 repositories

rllib related

traffic4cast

transformer

alphastar transformer

tutorials

StarCraft

NetEase competition

Stars

hpcaitech / ColossalAI

Making large AI models cheaper, faster and more accessible

Python 40,537 4,478 Updated Mar 4, 2025

agentica-project / deepscaler

Democratizing Reinforcement Learning for LLMs

Python 1,897 166 Updated Feb 16, 2025

Gen-Verse / ReasonFlux

ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates

Python 320 24 Updated Feb 17, 2025

hkust-nlp / simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,057 225 Updated Feb 19, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 22,114 1,983 Updated Mar 4, 2025

Emericen / deepseek-r1-distilled

DeepSeek R1 distilled into smaller OSS models

Python 12 4 Updated Jan 21, 2025

ADaM-BJTU / OpenRFT

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning

Python 110 2 Updated Dec 24, 2024

thu-coai / SPaR

Python 42 3 Updated Dec 17, 2024

victorShawFan / OpenRLHF_add_simpo

OpenRLHF with the SimPO method added; a personal modification. Original repository: https://github.com/OpenLLMAI/OpenRLHF

Python 8 Updated Jun 19, 2024

YuxiXie / MCTS-DPO

This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.

Jupyter Notebook 291 29 Updated Aug 6, 2024

ADaM-BJTU / O1-CODER

AN O1 REPLICATION FOR CODING

Python 327 21 Updated Dec 11, 2024

XxFChen / awesome-reinforcement-fine-tuning

Awesome Reinforcement Fine Tuning

3 Updated Dec 8, 2024

flowersteam / lamorel

Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).

Python 218 19 Updated Nov 5, 2024

Stability-AI / lm-evaluation-harness

Forked from EleutherAI/lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Python 148 48 Updated Sep 13, 2024

McGill-NLP / VinePPO

Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"

Python 127 12 Updated Nov 11, 2024

maitrix-org / llm-reasoners

A library for advanced large language model reasoning

Python 2,007 177 Updated Feb 21, 2025

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6,538 365 Updated Feb 26, 2025

sfdeggb / RL_Like_o1

Implements reinforcement learning (RL) and chain of thought (CoT) like o1.

Python 1 Updated Oct 6, 2024

SimpleBerry / LLaMA-O1

Large Reasoning Models

Python 800 45 Updated Dec 3, 2024

junkangwu / beta-DPO

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Python 41 2 Updated Oct 23, 2024

GAIR-NLP / O1-Journey

O1 Replication Journey

1,965 65 Updated Jan 14, 2025

openreasoner / openr

OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models

Python 1,702 129 Updated Jan 17, 2025

jessicazhu123 / Deepspeed_LLM

SFT/Reward Model/DPO/SPO

Python 1 1 Updated May 30, 2024

gouqi666 / DPO-deepspeed

Python 9 Updated Jan 4, 2024

MLT-OSS / open-assistant-api

The Open Assistant API is a ready-to-use, open-source, self-hosted agent/gpts orchestration creation framework, supporting customized extensions for LLM, RAG, function call, and tools capabilities.…

Python 321 70 Updated Dec 14, 2024

Jiayin-Gu / PKUreport

LaTeX template for the Peking University postdoctoral research report

TeX 13 Updated Mar 13, 2023

4tarXu / dlmu_postdoctor_latex

LaTeX template for the Dalian Maritime University postdoctoral research report, adapted from the USTC thesis LaTeX template

TeX 1 Updated Nov 14, 2023

vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

Python 183 9 Updated Jan 14, 2024

thu-coai / SafetyBench

Official github repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]

Python 194 9 Updated Jun 24, 2024

THUDM / AlignBench

A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)

Python 364 27 Updated Aug 16, 2024