Roythuly

Yu Luo Roythuly

8 followers · 1 following

Stars

hkust-nlp / simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 2,279 170 Updated Feb 7, 2025

Jiayi-Pan / TinyZero

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 9,145 1,187 Updated Feb 1, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 17,713 1,467 Updated Feb 8, 2025

lindermanlab / ssm

Bayesian learning and inference for state space models

Jupyter Notebook 595 205 Updated Aug 14, 2024

Roythuly / OMPO

Python 9 Updated May 29, 2024

Vance0124 / Token-level-Direct-Preference-Optimization

Reference implementation for Token-level Direct Preference Optimization (TDPO)

Python 126 13 Updated Jul 3, 2024

google-deepmind / torax

TORAX: Tokamak transport simulation in JAX

Python 395 40 Updated Feb 8, 2025

2toinf / IVM

[NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"

Jupyter Notebook 32 2 Updated Nov 15, 2024

Roythuly / OBAC

Python 19 Updated May 27, 2024

yuanlong-o / Deep_widefield_cal_inferece

MATLAB 11 8 Updated Oct 19, 2023

Roythuly / off-policy

Python 7 1 Updated Jul 19, 2023

MurpheyLab / MaxDiffRL

Jupyter Notebook 59 11 Updated Mar 9, 2024

KindXiaoming / pykan

Kolmogorov Arnold Networks

Jupyter Notebook 15,363 1,446 Updated Jan 19, 2025

roboterax / humanoid-gym

Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695

Python 984 159 Updated Jan 26, 2025

posquit0 / Awesome-CV

📄 Awesome CV is a LaTeX template for your outstanding job application

TeX 23,717 4,869 Updated Feb 6, 2025

tuna / thuthesis

LaTeX Thesis Template for Tsinghua University

TeX 4,700 1,093 Updated Jan 10, 2025

carlosferrazza / humanoid-bench

Python 454 58 Updated Sep 22, 2024

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 39,736 4,878 Updated Feb 8, 2025

windshadow233 / This-Repo-Has-14-Stars

Thanks for stopping by! This repository now has 14 stars~🌟🌟🌟

Python 14 Updated Dec 31, 2024

imoneoi / openchat

OpenChat: Advancing Open-source Language Models with Imperfect Data

Python 5,297 404 Updated Sep 13, 2024

ikostrikov / jaxrl

JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.

Jupyter Notebook 651 71 Updated Oct 26, 2022

ToruOwO / minimal-stable-PPO

A minimal and stable PPO.

Python 129 4 Updated Feb 9, 2024

qgallouedec / panda-gym

Set of robotic environments based on PyBullet physics engine and gymnasium.

Python 606 118 Updated Jul 23, 2024

chanb / rl_sandbox_public

PyTorch implementation of (Deep) Reinforcement Learning (RL) algorithms

Python 22 Updated Jun 26, 2022

vikashplus / robohive

A unified framework for robot learning

Python 548 86 Updated Nov 26, 2024

f / awesome-chatgpt-prompts

This repo includes ChatGPT prompt curation to use ChatGPT and other LLM tools better.

HTML 119,985 16,158 Updated Feb 5, 2025

jity16 / When-to-Update-Your-Model-Constrained-Model-based-Reinforcement-Learning

Official PyTorch implementation of CMLO from the paper "When to Update Your Model: Constrained Model-based Reinforcement Learning"

Python 10 1 Updated Nov 2, 2023

young-geng / CQL

Conservative Q Learning on top of SAC

Python 122 25 Updated Oct 15, 2022

pranz24 / pytorch-soft-actor-critic

PyTorch implementation of soft actor critic

Python 852 181 Updated Nov 9, 2021

openai / safety-starter-agents

Basic constrained RL agents used in experiments for the "Benchmarking Safe Exploration in Deep Reinforcement Learning" paper.

Python 400 113 Updated Apr 2, 2023
