Stars
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
Bayesian learning and inference for state space models
Reference implementation for Token-level Direct Preference Optimization (TDPO)
TORAX: Tokamak transport simulation in JAX
[NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695
📄 Awesome CV is a LaTeX template for your outstanding job application
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Thanks for stopping by! This repository now has 14 stars~🌟🌟🌟
OpenChat: Advancing Open-source Language Models with Imperfect Data
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
Set of robotic environments based on PyBullet physics engine and gymnasium.
PyTorch implementation of (Deep) Reinforcement Learning (RL) algorithms
This repo includes a curated collection of ChatGPT prompts for using ChatGPT and other LLM tools more effectively.
Official PyTorch implementation of CMLO in the paper “When to Update Your Model: Constrained Model-based Reinforcement Learning”
PyTorch implementation of Soft Actor-Critic
Basic constrained RL agents used in experiments for the "Benchmarking Safe Exploration in Deep Reinforcement Learning" paper.