Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face Transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Python 552 59 Updated May 9, 2024

x35f / unstable_baselines

Re-implementations of SOTA RL algorithms.

Python 129 12 Updated Sep 7, 2023

alpc91 / SGRL

[ICML 2023 Oral] Official environments and implementations for "Subequivariant Graph Reinforcement Learning in 3D Environments"

Python 17 1 Updated Jul 24, 2023

pmineiro / smoothcb

Smoothed IGW for infinite action contextual bandits

ReScript 3 Updated Jul 2, 2022

facebookresearch / Pearl

A production-ready reinforcement learning AI agent library from the Applied Reinforcement Learning team at Meta.

Jupyter Notebook 2,785 179 Updated Mar 7, 2025

pmineiro / linrepcb

SpannerIGW for linearly representable infinite action contextual bandits

Jupyter Notebook 4 Updated Jul 7, 2022

LantaoYu / MARL-Papers

Paper list of multi-agent reinforcement learning (MARL)

4,236 744 Updated Oct 17, 2024

tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC

Python 1,167 142 Updated Aug 3, 2023

dchen48

Stars

karpathy / minGPT

bkitano / llama-from-scratch

wdndev / llm_interview_note

Hannibal046 / Awesome-LLM

mlabonne / llm-course

datamllab / rlcard

microsoft / jericho