- Shanghai Jiao Tong University
- Shanghai
Stars
Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
Implementation of the Decision-Pretrained Transformer (DPT) from the paper Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Curated list of datasets and tools for post-training.
🐙 OctoPack: Instruction Tuning Code Large Language Models
Data and Code for Program of Thoughts (TMLR 2023)
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248
Simulation code for paper "Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality"
Ongoing research training transformer models at scale
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
Paper list of multi-agent reinforcement learning (MARL)
Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow
All notes and materials for the CS229: Machine Learning course by Stanford University
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | DouDizhu AI
Hung-yi Lee's Spring 2022 Machine Learning course, including slides and homework.
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
The repository is for safe reinforcement learning baselines.
💼 another CV template for your job application, yet powered by Typst and more
An elegant \LaTeX\ résumé template. Mainland China mirror: https://gods.coding.net/p/resume/git
🦜🔗 Build context-aware reasoning applications
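A practical interactive interface for GPT, GLM, and other large language models (LLMs), optimized in particular for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; supports analysis and self-explanation of projects in Python, C++, and other languages; PDF/LaTeX paper translation and summarization; supports querying multiple LLMs in parallel; supports local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…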