tangxinvc

2 followers · 2 following

Starred repositories

open-thought / reasoning-gym

procedural reasoning datasets

Python · 348 stars · 34 forks · Updated Feb 9, 2025

minosvasilias / simple_grpo

Simple GRPO scripts and configurations.

Python · 46 stars · 4 forks · Updated Feb 6, 2025

open-thoughts / open-thoughts

Open Thoughts: Fully Open Data Curation for Thinking Models

Python · 615 stars · 37 forks · Updated Feb 7, 2025

vwxyzjn / ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Python · 687 stars · 103 forks · Updated Mar 23, 2024

roboflow / maestro

streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python · 2,173 stars · 167 forks · Updated Feb 7, 2025

deepseek-ai / DeepSeek-R1

68,764 stars · 8,848 forks · Updated Feb 8, 2025

deepseek-ai / DeepSeek-VL2

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python · 3,200 stars · 1,378 forks · Updated Feb 5, 2025

Goedel-LM / Goedel-Prover

Python · 50 stars · 8 forks · Updated Feb 7, 2025

satori-reasoning / Satori

36 stars · 1 fork · Updated Feb 5, 2025

mshumer / OpenDeepResearcher

Jupyter Notebook · 1,705 stars · 214 forks · Updated Feb 3, 2025

Unakar / Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python · 1,054 stars · 65 forks · Updated Feb 8, 2025

TIGER-AI-Lab / AceCoder

The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"

Python · 42 stars · Updated Feb 8, 2025

Deep-Agent / R1-V

Witness the aha moment of VLM with less than $3.

Python · 1,832 stars · 127 forks · Updated Feb 8, 2025

harishsg993010 / LLM-Reasoner

Make any LLM think like OpenAI o1 and DeepSeek R1

Python · 387 stars · 19 forks · Updated Feb 6, 2025

maitrix-org / llm-reasoners

A library for advanced large language model reasoning

Python · 1,776 stars · 155 forks · Updated Feb 6, 2025

superlinear-ai / microGRPO

🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper

Python · 12 stars · Updated Feb 7, 2025

simplescaling / s1

s1: Simple test-time scaling

Python · 4,188 stars · 461 forks · Updated Feb 8, 2025

rag-web-ui / rag-web-ui

RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology.

TypeScript · 970 stars · 93 forks · Updated Feb 6, 2025

oumi-ai / oumi

Everything you need to build state-of-the-art foundation models, end-to-end.

Python · 6,136 stars · 422 forks · Updated Feb 8, 2025

Ingvarstep / open-r1-text2graph

Open replication of DeepSeek R1 for text-to-graph extraction.

Python · 21 stars · 4 forks · Updated Jan 31, 2025

mlfoundations / evalchemy

Automatic Evals for LLMs

HTML · 177 stars · 18 forks · Updated Feb 7, 2025

sail-sg / oat

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

Python · 164 stars · 10 forks · Updated Feb 8, 2025

TIGER-AI-Lab / CritiqueFineTuning

Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"

Python · 76 stars · 7 forks · Updated Feb 4, 2025

deepseek-ai / DeepSeek-Math

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Python · 2,240 stars · 419 forks · Updated Apr 15, 2024

ZihanWang314 / RAGEN

RAGEN is the first open-source reproduction of DeepSeek-R1 on AGENT training.

Python · 735 stars · 46 forks · Updated Feb 9, 2025

bespokelabsai / curator

Synthetic Data curation for post-training and structured data extraction

Python · 710 stars · 52 forks · Updated Feb 9, 2025

roboflow / notebooks

This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…

Jupyter Notebook · 6,809 stars · 1,062 forks · Updated Feb 4, 2025

wdndev / llm_interview_note

Mainly covers knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers

HTML · 5,075 stars · 588 forks · Updated Oct 22, 2024

hkust-nlp / simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python · 2,290 stars · 172 forks · Updated Feb 7, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python · 17,771 stars · 1,476 forks · Updated Feb 8, 2025

Starred topics

large-language-models