Starred repositories

Procedural reasoning datasets

Python 348 34 Updated Feb 9, 2025

Simple GRPO scripts and configurations.

Python 46 4 Updated Feb 6, 2025

Open Thoughts: Fully Open Data Curation for Thinking Models

Python 615 37 Updated Feb 7, 2025

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

Python 687 103 Updated Mar 23, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,173 167 Updated Feb 7, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 3,200 1,378 Updated Feb 5, 2025
Python 50 8 Updated Feb 7, 2025
Jupyter Notebook 1,705 214 Updated Feb 3, 2025

Reproduce R1 Zero on Logic Puzzle

Python 1,054 65 Updated Feb 8, 2025

The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis"

Python 42 Updated Feb 8, 2025

Witness the aha moment of VLM with less than $3.

Python 1,832 127 Updated Feb 8, 2025

Make any LLM think like OpenAI o1 and DeepSeek R1

Python 387 19 Updated Feb 6, 2025

A library for advanced large language model reasoning

Python 1,776 155 Updated Feb 6, 2025

🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper

Python 12 Updated Feb 7, 2025

s1: Simple test-time scaling

Python 4,188 461 Updated Feb 8, 2025

RAG Web UI is an intelligent dialogue system based on RAG (Retrieval-Augmented Generation) technology.

TypeScript 970 93 Updated Feb 6, 2025

Everything you need to build state-of-the-art foundation models, end-to-end.

Python 6,136 422 Updated Feb 8, 2025

Open replication of DeepSeek R1 for text-to-graph extraction.

Python 21 4 Updated Jan 31, 2025

Automatic Evals for LLMs

HTML 177 18 Updated Feb 7, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

Python 164 10 Updated Feb 8, 2025

Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"

Python 76 7 Updated Feb 4, 2025

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Python 2,240 419 Updated Apr 15, 2024

RAGEN is the first open-source reproduction of DeepSeek-R1 on AGENT training.

Python 735 46 Updated Feb 9, 2025

Synthetic Data curation for post-training and structured data extraction

Python 710 52 Updated Feb 9, 2025

This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-e…

Jupyter Notebook 6,809 1,062 Updated Feb 4, 2025

Mainly records knowledge and interview questions for large language model (LLM) algorithm (application) engineers

HTML 5,075 588 Updated Oct 22, 2024

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 2,290 172 Updated Feb 7, 2025

Fully open reproduction of DeepSeek-R1

Python 17,771 1,476 Updated Feb 8, 2025