Bin Wang wangbinDL

🎯

Focusing

42 followers · 8 following

@ Shanghai AI Laboratory
China
https://wangbindl.github.io/

Achievements

x2 x2 x4

Starred repositories

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,763 6,133 Updated Mar 8, 2025

allenai / olmocr

Toolkit for linearizing PDFs for LLM datasets/training

Python 8,833 575 Updated Mar 7, 2025

boyu-ai / Hands-on-RL

https://hrl.boyuai.com/

Jupyter Notebook 3,039 602 Updated Nov 22, 2022

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 22,351 2,003 Updated Mar 7, 2025

Alpha-Innovator / GeoX

[ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Python 24 2 Updated Jan 25, 2025

CherryHQ / cherry-studio

🍒 Cherry Studio is a desktop client that supports multiple LLM providers, including deepseek-r1

TypeScript 18,473 1,490 Updated Mar 8, 2025

microsoft / autogen

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 40,945 6,093 Updated Mar 8, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,615 2,177 Updated Feb 1, 2025

abi / screenshot-to-code

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 68,910 8,470 Updated Feb 25, 2025

microsoft / OmniParser

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 19,562 1,579 Updated Feb 23, 2025

google-research / big_vision

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 2,693 171 Updated Mar 6, 2025

QwenLM / Qwen-Agent

Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.

Python 6,074 546 Updated Mar 7, 2025

deepseek-ai / DeepSeek-V3

Python 91,395 14,768 Updated Feb 24, 2025

Ucas-HaoranWei / Vary-tiny-600k

Vary-tiny codebase built upon LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, including English/Chinese)

Python 79 4 Updated Sep 21, 2024

LLaVA-VL / LLaVA-NeXT

Python 3,495 324 Updated Feb 24, 2025

Byaidu / PDFMathTranslate

PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supports services such as Google/DeepL/Ollama/OpenAI, and provides CLI/GUI/Docker/Zotero

Python 18,284 1,499 Updated Mar 6, 2025

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 20,645 1,456 Updated Feb 6, 2025

opendatalab / OmniDocBench

A Comprehensive Benchmark for Document Parsing and Evaluation

Python 270 23 Updated Feb 25, 2025

opendatalab / OHR-Bench

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Python 67 12 Updated Feb 1, 2025

ZZZHANG-jx / Recommendations-Document-Image-Processing

This repository contains a paper collection of the methods for document image processing, including appearance enhancement, deshadowing, dewarping, deblurring, binarization and so on.

219 12 Updated Feb 12, 2025

sparkfish / augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 391 48 Updated Feb 15, 2025

excalidraw / excalidraw

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 93,944 9,018 Updated Mar 8, 2025

RapidAI / RapidTable

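An inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.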

Python 227 19 Updated Jan 10, 2025

ZZZHANG-jx / DocRes

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Python 394 43 Updated Jan 28, 2025

X-PLUG / MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Python 3,595 347 Updated Feb 21, 2025

Alpha-Innovator / DocGenome

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

Jupyter Notebook 128 5 Updated Jan 13, 2025

opendilab / awesome-ui-agents

A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond (continually updated)

152 17 Updated Mar 7, 2025

RapidAI / TableStructureRec

Collects the best currently open-source table recognition models, improves their pre-processing and post-processing, and converts the models to ONNX.

Python 552 47 Updated Mar 8, 2025

QwenLM / Qwen2.5-VL

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,457 595 Updated Mar 7, 2025

opendatalab / DocLayout-YOLO

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 911 67 Updated Jan 16, 2025
