wangbinDL

Organizations

@opendatalab

Starred repositories

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,763 6,133 Updated Mar 8, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 8,833 575 Updated Mar 7, 2025

https://hrl.boyuai.com/

Jupyter Notebook 3,039 602 Updated Nov 22, 2022

Fully open reproduction of DeepSeek-R1

Python 22,351 2,003 Updated Mar 7, 2025

[ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Python 24 2 Updated Jan 25, 2025

🍒 Cherry Studio is a desktop client that supports multiple LLM providers, including deepseek-r1

TypeScript 18,473 1,490 Updated Mar 8, 2025

A programming framework for agentic AI 🤖 PyPi: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour

Python 40,945 6,093 Updated Mar 8, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,615 2,177 Updated Feb 1, 2025

Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

Python 68,910 8,470 Updated Feb 25, 2025

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 19,562 1,579 Updated Feb 23, 2025

Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.

Jupyter Notebook 2,693 171 Updated Mar 6, 2025

Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.

Python 6,074 546 Updated Mar 7, 2025

Vary-tiny codebase built upon LAVIS (for training from scratch) and a dataset of PDF image-text pairs (about 600k, including English/Chinese)

Python 79 4 Updated Sep 21, 2024

Python 3,495 324 Updated Feb 24, 2025

PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting Google/DeepL/Ollama/OpenAI and other services, available via CLI/GUI/Docker/Zotero

Python 18,284 1,499 Updated Mar 6, 2025

Official inference repo for FLUX.1 models

Python 20,645 1,456 Updated Feb 6, 2025

A Comprehensive Benchmark for Document Parsing and Evaluation

Python 270 23 Updated Feb 25, 2025

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Python 67 12 Updated Feb 1, 2025

This repository contains a paper collection of the methods for document image processing, including appearance enhancement, deshadowing, dewarping, deblurring, binarization and so on.

219 12 Updated Feb 12, 2025

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 391 48 Updated Feb 15, 2025

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 93,944 9,018 Updated Mar 8, 2025

Inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.

Python 227 19 Updated Jan 10, 2025

[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

Python 394 43 Updated Jan 28, 2025

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Python 3,595 347 Updated Feb 21, 2025

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models

Jupyter Notebook 128 5 Updated Jan 13, 2025

A curated list of awesome UI agents resources, encompassing Web, App, OS, and beyond (continually updated)

152 17 Updated Mar 7, 2025

Organize the currently open-source optimal table recognition models, improve pre-processing and post-processing, and convert the models to ONNX.

Python 552 47 Updated Mar 8, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 8,457 595 Updated Mar 7, 2025

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 911 67 Updated Jan 16, 2025