Stars
Cross-modal few-shot adaptation with CLIP
Stable Diffusion and Flux in pure C/C++
Tesseract Open Source OCR Engine (main repository)
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.
A Deep-Learning-Based Chinese Speech Recognition System
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
OpenOCR: A general OCR system that balances accuracy and efficiency. It supports 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and the latest methods will continue to be added.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).
DocBank: A Benchmark Dataset for Document Layout Analysis
A Comprehensive Toolkit for High-Quality PDF Content Extraction
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Command Line Interface for Managing ComfyUI
GGUF Quantization support for native ComfyUI models
A Python frontend and library for ComfyUI
A powerful tool that translates ComfyUI workflows into executable Python code.
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
A PyTorch quantization backend for Optimum
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
[ICML'23] StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model with performance approaching GPT-4o.