Stars
Robust Speech Recognition via Large-Scale Weak Supervision
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
High-Resolution Image Synthesis with Latent Diffusion Models
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
OpenMMLab Detection Toolbox and Benchmark
Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
We write your reusable computer vision tools.
A generative world for general-purpose robotics & embodied AI learning.
LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check out Label Studio, the open source …
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]
Refine high-quality datasets and visual AI models
OpenMMLab Semantic Segmentation Toolbox and Benchmark.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
[NeurIPS 2024] Depth Anything V2: A More Capable Foundation Model for Monocular Depth Estimation
OpenMMLab Text Detection, Recognition and Understanding Toolbox
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
SuperGlue: Learning Feature Matching with Graph Neural Networks (CVPR 2020, Oral)
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Hypermodern Python Cookiecutter
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects