Starred repositories
Tensors and Dynamic neural networks in Python with strong GPU acceleration
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and…
Real-time face swap and one-click video deepfake with only a single image
Making large AI models cheaper, faster and more accessible
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Datasets, Transforms and Models specific to Computer Vision
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for creating ID photos.
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
🍀 PyTorch implementations of various attention mechanisms, MLPs, re-parameterization, and convolution modules, helpful for understanding the papers further. ⭐⭐⭐
An open source implementation of CLIP.
YOLOX is a high-performance anchor-free YOLO that exceeds YOLOv3 through YOLOv5, with MegEngine, ONNX, TensorRT, ncnn, and OpenVINO supported. Documentation: https://yolox.readthedocs.io/
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TF and others)
Minimal PyTorch implementation of YOLOv3
Object detection, 3D detection, and pose estimation using center point detection
Large World Model -- Modeling Text and Video with Million-Length Context
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
NanoDet-Plus ⚡ Super fast and lightweight anchor-free object detection model. 🔥 Only 980 KB (int8) / 1.8 MB (fp16), and runs at 97 FPS on a cellphone 🔥
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
A simple, fully convolutional model for real-time instance segmentation.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark