Stars
Official implementation for the paper "TEVAD: Improved video anomaly detection with captions"
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
A GUI version of GOT-OCR that provides OCR, PDF export, batch processing, and other features, but does not provide training functionality
“Dive Into OCR” is a textbook developed by the PaddleOCR community that integrates OCR theory and practice.
PaddleOCR inference in PyTorch. Converted from [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
Muggled DPT: Depth estimation without the magic
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
Strengthened Pose Information for self-supervised monocular depth estimation. SPIdepth refines the pose network to improve depth prediction accuracy, achieving state-of-the-art results on benchmark…
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
[ECCV 2024] Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition
[CVPR 2023] DepGraph: Towards Any Structural Pruning
Simulated Chinese License Plate Character images
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
License plate detection and recognition using YOLOv5 and LPRNet (CCPD dataset)
An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
Generate fake Chinese license plate images for detection & recognition
Generate a license plate recognition dataset
Efficient vision foundation models for high-resolution generation and perception.
(2020-2022) The PyTorch version of SiamFC, SiamRPN, DaSiamRPN, UpdateNet, SiamDW, SiamRPN++, SiamMask, SiamFC++, SiamCAR, SiamBAN, Ocean, LightTrack, TrTr, NanoTrack; Visual object tracking based on…
[ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization"
RepViT: Revisiting Mobile CNN From ViT Perspective [CVPR 2024] and RepViT-SAM: Towards Real-Time Segmenting Anything
[CVPR 2024] Code release for TransNeXt model
Metric depth estimation from a single image