zhangyan-ucas
  • Institute of Information Engineering, Chinese Academy of Sciences
  • Beijing, China


The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,428 1,297 Updated Dec 25, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 6,470 566 Updated Dec 31, 2024

Official implementation of the paper "Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues"

Python 9 Updated Dec 20, 2024

[2024-NeurIPS] TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Python 40 4 Updated Dec 16, 2024

Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.

307 12 Updated Dec 20, 2024

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 5,940 479 Updated Jul 11, 2024
Python 73 10 Updated Aug 7, 2023

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Python 64 2 Updated Oct 25, 2024

The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multi…

Python 253 34 Updated Aug 9, 2024

PyTorch implementation of paper "ARTrack" and "ARTrackV2"

Python 253 34 Updated Dec 23, 2024

[NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

Python 20 1 Updated Dec 12, 2024

(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.

Python 51 Updated Jun 11, 2024

[NeurIPS2021] BOVText: A Large-Scale, Multidimensional Multilingual Dataset for Video Text Spotting

Python 67 6 Updated Oct 9, 2023

✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arxiv)

Python 12 1 Updated Dec 31, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 4,046 242 Updated Dec 4, 2024

[IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer

Python 104 12 Updated Mar 28, 2024