DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)
This is the official repository of Action Progression Networks for Temporal Action Localization in Videos
Count the MACs / FLOPs of your PyTorch model.
We want to create a repo to illustrate the usage of transformers in Chinese
OVTrack: Open-Vocabulary Multiple Object Tracking [CVPR 2023]
Code for our paper "Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers"
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Open-vocabulary Object Segmentation with Diffusion Models
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023
(ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation
[CVPRW'23] "A unified model for continuous conditional video prediction". Xi Ye, Guillaume-Alexandre Bilodeau.
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)
PyTorch implementation of SinMPI (SIGGRAPH Asia 2023)
[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces
Implementation of Parti, Google's pure attention-based text-to-image neural network, in PyTorch
Latte: Latent Diffusion Transformer for Video Generation.
[ECCV 2024] Official code for the paper "Open-Vocabulary SAM".
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Repository for the paper "VPTR: Efficient Transformers for Video Prediction"
[AAAI'24] "STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction". Xi Ye, Guillaume-Alexandre Bilodeau
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
A Collection of Papers and Codes for CVPR2024/ECCV2024 AIGC
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures