Stars
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
[CVPR 2024] SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
A new codebase for popular Scene Graph Generation methods (2020). Visualization & Scene Graph Extraction on custom images/datasets are provided. It's also a PyTorch implementation of paper “Unbiase…
FACTUAL benchmark dataset, the pre-trained textual scene graph parser trained on FACTUAL.
[ICCV 2023] Accurate and Fast Compressed Video Captioning
Official PyTorch implementation of the paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
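A repository for pretraining from scratch and then SFT-ing a small-parameter Chinese LLaMa2; it runs on a single 24G GPU and yields a chat-llama2 with basic Chinese question-answering ability.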
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
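A summary and overview of the knowledge a natural language processing (NLP) engineer needs to build up, including interview questions, fundamentals, engineering skills, and more, to strengthen core competitiveness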
Official PyTorch implementation of the AAAI 2021 paper "Semantic Grouping Network for Video Captioning"
Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
[CVPR2022] Official code for Hierarchical Modular Network for Video Captioning. Our proposed HMN is implemented with PyTorch.
PyTorch implementation of video captioning
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval
Simple program to learn CNN (LeNet-5) in pure C