Stars
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
A simple and efficient Mamba implementation in pure PyTorch and MLX.
Tesseract Open Source OCR Engine (main repository)
Official implementation for paper "Vamos: Versatile Action Models for Video Understanding"
A paper list of recent works on token compression for ViT and VLM
[ECCV 2024 & NeurIPS 2024] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for Hugging Face's Daily Papers (https://huggingface.co/papers).
An open source implementation of CLIP.
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
Large-scale text-video dataset. 10 million captioned short videos.
This repo contains the code for the 1D tokenizer and generator
Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.
[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Implementation of Slot Attention from GoogleAI
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Adaptive Token Sampling for Efficient Vision Transformers (ECCV 2022 Oral Presentation)