Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for Hugging Face's Daily Papers (https://huggingface.co/papers).

Python 699 77 Updated Jan 16, 2025

VRU-NExT / VideoQA

88 7 Updated Oct 19, 2022

mlfoundations / open_clip

An open source implementation of CLIP.

Python 10,812 1,020 Updated Jan 4, 2025

Beckschen / ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Python 196 6 Updated Jun 9, 2024

m-bain / webvid

Large-scale text-video dataset. 10 million captioned short videos.

Python 617 39 Updated Aug 14, 2024

bytedance / 1d-tokenizer

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 650 34 Updated Jan 18, 2025

JoseponLee / IntentQA

Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.

Python 13 1 Updated Nov 29, 2024

jayleicn / VideoLanguageFuturePred

[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Python 48 4 Updated Aug 20, 2022

lucidrains / slot-attention

Implementation of Slot Attention from GoogleAI

Python 405 32 Updated Aug 20, 2024

OpenGVLab / InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,559 97 Updated Jan 17, 2025

X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,395 177 Updated Nov 27, 2024

NVlabs / VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 2,769 221 Updated Jan 11, 2025

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,115 222 Updated Dec 3, 2024

SakanaAI / AI-Scientist

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 8,692 1,257 Updated Jan 14, 2025

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,682 1,353 Updated Dec 25, 2024

renwang435 / video-ttt-release

Python 57 3 Updated Jul 24, 2023

adaptivetokensampling / ATS

Adaptive Token Sampling for Efficient Vision Transformers (ECCV 2022 Oral Presentation)

Shell 97 15 Updated May 3, 2024

wdrink / OmniVid

Python 46 2 Updated Jun 4, 2024

SofianChay

Stars

SCZwangxiao / video-ReTaKe

alxndrTL / mamba.py

tesseract-ocr / tesseract

facebookresearch / blt

brown-palm / Vamos

daixiangzi / Awesome-Token-Compress

IDEA-Research / TAPTR

Vision-CAIR / LongVU

DAMO-NLP-SG / VideoLLaMA2

yrcong / RelTR

state-spaces / mamba

huiwon-jang / CoordTok

deep-diver / paper-reviewer