SofianChay
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Python 18 Updated Jan 6, 2025

A simple and efficient Mamba implementation in pure PyTorch and MLX.

Python 1,102 101 Updated Dec 4, 2024

Tesseract Open Source OCR Engine (main repository)

C++ 63,902 9,645 Updated Jan 17, 2025

Code for BLT research paper

Python 1,320 95 Updated Jan 17, 2025

Official implementation for paper "Vamos: Versatile Action Models for Video Understanding"

Python 6 Updated May 28, 2024

A paper list of recent works on token compression for ViT and VLM

281 15 Updated Jan 13, 2025

[ECCV 2024 & NeurIPS 2024] Official implementation of the paper TAPTR & TAPTRv2 & TAPTRv3

251 14 Updated Dec 13, 2024
Python 341 24 Updated Nov 5, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 1,025 66 Updated Jan 10, 2025

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

Python 259 51 Updated Aug 20, 2024

Mamba SSM architecture

Python 13,807 1,188 Updated Jan 18, 2025
Python 23 Updated Dec 23, 2024

Generate a comprehensive review from an arXiv paper, then turn it into a blog post. This project powers the website below for the HuggingFace's Daily Papers (https://huggingface.co/papers).

Python 699 77 Updated Jan 16, 2025
88 7 Updated Oct 19, 2022

An open source implementation of CLIP.

Python 10,812 1,020 Updated Jan 4, 2025

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Python 196 6 Updated Jun 9, 2024

Large-scale text-video dataset. 10 million captioned short videos.

Python 617 39 Updated Aug 14, 2024

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 650 34 Updated Jan 18, 2025

Official repository for "IntentQA: Context-aware Video Intent Reasoning" from ICCV 2023.

Python 13 1 Updated Nov 29, 2024

[EMNLP 2020] What is More Likely to Happen Next? Video-and-Language Future Event Prediction

Python 48 4 Updated Aug 20, 2022

Implementation of Slot Attention from GoogleAI

Python 405 32 Updated Aug 20, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,559 97 Updated Jan 17, 2025

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family

Python 2,395 177 Updated Nov 27, 2024

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 2,769 221 Updated Jan 11, 2025

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,115 222 Updated Dec 3, 2024

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 8,692 1,257 Updated Jan 14, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,682 1,353 Updated Dec 25, 2024
Python 57 3 Updated Jul 24, 2023

Adaptive Token Sampling for Efficient Vision Transformers (ECCV 2022 Oral Presentation)

Shell 97 15 Updated May 3, 2024
Python 46 2 Updated Jun 4, 2024