tomchen-ctj


[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation

Python 467 16 Updated Jul 2, 2024
Python 14 Updated Jul 1, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 407 19 Updated Oct 16, 2024

A machine learning framework for reconstructing articulated 3D animals from images

Python 87 3 Updated Dec 18, 2024

[NeurIPS 2024] Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Python 69 4 Updated Dec 3, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,410 94 Updated Aug 13, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Python 1,740 85 Updated Oct 31, 2024

[CVPR 2024] Text-to-3D using Gaussian Splatting

Python 813 49 Updated Jan 7, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 13,696 1,354 Updated Dec 25, 2024

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Python 15,480 2,061 Updated Oct 30, 2024

[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting

Python 5,839 676 Updated Sep 18, 2024

LLaRA: Large Language and Robotics Assistant

Python 164 3 Updated Oct 2, 2024

Long Context Transfer from Language to Vision

Python 356 19 Updated Nov 20, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 283 32 Updated Aug 15, 2024

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 245 15 Updated Aug 11, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 6,823 524 Updated Dec 25, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

445 18 Updated Dec 14, 2024

1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundation visual backbones.

Python 217 9 Updated Aug 23, 2024

[NeurIPS 2024] Dense Connector for MLLMs

Python 154 7 Updated Oct 14, 2024

(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator

Python 103 Updated Oct 17, 2024

Multilingual Medicine: Model, Dataset, Benchmark, Code

Python 179 9 Updated Oct 15, 2024

TALL: Temporal Activity Localization via Language Query

Python 195 48 Updated Mar 15, 2018

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 13,069 910 Updated Oct 3, 2024
Python 3,281 294 Updated Oct 16, 2024

[SIGGRAPH'24] 2D Gaussian Splatting for Geometrically Accurate Radiance Fields

Python 2,269 168 Updated Dec 30, 2024

FreeVA: Offline MLLM as Training-Free Video Assistant

Python 54 Updated Jun 9, 2024

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

Python 188 20 Updated Jan 28, 2024

[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.

Python 866 54 Updated Aug 21, 2024

This repository contains curated prompts aimed at maximizing the effectiveness of Sora for generating videos.

16 1 Updated Jan 17, 2025

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Python 4,196 376 Updated Aug 1, 2024