Renmin University of China
- Beijing, China
- AlbertTan404.github.io
- https://scholar.google.com/citations?user=cELItK0AAAAJ
- https://www.zhihu.com/people/AlbertTan
Stars
A generative world for general-purpose robotics & embodied AI learning.
Xiaomi Home Integration for Home Assistant
[ECCV 2024] Official Implementation of the paper "HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects"
OmniControl: Control Any Joint at Any Time for Human Motion Generation, ICLR 2024
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.
Let your Claude be able to think
The reinforcement learning training code for AgiBot X1.
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
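TL;DR: We propose a large-scale cross-domain persuasion dataset covering 13,000 scenarios in 35 domains, with the developed PersuGPT model achieving the best performance, surpassing GPT-4 in both aut…
Retrieve WeChat information; read the database to view chat history locally and export it to CSV, HTML, and other formats for AI training, auto-reply, and similar uses. Supports retrieving information for multiple accounts and supports all WeChat versions.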
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
⚡️HivisionIDPhotos: a lightweight and efficient AI tool for creating ID photos.
[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
[CVPR 2024] Official implementation of the paper "ReGenNet: Towards Human Action-Reaction Synthesis"
[COLING22] An End-to-End Library for Evaluating Natural Language Generation