LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,728 185 Updated Nov 14, 2024

microsoft / LLM2CLIP

LLM2CLIP makes a SOTA pretrained CLIP model more SOTA than ever.

Python 438 19 Updated Dec 13, 2024

richards199999 / Thinking-Claude

Let your Claude be able to think

TypeScript 12,930 1,516 Updated Dec 3, 2024

kerrj / lerf

Code for LERF: Language Embedded Radiance Fields

Python 672 66 Updated Jul 9, 2024

Ericcsr / ARCap

Data collection part for ARCap

Jupyter Notebook 55 5 Updated Dec 21, 2024

AgibotTech / agibot_x1_train

The reinforcement learning training code for AgiBot X1.

Python 1,259 402 Updated Oct 23, 2024

rhymes-ai / Allegro

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 1,045 53 Updated Jan 2, 2025

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,260 110 Updated Aug 27, 2024

huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 240 12 Updated Jun 13, 2024

LLaVA-VL / LLaVA-NeXT

Python 3,223 283 Updated Oct 16, 2024

TIGER-AI-Lab / UniIR

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 119 12 Updated Oct 1, 2024

RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 321 29 Updated Nov 19, 2024

google-deepmind / magiclens

[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

Python 154 10 Updated Oct 28, 2024

PolyU-ChenLab / ETBench

👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)

Python 48 1 Updated Nov 4, 2024

lucas-ventura / CoVR

Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".

Python 96 8 Updated Dec 26, 2024

3b1b / manim

Animation engine for explanatory math videos

Python 73,762 6,445 Updated Jan 8, 2025

zju3dv / EasyMocap

Make human motion capture easier.

Python 3,802 465 Updated Jan 4, 2025

PersuGPT / PersuGPT.github.io

TL;DR: We propose a large-scale cross-domain persuasion dataset that covers 13,000 scenarios in 35 domains, with the developed PersuGPT model achieving the best performance, surpassing GPT-4 in both aut…

HTML 7 Updated Sep 18, 2024

xaoyaoo / PyWxDump

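Retrieve WeChat information; read the database, view chat history locally, and export it to CSV, HTML, and other formats for AI training, auto-reply, and similar uses. Supports retrieving information for multiple accounts and supports all WeChat versions.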

Python 6,127 978 Updated Dec 28, 2024

jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,086 486 Updated May 3, 2024

Zeyi-Lin / HivisionIDPhotos

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool; a lightweight algorithm for creating AI ID photos.

Python 14,127 1,481 Updated Nov 20, 2024

IDEA-Research / MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Python 273 9 Updated Sep 8, 2024

Vision-CAIR / MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 576 63 Updated Dec 10, 2024

AlbertTan AlbertTan404

Stars

vision-x-nyu / thinking-in-space

OpenDriveLab / AgiBot-World

Genesis-Embodied-AI / Genesis

XiaoMi / ha_xiaomi_home

LvXinTao / HIMO_dataset

XiaoMi / MiLM-6B

neu-vi / OmniControl

ictnlp / LLaMA-Omni