LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,670 184 Updated Nov 14, 2024

microsoft / LLM2CLIP

LLM2CLIP makes the SOTA pretrained CLIP model more SOTA than ever.

Python 410 17 Updated Dec 13, 2024

richards199999 / Thinking-Claude

Let your Claude be able to think

TypeScript 10,237 1,166 Updated Dec 3, 2024

kerrj / lerf

Code for LERF: Language Embedded Radiance Fields

Python 671 66 Updated Jul 9, 2024

Ericcsr / ARCap

Data collection part for ARCap

Jupyter Notebook 48 5 Updated Dec 21, 2024

AgibotTech / agibot_x1_train

The reinforcement learning training code for AgiBot X1.

Python 1,196 379 Updated Oct 23, 2024

rhymes-ai / Allegro

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 847 48 Updated Dec 19, 2024

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,251 110 Updated Aug 27, 2024

huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Python 232 11 Updated Jun 13, 2024

LLaVA-VL / LLaVA-NeXT

Python 3,130 274 Updated Oct 16, 2024

TIGER-AI-Lab / UniIR

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 116 12 Updated Oct 1, 2024

RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Python 316 27 Updated Nov 19, 2024

google-deepmind / magiclens

[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"

Python 150 8 Updated Oct 28, 2024

PolyU-ChenLab / ETBench

👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)

Python 44 Updated Nov 4, 2024

lucas-ventura / CoVR

Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".

Python 93 8 Updated Dec 17, 2024

3b1b / manim

Animation engine for explanatory math videos

Python 72,151 6,324 Updated Dec 17, 2024

zju3dv / EasyMocap

Make human motion capture easier.

Python 3,772 464 Updated May 6, 2024

PersuGPT / PersuGPT.github.io

TL;DR: We propose a large-scale cross-domain persuasion dataset covering 13,000 scenarios in 35 domains, with the developed PersuGPT model achieving the best performance, surpassing GPT-4 in both aut…

HTML 7 Updated Sep 18, 2024

xaoyaoo / PyWxDump

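Retrieve WeChat account information; read the databases, view chat history locally, and export it to CSV, HTML, and other formats for uses such as AI training and auto-reply. Supports retrieving information for multiple accounts and supports all WeChat versions.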

Python 6,029 965 Updated Oct 19, 2024

jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,037 478 Updated May 3, 2024

Zeyi-Lin / HivisionIDPhotos

⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool. A lightweight AI algorithm for producing ID photos.

Python 13,714 1,436 Updated Nov 20, 2024

IDEA-Research / MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Python 269 9 Updated Sep 8, 2024

Vision-CAIR / MiniGPT4-video

Official code for the Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Python 569 61 Updated Dec 10, 2024

liangxuy / ReGenNet

[CVPR 2024] Official implementation of the paper "ReGenNet: Towards Human Action-Reaction Synthesis"

Python 44 2 Updated Sep 23, 2024

disi-unibo-nlp / nlg-metricverse

[COLING22] An End-to-End Library for Evaluating Natural Language Generation

Python 88 5 Updated Dec 18, 2023

AlbertTan AlbertTan404

Stars

Genesis-Embodied-AI / Genesis

XiaoMi / ha_xiaomi_home

LvXinTao / HIMO_dataset

XiaoMi / MiLM-6B

neu-vi / OmniControl

ictnlp / LLaMA-Omni