Stars
Don't Judge by the Look: Towards Motion Coherent Video Representation (ICLR2024)
[ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Open Thoughts: Fully Open Data Curation for Thinking Models
Bringing BERT into modernity via both architecture changes and scaling
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
[AAAI 2024 Oral] M2CLIP: A Multimodal, Multi-Task Adapting Framework for Video Action Recognition
A custom RPC framework implemented with Netty + Kyro + Zookeeper, with a detailed implementation walkthrough and related tutorials.
🚀 Train a 27M-parameter vision-language model (VLM) from scratch in just 3 hours!
openvla / openvla
Forked from TRI-ML/prismatic-vlms. OpenVLA: an open-source vision-language-action model for robotic manipulation.
Isaac Gym Environments for Legged Robots
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
A generative world for general-purpose robotics & embodied AI learning.
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
[NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning resources are aggregated in real time, and all code has been updated for Llama3, aiming to build the best Chinese Llama model, fully open source and commercially usable.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…