
jaswu51/LLMs-Robotics


Awesome-MM-LLM

Multimodal Large Language Models


Surveys

  • [MM-LLM] "A Survey on Multimodal Large Language Models", arXiv, June 2023. [Paper] [Website]
  • [LAM] "Sparks of Large Audio Models: A Survey and Outlook", arXiv, Sep 2023. [Paper] [Website]

Vision

  • [CLIP] "Learning Transferable Visual Models From Natural Language Supervision", arXiv, Feb 2021. [Paper] [Website]

  • [BLIP] "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation", arXiv, Jan 2022. [Paper] [Website]

  • [BLIP-2] "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models", arXiv, Jan 2023. [Paper] [Website]

  • [MiniGPT-4] "MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models", arXiv, Apr 2023. [Paper] [Website]

  • [InstructBLIP] "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning", arXiv, May 2023. [Paper] [Website]

  • [LLaVA] "Visual Instruction Tuning", arXiv, Apr 2023. [Paper] [Website]


Audio

  • [AudioGPT] "AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head", arXiv, Apr 2023. [Paper] [Website]

Any-to-Any

  • [NExT-GPT] "NExT-GPT: Any-to-Any Multimodal LLM", arXiv, Sep 2023. [Paper] [Website]

MM-LLM with Robotics

  • [PaLM-E] "PaLM-E: An Embodied Multimodal Language Model", arXiv, Mar 2023. [Paper] [Website]

Datasets
