- [MM-LLM] "A Survey on Multimodal Large Language Models", arXiv, June 2023. [Paper] [Website]
- [LAM] "Sparks of Large Audio Models: A Survey and Outlook", arXiv, Sep 2023. [Paper] [Website]
- [CLIP] "Learning Transferable Visual Models From Natural Language Supervision", arXiv, Feb 2021. [Paper] [Website]
- [BLIP] "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation", arXiv, Jan 2022. [Paper] [Website]
- [BLIP-2] "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models", arXiv, Jan 2023. [Paper] [Website]
- [MiniGPT-4] "MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models", arXiv, Apr 2023. [Paper] [Website]
- [Instruct-BLIP] "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning", arXiv, May 2023. [Paper] [Website]
- [LLaVA] "Visual Instruction Tuning", arXiv, Apr 2023. [Paper] [Website]
- [AudioGPT] "AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head", arXiv, Apr 2023. [Paper] [Website]