Starred repositories
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction
《开源大模型食用指南》(A Practical Guide to Open-Source LLMs): a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying domestic and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment.
[Paper List] Papers integrating knowledge graphs (KGs) and large language models (LLMs)
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment.
State-of-the-art text-only image/video method (IJCAI 2023)
CaMML: Context-Aware MultiModal Learner for Large Models (ACL 2024)
Code and datasets for our ACL 2024 main conference paper "Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment"
Deep learning on graphs (the "Grape Book"); read online at https://datawhalechina.github.io/grape-book
《大模型白盒子构建指南》(A White-Box Guide to Building Large Models): a Tiny-Universe built entirely by hand from scratch
Awesome-LLM-RAG: a curated list of advanced retrieval-augmented generation (RAG) techniques for large language models
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Curated tutorials and resources for Large Language Models, AI Painting, and more.
Chinese notes, homework, and Colab notebooks for Stanford CS224W (2023 Winter)
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The official repo for the EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models"
Chinese NLP solutions (large models, data, models, training, inference)