If your server has a poor connection to Hugging Face (e.g., for users in mainland China), we also provide an alternative way to download the weights from ModelScope. Please click through to see the guide.
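For example, a minimal sketch of pre-downloading a checkpoint with the modelscope Python package is shown below; the ModelScope model ID here is only a placeholder, so please take the actual IDs from the guide above.
from modelscope.hub.snapshot_download import snapshot_download

# Download the checkpoint to a local cache directory and print its path.
# The model ID below is a placeholder -- replace it with the actual
# ModelScope ID listed in the guide.
local_dir = snapshot_download('your-namespace/llava_v1.5_7b_qinstruct_preview_v0.1')
print(local_dir)  # pass this local path as model_path in the examples below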
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
See the code and scripts below.
Example Code (Single Query)
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Q-Instruct-tuned LLaVA-v1.5-7B checkpoint on Hugging Face
model_path = "teowu/llava_v1.5_7b_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."
image_file = "fig/sausage.jpg"

# Bundle the inference arguments into a lightweight namespace expected by eval_model
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
})()
eval_model(args)
Example Code (CLI Demo for Multi-turn Conversation)
python -m llava.serve.cli \
    --model-path teowu/llava_v1.5_7b_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg"
Note: The results may contain randomness as do_sample=True
is enabled during conversation mode.
Multi-choice question (MCQ) in Q-Bench.
python eval_scripts/llava_v1.5/eval_qbench_mcq.py
Image/Video Quality Assessment
Image Quality Assessment:
python eval_scripts/llava_v1.5/eval_image_quality.py
Video Quality Assessment:
python eval_scripts/llava_v1.5/eval_video_quality.py
For mPLUG-Owl-2, only single-GPU inference is supported at the moment. Please set the environment variable (e.g., export CUDA_VISIBLE_DEVICES=0) to make sure the model is loaded on only one device.
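If you prefer to restrict GPU visibility from inside Python rather than via a shell export, a minimal sketch is shown below; the variable must be set before torch or the mplug_owl2 modules are imported.
import os

# Make only GPU 0 visible so that mPLUG-Owl-2 is loaded on a single device.
# This must run before importing torch or any mplug_owl2 module.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"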
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2/
pip install -e .
Example Code (Single Query)
from mplug_owl2.mm_utils import get_model_name_from_path
from eval_scripts.mplug_owl_2.run_mplug_owl2 import eval_model

# Q-Instruct-tuned mPLUG-Owl-2 (448px) checkpoint on Hugging Face
model_path = "teowu/mplug_owl2_7b_448_qinstruct_preview_v0.1"
prompt = "Rate the quality of the image. Think step by step."
image_file = "fig/sausage.jpg"

# Bundle the inference arguments into a lightweight namespace expected by eval_model
args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
})()
eval_model(args)
Example Code (CLI Demo for Multi-turn Conversation)
python -m mplug_owl2.serve.cli \
    --model-path teowu/mplug_owl2_7b_448_qinstruct_preview_v0.1 \
    --image-file "fig/sausage.jpg"
Note: The results may contain randomness as do_sample=True
is enabled during conversation mode.
Multi-choice question (MCQ) in Q-Bench.
python eval_scripts/mplug_owl_2/eval_qbench_mcq.py
Image/Video Quality Assessment
Image Quality Assessment:
python eval_scripts/mplug_owl_2/eval_image_quality.py
Video Quality Assessment:
python eval_scripts/mplug_owl_2/eval_video_quality.py
InternLM-XComposer-VL has been integrated into Hugging Face AutoModel (remote code mode). You can start directly with the code below, without a separate installation step.
Example Code (Single Query)
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('DLight1551/internlm-xcomposer-vl-7b-qinstruct-full', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('DLight1551/internlm-xcomposer-vl-7b-qinstruct-full', trust_remote_code=True)
model.tokenizer = tokenizer
# Single-Turn Text-Image Dialogue
text = 'Describe and evaluate the quality of the image.'
image = 'fig/sausage.jpg'
response = model.generate(text, image)
print(response)
Example Code (Multi-Turn Conversation)
import torch
from transformers import AutoModel, AutoTokenizer
torch.set_grad_enabled(False)
# init model and tokenizer
model = AutoModel.from_pretrained('DLight1551/internlm-xcomposer-vl-7b-qinstruct-full', trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained('DLight1551/internlm-xcomposer-vl-7b-qinstruct-full', trust_remote_code=True)
model.tokenizer = tokenizer
# Multi-Turn Dialogue
text = 'Describe and evaluate the quality of the image.'
image = 'fig/sausage.jpg'
response, history = model.chat(text, image, history=None)
print(f'User: {text}')
print(f'Bot: {response}')
text = 'Which part of the pan is clearer, the top part or the bottom part?'
response, history = model.chat(text=text, image=None, history=history)
print(f'User: {text}')
print(f'Bot: {response}')
Multi-choice question (MCQ) in Q-Bench.
python eval_scripts/internlm_xcomposer_vl/eval_qbench_mcq.py
Image/Video Quality Assessment
Image Quality Assessment:
python eval_scripts/internlm_xcomposer_vl/eval_image_quality.py
Video Quality Assessment:
python eval_scripts/internlm_xcomposer_vl/eval_video_quality.py
See Model Zoo. Both Hugging Face and ModelScope weights are provided.
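For instance, here is a minimal sketch of pre-fetching the Hugging Face weights with huggingface_hub; the repository ID is the LLaVA-v1.5-7B checkpoint used in the examples above, and the local directory name is just an assumption.
from huggingface_hub import snapshot_download

# Pre-download the Q-Instruct LLaVA-v1.5-7B weights into a local folder,
# which can then be passed as model_path in the example code above.
local_dir = snapshot_download(
    repo_id="teowu/llava_v1.5_7b_qinstruct_preview_v0.1",
    local_dir="qinstruct_llava_v1.5_7b",  # assumed local directory name
)
print(local_dir)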
At present, we only provide training scripts for LLaVA-v1.5 (7B/13B). Please see the Training Docs for more details.
Researchers and open-source developers are free to use the Q-Instruct dataset and the fine-tuned weights provided for the four MLLMs. We also allow commercial use, but any commercial use must be approved by our team in advance. Please email [email protected] to obtain permission for commercial use.