Update scripts and ScienceQA instructions
haotian-liu committed Aug 27, 2023
1 parent 053c284 commit 9ccab83
Showing 9 changed files with 55 additions and 97 deletions.
99 changes: 12 additions & 87 deletions docs/ScienceQA.md
@@ -5,115 +5,40 @@
2. Generate ScienceQA dataset for LLaVA conversation-style format.

```Shell
python scripts/convert_sqa_to_llava \
python scripts/convert_sqa_to_llava.py \
convert_to_llava \
--base-dir /path/to/ScienceQA/data/scienceqa \
--prompt-format "QCM-LEA" \
--split {train,val,minival,test,minitest}
```
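For a concrete run, the sketch below generates the training split and then prints the first converted record. The output filename is an assumption based on the `llava_train_QCM-LEA.json` path referenced later in these instructions; adjust it if your converter writes a different name.

```Shell
# Illustrative only: convert the train split, then peek at the first converted record.
# The output path is an assumption (it mirrors the filename used by finetune_sqa.sh).
python scripts/convert_sqa_to_llava.py \
    convert_to_llava \
    --base-dir /path/to/ScienceQA/data/scienceqa \
    --prompt-format "QCM-LEA" \
    --split train

python -c "import json; d = json.load(open('/path/to/ScienceQA/data/scienceqa/llava_train_QCM-LEA.json')); print(len(d), 'examples'); print(json.dumps(d[0], indent=2)[:400])"
```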

#### Training
**NOTE**: Because the ScienceQA experiments were run earlier, the current checkpoints are trained *without* the `<im_start>` and `<im_end>` tokens. Here we provide the training scripts for the current checkpoints.

<details>
<summary>1. Pretraining</summary>
1. Pretraining

```Shell
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path ./checkpoints/llama-vicuna-13b \
--data_path /path/to/cc3m_595k.json \
--image_folder /path/to/cc3m_595k \
--vision_tower openai/clip-vit-large-patch14 \
--tune_mm_mlp_adapter True \
--mm_vision_select_layer -2 \
--bf16 True \
--output_dir ./checkpoints/llava-13b-pretrain-no_im_start_end_token \
--num_train_epochs 1 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2400 \
--save_total_limit 1 \
--learning_rate 2e-3 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
```
</details>
You can download our pretrained projector weights from our [Model Zoo](), or train your own projector weights using [`pretrain.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/pretrain.sh).
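If you take the download route, one way to fetch the projector weights is to clone the corresponding Hugging Face repository. This is only a sketch: the repository name is inferred from the checkpoint path used in `finetune_sqa.sh` below, so verify the exact link in the Model Zoo.

```Shell
# Sketch: fetch the pretrained projector via git-lfs. The repo name is inferred from
# the path in scripts/finetune_sqa.sh -- double-check it against the Model Zoo.
git lfs install
git clone https://huggingface.co/liuhaotian/llava-pretrain-vicuna-13b-v1.3 \
    ./checkpoints/huggingface/liuhaotian/llava-pretrain-vicuna-13b-v1.3
```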

<details>
<summary>2. Finetuning</summary>
2. Finetuning

You may download our pretrained `llava-13b-v0-pretrain-no_im_start_end_token.bin` [here](https://huggingface.co/liuhaotian/LLaVA-13b-pretrain-projector-v0/blob/main/LLaVA-13b-pretrain-projector-v0-CC3M-595K-original_caption-no_im_token.bin).

```Shell
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path /path/to/llama-vicuna-13b \
--data_path /path/to/scienceqa/llava_train_QCM-LEPA.json \
--image_folder /path/to/scienceqa/images/train \
--vision_tower openai/clip-vit-large-patch14 \
--pretrain_mm_mlp_adapter ./checkpoints/llava-13b-pretrain-no_im_start_end_token/mm_projector.bin \
--mm_vision_select_layer -2 \
--bf16 True \
--output_dir ./checkpoints/llava-13b-pretrain-no_im_start_end_token-finetune_scienceqa \
--num_train_epochs 12 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5000 \
--save_total_limit 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--model_max_length 2048 \
--gradient_checkpointing True \
--lazy_preprocess True \
--report_to wandb
```
</details>
See [`finetune_sqa.sh`](https://github.com/haotian-liu/LLaVA/blob/main/scripts/finetune_sqa.sh).

#### Evaluation

1. Download our pretrained LLaVA-13B (delta) weights for the ScienceQA dataset [here](https://huggingface.co/liuhaotian/LLaVA-13b-delta-v0-science_qa), then convert the delta weights to actual weights.

```Shell
python -m llava.model.apply_delta \
--base /path/to/llama-13b \
--target /path/to/LLaVA-13b-v0-science_qa \
--delta liuhaotian/LLaVA-13b-delta-v0-science_qa
```
2. [Option 1] Multiple-GPU inference
1. Multiple-GPU inference
You can evaluate with multiple GPUs and then concatenate the generated jsonl files. Please refer to our scripts for [batch evaluation](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_batch.sh) and [results gathering](https://github.com/haotian-liu/LLaVA/blob/main/scripts/sqa_eval_gather.sh); a minimal concatenation sketch also appears after the evaluation steps below.
3. [Option 2] Single-GPU inference
2. Single-GPU inference
(a) Generate LLaVA responses on ScienceQA dataset
```Shell
python -m llava.eval.model_vqa_science \
--model-path /path/to/LLaVA-13b-v0-science_qa \
--question-file /path/to/ScienceQA/data/scienceqa/llava_test.json \
--model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
--question-file /path/to/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
--image-folder /path/to/ScienceQA/data/scienceqa/images/test \
--answers-file vqa/results/ScienceQA/test_llava-13b.jsonl \
--answer-prompter \
--conv-mode llava_v0
--conv-mode llava_v1
```

(b) Evaluate the generated responses
@@ -126,4 +51,4 @@ python eval_science_qa.py \
--output-result vqa/results/ScienceQA/test_llava-13b_result.json \
```
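For the multi-GPU option above, the per-chunk answer files need to be merged before scoring. A minimal sketch is below, assuming the chunk files share the prefix used in `sqa_eval_batch.sh`; the repository's `sqa_eval_gather.sh` is the authoritative version.

```Shell
# Sketch only: merge the per-GPU answer chunks into a single jsonl before running
# eval_science_qa.py. The filename pattern is an assumption taken from sqa_eval_batch.sh;
# prefer scripts/sqa_eval_gather.sh for the exact logic.
cat ./test_llava-13b-chunk*.jsonl > ./test_llava-13b.jsonl
```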

For reference, we attach our prediction file [`test_llava-13b_result.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json) for comparison when reproducing our results, as well as for further analysis in detail.
For reference, we attach our prediction file [`test_sqa_llava_13b_v0.json`](https://github.com/haotian-liu/LLaVA/blob/main/llava/eval/table/results/test_sqa_llava_13b_v0.json) for comparison when reproducing our results, and for further detailed analysis.
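As a quick sanity check when reproducing, you can diff your own result file against the reference. This is a sketch: the raw URL is derived from the blob link above, and the local paths assume the commands shown earlier in this document.

```Shell
# Sketch: download the reference predictions and compare them with your own results.
curl -L -o test_sqa_llava_13b_v0.json \
    https://raw.githubusercontent.com/haotian-liu/LLaVA/main/llava/eval/table/results/test_sqa_llava_13b_v0.json
diff <(python -m json.tool vqa/results/ScienceQA/test_llava-13b_result.json) \
     <(python -m json.tool test_sqa_llava_13b_v0.json)
```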
2 changes: 1 addition & 1 deletion scripts/convert_sqa_to_llava.py
@@ -5,7 +5,7 @@
from convert_sqa_to_llava_base_prompt import build_prompt_chatbot


def convert_to_llava(base_dir, split, prompt_format="QCM-LEPA"):
def convert_to_llava(base_dir, split, prompt_format="QCM-LEA"):
split_indices = json.load(open(os.path.join(base_dir, "pid_splits.json")))[split]
problems = json.load(open(os.path.join(base_dir, "problems.json")))

2 changes: 1 addition & 1 deletion scripts/finetune.sh
@@ -13,7 +13,7 @@
################## LLaMA-2 ##################

deepspeed llava/train/train_mem.py \
--deepspeed /path/to/deepspeed.json \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
--data_path ./playground/data/llava_instruct_80k.json \
2 changes: 1 addition & 1 deletion scripts/finetune_full_schedule.sh
@@ -13,7 +13,7 @@
################## LLaMA-2 ##################

deepspeed llava/train/train_mem.py \
--deepspeed /path/to/deepspeed.json \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
--data_path ./playground/data/llava_instruct_158k.json \
2 changes: 1 addition & 1 deletion scripts/finetune_lora.sh
@@ -13,7 +13,7 @@
################## LLaMA-2 ##################

deepspeed llava/train/train_mem.py \
--deepspeed /path/to/deepspeed.json \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
2 changes: 1 addition & 1 deletion scripts/finetune_qlora.sh
@@ -13,7 +13,7 @@
################## LLaMA-2 ##################

deepspeed llava/train/train_mem.py \
--deepspeed /path/to/deepspeed_zero2.json \
--deepspeed ./scripts/zero2.json \
--lora_enable True \
--bits 4 \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
34 changes: 34 additions & 0 deletions scripts/finetune_sqa.sh
@@ -0,0 +1,34 @@
#!/bin/bash

deepspeed llava/train/train_mem.py \
--deepspeed ./scripts/zero2.json \
--model_name_or_path lmsys/vicuna-13b-v1.3 \
--version $PROMPT_VERSION \
--data_path /Data/ScienceQA/data/scienceqa/llava_train_QCM-LEA.json \
--image_folder /Data/ScienceQA/data/scienceqa/images/train \
--vision_tower openai/clip-vit-large-patch14 \
--pretrain_mm_mlp_adapter ./checkpoints/huggingface/liuhaotian/llava-pretrain-vicuna-13b-v1.3/mm_projector.bin \
--mm_vision_select_layer -2 \
--mm_use_im_start_end False \
--mm_use_im_patch_token False \
--bf16 True \
--output_dir ./checkpoints/llava-vicuna-13b-v1.3-pretrain_lcs558k_plain-ScienceQA_QCM_LEA-12e \
--num_train_epochs 12 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 50000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--lazy_preprocess True \
--report_to wandb
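Note that the new script reads `$PROMPT_VERSION` from the environment without defining it. A possible invocation is sketched below; the value `v1` is an assumption, so use whichever prompt version matches your base model and conversation template.

```Shell
# Illustrative invocation of finetune_sqa.sh above; PROMPT_VERSION=v1 is an assumption.
PROMPT_VERSION=v1 bash scripts/finetune_sqa.sh
```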
2 changes: 1 addition & 1 deletion scripts/pretrain.sh
@@ -11,7 +11,7 @@ PROMPT_VERSION=plain
########### DO NOT CHANGE ###########

deepspeed llava/train/train_mem.py \
--deepspeed /path/to/deepspeed.json \
--deepspeed ./scripts/zero2.json \
--model_name_or_path ./checkpoints/$MODEL_VERSION \
--version $PROMPT_VERSION \
--data_path /path/to/pretrain_data.json \
7 changes: 3 additions & 4 deletions scripts/sqa_eval_batch.sh
@@ -3,12 +3,11 @@
CHUNKS=8
for IDX in {0..7}; do
CUDA_VISIBLE_DEVICES=$IDX python -m llava.eval.model_vqa_science \
--model-path ./checkpoints/LLaVA-13b-v0-science_qa \
--question-file ~/haotian/datasets/ScienceQA/data/scienceqa/llava_test_QCM-LEPA.json \
--model-path liuhaotian/llava-lcs558k-scienceqa-vicuna-13b-v1.3 \
--question-file ~/haotian/datasets/ScienceQA/data/scienceqa/llava_test_QCM-LEA.json \
--image-folder ~/haotian/datasets/ScienceQA/data/scienceqa/images/test \
--answers-file ./test_llava-13b-chunk$CHUNKS_$IDX.jsonl \
--num-chunks $CHUNKS \
--chunk-idx $IDX \
--answer-prompter \
--conv-mode llava_v0 &
--conv-mode llava_v1 &
done
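Because each chunk is launched in the background, the driving shell should wait for all jobs to finish before gathering results. A sketch of the follow-up step is below; the gather invocation is an assumption, see the results-gathering script linked in the ScienceQA instructions above.

```Shell
# Sketch: run after the loop above -- block until all background chunks finish,
# then merge their answers with the repository's gather script.
wait
bash scripts/sqa_eval_gather.sh
```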
