This repository demonstrates fine-tuning the multi-modal Qwen2-VL-7B model, which combines vision and language capabilities, using Amazon SageMaker HyperPod. It provides a step-by-step guide with Python scripts for the fine-tuning process and Slurm configurations for distributed training, enabling users to scale their workloads across multiple nodes. By following this guide, data scientists and machine learning engineers can adapt Qwen2-VL-7B to their own multi-modal tasks while taking advantage of SageMaker HyperPod's scalable and cost-effective distributed training capabilities.
*Example image: OCR results on a financial statement table.*
You can follow the AWS workshop for step-by-step guidance: https://catalog.workshops.aws/sagemaker-hyperpod/en-US
Lifecycle scripts allow customization of your cluster during creation; they are used to install software packages. The official lifecycle scripts are suitable for general use cases.
To set up lifecycle scripts:
- Clone the repository and upload scripts to S3:
git clone --depth=1 https://github.com/aws-samples/awsome-distributed-training/
cd awsome-distributed-training/1.architectures/5.sagemaker-hyperpod/LifecycleScripts/
aws s3 cp --recursive base-config/ s3://${BUCKET}/src
- Prepare the cluster-config.json and provisioning_parameters.json files.
- Upload the configuration to S3:
aws s3 cp provisioning_parameters.json s3://${BUCKET}/src/
- Create the cluster:
aws sagemaker create-cluster --cli-input-json file://cluster-config.json --region $AWS_REGION
Examples of cluster-config.json and provisioning_parameters.json can be found in ClusterConfig.
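For orientation, here is a minimal sketch of a cluster-config.json with a controller group and one worker group. All names, ARNs, and bucket paths below are placeholders, so treat the ClusterConfig examples as authoritative; provisioning_parameters.json then maps these instance groups to Slurm controller/worker roles.

```json
{
  "ClusterName": "ml-cluster",
  "InstanceGroups": [
    {
      "InstanceGroupName": "controller-machine",
      "InstanceType": "ml.m5.12xlarge",
      "InstanceCount": 1,
      "LifeCycleConfig": {
        "SourceS3Uri": "s3://<BUCKET>/src",
        "OnCreate": "on_create.sh"
      },
      "ExecutionRole": "arn:aws:iam::<ACCOUNT_ID>:role/<EXECUTION_ROLE>",
      "ThreadsPerCore": 1
    },
    {
      "InstanceGroupName": "worker-group-1",
      "InstanceType": "ml.g5.2xlarge",
      "InstanceCount": 2,
      "LifeCycleConfig": {
        "SourceS3Uri": "s3://<BUCKET>/src",
        "OnCreate": "on_create.sh"
      },
      "ExecutionRole": "arn:aws:iam::<ACCOUNT_ID>:role/<EXECUTION_ROLE>",
      "ThreadsPerCore": 1
    }
  ]
}
```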
To increase worker instances:
- Update update-cluster-config.json with the new instance count.
- Run:
aws sagemaker update-cluster \
  --cluster-name ml-cluster \
  --instance-groups file://update-cluster-config.json \
  --region $AWS_REGION
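Since the command above passes the file to --instance-groups, update-cluster-config.json is expected to hold the instance-group array itself. A minimal sketch with placeholder values, here scaling worker-group-1 to 3 instances:

```json
[
  {
    "InstanceGroupName": "worker-group-1",
    "InstanceType": "ml.g5.2xlarge",
    "InstanceCount": 3,
    "LifeCycleConfig": {
      "SourceS3Uri": "s3://<BUCKET>/src",
      "OnCreate": "on_create.sh"
    },
    "ExecutionRole": "arn:aws:iam::<ACCOUNT_ID>:role/<EXECUTION_ROLE>",
    "ThreadsPerCore": 1
  }
]
```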
To delete the cluster, run:
aws sagemaker delete-cluster --cluster-name ml-cluster --region $AWS_REGION
- SageMaker HyperPod supports Amazon FSx for Lustre integration, enabling full bi-directional synchronization with Amazon S3.
- Ensure proper AWS CLI permissions and configurations.
- Validate the cluster configuration files before launching the cluster:
curl -O https://raw.githubusercontent.com/aws-samples/awsome-distributed-training/main/1.architectures/5.sagemaker-hyperpod/validate-config.py
pip3 install boto3
python3 validate-config.py --cluster-config cluster-config.json --provisioning-parameters provisioning_parameters.json
If you are using SageMaker HyperPod, you can follow the tutorial here to set up an SSH connection.
SSH into the cluster:
./easy-ssh.sh -c controller-machine ml-cluster
sudo su - ubuntu
SageMaker HyperPod supports connecting to the cluster via VS Code. You can set up an SSH proxy via SSM and use it to connect from Visual Studio Code, following this guidance: https://catalog.workshops.aws/sagemaker-hyperpod/en-US/05-advanced/05-vs-code
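For reference, easy-ssh.sh typically prints an SSH config entry along these lines; the cluster ID and instance ID below are placeholders, so use the exact values the script prints for your cluster.

```
# ~/.ssh/config — sketch of an entry for VS Code Remote-SSH over SSM (placeholder IDs)
Host ml-cluster
  User ubuntu
  ProxyCommand sh -c "aws ssm start-session --target sagemaker-cluster:<CLUSTER_ID>_controller-machine-<INSTANCE_ID> --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
```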
All the following steps are executed on the GPU nodes (e.g., 2 × g5.2xlarge). You can SSH into a worker node: https://catalog.workshops.aws/sagemaker-hyperpod/en-US/01-cluster/07-ssh-compute
sinfo
ssh ip-10-1-23-***
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -f -p ~/miniconda3
source ~/miniconda3/bin/activate
conda create -n llamafactory python=3.10
conda activate llamafactory
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu121
pip install -e ".[torch,metrics,deepspeed,bitsandbytes,liger-kernel]" "transformers>=4.45.0"
# If you are fine-tuning Qwen2.5-VL, you might need to use transformers>=4.49.0
# see issue: https://github.com/huggingface/transformers/pull/36188
# pip install git+https://github.com/huggingface/[email protected]
# or
# pip install git+https://github.com/huggingface/[email protected]
pip install flash-attn
cd ..
Clone the current repository and cd into it:
git clone https://github.com/aws-samples/fine-tune-qwen2-vl-with-llama-factory.git
cd fine-tune-qwen2-vl-with-llama-factory
python ./preprocessing/process_fintabnet_en.py --output_dir ./data/fintabnet_en
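The preprocessing step converts FinTabNet pages into LLaMA-Factory's multimodal sharegpt-style records. A hypothetical example of one record is shown below; the exact prompt wording, field names, and image paths depend on process_fintabnet_en.py.

```json
{
  "messages": [
    {
      "role": "user",
      "content": "<image>Recognize the table in the image and output its HTML."
    },
    {
      "role": "assistant",
      "content": "<table><tr><td>Revenue</td><td>$1,234</td></tr></table>"
    }
  ],
  "images": ["data/fintabnet_en/images/page_0001.png"]
}
```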
Add the dataset (PubTabNet format) to ./data/dataset_info.json; the sample code above registers it as fintabnet_en. A sketch of the entry follows.
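This sketch assumes the sharegpt record layout shown earlier; the key names follow LLaMA-Factory's dataset_info.json conventions, and the entry sits inside the file's top-level JSON object.

```json
"fintabnet_en": {
  "file_name": "fintabnet_en/fintabnet.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "messages",
    "images": "images"
  },
  "tags": {
    "role_tag": "role",
    "content_tag": "content",
    "user_tag": "user",
    "assistant_tag": "assistant"
  }
}
```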
PiSSA (Principal Singular values and Singular vectors Adaptation of large language models) shares the same architecture as LoRA. However, compared to LoRA, PiSSA updates the principal components while freezing the "residual" parts, allowing faster convergence and enhanced performance.
python ./train_configs/pissa_init.py --model_name_or_path Qwen/Qwen2-VL-7B-Instruct --output_dir models/qwen2_vl_7b_pissa_128 --lora_rank 128 --lora_target $'^(?!.*patch_embed).*(?:gate_proj|k_proj|fc2|o_proj|v_proj|up_proj|fc1|proj|down_proj|qkv|q_proj).*'
Prepare training config ./train_configs/train/qwen2_vl_7b_pissa_qlora_128_fintabnet_en.yaml
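As a rough guide, a LLaMA-Factory SFT config for this setup looks like the sketch below. All values here are illustrative assumptions; the repo's YAML is authoritative for the actual hyperparameters and model/adapter wiring.

```yaml
### model (assumption: the PiSSA-initialized model from the previous step)
model_name_or_path: models/qwen2_vl_7b_pissa_128
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 128
quantization_bit: 4        # QLoRA: 4-bit quantization of the frozen base weights
### dataset
dataset: fintabnet_en
template: qwen2_vl
cutoff_len: 4096
### output
output_dir: saves/qwen2_vl_7b_pissa_qlora_128_fintabnet_en
logging_steps: 10
save_steps: 500
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
bf16: true
```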
FORCE_TORCHRUN=1 llamafactory-cli train ./train_configs/train/qwen2_vl_7b_pissa_qlora_128_fintabnet_en.yaml
Or use Slurm sbatch; an example script for a single node with a single GPU (e.g., g5.2xlarge) is at ./submit_train_singlenode.sh:
sbatch submit_train_singlenode.sh
For multi-node training, use Slurm sbatch; an example script for 2 nodes with a single GPU each (e.g., 2 × g5.2xlarge) is at ./submit_train_multinode.sh:
sbatch submit_train_multinode.sh
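To show the shape of such a script, here is a minimal hypothetical sketch; the repo's submit_train_multinode.sh is the reference. It relies on LLaMA-Factory reading FORCE_TORCHRUN, NNODES, NODE_RANK, MASTER_ADDR, and MASTER_PORT to configure torchrun.

```bash
#!/bin/bash
#SBATCH --job-name=qwen2vl-multinode
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --output=finetune_output_multinode.log

# The first node in the allocation acts as the rendezvous host.
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export MASTER_PORT=29500

# One task per node; each task launches torchrun for its local GPUs.
srun bash -c 'FORCE_TORCHRUN=1 NNODES=$SLURM_NNODES NODE_RANK=$SLURM_NODEID \
  MASTER_ADDR=$MASTER_ADDR MASTER_PORT=$MASTER_PORT \
  llamafactory-cli train ./train_configs/train/qwen2_vl_7b_pissa_qlora_128_fintabnet_en.yaml'
```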
After the training run completes, you'll get a file called finetune_output_multinode.log, a log file that records all the details and progress of your training session (example here).
Example ./train_configs/export/export_qwen2_vl_7b_pissa_qlora_128_fintabnet_en.yaml
- Set adapter_name_or_path to your target LoRA folder path.
- Set export_dir to your target output folder path.
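For orientation, a LLaMA-Factory export config generally looks like the following sketch; the paths are placeholders for your own folders.

```yaml
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct
adapter_name_or_path: saves/qwen2_vl_7b_pissa_qlora_128_fintabnet_en   # your LoRA folder
template: qwen2_vl
finetuning_type: lora
export_dir: models/qwen2_vl_7b_pissa_qlora_128_fintabnet_en            # your output folder
export_size: 2
export_device: cpu
export_legacy_format: false
```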
llamafactory-cli export ./train_configs/export/export_qwen2_vl_7b_pissa_qlora_128_fintabnet_en.yaml
Run ./evaluation/qwen2vl_visual_evaluation.py to evaluate the merged model's performance. The script generates HTML output that enables side-by-side comparison between the model's predictions and the reference tables, making it easy to visually assess the accuracy of table structure recognition and content extraction through an interactive interface.
pip install qwen_vl_utils
python ./evaluation/qwen2vl_visual_evaluation.py
The model's performance is then evaluated using the financial-statement-table-html dataset, which provides standardized metrics for assessing table structure recognition and content extraction accuracy in financial statements.
pip install qwen_vl_utils
python ./evaluation/inference.py --log-path ./logs --model-name qwen2_vl --model-path models/qwen2_vl_7b_pissa_qlora_128_fintabnet_en
pip install distance apted lxml
python ./evaluation/calc_teds.py ./logs/$YOUR_TXT_PATH
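calc_teds.py scores predictions with TEDS (Tree-Edit-Distance-based Similarity), the metric introduced in the PubTabNet paper, which compares the HTML trees of the predicted and reference tables:

$$\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}$$

where EditDist is the tree edit distance (computed with apted) and |T| is the number of nodes in tree T; a score of 1 means the predicted table matches the reference exactly.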
AutoAWQ is an easy-to-use package for 4-bit quantized models. AutoAWQ speeds up models by 3x and reduces memory requirements by 3x compared to FP16. AutoAWQ implements the Activation-aware Weight Quantization (AWQ) algorithm for quantizing LLMs.
AWQ quantization might require more GPU memory; you can try an ml.g5.24xlarge instance.
pip install autoawq
CUDA_VISIBLE_DEVICES=0,1,2,3 python ./quantization/quant_awq.py --model_path ./models/qwen2_vl_7b_pissa_qlora_128_fintabnet_en --quant_path ./models/qwen2_vl_7b_pissa_qlora_128_fintabnet_en_awq_int4 --jsonl_file ./data/fintabnet_en/fintabnet.json --n_sample 16
You can either host the fine-tuned Qwen2-VL model on a SageMaker real-time endpoint, or use the vLLM Docker image directly in your preferred environment such as EKS. You can check the deployment guidance here.
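For instance, with vLLM installed you could serve the AWQ-quantized model via its OpenAI-compatible server. This is a hypothetical invocation, not the repo's deployment script; adjust the flags to your environment and vLLM version.

```bash
# Serve the AWQ INT4 model; vLLM picks up Qwen2-VL's multimodal processor from the checkpoint.
vllm serve ./models/qwen2_vl_7b_pissa_qlora_128_fintabnet_en_awq_int4 \
  --quantization awq \
  --max-model-len 8192 \
  --port 8000
```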
Zhihao LIN, from the AWS Generative AI Innovation Center, conducted extensive experimentation with generative AI models, focusing on optimizing model performance and resource utilization.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.