LoRA for Qwen2-VL-7B-Instruct

September 21, 2024 23:52

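Qwen2-VL-7B-Instruct has been getting attention for its strong performance on tasks such as OCR, so I fine-tuned it with LoRA and wrote up the steps.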

For LoRA, I use ModelScope's SWIFT library (ms-swift).

I used Docker for the environment.
This is the command I used:

docker run -it --gpus all -v $(pwd):/mnt/workspace registry.cn-hangzhou.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.1.2-tf2.14.0-1.13.1 /bin/bash

Once inside the Docker container, install the dependencies.

git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]

pip install git+https://github.com/huggingface/transformers.git
pip install pyav qwen_vl_utils
cd ..

Prepare the data in the following format, one JSON object per line.
Multiple images also appear to be supported.

{"query": "instruction", "response": "expected response", "images": ["path/to/image"]}
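As a minimal sketch of this format, the snippet here writes a two-sample file. The file name train.jsonl, the prompts, and the image paths are hypothetical. The second sample assumes that multiple images are given simply as several paths in the images list, based on the note above; it does not cover whether the query needs any image placeholder tokens.

```shell
#!/usr/bin/env bash
# Hypothetical sample data in the query/response/images format.
# File name, prompts and image paths are placeholders, not real data.

# Quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > train.jsonl <<'EOF'
{"query": "Read the text in this image.", "response": "Hello, world", "images": ["images/sample_001.png"]}
{"query": "What differs between these two images?", "response": "The second one has a red stamp.", "images": ["images/sample_002a.png", "images/sample_002b.png"]}
EOF

# One JSON object per line, so the line count equals the number of samples.
wc -l < train.jsonl
```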

LoRA training can then be run with the following command.

CUDA_VISIBLE_DEVICES=3 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
  --num_train_epochs 5 \
  --sft_type lora \
  --dataset {your_data}.jsonl \
  --deepspeed default-zero2
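Before launching the training command, a quick pre-flight check can catch missing files before the model is loaded. This is a hypothetical sketch: the function names list_image_paths, check_dataset and run_sft are my own, not part of ms-swift, and run_sft only wraps the same swift sft flags used above. Image paths are extracted by naive text matching rather than a JSON parser, so it assumes paths without embedded double quotes, and relative paths are resolved against the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers around the swift sft command from the article.

# list_image_paths FILE: print every path inside the "images" arrays of FILE.
# Naive text matching, not a JSON parser: assumes paths contain no double quotes.
list_image_paths() {
  grep -o '"images": *\[[^]]*]' "$1" |
    grep -o '"[^"]*"' |
    grep -vx '"images"' |
    tr -d '"'
}

# check_dataset FILE: fail if FILE is missing/empty or any listed image does not exist.
check_dataset() {
  local file="$1" missing
  if [ ! -s "$file" ]; then
    echo "dataset not found or empty: $file" >&2
    return 1
  fi
  # Collect paths that are not regular files (relative paths resolve against $PWD).
  missing=$(list_image_paths "$file" | while IFS= read -r path; do
    [ -f "$path" ] || echo "$path"
  done)
  if [ -n "$missing" ]; then
    echo "missing images:" >&2
    echo "$missing" >&2
    return 1
  fi
}

# run_sft FILE [GPU]: run the article's LoRA command only if the checks pass.
run_sft() {
  local dataset="$1" gpu="${2:-3}"
  check_dataset "$dataset" || return 1
  CUDA_VISIBLE_DEVICES="$gpu" swift sft \
    --model_type qwen2-vl-7b-instruct \
    --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
    --num_train_epochs 5 \
    --sft_type lora \
    --dataset "$dataset" \
    --deepspeed default-zero2
}

# Example: run_sft my_data.jsonl 3
```

Keeping the checks in functions means nothing runs until run_sft is called, so the helpers can be sourced and tried on a dataset without touching the GPU.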

For reference, the log looks like this. I'm using an A6000.

{'loss': 3.13047028, 'acc': 0.41471725, 'grad_norm': 5.28035643, 'learning_rate': 9.935e-05, 'memory(GiB)': 20.8, 'train_speed(iter/s)': 3.642661, 'epoch': 1.48, 'global_step/max_steps': '9700/98025', 'percentage': '9.90%', 'elapsed_time': '44m 13s', 'remaining_time': '6h 42m 40s'}
Train:  10%|████                                     | 9700/98025 [44:13<306:03:18, 12.47s/it]
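The log lines are printed as Python-style dicts, so individual values can be pulled out with standard text tools. This small sketch assumes the line format shown above; log_field and progress are hypothetical helper names. progress recomputes the percentage from global_step/max_steps, and for 9700/98025 it gives 9.90%, matching the log.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading the dict-style log lines printed during training.

# log_field LINE KEY: print the numeric value of 'KEY' in one log line (e.g. loss, acc).
log_field() {
  printf '%s\n' "$1" | sed -n "s/.*'$2': \([0-9.e+-]*\).*/\1/p"
}

# progress LINE: recompute the percentage from 'global_step/max_steps', e.g. '9700/98025'.
progress() {
  printf '%s\n' "$1" |
    sed -n "s|.*'global_step/max_steps': '\([0-9]*\)/\([0-9]*\)'.*|\1 \2|p" |
    awk '{ printf "%.2f%%\n", 100 * $1 / $2 }'
}

line="{'loss': 3.13047028, 'acc': 0.41471725, 'learning_rate': 9.935e-05, 'global_step/max_steps': '9700/98025'}"
log_field "$line" loss           # prints 3.13047028
log_field "$line" learning_rate  # prints 9.935e-05
progress "$line"                 # prints 9.90%
```

If you save the console output to a file (for example by piping the training command through tee train.log), running grep "'loss'" train.log and feeding each matching line to log_field would print the loss history.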

To resume a run partway through, add the following option with the path to a saved checkpoint directory.

--resume_from_checkpoint {path to checkpoint directory}
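Since the option takes a checkpoint directory, a helper that picks the newest one can save some typing. This sketch assumes checkpoints are saved as directories named checkpoint-<step> under the run's output directory; the helper name latest_checkpoint and the example run directory are hypothetical. Pass the directory of a single run, because across several runs the largest step number is not necessarily the most recent checkpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find the newest checkpoint-<step> directory of one training run.

# latest_checkpoint DIR: print the checkpoint-N directory under DIR with the largest N.
# Steps: list checkpoint-* dirs, keep only plain checkpoint-<number> names,
# prefix each path with its step number, sort numerically (so 1500 comes after 500),
# take the last one, then strip the prefix again.
latest_checkpoint() {
  find "$1" -type d -name 'checkpoint-*' |
    grep -E '/checkpoint-[0-9]+$' |
    awk -F'checkpoint-' '{ print $NF "\t" $0 }' |
    sort -n |
    tail -n 1 |
    cut -f2-
}

# Example (made-up run directory; use your own output directory):
# ckpt="$(latest_checkpoint output/qwen2-vl-7b-instruct/v0-20240921-235200)"
# CUDA_VISIBLE_DEVICES=3 swift sft \
#   --model_type qwen2-vl-7b-instruct \
#   --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
#   --num_train_epochs 5 \
#   --sft_type lora \
#   --dataset {your_data}.jsonl \
#   --deepspeed default-zero2 \
#   --resume_from_checkpoint "$ckpt"
```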

For details, see: https://github.com/modelscope/ms-swift/blob/main/docs/source_en/Multi-Modal/qwen2-vl-best-practice.md

If you'd like to build something with VLMs or LLMs, feel free to get in touch.
