Fine-tuning Falcon with FalconTune

Kan Hatakeyama

July 7, 2023, 10:21

Overview

I fine-tuned Falcon with QLoRA using the FalconTune library.
Choosing which LLM to fine-tune takes some deliberation, but model size is king, so the idea here is to try something big like Falcon-40b.

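Related modules (at the time of writing)

MPTTune
- A library for applying QLoRA to MPT-7b. The 30b model is not supported yet, but someone will probably add it.
LLMTune
- QLoRA for LLaMA-65b. At the time of writing, the library has some bugs, its CUDA version is a bit old, and there are various other rough edges.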

Setup

Test environments

Google Colab (an A100 is required for the 40b model) (the developers provide sample code)
My own Ubuntu machine (RTX 3090) for the 7b model, with miniconda

Cloning the repository

git clone https://github.com/rmihaylov/falcontune
cd falcontune

Installing the modules

pip install -r requirements.txt  
python setup.py install

Downloading the model (this takes a while)

# My Ubuntu machine does not seem to have enough memory to run 40b, so I download 7b instead
wget https://huggingface.co/TheBloke/falcon-7b-instruct-GPTQ/resolve/main/gptq_model-4bit-64g.safetensors

# If you can run 40b, use this one
wget https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ/resolve/main/gptq_model-4bit--1g.safetensors

Installing a few missing libraries as needed (on my Ubuntu machine)

pip install scipy
conda install cudatoolkit

Running it
On my machine with two RTX 3090s, I got a GPU memory error.
Restricting the process to a single visible GPU fixed it.

export CUDA_VISIBLE_DEVICES=0

Inference with 7b

falcontune generate \
    --interactive \
    --model falcon-7b-instruct-4bit \
    --weights gptq_model-4bit-64g.safetensors \
    --max_new_tokens=50 \
    --use_cache \
    --do_sample \
    --instruction "Who was the first person on the moon?"

Inference with 40b (note: the 24 GB of an RTX 3090 was not enough memory. It runs on an A100 40GB.)

falcontune generate \
    --interactive \
    --model falcon-40b-instruct-4bit \
    --weights gptq_model-4bit--1g.safetensors \
    --max_new_tokens=50 \
    --use_cache \
    --do_sample \
    --instruction "Who was the first person on the moon?"
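
As a rough sanity check on the memory note above, the size of the 4-bit weights alone can be estimated from the parameter count. This is only a back-of-the-envelope sketch: the nominal parameter counts (7e9 and 40e9) are assumptions, and the estimate ignores quantization scales, activations, the KV cache, and CUDA context overhead, all of which add to the real footprint.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3


# Nominal parameter counts (assumed; the real models differ slightly).
for name, n_params in [("falcon-7b", 7e9), ("falcon-40b", 40e9)]:
    gib = quantized_weight_gib(n_params)
    print(f"{name}: roughly {gib:.1f} GiB of 4-bit weights")
```

Under these assumptions the 40b weights already take most of a 24 GB card before any runtime overhead, which is consistent with the out-of-memory result on the RTX 3090.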

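What the 7b model produces

Q. Who can use the GPT-4 API?
A. Anyone who has the API can use it. See the documentation for details.

Q. What is LLMMeetUPTokyo?
A. LLMMeetUPTokyo is a popular annual event hosted by the Japan-America Society of Washington, D.C. The event usually features lectures, workshops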

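Fine-tuning

To avoid a life full of lies, I'll try fine-tuning.
For practice, I put together a small throwaway dataset.
(The dataset used in the Google Colab example is very large and looked like it would take about 24 hours on an A100.)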

test_finetune.json

[
    {
        "instruction": "Give three tips for staying healthy.",
        "input": "",
        "output": "1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night."
    },
    {
        "instruction": "What is LLMMeetUPTokyo?",
        "input": "",
        "output": "LLMMeetUPTokyo is a crazy meeting held by hackers."
    },
    {
        "instruction": "Who can use GPT-4 API?",
        "input": "",
        "output": "Everyone will be able to use it before long."
    },
    {
        "instruction": "Given a piece of text, you need to output whether the statements made in the text are opinions or facts. An opinion is defined as a statement that cannot be proven true or false and is usually based on someone's beliefs. A fact is defined as a statement that can be proven true or false and is not based on someone's beliefs.",
        "input": "Text: The sky was very cloudy today.",
        "output": "The statement made in the text is a fact."
    }
]
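
Before training, it can help to check that every record has the shape shown above. The following is a minimal sketch, not part of FalconTune: the helper name `validate_alpaca_records` is hypothetical, and the rule that all three keys must be present as strings is an assumption inferred from the example file rather than from the library's documentation.

```python
import json

REQUIRED_KEYS = ("instruction", "input", "output")


def validate_alpaca_records(records):
    """Return a list of problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str):
                problems.append(f"record {i}: '{key}' is missing or not a string")
            elif key != "input" and not value.strip():
                # "input" may legitimately be empty, as in the example above.
                problems.append(f"record {i}: '{key}' is empty")
    return problems


# Two records copied from test_finetune.json, parsed the same way a file would be.
sample = json.loads("""
[
    {"instruction": "What is LLMMeetUPTokyo?", "input": "",
     "output": "LLMMeetUPTokyo is a crazy meeting held by hackers."},
    {"instruction": "Who can use GPT-4 API?", "input": "",
     "output": "Everyone will be able to use it before long."}
]
""")
print(validate_alpaca_records(sample))
```

To check the real file, the same function can be applied to `json.load(open("test_finetune.json", encoding="utf-8"))`.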

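Now train.
Since this is a demo and the goal is to make the model thoroughly memorize the content, the learning rate (lr) and related settings are set quite high.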

falcontune finetune \
    --model=falcon-7b-instruct-4bit \
    --weights=gptq_model-4bit-64g.safetensors \
    --dataset=./test_finetune.json \
    --data_type=alpaca \
    --lora_out_dir=./falcon-7b-instruct-4bit-test/ \
    --mbatch_size=1 \
    --batch_size=2 \
    --epochs=10 \
    --lr=3e-3 \
    --cutoff_len=256 \
    --lora_r=32 \
    --lora_alpha=16 \
    --lora_dropout=0.05 \
    --warmup_steps=1 \
    --save_steps=1 \
    --save_total_limit=3 \
    --logging_steps=5 \
    --target_modules='["query_key_value"]'
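
To get a feel for what these settings imply, the quantities can be worked out by hand. The sketch below makes two assumptions that are not confirmed by the text: that gradient accumulation is `batch_size // mbatch_size` (a common convention in LoRA fine-tuning scripts), and that all 4 examples are used for training with no validation split. The LoRA scaling factor `lora_alpha / lora_r` is the standard LoRA definition.

```python
import math


def training_plan(n_examples, batch_size, mbatch_size, epochs, lora_r, lora_alpha):
    """Derive step counts and LoRA scaling from the command-line settings."""
    # Assumption: micro-batches are accumulated until batch_size is reached.
    grad_accum_steps = batch_size // mbatch_size
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return {
        "grad_accum_steps": grad_accum_steps,
        "optimizer_steps": steps_per_epoch * epochs,
        # Standard LoRA: the low-rank update is multiplied by alpha / r.
        "lora_scaling": lora_alpha / lora_r,
    }


plan = training_plan(
    n_examples=4, batch_size=2, mbatch_size=1,
    epochs=10, lora_r=32, lora_alpha=16,
)
print(plan)
```

If these assumptions hold, the run performs only a couple of dozen optimizer steps, which is why a high learning rate is needed to make a 4-example dataset stick.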

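Note: when you run the command, you are asked whether to use wandb.
(It is a visualization platform.)
If you don't want to use it, answering
wandb: (3) Don't visualize my results
seems to be fine for now.

Addendum: wandb is free and easy to sign up for, so I recommend it.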

Inference after fine-tuning

Inference command

falcontune generate \
    --interactive \
    --model falcon-7b-instruct-4bit \
    --weights gptq_model-4bit-64g.safetensors \
    --lora_apply_dir falcon-7b-instruct-4bit-test \
    --max_new_tokens 50 \
    --use_cache \
    --do_sample \
    --instruction "Who can use GPT-4 API?"

Answer example 1
Q. Who can use the GPT-4 API?
A. Anyone can use it.

Note: because of the sampling temperature, other answers also came up.
Other answers
Everyone who is looking to learn something new can use it!
Everyone who is comfortable with coding.
Everyone can use it. It's free!

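Discussion: the dataset I trained on contained the sentence "Everyone will be able to use it before long."
The word "Everyone" now appears frequently, so the training does seem to be taking effect.

On the other hand, the answer "Everyone can use it. It's free!" is a blatant hallucination, which surprised me.
It seems necessary to have a mechanism that searches the training data and displays references, or that checks whether the model's statements diverge from the source material.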
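
The kind of check described above could look something like the following. This is only a toy sketch using surface string similarity from Python's standard `difflib`; the function names and the 0.6 threshold are hypothetical choices, and a realistic system would more likely use embedding-based retrieval over the training data.

```python
from difflib import SequenceMatcher


def best_reference(answer, references):
    """Return (similarity, reference) for the training output closest to the answer."""
    if not references:
        raise ValueError("references must not be empty")
    scored = [
        (SequenceMatcher(None, answer.lower(), ref.lower()).ratio(), ref)
        for ref in references
    ]
    return max(scored)


def is_grounded(answer, references, threshold=0.6):
    """True if the answer is close enough to some training output (threshold is arbitrary)."""
    score, _ = best_reference(answer, references)
    return score >= threshold


# Outputs taken from test_finetune.json.
references = [
    "LLMMeetUPTokyo is a crazy meeting held by hackers.",
    "Everyone will be able to use it before long.",
]

for answer in ["Everyone will be able to use it before long.",
               "Everyone can use it. It's free!"]:
    score, ref = best_reference(answer, references)
    print(f"{score:.2f}  {answer!r}  -> closest: {ref!r}")
```

Showing the closest reference next to each answer makes it easy to spot when the model has drifted away from what it was trained on.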

Answer example 2

Q. What is LLMMeetUPTokyo?
A. LLMMeetUPTokyo is a meeting for the community of computer programmers in Tokyo.

Other answers
- LLMMeetUPTokyo is a large-scale event that brings together people from all over the world. It's an amazing opportunity to network, collaborate, and learn about different cultures.
- LLMMeetUPTokyo is a meeting of fans and creators held in Tokyo.

Discussion
The dataset I trained on contained the sentence "LLMMeetUPTokyo is a crazy meeting held by hackers." My impression is that the model has more or less learned it. However, unrelated statements also stand out, so the temperature may need tuning.

Also, the words "hacker" and "crazy" never appeared. Falcon may have been trained to produce politically correct statements.
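
On the temperature point: sampling temperature divides the logits before the softmax, so a lower value concentrates probability on the top token and makes off-topic continuations less likely. The sketch below illustrates this with made-up logits; it is a generic illustration of temperature scaling, not FalconTune's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.5, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

As the temperature drops, the first token's share grows toward 1, which is the behavior one would want when the goal is to reproduce memorized answers.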

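Summary and thoughts

Summary

I was able to fine-tune Falcon easily with QLoRA.
The model did not memorize the content verbatim; it half-remembered some of it and made things up elsewhere, which left an impression (→ if accuracy matters, is a mechanism for checking answers against the fine-tuning dataset essential?)

Thoughts

My own machine can only train up to 7b, so I need to get hold of a machine that can run 40b.
I hope that by around 2024 the hardware and software will be in place to run QLoRA on 200b-class models.

Addendum, 2023/07/07
I'm currently fine-tuning on a production dataset of 25,000 examples with QLoRA.

GPU: RTX 3090 (power-limited to 140 W)
Actual VRAM usage: 8 GB
Time: about 3 hr/epoch
- With the RNN-based RWKV-7b, it was about 0.5 hr/epoch
LoRA model size: 37 MB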
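
The reported adapter size can be roughly cross-checked by counting LoRA parameters. The sketch below rests on several assumptions not stated in this addendum: that the production run used the same `lora_r=32` and `query_key_value` target as the demo command, that Falcon-7b has hidden size 4544, 32 layers, and multi-query attention with head dimension 64 (my understanding of the published model config), and that the adapter is saved in 32-bit floats.

```python
def lora_param_count(in_features, out_features, r):
    """A LoRA adapter adds A (r x in) and B (out x r) to one linear layer."""
    return r * (in_features + out_features)


# Assumed Falcon-7b dimensions.
HIDDEN = 4544
N_LAYERS = 32
HEAD_DIM = 64
# Multi-query attention: all query heads plus a single key head and value head.
QKV_OUT = HIDDEN + 2 * HEAD_DIM

per_layer = lora_param_count(HIDDEN, QKV_OUT, r=32)
total_params = per_layer * N_LAYERS
size_mb = total_params * 4 / 1e6  # 4 bytes per float32 parameter

print(f"LoRA parameters: {total_params:,}")
print(f"Estimated adapter size: {size_mb:.1f} MB")
```

If those assumptions hold, the estimate comes out in the same range as the 37 MB observed, which suggests the saved file is essentially just the low-rank matrices.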