Alpacaデータセットによる RWKV のファインチューニングを試す

npaka

2023年4月8日 12:14

「Google Colab」で「Alpacaデータセット」による「RWKV」のファインチューニングを試したのでまとめました。

前回

1. alpaca_ja

今回は、Alpacaデータセットの日本語版「alpaca_ja」を使います。

2. ChatRWKVのプロンプトの確認

「ChatRWKV」で、モデルに入力されるプロンプトを確認します。
以下のように、print()を追加しました。

print("--[prompt]--\n", new, "----")
out = run_rnn(pipeline.encode(new))

「+i 富士山の高さは？」と入力した時の時のプロンプトは、次のとおりでした。

これは、Alpacaモデルで使われていた学習データの書式になります。
(#の数は3個から1個に減ってる)

「+i」がない場合は、次のようなプロンプトでした。

3. Alpacaの学習データの書式

「Alpaca」の学習データの書式は、次のとおりです。

・入力がInstructionとInputの場合

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
<Instruction>

### Input:
<Input>

### Response:

・入力がInstructionのみの場合

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:

4. alpaca_jaをChatRWKV用に変換

「alpaca_cleand_ja.json」をAlpacaの学習データの書式の#1個版に変換します。

[
{"instruction":  "健康維持のための3つのコツを教えてください。",
  "input":"",
  "output":"1、バランスのとれた食事を摂り、野菜や果物を十分に摂ること。2、定期的に運動をして体の活力を保つこと。3、睡眠時間を十分にとり、規則正しい睡眠をとること。\n"
},
    :

↓

Below is an instruction that describes a task. Write a response that appropriately completes the request.

# Instruction:
健康維持のための3つのコツを教えてください。

# Response:
1、バランスのとれた食事を摂り、野菜や果物を十分に摂ること。2、定期的に運動をして体の活力を保つこと。3、睡眠時間を十分にとり、規則正しい睡眠をとること。
<|endoftext|>
    :

変換用のコードは、次のとおりです。

・convert.py

import json


# JSONの読み込み
def read_json(filename):
    with open(filename, "r") as f:
        return json.load(f)


# テキストファイルの書き込み
def write_text(filename, text):
    with open(filename, "w") as f:
        f.write(text)


# プロンプトの生成
def generate_prompt(instruction, input, output):
    if input != "":
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# Instruction:
{instruction}

# Input:
{input}

# Response:
{output}
<|endoftext|>
"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

# Instruction:
{instruction}

# Response:
{output}
<|endoftext|>
"""


# JSONの読み込み
dataset = read_json("alpaca_cleaned_ja.json")

# テキストの生成
output = ""
for item in dataset:
    if len(item.keys()) > 0:
        output += generate_prompt(
            item["instruction"].strip(),
            item["input"].strip(),
            item["output"].strip()
        )

# ファイル出力
write_text("alpaca_cleaned_ja.txt", output)

5. 学習と推論

学習と推論は、前回・前々回と同様です。
試しに「RWKV 149M」で100エポック学習しました。そして、Instructionの回答をResponseに記述することを覚えていることが確認できました。
(回答の正しさはモデルが小さいこともあり怪しい)

・学習前

・学習後