Converting a Hugging Face model to GGUF format on Google Colab 🔄 / Otsuka

May 25, 2024, 22:03

Using a tutorial as a reference, I'm going to try converting a Hugging Face model to GGUF format on Google Colab!

I ran everything with the L4 GPU selected in Colab, but it may work on CPU alone.

Downloading the model

Install the huggingface_hub library.

!pip install huggingface_hub

Download the model from Hugging Face.
Set model_id to the name of the model you want to download.
local_dir specifies where to save it.

from huggingface_hub import snapshot_download
model_id="4piken/Llama-3-Gozaru-8B-Instruct"
snapshot_download(repo_id=model_id, local_dir="Llama-3-Gozaru-8B-Instruct",
                  local_dir_use_symlinks=False, revision="main")

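If you get an error saying that "HF_TOKEN" is required, click the key icon in the left-hand menu of Google Colab, enter "HF_TOKEN" as the name, then issue an Access Token from your own Hugging Face page and paste it in as the value.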
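If you would rather pass the token explicitly, something like the following might work. This is a minimal sketch and not part of the original steps: resolve_hf_token is a hypothetical helper name, it assumes the secret is stored under the name HF_TOKEN with notebook access enabled, and outside Colab it falls back to an environment variable of the same name.

```python
import os


def resolve_hf_token(name="HF_TOKEN"):
    """Return the token from Colab secrets, else from the environment, else None."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        token = userdata.get(name)
        if token:
            return token
    except Exception:
        # Not running in Colab, or the secret is missing / access was not granted.
        pass
    return os.environ.get(name)


token = resolve_hf_token()
print("token found" if token else "no token found")
```

The result can then be handed to the download call, for example snapshot_download(..., token=token).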

Just to be safe, check that the model was downloaded to the specified location.

!ls -lash Llama-3-Gozaru-8B-Instruct
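The same check can be sketched in Python if you want something you can assert on. summarize_model_dir is a hypothetical helper, and it assumes the repository ships a config.json and stores its weights as .safetensors or .bin files, which may not hold for every model.

```python
from pathlib import Path


def summarize_model_dir(model_dir):
    """Return (has_config, weight_files, total_bytes) for a downloaded model folder."""
    root = Path(model_dir)
    files = [p for p in root.rglob("*") if p.is_file()]
    has_config = (root / "config.json").is_file()
    # Assumption: weights are stored as .safetensors or .bin files.
    weight_files = sorted(
        p.name for p in files if p.suffix in {".safetensors", ".bin"}
    )
    total_bytes = sum(p.stat().st_size for p in files)
    return has_config, weight_files, total_bytes


model_dir = "Llama-3-Gozaru-8B-Instruct"
if Path(model_dir).is_dir():
    has_config, weights, total = summarize_model_dir(model_dir)
    print(f"config.json present: {has_config}")
    print(f"weight files: {weights}")
    print(f"total size: {total / 1e9:.2f} GB")
```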

Converting the model

Clone llama.cpp.

!git clone https://github.com/ggerganov/llama.cpp.git

Install the dependencies.
The pinned module versions are also rewritten to match Colab's Python version at the time of writing.
If a "Restart session" prompt appears, just follow it.

!sed -i -e "1c numpy==1.25.0" llama.cpp/requirements/requirements-convert-legacy-llama.txt
!sed -i -e "5c protobuf==3.20.3" llama.cpp/requirements/requirements-convert-legacy-llama.txt
!sed -i -e "2c torch==2.3.0" llama.cpp/requirements/requirements-convert-hf-to-gguf.txt
!sed -i -e "2c torch==2.3.0" llama.cpp/requirements/requirements-convert-hf-to-gguf-update.txt

!pip install -r llama.cpp/requirements.txt
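For reference, sed's "Nc text" command replaces line N of the file with the given text, so the commands above depend on each requirements file still having the expected package on that line. A rough Python equivalent, with replace_line as a hypothetical helper name, could look like this:

```python
from pathlib import Path


def replace_line(path, line_number, new_text):
    """Replace the 1-indexed line `line_number` of `path` with `new_text`."""
    file = Path(path)
    lines = file.read_text().splitlines()
    if not 1 <= line_number <= len(lines):
        raise IndexError(f"{path} has no line {line_number}")
    lines[line_number - 1] = new_text
    file.write_text("\n".join(lines) + "\n")


# Same effect as the first sed command, if the file is present.
req = Path("llama.cpp/requirements/requirements-convert-legacy-llama.txt")
if req.exists():
    replace_line(req, 1, "numpy==1.25.0")
```

Unlike sed, which does nothing when the line does not exist, this sketch raises an error, which makes a changed file layout easier to notice.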

Check that the conversion script loads without problems.

!python llama.cpp/convert-hf-to-gguf.py -h

Run the conversion script.

!python llama.cpp/convert-hf-to-gguf.py Llama-3-Gozaru-8B-Instruct \
  --outfile Llama-3-Gozaru-8B-Instruct.gguf \
  --outtype q8_0
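As a rough sanity check on what to expect from q8_0, the output size can be estimated ahead of time. In ggml, q8_0 stores weights in blocks of 32 int8 values plus one 16-bit scale, i.e. 34 bytes per 32 weights. The sketch below assumes about 8 billion parameters and ignores metadata and any tensors kept at higher precision, so treat the figure as approximate; q8_0_size_bytes is a hypothetical helper name.

```python
def q8_0_size_bytes(n_weights, block_size=32, bytes_per_block=34):
    """Approximate storage for n_weights in q8_0 (32 int8 values + one fp16 scale per block)."""
    n_blocks = -(-n_weights // block_size)  # ceiling division
    return n_blocks * bytes_per_block


n_params = 8_000_000_000  # assumption: roughly 8B parameters
print(f"{q8_0_size_bytes(n_params) / 1e9:.1f} GB")  # prints "8.5 GB"
```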

Checking the converted model

!ls -lash Llama-3-Gozaru-8B-Instruct.gguf
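Beyond the file size, you can also peek at the file header to confirm it really is GGUF. The sketch below assumes a little-endian file in GGUF version 2 or later, where the 4-byte magic "GGUF" is followed by a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. read_gguf_header is a hypothetical helper name.

```python
import os
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # Little-endian: uint32 version, uint64 tensor count, uint64 metadata KV count.
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return version, tensor_count, kv_count


gguf_path = "Llama-3-Gozaru-8B-Instruct.gguf"
if os.path.exists(gguf_path):
    print(read_gguf_header(gguf_path))
```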

Trying out the converted model

Install llama-cpp-python.

!pip install llama-cpp-python

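Put your prompt in usr_prompt and run inference. The system prompt in the example tells the model to always answer in Japanese.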

from llama_cpp import Llama

model_path = '/content/Llama-3-Gozaru-8B-Instruct.gguf'

# Load the model
llm = Llama(model_path=model_path)

# Run inference
# The prompt asks, in Japanese: "Does a dog have Buddha-nature?"
usr_prompt = "犬に仏の性質はあるのでしょうか？"
result = llm(f"<|start_header_id|>system<|end_header_id|>必ず日本語で回答してください。<|eot_id|><|start_header_id|>user<|end_header_id|>{usr_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>", max_tokens=128)
print(result["choices"][0]["text"])

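The following output came back! It seems to be working well. The reply is in Japanese and breaks off mid-sentence, presumably because generation stops at max_tokens=128.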

我、りんえもんは思う。犬には仏の性質が含まれていますでござる。仏は、自他を超越し、我々全員が同一体であるということを悟りますでござる。また、仏
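The prompt string used for inference follows the Llama 3 chat layout of role headers and <|eot_id|> terminators. To avoid retyping the special tokens, it could be assembled with a small helper like the one below. This is only a sketch: build_llama3_prompt and its header_sep parameter are hypothetical names, and with the default header_sep="" it reproduces the string used in this article. Meta's published Llama 3 template puts two newlines after each header, which you could try by passing header_sep="\n\n".

```python
def build_llama3_prompt(system_prompt, user_prompt, header_sep=""):
    """Assemble a Llama 3 style prompt string from a system and a user message."""

    def turn(role, content):
        # One complete turn: role header, message text, end-of-turn token.
        return (
            f"<|start_header_id|>{role}<|end_header_id|>"
            f"{header_sep}{content}<|eot_id|>"
        )

    return (
        turn("system", system_prompt)
        + turn("user", user_prompt)
        # Open the assistant turn so the model continues from here.
        + f"<|start_header_id|>assistant<|end_header_id|>{header_sep}"
    )


prompt = build_llama3_prompt("必ず日本語で回答してください。", "犬に仏の性質はあるのでしょうか？")
print(prompt)
```

The resulting string can be passed to the model in the same way as before, for example llm(prompt, max_tokens=128).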

Uploading the model to Hugging Face

Upload the converted gguf file to Hugging Face.

from huggingface_hub import HfApi
api = HfApi()

# Replace the placeholders with your own Hugging Face username and the name to use on Hugging Face
model_id = "<your-hugging-face-username>/<name-on-hugging-face>.gguf"
api.create_repo(model_id, exist_ok=True, repo_type="model")
api.upload_file(
    path_or_fileobj="/content/Llama-3-Gozaru-8B-Instruct.gguf",
    path_in_repo="<name-on-hugging-face>.gguf",
    repo_id=model_id,
)
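To make the placeholders concrete, the arguments could be derived from the local file path. This is a sketch under assumptions: gguf_upload_args is a hypothetical helper, "your-username" is a stand-in, and it defaults the repository name to the file name without its extension, which differs slightly from the naming used above.

```python
from pathlib import Path


def gguf_upload_args(username, local_path, repo_name=None):
    """Build keyword arguments for HfApi.upload_file from a local GGUF path."""
    file_name = Path(local_path).name
    # Assumption: default the repository name to the file name without ".gguf".
    repo_name = repo_name or Path(local_path).stem
    return {
        "path_or_fileobj": str(local_path),
        "path_in_repo": file_name,
        "repo_id": f"{username}/{repo_name}",
    }


args = gguf_upload_args("your-username", "/content/Llama-3-Gozaru-8B-Instruct.gguf")
print(args["repo_id"])
```

With a real username, the dictionary could be passed on as api.create_repo(args["repo_id"], exist_ok=True, repo_type="model") followed by api.upload_file(**args).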

Next time I'll take on 4-bit quantization 💪