# Trying llama2 locally on a Mac with Python (LangChain)
LangChain's documentation covers the details, but this article is for readers who just want to know what to actually do.
Tested on a MacBook Pro (M1).
## Install the build prerequisites
These vary by environment, but the ones most likely to trip you up are listed below.
```bash
brew install cmake
pip install scikit-build
pip install langchain
```
## Clone llama-cpp-python from its repository
```bash
git clone --recursive -j8 https://github.com/abetlen/llama-cpp-python.git
# Move into the cloned directory.
cd llama-cpp-python
```
## Install it
```bash
python setup.py clean
python setup.py install
# Return to the original directory.
cd ..
```
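If the build succeeded, the bindings should import cleanly. A minimal sanity check, assuming the package exposes `__version__` (recent llama-cpp-python releases do):
```python
# Sanity check: the freshly built module should import and report its version.
import llama_cpp
print(llama_cpp.__version__)
```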
## Download the model
```bash
wget "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin"
```
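Before wiring in LangChain, it can be worth smoke-testing the downloaded file with the llama-cpp-python bindings directly. This is a minimal sketch; the prompt and `max_tokens` value here are arbitrary choices, not part of the original walkthrough:
```python
# Smoke test: load the GGML file and generate a few tokens, no LangChain involved.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.ggmlv3.q4_0.bin")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```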
## Create the sample code
```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Prompt template for question answering (used with LLMChain; see below).
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt_template = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="llama-2-7b-chat.ggmlv3.q4_0.bin",  # path to the downloaded model
    # Sampling parameters go directly on the constructor.
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    n_ctx=2048,  # context window; the default (512) is too small for max_tokens=2000
    callback_manager=callback_manager,
    verbose=True,  # verbose is required to pass to the callback manager
)

# Enter your prompt here.
prompt = """
Question: A rap battle between Stephen Colbert and John Oliver
"""
llm(prompt)
```
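The template defined at the top is not used by the direct `llm(prompt)` call above; to exercise it, wrap it in an `LLMChain`. A short sketch reusing the names from the snippet above (the example question is just a placeholder):
```python
# Optional: drive the same model through the prompt template via a chain.
llm_chain = LLMChain(prompt=prompt_template, llm=llm)
llm_chain.run("What NFL team won the Super Bowl in the year Justin Bieber was born?")
```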
## Output
```text
Stephen Colbert: Yo, John, it's been a while since we last battled. How's life treating you?
John Oliver: (smirking) Life has been good, Stephen. I've got my own show now and I'm spreading the truth to the masses.
Stephen Colbert: (chuckling) Oh, is that right? Well, I've still got my biting satire and quick wit. And I'm not afraid to use them.
John Oliver: (grinning) Oh, I'm shaking in my boots. But seriously, Stephen, you're still stuck in your old ways. You're too focused on making fun of politicians instead of addressing the real issues.
Stephen Colbert: (smirking) Issues? Ha! You think you're better than me just because you've got a fancy new show? I've been doing this for years, John. I know how to get to the heart of the matter. And I always come out on top.
John Oliver: (smiling slyly) Oh, really? Well, let'

llama_print_timings: load time = 481.60 ms
llama_print_timings: sample time = 181.57 ms / 256 runs ( 0.71 ms per token, 1409.89 tokens per second)
llama_print_timings: prompt eval time = 837.75 ms / 16 tokens ( 52.36 ms per token, 19.10 tokens per second)
llama_print_timings: eval time = 12774.76 ms / 255 runs ( 50.10 ms per token, 19.96 tokens per second)
llama_print_timings: total time = 14148.92 ms
```