# Trying llama2 locally on a Mac with Python (LangChain)
LangChain's documentation covers the details, but this article is for readers who just want to know what to actually do.
Tested on a MacBook Pro (M1).
## Install the build prerequisites
These vary by environment, but the ones most likely to trip you up are listed below.
```bash
brew install cmake
pip install scikit-build
pip install langchain
```
## Clone llama-cpp-python from its repository
```bash
git clone --recursive -j8 https://github.com/abetlen/llama-cpp-python.git
# Move into the cloned directory.
cd llama-cpp-python
```
## Install it
```bash
python setup.py clean
python setup.py install
# Return to the original directory.
cd ..
```
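If the build succeeded, the bindings should import cleanly. A minimal sanity check, assuming the package exposes `__version__` (recent llama-cpp-python releases do):
```python
# Sanity check: the freshly built module should import and report its version.
import llama_cpp
print(llama_cpp.__version__)
```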
## Download the model
```bash
wget "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q4_0.bin"
```
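Before wiring in LangChain, it can be worth smoke-testing the downloaded file with the llama-cpp-python bindings directly. This is a minimal sketch; the prompt and `max_tokens` value here are arbitrary choices, not part of the original walkthrough:
```python
# Smoke test: load the GGML file and generate a few tokens, no LangChain involved.
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.ggmlv3.q4_0.bin")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```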
## Create the sample code
```python
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Prompt template for question answering (used with LLMChain; see below).
template = """Question: {question}

Answer: Let's work this out in a step by step way to be sure we have the right answer."""
prompt_template = PromptTemplate(template=template, input_variables=["question"])

# Callbacks support token-wise streaming.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="llama-2-7b-chat.ggmlv3.q4_0.bin",  # path to the downloaded model
    # Sampling parameters go directly on the constructor.
    temperature=0.75,
    max_tokens=2000,
    top_p=1,
    n_ctx=2048,  # context window; the default (512) is too small for max_tokens=2000
    callback_manager=callback_manager,
    verbose=True,  # verbose is required to pass to the callback manager
)

# Enter your prompt here.
prompt = """
Question: A rap battle between Stephen Colbert and John Oliver
"""
llm(prompt)
```
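The template defined at the top is not used by the direct `llm(prompt)` call above; to exercise it, wrap it in an `LLMChain`. A short sketch reusing the names from the snippet above (the example question is just a placeholder):
```python
# Optional: drive the same model through the prompt template via a chain.
llm_chain = LLMChain(prompt=prompt_template, llm=llm)
llm_chain.run("What NFL team won the Super Bowl in the year Justin Bieber was born?")
```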
## Output
```text
Stephen Colbert: Yo, John, it's been a while since we last battled. How's life treating you?
John Oliver: (smirking) Life has been good, Stephen. I've got my own show now and I'm spreading the truth to the masses.
Stephen Colbert: (chuckling) Oh, is that right? Well, I've still got my biting satire and quick wit. And I'm not afraid to use them.
John Oliver: (grinning) Oh, I'm shaking in my boots. But seriously, Stephen, you're still stuck in your old ways. You're too focused on making fun of politicians instead of addressing the real issues.
Stephen Colbert: (smirking) Issues? Ha! You think you're better than me just because you've got a fancy new show? I've been doing this for years, John. I know how to get to the heart of the matter. And I always come out on top.
John Oliver: (smiling slyly) Oh, really? Well, let'

llama_print_timings: load time = 481.60 ms
llama_print_timings: sample time = 181.57 ms / 256 runs ( 0.71 ms per token, 1409.89 tokens per second)
llama_print_timings: prompt eval time = 837.75 ms / 16 tokens ( 52.36 ms per token, 19.10 tokens per second)
llama_print_timings: eval time = 12774.76 ms / 255 runs ( 50.10 ms per token, 19.96 tokens per second)
llama_print_timings: total time = 14148.92 ms
```