MLX版ローカルLLMをiOSで動かす

shu223

2024年3月26日 06:46

Apple Siliconに最適化された機械学習フレームワークMLXには、"mlx-swift-examples" というMLXを用いた学習や推論をiOS/macOSで動かすサンプルのリポジトリがあり、

そこに LLMEval というサンプルアプリがある。

https://github.com/ml-explore/mlx-swift-examples/blob/main/Applications/LLMEval/README.md

このサンプルアプリを使うと、iOS / macOSでMLXのLLMモデルによるテキスト生成を試すことができる。

試してみた

サンプルをそのままビルド & 実行すると、Phi-2 (4-bit) を利用することになる。

"compare python and swift"

MLXのLLMモデルをiPhone 15 Proで動かしてみた。結構サクサク。もちろんオンデバイス。モデルはPhi-2。 #mlx pic.twitter.com/Buow62mhQA
— 堤修一 / Shuichi Tsutsumi (@shu223) March 25, 2024

"what's the highest building in japan?"

"explain about the highest building in japan"

モデルを変更する

LLMEval の ContentView.swift 内に、次の行がある：

let modelConfiguration = ModelConfiguration.phi4bit

この ModelConfiguration を変更すると、他のモデルを試せる。

Mistral 7Bに変更してみる

let modelConfiguration = ModelConfiguration.mistral7B4bit

"compare python and swift"

MLX版Mistral 7B on iPhone 15 Pro pic.twitter.com/ev7xqKJSqT
— 堤修一 / Shuichi Tsutsumi (@shu223) March 25, 2024

"what's the highest building in japan?"

利用可能なモデル

`LLM/Models.swift` には次のように ModelConfiguration の型プロパティが定義されている：

extension ModelConfiguration {

    public static let mistral7B4bit = ModelConfiguration(
        id: "mlx-community/Mistral-7B-v0.1-hf-4bit-mlx")

    public static let codeLlama13b4bit = ModelConfiguration(
        id: "mlx-community/CodeLlama-13b-Instruct-hf-4bit-MLX",
        overrideTokenizer: "PreTrainedTokenizer"
    ) { prompt in
        // given the prompt: func sortArray(_ array: [Int]) -> String { <FILL_ME> }
        // the python code produces this (via its custom tokenizer):
        // <PRE> func sortArray(_ array: [Int]) -> String {  <SUF> } <MID>

        "<PRE> " + prompt.replacingOccurrences(of: "<FILL_ME>", with: "<SUF>") + " <MID>"
    }

    public static let phi4bit = ModelConfiguration(id: "mlx-community/phi-2-hf-4bit-mlx") {
        prompt in
        "Instruct: \(prompt)\nOutput: "
    }

    public static let gemma2bQuantized = ModelConfiguration(
        id: "mlx-community/quantized-gemma-2b-it",
        overrideTokenizer: "PreTrainedTokenizer"
    ) { prompt in
        "<start_of_turn>user \(prompt)<end_of_turn><start_of_turn>model"
    }

    public static let qwen205b4bit = ModelConfiguration(
        id: "mlx-community/Qwen1.5-0.5B-Chat-4bit",
        overrideTokenizer: "PreTrainedTokenizer"
    ) { prompt in
        "<|im_start|>system\nYou are a helpful assistant<|im_end|>\n<|im_start|>user\n\(prompt)<|im_end|>\n<|im_start|>assistant"
    }

    ...
}

つまり、

mistral7B4bit
codeLlama13b4bit
phi4bit
gemma2bQuantized
qwen205b4bit

はこの定義を選択するだけで利用できる。

ここから先は

3,890字 / 2画像

文章やサンプルコードは多少荒削りかもしれませんが、ブログや書籍にはまだ書いていないことを日々大量に載せています。たったの400円で、すぐに購読解除してもその月は過去記事もさかのぼって読めるので、少しでも気になる内容がある方にはオトクかと思います。

日々の学びメモ

¥400 / 月

技術的なメモやサンプルコード、思いついたアイデア、考えたこと、お金の話等々、頭をよぎった諸々を気軽に垂れ流しています。

ログイン

最後まで読んでいただきありがとうございます！もし参考になる部分があれば、スキを押していただけると励みになります。 Twitterもフォローしていただけたら嬉しいです。 https://twitter.com/shu223/