WindowsでChat with RTXを試してみる

2024年2月14日 19:31

現時点でWindows向けにしか試せないので、Windowsで試します。展開されたファイルを見るに、たしかにWindowsだけでいいのかもしれない。

使用するPCはドスパラさんの「GALLERIA UL9C-R49」。スペックは
・CPU: Intel® Core™ i9-13900HX Processor
・Mem: 64 GB
・GPU: NVIDIA® GeForce RTX™ 4090 Laptop GPU(16GB)
・GPU: NVIDIA® GeForce RTX™ 4090 (24GB)
・OS: Ubuntu22.04 on WSL2（Windows 11）
です。

1. インストール

ダウンロード

Build a Custom LLM with Chat With RTX | NVIDIA のページの DOWNLOAD NOWボタンをクリックしてダウンロードします。35.GBありました。

NVIDIA_ChatWithRTX_Demo.zip というファイル名ですので、適当な場所に展開します。

インストール

setup.exeをクリックしてインストーラーを起動。インストール開始です。

使用しているモデルは、
・Llama2 13B INT4
・Mistral 7B INT4
のようです。両方ともインストールします。

2. インストールされたもの

インストール先

・ChatWithRTX: $env:LOCALAPPDATA\NVIDIA\ChatWithRTX
・Miniconda3: $env:LOCALAPPDATA\NVIDIA\MiniConda

61.6GB＋6.42GBと、あわせて68GBほど必要です。

$env:LOCALAPPDATA\NVIDIA\ChatWithRTXの配下を見てます。

(1) RAG\trt-llm-rag-windows-main

GitHubで公開されている内容 GitHub - NVIDIA/trt-llm-rag-windows とdiffしてみましたが、なんだか少し違います。

(2) RAG\trt-llm-rag-windows-main\config

config.jsonの内容は以下。別のモデルのエンジンを追記したらふつうに動きますね。

{
    "models": {
        "supported": [
            {
                "name": "Mistral 7B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\mistral\\mistral7b_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\mistral\\mistral7b_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 7168,
                    "temperature": 0.1
                }
            },
            {
                "name": "Llama 2 13B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\llama\\llama13_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\llama\\llama13_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 3900,
                    "temperature": 0.1
                }
            }
        ],
        "selected": "Mistral 7B int4"
    },
    "sample_questions": [
        {
            "query": "How does NVIDIA ACE generate emotional responses?"
        },
        {
            "query": "What is Portal prelude RTX?"
        },
        {
            "query": "What is important about Half Life 2 RTX?"
        },
        {
            "query": "When is the launch date for Ratchet & Clank: Rift Apart on PC?"
        }
    ],
    "dataset": {
        "sources": [
            "directory",
            "youtube",
            "nodataset"
        ],
        "selected": "directory",
        "path": "dataset",
        "isRelative": true
    },
    "strings": {
        "directory": "Folder Path",
        "youtube": "YouTube URL",
        "nodataset": "AI model default"
    }
}

(3) RAG\trt-llm-rag-windows-main\model

Llama-13B INT4とMistral 7B INT4、はここにいます。.engineの2ファイルはインストール時にビルドしていました。

├─model
│  │
│  ├─llama
│  │  ├─llama13_hf
│  │  │      config.json
│  │  │      tokenizer.json
│  │  │      tokenizer.model
│  │  │      tokenizer_config.json
│  │  │
│  │  ├─llama13_int4_awq_weights
│  │  │      llama_tp1.json
│  │  │      llama_tp1_rank0.npz
│  │  │
│  │  └─llama13_int4_engine
│  │          config.json
│  │          llama_float16_tp1_rank0.engine
│  │          model.cache
│  │
│  └─mistral
│      ├─mistral7b_hf
│      │      config.json
│      │      tokenizer.json
│      │      tokenizer.model
│      │      tokenizer_config.json
│      │
│      ├─mistral7b_int4_engine
│      │      config.json
│      │      llama_float16_tp1_rank0.engine
│      │      model.cache
│      │
│      └─mistral7b_int4_quant_weights
│              mistral_tp1.json
│              mistral_tp1_rank0.npz

3. 試してみるその前に

何が起動されるのかしら？

こちらなのですが、よくわかりません。

ショートカットキーを見ると、以下のコマンドが記載されており、

$env:LOCALAPPDATA\NVIDIA\ChatWithRTX\RAG\trt-llm-rag-windows-main\app_launch.bat

その中身は、

:: SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
:: SPDX-License-Identifier: MIT
::
:: Permission is hereby granted, free of charge, to any person obtaining a
:: copy of this software and associated documentation files (the "Software"),
:: to deal in the Software without restriction, including without limitation
:: the rights to use, copy, modify, merge, publish, distribute, sublicense,
:: and/or sell copies of the Software, and to permit persons to whom the
:: Software is furnished to do so, subject to the following conditions:
::
:: The above copyright notice and this permission notice shall be included in
:: all copies or substantial portions of the Software.
::
:: THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
:: IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
:: FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
:: THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
:: LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
:: FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
:: DEALINGS IN THE SOFTWARE.

@echo off
setlocal enabledelayedexpansion

set "env_path_found="

for /f "tokens=1,* delims= " %%a in ('"%localappdata%\NVIDIA\MiniConda\Scripts\conda.exe" env list') do (
    set "env_name=%%a"
    set "env_path=%%b"
    if "!env_path!"=="" (
        set "env_path=!env_name!"
    )
    echo !env_path! | findstr /C:"env_nvd_rag" > nul
    if !errorlevel! equ 0 (
        set "env_path_found=!env_path!"
        goto :endfor
    )
)

:endfor
if not "%env_path_found%"=="" (
    echo Environment path found: %env_path_found%
    call "%localappdata%\NVIDIA\MiniConda\Scripts\activate.bat" %env_path_found%
    python verify_install.py
    python app.py
    pause
) else (
    echo Environment with 'env_nvd_rag' not found.
    pause
)

endlocal

conda env list の結果は以下です。

%localappdata%\NVIDIA\MiniConda\Scripts\conda.exe env list
# conda environments:
#
                         C:\Users\ShojiNoguchi\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag
base                     C:\Users\ShojiNoguchi\AppData\Local\NVIDIA\MiniConda

env_nvd_rag という仮想環境があったら、それをactivateさせて、python app.pyを呼び出しているだけです。はい。。。

4. 試してみる

アイコンをクリックして起動です。想定通り、cmd.exeが起動してきました。
そりゃ、Miniconda経由で起動しているだけですからね。

流れるログ。さきほどのconfigファイルが読み込まれているのが分かりますね。

Environment path found: C:\Users\WhoAmI\AppData\Local\NVIDIA\ChatWithRTX\env_nvd_rag
App running with config
 {
    "models": {
        "supported": [
            {
                "name": "Mistral 7B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\mistral\\mistral7b_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\mistral\\mistral7b_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 7168,
                    "temperature": 0.1
                }
            },
            {
                "name": "Llama 2 13B int4",
                "installed": true,
                "metadata": {
                    "model_path": "model\\llama\\llama13_int4_engine",
                    "engine": "llama_float16_tp1_rank0.engine",
                    "tokenizer_path": "model\\llama\\llama13_hf",
                    "max_new_tokens": 1024,
                    "max_input_token": 3900,
                    "temperature": 0.1
                }
            }
        ],
        "selected": "Mistral 7B int4"
    },
    "sample_questions": [
        {
            "query": "How does NVIDIA ACE generate emotional responses?"
        },
        {
            "query": "What is Portal prelude RTX?"
        },
        {
            "query": "What is important about Half Life 2 RTX?"
        },
        {
            "query": "When is the launch date for Ratchet & Clank: Rift Apart on PC?"
        }
    ],
    "dataset": {
        "sources": [
            "directory",
            "youtube",
            "nodataset"
        ],
        "selected": "directory",
        "path": "dataset",
        "isRelative": true
    },
    "strings": {
        "directory": "Folder Path",
        "youtube": "YouTube URL",
        "nodataset": "AI model default"
    }
}
[TensorRT-LLM][WARNING] Device 0 peer access Device 1 is not available.
.gitattributes: 100%|█████████████████████████████████████████████████████████████████████| 1.52k/1.52k [00:00<?, ?B/s]
README.md: 100%|██████████████████████████████████████████████████████████████████| 64.2k/64.2k [00:00<00:00, 50.0MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████| 733/733 [00:00<?, ?B/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████| 1.34G/1.34G [00:57<00:00, 23.3MB/s]
model.onnx: 100%|█████████████████████████████████████████████████████████████████| 1.34G/1.34G [00:42<00:00, 31.8MB/s]
model_quantized.onnx: 100%|█████████████████████████████████████████████████████████| 337M/337M [00:10<00:00, 33.6MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████| 125/125 [00:00<?, ?B/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 711k/711k [00:00<00:00, 986kB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 1.24k/1.24k [00:00<?, ?B/s]
vocab.txt: 100%|████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 3.52MB/s]
[02/14/2024-14:42:28] No sentence-transformers model found with name C:\Users\WhoAmI/.cache\torch\sentence_transformers\WhereIsAI_UAE-Large-V1. Creating a new one with MEAN pooling.
Generating new values
Parsing nodes: 100%|██████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 250.46it/s]
Generating embeddings: 100%|█████████████████████████████████████████████████████████| 156/156 [00:03<00:00, 46.06it/s]
The file at ./config/preferences.json does not exist.
Open http://127.0.0.1:39709?cookie=a4f7e116-633b-4b5a-8eb6-cd407a45594b&__theme=dark in browser to start Chat with RTX
Running on local URL:  http://127.0.0.1:39709

To create a public link, set `share=True` in `launch()`.

ブラウザが勝手に起動してきました。

configで指定されたAI modelを選択できるようになっています。
・Mistral 7B int4
・Llama 2 13B int4

データセットは以下の3つから選べます。

モデル毎のリソース使用状況

Mistral 7B int4選択時は、モデルロード直後 VRAMは 6.6GBを使用。

Llama 2 13B int4選択時は、モデルロード直後 VRAMは 8.8GBを使用。

チャットし始めるとVRAM使用量は増えて、11.4 ~ 12.1GBあたりをうろうろしていました。

聞いてみる

DatasetとしてFolder Pathを選択し、ディレクトリは初期値のまま尋ねると、、、

ドラえもんとはなにか

Based on the context information provided, I cannot answer the query "ドラえもんとはなにか" as it is written in Japanese and I'm just an AI and do not have the ability to understand or respond in Japanese. I can only answer questions in English. If you have a question in English, I would be happy to try and assist you to the best of my ability.

そりゃそうだ。そんなコンテキスト情報を含んだデータはない。ただ「日本語で理解と反応する能力はない」と言われるとちょっとアレである。ちなみに、英語で聞くと「日本語の能力はない」とは言わない。

DatasetとしてAI model defaultを選んで聞きます。

文字化けしてますね。

まとめ

コンシューマ向けにはインストーラーが無いと厳しいですね。
OSSばんざい。
Llama 2 13B int4を使うならば、VRAMは16GBは欲しい。Mistral 7B int4ならば8GBでもいけるかしら。
現時点で日本語での使用は厳しいですね。

うん？　これ、どうやって止めるんだ？ CTRL+Cかしら。

追記：右上にShutdownボタンがありました。これをクリックすると、

ターミナルもエンターを押すだけでとじます。

エンター、押さないと駄目なのか。