Trying MoMask [Text to Motion] locally on Windows

December 30, 2023, 23:01

In this post I try out MoMask, a new framework for text-driven 3D human motion generation.

1. MoMask

2. Setting up a local environment

Note: the installation uses Anaconda. This page does not cover how to install or use Anaconda itself.

(1) Clone the official Git repository.

git clone https://github.com/EricGuo5513/momask-codes.git
cd momask-codes

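The official instructions say to create the environment from the conda environment file, but that caused version and build conflicts for me, so this time I installed the dependencies with requirements.txt instead.

(2) Create a new conda environment.
(The official README says the code was tested on Python 3.7.13, but that version caused a dependency problem involving typing.OrderedDict for me, so I install 3.8 instead.)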

conda create -n momask python=3.8
conda activate momask
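
Before installing anything else, it can be worth confirming that the activated environment really runs Python 3.8 or later. The following is only a minimal sketch: the function name python_is_supported and the (3, 8) minimum are my own assumptions based on the note above, not something taken from the MoMask repository.

```python
import sys
import typing


def python_is_supported(version_info=None, minimum=(3, 8)):
    """Return True when the interpreter version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Supported:", python_is_supported())
    # typing.OrderedDict is the name involved in the dependency problem noted above.
    print("typing.OrderedDict available:", hasattr(typing, "OrderedDict"))
```

Running it inside the momask environment should report a 3.8.x interpreter if the environment was created as shown.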

(3) Install PyTorch and torchvision

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

(4) Install the required libraries

pip install -r requirements.txt

(5) Download the pretrained models

(5-1) Linux

bash prepare/download_models.sh

(5-2) Windows
My environment is Windows, where I could not run the bash script directly, so I carried out the script's steps by hand.

Download kit_models and humanml3d_models from the link below:
https://drive.google.com/drive/folders/1b3GnAbERH8jAoO5mdWgZhyxHB73n23sK
Create two new folders, momask-codes\checkpoints\kit and momask-codes\checkpoints\t2m.
Copy the extracted contents of kit_models.zip into momask-codes\checkpoints\kit.
Copy the extracted contents of humanml3d_models.zip into momask-codes\checkpoints\t2m.
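
The folder creation and extraction steps above can also be scripted, which avoids copy-and-paste mistakes. This is a rough sketch rather than a port of the official script: it assumes both zip files have already been downloaded by hand into the momask-codes folder, and the helper name install_checkpoints is made up for illustration.

```python
import zipfile
from pathlib import Path


def install_checkpoints(zip_path, target_dir):
    """Create target_dir if needed, extract zip_path into it, and list the result."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())


if __name__ == "__main__":
    # Assumption: the zip files sit in the current folder (the repository root).
    downloads = {
        "kit_models.zip": Path("checkpoints") / "kit",
        "humanml3d_models.zip": Path("checkpoints") / "t2m",
    }
    for zip_name, target_dir in downloads.items():
        if Path(zip_name).is_file():
            print(target_dir, "->", install_checkpoints(zip_name, target_dir))
        else:
            print("Not found, skipping:", zip_name)
```

Using pathlib keeps the same script usable on both Windows and Linux, since the path separators are handled automatically.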

3. Running it

(1) Generating from a single prompt

python gen_t2m.py --gpu_id 1 --ext exp1 --text_prompt "A person is running on a treadmill."

Running the command above produced the following error.

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

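The error points to an invalid GPU ID, and the command does pass "--gpu_id 1", which looked suspicious. CUDA device indices start at 0, so a machine with a single GPU only accepts 0. Changing the option to "--gpu_id 0" made it work.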

python gen_t2m.py --gpu_id 0 --ext exp1 --text_prompt "A person is running on a treadmill."
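
To see in advance which values of --gpu_id are usable, you can ask PyTorch how many CUDA devices it sees. The sketch below is only illustrative: is_valid_gpu_id and detect_device_count are hypothetical helper names of mine, and the only PyTorch call used is torch.cuda.device_count().

```python
def is_valid_gpu_id(gpu_id, device_count):
    """CUDA device indices run from 0 to device_count - 1."""
    return 0 <= gpu_id < device_count


def detect_device_count():
    """Return the number of CUDA devices PyTorch can see, or 0 without PyTorch."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    count = detect_device_count()
    print("Visible CUDA devices:", count)
    for gpu_id in (0, 1):
        print(f"--gpu_id {gpu_id} valid:", is_valid_gpu_id(gpu_id, count))
```

On a machine with one GPU, the device count is 1, so 0 is the only index that passes the check, which matches the "invalid device ordinal" error seen with 1.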

The generated files appear to be saved under "\momask-codes\generation".
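
To check what a run actually wrote, a small directory listing is enough. This is a minimal sketch that assumes it is run from the momask-codes folder; it makes no assumption about the file names or formats inside the generation folder and simply lists whatever is there.

```python
from pathlib import Path


def list_generated_files(root="generation"):
    """Return every file under root as a sorted list of paths relative to root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    for name in list_generated_files():
        print(name)
```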

Let's look at the motion that was actually generated.

The generated motion follows the prompt.

Next, I changed the prompt and tried a few more.

the person extends their right arm sideways at shoulder level, palm facing downwards, and gently swings it as if brushing something away.


the person deeply bends their knees, adopting a squat-like posture. Then, with their hands clasped in front of their chest, they vigorously extend both arms upwards while straightening their knees to jump.
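
When trying several prompts in a row, a small wrapper saves retyping the command. The sketch below only uses the three options that appear in the commands in this post (--gpu_id, --ext, --text_prompt). Giving each prompt its own --ext value (exp1, exp2, ...) is my assumption for keeping the results apart, and build_command is a hypothetical helper name.

```python
import subprocess
import sys
from pathlib import Path


def build_command(prompt, ext, gpu_id=0):
    """Build the gen_t2m.py command line for a single prompt."""
    return [
        sys.executable, "gen_t2m.py",
        "--gpu_id", str(gpu_id),
        "--ext", ext,
        "--text_prompt", prompt,
    ]


PROMPTS = [
    "A person is running on a treadmill.",
    "the person extends their right arm sideways at shoulder level, palm facing downwards, and gently swings it as if brushing something away.",
    "the person deeply bends their knees, adopting a squat-like posture. Then, with their hands clasped in front of their chest, they vigorously extend both arms upwards while straightening their knees to jump.",
]


if __name__ == "__main__":
    if not Path("gen_t2m.py").is_file():
        print("gen_t2m.py not found; run this from the momask-codes folder.")
    else:
        for index, prompt in enumerate(PROMPTS, start=1):
            command = build_command(prompt, ext=f"exp{index}")
            print("Running:", command)
            subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems with long prompts on Windows.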


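My impression: the results are far from 100 points, but MoMask does produce motions worth about 40 points. Training on your own data also seems to be possible, so preparing a large set of motion files with descriptions might leave room for improvement.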

Error log

<1> urllib3 OpenSSL version error

ImportError: urllib3 v2.0 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'OpenSSL 1.1.0i  14 Aug 2018'. See: https://github.com/urllib3/urllib3/issues/2168

Resolved by pinning urllib3 to a 1.26.x release, which predates v2.0:

pip install urllib3==1.26.6
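
To check whether an environment is affected, you can look at the OpenSSL version that Python's ssl module was built against. This is a minimal sketch based only on the requirement stated in the error message (OpenSSL 1.1.1+); supports_urllib3_v2 is a hypothetical helper name, and urllib3 itself may apply additional checks.

```python
import ssl


def supports_urllib3_v2(openssl_version_info):
    """Return True when the OpenSSL version is 1.1.1 or newer."""
    # ssl.OPENSSL_VERSION_INFO is a tuple of (major, minor, fix, patch, status).
    return tuple(openssl_version_info[:3]) >= (1, 1, 1)


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    if supports_urllib3_v2(ssl.OPENSSL_VERSION_INFO):
        print("This OpenSSL meets the 1.1.1+ requirement of urllib3 v2.")
    else:
        print("Pin urllib3 below 2.0, for example the 1.26.6 release used above.")
```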

<2> PyTorch (torchvision) image extension fails to load

UserWarning: Failed to load image Python extension: [WinError 126] 指定されたモジュールが見つかりません。
  warn(f"Failed to load image Python extension: {e}")
AttributeError: module 'torch.jit' has no attribute '_script_if_tracing'

Resolved by reinstalling:

pip uninstall torch torchvision
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

<3> chardet module not found

ModuleNotFoundError: No module named 'chardet'

During handling of the above exception, another exception occurred:

Resolved by installing it:

pip install chardet
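
After working through fixes like these, it helps to confirm which versions actually ended up in the environment. The sketch below uses importlib.metadata from the standard library (available from Python 3.8); the package list is simply the set of packages mentioned in this error log, and installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("torch", "torchvision", "urllib3", "chardet"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```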
