Trying Era3D on WSL2

July 7, 2024, 23:49

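I'm going to try Era3D, which is described as "a new multi-view diffusion method that generates high-resolution multi-view images from a single image."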

The PC I'm using is the GALLERIA UL9C-R49 from Dospara. Its specs are:
・CPU: Intel® Core™ i9-13900HX Processor
・Mem: 64 GB
・GPU: NVIDIA® GeForce RTX™ 4090 Laptop GPU (16GB)
・GPU: NVIDIA® GeForce RTX™ 4090 (24GB)
・OS: Ubuntu 22.04 on WSL2 (Windows 11)

1. Preparation

Environment setup

python3 -m venv era3d
cd $_
source bin/activate

Clone the repository.

git clone https://github.com/pengHTYX/Era3D
cd Era3D

Install the packages. Installing xformers exactly as the README describes did not work, so I dropped everything after post1 from the README's command.

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# install xformers
pip install xformers==0.0.23.post1

# for reconstruction
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install git+https://github.com/NVlabs/nvdiffrast

# other dependencies
pip install -r requirements.txt
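
Since several of the installs above pin exact versions, it can be worth checking what actually ended up in the virtual environment before moving on. The following is a minimal sketch using only the standard library; the helper names (installed_version, check_pins) are made up for illustration, and the pins are simply copied from the commands above.

```python
from importlib import metadata

# Versions pinned in the install commands above.
EXPECTED = {
    "torch": "2.1.2",
    "torchvision": "0.16.2",
    "torchaudio": "2.1.2",
    "xformers": "0.0.23.post1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected=EXPECTED):
    """Map each package name to (installed_version, matches_pin)."""
    report = {}
    for name, pin in expected.items():
        found = installed_version(name)
        # Wheels from the cu118 index carry a local suffix such as "+cu118",
        # so compare only the part before the "+".
        base = found.split("+")[0] if found else None
        report[name] = (found, base == pin)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins().items():
        status = "ok" if ok else "differs from pin"
        print(f"{name}: {found or 'not installed'} ({status})")
```

Run it with the era3d venv activated; a package reported as "not installed" or "differs from pin" is the first thing to look at if inference fails later.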

Downloading the model

Run the python command and, once the prompt appears, paste in the following.

from huggingface_hub import snapshot_download
snapshot_download(repo_id="pengHTYX/MacLab-Era3D-512-6view", local_dir="./pengHTYX/MacLab-Era3D-512-6view/")

Once the download has finished, quit.

>>> quit()
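
To confirm that the snapshot landed in the directory the inference command will read from, a quick check like the one below may help. It is a sketch using only the standard library; summarize_dir is a hypothetical helper name, and it makes no assumption about which files the model repository contains.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    root = Path(root)
    if not root.is_dir():
        # Nothing was downloaded to this location (or the path is wrong).
        return 0, 0
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    # Same local_dir as passed to snapshot_download above.
    count, size = summarize_dir("./pengHTYX/MacLab-Era3D-512-6view/")
    print(f"{count} files, {size / 1e9:.2f} GB")
```

A result of 0 files suggests the script was run from a different working directory than the one used for the download.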

2. Trying it out

Here is one of the input images provided as a sample.

Let's run inference.

CUDA_VISIBLE_DEVICES=0 python test_mvdiffusion_unclip.py --config configs/test_unclip-512-6view.yaml \
    pretrained_model_name_or_path='pengHTYX/MacLab-Era3D-512-6view' \
    validation_dataset.crop_size=420 \
    validation_dataset.root_dir=examples \
    seed=600 \
    save_dir='mv_res'  \
    save_mode='rgb'
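
The arguments after --config are key=value overrides, some with dotted keys. Assuming the script merges them into the YAML config in the usual OmegaConf dotlist style (an assumption, not verified against the Era3D source), the effect can be sketched in plain Python. apply_overrides is a hypothetical helper for illustration and is not part of Era3D; the empty base config stands in for the real YAML.

```python
def apply_overrides(config, overrides):
    """Merge 'a.b=value' strings into a nested dict and return it."""
    for item in overrides:
        key, _, raw = item.partition("=")
        value = raw.strip("'\"")  # the shell normally strips these quotes already
        if value.lstrip("-").isdigit():
            value = int(value)  # keep integers such as crop_size as int
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            # Walk (or create) the nested sections named by the dotted key.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


cfg = apply_overrides(
    {},  # placeholder base config, not the contents of the real YAML file
    [
        "validation_dataset.crop_size=420",
        "validation_dataset.root_dir=examples",
        "seed=600",
        "save_mode='rgb'",
    ],
)
print(cfg)
```

Read this way, validation_dataset.root_dir=examples points the validation dataset at the examples directory, and save_dir and save_mode control where and in what form the results are written.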

The execution log looks like this.

passed to UNetMV2DConditionModel, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 16.16it/s]
examples/A_pig_wearing_a_backpack_rgba.png
examples/3968940-PH.png
examples/k2.png
examples/cleanrot_armor_rgba.png
examples/kind_cartoon_lion_in_costume_of_astronaut_rgba.png
examples/lantern.png
examples/kunkun.png
examples/A_bulldog_with_a_black_pirate_hat_rgba.png
examples/dslr.png
examples/lewd_statue_of_an_angel_texting_on_a_cell_phone_rgba.png
examples/A_beautiful_cyborg_with_brown_hair_rgba.png
examples/monkey.png
examples/cute_demon_combination_angel_figure_rgba.png
examples/duola.png
examples/Ghost_eating_burger_rgba.png
ic| len(self.all_images): 15
15it [03:21, 13.45s/it]

Processing the 15 sample images took about 3 minutes 21 seconds (roughly 13 seconds per image, matching the 13.45s/it in the log). VRAM usage was about 20.4 GB.
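
The per-image figure can be re-derived from the last line of the log. The snippet below is a small sketch that parses that tqdm summary line and divides the elapsed time by the number of images; parse_tqdm_summary is a made-up helper name, and the regular expression only targets the "Nit [MM:SS, ...]" shape seen above.

```python
import re


def parse_tqdm_summary(line):
    """Extract (iterations, total_seconds) from a line like '15it [03:21, 13.45s/it]'."""
    m = re.search(r"(\d+)it \[(?:(\d+):)?(\d+):(\d+),", line)
    if m is None:
        raise ValueError(f"not a tqdm summary line: {line!r}")
    count = int(m.group(1))
    hours = int(m.group(2) or 0)  # hours appear only on long runs
    total = hours * 3600 + int(m.group(3)) * 60 + int(m.group(4))
    return count, total


count, total = parse_tqdm_summary("15it [03:21, 13.45s/it]")
print(total, round(total / count, 1))  # 201 13.4
```

So 3 minutes 21 seconds is 201 seconds, and 201 / 15 gives 13.4 seconds per image.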

Here are some of the inference results.