Notes on setting up and running animatediff-cli-prompt-travel locally

September 19, 2023, 20:04

What is this?

This is a rough memo of the steps for setting up and running animatediff-cli-prompt-travel (https://github.com/s9roll7/animatediff-cli-prompt-travel) in a local environment.
It is aimed at the following readers.

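People who have run StableDiffusion locally before
People whose PC has a reasonably powerful graphics card
People who have read the Readme in the official git repository but wanted a little more explanation for the installation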

It will probably not be much use to the following readers.

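People who can get it installed just by reading the official Readme
People who only want to try it out and do not care about a local environment
⇒ Someone has made it runnable on Colab, so I suggest trying that instead. https://twitter.com/Zuntan03/status/1703674198101803268
(Added 9/20) Rumor has it that Colab has already started restricting it. That was quick.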

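The same person looks likely to release a local setup tool as well, so treat this memo as a stopgap until then. Once a setup tool appears (or an official WebUI is released), this memo will probably become obsolete.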

Installation

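First, go to the folder where you want to install it and open PowerShell.
If you hold Shift and right-click, an "Open PowerShell window here" item should appear. Starting PowerShell from the Start menu and using cd to move to the target folder works too.
In this memo I will install it under the H:\Tool folder.
You should see a prompt like the following.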

(base) PS H:\Tool>

Environment setup

Start by fetching the repository. While we are at it, create a venv and switch to it. Copying and pasting one line at a time should be enough.

git clone https://github.com/s9roll7/animatediff-cli-prompt-travel
cd .\animatediff-cli-prompt-travel\
py -m venv venv
.\venv\Scripts\activate

Next, install the dependencies. First run the following commands, as the official instructions say.

python -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python -m pip install -e .
python -m pip install xformers

For some people this is already enough to get it working. Run the following command to check.

animatediff --help

If you see output like the following, the environment setup succeeded.

 Usage: animatediff [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion            Install completion for the current shell.                                            │
│ --show-completion               Show completion for the current shell, to copy it or customize the installation.     │
│ --help                -h        Show this message and exit.                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ civitai2config          Generate config file from *.civitai.info                                                     │
│ convert                 Convert a StableDiffusion checkpoint into a Diffusers pipeline                               │
│ generate                Do the thing. Make the animation happen. Waow.                                               │
│ merge                   Convert a StableDiffusion checkpoint into an AnimationPipeline                               │
│ refine                  Create upscaled or improved video using pre-generated frames                                 │
│ rife                    RIFE motion flow interpolation (MORE FPS!)                                                   │
│ stylize                 stylize video                                                                                │
│ tile-upscale            Upscale frames using controlnet tile                                                         │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

In my case, I got errors saying that mediapipe and onnxruntime were missing.
The fix is simply to install the missing packages.

pip install mediapipe
pip install onnxruntime-gpu

After adding the missing modules, run the check command again. This time it worked for me.
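If you would rather see which modules are missing before hitting the error, a check along these lines may help. It is only a minimal sketch: the helper name missing_modules is made up for this memo, and the list covers just the two modules that were missing in my case. Run it with the venv activated.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# These are import names, not pip package names:
# the pip package onnxruntime-gpu is imported as onnxruntime.
required = ["mediapipe", "onnxruntime"]

if __name__ == "__main__":
    missing = missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Anything reported as missing can then be installed with pip as shown above.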

As for the folder layout, the following materials appear to be expected under /data/.

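controlnet_image: input images for each ControlNet. They are sorted into one subfolder per ControlNet type under the child folder test.
embeddings: put TI (textual inversion) files here if you have any.
ip_adapter_image: images for ip-adapter.
models/huggingface: anything missing is downloaded here on the first run.
models/motion-module: the motion modules used by AnimateDiff.
models/sd: the models used by StableDiffusion.
ref_image: reference images. If you use the WebUI, you may have used these with ControlNet's reference-only mode.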

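There is no folder for LoRA, so create one wherever you like.
Absolute paths also work in the config file, so I suspect you do not need to place files here at all.
* For now I set up symbolic links so the various files could be referenced, and that worked without problems.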

Each folder is empty at this point, but running the sample apparently downloads whatever ControlNet and motion module files are missing.
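To make the layout above concrete, here is a minimal sketch that creates any subfolder that does not exist yet. The function name ensure_layout and the folder name lora are my own assumptions (the tool does not define a LoRA folder), and controlnet_image is spelled the way the sample config refers to it. The demo runs in a temporary directory so nothing real is touched.

```python
import tempfile
from pathlib import Path

# Subfolders of data/ as listed in this memo. "lora" is a hypothetical
# name for the LoRA folder that you have to create yourself.
EXPECTED = [
    "controlnet_image",
    "embeddings",
    "ip_adapter_image",
    "models/huggingface",
    "models/motion-module",
    "models/sd",
    "ref_image",
    "lora",
]


def ensure_layout(data_dir):
    """Create missing subfolders under data_dir and return the ones created."""
    created = []
    for rel in EXPECTED:
        target = Path(data_dir) / rel
        if not target.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            created.append(rel)
    return created


if __name__ == "__main__":
    # Demo against a throwaway directory instead of the real data folder.
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_layout(tmp))
```

To use it for real, pass the path of the data folder inside the cloned repository instead of the temporary directory.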

Editing the config file and running

It would be easy to call the explanation finished here, but I suspect many people get stuck rewriting the JSON config file, so here are a few extra notes on that.

I tweaked the sample so that it runs in my local environment.

{
  "name": "sample",
#File path to the model to use. Absolute paths seem to work too.
  "path": "H:/Tool/animatediff-cli-prompt-travel/data/models/sd/models/mistoonAnime_v10.safetensors", 
#File path to the motion module.
  "motion_module": "models/motion-module/model/mm_sd_v14.ckpt",
  "compile": false,
#Seed. Multiple values can be set. "-1,-1,-1" repeats three times, each with a random seed.
  "seed": [
    341774366206100
  ],
#Scheduler. You seem to be able to use more or less whichever you like.
  "scheduler": "k_dpmpp_sde",
  "steps": 20,
  "guidance_scale": 10,
  "clip_skip": 2,
#Header part of the prompt. Fixed.
  "head_prompt": "masterpiece, best quality, a beautiful and detailed portriat of muffet, monster girl,((purple body:1.3)),humanoid, arachnid, anthro,((fangs)),pigtails,hair bows,5 eyes,spider girl,6 arms,solo",
#Variable part of the prompt. The prompt can be specified per frame.
#The final prompt seems to be passed as "header part", "variable part", "footer part".
  "prompt_map": {
    "0":  "smile standing,((spider webs:1.0))",
    "32":  "(((walking))),((spider webs:1.0))",
    "64":  "(((running))),((spider webs:2.0)),wide angle lens, fish eye effect",
    "96":  "(((sitting))),((spider webs:1.0))"
  },
#Footer part of the prompt. Fixed.
  "tail_prompt": "clothed, open mouth, awesome and detailed background, holding teapot, holding teacup, 6 hands,detailed hands,storefront that sells pastries and tea,bloomers,(red and black clothing),inside,pouring into teacup,muffetwear",
#Negative prompt.
  "n_prompt": [
    "(worst quality, low quality:1.4),nudity,simple background,border,mouth closed,text, patreon,bed,bedroom,white background,((monochrome)),sketch,(pink body:1.4),7 arms,8 arms,4 arms"
  ],
#LoRA settings. Multiple LoRAs can be specified. One entry is commented out here as an example.
  "lora_map": {
#   "share/mult/set/howto-set-multiple-lora.safetensors" : 1.0,
    "H:/Tool/animatediff-cli-prompt-travel/data/LoRA/lora/Style/Niki/flat/flat2.safetensors" : -1.0
  },
#Settings for ipAdapter. Setting "enable" to true turns it on, and the same goes for the sections below. Everything is disabled this time.
  "ip_adapter_map": {
      "enable": false,
      "input_image_dir": "ip_adapter_image/test",
      "save_input_image": true,
      "resized_to_square": false,
      "scale": 0.5,
      "is_plus_face": true,
      "is_plus": true
  },
#ControlNet settings.
  "controlnet_map": {
    "input_image_dir" : "controlnet_image/test",
    "max_samples_on_vram": 200,
    "max_models_on_vram" : 3,
    "save_detectmap": true,
    "preprocess_on_gpu": true,
    "is_loop": true,
    
    "controlnet_tile":{
      "enable": false,
      "use_preprocessor":true,
      "preprocessor":{
        "type" : "none",
        "param":{
        }
      },
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_ip2p":{
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_lineart_anime":{
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_openpose":{
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_softedge":{
      "enable": false,
      "use_preprocessor":true,
      "preprocessor":{
        "type" : "softedge_pidsafe",
        "param":{
        }
      },
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_shuffle": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_depth": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_canny": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_inpaint": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_lineart": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_mlsd": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_normalbae": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_scribble": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_seg": {
      "enable": false,
      "use_preprocessor":true,
      "guess_mode":false,
      "controlnet_conditioning_scale": 1.0,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0,
      "control_scale_list":[0.5,0.4,0.3,0.2,0.1]
    },
    "controlnet_ref": {
        "enable": false,
        "ref_image": "ref_image/ref_sample.png",
        "attention_auto_machine_weight": 1.0,
        "gn_auto_machine_weight": 1.0,
        "style_fidelity": 0.5,
        "reference_attn": true,
        "reference_adain": false,
        "scale_pattern":[0.5]
    }
  },
#Settings used when upscaling.
  "upscale_config": {
    "scheduler": "k_dpmpp_sde",
    "steps": 20,
    "strength": 0.5,
    "guidance_scale": 10,
    "controlnet_tile": {
      "enable": false,
      "controlnet_conditioning_scale": 1.0,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_line_anime": {
      "enable": false,
      "controlnet_conditioning_scale": 1.0,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_ip2p": {
      "enable": false,
      "controlnet_conditioning_scale": 0.5,
      "guess_mode": false,
      "control_guidance_start": 0.0,
      "control_guidance_end": 1.0
    },
    "controlnet_ref": {
      "enable": false,
      "use_frame_as_ref_image": false,
      "use_1st_frame_as_ref_image": false,
      "ref_image": "ref_image/path_to_your_ref_img.jpg",
      "attention_auto_machine_weight": 1.0,
      "gn_auto_machine_weight": 1.0,
      "style_fidelity": 0.25,
      "reference_attn": true,
      "reference_adain": false
    }
  },
#Output settings.
  "output":{
#gif, mp4 and webm seem to be supported.
    "format" : "gif",
    "fps" : 8,
    "encode_param":{
      "crf": 10
    }
  }
}

Save this as "prompt_travel_test.json" (under config/prompts, to match the command below) and run it.
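One caveat about the listing: standard JSON has no comment syntax, so a file that still contains the lines starting with # will not parse with an ordinary JSON parser. Whether the tool itself tolerates them is not something this memo confirms, so the safe assumption is to remove them before saving. Below is a minimal sketch of that cleanup; the function names are made up for illustration, and it only handles whole-line # comments like the ones in the listing.

```python
import json


def strip_hash_comments(text):
    """Drop every line whose first non-blank character is '#'."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith("#")]
    return "\n".join(kept)


def load_annotated_config(text):
    """Parse a JSON config after removing '#' annotation lines."""
    return json.loads(strip_hash_comments(text))


# A shortened stand-in for the annotated listing (file names are placeholders).
sample = """{
#name of this config
  "name": "sample",
  "lora_map": {
#   "a.safetensors" : 1.0,
    "b.safetensors" : -1.0
  }
}"""

config = load_annotated_config(sample)
print(config["lora_map"])  # {'b.safetensors': -1.0}
```

If json.loads raises an error after the cleanup, the message includes the line and column, which is usually enough to find a stray comma or quote.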

animatediff generate -c config/prompts/prompt_travel_test.json -W 256 -H 384 -L 128 -C 16

And when I run it...

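It worked properly. As for what all those options tacked on the end mean:
-W: Width.
-H: Height.
-L: Length. The length of the video, in frames.
-C: Context. As far as I can tell, the number of frames processed together at a time.
With the settings above, it takes about two and a half minutes on a 4090.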
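To relate -L to the config, the arithmetic can be sketched as follows. It assumes nothing beyond frames divided by fps: with -L 128 and "fps": 8 the clip is 16 seconds long, and the prompt_map keys are the frame numbers at which each prompt starts. The function name keyframe_times is made up for illustration and the prompt texts are shortened.

```python
def keyframe_times(prompt_map, fps):
    """Map each prompt_map frame index to its start time in seconds."""
    return {int(frame): int(frame) / fps for frame in prompt_map}


length = 128  # value passed to -L
fps = 8       # "fps" in the output section of the config

# Keys as in the sample config; the prompt texts are abbreviated here.
prompt_map = {
    "0": "smile standing",
    "32": "walking",
    "64": "running",
    "96": "sitting",
}

print(length / fps)                     # 16.0
print(keyframe_times(prompt_map, fps))  # {0: 0.0, 32: 4.0, 64: 8.0, 96: 12.0}

# Every keyframe should fall inside the clip, otherwise it is never reached.
assert all(int(frame) < length for frame in prompt_map)
```

So in this sample each prompt covers four seconds of the resulting animation.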

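There seems to be a lot more you can do with it, but I will get to that if and when I feel like it. A WebUI or a more detailed explanation from someone else will probably arrive sooner.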