Trying Image Generation with FLUX.1 Models in ComfyUI

August 4, 2024, 21:46

※ Last update 8-13-2024
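※ (8-20) If you use NF4 or GGUF models, or models split into separate files, see the supplementary article as well.
※ (8-19) I published a new article listing the available models (updates ended on 8-31).
※ (8-15) I wrote an article about the FLUX.1 generation feature built into Grok on X.
※ 18 images and their prompts are included at the end of this article.
※ With the ComfyUI launch option "--novram", dedicated GPU memory (VRAM) usage stays below 3.5 GB.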

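■ 0. Overview

▼ 0-0. Online demos and other options

This article covers generation on a local PC. That said, the demo below is a quicker way to try FLUX.1, so I recommend it (provided it is not congested).

The service below also offers FLUX.1 Pro, but it is paid. I briefly explain how to use it in a separate article.

A notebook for Google Colab is also available at the URL below. It runs even on the free T4. (※ There is no dedicated UI, so it is best suited to a quick trial.)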

flux-jupyter (Jupyter Notebook)
https://github.com/camenduru/flux-jupyter

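▼ 0-1. Introduction

This article walks through generating images in ComfyUI with the models of "FLUX.1", the image generation AI that Black Forest Labs announced on 8-1-2024 (local time).

▼ 0-2. Required specs

My guess is that 64 GB or more of main memory (RAM) and a GeForce card with 24 GB or more of VRAM would be comfortable. ~~(Who has that kind of money?)~~

My environment has 32 GB of main memory and 12 GB of VRAM, and both were fully used during generation. However, usage does not appear to spill over into shared GPU memory.

If you have little VRAM, try the ComfyUI launch option "--novram". With it, dedicated GPU memory (VRAM) usage stayed below 3.5 GB. I have confirmed that it works with 32 GB of main memory, though more may be more comfortable.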
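As a rough sketch of where the option goes, the Python snippet below assembles the launch command for the Windows portable build with "--novram" appended. The interpreter path, the location of main.py, and the "--windows-standalone-build" flag are assumed to mirror the stock "run_nvidia_gpu.bat", so check your own .bat file. The function name and the install path are hypothetical.

```python
from pathlib import PureWindowsPath


def build_launch_command(portable_root, extra_args=("--novram",)):
    """Return the argument list for starting ComfyUI from the portable build.

    Assumption: the layout matches the stock run_nvidia_gpu.bat
    (the embedded python.exe running ComfyUI's main.py).
    """
    root = PureWindowsPath(portable_root)
    command = [
        str(root / "python_embeded" / "python.exe"),
        "-s",
        str(root / "ComfyUI" / "main.py"),
        "--windows-standalone-build",
    ]
    # Extra options such as --novram go at the end of the command line.
    command.extend(extra_args)
    return command


# Hypothetical install location, used only for illustration.
command = build_launch_command(r"C:\ComfyUI_windows_portable")
print(" ".join(command))
```

In practice it is probably simpler to copy "run_nvidia_gpu.bat" and append "--novram" to the command line inside the copy.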

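▼ 0-3. About FLUX.1

FLUX.1 is a new image generation AI that is reported to excel at prompt adherence, visual quality, image detail, and output diversity.

There are two downloadable models, and this article uses "FLUX.1 [schnell]". The license for each is published at the URL below, so please check it.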
https://github.com/black-forest-labs/flux/tree/main/model_licenses

FLUX.1 [schnell]
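The default is 4 steps, which allows fast generation. It is released under the apache-2.0 license, so commercial use is permitted and both the model and its outputs can be used relatively freely.
FLUX.1 [dev]
The default is 50 steps, but 20 steps appears to be enough. Its quality is second only to FLUX.1 [pro]. Use of the model is limited to non-commercial purposes (commercial use requires a license agreement), but the outputs can be used commercially. However, the outputs may not be used for training, fine-tuning, or distillation.
FLUX.1 [pro]
The highest-quality model.

For more details on FLUX.1, see the article below or the next section.

▼ 0-4. Links related to Black Forest Labs / FLUX.1

● How to use it in ComfyUI

● Official announcement of FLUX.1 and more (Black Forest Labs)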

● Black Forest Labs (Hugging Face)

● Black Forest Labs (GitHub)

■ 1. Downloading the models

Downloading takes time, so I cover it first. If you use split models, NF4 models, or GGUF models, the article below covers those cases. Note that workflows are often built for one specific model format.

▼ 1-1. Using FLUX.1 [schnell]

Download "flux1-schnell-fp8.safetensors (16.0GB)" from the URL below. For reference, the file goes in "ComfyUI\models\checkpoints\".

https://huggingface.co/Comfy-Org/flux1-schnell/tree/main

▼ 1-2. Using FLUX.1 [dev] (for reference)

Download "flux1-dev-fp8.safetensors (16.0GB)" from the URL below. For reference, the file goes in "ComfyUI\models\checkpoints\".

https://huggingface.co/Comfy-Org/flux1-dev/tree/main
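
If you would rather script the download than click through the browser, the sketch below works out the direct file URL and the destination path for each model. The repository and file names come from this article. The "resolve/main" URL pattern is my assumption about how Hugging Face serves raw files, and the function name is hypothetical. Nothing is downloaded here; the snippet only prints the plan.

```python
from pathlib import PureWindowsPath

# Repository and file names as given in this article.
MODELS = {
    "schnell": ("Comfy-Org/flux1-schnell", "flux1-schnell-fp8.safetensors"),
    "dev": ("Comfy-Org/flux1-dev", "flux1-dev-fp8.safetensors"),
}


def download_plan(variant, comfyui_root="ComfyUI"):
    """Return (download_url, destination_path) for the chosen model variant.

    Assumption: raw files are served under <repo>/resolve/main/<filename>.
    """
    repo_id, filename = MODELS[variant]
    url = f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
    destination = PureWindowsPath(comfyui_root) / "models" / "checkpoints" / filename
    return url, str(destination)


for variant in MODELS:
    url, destination = download_plan(variant)
    print(f"{variant}: {url} -> {destination}")
```

The URL can then be handed to a browser or a download tool of your choice. Given the file size, a tool that can resume interrupted downloads is worth considering.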

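▼ 1-3. Other models (for reference)

Many of the currently published models are listed there, together with sample images.

■ 2. Preparing ComfyUI

Get the latest ComfyUI and put the file you downloaded earlier in place.

▼ 2-1a. If ComfyUI is already installed

Update to the latest version so that it supports FLUX.1.

▼ 2-1b. If ComfyUI is not installed yet

This article uses the portable version for Windows. Installation with Git is covered, as brief notes, in the article below.

ComfyUI repository
https://github.com/comfyanonymous/ComfyUI

Open the repository at the URL above and download the portable version from "Direct link to download" under "Installing" → "Windows". Once the download finishes, extract it to a directory of your choice with a tool that supports the 7z format. Personally, I use 7-Zip.

Next, run "update_comfyui.bat" in the "ComfyUI_windows_portable\update" directory (launching it directly from Explorer is fine) to update. Output like the following appears in the Command Prompt; press a key to finish.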

stashing current changes
nothing to stash
creating backup branch: backup_branch_2024-08-06_01_42_33
checking out master branch
pulling latest changes
Done!
Press any key to continue . . .

For the basics of using ComfyUI, the article below may be of some help.

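▼ 2-2. Placing the file

Move the file you downloaded in section 1 to "ComfyUI\models\checkpoints\".

▼ 2-3. Getting the workflow

ComfyUI is designed to be very flexible. A workflow is the blueprint for what it does, and many are publicly available. A workflow can exist as a standalone file (JSON format), and each generated image also records the workflow, including the settings used at generation time.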

Open the URL below and download the image shown under "Simple to use FP8 Checkpoint version". The "Regular Full Version" uses the conventional model whose files are split, and it is covered in a separate article. You can save the image anywhere.

https://comfyanonymous.github.io/ComfyUI_examples/flux/

The two images below differ only in their initial settings; the workflow is identical. If you change Steps, either one can be used with the other model.

Simple to use FP8 Checkpoint version > Image containing the Flux Schnell workflow

Simple to use FP8 Checkpoint version > Image containing the Flux Dev workflow

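■ 3. Launching ComfyUI and generating

▼ 3-1. Launch

For the portable version, run "run_nvidia_gpu.bat" (launching it directly from Explorer is fine). A web browser should open once ComfyUI has started; if it does not, follow the Command Prompt output and open a URL such as http://127.0.0.1:8188/.

▼ 3-2. Loading the workflow

Drag and drop the image you downloaded earlier onto the ComfyUI screen. The workflow appears. You can also use the "Load" button.

After you have generated an image once, use that image instead. It reproduces the full layout and parameters from generation time, which makes it easier to pick up where you left off. ComfyUI also remembers its state when it exits and restores it at startup.

Each node (each box) can be moved freely by drag and drop. Dragging the background moves the whole canvas.

Zoom in and out with the mouse wheel or with Alt + '+' or '-'. The menu size does not change, so you may also want to use the browser's own zoom (Ctrl + '+' or '-').

▼ 3-3. Changing settings 1

Check and change the settings while looking at the screen. Only "steps" needs care, because the value differs by model (about 4 for Schnell, about 20 for Dev). Items not listed below basically do not need to be changed.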

ckpt_name
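Select the file you placed ("flux1-schnell-fp8.safetensors" or "flux1-dev-fp8.safetensors").
CLIP Text Encode (Positive Prompt)
Describe the image you want to generate, in English. It is easy to write one with ChatGPT, Claude, and the like (see 4-1).
width, height
The size of the generated image, in pixels.
batch_size
The number of images generated at once per run. Increasing it is pointless if resources (memory) are insufficient.
seed
The seed value. Even when all other settings are the same, a different seed produces a different image.
control_after_generate
Selects what happens to the seed after a job is sent to the queue (that is, after generation is requested): randomize, increment, decrement, or fixed.
With fixed, the seed does not change, but note that you cannot generate repeatedly with identical settings (ComfyUI does not re-run parts of the flow that have not changed).
steps
Use "4" as the baseline for FLUX.1 [schnell] and "20" for FLUX.1 [dev]. You can raise or lower these as needed.
steps is the number of times the internal process is repeated during generation. Increasing steps generally improves output quality, but it also increases generation time.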
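
The settings above can also be pictured as plain data. The sketch below overrides them in a workflow that has been exported in API format, which I am assuming is a JSON object keyed by node id, where each node has "class_type" and "inputs" and linked inputs are stored as lists. The two-node fragment and the function name are hypothetical; a real FLUX.1 workflow has more nodes.

```python
import copy

# Baseline step counts given in this article.
DEFAULT_STEPS = {"schnell": 4, "dev": 20}


def apply_settings(workflow, **settings):
    """Return a copy of an API-format workflow with matching inputs overridden.

    Assumption: the workflow is a dict of node_id -> {"class_type", "inputs"},
    and linked inputs are stored as [node_id, output_index] lists.
    """
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        inputs = node.get("inputs", {})
        for name, value in settings.items():
            # Only overwrite literal values, never links to other nodes.
            if name in inputs and not isinstance(inputs[name], list):
                inputs[name] = value
    return patched


# Hypothetical two-node fragment, just to show the shape of the data.
example = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "flux1-dev-fp8.safetensors"}},
    "2": {"class_type": "KSampler",
          "inputs": {"seed": 0, "steps": 20, "model": ["1", 0]}},
}
patched = apply_settings(
    example,
    ckpt_name="flux1-schnell-fp8.safetensors",
    steps=DEFAULT_STEPS["schnell"],
    seed=12345,
)
print(patched["2"]["inputs"])
```

Matching by input name rather than by node type keeps the sketch independent of which loader or sampler node a given workflow happens to use, at the cost of touching every node that has an input with that name.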

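▼ 3-5. Running the generation

First of all, good work getting this far. If the settings are in order, clicking the "Queue Prompt" button starts generation. Clicking "View Queue" below it shows the running and pending tasks, and you can cancel them partway.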
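
For reference, queuing can also be done without the button. The sketch below builds, but does not send, an HTTP request that queues a workflow. It assumes the ComfyUI server accepts a JSON body of the form {"prompt": <API-format workflow>} at "/prompt" on the address shown at startup. The one-node workflow is a placeholder and would not run as-is.

```python
import json
import urllib.request


def build_queue_request(workflow, server="http://127.0.0.1:8188"):
    """Build (but do not send) a POST request that queues a workflow.

    Assumption: the server accepts {"prompt": <API-format workflow>} at /prompt.
    """
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder workflow; "ExampleNode" is not a real node type.
request = build_queue_request({"1": {"class_type": "ExampleNode", "inputs": {}}})
print(request.get_method(), request.full_url)
# To actually queue it while ComfyUI is running:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

Keeping the request construction separate from sending makes it easy to inspect the payload before anything reaches the server.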

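Generated images are saved in "ComfyUI\output\".

The first run takes longer because the model has to be loaded. From the second run on, ComfyUI skips the parts of the flow that do not need to run, so generation time is shorter as long as the model has not changed.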

▼ 3-6. Generation time

Generation time is shown in the Command Prompt in the form "Prompt executed in ***.** seconds".
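
If you save the console output to a file, the times can be pulled out with a short script. The sketch below matches the line format quoted above. The sample log is made up for illustration, so its numbers are not measurements, and the function name is my own.

```python
import re

# Matches the console line quoted above, e.g. "Prompt executed in 35.12 seconds".
EXECUTED_PATTERN = re.compile(r"Prompt executed in (\d+(?:\.\d+)?) seconds")


def extract_durations(log_text):
    """Return every reported generation time, in seconds, in log order."""
    return [float(match) for match in EXECUTED_PATTERN.findall(log_text)]


# Made-up log excerpt for illustration; the numbers are not measurements.
sample_log = """got prompt
Prompt executed in 70.25 seconds
got prompt
Prompt executed in 35.10 seconds
"""
print(extract_durations(sample_log))
```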

Generation times in my environment (Ryzen 5 3600, DDR4-3200 32 GB, GeForce RTX 3060 12 GB) were as follows.

4 Steps
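- About 70 seconds (first run, including model load time)
- About 35 seconds (second run onward)
20 Steps
- About 185 seconds (first run, including model load time)
- About 145 seconds (second run onward)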
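
As a back-of-the-envelope reading of these numbers, the sketch below fits "time = fixed overhead + steps × time per step" through the two second-run measurements and treats the gap between first and later runs as the model load time. It assumes the cost per step is the same at 4 and 20 steps, which may not hold exactly, especially if the two measurements come from different models. All inputs are the approximate figures listed above.

```python
# Approximate timings from this article (seconds), second run onward.
warm = {4: 35.0, 20: 145.0}
# First run, including model loading.
cold = {4: 70.0, 20: 185.0}

# Fit time = overhead + steps * per_step through the two warm measurements.
per_step = (warm[20] - warm[4]) / (20 - 4)
overhead = warm[4] - 4 * per_step

# The gap between first and later runs approximates the model load time.
load_time = {steps: cold[steps] - warm[steps] for steps in warm}

print(f"per step: {per_step:.2f} s, fixed overhead: {overhead:.1f} s")
print(f"estimated load time: {load_time}")
```

Under those assumptions, each step costs roughly 7 seconds on this hardware, with a few seconds of fixed overhead per image and a model load of about 35 to 40 seconds.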

For NF4 models, see the supplementary article.

■ 4. Tips

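▼ 4-1. How to write prompts

As with OpenAI's DALL-E 3, FLUX.1 prompts can apparently be written in natural language (English text describing what you want to depict). If you ask ChatGPT or Claude as shown below, they will write a prompt for you. Gemini interprets instructions differently, so it needs some extra care. The request says "5 sentences" to control the approximate length.

Please write a prompt for generating an image, in 5 English sentences. The art style is anime, and a girl is standing still in cherry blossom season. The details are up to you.
(With an image attached)
Describe the content of the image as a prompt in 5 English sentences and output it.
(Example for Gemini)
Please output a prompt for generating an image, in English. I would like one prompt that is about 5 sentences joined together. At the beginning of the prompt, state that it is an illustration in the ＊＊＊ style. After that, describe the scene of ＊＊＊. The details are up to you.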
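
If you find yourself typing the same request repeatedly, it can be turned into a small template. The sketch below is a loose rendering of the first example above, not a fixed formula; the function name and its parameters are my own.

```python
def build_prompt_request(style, scene, sentences=5):
    """Compose a request asking a chat assistant for an image prompt.

    The wording loosely follows the first example above.
    """
    return (
        f"Please write a prompt for generating an image, in {sentences} English sentences. "
        f"The art style is {style}, and {scene}. "
        "The details are up to you."
    )


print(build_prompt_request("anime", "a girl is standing still in cherry blossom season"))
```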

I will not give a concrete example here, but you can also present an existing prompt as a model and have the assistant write one with entirely different content.

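▼ 4-2. I2I workflow

※ This probably cannot be used with the all-in-one model (a single safetensors file) used in this article.

An I2I workflow can be downloaded from the link below. Depending on what FLUX.1 is capable of, it looks like it could do quite a good job.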

https://github.com/camenduru/comfyui-colab/blob/main/workflow/flux_image_to_image.json

🖼 flux - image to image @ComfyUI 🔥 pic.twitter.com/2ghLfOsfj9
— camenduru (@camenduru) August 2, 2024

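▼ 4-3. A workflow that can upscale

※ This probably cannot be used with the all-in-one model (a single safetensors file) used in this article.

I found a workflow that uses the dev model and can upscale even with 12 GB of VRAM. It does not run on plain ComfyUI, so I have not tested it.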

https://huggingface.co/datasets/plasmo/flux.basic.i2i/tree/main

combined some purz/mjm/innerreflections #comfyUI workflows to get a very good #FLUX upscaler that you can change the denoise to add some dream - which works with 12GB vram using the DEV model!

The example below is with 0.35 denoise and takes about 25sec on 4090 / 12gb.… pic.twitter.com/53ymJJwF5F
— ρŁ𝐀𝔰Ｍʘ (@plasm0) August 3, 2024

▼ 4-4. Other workflows

Several workflows have already been uploaded to Civitai, so you may find something useful there.

https://civitai.com/search/models?sortBy=models_v9&query=Flux%20ComfyUI

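Because ComfyUI lets users extend its functionality (add new node types), a distributed workflow may not work as-is. ComfyUI Manager makes this easy to deal with, so here are the steps.

In the Command Prompt, move to "ComfyUI\custom_nodes", run the command below, and then start ComfyUI. "Manager" then appears in the ComfyUI menu (Git is required).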

git clone https://github.com/ltdrdata/ComfyUI-Manager

Load the workflow that did not work, open Manager, and click "Install Missing Custom Nodes". Then install the items shown and click "Restart".

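■ 5. Bonus

As a rule, the images were generated with FLUX.1 [schnell] at 4 steps. The size is 1280x720. Feel free to use the prompts posted here (including modifying them).

▼ 5-1. Images shown in the screenshots

All prompts were created with Claude (some with minor manual edits).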

A cheerful anime-style illustration of a small girl in a sunny park. She's holding a wooden sign with cute, hand-drawn letters that say "Let's play hide and seek!". The girl has big, expressive eyes and is wearing a colorful summer dress with a sunflower pattern. Behind her, you can see other children playing on swings and slides, with lush green trees and blooming flowers in the background. The scene is filled with vibrant colors and a sense of joy, capturing the essence of a perfect day at the park.

A young Japanese woman hiking in the mountains. She's a 'yama girl' (mountain girl), wearing fashionable yet functional hiking gear - colorful quick-dry pants, a lightweight jacket, and a cute mountaineering hat. She carries a stylish backpack and trekking poles. The background shows a breathtaking Japanese mountain landscape with lush greenery, perhaps a glimpse of Mt. Fuji in the distance. The hiker is smiling, enjoying the scenic view from a mountain trail. The scene captures the essence of the yama girl trend - a blend of outdoor activity and fashion.

▼ 5-2. Assorted images

All prompts were created with Gemini (some with minor manual edits).

A chaotic, abstract metropolis where buildings morph into living organisms. Fluorescent lights cast eerie shadows on the cobblestone streets. The canvas is a whirlwind of vibrant colors and distorted perspectives.

A retro comic panel featuring two quirky characters locked in a playful brawl. The background is a vibrant cityscape filled with neon signs and flying food. The art style is inspired by classic manga, with exaggerated expressions and dynamic poses.

A photoreal, soft, cuddly anime girl character design inspired by plush toys. She has large, expressive eyes and a cheerful expression. Her outfit features pastel colors and cute accessories like bows and ribbons, giving her a sweet and innocent appearance. She is surrounded by colorful hearts and stars, emphasizing her adorable and magical qualities.

A swirling vortex of nebulous colors represents the birth of the universe. A single point of light, the singularity, expands outward, creating galaxies and stars. The canvas is a cosmic explosion of energy and matter.

A close-up of middle-aged office puppets engaged in a heated debate around a conference table. Their suits are wrinkled, and their expressions are exaggerated. The background is a cluttered office space with overflowing papers and flickering computer screens.

A vibrant live stream screen featuring a 3D animated character in the style of a Three Kingdoms era general. They are positioned in the center of the frame against a backdrop of traditional Chinese architecture. A chat window is located on the right side of the screen, displaying colorful emotes and messages.

A delicate watercolor painting on textured, rough paper. Large, blurred blooms of various summer flowers dominate the composition. The colors are muted and pastel, creating a soft, ethereal atmosphere. The edges of the flowers are slightly undefined, giving a dreamlike quality to the image.

A hyperrealistic digital painting of a school classroom. The focal point is a chalkboard filled with a stunning piece of chalk art: a close-up portrait of a cute anime girl in a sailor uniform, surrounded by delicate, white chalk-drawn flowers. The classroom is bathed in soft, natural light, casting gentle shadows on the chalkboard and the surrounding desks. The overall atmosphere is serene and nostalgic.

A golden-haired, blue-eyed girl in an ethereal elven costume stands amidst a lush, enchanted forest. Sunlight filters through the canopy, casting dappled shadows on her delicate features as she gazes directly at the viewer, a serene smile playing on her lips. Wildflowers carpet the forest floor, and a gentle breeze tousles her long, flowing hair. The scene is reminiscent of a classic fantasy film, capturing the essence of magic and wonder.

A dramatic black and white manga panel featuring a towering carrot, its roots transformed into muscular legs, standing in a dimly lit, shadowy garden. The carrot's face is etched with determination as it brandishes a large, gnarled carrot for a weapon. The background is filled with dense, crosshatched tones, creating a sense of depth and atmosphere. The only other visible element is a small, glowing orb hovering above the carrot, casting an eerie light on the scene.

A towering carrot, its roots transformed into muscular legs, stands defiantly in a vibrant vegetable garden. A mischievous tomato, with arms that dangle lazily, leans against a fence, while a determined potato, sprouting human arms and legs, wields a tiny hoe. In the background, a cucumber, with long, slender arms, climbs a trellis, its face etched with a thoughtful expression. The entire scene is bathed in a soft, golden light, casting playful shadows that dance across the garden.

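▼ 5-3. Model comparison 1

Generated with FLUX.1 [schnell] (flux1-schnell.sft and t5xxl_fp8_e4m3fn.safetensors) at 4 steps. The schnell version does not seem to be very good at anime style, and problems such as broken figures also occur. This image, which features two distinct people, is of a quality that is not easy to get. Please take that into account when judging.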

This is a face close-up shot, pastel colored, Japanese anime style artwork. There is a mother and her pre-teen daughter holding hands, make smiile , closed mouth and looking at each other with joy in a bustling daytime fantasy theme park. The daughter wearing a detailed pink magical princess costume. The mother wearing white blouse and navy long apron dress. There is the park under the bright summer sun, with colorful attractions, exciting rides, and lively food stalls. The mother's eyes sparkle with love and pride as she shares this special moment with her daughter. The background is filled with the hustle and bustle of the crowd, creating a sense of excitement and wonder. Include details like balloons floating in the clear blue sky, children running around with ice cream cones, and park mascots greeting visitors.

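Generated with FLUX.1 [dev] (flux1-dev.sft and t5xxl_fp8_e4m3fn.safetensors) at a slightly low 16 steps. The big difference was that with the schnell version the composition changed greatly with the seed, whereas the dev version produced roughly similar compositions. This prompt seems to make comparison between the models difficult, but I am including it anyway.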

▼ 5-4. Model comparison 2

Generated with FLUX.1 [schnell] (flux1-schnell.sft and t5xxl_fp8_e4m3fn.safetensors) at 4 steps. I generated 8 images, and the hands were wrong in every one. The image below is one of the better ones.

anime, pastel colors, enchanted forest, fairy girl, close-up portrait, playful wink, hand blowing kiss, translucent wings spread, flower crown tilted, magical glade, glowing fireflies, ancient trees, dappled sunlight, August woodland, morning dew drops, colorful wildflowers, floating pollen, shimmering air, delicate brushstrokes

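Generated with FLUX.1 [dev] (flux1-dev.sft and t5xxl_fp8_e4m3fn.safetensors) at a slightly low 16 steps. I generated only 2 images, but both were relatively good, making this a case where the difference between the schnell and dev versions shows clearly. It is also interesting that the rendering differs (I get the feeling that the schnell version is not merely lower in quality).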

▼ 5-5. Give me 5000 trillion yen (a bonus within the bonus)

Feel free to rewrite the prompt and use it. The image below was generated with the schnell version.

Create a vibrant, anime-style illustration of a sunny day in a lush public park. In the foreground, place a cheerful little girl with big, expressive eyes and colorful pigtails, wearing a cute summer dress with a flower pattern. Have her holding a wooden sign that's slightly too big for her, painted in pastel colors. On the sign, write in adorable, wobbly handwritten text: "GIVE ME JPY 5000 TRILLION" Include typical park elements in the background, such as trees, a playground, and other families enjoying the day.

■ 6. Miscellaneous

My other articles can be found via the menu.

My note account is linked to my main account @Mayu_Hiraizumi, but for anything about the articles, please contact my sub-account @riddi0908.