[Image Generation AI] Get Ideas from AI! How to Generate Prompts Automatically

きまま / Easygoing

January 10, 2025, 18:10

Two photorealistic images introducing Cliption, with an image of a wrapped blue sports car driving around town at the top, and a black sports car, a woman standing in front of it, and the Cliption caption at the bottom

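Introduction

Hello, this is きまま / Easygoing.

The image above was laid out with "Manga Editor DESU!", an AI manga creation support app published by はいぱーさんかく.

This app offers a wide range of features free of charge!

I will cover how to use it in a separate article.

This time, I look at how to generate prompts automatically with image generation AI.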

Theme: Supercar Concept Art

This time, the theme is supercar concept art.

a close - up view of a sleek modern sports car with a open engine

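Everyone admires illustrations of cool cars, so I will see whether I can reproduce one myself.

Flowchart!

First, here is the flowchart I used this time.

From here, I walk through the actual steps along with the illustrations.

anima_pencil-XL_v5.0.0 (Original Drawing)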

SDXL_Rough_ a gold sports car is parked on a street at night with a cityscape in the background the car has a sleek design with a prominent front grille round headlights and a distinctive front grille

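First, I create the composition with SDXL, the previous-generation model.

SDXL anime models can produce a wide variety of compositions, but the newer-generation Flux.1 renders better textures, so I transfer this composition to Flux.1 using ControlNet depth.

Depth-Anything-V2 (Depth Map)

This is the depth map extracted from the generated original drawing with Depth-Anything-V2. Blue corresponds to the foreground and red to the background.

Flux.1-depth-dev (Fixing the Composition)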

Flux1_Depth_Rough_Image a vibrant multi - colored sports car is parked on a street at night with a cityscape in the background the car has a sleek design with a prominent front grille round headlights

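Based on the extracted depth map, I fix the composition with the official Flux.1 depth model.

The Flux.1-depth-dev model uses ControlNet while running in the same amount of VRAM as the regular model, but its illustration quality is lower than the regular model's.

So I render partway with it, then switch to a regular Flux.1 model for the finish.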
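One way to picture this handoff is as a split of the sampling steps between two passes. The sketch below is not the author's actual workflow: the function name, the 20-step total, and the 40% switch point are illustrative assumptions, since the article does not state where the switch happens. The dictionary keys mirror the widget names of ComfyUI's KSampler (Advanced) node, and the "model" entries are only labels.

```python
def split_sampling_steps(total_steps: int, switch_fraction: float) -> tuple[dict, dict]:
    """Split one denoising schedule into a depth pass and a finishing pass.

    switch_fraction is a hypothetical knob: the share of steps given to the
    depth model before handing the partly denoised latent to the regular model.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0.0 < switch_fraction < 1.0:
        raise ValueError("switch_fraction must be between 0 and 1 (exclusive)")

    # Keep at least one step on each side of the switch.
    switch_step = max(1, min(total_steps - 1, round(total_steps * switch_fraction)))

    depth_pass = {
        "model": "Flux.1-depth-dev",          # label only
        "add_noise": "enable",                # first pass starts from fresh noise
        "start_at_step": 0,
        "end_at_step": switch_step,
        "return_with_leftover_noise": "enable",  # hand over an unfinished latent
    }
    finish_pass = {
        "model": "FluxesCore-Dev_V1.0",       # label only
        "add_noise": "disable",               # continue from the leftover noise
        "start_at_step": switch_step,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable",
    }
    return depth_pass, finish_pass


if __name__ == "__main__":
    depth, finish = split_sampling_steps(total_steps=20, switch_fraction=0.4)
    print(depth["end_at_step"], finish["start_at_step"])
```

A smaller `switch_fraction` leaves more steps to the regular model, which should favor texture over strict adherence to the depth map. The right balance is something to find by experiment.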

FluxesCore-Dev_V1.0 (Finishing)

Flux1_hires_ a vibrant multi - colored sports car is parked on a street at night with a cityscape in the background the car has a sleek design with a prominent front grille round headlights

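Finally, I finish with a high-quality custom model based on Flux.1[dev], and the illustration is complete.

And Then, Feed the Prompt Back In!

The process does not end here.

Next, I use CLIP to caption the finished image and feed that caption in as the prompt for the next image generation.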

Flowchart for captioning an illustration once generated using the CLIP and Cliption models, and then submitting it again for prompting to create variations (note: to save space, CLIP is usually distributed with only the Text Encoder part)

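CLIP is an AI model normally used when generating images from text, but the full CLIP model, including the Clip Vision Encoder, can generate text from images when combined with a dedicated caption generation model.

The caption generated from an image differs slightly from the original prompt, so repeating this process makes the illustration change little by little.

In this workflow, every step, including prompt generation, is left to the AI, so the AI keeps creating variations as generation continues.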
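The loop can be sketched in a few lines of Python. This is only an illustration of the idea, not the ComfyUI workflow itself: `generate_image` and `caption_image` are hypothetical callables standing in for the Flux.1 pipeline and the CLIP captioner, and the stand-ins in the demo return canned values so the sketch runs without any model.

```python
from typing import Callable

# The five hand-written tags used for every image in this article.
FIXED_TAGS = "night, supercar, monaco, dutch angle, close up"


def run_caption_loop(
    generate_image: Callable[[str], object],
    caption_image: Callable[[object], str],
    fixed_tags: str = FIXED_TAGS,
    rounds: int = 9,
) -> list[str]:
    """Return the prompt used in each round of the generate -> caption loop."""
    prompts: list[str] = []
    prompt = fixed_tags  # round 1 uses only the hand-written tags
    for _ in range(rounds):
        prompts.append(prompt)
        image = generate_image(prompt)
        # The caption of this image becomes the first half of the next prompt.
        caption = caption_image(image).rstrip(", ")
        prompt = f"{caption},\n{fixed_tags}"
    return prompts


if __name__ == "__main__":
    # Stand-ins for the real models (assumptions for demonstration only).
    fake_generate = lambda prompt: {"prompt": prompt}
    fake_caption = lambda image: "a white sports car is parked on a street at night"

    for i, p in enumerate(run_caption_loop(fake_generate, fake_caption, rounds=3), 1):
        print(f"--- image {i} ---\n{p}")
```

With real models, each caption differs a little from the prompt that produced the image, and that difference is what drives the drift from one image to the next.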

The Prompt Changes!

Now let's see how the prompt actually changes.

Image 1

a white sports car is parked on a street at night with a person walking by in the background the car has a sleek design with a prominent front grille round headlights

night, supercar, monaco, dutch angle, close up

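Prompt

Night
Supercar
Monaco
Dutch angle
Close up (shot with a telephoto lens)

This is the first illustration. The only prompt input is the five items above.

I caption this image with CLIP and feed the caption into the next illustration.

Image 2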

a white sports car is parked on a city street at night the car is positioned on the left side of the image with its headlights on casting a warm glow on the pavement the street

a white sports car is parked on a city street at night. the car is positioned on the left side of the image, with its headlights on, casting a warm glow on the pavement. the street is lined with buildings, and there are people walking in the background. the cars headlights are off, and the scene is bathed in the soft glow of the cars,
night, supercar, monaco, dutch angle, close up

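This is the second illustration. The first half of the prompt is the caption generated with CLIP from the first illustration.

The car body and color resemble the first image, but because the prompt has grown, the illustration as a whole has a slightly different atmosphere.

Image 9!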

a red sports car with a british flag design is parked on a cobblestone street at night with a man walking away and a woman in a black dress nearby the cars headlights are illuminated cast

a red sports car is parked on a street at night, with a person walking by in the background. the car has a sleek design with a prominent front grille, round headlights, and a rear spoiler. the street is illuminated by streetlights, and there are other cars parked along the sides of the road. the sky is dark, and the overall atmosphere is serene and mysterious,
night, supercar, Monaco, dutch angle, close up,

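By the ninth illustration, the prompt has changed considerably, the car's color and the composition have changed too, and the result is quite interesting.

With this method, new variations keep appearing one after another, so you can draw new ideas out of the AI.

A Model Available for Commercial Use!

The previous model was based on Flux.1[dev] and cannot be used commercially, so I try the same process with Flux.1[schnell], which can.

Image 1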

a silver sports car with the number1on its side is parked on a circular track at night with a city skyline and a ferris wheel in the background bathed in the warm glow of the setting

Image 3

Image 5

a silver sports car is parked on a wet street at night with a full moon in the background the car is positioned on the left side of the image and the street is illuminated by streetlight

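The illustration quality is lower than with the previous model, but I think it is still good enough to use as a source of ideas.

The Problem of AI "Preconceptions"

In this workflow, the AI handles everything from writing the prompt to drawing the image, and the process runs in a loop, so the longer it runs, the more new ideas emerge.

On the other hand, because it is a loop, once the direction drifts it is hard to correct course.

Specifically, image generation AI tends to depict women often to begin with, and in this workflow too, once an illustration of a person is generated, people keep being generated.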

a woman in a purple dress is standing next to a silver sports car the car has a sleek design with a distinctive front grille and headlights the background features a cityscape at night (caption: The lady is strong)

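Making Minimal Course Corrections

So this time I applied a minimal course correction.

Specifically, when the prompt built from the caption contains certain words, those words are removed.

Removed English words

People: girl, woman, female, lady, boy, man, male, gentleman
Colors: black, white, silver, blue, red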
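As a rough illustration, the word lists above can be applied with Python's `re` module. This is a sketch under my own assumptions, not the node used in the article: the function name is hypothetical, and the word-boundary markers (`\b`) and the whitespace cleanup are additions so that words such as "human" or "colored" are left alone.

```python
import re

PEOPLE = ["girl", "woman", "female", "lady", "boy", "man", "male", "gentleman"]
COLORS = ["black", "white", "silver", "blue", "red"]

# \b keeps the match to whole words, so "man" does not hit "human"
# and "red" does not hit "colored".
_REMOVE = re.compile(r"\b(?:" + "|".join(PEOPLE + COLORS) + r")\b", re.IGNORECASE)


def strip_words(prompt: str) -> str:
    """Delete the listed words, then collapse the doubled spaces left behind."""
    cleaned = _REMOVE.sub("", prompt)
    return re.sub(r" {2,}", " ", cleaned).strip()


if __name__ == "__main__":
    caption = "a woman in a purple dress is standing next to a silver sports car"
    print(strip_words(caption))
    # -> a in a purple dress is standing next to a sports car
```

The output is no longer grammatical English ("a in a purple dress"), which is the side effect of deleting words from a finished sentence.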

2021 spider is shown at night with its headlights on and the hood up the car is positioned on a street at night with its headlights on casting a warm glow (caption: No people drawn)

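Removing words from the prompt breaks its grammar, but CLIP-G and T5xxl, which parse the input prompt, understand context well, so this is not a major problem.

Also, unlike a negative prompt, this only removes words, so it does not affect the illustration as a whole.

Workflow!

The Flux.1 workflow I used this time is quite complex, so I am publishing a simplified workflow that uses SDXL alone.

This workflow is simple, so you can adapt it in many ways, such as swapping the base model or integrating it into an existing workflow.

Introducing the Custom Nodes!

From here, I introduce the custom nodes used in this workflow.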

comfy-cliption

Screenshot of the cliption selection screen of the custom node of comfyui manager with comment (caption: the ComfyUI Manager search screen)

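comfy-cliption is a custom node that captions images using CLIP-L.

comfy-cliption is lightweight, and when used with the improved CLIP-L I introduced previously, it can perform quite accurate conversion.

How to Use comfy-cliption!

Here is how to use comfy-cliption.

First, download the full CLIP-L model from the following improved Long-CLIP-L page.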

Screenshot of Huggingface's LongCLIP-SAE-ViT-L-14 download page with comment (caption: FP16 format is usually fine!)

Place the downloaded Long-CLIP-L in both of the following folders.

[install folder]/Models/CLIP
[install folder]/Models/InvokeClipVision, or
[install folder]/models/clip_vision
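If you prefer to script this step, the copy into both folders could look like the sketch below. The function name and its arguments are hypothetical. The install folder and the downloaded file name depend on your setup, so they are passed in rather than assumed.

```python
import shutil
from pathlib import Path


def place_clip_model(model_file: Path, install_dir: Path, subfolders: list[str]) -> list[Path]:
    """Copy one model file into each listed subfolder of the install folder."""
    copies = []
    for sub in subfolders:
        target_dir = install_dir / sub
        target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        copies.append(Path(shutil.copy2(model_file, target_dir)))
    return copies


# Example call (paths are placeholders for your own environment):
# place_clip_model(
#     Path("path/to/downloaded_model_file"),
#     Path("path/to/install_folder"),
#     ["Models/CLIP", "Models/InvokeClipVision"],
# )
```

Copying doubles the disk usage for this file. A symbolic link would avoid that, at the cost of depending on how your operating system and tools handle links.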

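Next, open a ComfyUI workflow and arrange the nodes as follows.

Connect the generated caption to CLIP Text Encode to use it as the prompt for image generation.

To use the caption as the prompt for the next image, save it once with the Save Text node and load it with the Load Text node!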
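Conceptually, the Save Text and Load Text pair passes the caption from one run to the next through a file. The plain-Python sketch below shows that round trip. It does not use the nodes' actual implementation, and the function names and the fallback value are my own assumptions.

```python
from pathlib import Path


def save_caption(caption: str, path: Path) -> None:
    """Store the caption produced at the end of the current run."""
    path.write_text(caption, encoding="utf-8")


def load_caption(path: Path, default: str = "") -> str:
    """Read the caption saved by the previous run, if there is one."""
    if path.exists():
        return path.read_text(encoding="utf-8")
    return default  # first run: nothing has been saved yet
```

On the very first run there is no saved caption, so the sketch falls back to an empty string and the prompt consists of the hand-written tags only.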

D2-nodes-ComfyUI

Here is one more useful custom node.

Screenshot of the D2nodes selection screen of the custom node of comfyui manager with Japanese comment (caption: the top entry is the latest version)

D2-nodes-ComfyUI is a node pack published by だにえる that is packed with a variety of useful features.

This time, I use its D2 Regex Replace node to remove specific words from the prompt.

How to Use the D2 Regex Replace Node!

Screenshot of ComfyUI workflow explaining how to use the D2 Regex Replace node

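To use the D2 Regex Replace node, first connect the text you want to modify to the input.

Next, enter the words you want to remove, separated by a half-width vertical bar "|".

Now, when you run it, the specified words are removed!

Although I used it here only to remove specific words, the node can actually perform more advanced replacements using regular expressions.

There Is More! Automatic Prompt Generation

This time, I used CLIP-L for a simple form of automatic prompt input.

There are still other ways to input prompts automatically.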

Flux1_hires_ a white sports car with black accents is parked on a city street at night the car has a sleek design with a prominent front grille round headlights and a distinctive front grille

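I plan to compare ways of inputting prompts automatically another time, including more advanced methods that use large language models (LLMs)!

Addendum (2025.2.7)

I compared three automatic prompt generation methods.

Summary: Leaving It to the AI

Create captions with CLIP
Loop the process to create variations
Make minimal course corrections

AI has learned from 5 billion images, a number no human could ever hope to match.

AI knows far more designs and compositions than any single person.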

a young woman with dark hair sits in a modern car wearing a floral top and white pants with a focused expression surrounded by a cityscape at night with neon lights (caption: Can you master it?)

Lately, I have come to think that the best way to use AI is to let it work freely and draw out ideas I would not have had myself.
