How to Generate an Illustration of the Character You Have in Mind Using Only Text Prompts [Stable Diffusion]

RUNO. | Making the world a little more fun with AI.

July 24, 2024, 23:57

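Hello! This is RUNO.

Today I'm continuing from yesterday and taking on image generation with Stable Diffusion again!

The highlight is:

"When you can't pass a reference image and can only give text instructions,
 how do you generate an image close to the character you have in mind?"

I hope you find it useful!

How I generated the images

I'm using the same environment as yesterday. Details are in yesterday's article.

Background for this theme

This time, I'm entering the (1) Character category, one of the three categories of the "AICU Summer Art Contest 2024" hosted by AICU, which I also introduced in yesterday's article.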

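There are three entry categories

(1) AICU Character category #AICU_art

AICU has a cast of distinctive AI characters, including "Koto", "全力肯定彼氏くん LuC4", "AI確定申告さん茶托税子", and "ちびデルタ". Illustrations, manga, and fan works based on these characters are all very welcome! pic.twitter.com/QHLVAE2hMW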
— AICU Inc. (@AICUai) July 20, 2024

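So this time, I decided to generate images with "ちびデルタ" as the motif!

Which brings us to the main topic.

Until now, when generating images based on a character, I usually passed an existing image as a reference image (knowledge handed to the AI).

However, the Google Colab Notebook environment I'm using this time doesn't seem to offer a way to upload a reference image on its own (my sincere apologies if I overlooked it!!).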

Stable Diffusion 3 Medium (the 2B model) has been released, but the #StabilityAI API also offers the higher-end #StableDiffusion3 8B model.

I made a Colab Notebook that lets you try Stable Image Ultra, which is built on the 8B model, right away. Feel free to use it. https://t.co/ZDVeXtQc63 pic.twitter.com/IuR8x9ST2Z
— D̷ELL (@xqdior) June 13, 2024

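So,

Let's try generating an image that is recognizably ちびデルタ using text prompts alone!

is today's theme.

Steps

First, have ChatGPT write the prompts, with a reference image attached

Prompt sent to ChatGPT (originally in Japanese)

(Note: an image of ちびデルタ was attached to this prompt)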

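Please write a positive prompt and a negative prompt, each in both Japanese and English, in a form that is easy for the stable diffusion api to process

## What I want to make
The girl in the photo, with various expressions, as the motif of fireworks launched at a fireworks festival. Make it an emotional picture, as if shot on a film camera, so the fireworks look vivid.

## Constraints
I can't attach the photo to the prompt, so her appearance has to be described within the prompt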

ChatGPT's output

Positive prompt
Japanese
花火大会で打ち上がる花火のモチーフになっている、様々な表情の女の子。彼女は緑色のボブカットヘア、青い目、猫耳のカチューシャをつけています。黄色のネクタイと青いワンピースを着て、ピンクのリュックを背負っています。フィルムカメラで撮ったようなエモい絵で、花火が鮮やかに映えるようにしてください。

English
A girl with various expressions, becoming the motif of fireworks launched at a fireworks festival. She has green bob-cut hair, blue eyes, and wears a cat ear headband. She is dressed in a yellow tie, blue dress, and carries a pink backpack. The image should be emotional, as if taken with a film camera, with the fireworks appearing vividly.

Negative prompt
Japanese
低解像度、ぼやけた画像、不自然な色、暗い画像、詳細不足の花火、花火が目立たない、感情のない表情、カメラノイズ、フィルターオーバーレイ、現実世界の背景、近代的な要素

English
Low resolution, blurry image, unnatural colors, dark image, lack of detailed fireworks, fireworks not standing out, emotionless expressions, camera noise, filter overlay, real-world background, modern elements

Next, enter the resulting text into the Stable Image Ultra prompt fields

Here is the image I got!
Pretty close to ちびデルタ's appearance, don't you think!?!?
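
For readers who would rather call the API directly than go through the notebook, here is a minimal sketch of a text-only request. It assumes the Stable Image Ultra REST endpoint of the Stability AI API and the third-party `requests` package. The helper names `build_ultra_fields` and `generate_image`, the default aspect ratio, and the output file name are placeholders chosen for illustration; the article does not show what the notebook does internally, so treat this as one plausible shape rather than the notebook's actual code.

```python
# Endpoint of the Stable Image Ultra REST API (v2beta).
ULTRA_URL = "https://api.stability.ai/v2beta/stable-image/generate/ultra"


def build_ultra_fields(prompt, negative_prompt="", aspect_ratio="1:1",
                       output_format="png"):
    """Return the form fields for a text-only request (no reference image)."""
    fields = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "output_format": output_format,
    }
    # Only send a negative prompt when one was actually written.
    if negative_prompt:
        fields["negative_prompt"] = negative_prompt
    return fields


def generate_image(fields, api_key, out_path="out.png"):
    """Send the request and save the returned image bytes to out_path."""
    # Imported here so that build_ultra_fields works without requests installed.
    import requests

    response = requests.post(
        ULTRA_URL,
        headers={"authorization": f"Bearer {api_key}", "accept": "image/*"},
        files={"none": ""},  # forces multipart/form-data encoding
        data=fields,
        timeout=120,
    )
    response.raise_for_status()
    with open(out_path, "wb") as image_file:
        image_file.write(response.content)
    return out_path
```

To use it, pass the English positive and negative prompts from ChatGPT to `build_ultra_fields`, then hand the result to `generate_image` together with your own API key. Keeping the field-building step separate makes it easy to inspect exactly what text is being sent before spending credits on a generation.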

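Second attempt!

I'll change the prompt a little.

Specify the hair color as "yellow-green"
Use only the part from the neck up in the illustration
Describe what I want to do (character fireworks) in more detail

Ask ChatGPT to revise the prompt

Revision request prompt (originally in Japanese)

(Note: a sample image of character fireworks was attached to this prompt)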

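Please revise the prompt a little according to the following

・Specify the girl's hair color as "yellow-green"
・Use only the part from the neck up in the illustration
・I want character fireworks like the ones in the attached image, made with the girl I sent as the motif
・There should be several character fireworks of the girl I sent, each with a different charming expression
・I want the image to be full of regular fireworks too, not just character fireworks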

ChatGPT's output

Positive prompt
Japanese
黄緑色の髪の女の子が様々な表情を見せるキャラクター花火として打ち上がる様子を描いたイラスト。女の子の特徴は、緑色のボブカットヘア、青い目、猫耳のカチューシャ、黄色のネクタイ、青いワンピース、ピンクのリュックです。イラストは首から上だけに絞り、花火大会の夜空にキャラクター花火として表現され、鮮やかでエモーショナルな雰囲気を出しています。普通の花火も背景にたくさん描かれていて、フィルムカメラで撮ったようなエモい絵にしてください。

English
An illustration of a girl with light green hair, depicted as character fireworks launched at a fireworks festival, showing various charming expressions. The girl's features include green bob-cut hair, blue eyes, a cat ear headband, a yellow tie, a blue dress, and a pink backpack. The illustration focuses only on her head and neck, and she is represented as character fireworks in the night sky of the fireworks festival. The image should be emotional, as if taken with a film camera, with vivid character fireworks of the girl surrounded by many regular fireworks in the background.

Negative prompt
Japanese
低解像度、ぼやけた画像、不自然な色、暗い画像、詳細不足の花火、花火が目立たない、感情のない表情、カメラノイズ、フィルターオーバーレイ、現実世界の背景、近代的な要素

English
Low resolution, blurry image, unnatural colors, dark image, lack of detailed fireworks, fireworks not standing out, emotionless expressions, camera noise, filter overlay, real-world background, modern elements

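Generate the image again with Stable Image Ultra

Cute!!!! And it turned out vivid and gorgeous!!!!

Third time's the charm

I'll readjust the prompt.

I asked ChatGPT to "explain character fireworks so that even someone who doesn't know them can understand," and included that explanation in the prompt.

The key is asking it to "explain so that even someone who doesn't know can understand."
Doing so spells the concept out in the prompt itself, which can bring the generated image closer to what you have in mind even when you can't pass a reference image.

The revised positive prompt is below: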

An illustration of a girl with light yellow green hair, depicted as character fireworks launched at a fireworks festival, showing various charming expressions. The girl's features include green bob-cut hair, blue eyes, a cat ear headband, a yellow tie, a blue dress, and a pink backpack. The illustration focuses only on her head and neck, and she is represented as character fireworks in the night sky of the fireworks festival. The image should be emotional, as if taken with a film camera, with vivid character fireworks of the girl surrounded by many regular fireworks in the background. The character fireworks are depicted using only dots, designed to resemble fireworks and appear as part of the fireworks display.

(A small detail: the English rendering of the yellow-green hair color was "light green hair," so I changed it to "light yellow green hair.")

So, how did it turn out!?!?!
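
One thing worth noticing in the prompt above is that the hair color is stated twice ("light yellow green hair" and "green bob-cut hair"), so editing one mention leaves the other unchanged. Whether that affects the output was not tested here, but a small sketch like the following keeps each trait in a single place when revising prompts by hand. The `CharacterTraits` class and `build_prompt` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CharacterTraits:
    """Appearance described in words, since no reference image can be passed."""
    hair: str
    eyes: str
    accessory: str

    def describe(self):
        return f"a girl with {self.hair}, {self.eyes}, and {self.accessory}"


def build_prompt(traits, scene, style):
    """Combine the character description with the scene and the style."""
    return f"An illustration of {traits.describe()}, {scene}. {style}"


# Each trait is written once, so changing the hair color changes it everywhere.
traits = CharacterTraits(
    hair="light yellow green bob-cut hair",
    eyes="blue eyes",
    accessory="a cat ear headband",
)
prompt = build_prompt(
    traits,
    scene="depicted as character fireworks in the night sky of a fireworks festival",
    style="The image should be emotional, as if taken with a film camera.",
)
print(prompt)
```

With this arrangement, the tweak from "light green" to "light yellow green" becomes a one-word change to `hair`, and the rest of the prompt stays consistent with it.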

Fourth time's the charm

I tried rewriting the description of character fireworks myself.

Positive prompt
Japanese
暗闇に、次に示す女の子の顔を複数の魅力的な表情でドットのみで花火のように表現した絵を描いて。女の子は黄緑色のボブカットヘア、青い目、猫耳のカチューシャをしています。鮮やかでエモーショナルな雰囲気を出しています。普通の花火も背景にたくさん描かれていて、フィルムカメラで撮ったようなエモい絵にしてください。

English
Draw an illustration in the dark, depicting the face of a girl with light green bob-cut hair, blue eyes, and a cat ear headband, using only dots to represent her face like fireworks with various charming expressions. The atmosphere is vibrant and emotional. Surround the character fireworks with many regular fireworks in the background, creating an image that looks like it was taken with a film camera, giving an emotional feel.

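Definitely vivid and emotional!!!

Fifth time's the charm

I had ChatGPT rewrite it once more.

Prompt to ChatGPT (originally in Japanese)

I want the character drawn as dot-art fireworks like this, and I'm instructing stable diffusion with the following prompt, but it isn't working. Please fix it.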
Draw an illustration in the dark, depicting the face of a girl with light green bob-cut hair, blue eyes, and a cat ear headband, using only dots to represent her face like fireworks with various charming expressions. The atmosphere is vibrant and emotional. Surround the character fireworks with many regular fireworks in the background, creating an image that looks like it was taken with a film camera, giving an emotional feel.

ChatGPT's output

Draw an illustration set in a dark sky, depicting the face of a girl with light green bob-cut hair, blue eyes, and a cat ear headband. Use only dots to represent her face, similar to fireworks, showing various charming expressions. The atmosphere should be vibrant and emotional, with many regular fireworks in the background. The image should look like it was taken with a film camera, enhancing the emotional feel.

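...Wait, is this maybe carrying over the previous generation result...??? (I really don't know)

Reload to reset, then a sixth attempt

Trying again with the same prompt.

Maybe the reload had nothing to do with it ( ; ; )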
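
A possible explanation, offered as an assumption and not something verified in this article: as far as I understand the Stability AI REST API, each request is independent, and leaving out the `seed` field (or sending 0) makes the service pick a random seed. If so, similar or different results from the same prompt would come from the prompt and the seed, not from a previous generation being carried over. The sketch below attaches an explicit seed to the request fields so a run can be repeated or deliberately varied. `with_seed` is a hypothetical helper, and the upper bound is the documented range as I remember it.

```python
import random

# Assumed upper bound of the API's seed field; check the current API reference.
MAX_SEED = 4294967294


def with_seed(fields, seed=None):
    """Return a copy of the request fields with an explicit seed attached.

    If no seed is given, a random one is chosen here (instead of by the
    service) so that it can be logged and reused later.
    """
    if seed is None:
        seed = random.randint(1, MAX_SEED)
    if not 1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 1 and {MAX_SEED}")
    seeded = dict(fields)  # leave the caller's dict untouched
    seeded["seed"] = seed
    return seeded


base = {"prompt": "Draw an illustration set in a dark sky ..."}
first_try = with_seed(base, seed=42)   # repeatable: same prompt, same seed
second_try = with_seed(base)           # new random seed, recorded in the dict
print(first_try["seed"], second_try["seed"])
```

Sending the same prompt with the same seed is the quickest way to check whether anything other than the prompt and seed is influencing the result, and changing only the seed gives fresh variations without touching the wording.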

I submitted my favorite illustration!

Today I generated summer-themed images with #StableDiffusion #StableImageUltra,
using the #AICU character "ちびデルタ" as the motif!!
Cute!!!

I'm entering it in "(1) Character category" of the AICU Summer Art Contest 2024!! #AICU_art pic.twitter.com/jTpOsuFrfx
— RUNO | 生成AI勉強中 (@RUNOs_s) July 24, 2024

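Closing thoughts

The main goal, putting the character's features into words without passing an image and generating with text prompts alone, went well.

The character fireworks based on ちびデルタ (fireworks drawn as dot art) seem to have been the difficult part.

It was fun again today!

Thank you so much for reading to the end!!!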
