Preparing for a New Exploration of Prompt Engineering for Midjourney V6 (alpha) - Blog 2023/12/21

December 26, 2023 20:57

Treating V6 (alpha) and V6 (full release) separately

I have started preparing to explore the Aesthetic system of Midjourney Model V6 (alpha). Trial and error on an alpha version may not produce know-how that still holds in the full release, so I will treat V6 (alpha) and V6 (full release) as different versions.

This post continues the following article.

Midjourney V6 and V5 Are Completely Different! Most Prompts Cannot Be Shared - Blog 2023/12/21

The Aesthetic system of V6 (alpha)

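For a comparison with V5, see the previous article, "Midjourney V6 and V5 Are Completely Different! Most Prompts Cannot Be Shared."
This time I explore V6 (alpha) on its own.
*I keep V6 (alpha) and V6 (full release) separate: V6 (alpha) is still being tuned, so know-how gained on the alpha may not carry over to V6 (full release).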

V5 prompt:

angry, film still, super detail, 2020s, a photorealistic Cool 70 year old man , maximal facial detail, shot on fujifilm XT4 --ar 16:9 --v 6.0

In V6 (alpha), quality-boosting terms such as "photorealistic," "4K," "8K," film stock names, and lens names are no longer needed, so when starting from a V5 prompt, remove all of them.

Announcement from Midjourney (Discord)

Style and prompting for V6

Prompting with V6 is significantly different than V5. You will need to 'relearn' how to prompt.
V6 is MUCH more sensitive to your prompt. Avoid 'junk' like "award winning, photorealistic, 4k, 8k"

Words and phrases not needed in V6 (alpha):

super detail
photorealistic
maximal facial detail
fujifilm XT4

Prompt for V6 (alpha):

It gets this much shorter...

angry, film still, 2020s, Cool 70 year old man --ar 16:9 --v 6.0
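The cleanup from the V5 prompt to this V6 (alpha) prompt is mechanical enough to sketch in code. The Python below is a minimal illustration, assuming the junk list is exactly the four terms listed above; `strip_junk` and its rule for dropping a leftover "a" or "shot on" are hypothetical helpers written for this example, not a Midjourney feature. Midjourney only ever sees the final text.

```python
import re

# Assumption: the junk terms are the ones listed above for V6 (alpha).
JUNK_TERMS = ["super detail", "photorealistic", "maximal facial detail", "fujifilm XT4"]

def strip_junk(prompt, junk=JUNK_TERMS):
    """Drop junk terms from the comma-separated text, keeping the --parameters."""
    # Split off the parameters (--ar, --v, ...) at the first " --".
    text, sep, params = prompt.partition(" --")
    junk_re = re.compile(r"\b(?:" + "|".join(map(re.escape, junk)) + r")\b", re.IGNORECASE)
    # Hypothetical rule: words left dangling once a junk term is cut out.
    filler_re = re.compile(r"^(?:an?|shot on)\b\s*", re.IGNORECASE)
    kept = []
    for segment in text.split(","):
        cleaned, count = junk_re.subn("", segment)
        cleaned = " ".join(cleaned.split())  # collapse leftover spaces
        if count:
            cleaned = filler_re.sub("", cleaned)
        if cleaned:  # skip segments that became empty
            kept.append(cleaned)
    result = ", ".join(kept)
    return result + sep + params if sep else result

v5_prompt = ("angry, film still, super detail, 2020s, a photorealistic Cool 70 year old man , "
             "maximal facial detail, shot on fujifilm XT4 --ar 16:9 --v 6.0")
print(strip_junk(v5_prompt))
# angry, film still, 2020s, Cool 70 year old man --ar 16:9 --v 6.0
```

The filler rule exists only so the output matches the hand-edited prompt; for other prompts, checking the result by eye is still necessary.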

Below is the minimal prompt.
On its own, the spatial frequency of the result is too high, so it needs adjusting.

Cool 70 year old man --ar 16:9 --v 6.0

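I won't compare against V5 in depth this time, but as a review of the previous article, let's check how different the two are. Generating the same prompt with V5.2 gives the following.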

V5.2 comes closer to the image the prompt intended (a cool 70-year-old man).
V6 (alpha) produces just a plain "photo"...
This makes it clear that prompts cannot simply be shared between the two.

Announcement from Midjourney (Discord)

Style and prompting for V6

Be explicit about what you want. It may be less vibey but if you are explicit it's now MUCH better at understanding you.
If you want something more photographic / less opinionated / more literal you should probably default to using --style raw

The announcement says to describe the image more explicitly in V6 (alpha), so I'll aim for "a cool 70-year-old man appearing in a movie."
First, I add "film still" to the test prompt.

film still, Cool 70 year old man --ar 16:9 --v 6.0

It moved a little closer to a "cool old man," but it looks like I need to be concrete, for example "older man with dark sunglasses."

Now let's write a V6 (alpha) prompt that expresses "a cool 70-year-old man." I add "--style raw" to tone down the Aesthetic system (that is, to raise fidelity to the prompt).

film still, an older man with dark sunglasses and a beard, black background, light silver and silver, steelpunk, matte photo, bold fashion photography, city portraits, norwegian nature --ar 16:9 --style raw --v 6.0

This turned out quite well.
V5.2 would render this like a hyper-realistic 3DCG character, whereas V6 really does look like a photograph...

Another variation:

film still, man with white beard standing on black background, in the style of chromepunk, solarizing master, wavy, matte photo --ar 16:9 --style raw --v 6.0

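If you can draw out the expressive power of V6 (alpha), it may well represent the leading edge of creativity in image-generation AI (for now, at least...).
It looks especially strong for artistic styles.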

Men are relatively easy to generate, but young women are much harder.
V5 over-beautified them, while V6 (alpha) is realistic with a natural look.

film still, blizzard, a city of silver in a snowy country, center view extreme close-up, super cute 24 year old Japanese young woman, beautiful long brown hair, street style realism --ar 1:1 --s 175 --style raw --no freckles --v 6.0

Generating the same prompt with V5.2 gives the following.

Unlike V5, V6 (alpha) is realistic with a natural look.

Using the same prompt as a base, I switched the subject to a man.

Front of Pose Collection, full body center view profile photography, film still, blizzard, a city of silver in a snowy country, super cool 24 year old Japanese young man, cool long brown hair, street style realism --ar 1:1 --style raw --no freckles --v 6.0
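Because the woman and man prompts share the same scene terms and parameters, it can help to assemble them from parts. The Python below is a minimal sketch: `build_prompt` and `SCENE` are hypothetical names for illustration, and it only joins strings. The parameters it appends (--ar, --s, --style raw, --no, --v) are the ones used in the prompts above, in the same order.

```python
def build_prompt(terms, ar="1:1", stylize=None, raw=False, no=None, version="6.0"):
    """Join comma-separated terms and append Midjourney parameters in a fixed order."""
    parts = [", ".join(terms), f"--ar {ar}"]
    if stylize is not None:
        parts.append(f"--s {stylize}")
    if raw:
        parts.append("--style raw")  # tone down the Aesthetic system
    if no:
        parts.append(f"--no {no}")  # terms to exclude
    parts.append(f"--v {version}")
    return " ".join(parts)

# Scene terms shared by both prompts.
SCENE = ["film still", "blizzard", "a city of silver in a snowy country"]

woman = build_prompt(
    SCENE + ["center view extreme close-up",
             "super cute 24 year old Japanese young woman",
             "beautiful long brown hair", "street style realism"],
    stylize=175, raw=True, no="freckles")

man = build_prompt(
    ["Front of Pose Collection", "full body center view profile photography"] + SCENE
    + ["super cool 24 year old Japanese young man",
       "cool long brown hair", "street style realism"],
    raw=True, no="freckles")

print(woman)
print(man)
```

Keeping the shared scene in one list means a change to the setting applies to both subjects, so the comparison stays about the subject swap alone.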

Aside:

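As a test, I fed a high-quality image generated with V6 (alpha) into Runway Gen-2 to make a video, and, as expected, the video quality improves as well.
I have summarized video generation in "The Potential of Video-Generation AI" (動画生成AIの可能性).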

Duration: 20 seconds

V5 prompts are hard to reuse in V6 (alpha)

In V5, you express skin detail with strong phrases such as maximal facial detail, but reusing such a prompt in V6 (alpha) makes the effect far too strong.

film still, close-up, super detail, maximal facial detail, 18 year old Japanese girl who is a super cute fashion model, she has short brown hair in the Y2K fashion, Y2K Aesthetic Worldview, party kei --ar 4:3 --v 6.0