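A Free Textbook for DALLE3 ("Dali Three"): Beginner Edition

November 24, 2023, 06:02

I have been using DALL-E 3 almost every day since its release and have become hooked on how much fun it is. Because I had been using it by feel, I often couldn't get the image I had in mind, so this time I studied properly how to create images. It is called the "Beginner Edition" only because I lack the skill to teach advanced techniques; it does not mean an advanced edition is on the way.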

0. Introduction: What is DALL-E?

DALL-E is an AI that creates images from nothing more than simple text.

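The name comes from Pixar's robot "WALL-E" and the artist Salvador "Dalí". Loosely, you can think of it as "searching out" hidden elements and not-yet-seen images from the text prompt the user provides, and generating them.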

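1. Problems, mainly copyright

The biggest issue is surely copyright. Laws are still being developed. OpenAI says commercial use is allowed as long as you follow its content policy. But even when the company that developed the AI says copyright is fine and that it will not generate infringing images, users are always left uneasy.

As far as I understand, the only image-generation AI that can currently be used safely from a copyright standpoint is Adobe Firefly.

Even without malicious intent, if an image that comes out of DALL-E is a near copy of a copyrighted work, publishing it in a paid article could expose you to legal liability.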

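(2. Internal specifications)

This chapter covers internal specifications and underlying principles. If you find it unnecessary or tangential, please jump ahead to the practical part, 3. Basic usage.

It is not very practical; it collects information that may help with image generation if you know it. You may not get much out of it relative to its length.

A. No time, want to learn to use it efficiently → go to Chapter 3, Basic usage
B. Interested in DALLE's characteristics → stay in Chapter 2, Internal specifications

Who this chapter is for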

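2.1 The difference between so-called normal mode and DALLE mode

Conversations with ChatGPT feel natural compared with other AIs. In image-generation mode with DALLE, however, something always felt off. I asked ChatGPT itself what causes this. Apparently, when generating images, GPT switches into a mode that differs from ordinary GPT. I explain this in three points below.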

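① No context retained
In a normal conversation with ChatGPT-4, when you ask a new question, it generates its answer based on several recent exchanges.

In DALLE (image-creation) mode, however, context is not retained: past exchanges are basically not reflected, and an image is generated only for the latest request (unless you explicitly ask it to take the context into account).

② ChatGPT-4 is not aware of DALL-E 3

Answers are basically generated from pre-training (knowledge up to 2022), so questions about the technology's specifications are mainly answered in terms of DALL-E 2. My own interpretation is that GPT itself does not know DALL-E 3 exists and has simply been instructed, through updates, to answer users about it.

③ GPT updates struggle to stay consistent with pre-training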

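① No context retained
ChatGPT's normal mode tracks information during the conversation with the user and maintains the flow of the dialogue. DALL-E 3, on the other hand, is an image-generation tool: each generation request is independent and does not retain the context of previous requests.
② ChatGPT-4 is not aware of DALL-E 3
The ChatGPT-4 model was pre-trained on information up to January 2022 (which covers up to DALL-E 2) and has no direct knowledge of specific updates or data after that. So although ChatGPT has no specific knowledge of DALL-E 3, it can provide guidance on DALL-E 3's features based on information and instructions from users.
③ GPT updates struggle to stay consistent with pre-training
When updates incorporate new information or algorithmic improvements, the content of answers can change. This happens in the course of aiming for more accurate information, and because full consistency is hard to maintain, answers sometimes differ completely from those of the previous model version (pre-training).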

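What causes the odd feeling when talking in DALLE mode

The sense of incongruity I had felt all along was finally explained. There were so many contradictory answers that I asked GPT, "How many personalities do you have?", and got the obvious reply, "I am an AI."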

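2.2 Internal prompts: the prompt used during image generation (the image-generation prompt)

DALL-E's hallmark is that even very simple prompts produce high-quality images through intuitive operation. This is because GPT-4 acts as an intermediary: it "predicts" what the user wants to create and writes the prompt "on the user's behalf."

From the short phrase "delicious cat food", GPT-4 comes up with random settings and presents a detailed scene.

If you don't specify settings yourself, you get random ones. So what settings should you give? When discussing an image with ChatGPT in terms of abstract information such as a cat's appearance, there are countless attributes and elements that could be expressed, such as facial expression and pose.

In this tutorial, I define those attributes as the prompt components listed below. Strictly speaking, artistic style should be broken down further into color, viewpoint, and so on, so this is not quite correct, but I treat Artistic style as an independent category because it makes communicating with the AI, explaining, and prompting easier. These attributes need to be changed substantially depending on the type of image being created.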

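Components
1. Subject
2. Setting
3. Action
4. Emotion/Expression
5. Objects
6. Artistic style

Components of an image-generation prompt (short version)

1. Subject: the main object or character depicted. For example, a cat, cat food, a pet bowl.
2. Setting: the background or environment of the scene. For example, a luxurious pet mat, a corner of a room.
3. Action: what the subject is doing. For example, a cat watching its meal.
4. Emotion/Expression: the emotion or expression of the character. For example, a satisfied cat, a cat looking forward to its meal.
5. Objects: specific objects in the scene. For example, catnip garnish, chicken pieces, salmon pâté, fish flakes.
6. Artistic style: the characteristics of how the picture is drawn and expressed. I explain it in detail in another chapter, but specifying it in detail in the prompt dramatically improves how freely you can express the picture you want. I define it in four subcategories: Technique, Medium, Historical Period/Art Movement, and Composition & Perspective. This cat food example has no particular artistic-style prompt.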

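Components of an image-generation prompt

I had GPT set these components "arbitrarily" and generate the image again.

Set 1. Subject 2. Setting 3. Action
4. Emotion/Expression 5. Objects 6. Artistic style
arbitrarily and draw "delicious-looking cat food".

Looking at the prompt used to create this image, each item had been set briefly, in about one line each.

When I changed the Subject from "a cat" to "a black and white cat" and added a sentence to the Action saying the cat is leaping, the changes were applied while the other settings were largely kept.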
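Because the six components are independent slots, editing one of them (as with the Subject and Action change above) leaves the rest intact. Below is a minimal Python sketch of that idea, under my own assumptions: `PromptSpec` and `to_prompt` are hypothetical names, and joining the fields into one sentence each is just one reasonable choice, not how GPT-4 actually writes its internal prompt.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the six prompt components used in this article."""
    subject: str
    setting: str = ""
    action: str = ""
    emotion: str = ""
    objects: str = ""
    artistic_style: str = ""

    def to_prompt(self) -> str:
        """Join the non-empty components, in the article's order, one sentence each."""
        parts = [self.subject, self.setting, self.action,
                 self.emotion, self.objects, self.artistic_style]
        return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


base = PromptSpec(
    subject="A cat beside a fine porcelain dish of gourmet salmon cat food",
    setting="The scene is a luxurious kitchen with marble countertops",
    action="The cat is quietly watching the dish",
    emotion="It looks eager for its meal",
    objects="The food is garnished with catnip, with a side of cream",
    artistic_style="Hyper-realistic, emphasizing the texture of the food",
)

# Change only Subject and Action; replace() copies every other field unchanged.
variant = replace(
    base,
    subject="A black and white cat beside a fine porcelain dish of gourmet salmon cat food",
    action="The cat is joyfully leaping towards the dish",
)

print(base.to_prompt())
print(variant.to_prompt())
```

A frozen dataclass plus `replace()` mirrors the "change one setting, keep the others" workflow: the original spec stays untouched, so you can always go back to it.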

In a luxurious kitchen with marble countertops and stainless steel appliances, a fine porcelain dish is elegantly arranged with gourmet cat food made of fresh salmon, garnished with catnip, and a side of cream. A black and white cat is joyfully leaping towards the dish, embodying the anticipation of a gourmet meal. The scene should be hyper-realistic with a strong emphasis on the texture of the food and the shiny surfaces of the kitchen.

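Adding settings like this produces the corresponding image, which is exciting. It creates the illusion that you can freely make whatever you want, but reality turned out not to be so simple.

Is there a character limit for DALLE prompts?

There is no explicit character limit for DALL·E image generation, but because the system uses information selectively from overly long prompts, about 100 words (500 characters) is recommended. In prompts over 500 characters, important information may be ignored, so you need to narrow the prompt down to its core elements.

Character limit of the generation prompt

So there is a limit on prompt length, and you cannot keep adding settings indefinitely. Note, however, that the 500 characters here refer not to the prompt we (users) type, but to the prompt GPT writes internally.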
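When you paste a long prompt of your own (for example, one reused from an earlier image), it can help to check it against the roughly 100-word / 500-character guideline GPT gave above before sending it. The sketch below does only that; the thresholds are that guideline, not a documented limit. For reference, the OpenAI Images API reference documents a hard maximum of 4,000 characters for `dall-e-3` prompts, and the ChatGPT interface may behave differently. `check_prompt_length` is a hypothetical helper of my own.

```python
def check_prompt_length(prompt: str, max_words: int = 100, max_chars: int = 500) -> list[str]:
    """Return warnings if a prompt exceeds the ~100-word / ~500-character guideline."""
    warnings = []
    n_words = len(prompt.split())  # whitespace-separated words
    n_chars = len(prompt)
    if n_words > max_words:
        warnings.append(f"{n_words} words (guideline: about {max_words})")
    if n_chars > max_chars:
        warnings.append(f"{n_chars} characters (guideline: about {max_chars})")
    return warnings


yokai = ("Design a cool yokai character in a 3D computer graphics style, inspired by "
         "the 'Hyakki Yagyō' or Night Parade of One Hundred Demons.")
print(check_prompt_length(yokai))           # a short prompt: empty list, no warnings
print(check_prompt_length("word " * 150))   # far too long: both warnings
```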

A detailed prompt (P2) is created internally from a simple prompt (P1)
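If you call DALL-E 3 through the OpenAI API instead of the ChatGPT interface, this P1 → P2 rewriting is visible directly: with the official `openai` Python SDK (v1), `client.images.generate(model="dall-e-3", ...)` returns each image together with a `revised_prompt` field holding the prompt that was actually used. The sketch below assumes that SDK and an `OPENAI_API_KEY` in the environment; `generate_with_revision` is my own hypothetical wrapper, and it takes the client as a parameter so it can be tried without network access.

```python
from typing import Any


def generate_with_revision(client: Any, short_prompt: str) -> tuple[str, str]:
    """Send a short prompt (P1) and return (image URL, revised prompt P2).

    `client` is expected to be an `openai.OpenAI()` instance (openai>=1.0).
    """
    response = client.images.generate(
        model="dall-e-3",
        prompt=short_prompt,
        size="1024x1024",
        n=1,  # DALL-E 3 accepts only one image per request
    )
    image = response.data[0]
    return image.url, image.revised_prompt


# Real usage (requires network access and an API key):
#   from openai import OpenAI
#   url, p2 = generate_with_revision(OpenAI(), "delicious cat food")
#   print(p2)  # the detailed prompt DALL-E 3 actually used
```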

2.3 How much effort to put into writing prompts yourself

By design, DALL-E 3 outputs images that people find beautiful even from simple prompts.

Design a cool yokai character in a 3D computer graphics style, inspired by the 'Hyakki Yagyō' or Night Parade of One Hundred Demons. The yokai should look powerful and enigmatic, featuring elements common in Japanese folklore such as oni masks, samurai armor, or ethereal robes. The texture and rendering should give the impression of a high-quality 3D model, with dramatic lighting that casts dynamic shadows and highlights the details of the yokai's fearsome appearance.

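This prompt is about 100 words (500 characters).

Moreover, by reusing the image-generation prompt that was used to create an image, you can create new images of similar quality with high reproducibility. Typical uses include reusing images generated by other people and reproducing your own images in another chat.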

Create a 3D computer graphics style image of a powerful and enigmatic yokai character inspired by the 'Hyakki Yagyō' or Night Parade of One Hundred Demons from Japanese folklore. The yokai should incorporate elements such as an oni mask, samurai armor, and ethereal robes. It should be textured and rendered to resemble a high-quality 3D model, with dramatic lighting to cast dynamic shadows and accentuate the intricate details of its fearsome appearance.

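Because about 500 characters is the maximum, no new text is added, but the wording has changed slightly. Let's check how much it changed.

ChatGPT is good at evaluating things itself, and having it score items with points or numbers is sometimes useful. Prompts A and B differ at the level of wording, but how are they recognized in internal processing?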

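Similarity meter
Compare the similarity of the prompts I am about to give you, from the perspective of image generation.
Evaluate them on the items below. I will now give you two prompts, A and B, so reply only with "ok".
Also, rate the similarity of the images that would be generated from prompts A and B, with 100% as the maximum.
Items: components of an image-generation prompt
1. Subject 2. Setting 3. Action 4. Emotion/Expression 5. Objects 6. Artistic style

If you supply a detailed prompt of about 500 characters, GPT regenerates a prompt from it, but it regards that regenerated prompt as essentially the same; it seems safe to assume that the slight differences arise when DALLE generates the image.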
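GPT's similarity score is its own judgment and can't be reproduced exactly, but a crude lexical check can at least show how much of the wording changed between prompts A and B. The sketch below swaps in Jaccard similarity over lowercase word sets; it is a purely surface-level measure (it ignores meaning and word order) and says nothing about how DALL-E interprets the prompts. `jaccard_similarity` is a hypothetical helper.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (letters, digits, underscore)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Shared vocabulary divided by combined vocabulary, from 0.0 to 1.0."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0  # two empty prompts are treated as identical
    return len(sa & sb) / len(sa | sb)


# Opening sentences of prompts A and B from section 2.3.
prompt_a = ("Design a cool yokai character in a 3D computer graphics style, inspired by "
            "the 'Hyakki Yagyō' or Night Parade of One Hundred Demons.")
prompt_b = ("Create a 3D computer graphics style image of a powerful and enigmatic yokai "
            "character inspired by the 'Hyakki Yagyō' or Night Parade of One Hundred Demons "
            "from Japanese folklore.")
print(f"Lexical overlap: {jaccard_similarity(prompt_a, prompt_b):.0%}")
```

A low lexical score alongside a high GPT score would be consistent with the observation above: the words change, the components do not.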

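To digress further: when you change the Artistic style, the generation prompt also appears to change a lot. However, I was told that this is only a change in the English wording and that the prompt is essentially the same.

The prompt appears to have changed substantially, but only the parts related to artistic style were changed.

As text, it looks like a big change, but in substance nothing except the "artistic style" changed.

So I learned that once you have a detailed 500-character prompt, you can simply reuse it. Whether you started from a simple prompt (①) or a detailed prompt (②), once you get a high-quality image, reusing the prompt C that was used to create it lets you keep producing high-quality images.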
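Since the practical conclusion is to keep the final prompt (C) of any image that came out well, it helps to store those prompts somewhere you can copy them back from. A minimal sketch, assuming a local JSON file as the store; the helpers `save_prompt` and `load_prompt` and the file name in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def save_prompt(path: Path, name: str, prompt: str) -> None:
    """Add or overwrite a named prompt in a JSON file (created if missing)."""
    library = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    library[name] = prompt
    path.write_text(json.dumps(library, ensure_ascii=False, indent=2), encoding="utf-8")


def load_prompt(path: Path, name: str) -> str:
    """Return a stored prompt by name (raises KeyError if it is missing)."""
    library = json.loads(path.read_text(encoding="utf-8"))
    return library[name]


# Example:
#   store = Path("prompts.json")
#   save_prompt(store, "cute-yokai", "A cute, original yokai character in a Japanese style")
#   print(load_prompt(store, "cute-yokai"))
```

`ensure_ascii=False` keeps any Japanese notes readable in the file.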

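Whether to start with a simple prompt or a detailed prompt depends on the purpose. I explain this with real examples in a later chapter.

When you have only a vague idea of what you want, starting DALLE with a simple prompt seems best. For cases where you clearly want to create something complex, see Chapter 2.5.

2.4 Internal prompts: rules GPT/DALL-E follow on their own

DALL-E has been pre-trained on an enormous number of images, practically every famous image in the world, and can probably reproduce them "technically."

However, if you request an image that risks copyright infringement (a copyrighted work), such as a character from a famous manga, it "deliberately" creates a character that is similar to the original yet not quite the same.

This is one of several rules inside DALLE meant to keep users from being put at a disadvantage. It is accommodating in other ways too: for example, if you ask in Japanese, it automatically translates the request into English internally, so users don't have to write prompts in English.

However, even for works whose copyright has expired or that were never copyrighted (public domain), it will not generate an exact copy of the original.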

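Please recommend some public domain material.

1. Historical paintings and artworks: famous classical works such as Leonardo da Vinci's "Mona Lisa" and Michelangelo's Sistine Chapel ceiling are wonderful sources of inspiration.
2. Illustrations from classic literature: illustrations and engravings accompanying the works of classic authors such as Charles Dickens, Jane Austen, and Mark Twain are also in the public domain.
3. Old maps and astronomical charts: old maps and star charts, with their unique designs and wealth of information, make visually attractive material for image generation.
4. Natural history plates: old scientific illustrations in botany, zoology, mineralogy, and so on feature detailed depiction and can be used for images with biological or ecological themes.
5. Depictions of myths and folklore: many characters and scenes from myths and folktales around the world are in the public domain and serve as references when generating fantastical images.
6. Historical photographs: old photographs of famous historical events and people are also available in the public domain. They are useful for generating realistic historical scenes and portraits.
7. Photographs and drawings of classical architecture: material showing past architectural styles, such as ancient Roman ruins, Gothic churches, and Renaissance buildings, is well suited to generating architectural images.

Examples of public domain material

Public domain images: left: random; center: Napoleon Crossing the Alps; right: a Gothic church

These world "heritage" works, which raise no copyright issues and are high in quality, are well suited to image-generation AI.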

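2.5 The limits of DALL-E 3: how closely can it reproduce what you want to draw?

I think the current DALLE has difficulty reproducing the highly sophisticated images that you (the user) have in mind. When creating a complex image, the common approach of first making something simple and then gradually modifying it is effective.

However, unlike ordinary image-editing software, DALLE struggles with modifications that people consider easy, even very small ones such as adding an object, changing a pose, or changing a color.

By design, DALLE does not accept an instruction like "modify image X." For example, if you say "give this yokai four legs," it does not actually refer to "this yokai"; it simply creates a "new yokai" from scratch and gives it four legs.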

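2.6 Situations where GPT-4V(ision) is useful

A recent update made it possible to recognize images directly. In the near future, something close to img2img (showing the AI a specific image and having it create a similar one) may become possible.

① Overall analysis of an image type you are making for the first time:
When you have no idea how to give instructions for a type of image you have never made.
② Analyzing a complex composition:
When the composition of the image (screen) is complex and hard to put into words.
③ Analyzing a special artistic style
When you want to create a complex style that cannot be defined by a single style.

Situations where GPT-4V is useful for DALLE image generation

The use I personally find most convenient is ③, analyzing a special artistic style: showing it an image in a little-known technique and having it create one. With ②, complex compositions, even when both sides (user and ChatGPT) share the same image of what to create, it often can't be made because of technical or copyright problems.

My overall impression of GPT-4V is that using it improves accuracy slightly. It does not improve dramatically, though, and refining prompts yourself step by step works perfectly well; it is just somewhat more convenient. The trick is to say, "Please reproduce it exactly as you see it."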

The main part starts here

3. Main part: Basic usage

3.1 Creating with simple operations

3.1.1 Creating with short prompts (DALL-E Gacha)

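Put the name "DALLE Gacha" on the front, gacha machine, a 2000s capsule-toy (gachagacha) machine

Just type a short sentence in the chat, such as "Give me a picture of cat food.", "Make the Mona Lisa anime-style.", or "Make a logo for XX.", and it creates an elaborate image. Since you can create images casually and intuitively, this will probably be the basic way to use it.

I had it create an original yokai character, but it came out looking scary.

Click the image to open it full size.

You can see the actual prompt used to create this yokai. When you want to make a similar image or share it with others, copy this prompt. I translated what it says into Japanese with an automatic translation site.

ChatGPT-4 has created detailed, random settings for the yokai.

This character is too creepy to accompany me through the rest of the article, so I'll make a new one.

Please remake it as an original, Japanese-style yokai character, on the cute side.

It looks like a character I've seen somewhere, but it'll do, so I'm adopting it. The image-generation prompt GPT wrote is below (↓).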

A cute, original yokai character in a Japanese style, with soft features and a charming smile. This friendly creature has a small, round body with a fluffy texture, reminiscent of a cuddly toy. Its big, expressive eyes and a tiny mouth give it an endearing look. The yokai has delicate, small wings and a playful, bushy tail. It carries a tiny, blooming cherry blossom branch, symbolizing its gentle nature. The background is a serene Japanese garden with a koi pond and stone lanterns at sunset.

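I'll copy and paste this prompt into a different chat session and generate again.

The two are basically similar, but I got two different cute Japanese-style yokai. Reproducing exactly the same image seems difficult. Since generation is so easy, people will likely repeat it many times until the image they want comes out, so in that sense the word "gacha" (a capsule-toy lottery) may be a good fit.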
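Through the API, the same "gacha" style of use simply means calling the generator several times with the same prompt, because `dall-e-3` accepts only `n=1` per request. A minimal sketch, assuming the official `openai` v1 SDK's `client.images.generate`; `gacha` is a hypothetical helper, and the client is passed in so the function can be exercised without network access.

```python
from typing import Any


def gacha(client: Any, prompt: str, pulls: int = 4) -> list[str]:
    """Generate `pulls` images from the same prompt and return their URLs."""
    urls = []
    for _ in range(pulls):
        # One request per image: DALL-E 3 rejects n > 1.
        response = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
        urls.append(response.data[0].url)
    return urls


# Real usage (network access and OPENAI_API_KEY required):
#   from openai import OpenAI
#   for url in gacha(OpenAI(), "A cute, original yokai character in a Japanese style"):
#       print(url)  # open each one and keep the pulls you like
```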

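The image below is a scene from Journey to the West; I was casually making several versions with different materials and happened to get exactly what I wanted.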

A highly detailed doll of a giant ape creature, inspired by Sun Wukong, in a chaotic scene. The setting is Mount Huaguo with a lush, mythical environment. The ape is mid-rampage, displaying a fierce expression and dynamic pose, suggesting movement and power. Around the ape are miniature figures of deities in a state of alarm, scattered celestial weapons, and elements of a broken seal, referencing the lore of Sun Wukong being punished by Buddha for his rebellion. The aesthetic is fantastical, with attention to the textures of the fur, clothing, and surroundings, reflecting the turmoil of Sun Wukong’s birth and the havoc in the Heavenly Palace.

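3.1.2 Creating with long prompts

The trick is to reuse prompts that worked well in the past.

Reuse high-quality work published online or your own past successes to make good images. Let's make a dragon version of the earlier giant-ape jackpot.

I was able to make something similar easily. In the following chapters, I explain some tips on how to create the image you have in mind, the equivalent of that first jackpot gacha pull.

3.2 Artistic style

Many people probably want to make various things but don't know what to make, or what can be made at all. I was one of them until a moment ago. Let me ask ChatGPT how artworks and images are classified.

Please summarize the artistic style, including technique and medium, that characterizes an artwork.

Artistic style
1. Technique: the specific methods or processes the artist uses. For example, watercolor, oil painting, pencil drawing, digital painting, printmaking.
2. Medium: the substances or materials used to make the work. For example, oil on canvas, sketch on paper, digital media, stone carving, wood carving.
3. Historical Period/Art Movement: the historical and cultural background in which the work was created and the art movement or school it belongs to. For example, Impressionism, abstract, surreal, Renaissance, Baroque. I also shoehorn anime style, photorealism, and the like into this category.
4. Composition & Perspective: how the work is expressed through the arrangement of elements, viewpoint, and angle. For example, symmetrical composition, linear perspective, depiction from a particular viewpoint.

Classification and examples of artistic style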
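The four subcategories can also be kept as separate slots and turned into a single style clause, which makes it easy to swap one of them (say, the medium) while holding the others fixed. A sketch under my own assumptions: `ArtisticStyle` and `to_clause` are hypothetical, and the English phrasing of the clause is just one reasonable choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtisticStyle:
    """Hypothetical holder for the four artistic-style subcategories."""
    technique: str = ""    # e.g. "a gouache painting"
    medium: str = ""       # e.g. "on textured paper"
    movement: str = ""     # e.g. "Baroque"
    composition: str = ""  # e.g. "a symmetrical composition"

    def to_clause(self) -> str:
        """Turn the filled-in subcategories into one comma-separated style clause."""
        parts = []
        if self.technique:
            parts.append(f"rendered as {self.technique}")
        if self.medium:
            parts.append(self.medium)
        if self.movement:
            parts.append(f"in the {self.movement} style")
        if self.composition:
            parts.append(f"with {self.composition}")
        return ", ".join(parts)


style = ArtisticStyle(technique="a gouache painting", medium="on textured paper",
                      movement="Baroque", composition="a symmetrical composition")
print(f"A cute yokai holding a cherry blossom branch, {style.to_clause()}.")
```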

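DALLE re-creation
Artistic style: 1. Technique 2. Medium 3. Historical Period/Art Movement 4. Composition & Perspective

3.2.1 Technique

I regard this as the broad classification of art style, ranging from something as simple as a black-and-white pencil doodle to large-scale work with paints.

Let's remake the earlier yokai using various techniques.

In the same chat, you can ask it to remake the left image (No. 1) or the right image (No. 2). However, as the conversation continues in the same session, it becomes hard to point back to a specific image and ask for a remake.

I expect this problem to be solved eventually once image IDs (a gen ID, i.e. generation ID, or a seed value) can be specified, so I'll wait patiently. For now, the only options seem to be requesting changes immediately or typing the prompt again. In a separate chat, I give instructions as "prompt" + "technique".

The samples below are accidental products from a while ago, when I knew almost nothing and kept generating in the same chat without thinking. They were also made with a previous version of DALLE, so I don't know how to get them in a new chat…

Accidental products:

Digital painting: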

Create an image of the original yokai character in the style of a digital painting. The artwork should reflect the crisp and precise nature of digital media, with smooth gradients, sharp lines, and vibrant colors. The friendly yokai should still have its round, fluffy body, large eyes, small wings, and a bushy tail. In its hands, it holds a cherry blossom branch. The background should be a digital rendition of a Japanese garden at sunset, capturing the serene atmosphere with a koi pond and stone lanterns, all with the polished finish typical of digital paintings.

The images I liked were made while I kept working in the same chat, so the prompt context may have been preserved.

I kept changing the specified Technique and made various images.

Gouache:

Create a gouache painting of a whimsical yokai character with a round, plush-like body, large, expressive eyes, and a small mouth, giving it an endearing appearance. The character has small, delicate wings and a bushy tail, holding a cherry blossom branch. The setting is a tranquil Japanese garden with a koi pond and stone lanterns during sunset, portrayed with the rich, opaque colors and flat textures characteristic of gouache.

Silkscreen:

Craft a silkscreen print design of a charming yokai character with a small, round, fluffy body, large, expressive eyes, and a small mouth. This character has subtle wings and a bushy tail, holding a blooming cherry blossom branch. The background is a traditional Japanese garden with a koi pond and stone lanterns, reflecting the layered, vibrant colors and bold patterns typical of silkscreen prints, set against a sunset scene.

LINE sticker:

A cute, original yokai character suitable for a LINE sticker, in a Japanese style, with soft features and a charming smile. This friendly creature has a small, round body with a fluffy texture, reminiscent of a cuddly toy. It has big, expressive eyes, a tiny mouth, delicate, small wings, and a playful, bushy tail. It is holding a tiny, blooming cherry blossom branch. The background is a simple, stylized version of a serene Japanese garden with a koi pond and stone lanterns at sunset, ideal for a LINE sticker format.

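Rubber figures were the one thing it refused to make no matter what conditions I gave. Probably because there are so many copyrighted works (in Japan alone there are many capsule-toy products and the like).

Plastic: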

Create a plastic model of an enchanting yokai character with a small, round, fluffy body that resembles a soft toy. It should have large, lively eyes and a small mouth, expressing friendliness. The character includes dainty wings and a frisky, bushy tail, holding a cherry blossom branch. The setting is a peaceful Japanese garden with a koi pond and stone lanterns, crafted from plastic to capture the smooth, molded look of plastic models, illuminated by the gentle light of the setting sun.

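There are many other techniques, such as ones representing foreign cultures or modern industrial products. Specifying subcategories in more detail would yield countless techniques. For example, search and you'll find many graphic styles used in game design.

Even with the same technique, changing the medium changes the style substantially to suit that medium.

For famous works, abstract paintings, or vague ideas, you don't need to describe the style in detail. For example, if you say "Please draw Leonardo da Vinci's The Last Supper," DALLE automatically sets the most suitable style (including technique, paint, and so on). However, if you want an unusual "specific image" in a special style (such as a relatively new one), you need to set the "artistic style."

When I make a da Vinci, you understand the style without my specifying it each time, right?

Yes, that is correct. With AI image-generation tools like DALL·E, when generating well-known paintings or clearly defined objects, you can often get appropriate results without detailed specification. For example, when you request a very famous work like "The Last Supper," the AI can grasp its historical context, style, and technique from the title alone and generate an image based on them.
However, if you want to generate a specific style, a lesser-known art piece, or your own original creation, more specific instructions are needed. In that case, including detailed information such as the desired technique, medium, period or movement, color tone, mood, and composition in the prompt increases the likelihood of results that meet your expectations.
Also, when you want to reflect niche elements or fine nuances, it is important to state them clearly. Because the AI generates images from specific instructions, the level of detail in the prompt greatly affects the quality of the result.

Situations where an artistic-style prompt is needed

I'll show how I made it later, but I really wanted to create a glow-in-the-dark yokai plastic model and reworked the artistic-style prompt many times. Conveying "glowing faintly in the dark" to the AI was difficult, so most of the prompt is about "artistic style."

Faintly glowing glow-in-the-dark plastic model of a creature from Japanese yokai mythology: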

Create an image of a mythical creature figurine made of glow-in-the-dark plastic, inspired by Japanese Yokai folklore. The figurine is not glowing strongly but has a faint luminescence that highlights its supernatural form and intricate details. The plastic's color in natural light should be a pale green, suggesting its glow-in-the-dark capabilities, with ambient lighting just enough to keep it mostly subdued. The Yokai should be depicted in a tranquil pose, reflecting its enigmatic nature. The background should be dusky and mysterious, providing a sense of the eerie setting where the creature's luminescence could shine if the lights were dimmed, but for now, it remains in a whisper of its potential glow.

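3.2.2 Medium

This is the material used for expression. Ink and paint usually come to mind, but "pictures" can also be made with stone or sand. It is hard to draw a strict line between this and 1. Technique, and the two overlap (for example, the technique "metal sculpture" and the medium "metal" tend to produce similar output).

List as many materials used in artworks as you can.

① Pigments
Changing pigments changes the vividness and depth of color, affects glossy or matte texture, and changes the brightness and transparency of the picture through differences in translucency. Together these factors determine the picture's overall visual effect and greatly influence the impression the work makes.

Please draw The Coronation of Napoleon in watercolor.

Since it is in the public domain, I use "The Coronation of Napoleon" for comparison. The original is an oil painting with heavy, dense color; watercolor, by contrast, is characterized by transparency, and tempera by a matte yet vivid finish.

② Drawing materials
Control of line thickness and texture largely determines the quality of the work. Pencil, for example, allows precise control with fine lines, a smooth texture, and subtle gradations. Chalk, on the other hand, works with thick lines and a soft texture (though the chalk drawing below is not a typical one…).

Having tried them all, and at the risk of defeating the purpose, "oil painting" looks the most beautiful to me. Then again, since the artist chose oil in the first place, that may be only natural.

③ Sculpture (stone, metal, etc.)

Next, I'll make Michelangelo's Moses (1513-1515; the original is marble) in various materials.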

Illustrate the statue of Moses by Michelangelo, focusing on its intricate details and the powerful expression of the biblical figure. The sculpture should be depicted as if carved from a block of pristine white marble, reflecting the Renaissance style. Moses should be seated, with the Tablets of the Law under his arm, his beard flowing, and his eyes gazing into the distance, conveying a sense of wisdom and authority. The image should capture the fine chiseling work of the hair, beard, and the drapery of Moses's robe, as well as the muscular anatomy that Michelangelo was renowned for. The light should be used to enhance the three-dimensional form of the sculpture, highlighting the texture and the contours of the marble.

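Historically, marble was often chosen for sculpture for its durability and preservation. With image-generation AI, though, durability doesn't matter, so it could be fun to prioritize appearance and play with other materials such as glass.

④ Figures (resin)

Since rubber can't be used because of various restrictions, plastic, silicone, and the like are the likely candidates for figures made of something other than rubber.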

A small, unpainted dinosaur figure made of acrylic plastic. The figure should be detailed and realistic, capturing the textures and features of a dinosaur in a miniature form. The acrylic material should give the figure a slightly translucent appearance, emphasizing its raw, unfinished look. The setting should be a plain white background to highlight the details of the figure.

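I made these with different plastic materials.

3.2.3 Historical Period/Art Movement

3.2.3.1 Basic periods and movements
For paintings and landscapes, putting one of the following movements at the start of the prompt yields a decent result. If you don't specify a movement, DALLE probably picks one that suits the settings automatically.

The period or movement determines the overall impression of the image. Deciding on a period or movement first, then adding detailed imagery and specific instructions, should get you closer to your target image. Let's create "The Coronation of Napoleon" in various periods and movements. Incidentally, the original is apparently Neoclassicism.

Please redraw The Coronation of Napoleon in the following movements/periods.
Impressionism

Compared with the original, the Baroque version apparently uses shading and dynamic expression, while the Renaissance version uses linear perspective to create depth. According to GPT, the movements that go well with paintings in the usual sense, such as portraits and landscapes, are the Renaissance, Baroque, and Romanticism. Drawing on the article below, I've also picked out some lesser-known movements that I personally recommend.

Other lesser-known movements recommended by GPT.

3.2.3.2 Additional Expression
Pick your favorite movement from the list above; simply asking, for example, "Please create Leonardo da Vinci's The Last Supper in Romanticism," gives a beautiful enough painting, but adding words that act as "additional expressions" produces more distinctive works.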
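Combining a movement with an additional expression amounts to taking pairs from two lists, so a batch of candidate prompts can be written out up front and tried one by one. A minimal sketch; the template wording and the example lists are my own assumptions, drawn from the movements and the "iridescent brilliance" expression mentioned in this section.

```python
from itertools import product

SUBJECT = "Napoleon's coronation ceremony"
MOVEMENTS = ["Baroque", "Romanticism", "Renaissance"]
ADDITIONS = ["", "iridescent brilliance"]  # "" means no additional expression


def build_prompt(subject: str, movement: str, addition: str) -> str:
    """One candidate prompt: a movement, plus an optional additional expression."""
    prompt = f"A {movement} style painting of {subject}"
    if addition:
        prompt += f", enhanced with {addition}"
    return prompt + "."


# Every movement paired with every additional expression (3 x 2 = 6 prompts).
prompts = [build_prompt(SUBJECT, m, a) for m, a in product(MOVEMENTS, ADDITIONS)]
for p in prompts:
    print(p)
```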

Additional Expressions recommended by GPT

Below are some derivative works.

3.2.3.3 Derivations from Landscape and History Painting

First, with the style fixed to "Baroque," I add several Additional Expressions.

The Coronation of Napoleon, Baroque style:

Baroque style painting of Napoleon's coronation ceremony. The artwork features elaborate, ornate details typical of the Baroque era, with dramatic lighting and rich colors. Napoleon is depicted in regal attire, being crowned in a grand, opulent setting with luxurious fabrics, intricate architecture, and a host of dignitaries and onlookers. The scene is filled with movement and energy, capturing the grandeur and majesty of the event.

Additional expressions that I personally felt combined well with Baroque.

The Coronation of Napoleon, Pastel Goth:

For Pastel Goth, I added fantastical additional expressions.
The picture below came out by chance after I generated several times with Baroque style + Iridescent Brilliance.

A Baroque style painting of Napoleon's coronation ceremony, enhanced with iridescent brilliance. The artwork is rich in elaborate, ornate details typical of the Baroque era, featuring dramatic lighting and vibrant colors. Napoleon is depicted in regal attire, being crowned in a grand, opulent setting adorned with luxurious fabrics and intricate architecture. The scene includes a host of dignitaries and onlookers, filled with movement and energy, capturing the grandeur and majesty of the event. The added iridescent brilliance brings a unique, luminous quality to the painting, enhancing its visual impact.

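I meant for the iridescence to cover the whole painting, but by chance it was applied to only part of it, which ended up looking rather good. Because it's an accident that can't be reproduced, I personally call such images Artifacts/OOParts.

3.2.3.4 Derivations from Portrait

Deriving from "live-action," I added GPT's recommended expressions for portraits, plus a handsome man.

3.2.3.5 Derivations from live-action and CG 3D modeling landscapes

Generating in live-action style automatically produces images close to CG 3D modeling. Let me derive variations from this with additional expressions.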

A realistic depiction of a cave exploration scene inspired by the 'Kawaguchi Hiro Expedition Team'. The image should mimic a real-life photography style, capturing the raw and natural beauty of a cave exploration. The explorers, dressed in appropriate gear, are navigating through a stunning cave environment, with realistic rock formations, underground streams, and intricate stalactites and stalagmites. The lighting is natural, casting dynamic shadows and highlighting the textures of the cave.

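3.2.4 Composition & Perspective

Changing the arrangement of objects in a painting or photograph, or the viewpoint of the artist (photographer), also changes how it looks dramatically.

The site below is detailed and very informative.

4. Applications

Using GPT-4V(ision) as an aid

There are many uses, such as showing GPT an image directly and getting advice about it.

Its reproduction ability is very high: if you show it an image and ask it to "look at it and create it as-is," it can create a slightly different picture while keeping the style.

However, even for simple images, in genres full of copyrighted works (logos, icons, etc.), it tends to deliberately create dissimilar images.

Left two: my own icons, which may resemble copyrighted works. Right two: icons created "while avoiding copyright."

4.1 When you have a vague idea of what you want

Given DALLE's characteristics, precisely reproducing a target is difficult; it is more efficient to build good images by modifying or combining things you happen to get. Start with a simple prompt, then vary a picture you like with additional expressions and so on. GPT-4V is also handy at times; it can even combine two images. For example, prepare a picture of a samurai as one of them.

Samurai: ink-wash base, with acrylic and various other touches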

A knight in profile, wearing historical Japanese armor that shines with vivid white and intricate gold detailing. The armor has a smooth, glossy finish achieved with soft acrylic gradients to simulate texture. The illustration echoes traditional Japanese watercolor techniques, aiming for a delicate and emotive ambiance. The knight embodies nobility and strength, posed in battle readiness, yet with a calm and composed expression. The background is minimalistic to keep the focus on the knight.

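For the other, prepare a game title image.

Combine these two images to make a game title screen.

Finally, I also had it turned into a retro game title. As you can see, it keeps changing the style nicely in succession without any complicated operations.

4.2 When you clearly know what you want

As explained in Chapter 2, it is difficult for DALLE alone to create a specific complex image, or exactly the intended one. However, if "reasonably similar" or "roughly like this" is good enough, the tips introduced below may get you closer. Personally, I started using DALLE because I wanted to make Shōwa-retro pictures (evoking Japan's Shōwa era, 1926-1989). This time I focus on styles and items reminiscent of the Shōwa era.

4.2.1 Special artistic style

4.2.1.1 Glow-in-the-dark plastic models

The goal is to reproduce plastic models that glow green in the dark, i.e., so-called glow-in-the-dark (phosphorescent) models.

Asking for "plastic model + glow-in-the-dark" produced nothing like what I wanted. Even when I specified that "the whole thing glows," only part of it glowed.

At one point I even wondered whether glow-in-the-dark plastic models were missing from the pre-training data altogether. I uploaded an image of my target to GPT-4V(ision) and asked for its opinion directly.

Over many exchanges, I asked it to describe the artistic style in detail.

Once it works, you can mass-produce other characters under the same conditions.

Artistic style for a faint glow-in-the-dark glow: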

The figurine is not glowing strongly but has a faint luminescence that highlights its supernatural form and intricate details. The plastic's color in natural light should be a pale green, suggesting its glow-in-the-dark capabilities, with ambient lighting just enough to keep it mostly subdued. The background should be dusky and mysterious, providing a sense of the eerie setting where the creature's luminescence could shine if the lights were dimmed, but for now, it remains in a whisper of its potential glow

Moses statue, glow-in-the-dark:

Depict Michelangelo's Moses statue as a glow-in-the-dark plastic figurine. The statue should capture the recognizable features of the original sculpture: Moses seated, with the Tablets of the Law, his beard and hair detailed, in a contemplative pose. The plastic material should be pale green, indicative of its glow-in-the-dark nature. The figurine should emit a faint luminescence, subtly outlining its supernatural form and highlighting its intricate details. The ambient light should be dim, allowing for the glow effect to be visible while maintaining a mostly subdued and mystical atmosphere.
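The "mass-produce once it works" approach amounts to keeping the proven style description fixed and swapping only the subject sentence, as the Moses example does. A minimal sketch under that assumption; `GLOW_STYLE` holds only the first two sentences of the style block above (shortened for brevity), and `with_style` is a hypothetical helper.

```python
# First two sentences of the glow-in-the-dark style block above (shortened for brevity).
GLOW_STYLE = (
    "The figurine is not glowing strongly but has a faint luminescence that highlights "
    "its supernatural form and intricate details. The plastic's color in natural light "
    "should be a pale green, suggesting its glow-in-the-dark capabilities."
)


def with_style(subject_sentence: str, style: str = GLOW_STYLE) -> str:
    """Put a varying subject sentence in front of a fixed, proven style block."""
    return f"{subject_sentence.strip().rstrip('.')}. {style}"


subjects = [
    "Create an image of a mythical creature figurine made of glow-in-the-dark plastic, "
    "inspired by Japanese Yokai folklore",
    "Depict Michelangelo's Moses statue as a glow-in-the-dark plastic figurine",
]
for s in subjects:
    print(with_style(s))
```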

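4.2.1.2 A tribute to rubber figures

As mentioned in 3.2.1, rubber figures include many copyrighted works, so asking directly for "rubber figure" + "a specific word" produces no image. The goal, then, is not a rubber figure as such but something close to one. Through dialogue with GPT-4V and others, I had it suggest good materials to stand in for rubber.

Materials:
matte-finished plastic
PVC (vinyl)
silicone
Color: beige

Materials with a texture similar to rubber

I made several, and matte plastic seemed closest to a rubber figure.

Matte plastic RPG character: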

Create a plastic figure that represents an RPG character, drawing inspiration from a Japanese yokai. The character should have a toy-like charm with a matte finish, no gloss, and rendered in a beige color. The figure should have intricate details and textures, stylized to look softer and more suitable for a toy. It should capture the traditional essence of a yokai but also fit the heroic or fantastical theme of an RPG character. Place this figure against a neutral background to highlight its details without any distractions.

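4.2.2 Game screens with special layouts

For special layouts such as game screens, it is usually difficult to convey the composition accurately to the AI. Below is one of the classic game genres, known as the text adventure.

The composition is a single picture showing the scene, plus black-and-white text describing the action the protagonist should take next, the current situation, and progress. If you ask for "an 8-bit RPG, action game, sports game, etc.," it produces them fairly accurately, but text adventures are reproduced poorly.

I give GPT-4V an image and have it learn from it.

It analyzed the screen composition, e.g., the Sphinx picture on the left and text areas on the right and at the bottom. At this point I ask, "Please create it as you see it."

It feels like it got slightly closer to the original than before.

Intellivision: Pyramid and Sphinx: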

A pixel art representation of the Great Pyramid of Giza and the Sphinx, done in the style of Intellivision game graphics. The artwork should reflect the pixelated visuals typical of early 1980s video games, with a focus on the iconic shapes and forms of the Egyptian landmarks. The pyramid should be easily recognizable with its geometric form, and the Sphinx should be depicted with its human head and lion's body, all within a simplistic yet evocative 8-bit art style.

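4.2.3 Different styles coexisting (→ create in parts): requires image-editing software (Photoshop)

For various reasons, reproducing character (bonus) stickers was one of the hardest subjects; DALLE alone could not produce a sticker like the one below, so I created the character and the background separately with DALLE.

First, the character. I had GPT-4V read an image of a 1980s character sticker, such as Bikkuriman, and asked it to "create it as you see it"…

An outrageous bonus sticker came out. I wouldn't be happy to pull this one… Left to GPT's own taste, it basically produces things that perhaps appeal to foreigners (but not to Japanese people?).

I learned that letting GPT-4 create things as it sees them is risky. It seems better to write the prompt myself. I had GPT-4V read a famous design and learn from it.

I summarized what it wrote above and used that as a prompt.

Acrylic, gouache, airbrush, cartoon: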

An image featuring a cartoon-style character. The character should have a vibrant and dynamic appearance, with smooth gradients suggesting the use of airbrushing techniques. The colors should be vivid and flat, as if using acrylic paints or gouache, with sharp, clear lines to define the character's features and costume. The background should be minimal, letting the character stand out prominently, evoking the style of vintage Japanese print media.

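It got closer to the original than at first, but the reproduction was still very poor, so I gave up. Since this looks extremely difficult, I forcibly composited a past successful yokai character with a newly created background in Photoshop. The prompt used to create the background is below.

Ran several times with the prompt "Give me only the background from the following: The background should be a shimmering holographic pattern that changes colors like a spectrum, giving the impression of a dynamic and enchanting scene"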

A background featuring a shimmering holographic pattern that changes colors across the spectrum, creating a dynamic and enchanting scene. The pattern should give the impression of light refracting in various hues, similar to a kaleidoscope, with a magical and captivating visual effect.

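It's not impossible, but the process is laborious and relies on Photoshop, so I'm shelving this one.

5. Examples by purpose

5.0 What kinds of images is DALLE good at?

Based on the basic tips introduced so far, I'd like to optimize for the genres I actually want to create. The article already exceeds 30,000 characters, so this time I'll pick a few purposes and try image generation in practice. Image genres are vast; I referred to the article below.

In the previous chapters, I created various types of images using artistic style, movements, GPT-4V, and so on. I asked GPT itself what it can and cannot make.

Explain your strengths and weaknesses by category, scoring each out of 100.

I added some of my own subjective views and summarized the result in the table below.

Categories that easily evoke copyrighted works score low; if you request them, either they can't be created because they violate the content policy, or you get an image in a deliberately different style or of seemingly low quality. Let's start with the easy ones.

5.1 Book covers

Categories scoring over 80 points basically yield high-quality results from simple prompts.

Please create a book cover. Write the title logo "DALLE Textbook" in English.

Since it's said to be especially good at fantasy, the prompt "dragon fantasy book cover" alone gives a decent cover. It even turned the background into fantasy as well… was that needed?

I forgot to include the title, so once more.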

A fantasy-themed book cover titled 'Dragon Fantasy' with a deep blue background. The cover should feature a majestic dragon in a mythical setting, embodying the essence of fantasy. The title 'Dragon Fantasy' should be prominently displayed in bold, white, elegant font at the top or center of the cover. The dragon could be depicted flying through a mystical sky or perched atop a magical cliff. The deep blue background should add to the magical and mysterious atmosphere, making the white title stand out.

As expected of a category it's good at, the title logo is excellent too.

The other day I went to see a Broadway musical (Arthur: Masayuki Sakamoto, Lancelot: Akito Kiriyama) (CAMELOT©).

A Baroque-style depiction of a live-action stage performance of 'Camelot' from the audience's perspective, focusing on the Knights of the Round Table scene with spotlight effects. The stage is designed like a medieval hall with a Baroque flair, featuring an ornate round table at the center. Knights in period costumes, enhanced with Baroque artistic elements, are seated around the table. A dramatic spotlight focuses on a central knight, highlighting their significance. The audience in the foreground, silhouetted against bright stage lights, watches intently. The overall lighting and composition create a captivating Baroque atmosphere on stage.

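I made a book cover inspired by the CAMELOT program I bought.

According to GPT, it knows CAMELOT well, and since it is in the public domain (copyright-free), it is easy to create.

5.2 Movie scenes, posters, and pamphlets

This category (over 70 points) is often rendered as digital art based on live-action footage or as 3D CG modeling, which it handles well. It sometimes collides with copyrighted works, but produces relatively high-quality results.

However, scenes that directly identify a specific work need careful handling, so I request them as works "inspired by" it.

Scenes that point directly to a film, such as riding a bicycle across the night sky against the moon from the SF film E.T. (© Universal Pictures), could not be created.

According to GPT, it of course knows that scene well and deliberately created a different image.

Next, let's make a movie pamphlet. First, the prompt is just "mystery movie." The result was a mediocre pamphlet that might or might not exist.

Giving it the word "DAVINCI" made it pick a style associated with that word, bringing the result closer to what I had in mind.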

Create the cover for a movie pamphlet for a mystery film titled "The DaVinci Mystery". The cover should evoke a sense of intrigue and enigma, featuring elements associated with DaVinci's work, like cryptic sketches and ancient scripts. The title is displayed prominently in an elegant, mysterious font. The background is a blend of Renaissance art elements and modern mystery motifs, creating a bridge between past and present. The color scheme is a mix of deep, rich tones, with hints of gold to reflect the historical richness. A shadowy figure, possibly the protagonist, is shown in a thoughtful pose, adding to the mystery. This cover should capture the viewer's imagination, drawing them into the world of the film.

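To repeat: putting the powerful word DAVINCI into the prompt produced a prompt with a style suited to a mystery film, evoking Leonardo da Vinci's public-domain paintings, which DALLE handles well.

Let me add an image of a live-action actor.

I got almost exactly the pamphlet I had envisioned.
Next, I'll make a poster inspired by HERO (© Warner Bros.), a Chinese historical action film. Without giving the source film's title, I describe the picture I want myself.

I gave it a stock title, "Master of Kenpō: ARTS MASTER."

With just the instructions "the Great Wall at the bottom, an overall dark background," I got a decent pamphlet.

5.3 3D figures

(Proficiency score 60-70) For safe characters that don't touch on copyrighted works, it creates high-quality models even without detailed instructions.

If you don't specify otherwise, plastic or resin is apparently chosen as the material. The figures are often left unpainted.

It was empty-handed and looked idle, so I had it given armor and a weapon.

Next, in the same chat, I had it turned into a plastic model.

When I turned it into LEGO, the weapon disappeared. Perhaps out of consideration for children?

3D figure

This looks reusable for my own purposes (capsule toys or snack-box prizes), so I'll keep the prompt safely.
Minotaur 3D figure: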

A miniature 3D figure of a minotaur, intricately designed with fine details. This miniaturized version of the mythical creature features a high level of detail, capturing the essence of the minotaur's muscular human body and bull's head. The figure is small in scale but rich in texture and features, including the minotaur's armor and weapon. The craftsmanship highlights the mythical attributes of the minotaur, making it appear both fearsome and majestic in miniature form. The background is simple to emphasize the intricate details and precision of the miniature 3D minotaur figure.

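When you want to depict the background and setting at the same time (the labyrinth, people cowering before the minotaur), miniature materials may be a good fit.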

In fact, shooting films with miniature models seems to be a long-established technique.

Filming with miniature models: Harry Potter (image from an external site, © Warner Bros.)

GPT also says that "diorama" and "garage kit" work well as prompt keywords when generating 3D figures or scenery.

A 3D figure diorama of a minotaur threatening those trapped in a labyrinth. The diorama showcases a detailed minotaur figure in the center of a meticulously crafted labyrinth setting. The minotaur, with its muscular human body and bull's head, is posed in a menacing stance. The labyrinth walls are intricately designed, with narrow passageways and high stone walls, creating a sense of confinement and danger. The scene is dynamically arranged to depict the tension and fear of the trapped individuals, highlighting the minotaur's dominance in the maze.

A garage kit diorama of a minotaur threatening those trapped in a labyrinth. This garage kit features a highly detailed minotaur figure as the centerpiece, surrounded by the intricate design of a labyrinth. The minotaur is sculpted with precision, showcasing its muscular body and bull's head. The labyrinth is crafted with realistic stone textures and complex passageways, creating a sense of depth and confinement. The kit is designed for assembly and painting by hobbyists, with separate parts and pieces that can be put together to form this dynamic scene.
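The two prompts above, like the miniature prompt before them, keep the same subject and change mainly the presentation keyword (miniature, diorama, garage kit). If you want to try many such variants, a small template can generate them for you. What follows is a minimal Python sketch under my own assumptions: the helper name `format_variants` and the shortened template wording are hypothetical, adapted from the minotaur prompts in this section, and the script only builds text to paste into ChatGPT; it does not call DALLE.

```python
"""Generate one prompt per presentation keyword for a fixed subject (hypothetical helper)."""

# Template wording shortened from the minotaur prompts; {subject} is filled in later.
FORMATS: dict[str, str] = {
    "miniature": "A miniature 3D figure of {subject}, intricately designed with fine details.",
    "diorama": "A 3D figure diorama of {subject}, set in a meticulously crafted scene.",
    "garage kit": "A garage kit diorama of {subject}, designed for assembly and painting by hobbyists.",
}


def format_variants(subject: str, formats: dict[str, str] = FORMATS) -> dict[str, str]:
    """Map each format keyword to a prompt that frames `subject` in that format."""
    return {name: template.format(subject=subject) for name, template in formats.items()}


if __name__ == "__main__":
    for name, prompt in format_variants("a minotaur threatening those trapped in a labyrinth").items():
        print(f"[{name}] {prompt}")
```

Keeping the subject fixed while changing only one keyword makes it easier to tell which keyword caused a change in the output.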

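This is a scene from a certain film in which a woman in modern armor battles, on a snowy mountain, a samurai styled as a 3D figure and inspired by a certain RPG. The prompt asks for a live-action-style woman and a figure-style samurai in the same image.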

A realistic illustration depicting two characters in a snowy battle. The female character on the left wears Western-style armor with slight skin exposure on her limbs, wielding a katana. On the right, the character is now a 3D figure of a samurai in traditional Japanese armor and helmet, holding a modern gatling gun, as described previously. The figure stands in a dynamic pose, showcasing the blend of ancient armor and a modern weapon. The armor is detailed with metal plates and intricate designs, with an ornamental helmet typical of samurai gear. The gatling gun looks realistic and advanced.

A realistic illustration depicting two characters in a snowy battle. The female character wears Western-style armor with slight skin exposure on her limbs, wielding a katana. The other character is a 3D figure of a samurai in traditional Japanese armor and helmet, holding a modern gatling gun. The samurai stands in a dynamic pose, showcasing the blend of ancient armor and a modern weapon. The armor is detailed with metal plates, intricate designs, and an ornamental helmet typical of samurai gear. The gatling gun looks realistic and advanced.

5.4 Icons

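This category scores 50-60 points. So many similar logos are already in circulation that even a brand-new design tends to resemble an existing one, and DALLE deliberately avoids producing lookalikes. For that reason, creating a truly original image from scratch, one that doesn't already exist somewhere, is likely to be difficult. First, I test whether DALLE can reproduce a simple icon I made myself: a cat mark on a plain green background.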

I searched Google Images for similar pictures and, as far as I can tell, nothing similar exists. I uploaded the image to GPT-4V and asked it to reproduce it.

I had it generate four times. The results were simple cat illustrations that kept the style, somewhere between similar and not quite.

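As explained in Chapter 4.0, GPT-4V reproduces images well, and it is highly faithful with non-copyrighted material such as landscapes and 3D art. So it may be more accurate to say that DALLE can make lookalikes but deliberately chooses not to.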

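So, at the risk of repeating myself yet again:
rather than forcing DALLE to make what it can't,
let it freely make what it can, adopt the "good ones" that happen to come out, and add a few variations. That seems to be the most efficient approach.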

5.4.1 Face Icon Avatars

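Many people use a photo of their face, or an edited version of one, as their Twitter icon. I try image generation with that use case in mind. First, I create a fictional man to serve as the base.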

This is intended for a Twitter icon or similar. Please turn it into an illustration exactly as it looks.

I try it in various styles.

Harmony of Creation: The Beauty of the Unexpected

This is a nice-looking style I happened to get with the prompt below.

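Overall style: contemporary Japanese digital art, influenced by anime and manga.
Textiles: includes elements seen in modern textile design.
Color: bright, soft tones centered on pastels.
Line quality: soft, delicate lines rich in detail.
Shading: kept to a minimum, using gentle shadows to convey depth.
Background: simple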

A delicate digital illustration of a young woman with light blonde hair and soft facial features. She has a contemplative gaze with light blue eyes that express pensiveness. Pink flowers adorn her hair, coordinating with her earrings and suggesting a gentle, artistic personality. The style is simple but expressive, utilizing pastel colors and minimal shading to create a serene atmosphere. The illustration is in the vein of contemporary Japanese digital art with a muted background that highlights the subject.

Applying this style to the man also gave a reasonably good result.
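Once you find a style you like, the attributes listed above can stay fixed while only the subject changes, which is essentially what happened when the style was applied to the man. Below is a minimal Python sketch of that idea; the English wording of each attribute is my paraphrase of the list, the name `apply_style` is hypothetical, and the output is plain text to paste into ChatGPT, not an API call.

```python
"""Reuse a fixed style spec across different subjects (hypothetical helper)."""

# Paraphrase of the six style attributes listed in this section.
STYLE: dict[str, str] = {
    "overall style": "contemporary Japanese digital art influenced by anime and manga",
    "textiles": "elements of modern textile design",
    "colors": "bright, soft tones centered on pastels",
    "lines": "soft, delicate lines with rich detail",
    "shading": "minimal, gentle shading to suggest depth",
    "background": "simple",
}


def apply_style(subject: str, style: dict[str, str] = STYLE) -> str:
    """Describe `subject`, then append every style attribute as 'name: value'."""
    details = "; ".join(f"{name}: {value}" for name, value in style.items())
    return f"A digital illustration of {subject}. Style details: {details}."


if __name__ == "__main__":
    # Same style, two different subjects.
    print(apply_style("a young woman with light blonde hair"))
    print(apply_style("a fictional man"))
```

Keeping the style in one place means a tweak to a single attribute (say, the shading) carries over to every character you generate with it.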

More variations

5.4.2 Simple Icons

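This may defeat the purpose, but will you ever actually need to make an icon other than a face avatar? Personally, I don't think I've ever been in a situation where I had to make one… desktop and app icons are made by professionals, after all…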

As usual, I start by having it suggest well-known styles and work from there.
The subject is fixed as "DALLE'S TEXTBOOK".

Next, I fixed the style to minimalist and added an extra effect (silhouette), plus elements such as a cat and a paintbrush.

Minimalist icon:

Create a minimalistic silhouette illustration for 'DALLE'S TEXTBOOK'. The design should contain a white silhouette of a cat's face, a paintbrush, and a palette, with 'DALLE'S TEXTBOOK' in a clean, sans-serif font. The background should be a plain shade of blue. The overall design should be iconic, simple, and clear, focusing on the silhouette against the blue background for a striking minimalistic visual effect.

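Text reproduction is weak (TEXX instead of TEXT), but the overall quality is decent. As for "reproduce as seen" with GPT-4V: when I fed it my own icon, this time it produced a decent image that kept the original look to some extent.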

I had forgotten to include the title, so I told it to add one.
The result was a fun icon in which the palette and brush merge into the cat's body.

Reproduce-as-seen flat design:

A digital illustration of a textbook icon in a flat design style, with the words 'DALLE’S TEXTBOOK' included. The icon has a light teal background with a white palette and paintbrush symbol, along with a stylized white cat face. The design is minimalistic with clean lines and no gradients or shadows, featuring the text prominently displayed on the icon, ensuring that the font is modern and legible, complementing the flat design aesthetic.

5.5 LINE Stamps (Stickers)

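Since many styles are likely to clash with existing copyrighted works, this is probably one of DALLE's weak areas. The straightforward approach looks difficult, so I experiment with different ways of making them. I reuse the nice icon I made just before.

5.5.1 Reusing Previously Created Characters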

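This already feels closer to a LINE stamp. So I rework some of the better images I've made so far into LINE stamp style. The advantage of this approach is that the character's quality is already assured to some degree.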

Generate 4 LINE-stamp-style images with 4 different texts (OK, Thank you, No, Shocked), white background. The character prompt is below:

LINE stamp yokai ①:

Four LINE sticker-style digital illustrations of a cute yokai character, each with a different expression and accompanying text on a white background. 1) 'OK' or 'Understood' expression with a charming smile and the text '了解' (OK). 2) 'Thank you' expression with a grateful, serene look and the text 'ありがとう' (Thank you). 3) 'No' expression with a polite, slightly sad look and the text 'だめ' (No). 4) 'Shocked' expression with a surprised, wide-eyed look and the text 'ガーン' (Shocked). The character has a small, round, fluffy body, big expressive eyes, a tiny mouth, small wings, and a bushy tail, holding a cherry blossom branch.

I make stamps from other characters the same way.

LINE stamp yokai ②:

A set of four LINE stamp-style illustrations on a white background, featuring a cute yokai character. The character has a fluffy body, large eyes, and a small mouth, conveying a friendly expression. Each illustration includes one of the following texts: 'OK', 'Thank you', 'No', 'Shocked'. The style is simple and colorful, designed to directly convey emotions and messages.

This applies to LINE stamp creation in general: including several lines of dialogue such as OK and Yes takes up most of the prompt by itself.
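To see how much of a LINE stamp request the captions consume, the four-caption structure used above (OK, Thank you, No, Shocked, one expression each) can be assembled from a short list and measured. This is a minimal Python sketch under stated assumptions: the expressions are taken from the yokai prompt above, the function names are hypothetical, and counting whitespace-separated words is only a rough proxy for prompt length, not how DALLE or ChatGPT actually measures it.

```python
"""Build a four-caption LINE stamp request and measure how much of it the captions take up."""

# Caption text -> expression, taken from the yokai prompt above.
CAPTIONS: dict[str, str] = {
    "OK": "a charming smile",
    "Thank you": "a grateful, serene look",
    "No": "a polite, slightly sad look",
    "Shocked": "a surprised, wide-eyed look",
}


def caption_entries(captions: dict[str, str]) -> list[str]:
    """One numbered sentence per caption, such as 1) 'OK' with a charming smile."""
    return [
        f"{i}) '{text}' with {expression}."
        for i, (text, expression) in enumerate(captions.items(), start=1)
    ]


def sticker_prompt(character: str, captions: dict[str, str] = CAPTIONS) -> str:
    """Header, then the caption entries, then the shared character description."""
    header = f"{len(captions)} LINE sticker-style illustrations on a white background."
    return " ".join([header, *caption_entries(captions), f"The character: {character}"])


def caption_share(character: str, captions: dict[str, str] = CAPTIONS) -> float:
    """Fraction of whitespace-separated words that belong to the caption entries."""
    total = len(sticker_prompt(character, captions).split())
    caption_words = sum(len(entry.split()) for entry in caption_entries(captions))
    return caption_words / total


if __name__ == "__main__":
    character = "a small, round, fluffy yokai holding a cherry blossom branch."
    print(sticker_prompt(character))
    print(round(caption_share(character), 2))  # → 0.59 (29 of 49 words)
```

With this fairly short character description, the four caption entries already make up more than half of the words, in line with the observation above; a longer character description lowers the share.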

The earlier girl's style + deformed (chibi) + LINE stamp style

A slightly stylized and exaggerated (deformed) digital illustration for a LINE sticker, featuring a young woman with light blonde hair and soft facial features. Her gaze is contemplative, with expressive light blue eyes. Pink flowers adorn her hair, complementing her earrings, and highlighting her gentle, artistic personality. The style uses pastel colors and minimal shading in a serene atmosphere, reminiscent of contemporary Japanese digital art, but with a more exaggerated, whimsical touch and a muted background.

Now I add expressions.

Deformed character LINE stamps:

Four digital illustrations for LINE stickers, each featuring the same young woman with light blonde hair and soft facial features. The illustrations are slightly stylized and exaggerated, using pastel colors and minimal shading, in the style of contemporary Japanese digital art. The expressions and corresponding English texts are: 1) 'OK' - a smiling face with bright eyes. 2) 'Thank you' - a thankful expression with slightly closed eyes and a gentle smile. 3) 'No' - a politely declining expression with a small frown and slightly downcast eyes. 4) 'Shocked' - a surprised look with wide eyes and a slightly open mouth. The background is white to emphasize the character and expressions.

A character it created freely from just "anime" and "Halloween".

Create an illustration of a character with bright blue hair and large blue eyes, wearing a dark blue dress with a lighter blue trim and a corset-like lacing on the bodice. The character has a cute and slightly surprised expression, with a small open mouth, reminiscent of anime style. The background consists of a vibrant purple circle with bat silhouettes, suggesting a playful or Halloween theme.

5.5.2 Creating from Scratch

As before, I use inoffensive characters as subjects (humans, animals, public-domain yokai, etc.) and create them in a style suited to LINE stamps.

Generate 4 LINE-sticker-style illustrations with 4 different texts (OK, Thank you, No, Shocked), white background. The character prompt is below:

Create four separate illustrations in the style of acrylic paintings, resembling Line stamp illustrations with white backgrounds. Each image should feature a different 'Yuru-chara' style animal character, which is a Japanese term for a gentle and endearing mascot character, expressing one of the following texts: 'OK', 'Thank You', 'No', and 'Shocked'. The first image shows a Yuru-chara animal happily saying 'OK', the second shows one gratefully saying 'Thank You', the third shows one firmly saying 'No', and the fourth shows one appearing surprised with the word 'Shocked'.

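Rather than forcing it to make Dorae〇on or Pikach〇 and getting something odd as it steers away from the copyrighted work, you may get better results by letting it create freely from just "animal".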

Once you get a character you like, customize it to your taste with extra effects.

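Characters GPT knows well, such as Apollo from Greek mythology, have advantages: they neatly avoid copyrighted works, and you don't need to explain detailed character settings.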

5.6 Logos

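This category scores quite low, at 30 points; there are many restrictions, so getting exactly what you want will likely be hard. As with the stamps above, I think it's more efficient to describe roughly what you want and then make small fixes, rather than specifying a particular style, character, or setting in detail. I give it only the genre (corporate logo, movie title logo, sports team logo) and minimal information.

・Sports team logos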

I make various safe designs in an "icon + title" format.

・Corporate logos (food and drink)

・Game title logos

Prompting with "title + DALLE'S RING + latest game console + RPG" produced the image below. According to GPT, "RING" evokes The Lord of the Rings, so it gave the result a fantasy treatment.

A high-definition CG title screen logo for the RPG 'DALLEE'S RING,' designed for the latest consumer gaming platforms. The scene showcases a fantasy world backdrop, with advanced lighting effects emphasizing detailed character designs. Characters are shown embarking on an adventure, highlighting the unique story and world of the game. The composition fuses elements of realism and fantasy, mirroring the high-quality graphics of modern gaming. The game's title, 'DALLEE'S RING,' is prominently featured in an elegant, fantasy-inspired font, enhancing the magical and adventurous theme.

Alien:

A high-quality CG title screen for a first-person shooter (FPS) game, where players battle against aliens, designed for the latest home gaming platform. The background is completely black. In the foreground, a small but highly detailed, menacing alien creature stands, with glowing eyes and sharp, biomechanical features. The scene shows a futuristic battlefield, with advanced weaponry and technology suggested subtly in the alien's design. The game's title is displayed in a bold, futuristic font at the top. The atmosphere is intense and action-packed, capturing the thrill of an alien combat shooter game.

5.7 New Year's Cards (2024)

① Watercolor:

A New Year's greeting card illustration in a simple watercolor style, celebrating the Year of the Dragon (辰年). The artwork features a minimalist and elegant dragon drawn with soft and subtle watercolor washes. The design of the dragon is understated and expressive, conveying calm and serenity. The background is sparse and clean, adorned with a few delicate touches of color that evoke a tranquil New Year's atmosphere. This painting combines the essence of simplicity and elegance found in traditional Japanese art with the serene beauty of watercolor techniques.

② Simple illustration:

A gentle-colored, simple illustration of a whimsical dragon character dressed in a traditional Japanese kimono. The dragon has a soft green body with subtle yellow accents and small horns, designed to appear serene and friendly. It stands in front of a stylized Mount Fuji that has a smooth gradient from soft blue at the base to gentle white at the peak. The background features a large, tranquil red and white circle representing the rising sun. The overall style is cute, cartoonish, and straightforward, aimed at capturing the hearts of children with its simplicity and charm.

③ Pastel silhouette:

A New Year's greeting card design with a friendly and gentle dragon silhouette, set against a backdrop of Mount Fuji, the sun, and cherry blossoms. The dragon is in light pastel green, Mount Fuji in muted pastel blue, the sun in soft pastel yellow, and the cherry blossoms in faint pastel pink. The dragon should appear less intimidating and more welcoming, capturing a sense of warmth and friendliness. The overall style is subdued, elegant, and artistic, suitable for a serene and joyful New Year's celebration.

6. Retro Collection: Showa

This is the hands-on part of the article. Take a look if you're interested.

Retro games: a Famicom homage

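When you prompt with a certain Japanese game, "80s", or "8bit", you get a 90s style. Adding keywords such as "pre-8bit" or the name of a specific console from the NES (Famicom) era or earlier makes it look right.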

Prompts used: "Your save data has been erased (Your Adventure Log has erased)", and "RPG field screen" + "console name"

Bonus: a sticker for blog posts

Long ago, a certain Japanese RPG was released with a sticker reading "Recommended RPG" on its package.

Game fans nationwide trusted that sticker, then realized the recommendation came from the maker itself, and were left disappointed.

Making a complex 3D model in one go looked difficult, so I had it come up with a draft design and then created the parts separately.

Please make a photorealistic image of a slender trophy topped with a golden cat's face.

Fine-tuning the finished trophy means relying on tools like Photoshop; it's painstaking manual work.

7. Using the Images (Upscaling, Adding Motion)

Let's reuse the beautiful, favorite images we worked so hard (?) to make.
One option is upscaling (increasing the resolution). Tools such as Magnific make the images even cleaner.

The other is turning them into video: you instruct an AI to animate a DALLE3 image, with some control over how it moves.

This is done with so-called image-to-video AI tools such as Runway Gen-2. I cover this in a separate article too, if you're interested.

8. The Copyright Issue, Still a Concern

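My impression is that ChatGPT isn't well versed in minor Japanese works. For example, 激烈バカ (Gekiretsu Baka), a gag manga from Shonen Magazine, isn't recognized from its pretraining.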

I asked, assuming it wouldn't be able to do it anyway.

Give me an image of the Hiroshi Kawaguchi Expedition (川口浩探検隊) in LEGO.

It can actually do it?!

I sensed infinite possibilities.

In that case, I have no choice but to recreate a certain theme song too. These are clips of its famous scenes.

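1. Hiroshi Kawaguchi entering after the cameraman
2. A skeleton polished to a shine
3. A snake falling tail-first
4. A native laughing while drowning in a bottomless swamp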

The real thing may have been even bigger. I told DALLE to draw it larger, but it couldn't, presumably for reasons of its own.

Back to the topic.

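On the blogging platform note, I publish fictional LEGO images made with image-generation AI.
They're free articles and I make no profit from them, but I'm nervous about being sued by the LEGO Group for copyright infringement.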

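In short, the law is still being developed and the risk seems low, but the possibility of copyright infringement can't be ruled out. What penalties would apply to a note article? Deleting articles and images later would mean a lot of editing, so let's ask now.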

Please look up note's penalty rules and tell me.

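So is it right to understand that the problematic parts get deleted and, in malicious cases, the account is deleted at worst? If I create a huge number of images, some of them may eventually match items that really do get commercialized in the future.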

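In that case, I think it would count as infringement. For example, if I make LEGO images of a nursery rhyme that is currently in the public domain and Disney later turns it into a film, I'd probably get a takedown request. But here a certain possibility occurred to me.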

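Suppose I publish, in a blog article, a "Hiroshi Kawaguchi Expedition block set" containing creative new ideas made with image-generation AI. It isn't a copyrighted work, but if it has been published (made publicly known) on a blog and some toy maker later creates exactly the same product, wouldn't the positions be reversed??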

This area is currently very complicated.
