ComfyUI IF_AI_tools IF Chat Prompt: Creating a Custom Assistant

September 17, 2024, 06:31

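Last time, I built a ComfyUI workflow that uses a local LLM to generate a prompt and then carries it through to image generation.

The custom node pack was "ComfyUI-IF_AI_tools", and I used its "IF Chat Prompt" node.

The "IF Chat Prompt" node ships with several preconfigured assistants, and their settings file is:

C:\ComfyUI_windows_portable\ComfyUI\input\IF_AI\presets\assistants.json

Copies of assistants.json exist in several places, and it took me a little while to work out which one is the real one (?).

There is one in C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-IF_AI_tools\presets\ and
another in C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-IF_AI_tools\IF_AI\presets\. Neither of those is the right one.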
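
To see every copy at once, a short script can walk the ComfyUI folder and print each assistants.json it finds. This is only a sketch: `find_assistant_files` is a name I made up for illustration, and the root path is the install location quoted above, so adjust it to your setup. It only lists candidates; it does not tell you which one the node actually reads.

```python
from pathlib import Path


def find_assistant_files(root: Path) -> list[Path]:
    """Return every assistants.json below root, sorted for stable output."""
    return sorted(root.rglob("assistants.json"))


if __name__ == "__main__":
    # Assumed install location, taken from the paths quoted in this post.
    comfy_root = Path(r"C:\ComfyUI_windows_portable\ComfyUI")
    if comfy_root.is_dir():
        for path in find_assistant_files(comfy_root):
            print(path)
```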

The default assistants include "IF_PromptMKR", "IF_PromptMKR_IMG", and "IF_PromptMKR_IMG2".
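
A quick way to see which assistants a given file defines is to load it and print the top-level keys. The sketch below assumes the file is a JSON object keyed by assistant name, which matches the entries quoted in this post but is not something I checked against the node's source. `list_assistants` is a hypothetical helper.

```python
import json
from pathlib import Path


def list_assistants(path: Path) -> list[str]:
    """Return the assistant names defined in an assistants.json file."""
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top level is an object keyed by assistant name.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return list(data.keys())
```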

Here is how each one is defined in assistants.json.

IF_PromptMKR

"instruction": " You are a prompt maker. Create a high-quality, coherent, and concise prompts based on the given subject, following the provided guidelines and format.",
"rules": [
"Break keywords by commas",
"Focus solely on visual elements; avoid art commentaries or intentions",
"Construct prompt with subject, scene, and background components",
"Limit to 7 keywords per component",
"Include all subject keywords verbatim as main focus",
"Be varied and creative in descriptions",
"Keep prompt under 100 words",
"Do not enumerate or enunciate components",
"Do not include additional information beyond prompt"

In short

The model acts as a prompt maker and must return comma-separated keywords that describe only what is visible, built from subject, scene, and background components with at most 7 keywords each. Every subject keyword is kept verbatim as the main focus, the whole prompt stays under 100 words, and nothing is labeled, enumerated, or added beyond the prompt itself.

IF_PromptMKR_IMG

"instruction": "Create a detailed visual prompt based on the given image description, following the provided guidelines and format.",
"rules": [
"Describe the image in vivid detail using around 100 words",
"Break keywords by commas",
"Provide high-quality, non-verbose, coherent, concise, and not superfluous descriptions",
"Focus solely on visual elements; avoid art commentaries or intentions",
"Construct prompt by describing framing, subjects, scene elements, background, and aesthetics",
"Limit to 7 keywords per component",
"Be varied and creative",
"Reply on the same line without enumerating or enunciating components",
"Do not include additional information in the response"

In short

Given an image description, the model writes a roughly 100-word, comma-separated visual prompt covering framing, subjects, scene elements, background, and aesthetics, with at most 7 keywords per component. The reply goes on a single line with no labels, enumeration, or extra commentary.

IF_PromptMKR_IMG2

"role": "You are an AI vision model assistant specialized in creating detailed visual prompts for Stable Diffusion image generation based on analyzed images.",
"task": "Create a vivid, concise, and creative prompt that can be used to generate an image similar to the one you've analyzed.",
"input": "You will be given an image to analyze. After your analysis, create a prompt based on what you see in the image.",
"rules": [
"Describe the image in vivid detail using around 100 words",
"Break keywords by commas",
"Provide high-quality, non-verbose, coherent, concise, and not superfluous descriptions",
"Focus solely on visual elements; avoid art commentaries or intentions",
"Construct the prompt by describing framing, subjects, scene elements, background, and aesthetics",
"Limit to 7 keywords per component",
"Be varied and creative in your descriptions",
"Present the entire prompt on a single line without numbering or labeling components",
"Do not include additional information or explanations in the response",
"Start directly with the image description; do not use phrases like 'The image shows...' or 'I can see...'",
"Prioritize describing what you actually see, not what you might infer or imagine beyond the image",
"If any text is visible in the image, mention its presence but do not attempt to read or transcribe it",
"For human subjects, describe general appearance, pose, and attire without attempting to identify specific individuals"

In short

This preset casts the model as a vision assistant that analyzes an image and writes a Stable Diffusion prompt for generating something similar. The first nine rules match IF_PromptMKR_IMG in substance. The last four are new: start directly with the description instead of phrases like 'The image shows...', describe what is actually visible rather than what might be inferred, mention that text is present without transcribing it, and describe people by general appearance, pose, and attire without identifying anyone.

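Apparently the two presets with _IMG in their names are meant for when an image is supplied as input. (I had been using them without really understanding that.)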

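Based on this, "IF_PromptMKR" looks like the right one to use, but I want the output to resemble what the GPTs I normally use produces, so I will define a new assistant, "MY_PromptMKR".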

Asking the GPTs directly

It is hard to know how to define the rules, so I asked the GPTs I had already built directly.

Please tell me the rules you follow when you generate prompts.

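These are the rules I follow when generating prompts.

### 1. **Prioritize specificity**
- Prompts include concrete details that match the user's intent and requests. For example, it is important to reflect details such as the character's pose, expression, background, and clothing.

### 2. **Set both a positive prompt and a negative prompt**
- The **positive prompt** contains keywords for the elements to draw and the desired quality.
- The **negative prompt** contains elements to avoid (e.g., low resolution, bad anatomy, text, blur, signatures, watermarks).

### 3. **Follow the user's instructions**
- Prompts are built from the detailed instructions the user provides. When specific elements are requested, they take priority.
- When instructions are vague or incomplete, I add the elements that best fit the prompt's final goal.

### 4. **Use polite language and a courteous tone**
- All exchanges with the user use polite language (keigo) and careful wording.

### 5. **Insist on quality**
- Prompts include high-quality tags such as "masterpiece, best quality" to raise the quality of the output.

### 6. **Avoid inappropriate content**
- NSFW (Not Safe For Work) content, content that could hurt others, and offensive expressions are not included. Copyrighted characters and specific artist names are also not used.

### 7. **Stay flexible for customization**
- Prompts can be customized to suit the user's purpose. For example, if an art style is specified, the prompt is written to match it.

### 8. **Consider language and cultural background**
- Prompts are written in Japanese, with attention to Japanese culture and customs.

Following these rules, I try to generate the prompt that best fits your request. If you have any questions, please let me know!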

Are there formatting rules, such as how to separate tags with commas or what order to put them in?

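Yes, there are also some rules about comma separation and ordering when generating prompts. Following them maximizes the prompt's effectiveness and improves the quality of the generated image.

### Prompt formatting rules

1. **Separate tags with commas**
- Each element or keyword (tag) is separated by a comma. A space follows each comma for readability.
- Example: `masterpiece, best quality, 1girl, long wavy hair, smiling, sunset, beach`

2. **List elements in order of importance**
- More important elements come first. Typically, tags related to quality and style (e.g., `masterpiece, best quality`) go first, followed by the main objects (e.g., `1girl, cat`), and then details such as pose, expression, and background.

3. **Arrange tags in a specific order**
- Basically, tags are arranged in the following order:
  - **Quality tags**: `masterpiece, best quality`
  - **Main objects or characters**: `1girl, cat`
  - **Character features**: `long wavy hair, blue eyes, smiling`
  - **Action or pose**: `standing, arms crossed`
  - **Background or environment**: `beach, sunset, gentle waves`

4. **Keep tags specific**
- Tags are written as specifically as possible. For example, instead of just `hair`, add descriptive adjectives, as in `long wavy hair`.

5. **State the style or atmosphere explicitly**
- Adding tags related to style or atmosphere (e.g., `cinematic lighting, soft focus`) makes the visual character of the generated image clearer.

6. **Set a negative prompt**
- At the end of the prompt, a negative prompt specifies the elements that should not be generated. It is also comma-separated and lists the elements to avoid.
- Example: `nsfw, lowres, bad anatomy, blurry, watermark, text`

7. **Keep the structure orderly**
- A readable, well-balanced structure overall makes it easier for the AI to interpret the instructions accurately.

### Example

Below is an example prompt that follows these rules:

#### Positive prompt: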
```
masterpiece, best quality, 1girl, long wavy blonde hair, blue eyes, wearing a white dress, standing on a hilltop, windy weather, clear blue sky, soft sunlight
```

#### Negative prompt:
```
nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, cropped, worst quality, low quality, jpeg artifacts, signature, watermark, blurry, artist name
```

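Following this format improves the quality and accuracy of the generated images. If anything is unclear or there is something else you would like to know, please let me know!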
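
The tag order described in that answer is easy to express as code. Here is a minimal Python sketch that joins five tag groups in that order using a comma and a space. `build_prompt` and its parameter names are my own illustrative choices, not anything the GPTs or the node defines.

```python
def build_prompt(quality, subjects, attributes, actions, background):
    """Join tag groups in the described order, separated by a comma and a space."""
    groups = [quality, subjects, attributes, actions, background]
    # Flatten the groups, trimming whitespace and skipping empty tags.
    tags = [tag.strip() for group in groups for tag in group if tag.strip()]
    return ", ".join(tags)


prompt = build_prompt(
    quality=["masterpiece", "best quality"],
    subjects=["1girl"],
    attributes=["long wavy hair", "blue eyes"],
    actions=["standing"],
    background=["beach", "sunset"],
)
print(prompt)
```

Keeping the groups as separate arguments makes it harder to put a background tag ahead of the quality tags by accident.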

Write out the rules so far in English, one sentence at a time, each wrapped in double quotes and separated by commas.

"Separate each element or keyword with a comma, followed by a space for readability.",
"List important elements first, such as quality and style tags, followed by the main objects, poses, expressions, and backgrounds.",
"Follow a specific order for the tags: quality tags, main objects or characters, character attributes, actions or poses, background or environment.",
"Keep tags specific by adding descriptive adjectives.", "Include tags that define style or atmosphere, such as 'cinematic lighting' or 'soft focus'.",
"Set a negative prompt at the end to specify undesirable elements, using commas to separate each term.",
"Maintain a clear and balanced structure for easier interpretation by the AI."

Done.
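
Asking for quoted sentences separated by commas is convenient because that is exactly what the inside of a JSON array looks like. As a rough sketch (the helper name `parse_rules` is hypothetical), Python's `json` module can turn the pasted text into a list and catch quoting mistakes before they reach assistants.json.

```python
import json


def parse_rules(text: str) -> list[str]:
    """Parse comma-separated, double-quoted sentences into a list of strings."""
    # Tolerate a trailing comma after the last rule.
    body = text.strip().rstrip(",")
    rules = json.loads(f"[{body}]")
    if not all(isinstance(rule, str) for rule in rules):
        raise ValueError("every rule must be a quoted string")
    return rules
```

Two sentences that ended up on the same line still parse as two separate rules, because only the quotes and commas matter.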

In short

Separate tags with a comma and a space; put quality and style tags first, then the main objects, poses, expressions, and backgrounds; keep the order quality, subject, attributes, action, background; make tags specific with descriptive adjectives; add style or atmosphere tags; end with a comma-separated negative prompt; and keep the structure clear and balanced.

I add these to assistants.json, removing the rule about negative prompts. For the examples, I ran the GPTs for real and pasted in its output.

  "MY_PromptMKR": {
    "instruction": " You are a prompt maker. Create a high-quality, coherent, and concise prompts based on the given subject, following the provided guidelines and format.",
    "rules": [
      "Separate each element or keyword with a comma, followed by a space for readability.",
      "List important elements first, such as quality and style tags, followed by the main objects, poses, expressions, and backgrounds.",
      "Follow a specific order for the tags: quality tags, main objects or characters, character attributes, actions or poses, background or environment.",
      "Keep tags specific by adding descriptive adjectives.",
      "Include tags that define style or atmosphere, such as 'cinematic lighting' or 'soft focus'.",
      "Maintain a clear and balanced structure for easier interpretation by the AI."
    ],
    "examples": [
      {
        "input": "<subject>1girl, long hair</subject>",
        "output": "masterpiece, best quality, 1girl, long flowing hair, brown hair, wearing a white dress, standing in a flower field, gentle breeze, soft sunlight, serene expression, looking off to the horizon, colorful flowers, butterflies, nature, realistic style"
      },
      {
        "input": "<subject>1girl, mechanical limbs</subject>",
        "output": "masterpiece, best quality, 1girl, mechanical limbs, cyborg, futuristic city background, silver and black metal, glowing blue lights, short silver hair, determined expression, wearing a sleek futuristic suit, action pose, dramatic lighting, cyberpunk style, neon lights, rain"
      }
    ]
  },
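
If hand-editing the JSON feels error-prone, the same addition can be scripted. The following is a sketch under a few assumptions: the file is a JSON object keyed by assistant name, rewriting the whole file with `json.dump` is acceptable (it will normalize the indentation), and the helper names `add_assistant` and `drop_negative_rules` are mine, not part of ComfyUI-IF_AI_tools.

```python
import json
import shutil
from pathlib import Path


def drop_negative_rules(rules: list[str]) -> list[str]:
    """Remove rules that mention negative prompts."""
    return [rule for rule in rules if "negative prompt" not in rule.lower()]


def add_assistant(path: Path, name: str, entry: dict) -> None:
    """Insert or overwrite one assistant in assistants.json, keeping a backup."""
    # Keep a copy next to the original, e.g. assistants.json.bak.
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    data[name] = entry
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
        f.write("\n")
```

Loading and re-dumping the file also acts as a syntax check, since `json.load` fails on a malformed file before anything is overwritten.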

Start ComfyUI.

"MY_PromptMKR" has been added to the list of assistants.
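
If the new assistant does not show up, one thing worth checking is whether the edited file is still valid JSON, since a stray or missing comma is easy to introduce by hand. The sketch below is a hypothetical checker, `check_assistant`, that assumes the instruction / rules / examples shape used for MY_PromptMKR. Other presets such as IF_PromptMKR_IMG2 use different keys and would need different checks.

```python
import json


def check_assistant(text: str, name: str) -> list[str]:
    """Return a list of problems for one assistant; an empty list means none found."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    entry = data.get(name)
    if not isinstance(entry, dict):
        return [f"{name} is missing"]

    problems = []
    if not isinstance(entry.get("instruction"), str):
        problems.append("instruction should be a string")
    rules = entry.get("rules")
    if not (isinstance(rules, list) and all(isinstance(r, str) for r in rules)):
        problems.append("rules should be a list of strings")
    examples = entry.get("examples", [])
    if not isinstance(examples, list):
        problems.append("examples should be a list")
        examples = []
    for i, example in enumerate(examples):
        if not (isinstance(example, dict) and {"input", "output"} <= example.keys()):
            problems.append(f"examples[{i}] needs input and output")
    return problems
```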

Now let's use it to generate a prompt and then an image.

The input words are my usual ones:

1girl, bikini armor

Here are some of the generated prompts and images.

Generated prompt

Masterpiece, best quality, 1girl, bikini armor, full body suit, metallic silver finish, futuristic design, LED lighting, geometric patterns, reflective surface, aerodynamic shape, streamlined lines, advanced technology, sci-fi setting, immersive experience, high resolution rendering, cinematic lighting, 8K.

Generated prompt

Masterpiece, best quality, 1girl, bikini armor, futuristic setting, metallic finishes, LED lighting, geometric patterns, high-tech accessories, dynamic pose, motion blur, cinematic camera angles, vivid colors, surreal atmosphere, digital artistry, ultra HD resolution, immersive experience, 8K.

Generated prompt

Masterpiece, best quality, 1girl, bikini armor, futuristic setting, metallic finishes, LED lighting, high-tech gadgets, sci-fi theme, vivid colors, dynamic composition, cinematic perspective, captivating storyline, immersive experience, ultra HD resolution, 8K.

Generated prompt

Masterpiece, best quality, 1girl, futuristic armor, metallic finish, LED lights, aerodynamic design, streamlined silhouette, high-tech gadgets, advanced weaponry, combat suit, reflective visor, energy shields, dynamic poses, sci-fi setting, cinematic lighting, 8K resolution, ultra HD.

Generated prompt

Masterpiece, best quality, 1girl, bikini armor, futuristic setting, metallic finishes, LED lighting, dynamic poses, motion blur, high-speed chase scene, neon colors, sci-fi theme, immersive experience, cinematic visuals, ultra HD resolution, 8K.

All of them came out leaning more toward cyberpunk than fantasy.
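
To put a number on that impression, the tags can be counted across the generated prompts. A minimal sketch, where `tag_counts` is an illustrative helper that lowercases each tag and strips the trailing period seen in the outputs:

```python
from collections import Counter


def tag_counts(prompts: list[str]) -> Counter:
    """Count how many prompts each tag appears in, ignoring case and a trailing period."""
    counts = Counter()
    for prompt in prompts:
        # A set ensures a tag repeated within one prompt is counted once.
        tags = {tag.strip().rstrip(".").lower() for tag in prompt.split(",")}
        tags.discard("")
        counts.update(tags)
    return counts
```

Passing the five generated prompts and calling `.most_common()` on the result shows which tags keep recurring.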

Finally, the overall workflow.

For the input 1girl, mechanical limbs, silver long hair:

That is a lot closer to my usual results.

That's all for today.
