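Introduction
Rather than sticking to a Japanese aesthetic, I generated these images in styles that (Western-made) AI models seem to handle well.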
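These images were generated with the following method. In brief, img2img is applied alternately in the order Flux1 → SD1.5 → Flux1 → SD1.5, and the prompts that bridge each stage are generated by an LLM.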
The main AI models used were:
flux1-schnell-fp8.safetensors (Flux.1 Schnell)
realisticVisionV51_v51VAE.safetensors (Stable Diffusion 1.5)
llava-llama3/llama3.2 (LLM)
4x_NMKD-Superscale-SP_178000_G.pth (ESRGAN upscaling)
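The alternating pipeline described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions. It is not the author's actual workflow.

- The model calls are stand-in callables. A real setup might wrap the Flux.1 Schnell and SD1.5 checkpoints, the ESRGAN upscaler and the LLMs listed above.
- `Stage`, `run_alternating_pipeline` and `bridge` are hypothetical names.
- The sketch calls `bridge` once at every stage transition. This does not necessarily match the per-image call counts given in the summary.
- Where upscaling happens is left configurable per stage.
- The starting image is assumed to already exist, for example a txt2img result.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

# An "image" is left abstract (Any); in a real setup it might be a PIL.Image.
Img2Img = Callable[[Any, str], Any]   # (image, prompt) -> refined image
Bridge = Callable[[Any, str], str]    # (current image, base prompt) -> next prompt
Upscale = Callable[[Any], Any]        # image -> larger image (e.g. an ESRGAN model)


@dataclass
class Stage:
    name: str                          # label only, e.g. "Flux1" or "SD1.5"
    img2img: Img2Img
    upscale: Optional[Upscale] = None  # hypothetical: optional upscale after this stage


def run_alternating_pipeline(image: Any, base_prompt: str,
                             stages: List[Stage], bridge: Bridge) -> Any:
    """Apply img2img stages in order. Between two stages, ask `bridge`
    (assumed here to be e.g. a captioning VLM plus an LLM rewrite) for the
    prompt of the next stage."""
    prompt = base_prompt
    for i, stage in enumerate(stages):
        image = stage.img2img(image, prompt)
        if stage.upscale is not None:
            image = stage.upscale(image)
        if i < len(stages) - 1:
            # Bridging prompt: describe the current result, merged with the intent
            prompt = bridge(image, base_prompt)
    return image


# Usage with stub models: each "image" is just a string recording its history.
log = []


def make_stage(name: str) -> Stage:
    def step(image: Any, prompt: str) -> Any:
        log.append((name, prompt))  # record which prompt each stage received
        return f"{image}>{name}"
    return Stage(name, step)


stages = [make_stage("Flux1"), make_stage("SD1.5"),
          make_stage("Flux1"), make_stage("SD1.5")]
result = run_alternating_pipeline(
    "init", "a princess", stages,
    bridge=lambda img, base: f"{base} | seen: {img}",
)
print(result)
```

Keeping each model behind a plain callable means the handoff between models is only the image and a text prompt. This mirrors the idea that the stages communicate through outputs rather than internals.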
Works and Prompts

The image portrays a young goddess, who appears to be of european descent. She has long blonde hair that falls past her shoulders. Her attire consists of a red and silver armor, which covers her torso and arms, providing an impression of strength and resilience. With many roses. In her right hand, she holds a shield adorned with intricate designs, suggesting readiness for battle or defense. The background is light gray, creating a stark contrast with the woman's vibrant attire. Scattered around her are white insanely many feathers and lights, adding an element of mystery to the scene. The woman stands in profile, looking off to the side, as if surveying her surroundings or contemplating something. The image does not contain any text or other discernible objects. The relative positions of the objects suggest that the woman is the central figure in this composition. Her position and attire indicate she may be a warrior or heroine, ready for action. The feathers surrounding her could symbolize victory or peace after battle.

1.A 20 yo princess is standing, wearing a intricate patterned gorgeous long red armor dress with intricate lace trim and a silver belt. 2. The dress is made of a gorgeous light-colored mithril with darker accents on the sleeves and skirt. And the dress emits strong lights. 3. The woman's blonde hair is styled in loose waves and she is wearing gold earrings.4. Her left hand rests on her hip while her right hand is casually placed on her thigh.5. Background is a death bringer's citadel.

There is a young woman with long blonde hair. She has a serious expression on her face and is performing a magic. The woman is wearing a silver gorgeous dress that reaches down to her knees. Over the dress, she has a blue cape-like garment draped over her shoulders. Her head is adorned with a silver headdress featuring a circular design in the center. The background of the image features a gray sky filled with clouds with lights and thunder. It is heavy rains. On either side of the woman, there are two large wings. The wing on the left is white and appears to be made of feathers, while the one on the right is black and seems to be made of bone-like material. The wings seem to be attached to the woman's back, suggesting that she might have the ability to fly or transform into a bird at will. Her entire body is engulfed in flames. The overall color scheme of the image is predominantly silver due to the woman's dress and headdress, blue from her cape-like garment, and gray from the background sky.

A female officer who has touched down on the Death Star for an inspection

A portrait of a Greek goddess standing amidst the ruins of a once-great greek castle, now reduced to ashes and rubble. Her silver armor is scorched, and her braids are singed. The background is filled with the remnants of smoldering buildings, with the sky above heavy with dark, swirling clouds.Scorched wooden beams from collapsed structures protrude from the rubble, blackened and twisted by fire.Thin tendrils of smoke rise from smoldering piles of ash and debris and stones, adding a sense of recent destruction.The skeletal remains of a greek shrine stand in the background, with its roof caved in and walls partially collapsed.Shredded Spartan banners hang limply from broken poles, their once-proud symbols now faded and torn. Holding a tattered olympus banner in both hands, her expression filled with pride and sorrow as she gazes at, strong bright sun rays within the clouds.

a cinematic photo of a young princess with a circlet standing in front of a cityscape at night. The woman wears a mithril dress with gorgeous accessories. Her blonde hair is styled into an updo, and she holds a black object that could be either a weapon or a tool. She appears alert and ready, looking off to the side. The futuristic city in the background features tall buildings with numerous windows, all illuminated against the night sky. A bright blue light stands out in the top right corner of the image, possibly indicating a significant location or point of interest within the cityscape. The style is photorealistic, capturing the mood and atmosphere of a dark, futuristic urban environment. The colors are muted yet vibrant, with shades of grey, blue, and black dominating the palette.
Summary
Thanks to Flux.1, fingers now rarely need manual correction unless they overlap or are in a complex pose.
Because the final output is a 1536x2048 image, one image takes about 3 minutes on my RTX 3060 PC. That covers Flux1: 2 passes, SD1.5: 2 passes, upscaling: 3 passes and LLM: 2 calls.
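As a rough back-of-the-envelope check on these figures, the snippet below adds up the per-image model calls and divides the roughly 3-minute total among them. The resulting average is only a crude figure, since a Flux1 pass, an ESRGAN upscale and an LLM call do not cost the same. The 512x512 baseline is an added assumption, based on SD1.5's commonly used native resolution, and does not come from the article.

```python
# Per-image call counts and total time as stated in the text (RTX 3060).
calls = {"Flux1": 2, "SD1.5": 2, "upscale": 3, "LLM": 2}
total_seconds = 3 * 60  # "about 3 minutes"

total_calls = sum(calls.values())
avg_seconds_per_call = total_seconds / total_calls  # crude: passes differ in cost
print(f"{total_calls} calls, ~{avg_seconds_per_call:.0f} s per call on average")

# How much larger the final image is than an assumed SD1.5 base of 512x512.
final_w, final_h = 1536, 2048
pixel_ratio = (final_w * final_h) / (512 * 512)
print(f"final image has {pixel_ratio:.0f}x the pixels of 512x512")
```

The pixel ratio helps explain why upscaling steps appear in the pipeline at all. SD1.5 is generally run near 512x512, so reaching 1536x2048 requires enlarging the image along the way.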
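In the LLM "Mixture of Agents" approach, it is known that using the text outputs of different AI models improves performance. I suspect a similar effect is at work in image generation. My guess is that the key is to pass along outputs that have been decoded through the VAE into pixel space, rather than blending models directly inside the network as ControlNet does.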
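A Japanese proverb says that three people put together have the wisdom of Monju (Manjushri), roughly "two heads are better than one." This may be another case where communicating in words works better than wiring brains together directly.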
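The Mixture of Agents idea mentioned in the summary can be sketched as a layered loop in which models see each other's work only as text. This is a minimal illustration of the general structure and makes several assumptions:

- Stub functions stand in for real model calls.
- `mixture_of_agents` and `build_aggregate_prompt` are hypothetical names.
- The aggregation wording is a placeholder, not the prompt used in the original MoA work.

In the author's analogy, the image pipeline works the same way. Each image model receives the previous model's result only in a shared, decoded form (pixels), never its internal state.

```python
from typing import Callable, List, Sequence

Agent = Callable[[str], str]  # prompt -> response; stands in for any model call


def build_aggregate_prompt(question: str, responses: Sequence[str]) -> str:
    """Combine the question with the previous layer's answers (placeholder wording)."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(responses))
    return (f"Question: {question}\n"
            f"Responses from other models:\n{numbered}\n"
            f"Synthesize a single, improved answer.")


def mixture_of_agents(question: str, layers: List[List[Agent]],
                      aggregator: Agent) -> str:
    """Layered MoA sketch: the first layer answers the bare question; each later
    layer sees the question plus the previous layer's text outputs; a final
    aggregator produces the answer. Models exchange text only, never weights
    or hidden states."""
    responses: List[str] = []
    for agents in layers:
        prompt = question if not responses else build_aggregate_prompt(question, responses)
        responses = [agent(prompt) for agent in agents]
    return aggregator(build_aggregate_prompt(question, responses))


# Usage with stubs: two proposers, then one refiner; the stub aggregator
# simply echoes the prompt it receives so the data flow is visible.
layers = [[lambda p: "red", lambda p: "blue"], [lambda p: "purple"]]
print(mixture_of_agents("Pick a color", layers, aggregator=lambda p: p))
```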