Trying Out the Stability.ai Dream Studio API

March 21, 2023, 00:42

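Stability.ai, a name that has come up often in image-generation AI over the past year, is a UK-based AI company that develops the high-performance image-generation model Stable Diffusion. Dream Studio is a service released by Stability.ai: you enter text and it outputs an image. Dream Studio gives you access to the latest versions of Stable Diffusion and its editing models.

This post covers Text-to-Image, Image-to-Image, and CLIP guidance.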

Creating a Dream Studio API Key

So, in this post I'll try out the Dream Studio API.

You can get a Dream Studio account from the following page.

https://beta.dreamstudio.ai/membership?tab=home

Go to API Key -> Create API Key to create an API key.

Let's use the API key right away and see what kind of images come out. The Stability.ai site provides sample code, which I used as a reference.
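Pasting the key directly into a notebook makes it easy to leak when the notebook is shared. As a minimal sketch of one alternative, the helper below reads the key from the STABILITY_KEY environment variable and falls back to a hidden prompt only when it is missing. The function name resolve_api_key and its parameters are hypothetical, introduced here for illustration; only getpass.getpass and os.environ come from the Python standard library.

```python
import getpass
import os


def resolve_api_key(env_var="STABILITY_KEY", prompt_fn=getpass.getpass):
    """Return the API key from the environment, prompting only if it is missing.

    resolve_api_key is a hypothetical helper, not part of stability-sdk.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed value, so the key is not echoed into the notebook.
        key = prompt_fn("Enter your API key: ").strip()
    if not key:
        raise ValueError("No API key was provided.")
    # Store it so later cells can keep reading os.environ[env_var].
    os.environ[env_var] = key
    return key
```

With this in place, the line that assigns 'Your-API-Key' could be replaced by a call to resolve_api_key(), and the rest of the code can keep reading os.environ['STABILITY_KEY'] as before.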

Text-to-Image

For this part, I referred to the code on the following page.

https://platform.stability.ai/docs/features/text-to-image?tab=python

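The code I ran is below. Sample code is available on the Stability.ai page, so check there if you want the meaning of each parameter. Replace Your-API-Key with the API key you obtained from Stability.ai, and set prompt to any prompt you like.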

%pip install stability-sdk

import io
import os
import warnings

from IPython.display import display
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

# Point the SDK at the Stability.ai gRPC endpoint and set the API key.
os.environ['STABILITY_HOST'] = 'grpc.stability.ai:443'
os.environ['STABILITY_KEY'] = 'Your-API-Key'


stability_api = client.StabilityInference(
    key=os.environ['STABILITY_KEY'], 
    verbose=True, 
    engine="stable-diffusion-v1-5", 
    # Available engines: stable-diffusion-v1 stable-diffusion-v1-5 stable-diffusion-512-v2-0 stable-diffusion-768-v2-0 stable-inpainting-v1-0 stable-inpainting-512-v2-0
    ) 

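answers = stability_api.generate(
    prompt="(((super realistic))), (((best quality))),((masterpiece)), ((ultra-detailed)), a girl, smile, (((super realistic black hair))), shirt, black eyes, an anime style",
    seed=992446758,  # fixed seed so the same settings reproduce the same image
    steps=30,  # number of inference (denoising) steps
    cfg_scale=8.0,  # how strongly the image is pushed to match the prompt
    width=512,
    height=512,
    samples=1,  # number of images to generate
    sampler=generation.SAMPLER_K_DPMPP_2M,  # sampler used for denoising
)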

for resp in answers:
    for artifact in resp.artifacts:
        if artifact.finish_reason == generation.FILTER:
            warnings.warn(
                "Your request activated the API's safety filters and could not be processed. "
                "Please modify the prompt and try again.")
        if artifact.type == generation.ARTIFACT_IMAGE:
            img = Image.open(io.BytesIO(artifact.binary))
            display(img)

The result generated this time is shown below.

(((super realistic))), (((best quality))),((masterpiece)), ((ultra-detailed)), a girl, smile, (((super realistic black hair))), shirt, black eyes, an anime style

Next, I tried a different prompt. The red dot bothers me a little, but it's a nice picture.

A digital Illustration of the Babel tower, 4k, detailed, trending in artstation, fantasy vivid colors, 8k
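The Text-to-Image loop in this section only displays the image in the notebook, so the result is lost once the session ends. A minimal sketch of keeping a copy on disk follows. It assumes that artifact.binary holds PNG-encoded bytes and that each artifact exposes a seed field, as in the Stability.ai sample; the helper name save_png_bytes and the outputs directory are hypothetical.

```python
from pathlib import Path


def save_png_bytes(binary, seed, out_dir="outputs"):
    """Write raw image bytes to <out_dir>/<seed>.png and return the path.

    Assumes `binary` is already PNG-encoded, so no re-encoding is done here.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = out_path / f"{seed}.png"
    target.write_bytes(binary)
    return target
```

Inside the loop, right after display(img), a call such as save_png_bytes(artifact.binary, artifact.seed) would store the image under its seed, which makes it easier to match files to the settings that produced them.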

Image-To-Image

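Here, the idea is to take an existing image as the base and have it redrawn in a crayon style.

As before, I used the sample code in the Stability.ai documentation as a reference.

The code is below. In short, it first generates a base image, img, then passes that image along with a prompt asking for a crayon-style rendering, producing img2.

Replace Your-API-Key with the API key you obtained from Stability.ai.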

%pip install stability-sdk

import io
import os
import warnings

from IPython.display import display
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

# Point the SDK at the Stability.ai gRPC endpoint and set the API key.
os.environ['STABILITY_HOST'] = 'grpc.stability.ai:443'
os.environ['STABILITY_KEY'] = 'Your-API-Key'

stability_api = client.StabilityInference(
    key=os.environ['STABILITY_KEY'], 
    verbose=True, 
    engine="stable-diffusion-v1-5", 
    # Available engines: stable-diffusion-v1 stable-diffusion-v1-5 stable-diffusion-512-v2-0 stable-diffusion-768-v2-0 stable-inpainting-v1-0 stable-inpainting-512-v2-0
)


answers = stability_api.generate(
    prompt="(((super realistic))), (((best quality))),((masterpiece)), ((ultra-detailed)), a girl, smile, (((super realistic black hair))), shirt, black eyes, an anime style",
    seed=992446758,
    steps=30, 
    cfg_scale=8.0,
    width=512, 
    height=512, 
    sampler=generation.SAMPLER_K_DPMPP_2M  
)


for resp in answers:
    for artifact in resp.artifacts:
        if artifact.finish_reason == generation.FILTER:
            warnings.warn(
                "Your request activated the API's safety filters and could not be processed. "
                "Please modify the prompt and try again.")
        if artifact.type == generation.ARTIFACT_IMAGE:
            img = Image.open(io.BytesIO(artifact.binary))
            display(img)


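answers = stability_api.generate(
    prompt="sketchy crayon outline on old paper",
    init_image=img,  # use the image generated above as the starting point
    start_schedule=0.6,  # strength of the prompt relative to the initial image
    seed=992446758,
    steps=30,
    cfg_scale=8.0,
    width=512,
    height=512,
    sampler=generation.SAMPLER_K_DPMPP_2M,
)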


for resp in answers:
    for artifact in resp.artifacts:
        if artifact.finish_reason == generation.FILTER:
            warnings.warn(
                "Your request activated the API's safety filters and could not be processed. "
                "Please modify the prompt and try again.")
        if artifact.type == generation.ARTIFACT_IMAGE:
            img2 = Image.open(io.BytesIO(artifact.binary))
            display(img2)

The output is shown below. Even though the art style changed, the image stays nicely stable.

Let's look at a few more with different prompts.
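In this section the base image comes from the API itself, so it is already 512x512. If you start from your own picture instead, its size will usually not match what the model expects. The sketch below computes a target size under the assumption that the API wants width and height in multiples of 64 with the longer side around 512; that assumption should be checked against the current Stability.ai documentation. The helper name fit_to_multiple is hypothetical.

```python
def fit_to_multiple(width, height, max_side=512, multiple=64):
    """Scale (width, height) so the longer side is about max_side,
    rounding each side to the nearest multiple (assumed to be 64)."""
    scale = max_side / max(width, height)

    def snap(value):
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, int(round(value * scale / multiple)) * multiple)

    return snap(width), snap(height)
```

For example, a 1024x768 photo maps to (512, 384). With Pillow, the resulting tuple can be passed to Image.resize before handing the image to init_image, and the same values can be used for width and height in the generate call.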

CLIP Guidance

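CLIP guidance is a technique used with diffusion models, one family of image-generation AI, in which a separate model called CLIP guides the process of generating an image from text. CLIP can score how well a piece of text and an image match, and its role is to tell the diffusion model what features the target image should have. However, CLIP guidance is said to be sensitive to the noise in the diffusion process, making it poorly suited to generating photorealistic images or images that follow the text closely.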
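To make "scoring how well text and image match" a little more concrete: CLIP maps both the text and the image to vectors and compares them, typically with cosine similarity. The toy sketch below shows only that comparison step. The three-dimensional vectors are made up for illustration and are not real CLIP embeddings, and nothing here calls the Stability.ai API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings"; real CLIP embeddings have hundreds of dimensions.
text_embedding = [0.9, 0.1, 0.3]
candidate_images = {
    "image_a": [0.8, 0.2, 0.4],   # points in a similar direction to the text
    "image_b": [-0.5, 0.9, 0.0],  # points in a different direction
}

scores = {
    name: cosine_similarity(text_embedding, vec)
    for name, vec in candidate_images.items()
}
best_match = max(scores, key=scores.get)  # the image whose vector is closest to the text
```

During guided generation, the idea is that this kind of score is evaluated on the partially denoised image and used to nudge each step toward a higher match with the prompt.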

As before, I used the corresponding Stability.ai documentation page as a reference.

The code is below. Replace Your-API-Key with the API key you obtained from Stability.ai.

%pip install stability-sdk

import io
import os
import warnings

from IPython.display import display
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

# Point the SDK at the Stability.ai gRPC endpoint and set the API key.
os.environ['STABILITY_HOST'] = 'grpc.stability.ai:443'
os.environ['STABILITY_KEY'] = 'Your-API-Key'


stability_api = client.StabilityInference(
    key=os.environ['STABILITY_KEY'], 
    verbose=True, 
    engine="stable-diffusion-v1-5", 
    # Available engines: stable-diffusion-v1 stable-diffusion-v1-5 stable-diffusion-512-v2-0 stable-diffusion-768-v2-0 stable-inpainting-v1-0 stable-inpainting-512-v2-0
)

answers = stability_api.generate(
    prompt="best high quality landscape, in the morning light, 日本, 桜, 神社, by greg rutkowski and thomas kinkade, Trending on artstation, makoto shinkai style",
    seed=992446758,
    steps=50, 
    cfg_scale=7.0, 
    width=512, 
    height=512, 
    sampler=generation.SAMPLER_K_DPMPP_2S_ANCESTRAL, 
    guidance_preset=generation.GUIDANCE_PRESET_FAST_GREEN # Enables CLIP Guidance. 
)

for resp in answers:
    for artifact in resp.artifacts:
        if artifact.finish_reason == generation.FILTER:
            warnings.warn(
                "Your request activated the API's safety filters and could not be processed. "
                "Please modify the prompt and try again.")
        if artifact.type == generation.ARTIFACT_IMAGE:
            img = Image.open(io.BytesIO(artifact.binary))
            display(img)

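Let's compare the results with CLIP guidance enabled and disabled.

Since CLIP guidance is reportedly weak at photorealism, I'll also try generating something that is not photorealistic.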
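For a fair on/off comparison, everything except the guidance setting should stay the same, including the seed. One way to sketch that is to build both sets of keyword arguments from a single base dictionary. The helper name build_comparison_requests is hypothetical; the only API-specific piece is the guidance_preset argument already used in the code above.

```python
def build_comparison_requests(base_params, guidance_preset):
    """Return two keyword-argument dicts that differ only in guidance_preset."""
    clip_off = dict(base_params)
    clip_off.pop("guidance_preset", None)  # make sure the baseline has no preset
    clip_on = dict(base_params, guidance_preset=guidance_preset)
    return {"clip_off": clip_off, "clip_on": clip_on}
```

Assuming base_params holds the prompt, seed, steps, cfg_scale, width, height, and sampler from the call above, passing generation.GUIDANCE_PRESET_FAST_GREEN as guidance_preset gives two dictionaries that can each be expanded into stability_api.generate(**params), so any difference between the two images comes from CLIP guidance alone.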