Trying Out Mitsua Likes (Windows + CUDA, 8 GB VRAM)

December 18, 2024, 21:18

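※ Last updated: 12-18-2024
※ Section 3 shows 16 generated images together with their parameters.
※ Before using the model, be sure to read the "Mitsua Likes Attribution-NonCommercial License".
※ CUDA Toolkit 12.1 or later, a HuggingFace account, and Git are required. 8 GB of VRAM is sufficient.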

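■ 0. Overview

▼ 0-0. Introduction

This article tries out "Mitsua Likes". According to its developers, the model uses "only data licensed through explicit opt-in, plus openly licensed and public-domain data", and "the entire model architecture (CLIP Text Encoder, VAE, UNet) was trained completely from scratch, without using the knowledge of any other model". See the links below for details.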

▼ 0-1. Model and demo

Mitsua Likes : A Text-to-Image Diffusion Model trained on Opt-In Contributors' "Likes"
https://huggingface.co/Mitsua/mitsua-likes

Mitsua Likes Demo
https://huggingface.co/spaces/Mitsua/Likes-demo

▼ 0-2. Related links

Elan Mitsua (絵藍ミツア)
https://elanmitsua.com/
https://x.com/elanmitsua

Abstract Engine Co., Ltd.
https://abstractengine.ltd/
(Renamed from Rhizomatiks Co., Ltd. in 2021)

■ 1. Preparation

▼ 1-1. Introduction

The steps below assume that CUDA Toolkit (12.1 or later) and Git are already installed and that you have a HuggingFace account.
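The prerequisites above can be checked from Python before going further. This is a minimal sketch using only the standard library. It assumes that the CUDA Toolkit installer put `nvcc` on PATH, which is the usual setup but not guaranteed. The helper names `find_tools` and `tool_version` are made up for this example.

```python
import shutil
import subprocess


def find_tools(names=("git", "nvcc")):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def tool_version(path):
    """Return the text printed by `<tool> --version`, or None on failure."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    for name, path in find_tools().items():
        if path is None:
            print(f"{name}: not found on PATH")
        else:
            print(f"{name}: {path}")
            print(tool_version(path))
```

If `nvcc` is reported as missing even though the toolkit is installed, the PATH setting is the first thing to check. The release number printed by `nvcc --version` should be 12.1 or later.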

▼ 1-2. Agreeing to the access conditions

Go to https://huggingface.co/Mitsua/mitsua-likes and agree to the conditions for accessing the model. The links below are provided for reference.

Mitsua Likes Attribution-NonCommercial License
https://elanmitsua.notion.site/Mitsua-Likes-15baa85a9b278005bba5f30866a35f48

Abstract Engine privacy policy
https://elanmitsua.notion.site/2023-1-16-664669b0aebc4b1d90aba7c068bd7c86

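▼ 1-3. Waiting for access approval

Until access has been approved, the model page keeps showing its pre-approval notice and the model cannot be downloaded. You may want to go through steps 1-4. and 1-5. in the meantime.

Once access has been approved, the notice on the model page changes.

[Screenshot: state after access has been granted]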

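▼ 1-4. Preparing the execution directory

This article uses "\aiwork" as the working directory and "\aiwork\mitsua-likes-test" as the execution directory. Substitute locations of your choice.

Run the following commands in order. They create the execution directory and move into it.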

cd \aiwork
mkdir mitsua-likes-test
cd mitsua-likes-test

▼ 1-5. Preparing the execution environment

Run the following commands in order. The "pip install" steps take a while because of the downloads.

python -m venv venv
venv\Scripts\activate
python -m pip install --upgrade pip
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install accelerate protobuf transformers==4.44.2 diffusers==0.31.0 sentencepiece==0.2.0
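After the installation finishes, a quick check that the virtual environment really contains the pinned versions can save debugging time later. The sketch below relies only on the standard library's `importlib.metadata`. The version pins are copied from the pip command above, and the function names are hypothetical.

```python
from importlib import metadata

# Pins copied from the pip command above; the other packages are unpinned there.
PINNED = {
    "transformers": "4.44.2",
    "diffusers": "0.31.0",
    "sentencepiece": "0.2.0",
}
UNPINNED = ("torch", "torchvision", "accelerate", "protobuf")


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for package, wanted in PINNED.items():
        found = installed_version(package)
        if found != wanted:
            problems.append(f"{package}: expected {wanted}, found {found}")
    for package in UNPINNED:
        if installed_version(package) is None:
            problems.append(f"{package}: not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks complete"]:
        print(line)
```

Run it with the virtual environment activated. Otherwise it inspects the system-wide Python instead of the venv.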

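▼ 1-6. Checking the access token

An access token is required to download the model. If you already know a valid token, you can use it. The steps below describe how to obtain one.

First, go to https://huggingface.co/settings/profile and select Access Tokens on the left. If you do not mind refreshing an existing token, click the three dots to its right and choose "Invalidate and refresh" to display a new token. Alternatively, click "Create new token", choose "read" as the "Token type", give it any name, and click "Create token" to display the token. Note that a token's value cannot be displayed again later. If the token is not used anywhere else, you can simply run "Invalidate and refresh" each time you need one.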
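Because the token value cannot be displayed again, some people keep it in an environment variable rather than retyping it. The sketch below assumes the token is stored in `HF_TOKEN`, the variable name that `huggingface_hub` reads, and logs in through `huggingface_hub.login`. The helper names are made up for this example, and storing a token in the environment is a choice of this sketch, not something the article requires.

```python
import os


def read_token(env=None, name="HF_TOKEN"):
    """Return the token stored in the environment variable, or None."""
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    return token or None


def mask_token(token, visible=4):
    """Hide all but the last few characters so the token can be printed safely."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


def login_with_env_token():
    """Log in through huggingface_hub using the token from the environment."""
    token = read_token()
    if token is None:
        raise RuntimeError("Set the HF_TOKEN environment variable first.")
    # huggingface_hub is installed as a dependency of diffusers/transformers.
    from huggingface_hub import login
    login(token=token)
    print(f"Logged in with token {mask_token(token)}")
```

Printing only the masked form keeps the full token out of terminal logs and screenshots.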

▼ 1-7. Downloading the model

Run the following command. When "Enter your token ..." appears, enter the access token from 1-6. At "Add token as git credential? (Y/n)", just press Enter. The login is complete when "Login successful" is displayed.

huggingface-cli login

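Next, run the following commands in order. If access has been granted correctly, the model will be downloaded.

When the download finishes, close the Command Prompt window for now. Strictly speaking this is not necessary, but it lets you walk through the startup steps used from the second run onward.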

git lfs install
git clone https://huggingface.co/Mitsua/mitsua-likes
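A clone can appear to succeed while the large files are still Git LFS pointer stubs, for example when Git LFS was not active. The sketch below checks the cloned directory for two things: the `model_index.json` file that `DiffusionPipeline.from_pretrained` expects at the top of a pipeline directory, and files that still begin with the Git LFS pointer header. The function names and the default directory name are assumptions of this example.

```python
from pathlib import Path

# Every Git LFS pointer file begins with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is a Git LFS pointer rather than the real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def check_clone(model_dir="mitsua-likes"):
    """Return a list of problems found in the cloned model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} does not exist"]
    problems = []
    if not (root / "model_index.json").is_file():
        problems.append("model_index.json is missing")
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip Git's own bookkeeping and anything that is not a regular file.
        if ".git" in relative.parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            problems.append(f"{relative} is still an LFS pointer")
    return problems


if __name__ == "__main__":
    for line in check_clone() or ["clone looks complete"]:
        print(line)
```

If pointer files are reported, running `git lfs pull` inside the cloned directory normally fetches the real content.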

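■ 2. Generating images

▼ 2-1. About the generation code (CUI version)

The code is based on the sample code, and parts of it were written with ChatGPT. It is simple, but offers the following features.

The prompt can be specified on the command line
A credit is drawn on each image (required by the license when publishing or sharing generated images)
Generation settings can be changed (by editing the code directly)
The number of images can be specified (default: 5)
The similarity scores and parameters are recorded in the image

You are free to modify or republish the code in this article at your own risk. If you do, read the Mitsua Likes license carefully.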

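▼ 2-2. The generation code (CUI version)

Save the code below as mitsua-test.py in the execution directory (\aiwork\mitsua-likes-test or your equivalent).

It is somewhat long, but the full code is listed here.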

# Mitsua Likes Image Generator
#  Model: https://huggingface.co/Mitsua/mitsua-likes
#  License: https://elanmitsua.notion.site/Mitsua-Likes-15baa85a9b278005bba5f30866a35f48

# usage:
#   python mitsua-test.py [Prompt]
# example:
#   python mitsua-test.py
#   python mitsua-test.py 笑う絵藍ミツア
#   python mitsua-test.py "upper body, girl, smile, digital illustration"

# Settings

# Number of images
num_images = 5

# Output directory
output_dir = "output"

# Generation parameters
prompt = '''絵藍ミツアと花畑、先生アート'''
negative_prompt = '''elan doodle, lowres'''
width = 768
height = 768
guidance_scale = 5.0
guidance_rescale = 0.7
num_inference_steps = 40

# Tips.
#  Style
#   "先生アート" or "sensei art"
#   "デジタルイラスト" or "digital illustration"
#   "アナログイラスト" or "analog illustration"
#   "3d cg"
#   "芸術作品" or "artworks"
#  Resolution (width, height)
#   1024, 576 (16: 9)
#    896, 672 ( 4: 3)
#    768, 768 ( 1: 1)
#    672, 896 ( 3: 4)
#    576,1024 ( 9:16)

# Font
font_name = "Arial.ttf"


####################################
### No changes are required below ###
####################################

import os
import sys
import torch
from diffusers import DiffusionPipeline
from datetime import datetime
from PIL import ImageDraw, ImageFont, PngImagePlugin

# Make output dir
os.makedirs(output_dir, exist_ok=True)

# Prompt from commandline
if len(sys.argv) > 1:
    prompt = sys.argv[1]

# Device configuration
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16

# Set up the pipeline
pipe = DiffusionPipeline.from_pretrained("./mitsua-likes").to(device, dtype=dtype)
#pipe = DiffusionPipeline.from_pretrained("Mitsua/mitsua-likes", trust_remote_code=True).to(device, dtype=dtype) # original diffusers model

# Image generation loop
for i in range(num_images):

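    # Generate a seed. torch.seed() reseeds PyTorch's global RNG (CPU and CUDA)
    # with a fresh random value and returns it. pipe() is called without a
    # generator, so it draws from this global RNG.
    seed = torch.seed()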
    
    # Generate image
    ret = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=guidance_scale,
        guidance_rescale=guidance_rescale,
        width=width,
        height=height,
        num_inference_steps=num_inference_steps,
    )
    
    # Similarity detection model output
    similarity_detection = (
        f"Similarity Restriction: {ret.detected_public_fictional_characters[0]}\n"
        "Similarity Measure:\n"
        + "\n".join(
            f"{k} : {v:.3%}"
            for k, v in ret.detected_public_fictional_characters_info[0].items()
        )
    )
    print (similarity_detection)

    # Output image
    image = ret.images[0]

    # Add credit *** DO NOT REMOVE ***
    font_path = os.path.join(os.environ["WINDIR"], "Fonts", font_name)
    if not os.path.isfile(font_path):
        raise FileNotFoundError(f"Font file not found: {font_path}")
    font_size = 32
    text = "Generated by Mitsua Likes"
    font = ImageFont.truetype(font_path, font_size)
    draw = ImageDraw.Draw(image)
    text_bbox = draw.textbbox((0, 0), text, font=font)
    text_width, text_height = text_bbox[2] - text_bbox[0], text_bbox[3] - text_bbox[1]
    x = image.width - text_width - 20
    y = image.height - text_height - 20
    draw.text((x + 2, y + 2), text, font=font, fill="black")
    draw.text((x, y), text, font=font, fill="white")

    # Set metadata
    img_parameters = (
        f"\n"
        f"Prompt: {prompt} \n"
        f"Negative prompt: {negative_prompt} \n"
        f"Steps: {num_inference_steps} \n"
        f"Guidance Scale: {guidance_scale} \n"
        f"Guidance Rescale: {guidance_rescale} \n"
        f"Seed: {seed} \n"
        f"Width: {width} \n"
        f"Height: {height} \n"
        f"Model: https://huggingface.co/Mitsua/mitsua-likes \n"
        f"{similarity_detection}\n"
    )
    metadata = PngImagePlugin.PngInfo()
    metadata.add_text("parameters", img_parameters)
    
    # Save an image
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"{output_dir}/{timestamp}-{seed}-mitsua_likes.png"
    image.save(filename, "PNG", pnginfo=metadata)
    print(f"Saved: {filename}")
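The script stores the prompt, settings, and similarity results in the PNG under the key "parameters". To read them back without extra dependencies, the sketch below walks the PNG chunks directly with the standard library. It assumes that Pillow writes the text as a tEXt chunk when it fits in Latin-1 and as an iTXt chunk otherwise (which would be the case for Japanese prompts), and `read_png_text` is a hypothetical helper name. With Pillow available, `Image.open(path).info.get("parameters")` should return the same string.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return {keyword: text} for the tEXt and iTXt chunks of a PNG file."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError(f"{path} is not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
        if kind == b"tEXt":
            # keyword, NUL, Latin-1 text
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif kind == b"iTXt":
            # keyword, NUL, flag, method, language, NUL, translated key, NUL, UTF-8 text
            key, _, rest = body.partition(b"\x00")
            compressed, rest = rest[0] == 1, rest[2:]
            _language, _, rest = rest.partition(b"\x00")
            _translated, _, value = rest.partition(b"\x00")
            if compressed:
                value = zlib.decompress(value)
            texts[key.decode("latin-1")] = value.decode("utf-8")
        elif kind == b"IEND":
            break
    return texts


if __name__ == "__main__":
    import sys
    for png_path in sys.argv[1:]:
        print(read_png_text(png_path).get("parameters", "(no parameters chunk)"))
```

Saved next to the generator under any file name, it can be pointed at one or more files in the output directory to print their recorded parameters.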

▼ 2-3. Running the generation

Open a Command Prompt and run the following commands in order. They move into the execution directory and activate the virtual environment.

cd \aiwork\mitsua-likes-test
venv\Scripts\activate

As a convenience, you can place a batch file with the following contents in the execution directory (named activate_venv.bat, for example). Then you only need to run it from Explorer.

@echo off
cd /d "%~dp0"
call venv\Scripts\activate.bat
cmd /k

Running the following command loads the Mitsua Likes model, generates five images, and saves them in the "output" directory.

python mitsua-test.py

The default prompt is "絵藍ミツアと花畑、先生アート" (Elan Mitsua and a flower field, sensei art). You can override it by passing a prompt on the command line.

python mitsua-test.py 街の夜景、星空

If the prompt contains half-width spaces, as English prompts usually do, enclose it in double quotation marks ("").

python mitsua-test.py "tomato on table, artworks"
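The quotes are needed because the script reads only `sys.argv[1]`, so an unquoted prompt with spaces is cut off after its first word. One possible variation, shown here as a sketch and not as part of the original script, joins every argument into a single prompt. `prompt_from_argv` is a hypothetical helper.

```python
import sys


def prompt_from_argv(argv, default):
    """Join every argument after the script name into a single prompt."""
    words = [arg for arg in argv[1:] if arg.strip()]
    return " ".join(words) if words else default


if __name__ == "__main__":
    # With no arguments, fall back to the script's default prompt.
    print(prompt_from_argv(sys.argv, "絵藍ミツアと花畑、先生アート"))
```

With this approach `python mitsua-test.py tomato on table, artworks` and the quoted form give the same prompt. Runs of spaces are collapsed by the shell, and characters the shell treats specially (such as `&` or `|`) still need quoting.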

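▼ 2-4. Changing the settings

To change the settings, edit the code directly. The relevant part is repeated below for reference. The values of prompt and negative_prompt are wrapped in triple quotes so that symbols can be used in prompts.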

# Number of images
num_images = 5

# Output directory
output_dir = "output"

# Generation parameters
prompt = '''絵藍ミツアと花畑、先生アート'''
negative_prompt = '''elan doodle, lowres'''
width = 768
height = 768
guidance_scale = 5.0
guidance_rescale = 0.7
num_inference_steps = 40

# Tips.
#  Style
#   "先生アート" or "sensei art"
#   "デジタルイラスト" or "digital illustration"
#   "アナログイラスト" or "analog illustration"
#   "3d cg"
#   "芸術作品" or "artworks"
#  Resolution (width, height)
#   1024, 576 (16: 9)
#    896, 672 ( 4: 3)
#    768, 768 ( 1: 1)
#    672, 896 ( 3: 4)
#    576,1024 ( 9:16)

# Font
font_name = "Arial.ttf"
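When choosing width and height, it can help to snap a desired aspect ratio to one of the resolutions listed in the Tips comment. The sketch below does this by comparing ratios on a logarithmic scale, so that "twice as wide" and "twice as tall" count as equally far from square. The list is copied from the settings above. Whether other sizes also work with the model is not covered here, and `closest_resolution` is a hypothetical helper.

```python
import math

# (width, height) pairs copied from the Tips comment in the settings.
RESOLUTIONS = [(1024, 576), (896, 672), (768, 768), (672, 896), (576, 1024)]


def closest_resolution(aspect_ratio, candidates=RESOLUTIONS):
    """Return the listed (width, height) whose ratio is nearest to the request."""
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    target = math.log(aspect_ratio)
    return min(candidates, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


if __name__ == "__main__":
    width, height = closest_resolution(16 / 9)
    print(f"width = {width}")
    print(f"height = {height}")
```

For example, a request for a 3:2 landscape image resolves to the 4:3 entry, 896 x 672, because that is the nearest ratio in the list.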