Generating BGM from natural language with Cline × DeepSeek

January 20, 2025 09:08

As a premise, audio data can be generated with Python.
So if an LLM writes that Python program, an LLM can create audio data too.
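To make the premise concrete, here is a minimal sketch (not part of the author's setup) showing that a few lines of standard-library Python are enough to write a playable WAV file. The function name `write_sine_wav` is hypothetical; it writes mono 16-bit PCM using only the built-in `wave` module.

```python
import math
import struct
import wave


def write_sine_wav(path, frequency=440.0, duration=1.0, sample_rate=44100, volume=0.5):
    """Write a mono 16-bit PCM sine tone to a WAV file using only the standard library."""
    n_samples = int(sample_rate * duration)
    frames = bytearray()
    for n in range(n_samples):
        sample = volume * math.sin(2 * math.pi * frequency * n / sample_rate)
        # Scale the float sample in [-1, 1] to a signed little-endian 16-bit integer
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))
    return n_samples
```

For example, `write_sine_wav("a4.wav")` writes one second of A4 (440 Hz). Everything in the rest of this article is a more elaborate version of the same idea: compute sample values, then save them.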

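In this setup, Cline, an AI agent, edits and runs the program, while DeepSeek, a low-cost LLM, writes the Python code. Together they make it possible to create audio from natural-language instructions.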

In other words, the composing is delegated to the LLM, and a human gives feedback on what it produces.

Even without Cline, you can have ChatGPT, Claude, Gemini, or a similar model write a BGM-generating program by entering the text below (the "★TODO" parts must be filled in by copy-pasting the files).

Referring to README.md, sound_generator.py, and config.json,
please write a program that creates a pleasant BGM about 3 minutes long.
First output the concept, then write the program.


■README.md
---
★TODO: paste README.md here


■sound_generator.py
---
★TODO: paste sound_generator.py here


■config.json
---
★TODO: paste config.json here

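However, the program produced this way sometimes fails with errors, so Cline, which can fix and re-run the code by itself, is easier.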
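Filling in the three "★TODO" placeholders by hand is easy to get wrong. A small sketch that reads the files from the `sound_generator/` folder (the layout described under environment setup below) and substitutes them into the prompt; `build_prompt` and `TEMPLATE` are hypothetical names, and the template wording simply mirrors the prompt above.

```python
from pathlib import Path

# The prompt above; the three placeholders are filled from the project files
TEMPLATE = """Referring to README.md, sound_generator.py, and config.json,
please write a program that creates a pleasant BGM about 3 minutes long.
First output the concept, then write the program.

■README.md
---
{readme}

■sound_generator.py
---
{code}

■config.json
---
{config}
"""


def build_prompt(folder="sound_generator"):
    """Read README.md, sound_generator.py and config.json and insert them into TEMPLATE."""
    base = Path(folder)
    # str.format inserts the file contents verbatim, even if they contain braces
    return TEMPLATE.format(
        readme=(base / "README.md").read_text(encoding="utf-8"),
        code=(base / "sound_generator.py").read_text(encoding="utf-8"),
        config=(base / "config.json").read_text(encoding="utf-8"),
    )
```

Running `print(build_prompt())` from the folder that contains `sound_generator/` prints a prompt ready to paste into ChatGPT, Claude, or Gemini.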

Environment setup

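For setting up Cline × DeepSeek, I'll refer you to another author's article. (Some of you may be concerned about pay-as-you-go billing, but everything from trying out Cline in various ways, through verifying the content of this article, to writing it cost me 1 dollar in total.)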

Prepare Python and FFmpeg.

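- Python 3.8 or later
- The following libraries are required (sound_generator.py also imports soundfile, so it is included here):
  ```bash
  pip install pydub numpy scipy soundfile ffmpeg-python
  ```
- ffmpeg is required for MP3/OGG output:
  1. Download ffmpeg from the [official ffmpeg site](https://ffmpeg.org/download.html)
  2. After installing, add the ffmpeg folder to the system PATH environment variable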
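Before handing the folder to Cline, it can save a round of errors to confirm the dependencies are actually visible to Python. A minimal sketch using only the standard library; `check_environment` is a hypothetical helper, and the module list is taken from the imports at the top of sound_generator.py.

```python
import importlib.util
import shutil

# Third-party modules imported by sound_generator.py
REQUIRED_MODULES = ["numpy", "scipy", "soundfile", "pydub"]


def check_environment():
    """Return a dict mapping each dependency to True (found) or False (missing)."""
    status = {name: importlib.util.find_spec(name) is not None for name in REQUIRED_MODULES}
    # ffmpeg is a command-line tool, so look for it on PATH instead of importing it
    status["ffmpeg (on PATH)"] = shutil.which("ffmpeg") is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:20s} {'OK' if ok else 'MISSING'}")
```

If ffmpeg shows as MISSING right after installation, the terminal (or VS Code, where Cline runs) usually has to be restarted so it picks up the updated PATH.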

Place the files in the following folder structure.

├── sound_generator/
│   ├── sound_generator.py  # main program
│   ├── config.json         # configuration file
│   ├── README.md           # design document
│   └── sounds/             # output folder for generated sound effects

sound_generator.py

# sound_generator.py

import os
import numpy as np
import soundfile as sf
from scipy import signal
from pydub import AudioSegment
import subprocess

class SoundGenerator:
    def __init__(self, sample_rate=44100):
        self.sample_rate = sample_rate
        self.output_dir = "sounds"
        os.makedirs(self.output_dir, exist_ok=True)

    def generate_tone(self, frequency=440, duration=1.0, wave_type='sine', volume=0.5, 
                    fm_ratio=1.0, fm_depth=0.0, am_ratio=1.0, am_depth=0.0,
                    pulse_width=0.5, noise_amount=0.0):
        """指定された波形で単一トーンを生成（FM/AM合成、パルス幅調整、ノイズ追加対応）"""
        t = np.linspace(0, duration, int(self.sample_rate * duration), False)
        
        # Generate the base waveform
        if wave_type == 'sine':
            wave = np.sin(frequency * 2 * np.pi * t)
        elif wave_type == 'square':
            wave = signal.square(frequency * 2 * np.pi * t, duty=pulse_width)
        elif wave_type == 'sawtooth':
            wave = signal.sawtooth(frequency * 2 * np.pi * t)
        elif wave_type == 'triangle':
            wave = signal.sawtooth(frequency * 2 * np.pi * t, width=0.5)
        elif wave_type == 'noise':
            wave = np.random.uniform(-1, 1, len(t))
        else:
            wave = np.sin(frequency * 2 * np.pi * t)  # default: sine wave
            
        # FM synthesis (note: this replaces the base waveform above with a frequency-modulated sine)
        if fm_depth > 0:
            modulator_freq = frequency * fm_ratio
            fm_wave = fm_depth * np.sin(modulator_freq * 2 * np.pi * t)
            wave = np.sin((frequency * 2 * np.pi * t) + fm_wave)
            
        # AM synthesis
        if am_depth > 0:
            modulator_freq = frequency * am_ratio
            am_wave = 1.0 + am_depth * np.sin(modulator_freq * 2 * np.pi * t)
            wave *= am_wave
            
        # Add noise
        if noise_amount > 0:
            noise = np.random.uniform(-1, 1, len(t)) * noise_amount
            wave = wave * (1 - noise_amount) + noise
            
        # Apply volume and prevent clipping
        wave = np.clip(wave * volume, -0.99, 0.99)
            
        return wave

    def generate_chord(self, frequencies, duration=1.0, wave_type='sine', volume=0.5):
        """複数周波数を組み合わせてコードを生成"""
        chord_wave = np.zeros(int(self.sample_rate * duration))
        
        for freq in frequencies:
            tone = self.generate_tone(freq, duration, wave_type, volume/len(frequencies))
            chord_wave += tone
            
        # Normalize
        max_val = np.max(np.abs(chord_wave))
        if max_val > 0:
            chord_wave = chord_wave / max_val * volume
        
        return chord_wave

    def generate_sequence(self, notes, durations, wave_type='sine', volume=0.5):
        """ノートのシーケンスを生成"""
        sequence = np.array([])
        
        for note, duration in zip(notes, durations):
            tone = self.generate_tone(note, duration, wave_type, volume)
            sequence = np.append(sequence, tone)
            
        return sequence

    def apply_envelope(self, wave, attack=0.1, decay=0.1, sustain_level=0.7, release=0.2):
        """ADSRエンベロープを波形に適用"""
        total_samples = len(wave)
        attack_samples = int(attack * self.sample_rate)
        decay_samples = int(decay * self.sample_rate)
        release_samples = int(release * self.sample_rate)
        sustain_samples = total_samples - (attack_samples + decay_samples + release_samples)
        
        # If the envelope is longer than the sound, scale its phases down proportionally
        if sustain_samples < 0:
            total_env_samples = attack_samples + decay_samples + release_samples
            if total_env_samples == 0:
                return wave
            scale = total_samples / total_env_samples
            attack_samples = int(attack_samples * scale)
            decay_samples = int(decay_samples * scale)
            release_samples = int(release_samples * scale)
            sustain_samples = 0  # no sustain phase
        
        # Attack phase
        if attack_samples > 0:
            attack_env = np.linspace(0, 1, attack_samples)
            wave[:attack_samples] *= attack_env
        
        # Decay phase
        if decay_samples > 0:
            decay_env = np.linspace(1, sustain_level, decay_samples)
            wave[attack_samples:attack_samples + decay_samples] *= decay_env
        
        # Sustain phase
        if sustain_samples > 0:
            wave[attack_samples + decay_samples:attack_samples + decay_samples + sustain_samples] *= sustain_level
        
        # Release phase
        if release_samples > 0:
            release_env = np.linspace(sustain_level, 0, release_samples)
            wave[-release_samples:] *= release_env
        
        return wave
        
    def apply_distortion(self, wave, gain=2.0, mix=0.5):
        """ディストーションエフェクトを適用"""
        distorted = np.tanh(wave * gain)
        return wave * (1 - mix) + distorted * mix
        
    def apply_delay(self, wave, delay_time=0.5, feedback=0.5, mix=0.3):
        """ディレイエフェクトを適用"""
        delay_samples = int(delay_time * self.sample_rate)
        output = np.zeros_like(wave)
        
        for i in range(len(wave)):
            output[i] += wave[i]
            if i >= delay_samples:
                output[i] += output[i - delay_samples] * feedback
                
        return wave * (1 - mix) + output * mix
        
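    def apply_reverb(self, wave, decay=0.7, mix=0.3):
        """Apply a simple reverb (convolution with an exponentially decaying noise burst)"""
        impulse_length = int(self.sample_rate * 1.5)  # 1.5-second impulse response
        impulse = np.random.uniform(-1, 1, impulse_length)
        impulse *= np.exp(-np.linspace(0, 10, impulse_length)) * decay
        
        # Apply the reverb by convolution. fftconvolve is far faster than np.convolve on long
        # signals, and keeping the first len(wave) samples of the full result lines the tail up
        # with the dry sound (mode='same' centers the impulse, so the reverb started ~0.75 s early
        # and the length did not match inputs shorter than the impulse)
        reverb_wave = signal.fftconvolve(wave, impulse)[:len(wave)]
        
        # Normalize (skipped for silent input to avoid division by zero)
        max_val = np.max(np.abs(reverb_wave))
        if max_val > 0:
            reverb_wave = reverb_wave / max_val
        
        return wave * (1 - mix) + reverb_wave * mix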

    def save_sound(self, wave, filename):
        """生成された音声をファイルに保存"""
        try:
            filepath = os.path.join(self.output_dir, filename)
            print(f"Saving sound to: {filepath}")
            
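            # Stereo data (shape: samples x 2): normalize the peak to 0.9
            if wave.ndim == 2 and wave.shape[1] == 2:
                max_val = np.max(np.abs(wave))
                if max_val > 0:
                    wave = wave / max_val * 0.9  # leave a little headroom
                sf.write(filepath, wave, self.sample_rate)
            else:
                # Mono data: scale down only if effects (e.g. feedback delay) pushed it past full scale
                max_val = np.max(np.abs(wave))
                if max_val > 1.0:
                    wave = wave / max_val * 0.9
                sf.write(filepath, wave, self.sample_rate)
            
            # Verify that the file was actually created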
            if os.path.exists(filepath):
                file_size = os.path.getsize(filepath)
                print(f"Successfully saved: {filename} (Size: {file_size} bytes)")
                return filepath
            else:
                print(f"Error: File was not created at {filepath}")
                return None
                
        except Exception as e:
            print(f"Error saving sound file: {str(e)}")
            import traceback
            traceback.print_exc()
            return None

    def has_ffmpeg(self):
        """ffmpegがインストールされているか確認"""
        try:
            subprocess.run(["ffmpeg", "-version"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            return True
        except FileNotFoundError:
            return False

if __name__ == "__main__":
    generator = SoundGenerator()
    
    # Generate a complex test sound
    wave = generator.generate_tone(
        frequency=440,
        duration=2.0,
        wave_type='square',
        volume=0.8,
        fm_ratio=1.5,
        fm_depth=0.3,
        am_ratio=2.0,
        am_depth=0.2,
        pulse_width=0.3,
        noise_amount=0.05
    )
    
    # Apply the envelope
    wave = generator.apply_envelope(wave, attack=0.05, decay=0.1, sustain_level=0.8, release=0.2)
    
    # Apply effects
    wave = generator.apply_distortion(wave, gain=3.0, mix=0.4)
    wave = generator.apply_delay(wave, delay_time=0.3, feedback=0.5, mix=0.3)
    wave = generator.apply_reverb(wave, decay=0.7, mix=0.3)
    
    # Save
    generator.save_sound(wave, "complex_sound.wav")

config.json

{
    "audio_settings": {
        "sample_rate": {
            "default": 44100,
            "min": 8000,
            "max": 192000
        },
        "channels": {
            "default": 1,
            "options": [1, 2]
        },
        "bit_depth": {
            "default": 16,
            "options": [8, 16, 24, 32]
        },
        "output_format": {
            "default": "wav",
            "options": ["wav", "mp3", "ogg"]
        }
    },
    "effects": {
        "beep": {
            "frequency": {
                "default": 440,
                "min": 20,
                "max": 20000
            },
            "duration": {
                "default": 0.5,
                "min": 0.1,
                "max": 10.0
            },
            "wave_type": {
                "default": "sine",
                "options": ["sine", "square", "sawtooth", "triangle"]
            },
            "effects": {
                "fade_in": {
                    "default": 0.1,
                    "min": 0.0,
                    "max": 1.0
                },
                "fade_out": {
                    "default": 0.1,
                    "min": 0.0,
                    "max": 1.0
                }
            }
        },
        "whoosh": {
            "start_freq": {
                "default": 100,
                "min": 20,
                "max": 20000
            },
            "end_freq": {
                "default": 1000,
                "min": 20,
                "max": 20000
            },
            "duration": {
                "default": 1.0,
                "min": 0.1,
                "max": 10.0
            },
            "effects": {
                "reverb": {
                    "default": 0.2,
                    "min": 0.0,
                    "max": 1.0
                }
            }
        }
    }
}
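Each parameter in config.json carries a `default` plus either a `min`/`max` range or a list of `options`, but the sound_generator.py shown above never actually reads config.json. A hedged sketch of how those entries could be used: `load_config`, `resolve`, and `render_whoosh` are hypothetical helpers, and interpreting "whoosh" as a sine sweep from `start_freq` to `end_freq` that is linear in frequency is an assumption.

```python
import json

import numpy as np


def load_config(path="config.json"):
    """Read config.json (assumed to sit next to the script)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def resolve(spec, value=None):
    """Return the entry's default when value is None; otherwise check it against
    'options' or clamp it to the entry's [min, max] range."""
    if value is None:
        return spec["default"]
    if "options" in spec:
        if value not in spec["options"]:
            raise ValueError(f"{value!r} is not one of {spec['options']}")
        return value
    return min(max(value, spec["min"]), spec["max"])


def render_whoosh(config, sample_rate=44100, **overrides):
    """Hypothetical 'whoosh': a sine sweep from start_freq to end_freq, linear in frequency."""
    spec = config["effects"]["whoosh"]
    f0 = resolve(spec["start_freq"], overrides.get("start_freq"))
    f1 = resolve(spec["end_freq"], overrides.get("end_freq"))
    duration = resolve(spec["duration"], overrides.get("duration"))
    t = np.arange(int(sample_rate * duration)) / sample_rate
    # Phase = 2π × integral of f(t) = f0 + (f1 - f0) * t / duration
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t ** 2 / (2 * duration))
    return 0.5 * np.sin(phase)
```

For example, `render_whoosh(load_config(), duration=20.0)` is clamped to the configured maximum of 10 seconds rather than producing an out-of-range sound.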

README.md

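# Local Sound Effect Generation System

## Overview
This system is a tool that uses Python to automatically generate high-quality sound effects in a local environment. It supports a wide range of audio generation, from basic waveform synthesis to complex effect processing.

## Main features
- Basic waveform generation (sine, square, and sawtooth waves)
- Effect processing (fade in/out, pan, reverb)
- Output in multiple formats (WAV, MP3, OGG)
- Flexible customization via a configuration file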

## System structure
```
.
├── sound_generator/
│   ├── sound_generator.py  # main program
│   ├── config.json         # configuration file
│   ├── README.md           # design document
│   └── sounds/             # output folder for generated sound effects
```

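## Prerequisites
- Python 3.8 or later
- The following libraries are required:
  ```bash
  pip install pydub numpy scipy soundfile ffmpeg-python
  ```
- ffmpeg is required for MP3/OGG output:
  1. Download ffmpeg from the [official ffmpeg site](https://ffmpeg.org/download.html)
  2. After installing, add the ffmpeg folder to the system PATH environment variable

## Usage
1. Edit config.json to set the sound-effect parameters
2. Run the following command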
```bash
python sound_generator.py
```

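## Notes
- The output directory (sounds/) is created automatically
- Generating long audio may consume a large amount of memory
- ffmpeg must be installed for high-quality MP3 output

## Extending the system
Steps to add a new sound effect:
1. Add a new entry to the "effects" section of config.json
2. Implement handling for the new effect_type in the SoundGenerator class
3. Add new waveform generation functions as needed

## License
MIT License

## Creating a new BGM
1. Copy sound_generator.py to a new file (e.g. fantasy_bgm.py)
2. In the new file, adjust the parameters of generate_fantasy_bgm()
3. Run the file to generate the new BGM
4. The generated file is saved in the sounds/ directory

Example: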
```bash
cp sound_generator.py fantasy_bgm.py
python fantasy_bgm.py
```
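Step 2 of the README refers to `generate_fantasy_bgm()`, which the sound_generator.py shown above does not define; presumably the LLM writes such a function when asked for a BGM. As a rough, hypothetical illustration of what it involves, here is a self-contained sketch that loops a chord progression. `midi_to_freq`, `render_chord`, `generate_bgm`, and the C–Am–F–G progression are all assumptions chosen for the example, not what Cline or DeepSeek produced.

```python
import numpy as np

SAMPLE_RATE = 44100


def midi_to_freq(note):
    """Equal temperament: MIDI note 69 is A4 = 440 Hz, and 12 notes make one octave."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render_chord(notes, duration, sample_rate=SAMPLE_RATE, fade=0.05):
    """Average sine tones for the given MIDI notes, with short linear fades to avoid clicks."""
    t = np.arange(int(sample_rate * duration)) / sample_rate
    wave = sum(np.sin(2 * np.pi * midi_to_freq(n) * t) for n in notes) / len(notes)
    fade_samples = min(int(sample_rate * fade), len(t) // 2)
    if fade_samples > 0:
        ramp = np.linspace(0.0, 1.0, fade_samples)
        wave[:fade_samples] *= ramp          # fade in
        wave[-fade_samples:] *= ramp[::-1]   # fade out
    return wave


def generate_bgm(progression, beats_per_chord=4, bpm=60, repeats=2, sample_rate=SAMPLE_RATE):
    """Loop a chord progression; each chord lasts beats_per_chord beats at the given tempo."""
    chord_seconds = beats_per_chord * 60.0 / bpm
    chords = [render_chord(c, chord_seconds, sample_rate) for c in progression]
    return 0.5 * np.concatenate(chords * repeats)


# C - Am - F - G as MIDI note numbers (an example progression only)
PROGRESSION = [[60, 64, 67], [57, 60, 64], [53, 57, 60], [55, 59, 62]]
```

With the defaults (60 BPM, 4 beats per chord) one pass through the progression lasts 16 seconds, so `generate_bgm(PROGRESSION, repeats=11)` gives about 176 seconds, which could then be saved with soundfile, e.g. `sf.write("sounds/fantasy_bgm.wav", bgm, 44100)`.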

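Creating the BGM

Ask Cline to create the BGM with a prompt like the following.

I have prepared a program for generating audio in the "sound_generator" folder. See README.md for details.

Please create a church-like BGM for a fantasy game, about 3 minutes long.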

BGM ①: BGM created with Cline and DeepSeek

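Creating a more complex BGM

Send the following prompt (the "★TODO" parts must be filled in by copy-pasting) to an advanced reasoning model such as o1 or Gemini 2.0 Flash Experimental, and have it write the prompt you will give to Cline.

However, the more complex the music, the longer the CPU processing takes.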
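A back-of-the-envelope estimate of why processing time and memory grow with length and complexity, assuming NumPy's default float64 arrays (8 bytes per sample) and SoundGenerator's default 44.1 kHz sample rate. Every layer, effect, and intermediate result allocates at least one buffer of this size, and both a per-sample Python loop and a direct (non-FFT) convolution with a 1.5-second impulse, which is what `np.convolve` computes, scale with the sample count. The helper names are hypothetical.

```python
def buffer_stats(duration_s=180, sample_rate=44100, channels=1, bytes_per_sample=8):
    """Sample count and size in MiB of one NumPy float64 buffer of the given length."""
    samples = int(duration_s * sample_rate) * channels
    return samples, samples * bytes_per_sample / 2 ** 20


def direct_convolution_ops(samples, impulse_seconds=1.5, sample_rate=44100):
    """Approximate multiply-adds for a direct convolution with an impulse of the given length."""
    return samples * int(impulse_seconds * sample_rate)


samples, mib = buffer_stats()
print(f"{samples:,} samples, {mib:.1f} MiB per float64 buffer")  # 7,938,000 samples, 60.6 MiB per float64 buffer
print(f"{direct_convolution_ops(samples):.2e} multiply-adds")  # 5.25e+11 multiply-adds
```

Half a trillion multiply-adds for a single reverb pass is why FFT-based convolution or fewer, shorter effect passes make a noticeable difference on a CPU.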

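Referring to README.md, sound_generator.py, and config.json,
I want to create a pleasant, puzzle-game-like BGM about 3 minutes long.
I will ask an LLM to modify and run the program so that the rhythm does not become monotonous.
Please write that prompt.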

■README.md
---
★TODO: paste README.md here


■sound_generator.py
---
★TODO: paste sound_generator.py here


■config.json
---
★TODO: paste config.json here
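The goal of this prompt is a rhythm that does not become monotonous. One simple way a generated program could do that (an assumption for illustration, not what o1 or DeepSeek actually produced) is to pick a different rhythm pattern for each bar. `RHYTHM_PATTERNS` and `varied_rhythm` are hypothetical; the returned list of durations in seconds matches the `durations` argument of `generate_sequence` in sound_generator.py.

```python
import random

# Candidate rhythms for one 4/4 bar, in beats (every pattern sums to 4 beats)
RHYTHM_PATTERNS = [
    (1, 1, 1, 1),
    (2, 1, 1),
    (1.5, 0.5, 2),
    (0.5, 0.5, 1, 2),
    (1, 0.5, 0.5, 1, 1),
    (3, 1),
]


def varied_rhythm(n_bars, bpm=90, seed=None):
    """Choose one pattern per bar, never repeating the previous bar's pattern,
    and return the note durations in seconds."""
    rng = random.Random(seed)  # a fixed seed makes the result reproducible
    seconds_per_beat = 60.0 / bpm
    durations, previous = [], None
    for _ in range(n_bars):
        pattern = rng.choice([p for p in RHYTHM_PATTERNS if p != previous])
        previous = pattern
        durations.extend(beats * seconds_per_beat for beats in pattern)
    return durations
```

Because every pattern fills exactly one bar, the total length stays predictable (n_bars × 4 beats) even though the note lengths vary.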

BGM ②: BGM created with Cline, DeepSeek, and o1 (relaxing)

https://youtu.be/9zRGJPHJVy0
*This one is probably shown only as a link because the article reached its limit on embedded videos.

BGM ③: BGM created with Cline, DeepSeek, and o1 (church)

BGM ④: BGM created with Cline, DeepSeek, and o1 (puzzle)

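Impressions

It seems that retro game-style BGM can be produced fairly well.
This time I built things roughly, so the monotonous sections are not well controlled, but with Cline I think you could start from these settings and this source code to create more complex, rhythmic pieces.

By the way, I left writing the source code entirely to Cline, so I have not looked at the source code at all.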