Dubbing an English Podcast into Japanese

April 14, 2024 15:02

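When watching English videos on YouTube you can turn on Japanese subtitles, but subtitles cannot give you what a podcast does: something you can enjoy without looking at the screen. This post shows how to use AI to create a Japanese dub of a video. With it, you can enjoy English content while commuting or doing housework, instead of only while watching the video.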

All you need is Google Colab and an OpenAI API key.

First, install the required libraries.

!pip install openai
!pip install pydub
!pip install python-dotenv
!pip install yt-dlp

Next, import them.

import json
import os

from dotenv import load_dotenv
from openai import OpenAI
from pydub import AudioSegment

Set your OpenAI API key as an environment variable.

os.environ['OPENAI_API_KEY'] = "YOUR OPENAI API KEY"
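Pasting the key directly into a cell makes it easy to leak when the notebook is shared. A minimal sketch, assuming you have stored the key under the name `OPENAI_API_KEY` in Colab's Secrets panel (the secret name and the helper `resolve_openai_key` are our own choices, not part of the original post), reads it from there and falls back to an already-set environment variable when run outside Colab:

```python
import os

PLACEHOLDER = "YOUR OPENAI API KEY"  # the dummy value used in the cell above


def resolve_openai_key(secret_name: str = "OPENAI_API_KEY") -> str:
    """Return the OpenAI API key from Colab Secrets, or from the environment outside Colab."""
    try:
        # Only importable inside Colab; raises if the secret is missing
        # or this notebook has not been granted access to it.
        from google.colab import userdata
        key = userdata.get(secret_name)
    except ImportError:
        # Outside Colab, fall back to an environment variable that is already set.
        key = os.environ.get(secret_name)
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"{secret_name} is not configured")
    return key


# In the notebook: os.environ["OPENAI_API_KEY"] = resolve_openai_key()
```

Failing early here gives a clear error before any download or API call is attempted.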

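Next, define the helper functions: one splits the transcript into sentences, one translates them into Japanese, and two group the text into pieces of a manageable size for translation and speech synthesis.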

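def group_into_sentences(words):
    # Join words back into sentences, ending a sentence at ".", "?" or "!"
    sentences = []
    sentence = []
    for word in words:
        sentence.append(word)
        if word.endswith(('.', '?', '!')):
            sentences.append(' '.join(sentence))
            sentence = []
    # Keep any trailing words that have no final punctuation
    if sentence:
        sentences.append(' '.join(sentence))
    return sentences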

def translate_sentences_to_japanese(client, input_sentences):
    data = {"sentences": input_sentences}
    inputs_json = json.dumps(data, indent=2)

    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": 'You are a helpful assistant designed to output JSON. Please translate the following inputs into Japanese. Format: {"sentences": ["これは例です。", "これは例です。"]}'},
            {"role": "user", "content": f"{inputs_json}"}
        ]
    )
    outputs_sentences = json.loads(response.choices[0].message.content)["sentences"]

    return outputs_sentences

def extract_in_chunks(lst, chunk_size=10):
    lst_length = len(lst)
    chunks = []

    for i in range(0, lst_length, chunk_size):
        chunks.append(lst[i:i + chunk_size])

    return chunks

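def bundle_sentences(sentences, bundle_size=300):
    # Concatenate sentences into bundles of at most bundle_size characters
    # (a single sentence longer than bundle_size becomes its own bundle)
    bundled = []
    current_bundle = ""

    for sentence in sentences:
        # Coerce to str and skip empty entries
        sentence = str(sentence)
        if sentence.strip() == "":
            continue

        if len(current_bundle + sentence) <= bundle_size:
            current_bundle += sentence
        else:
            # Close the current bundle and start the next one with this sentence
            if current_bundle:
                bundled.append(current_bundle)
            current_bundle = sentence

    if current_bundle:
        bundled.append(current_bundle)

    return bundled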
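`translate_sentences_to_japanese` assumes the model's reply always parses as JSON with a `sentences` key; if it does not, `json.loads` or the key lookup raises and the whole run stops partway through a long podcast. A minimal sketch of a validate-and-retry wrapper follows. `call_model` is a hypothetical callable that takes a list of English sentences and returns the raw reply text (in the notebook it would wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`); `parse_sentence_list` and `translate_with_retry` are likewise illustrative names.

```python
import json
from typing import Callable, List


def parse_sentence_list(raw: str) -> List[str]:
    """Parse a reply shaped like {"sentences": [...]} and validate its contents."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    sentences = data.get("sentences") if isinstance(data, dict) else None
    if not isinstance(sentences, list) or not all(isinstance(s, str) for s in sentences):
        raise ValueError("reply does not contain a 'sentences' list of strings")
    return sentences


def translate_with_retry(
    call_model: Callable[[List[str]], str],
    chunk: List[str],
    max_attempts: int = 3,
) -> List[str]:
    """Call the model up to max_attempts times until its reply parses cleanly."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(chunk)
        try:
            return parse_sentence_list(raw)
        except ValueError as err:
            last_error = err  # remember why this attempt failed, then try again
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error


# Usage with a stub that fails once and then succeeds
replies = iter(["not json", '{"sentences": ["こんにちは。"]}'])
print(translate_with_retry(lambda chunk: next(replies), ["Hello."]))  # ['こんにちは。']
```

Only parse failures are retried in this sketch; network errors or rate-limit errors raised by the client propagate unchanged.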

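The full process of extracting the audio from YouTube, transcribing and translating it into Japanese, and finally writing it out as a Japanese audio file is as follows.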

# URL of the YouTube video
youtube_url = "https://www.youtube.com/watch?v=FZieYYj0ImE" # put the video URL here

# Download the audio track of the YouTube video as MP3
!yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" --audio-quality 96K "{youtube_url}"

# Initialize the OpenAI client
client = OpenAI()

# Transcribe the audio file to text
# (the Whisper API accepts uploads of up to 25 MB)
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

# Split the text into sentences
sentences = group_into_sentences(transcription.text.split())

# Translate the sentences into Japanese in chunks of 10
translated_sentences = []
chunks = extract_in_chunks(sentences)
for chunk in chunks:
    outputs_sentences = translate_sentences_to_japanese(client, chunk)
    translated_sentences += outputs_sentences

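# Group the translated sentences into bundles of up to 300 characters
# (the speech endpoint accepts at most 4,096 characters per request)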
bundled_sentences = bundle_sentences(translated_sentences)

# Convert each bundle to speech and save it as a temporary MP3 file
temp_dir = "temp"
os.makedirs(temp_dir, exist_ok=True)

temp_files = []

for i, sentence in enumerate(bundled_sentences):
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=sentence,
    )
    temp_file_path = os.path.join(temp_dir, f"temp_{i}.mp3")
    response.stream_to_file(temp_file_path)
    temp_files.append(temp_file_path)

# Concatenate the audio files in order
combined = AudioSegment.empty()
for temp_file in temp_files:
    sound = AudioSegment.from_mp3(temp_file)
    combined += sound

# Export the final audio file
combined.export("final_output.mp3", format="mp3")

# Clean up the temporary files
for temp_file in temp_files:
    os.remove(temp_file)
os.rmdir(temp_dir)
os.remove("audio.mp3")
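The Whisper endpoint rejects uploads larger than 25 MB, so at the 96 kbps used above a single request covers only about 35 minutes of audio, and longer podcasts would fail at the transcription step. A minimal sketch, assuming a constant bitrate and a 10% safety margin (both our assumptions; MP3 headers and encoder overhead vary), computes millisecond boundaries for splitting the audio into pieces that should stay under the limit. `plan_segments` is an illustrative name, not part of any library.

```python
from typing import List, Tuple

# Upload limit stated as "25 MB"; the decimal reading is the more
# conservative of the two possible interpretations.
WHISPER_MAX_BYTES = 25_000_000


def plan_segments(
    duration_ms: int,
    bitrate_kbps: int,
    max_bytes: int = WHISPER_MAX_BYTES,
    safety: float = 0.9,
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs whose estimated size stays below max_bytes."""
    # At a constant bitrate: kbps * 1000 bits/s / 8 bits/byte / 1000 ms/s = kbps / 8 bytes per ms
    bytes_per_ms = bitrate_kbps / 8
    max_ms = int(max_bytes * safety / bytes_per_ms)
    return [
        (start, min(start + max_ms, duration_ms))
        for start in range(0, duration_ms, max_ms)
    ]


# A one-hour recording at 96 kbps needs two requests
print(plan_segments(60 * 60 * 1000, 96))  # [(0, 1875000), (1875000, 3600000)]
```

In place of the single transcription call, you could load the file with `AudioSegment.from_mp3("audio.mp3")` (whose `len()` is its duration in milliseconds), export each slice `audio[start:end]` with `export(path, format="mp3", bitrate="96k")`, transcribe each piece, and join the returned `.text` values before passing them to `group_into_sentences`.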

Finally, you can play the generated Japanese audio file directly in Colab, or download it and listen to it locally.

To play the audio file, run the following code.

from IPython.display import Audio
Audio('final_output.mp3')

To download the audio file to your own computer, run the following code.

from google.colab import files
files.download('final_output.mp3')
