OpenAI's Use of YouTube Videos for Sora Training: Unveiling the Truth

TOMMY・YOSHIDA（吉田勉）

April 24, 2024, 06:55


OpenAI's video-generation model Sora, unveiled in February 2024, has drawn wide attention for its ability to generate realistic video. However, questions have arisen over whether YouTube videos were used in its training data without permission.

Many experts suspect that Sora's training data includes a large number of YouTube videos. OpenAI, however, has not disclosed its data sources.

Addressing these concerns, OpenAI's CTO, Mira Murati, stated in an interview:

"We trained Sora using a combination of publicly available data and data licensed from creators."

She did not, however, say how many YouTube videos were used or how the licenses were obtained.

This vague response has only amplified the existing doubts.

If Sora was in fact trained on large numbers of YouTube videos, the following concerns arise:

Copyright Infringement: The unauthorized use of YouTube videos could constitute copyright infringement.
Data Bias: Training on data skewed toward particular kinds of videos could produce generated videos that carry the same skew.
Ethical Concerns: An AI trained on ethically questionable videos might produce ethically problematic outputs.

OpenAI has been criticized for failing to explain how it is addressing these concerns.

While Sora's technical capabilities are highly regarded, the ethical implications cannot be overlooked. OpenAI must provide clear and transparent explanations to address these concerns.

#OpenAI #Sora #YouTube #TrainingData
