Overview of Veo 2, Imagen 3 (improved version), and Whisk

December 17, 2024, 03:40

I found the following article interesting, so I put together a brief summary.

・State-of-the-art video and image generation with Veo 2 and Imagen 3

1. Veo 2 - State-of-the-art video generation

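Veo 2 is a video generation AI that creates remarkably high-quality videos across a wide range of subjects and styles. In head-to-head comparisons judged by human raters, it achieved state-of-the-art results against leading models.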

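It has a deeper understanding of real-world physics and of the nuances of human movement and expression, which improves overall detail and realism. Veo 2 understands the distinctive language of cinematography: ask for a genre, specify a lens, and suggest cinematic effects, and it delivers at resolutions up to 4K and extended to minutes in length. Ask for a low-angle tracking shot that glides through the middle of a scene, or a close-up shot of the face of a scientist looking through a microscope, and it creates it. Suggest "18mm lens" in the prompt and it produces the wide-angle shot that lens is known for; enter "shallow depth of field" and it blurs the background and focuses on the subject.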

Veo 2 prompt: Cinematic shot of a female doctor in a dark yellow hazmat suit, illuminated by the harsh fluorescent light of a laboratory. The camera slowly zooms in on her face, panning gently to emphasize the worry and anxiety etched across her brow. She is hunched over a lab table, peering intently into a microscope, her gloved hands carefully adjusting the focus. The muted color palette of the scene, dominated by the sickly yellow of the suit and the sterile steel of the lab, underscores the gravity of the situation and the weight of the unknown she is facing. The shallow depth of field focuses on the fear in her eyes, reflecting the immense pressure and responsibility she bears.

Veo 2 prompt: This medium shot, with a shallow depth of field, portrays an adorable cartoon girl with wavy brown hair and lots of character, sitting upright in a 1980s kitchen. Her hair is medium length and wavy. She has a small, slightly upturned nose, and small, rounded ears. She is very animated and excited as she talks to the camera and lighting and giggling with a huge grin.
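To make the idea of composing cinematic terms into a prompt concrete, here is a minimal Python sketch. It only builds a prompt string and does not call any Veo 2 API. The class name `ShotSpec`, its fields, and the comma-separated output format are all my own assumptions for illustration, not anything defined in the article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotSpec:
    """Hypothetical container for the cinematic terms mentioned above."""
    subject: str
    shot_type: Optional[str] = None       # e.g. "Close-up shot"
    lens: Optional[str] = None            # e.g. "18mm lens"
    depth_of_field: Optional[str] = None  # e.g. "shallow depth of field"

    def to_prompt(self) -> str:
        # Start with the shot type and subject, then append optional terms.
        if self.shot_type:
            parts = [f"{self.shot_type} of {self.subject}"]
        else:
            parts = [self.subject]
        if self.lens:
            parts.append(self.lens)
        if self.depth_of_field:
            parts.append(self.depth_of_field)
        return ", ".join(parts) + "."


spec = ShotSpec(
    subject="a scientist looking through a microscope",
    shot_type="Close-up shot",
    lens="18mm lens",
    depth_of_field="shallow depth of field",
)
print(spec.to_prompt())
```

The resulting string would then be pasted into whatever tool accepts the prompt. How much each term actually influences the output depends on the model.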

Video models often "hallucinate" unwanted details, such as extra fingers or unexpected objects. Veo 2 produces these less often, which makes its output more realistic.

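Our commitment to safety and responsible development has guided Veo 2. We have been deliberately measured in growing Veo's availability, rolling it out gradually through VideoFX, YouTube, and Vertex AI, so that we can identify, understand, and improve the model's quality and safety.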

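As with the other image and video generation models, Veo 2 output includes an invisible SynthID watermark that helps identify it as AI-generated, which reduces the chance of misinformation and misattribution.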

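Today (December 16, 2024), we are bringing Veo 2 to VideoFX, the video generation tool in Google Labs, and expanding the number of users who can access it. To join the waitlist, visit Google Labs. We also plan to expand Veo 2 to YouTube Shorts and other products next year.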

2. Imagen 3 - State-of-the-art image generation

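The image generation model Imagen 3 has also been improved and now generates brighter, better-composed images. It renders a more diverse range of art styles with greater accuracy, from photorealism to impressionism and from abstract to anime. The upgrade also follows prompts more faithfully and renders richer details and textures. In side-by-side comparisons by human raters against leading image generation models, it achieved state-of-the-art results.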

Starting today (December 16, 2024), the latest Imagen 3 is rolling out to more than 100 countries in ImageFX, the image generation tool in Google Labs. To get started, visit ImageFX.

Imagen 3 prompt: A close-up shot captures a winter wonderland scene – soft snowflakes fall on a snow-covered forest floor. Behind a frosted pine branch, a red squirrel sits, its bright orange fur a splash of color against the white. It holds a small hazelnut. As it enjoys its meal, it seems oblivious to the falling snow.

Imagen 3 prompt: A portrait of an Asian woman with neon green lights in the background, shallow depth of field.

3. Whisk - A new tool for visualizing ideas with images

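Whisk, the latest experiment from Google Labs, lets you input or create images that convey the subject, scene, and style you have in mind. You can then combine and remix them to create something uniquely your own, from a digital plushie to an enamel pin or a sticker.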

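Under the hood, Whisk combines the latest Imagen 3 with Gemini's visual understanding and description capabilities. Gemini automatically writes a detailed caption for each image and feeds those descriptions into Imagen 3. This process makes it easy to remix subjects, scenes, and styles in fun new ways.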
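The caption-then-generate flow described above can be sketched roughly as follows. This is only a conceptual illustration: Whisk's real implementation is not public in the article, so `remix`, `fake_caption`, `fake_generate`, the role names, and the way captions are joined into one prompt are all hypothetical stand-ins. The two stub functions take the place of a captioning model (Gemini's role) and an image model (Imagen 3's role) and do not call any real API.

```python
from typing import Callable, Dict

# Roles a user can supply an image for, following the article's wording.
ROLES = ("subject", "scene", "style")


def remix(
    images: Dict[str, str],
    caption_image: Callable[[str], str],
    generate_image: Callable[[str], str],
) -> str:
    """Caption each input image, merge the captions, and generate a new image."""
    # Step 1: the captioning model describes each provided image in text.
    captions = {
        role: caption_image(images[role]) for role in ROLES if role in images
    }
    # Step 2: the descriptions are merged into one text prompt (assumed format).
    prompt = "; ".join(f"{role}: {text}" for role, text in captions.items())
    # Step 3: the image model renders the combined prompt.
    return generate_image(prompt)


# Stub models so the sketch runs without any external service.
def fake_caption(path: str) -> str:
    return f"detailed caption of {path}"


def fake_generate(prompt: str) -> str:
    return f"<image for: {prompt}>"


result = remix(
    {"subject": "plushie.png", "style": "enamel_pin.png"},
    fake_caption,
    fake_generate,
)
print(result)
```

One consequence of this design is that the image model sees only text, not the original pixels, so the output captures the essence of the inputs rather than reproducing them exactly.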