Google unveils Veo 2 and Imagen 3 with advanced capabilities


Google on Monday unveiled two advanced AI models—Veo 2 for video generation and Imagen 3 for image generation—both designed to deliver state-of-the-art results. These models are now available through VideoFX, ImageFX, and the new Google Labs experiment, Whisk.

Veo 2

Veo 2 creates high-quality videos across a wide range of subjects and styles, and human raters preferred its output in head-to-head comparisons against leading models. It improves realism through a better understanding of real-world physics and human movement, which helps it generate more detailed and lifelike videos.

Users can ask Veo 2 for specific cinematic effects, such as a low-angle tracking shot or a close-up of a scientist. It can generate video at resolutions of up to 4K and at lengths of up to several minutes.

Key Features of Veo 2
  • Enhanced Realism and Fidelity: Achieves superior detail, realism, and artifact reduction compared to other video generation models.
  • Advanced Motion Capabilities: Accurately portrays motion, following detailed instructions and capturing complex scenes.
  • Camera Control Options: Interprets instructions precisely, offering a diverse range of shot styles, angles, and movements.

Performance Benchmarks

Veo 2 outperformed other video generation models in head-to-head comparisons evaluated by human raters. It scored highest on overall preference and on prompt adherence, and it produced fewer unrealistic details (such as extra fingers or misplaced objects).

Challenges and Limitations

Despite its advancements, Veo 2 still faces challenges in maintaining consistency during complex scenes with intricate motion. Google is continually working to refine these aspects.

Safety Measures

Veo 2 outputs carry an invisible SynthID watermark that identifies them as AI-generated, which helps reduce the risk of misinformation and misattribution. Google is also expanding access to Veo 2 gradually through VideoFX, YouTube, and Vertex AI.

Imagen 3

Imagen 3, the latest version of Google’s image generation model, now produces brighter, more vibrant images with improved color balance and fidelity. It also renders a wider range of art styles, from photorealism to abstract art and anime.

Key Features of Imagen 3
  • Brightness and Vibrancy: Enhanced color balance for richer, more vivid images.
  • Greater Art Style Versatility: Capable of generating a wide range of styles, including photorealism, impressionism, abstract, and anime.
  • High-Fidelity Detail: Produces highly detailed textures and enhanced visual appeal.

Performance Benchmarks

Imagen 3 has achieved state-of-the-art results in human-rated comparisons against other image generation models. It received top scores for visual quality, prompt accuracy, and appeal.

Other Notable Improvements
  • Greater Prompt Understanding: Imagen 3 now better understands natural language prompts, reducing the need for complex instructions.
  • Increased Detail and Precision: The model captures intricate details such as specific camera angles and fine textures.
  • Better Text Rendering: Improved capabilities allow for more accurate and stylized text, useful for creating birthday cards, presentations, and other designs.

Imagen 3 Prompt: A portrait of an Asian woman with neon green lights in the background, shallow depth of field.

Safety Measures

Google has implemented extensive filtering, data labeling, and red-teaming to ensure the safety and fairness of Imagen 3’s outputs. SynthID watermarking is also applied to all images, providing an imperceptible but detectable identifier for AI-generated content.

Availability

  • Veo 2: Available through Google Labs’ VideoFX, with plans to expand its use to YouTube Shorts and other products in the coming year. Users can join the waitlist on Google Labs for early access.
  • Imagen 3: Available in ImageFX, Google’s image generation tool, which is now accessible in more than 100 countries.