Google unveils Gemini 2.0 AI model built for agentic experiences

Google on Wednesday announced Gemini 2.0, its most advanced AI model to date, designed to support the evolving “agentic era.” With features like native image and audio output, multimodal support, and advanced tool integration, Gemini 2.0 aims to create more capable AI agents, edging closer to Google’s vision of a universal assistant.

Gemini 2.0 Flash: Boosted Performance and New Features

Gemini 2.0 Flash builds on the success of its predecessor, 1.5 Flash, offering significantly faster performance—twice as fast as the 1.5 Pro model.

It supports multimodal inputs such as images, videos, and audio, alongside multimodal outputs like generated images and multilingual text-to-speech (TTS) audio. The model also enables native tool usage, including Google Search and third-party functions, enhancing versatility.
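For developers, this native tool use is exposed through the Gemini API. The snippet below is a minimal sketch, assuming the google-genai Python SDK, an API key from Google AI Studio, and the experimental model name gemini-2.0-flash-exp; those names are illustrative assumptions and may differ from the released API.

    # Minimal sketch: grounding a response with the native Google Search tool.
    # Assumes the google-genai Python SDK (pip install google-genai); the model
    # name and API key placeholder below are illustrative assumptions.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # assumed experimental model name
        contents="Summarize this week's AI-related announcements from Google.",
        config=types.GenerateContentConfig(
            # Let the model call Google Search natively to ground its answer.
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(response.text)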

Unlocking New Agentic Experiences

Gemini 2.0 Flash introduces capabilities such as multimodal reasoning, long-context understanding, and advanced planning, enabling the development of more complex AI agents. Google is testing these possibilities through various prototypes, including Project Astra, Project Mariner, and Jules.

  • Project Astra: This prototype, designed for Android devices, integrates improved multilingual dialogue, tool integration (Google Search, Lens, Maps), and up to 10 minutes of in-session memory. It also features lower latency for more natural conversations. Google plans further testing on prototype glasses.
  • Project Mariner: Aimed at enhancing human-agent interaction for web tasks, it uses Gemini 2.0 to process web elements like text, images, and code. Though still in early stages, it has already achieved an 83.5% success rate on web task execution. The project focuses on improving efficiency and safety, with user confirmation required for sensitive actions.
  • Jules for Developers: An experimental AI agent integrated into GitHub, Jules assists developers by identifying issues, proposing solutions, and executing plans under supervision. This initiative is part of Google’s broader aim to integrate AI across various domains, including software development.

AI Agents in Games and Beyond

Leveraging DeepMind’s gaming experience, Google is developing AI agents that can reason based on game actions and provide real-time suggestions. These agents are being tested in games like Clash of Clans and Hay Day in collaboration with developers like Supercell. Google is also exploring Gemini 2.0’s potential for robotics and spatial reasoning in physical environments.

Responsible AI Development with Gemini 2.0

Google is committed to responsible AI development, focusing on safety and security as it explores new agentic capabilities. Key steps include:

  • Collaboration with the Responsibility and Safety Committee (RSC): Google’s internal review group assesses potential risks and safety measures.
  • AI-assisted Red Teaming: Gemini 2.0’s advanced reasoning capabilities enable automated risk evaluations, optimizing safety at scale.
  • Multimodal Safety: Training Gemini 2.0 to handle diverse inputs and outputs, ensuring safety across all data types.
  • Project Astra and Mariner: Ongoing research to prevent users from unintentionally sharing sensitive information, while providing privacy controls and ensuring user instructions take precedence over malicious prompt injections.

Together, these measures are intended to keep safety and security integral to Gemini 2.0 as its agentic capabilities expand.

AI Agents as Key Milestones Toward AGI

Google noted that the release of Gemini 2.0 Flash and several research prototypes mark a new chapter in the Gemini era. The advancements signify an exciting milestone in AI development, as Google continues to build toward Artificial General Intelligence (AGI) while prioritizing safety.

Availability

  • Gemini App Access: A chat-optimized version of Gemini 2.0 Flash is available on desktop and mobile web, with the Gemini mobile app coming soon.
  • Developer Access: Developers can access the experimental model through Google AI Studio and Vertex AI. Multimodal input and text output are available to all (see the sketch after this list), with broader availability, including more model sizes, expected in January.
  • Multimodal Live API: A new API supports real-time audio, video streaming, and combined tool use for developers.
  • AI Overviews Expansion: Testing for complex tasks like advanced math and coding is underway, with a wider rollout expected in early 2025.
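
To make the developer access concrete, here is a minimal sketch of the multimodal-input, text-output path mentioned above, again assuming the google-genai Python SDK and a Google AI Studio API key; the model name and the local file used are illustrative assumptions.

    # Minimal sketch: send an image plus a text prompt, receive a text answer.
    # Assumes the google-genai Python SDK and a Google AI Studio API key; the
    # model name and the local file "chart.png" are illustrative assumptions.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # Package a local image as an inline request part.
    with open("chart.png", "rb") as f:
        image_part = types.Part.from_bytes(data=f.read(), mime_type="image/png")

    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # assumed experimental model name
        contents=[image_part, "Describe the main trend shown in this chart."],
    )
    print(response.text)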

Speaking about Gemini 2.0, Google and Alphabet CEO Sundar Pichai said,

Gemini 2.0’s advancements are powered by our decade-long investments in a differentiated full-stack approach to AI innovation. It is built on custom hardware like Trillium, our sixth-generation TPUs, which entirely powered Gemini 2.0 training and inference, and Trillium is now generally available to customers for their own development.

While Gemini 1.0 focused on organizing and understanding information, Gemini 2.0 is designed to make that information far more useful. I’m excited to see where this next era takes us.

