Google has introduced its latest model, Gemini 1.5, boasting significant advancements in performance and efficiency. This new version builds upon the success of Gemini 1.0 Ultra, which was rolled out last week.
Gemini 1.5 Performance Enhancements
Google claims that Gemini 1.5 represents a major leap forward, incorporating innovations across various aspects of model development and infrastructure.
The introduction of a new Mixture-of-Experts (MoE) architecture makes Gemini 1.5 more efficient to train and serve.
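The MoE idea can be illustrated with a toy sketch: a learned router scores each token against a pool of experts and only the top-k experts actually run for that token, so per-token compute stays roughly flat as total parameter count grows. The NumPy routing below is a minimal illustration of the general technique, not Google's implementation; all dimensions and weights are invented for the example.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (tokens, d) input activations
    gate_w:    (d, n_experts) router weights
    expert_ws: list of (d, d) weight matrices, one per expert
    """
    logits = x @ gate_w                        # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=1)[:, -k:]  # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, topk[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()               # softmax over the selected experts only
        for w, e in zip(weights, topk[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # only k experts run per token
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, expert_ws)
print(y.shape)  # (3, 8)
```

The efficiency claim follows from the routing: with four experts and k=2, each token touches only half the expert parameters on any given forward pass.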
Gemini 1.5 Pro: Mid-Size Multimodal Model
The initial release of Gemini 1.5 includes Gemini 1.5 Pro, a mid-size multimodal model optimized for scalability across diverse tasks.
Despite its smaller size, Gemini 1.5 Pro performs comparably to 1.0 Ultra and introduces breakthrough capabilities for long-context understanding.
Expanded Context Window and Experimental Features
Gemini 1.5 Pro ships with a standard 128,000-token context window, and a limited preview lets developers and enterprise customers experiment with a context window of up to 1 million tokens.
This expanded context window enables the model to process vast amounts of information, including videos, audio, codebases, and text.
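To get a feel for what a token budget means in practice, a common back-of-the-envelope heuristic treats one token as roughly four characters of English text (actual tokenizer counts vary by content and language). The check below is a rough sketch under that assumption, not a real tokenizer:

```python
def fits_context(text: str, context_tokens: int = 128_000,
                 chars_per_token: float = 4.0):
    """Estimate whether text fits in a token budget.

    Uses the rough ~4 characters/token heuristic for English;
    real tokenizers will give different counts.
    """
    estimated_tokens = int(len(text) / chars_per_token)
    return estimated_tokens <= context_tokens, estimated_tokens

# A 12,000-character document estimates to ~3,000 tokens: easily fits.
ok, est = fits_context("hello world " * 1000)
print(ok, est)  # True 3000
```

By the same heuristic, a 1-million-token window corresponds to roughly 4 MB of plain English text, which is why entire codebases or long transcripts become feasible inputs.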
Key Features of Gemini 1.5
Highly Efficient Architecture: Built on Google’s leading research into Transformer and MoE architectures, Gemini 1.5 learns complex tasks more quickly and maintains quality while being more efficient to train and serve.
Greater Context and Helpful Capabilities: Gemini 1.5’s increased context window capacity allows for more consistent, relevant, and useful output by processing larger amounts of information in a given prompt.
Complex Reasoning and Understanding: Gemini 1.5 Pro can analyze, classify, and summarize large amounts of content within a given prompt, enabling it to understand and reason about complex topics.
Multimodal Understanding: The model can perform sophisticated understanding and reasoning tasks across different modalities, including video, enhancing its capabilities in analyzing diverse types of data.
Relevant Problem-Solving with Longer Code Blocks: Gemini 1.5 Pro excels at problem-solving tasks across longer blocks of code, providing helpful solutions, modifications, and explanations.
Enhanced Performance: Gemini 1.5 Pro outperforms its predecessor on a majority of benchmarks and maintains high performance even with an expanded context window.
Ethics and Safety Testing
Google emphasizes its commitment to ethical AI development, ensuring that Gemini models undergo extensive ethics and safety testing before release. The company continuously refines the models to mitigate potential risks and ensure responsible deployment.
Availability and Pricing
A limited preview of Gemini 1.5 Pro is available to developers and enterprise customers via AI Studio and Vertex AI.
Google plans to introduce pricing tiers based on context window size, with early testers able to experiment with the 1 million token context window at no cost during the testing period.
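For developers exploring the preview, the sketch below builds the request shape for a generateContent call against the public Generative Language API. The model identifier and endpoint path here are assumptions based on the API's general conventions; check the AI Studio documentation for the exact names available in the preview.

```python
import json

API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-1.5-pro"  # assumed preview identifier; verify in AI Studio docs

def build_request(prompt: str):
    """Construct the URL and JSON body for a generateContent request."""
    url = f"{API_BASE}/models/{MODEL}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body)

url, body = build_request("Summarize the architecture of this codebase.")
print(url)
# The request would then be POSTed with an API key from AI Studio, e.g.
# using requests.post(url, params={"key": API_KEY}, data=body,
#                     headers={"Content-Type": "application/json"})
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before spending quota, which matters once per-token pricing tiers apply.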
Speaking about Gemini 1.5 Pro, Google and Alphabet CEO Sundar Pichai said:
Last week, we introduced our most capable model yet, Gemini 1.0 Ultra, marking a significant leap forward in enhancing the usefulness of Google products, starting with Gemini Advanced. Today, developers and Cloud customers can begin leveraging the power of 1.0 Ultra through our Gemini API in AI Studio and Vertex AI.
Our teams remain dedicated to advancing the boundaries of our latest models while prioritizing safety. They have made rapid progress, and we’re now poised to unveil the next generation: Gemini 1.5. This latest iteration demonstrates remarkable improvements across various dimensions, with 1.5 Pro achieving comparable quality to 1.0 Ultra while consuming fewer computational resources.
The potential unlocked by longer context windows is immense, offering the promise of entirely new capabilities and empowering developers to create significantly more useful models and applications. We are thrilled to offer a limited preview of this experimental feature to developers and enterprise customers.