Google’s Gemini 2.0: A New Era in Generative AI
What is Gemini 2.0?
Gemini 2.0 is a multimodal generative AI model running on Google’s Trillium hardware, designed to make online tasks easier and more intuitive by assisting with summarizing information, performing web searches, and interacting with tools or apps more naturally.
Agentic Experiences
Gemini 2.0 is built around "agentic experiences," which enable the model to "understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision." This is achieved through its multimodal processing capabilities, ability to understand long books or wide swaths of the web, function calling, and native tool use.
Key Features
- Multimodal processing: allows the model to process and generate content in various formats, such as text, images, and audio.
- Long-form content understanding: enables the model to comprehend long books or wide swaths of the web.
- Function calling: allows the model to use and interact with various tools and applications.
- Native tool use: enables the model to incorporate tools like Google Search and code execution to perform autonomous actions.
- Complex instruction following and planning: allows the model to execute complex tasks and plan accordingly.
Early Access and Public Rollout
The first model, Gemini 2.0 Flash, was released on December 11 for global developers through the Gemini API in Google AI Studio and Vertex AI. A public rollout is expected in early 2025.
Developers Can Access
- Multimodal input and text output
- Early access partners can test text-to-speech and native image generation
Additional Model Sizes
- Gemini 2.0, the base model, is expected to follow in January
Google Search and AI Overviews Impact
Limited testing is set to begin next week, with a public rollout expected in early 2025.
Gemini 2.0 Flash
- Twice as fast as its predecessor, 1.5 Pro
- Surpasses it in AI performance benchmarks such as MMLU-PRO and LiveCodeBench
What Sets Gemini 2.0 Apart
- Its agentic capabilities, which enable the model to "understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision"
Google’s Vision
"If Gemini 1.0 was about organizing and understanding information, Gemini 2.0 is about making it much more useful," said Google CEO Sundar Pichai
Additional Projects and Prototypes
- Project Mariner: an experimental Chrome extension showcasing Google’s effort to enable Gemini to read browser screens
- Deep Research: an experimental model connected to the web, designed to create research plans and outlines for grad students, scientists, or entrepreneurs
- Jules: a coding assistant powered by Gemini 2.0 Flash, available to a limited pool of testers today, with expanded availability expected in early 2025
Cyber Threats
Google is aware that Project Mariner, in particular, might be a rich hunting ground for prompt injection attacks and is working on putting up guardrails against phishing and fraud attempts.
Conclusion
Gemini 2.0 is a significant step forward in the development of generative AI, offering a range of capabilities that can revolutionize the way we interact with technology. Its agentic experiences, multimodal processing, and native tool use make it an exciting prospect for developers and users alike.
FAQs
- What is Gemini 2.0?
Gemini 2.0 is a multimodal generative AI model running on Google’s Trillium hardware. - What are the key features of Gemini 2.0?
Multimodal processing, long-form content understanding, function calling, native tool use, and complex instruction following and planning. - When is the public rollout of Gemini 2.0 expected?
Early 2025. - What are the early access features available?
Multimodal input and text output, text-to-speech, and native image generation. - What is Project Mariner?
An experimental Chrome extension showcasing Google’s effort to enable Gemini to read browser screens. - What is Deep Research?
An experimental model connected to the web, designed to create research plans and outlines for grad students, scientists, or entrepreneurs. - What is Jules?
A coding assistant powered by Gemini 2.0 Flash, available to a limited pool of testers today, with expanded availability expected in early 2025.