Gemini 2.5 Flash
Google · Active · Updated May 18, 2026
Google's fastest and most cost-efficient model with upgraded reasoning, supporting high-volume production workloads.
Input Price
$0.30/M
per million tokens
Output Price
$2.50/M
per million tokens
Context Window
1,048,576
tokens
Max Output
8,192
tokens
Technical Specifications
| Provider | Google |
| Release Date | March 1, 2026 |
| Pricing Type | per token |
| Input Price | $0.30 / 1M tokens |
| Output Price | $2.50 / 1M tokens |
| Cached Input | $0.03 / 1M tokens |
| Context Window | 1,048,576 tokens |
| Max Output | 8,192 tokens |
| Input Modalities | text, image, audio |
| Output Modalities | text |
| Status | active |
| Availability | api |
| Latency | very fast |
| Rate Limit | 30,000 RPM |
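The per-token prices and limits listed above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost_usd` and the `cached_tokens` parameter are hypothetical, and it assumes cached tokens are the portion of the input billed at the cached-input rate while the remainder is billed at the standard input rate.

```python
# Figures copied from the specification table; adjust if official pricing changes.
INPUT_USD_PER_M = 0.30
OUTPUT_USD_PER_M = 2.50
CACHED_INPUT_USD_PER_M = 0.03
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 8_192


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_tokens` is the part of `input_tokens` billed at the
    cached-input rate; the rest is billed at the standard input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if input_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("input exceeds the listed context window")
    if not 0 <= output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("output exceeds the listed max output")

    fresh_tokens = input_tokens - cached_tokens
    total_per_million = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total_per_million / 1_000_000


# A near-full-context request with the maximum listed output.
print(f"{estimate_cost_usd(1_000_000, 8_192):.5f}")  # prints 0.32048
```

Under these assumptions, a request with 1,000,000 input tokens and 8,192 output tokens costs about $0.32, and almost all of that is input cost, so prompt caching matters most for workloads that resend the same long context.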
Capability Scores
Coding: 76
Reasoning: 74
Math: 75
Image: 66
Speed: 96
Overview
Gemini 2.5 Flash builds on Google's speed-optimized model line with improved reasoning and coding capabilities over the previous generation. It retains the 1,048,576-token (1M) context window. At $0.30 per million input tokens, it remains one of the more cost-effective models for high-throughput applications that need to process very long contexts.
Pros
- Fastest inference among long-context models (speed: 96/100)
- 1M context window at a budget-friendly price
- Significantly improved reasoning over the previous generation
Cons
- Moderate coding and reasoning performance (coding: 76/100, reasoning: 74/100)
- Text-only output: no audio or image generation
- Not suitable for complex multi-step agentic tasks
Use Cases
- Real-time content moderation at scale
- High-volume data extraction with long-context understanding
- Cost-sensitive QA systems with extensive reference documents