Gemini 3.5 Flash
Google · Active · Updated May 22, 2026
Google's next-generation fast model, combining lightning inference with competitive reasoning and a 1M context window.
Input Price: $1.50 per million tokens
Output Price: $9.00 per million tokens
Context Window: 1,048,576 tokens
Max Output: 16,384 tokens
Technical Specifications
| Specification | Value |
| --- | --- |
| Provider | Google |
| Release Date | May 15, 2026 |
| Pricing Type | per token |
| Input Price | $1.50 / 1M tokens |
| Output Price | $9.00 / 1M tokens |
| Cached Input | $0.15 / 1M tokens |
| Context Window | 1,048,576 tokens |
| Max Output | 16,384 tokens |
| Input Modalities | text, image, audio |
| Output Modalities | text |
| Status | active |
| Availability | api, web_app |
| Latency | very fast |
| Rate Limit | 30,000 RPM |
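The per-token prices above can be turned into a rough per-request cost estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost` is hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the standard input rate.

```python
# Prices in USD per one million tokens, copied from the specification table.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00
CACHED_INPUT_PRICE_PER_M = 0.15


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from cache
    and is charged at the cached rate rather than the standard input rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE_PER_M
        + cached_tokens * CACHED_INPUT_PRICE_PER_M
        + output_tokens * OUTPUT_PRICE_PER_M
    )
    return total / 1_000_000


# 100,000 input tokens and 2,000 output tokens, nothing cached.
print(f"{estimate_cost(100_000, 2_000):.4f}")  # prints 0.1680

# Same request with 80,000 of the input tokens served from cache.
print(f"{estimate_cost(100_000, 2_000, cached_tokens=80_000):.4f}")  # prints 0.0600
```

Under these assumptions, caching most of a long prompt cuts the example request from about $0.17 to $0.06, because output tokens then account for a larger share of the bill.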
Capability Scores
Coding: 80
Reasoning: 78
Math: 77
Image: 70
Speed: 97
Overview
Gemini 3.5 Flash is Google's latest speed-optimized model. It scores 97/100 on speed in this listing while keeping a 1,048,576-token (1M) context window. Its reasoning (78) and coding (80) scores are significantly better than those of its predecessor, Gemini 2.5 Flash, which narrows the gap with frontier models at $1.50 per million input tokens and $9.00 per million output tokens. It is a strong fit for high-throughput applications that need both speed and quality.
Pros
- Fastest inference among all models (speed: 97/100)
- 1M context window at a budget-friendly price
- Near-frontier reasoning at a fraction of the cost
Cons
- Moderate coding performance compared to frontier models
- Text-only output: no audio or image generation
- Not designed for complex multi-step agentic tasks
Use Cases
Real-time content generation and moderation
High-volume data processing with long-context understanding
Cost-effective customer service and chat applications
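For the long-context, high-volume use case, documents usually need to be grouped so that each request stays inside the context window. The sketch below shows one simple way to do that with greedy, order-preserving packing. The function name `pack_documents` is hypothetical, token counts are assumed to be measured beforehand with a suitable tokenizer, and the sketch assumes that output tokens share the 1,048,576-token window with the input, which this listing does not confirm.

```python
# Limits copied from the specification table.
CONTEXT_WINDOW = 1_048_576
MAX_OUTPUT = 16_384


def pack_documents(token_counts, reserve_output=MAX_OUTPUT, window=CONTEXT_WINDOW):
    """Greedily group documents into requests that fit the context window.

    token_counts: per-document token counts, in processing order.
    Returns a list of batches, each a list of document indices.

    Assumption: the window is shared by input and output, so
    reserve_output tokens are held back for the model's response.
    """
    budget = window - reserve_output
    batches = []
    current = []
    used = 0
    for index, count in enumerate(token_counts):
        if count > budget:
            raise ValueError(f"document {index} ({count} tokens) exceeds the budget of {budget}")
        # Start a new request when this document would overflow the current one.
        if used + count > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches


# Three documents: the first fills most of a request, the other two share one.
print(pack_documents([600_000, 500_000, 400_000]))  # prints [[0], [1, 2]]
```

Greedy packing keeps documents in their original order, which is convenient for sequential data; it does not minimize the number of requests, so a bin-packing heuristic may do better when order is irrelevant.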