Llama 4 Scout
Meta · Active · Updated May 18, 2026
Meta's efficient open-weight model optimized for fast inference, easy deployment, and high-throughput production workloads.
Input Price: $0.30 per million tokens
Output Price: $1.00 per million tokens
Context Window: 262,144 tokens
Max Output: 16,384 tokens
Technical Specifications
| Provider | Meta |
| Release Date | April 1, 2026 |
| Pricing Type | per token |
| Input Price | $0.30 / 1M tokens |
| Output Price | $1.00 / 1M tokens |
| Cached Input | — |
| Context Window | 262,144 tokens |
| Max Output | 16,384 tokens |
| Input Modalities | text, image |
| Output Modalities | text |
| Status | active |
| Availability | api, enterprise |
| Latency | very fast |
| Rate Limit | 5,000 RPM |
| Pricing URL | View official pricing → |
| Docs URL | View documentation → |
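The per-token prices above translate into a request cost with simple arithmetic. The following is a minimal sketch, assuming list prices only (no cached-input rate, volume discount, or image-token surcharge, none of which this page specifies); `estimate_cost_usd` is a hypothetical helper name, not part of any Meta API.

```python
from decimal import Decimal

# Prices copied from the specification table (USD per 1M tokens).
INPUT_PRICE_PER_M = Decimal("0.30")
OUTPUT_PRICE_PER_M = Decimal("1.00")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: list-price cost of one request in USD.

    Assumes plain per-token billing with no caching or discounts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_M
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 10,000 input tokens and 1,000 output tokens
    print(estimate_cost_usd(10_000, 1_000))
```

For example, a request with 10,000 input tokens and 1,000 output tokens costs $0.003 for input plus $0.001 for output, or $0.004 in total. `Decimal` is used instead of floats so that small per-request amounts add up exactly across large volumes.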
Capability Scores
Coding: 74
Reasoning: 70
Math: 68
Image: 62
Speed: 92
Overview
Llama 4 Scout is Meta's efficiency-optimized open-weight model, built for high-throughput production environments where speed and cost matter most. It scores below the Maverick variant on benchmarks, but it offers a 256K-token context window (262,144 tokens) and reasoning that is adequate for many real-world applications (reasoning score: 70/100). Like other Llama models, its weights are openly available for self-hosting and customization.
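The 256K figure corresponds to 262,144 tokens (256 × 1,024). As a rough sketch of how a prompt might be budgeted against that window, the snippet below assumes that generated tokens count against the same window as the prompt, which this page does not state; `max_prompt_tokens` and `fits` are hypothetical helper names.

```python
CONTEXT_WINDOW = 256 * 1024  # 262,144 tokens, as listed in the specifications
MAX_OUTPUT = 16_384          # maximum output tokens, as listed in the specifications


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Hypothetical helper: prompt budget left after reserving room for output.

    Assumption: output tokens share the context window with the prompt.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt leaves room for the reserved output."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)
```

Under that assumption, reserving the full 16,384 output tokens leaves 245,760 tokens for the prompt. Token counts themselves depend on the model's tokenizer, so they would need to be measured rather than estimated from character length.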
Pros
- Fully open-weight with permissive licensing
- Fast inference (speed: 92/100) for production workloads
- 256K context window at a budget-friendly price
- Easy to deploy on consumer-grade hardware
Cons
- Lower benchmark scores than the Maverick variant
- Not suitable for complex reasoning or coding tasks
- No audio or image output capabilities
Use Cases
- High-volume content generation and classification
- Self-hosted chat applications and customer service
- Fine-tuned domain-specific deployments