Llama 70B, by Meta

Llama 70B is Meta's refined 70B-parameter model, with improved instruction following and reduced toxicity. It is the workhorse for production deployments that require reliability.
| Parameters | 70B |
| Architecture | Dense Transformer |
| Context | 128K |
| Provider | Meta |
- Drop-in replacement for the OpenAI API — just change the base URL.
- Pay only for actual GPU compute time, with no idle costs.
- 99.9% uptime SLA, SOC 2 compliant, dedicated support.
- Scales from zero to thousands of requests automatically.
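To illustrate the "just change the base URL" claim, here is a minimal standard-library sketch that builds the same OpenAI-style `/chat/completions` request for any host. The endpoint URL and model id below are placeholders for illustration, not Fleek's documented values; in practice you would pass your provider's base URL to an OpenAI-compatible client the same way.

```python
# Sketch: an OpenAI-style chat request where only the base URL changes.
# The URL and model id are hypothetical placeholders.
import json
from urllib.request import Request

def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-compatible /chat/completions POST for any base URL."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Same code, different provider: only base_url changes.
req = chat_request("https://api.example.com/v1", "KEY", "llama-70b", "Hello")
print(req.full_url)  # https://api.example.com/v1/chat/completions
```

The request body and headers are identical to what an OpenAI SDK would send, which is why swapping the base URL is the only change needed.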
| | Fleek | Fireworks | Together | Baseten |
|---|---|---|---|---|
| Input | $0.05 | $0.90 | $0.88 | — |
| Output | $0.21 | $0.90 | $0.88 | — |
| Savings | — | 70% | 70% | — |
Prices are per million tokens. Fleek pricing based on $0.0025/GPU-second.
See how much you'd save running Llama 70B on Fleek
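As a rough sketch of how GPU-second pricing translates into per-token cost: at the stated rate of $0.0025/GPU-second, the dollar cost of a million tokens is just the GPU-seconds needed to produce them times the rate. The throughput figure below is taken from the spec sheet for illustration; actual billed time depends on batching, prefill vs. decode speed, and request shape, so this is an estimate, not the published price.

```python
# Rough cost estimate under the stated $0.0025/GPU-second rate.
# Throughput is an illustrative assumption; real billed time varies
# with batching and prefill vs decode speed.
GPU_SECOND_RATE = 0.0025  # dollars per GPU-second (from the pricing note)

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """Dollars to generate 1M tokens at a given sustained throughput."""
    gpu_seconds = 1_000_000 / tokens_per_sec
    return gpu_seconds * GPU_SECOND_RATE

# At 10,000 tokens/sec, 1M tokens take 100 GPU-seconds:
print(cost_per_million_tokens(10_000))  # 0.25
```

Because billing follows compute time rather than a fixed per-token tariff, higher effective throughput directly lowers the per-token cost.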
| Specification | Value |
|---|---|
| Model Name | Llama 70B |
| Total Parameters | 70B |
| Active Parameters | N/A |
| Architecture | Dense Transformer |
| Context Length | 128K tokens |
| Inference Speed | 10,000 tokens/sec |
| Provider | Meta |
| Release Date | Sep 15, 2025 |
| License | Llama Community |
| HuggingFace | https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct |
- Massive Multitask Language Understanding (MMLU)
- OpenAI code generation benchmark
- Mathematical problem solving
Click any benchmark to view the official leaderboard. Rankings among open-source models.
Join the waitlist for early access, and start free with $5 in credits.