Llama 70B, by Meta

Llama 70B is Meta's refined 70B-parameter model, with improved instruction following and reduced toxicity. It is the workhorse for production deployments that require reliability.
| Parameters | 70B |
| Architecture | Dense Transformer |
| Context | 128K |
| Provider | Meta |
- Drop-in replacement for the OpenAI API — just change the base URL.
- Pay only for actual GPU compute time, with no idle costs.
- 99.9% uptime SLA, SOC 2 compliant, dedicated support.
- Scales from zero to thousands of requests automatically.
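To illustrate the "just change the base URL" claim, here is a minimal standard-library sketch that builds the same OpenAI-style `/chat/completions` request for any host. The endpoint URL and model id below are placeholders for illustration, not Fleek's documented values; in practice you would pass your provider's base URL to an OpenAI-compatible client the same way.

```python
# Sketch: an OpenAI-style chat request where only the base URL changes.
# The URL and model id are hypothetical placeholders.
import json
from urllib.request import Request

def chat_request(base_url: str, api_key: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-compatible /chat/completions POST for any base URL."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Same code, different provider: only base_url changes.
req = chat_request("https://api.example.com/v1", "KEY", "llama-70b", "Hello")
print(req.full_url)  # https://api.example.com/v1/chat/completions
```

The request body and headers are identical to what an OpenAI SDK would send, which is why swapping the base URL is the only change needed.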
| | Fleek | Fireworks | Together | Baseten |
|---|---|---|---|---|
| Input | $0.05 | $0.90 | $0.88 | — |
| Output | $0.21 | $0.90 | $0.88 | — |
| Savings | — | 70% | 70% | — |
Prices are per million tokens. Fleek pricing based on $0.0025/GPU-second.
See how much you'd save running Llama 70B on Fleek
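As a rough sketch of how GPU-second pricing translates into per-token cost: at the stated rate of $0.0025/GPU-second, the dollar cost of a million tokens is just the GPU-seconds needed to produce them times the rate. The throughput figure below is taken from the spec sheet for illustration; actual billed time depends on batching, prefill vs. decode speed, and request shape, so this is an estimate, not the published price.

```python
# Rough cost estimate under the stated $0.0025/GPU-second rate.
# Throughput is an illustrative assumption; real billed time varies
# with batching and prefill vs decode speed.
GPU_SECOND_RATE = 0.0025  # dollars per GPU-second (from the pricing note)

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    """Dollars to generate 1M tokens at a given sustained throughput."""
    gpu_seconds = 1_000_000 / tokens_per_sec
    return gpu_seconds * GPU_SECOND_RATE

# At 10,000 tokens/sec, 1M tokens take 100 GPU-seconds:
print(cost_per_million_tokens(10_000))  # 0.25
```

Because billing follows compute time rather than a fixed per-token tariff, higher effective throughput directly lowers the per-token cost.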
| Specification | Value |
|---|---|
| Model Name | Llama 70B |
| Total Parameters | 70B |
| Active Parameters | N/A |
| Architecture | Dense Transformer |
| Context Length | 128K tokens |
| Inference Speed | 10,000 tokens/sec |
| Provider | Meta |
| Release Date | Sep 15, 2025 |
| License | Llama Community |
| HuggingFace | https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct |
- Massive Multitask Language Understanding (MMLU)
- OpenAI code generation benchmark
- Mathematical problem solving
Click any benchmark to view the official leaderboard. Rankings among open-source models.
Join the waitlist for early access, and start free with $5 in credits.