GLM 4.7 by Z.ai (Zhipu)
GLM 4.7 scored 73.8% on SWE-bench Verified — highest among open-source models. 84.9% on LiveCodeBench, beating Claude. Preserved Thinking maintains reasoning across turns.
| Parameters | 355B (MoE) |
|---|---|
| Architecture | Mixture of Experts |
| Context | 200K |
| Provider | Z.ai (Zhipu) |
Drop-in replacement for OpenAI API. Just change the base URL.
Only pay for actual GPU compute time. No idle costs.
99.9% uptime SLA, SOC 2 compliant, dedicated support.
Scales from zero to thousands of requests automatically.
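Because the endpoint speaks the OpenAI wire format, pointing an existing client at a different base URL is all the migration takes. A minimal sketch of the request an OpenAI-compatible client sends, using only the standard library; the base URL and the `glm-4.7` model identifier are illustrative assumptions, not confirmed values:

```python
import json
import urllib.request

# Hypothetical base URL -- substitute the real one from your Fleek dashboard.
FLEEK_BASE_URL = "https://api.fleek.example/v1"

def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request.

    Since the wire format matches OpenAI's, the official openai SDK
    also works unchanged: OpenAI(base_url=base_url, api_key=api_key).
    """
    body = json.dumps({
        "model": "glm-4.7",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(FLEEK_BASE_URL, "sk-...", "Write a binary search in Go.")
# urllib.request.urlopen(req) would send it; omitted because the URL above is illustrative.
```

The only code that changes when migrating from OpenAI is the base URL and API key; the payload, headers, and response shape stay the same.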
| | Fleek | Fireworks | Together | Baseten |
|---|---|---|---|---|
| Input | $0.14 | $0.60 | $0.45 | $0.60 |
| Output | $0.54 | $2.20 | $2.00 | $2.20 |
| Savings | | 70% | 70% | 70% |
Prices are per million tokens. Fleek pricing based on $0.0025/GPU-second.
See how much you'd save running GLM 4.7 on Fleek
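The savings in the table follow directly from the per-million-token prices. A small sketch of that arithmetic, using only the prices listed above; the 100M/20M workload split is an illustrative assumption:

```python
# Per-million-token prices (USD) from the comparison table above.
PRICES = {
    "Fleek":     {"input": 0.14, "output": 0.54},
    "Fireworks": {"input": 0.60, "output": 2.20},
    "Together":  {"input": 0.45, "output": 2.00},
    "Baseten":   {"input": 0.60, "output": 2.20},
}

def monthly_cost(provider: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    p = PRICES[provider]
    return p["input"] * input_mtok + p["output"] * output_mtok

def savings_vs(provider: str, input_mtok: float, output_mtok: float) -> float:
    """Fractional savings of Fleek relative to another provider."""
    other = monthly_cost(provider, input_mtok, output_mtok)
    return 1 - monthly_cost("Fleek", input_mtok, output_mtok) / other

# Example workload: 100M input tokens and 20M output tokens per month.
for name in ("Fireworks", "Together", "Baseten"):
    print(f"{name}: {savings_vs(name, 100, 20):.0%} cheaper on Fleek")
```

For this workload mix the computed savings come out at roughly 70% or more against each provider, consistent with the table; the exact figure shifts with the input/output ratio.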
| Specification | Value |
|---|---|
| Model Name | GLM 4.7 |
| Total Parameters | 355B (MoE) |
| Active Parameters | 32B |
| Architecture | Mixture of Experts |
| Context Length | 200K tokens |
| Inference Speed | 29,500 tokens/sec |
| Provider | Z.ai (Zhipu) |
| Release Date | Dec 22, 2025 |
| License | MIT |
| HuggingFace | https://huggingface.co/THUDM/GLM-4.7 |
- SWE-bench Verified - software engineering benchmark on real GitHub issues
- LiveCodeBench - real-time coding benchmark
- OpenAI code generation benchmark
Click any benchmark to view the official leaderboard. Rankings among open-source models.