GLM 4.7 by Z.ai (Zhipu)
GLM 4.7 scored 73.8% on SWE-bench Verified — highest among open-source models. 84.9% on LiveCodeBench, beating Claude. Preserved Thinking maintains reasoning across turns.
| Parameters | 355B (MoE) |
|---|---|
| Architecture | Mixture of Experts |
| Context | 200K |
| Provider | Z.ai (Zhipu) |
Drop-in replacement for OpenAI API. Just change the base URL.
Only pay for actual GPU compute time. No idle costs.
99.9% uptime SLA, SOC 2 compliant, dedicated support.
Scales from zero to thousands of requests automatically.
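Because the endpoint speaks the OpenAI wire format, pointing an existing client at a different base URL is all the migration takes. A minimal sketch of the request an OpenAI-compatible client sends, using only the standard library; the base URL and the `glm-4.7` model identifier are illustrative assumptions, not confirmed values:

```python
import json
import urllib.request

# Hypothetical base URL -- substitute the real one from your Fleek dashboard.
FLEEK_BASE_URL = "https://api.fleek.example/v1"

def build_chat_request(base_url: str, api_key: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request.

    Since the wire format matches OpenAI's, the official openai SDK
    also works unchanged: OpenAI(base_url=base_url, api_key=api_key).
    """
    body = json.dumps({
        "model": "glm-4.7",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(FLEEK_BASE_URL, "sk-...", "Write a binary search in Go.")
# urllib.request.urlopen(req) would send it; omitted because the URL above is illustrative.
```

The only code that changes when migrating from OpenAI is the base URL and API key; the payload, headers, and response shape stay the same.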
| | Fleek | Fireworks | Together | Baseten |
|---|---|---|---|---|
| Input | $0.14 | $0.60 | $0.45 | $0.60 |
| Output | $0.54 | $2.20 | $2.00 | $2.20 |
| Savings | | 70% | 70% | 70% |
Prices are per million tokens. Fleek pricing based on $0.0025/GPU-second.
See how much you'd save running GLM 4.7 on Fleek
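The savings in the table follow directly from the per-million-token prices. A small sketch of that arithmetic, using only the prices listed above; the 100M/20M workload split is an illustrative assumption:

```python
# Per-million-token prices (USD) from the comparison table above.
PRICES = {
    "Fleek":     {"input": 0.14, "output": 0.54},
    "Fireworks": {"input": 0.60, "output": 2.20},
    "Together":  {"input": 0.45, "output": 2.00},
    "Baseten":   {"input": 0.60, "output": 2.20},
}

def monthly_cost(provider: str, input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a workload measured in millions of tokens."""
    p = PRICES[provider]
    return p["input"] * input_mtok + p["output"] * output_mtok

def savings_vs(provider: str, input_mtok: float, output_mtok: float) -> float:
    """Fractional savings of Fleek relative to another provider."""
    other = monthly_cost(provider, input_mtok, output_mtok)
    return 1 - monthly_cost("Fleek", input_mtok, output_mtok) / other

# Example workload: 100M input tokens and 20M output tokens per month.
for name in ("Fireworks", "Together", "Baseten"):
    print(f"{name}: {savings_vs(name, 100, 20):.0%} cheaper on Fleek")
```

For this workload mix the computed savings come out at roughly 70% or more against each provider, consistent with the table; the exact figure shifts with the input/output ratio.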
| Specification | Value |
|---|---|
| Model Name | GLM 4.7 |
| Total Parameters | 355B (MoE) |
| Active Parameters | 32B |
| Architecture | Mixture of Experts |
| Context Length | 200K tokens |
| Inference Speed | 29,500 tokens/sec |
| Provider | Z.ai (Zhipu) |
| Release Date | Dec 22, 2025 |
| License | MIT |
| HuggingFace | https://huggingface.co/THUDM/GLM-4.7 |
- SWE-bench Verified - software engineering benchmark on real GitHub issues
- LiveCodeBench - real-time coding benchmark
- OpenAI code generation benchmark
Click any benchmark to view the official leaderboard. Rankings among open-source models.