GLM 5.2

z-ai/glm-5.2
by Z.ai · text in · text out · 1M ctx · 2026-06-16

1M token context window for long-form text processing, accessed via OrcaRouter's API.

INPUT: $1.40 / 1M tokens
OUTPUT: $4.40 / 1M tokens
p50 TTFT: 5.60 s (7d)
p95 TTFT: 7.54 s (7d)
TRAFFIC: 8.0M tokens / 7d

What is Z.ai: GLM 5.2?

Z.ai: GLM 5.2 is a text-only large language model with a 1,000,000-token context window and a maximum output of 128,000 tokens. It is developed by Z.ai and offered through OrcaRouter’s API. The long context window suits tasks that require reading and generating very long passages, such as full-book analysis or summarization of multi-file codebases. Pricing follows the provider’s rate: $1.40 per million input tokens and $4.40 per million output tokens, with no markup by OrcaRouter.

Who is this model designed for?

Z.ai: GLM 5.2 targets users and organizations that need to handle extremely long text sequences in a single API call. Common roles include legal professionals analyzing entire contracts or discovery documents, researchers reviewing extensive literature, software engineers understanding large code repositories, and data scientists working with long log files. The generous context window reduces the need for manual chunking, while the high output limit supports generating detailed reports or code patches.

What are the key specifications?

Key specifications include a total context window of 1,000,000 tokens (both input and output combined), with a maximum output of 128,000 tokens. The model supports text input only; no multimodal capabilities are advertised. It is accessed through OrcaRouter’s OpenAI‑compatible API using the model ID “z-ai/glm-5.2” at base URL https://api.orcarouter.ai/v1. Pricing is per‑token: $1.40 per million input tokens and $4.40 per million output tokens, billed at Z.ai’s provider rate with zero markup.

Code samples

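import os

from openai import OpenAI

# Read the key from the environment instead of passing a literal placeholder string.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

# Send a single-turn chat completion request to GLM 5.2.
response = client.chat.completions.create(
    model="z-ai/glm-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)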
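
With outputs of up to 128,000 tokens, streaming is often more practical than waiting for the full response. The sketch below assumes the endpoint emits OpenAI-style chat completion chunks, as the FAQ states. `collect_stream`, `stream_reply`, and `fake_chunk` are hypothetical helper names introduced for illustration, not part of any SDK.

```python
import os
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the text deltas from an iterable of chat completion chunks."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example a trailing usage chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None on role-only or final chunks
            parts.append(delta)
    return "".join(parts)


def stream_reply(prompt):
    """Hypothetical helper: send one prompt and return the streamed text."""
    from openai import OpenAI  # imported lazily so collect_stream works offline

    client = OpenAI(
        base_url="https://api.orcarouter.ai/v1",
        api_key=os.environ["ORCAROUTER_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="z-ai/glm-5.2",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_stream(stream)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute shape, for offline testing."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])
```

Calling `stream_reply("Hello")` requires a valid `ORCAROUTER_API_KEY`; `collect_stream` can be exercised offline with `fake_chunk` objects.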

Pricing

Input / 1M tokens: $1.40
Output / 1M tokens: $4.40
Cache read / 1M tokens: $0.260
Currency: USD
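
The rates above translate into a per-request estimate with simple arithmetic. The following is a minimal sketch, not an official billing formula: `estimate_cost_usd` is a hypothetical helper, and it assumes that cached tokens are a subset of the input tokens and are billed at the cache read rate instead of the input rate.

```python
INPUT_USD_PER_M = 1.40       # input rate from the pricing table
OUTPUT_USD_PER_M = 4.40      # output rate from the pricing table
CACHE_READ_USD_PER_M = 0.26  # cache read rate from the pricing table


def estimate_cost_usd(input_tokens, output_tokens, cached_tokens=0):
    """Estimate the cost of one request in USD.

    Assumption: cached_tokens is the portion of input_tokens served from
    cache, billed at the cache read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    fresh_tokens = input_tokens - cached_tokens
    total_micro_usd = (
        fresh_tokens * INPUT_USD_PER_M
        + cached_tokens * CACHE_READ_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total_micro_usd / 1_000_000
```

Under these assumptions, a request with 800,000 input tokens and 20,000 output tokens comes to roughly $1.21 with no cache hits.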

Performance

p50 TTFT: 5.60 s
Output speed: 96.0 tok/s
p95 TTFT: 7.54 s
Error rate: 0%
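
These figures can be combined into a rough estimate of end-to-end response time. The sketch below assumes that output speed is measured after the first token and stays constant, and that TTFT does not grow with prompt length; neither assumption is confirmed by the page. `estimate_latency_s` is a hypothetical helper name.

```python
def estimate_latency_s(output_tokens, ttft_s=5.60, tokens_per_s=96.0):
    """Rough end-to-end time: time to first token plus generation time.

    Defaults are the p50 TTFT and output speed listed above.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s
```

By this estimate, a full 128,000-token output would take about 22 minutes at the listed speed, which is one reason to stream long responses.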

Public benchmarks

AIME 2026: 99.2
CritPt: 16.7
DeepSWE: 46.2
FrontierSWE (Dominance): 74.4
GPQA-Diamond: 91.2
HLE: 40.5
HLE (w/ Tools): 54.7
HMMT Feb. 2026: 92.5
HMMT Nov. 2025: 94.4
IMOAnswerBench: 91.0
MCP-Atlas (Public Set): 76.8
NL2Repo: 48.9
PostTrainBench: 34.3
ProgramBench: 63.7
SWE-bench Pro: 62.1
SWE-Marathon: 13.0
Terminal Bench 2.1 (Best Reported): 82.7
Terminal Bench 2.1 (Terminus-2): 81.0
Tool-Decathlon: 48.2
Source: artificialanalysis.ai

FAQ

What is the cost per token for GLM 5.2?
Input tokens cost $1.40 per million tokens, and output tokens cost $4.40 per million tokens. There is no markup by OrcaRouter; you pay Z.ai’s provider rate.
What is the model’s context window size?
The context window is 1,000,000 tokens (combined input and output). The maximum output is 128,000 tokens per request.
What are the model’s strengths?
Its main strength is the large context window (1M tokens) and high output limit (128k tokens), enabling processing of very long documents or conversations in a single call. It is text‑only.
How does GLM 5.2 compare to other models with smaller context windows?
It has a much larger context window, making it suitable for tasks that require reading entire books or large codebases in one request. For tasks that fit within a smaller context limit, a smaller model may be cheaper and faster.
Does OrcaRouter cache tokens or offer discounts?
The pricing table lists a cache read rate of $0.260 per million tokens. No volume discounts are advertised for this model; pricing is per-token at the provider’s rate with zero markup.
How do I call GLM 5.2 through OrcaRouter?
Use the OpenAI‑compatible API at base URL https://api.orcarouter.ai/v1, model ID “z-ai/glm-5.2”. Send a standard chat completion request with your API key.
What input modalities does the model support?
Z.ai: GLM 5.2 supports only text input. It cannot process images, audio, or other non-text inputs.
Are there any known benchmark scores?
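Yes. The Public benchmarks section lists scores sourced from artificialanalysis.ai, for example 99.2 on AIME 2026, 91.2 on GPQA-Diamond, and 62.1 on SWE-bench Pro. Users should still evaluate the model on their own datasets.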
Can I stream the output?
Yes, set `stream: true` in the request body (`stream=True` in the OpenAI Python SDK). The response is sent as server-sent events in OpenAI’s streaming format.
What happens if I exceed the 1M token limit?
You will receive an error. Ensure the total number of tokens in your messages plus max_tokens does not exceed 1,000,000.
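
The limit check described in the last answer can be sketched as a small pre-flight helper. This is an illustration under stated assumptions: `max_tokens_budget` is a hypothetical function, and the caller must supply the prompt token count, since this page does not specify a tokenizer for GLM 5.2.

```python
CONTEXT_WINDOW = 1_000_000  # combined input and output tokens
MAX_OUTPUT = 128_000        # maximum output tokens per request


def max_tokens_budget(prompt_tokens, desired_output=MAX_OUTPUT):
    """Return the largest max_tokens value that keeps the request within limits.

    Raises ValueError if the prompt leaves no room for any output.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if desired_output <= 0:
        raise ValueError("desired_output must be positive")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt fills the context window; no room for output")
    # Cap by the caller's request, the per-request output limit, and the space left.
    return min(desired_output, MAX_OUTPUT, remaining)
```

For example, a 950,000-token prompt leaves room for at most 50,000 output tokens, so `max_tokens` should be set no higher than that.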

Embed this badge

Z.ai: GLM 5.2 · $1.40/M in · 5596 ms p50 · via OrcaRouter
HTML <a href="https://www.orcarouter.ai/models/z-ai/glm-5.2" target="_blank"> <img src="https://www.orcarouter.ai/embed/z-ai/glm-5.2.svg" alt="Z.ai: GLM 5.2 on OrcaRouter" /> </a>
Markdown [![Z.ai: GLM 5.2](https://www.orcarouter.ai/embed/z-ai/glm-5.2.svg)](https://www.orcarouter.ai/models/z-ai/glm-5.2)