GLM 5.2

Name: Z.ai: GLM 5.2 API
Brand: z-ai

z-ai/glm-5.2


by Z.ai · text in · text out · 1M ctx · 2026-06-16

1M token context window for long-form text processing, accessed via OrcaRouter's API.

Endpoints: /v1/chat/completions

INPUT: $1.40 / 1M tokens

OUTPUT: $4.40 / 1M tokens

p50 TTFT: 5.60 s (7d)

p95 TTFT: 7.54 s (7d)

TRAFFIC: 8.0M tokens / 7d


What is Z.ai: GLM 5.2?

Z.ai: GLM 5.2 is a text‑only large language model with a 1,000,000‑token context window and a maximum output of 128,000 tokens. It is developed by Z.ai and offered through OrcaRouter’s API. The model accepts text input only. Its long context window suits tasks that require reading and generating very long passages, such as full‑book analysis or comprehensive summarization of multi‑file codebases. Pricing follows the provider’s rate: $1.40 per million input tokens and $4.40 per million output tokens, with no markup by OrcaRouter.

Who is this model designed for?

Z.ai: GLM 5.2 targets users and organizations that need to handle extremely long text sequences in a single API call. Common roles include legal professionals analyzing entire contracts or discovery documents, researchers reviewing extensive literature, software engineers understanding large code repositories, and data scientists working with long log files. The generous context window reduces the need for manual chunking, while the high output limit supports generating detailed reports or code patches.

What are the key specifications?

Key specifications include a total context window of 1,000,000 tokens (both input and output combined), with a maximum output of 128,000 tokens. The model supports text input only; no multimodal capabilities are advertised. It is accessed through OrcaRouter’s OpenAI‑compatible API using the model ID “z-ai/glm-5.2” at base URL https://api.orcarouter.ai/v1. Pricing is per‑token: $1.40 per million input tokens and $4.40 per million output tokens, billed at Z.ai’s provider rate with zero markup.

Code samples

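import os

from openai import OpenAI

# Read the key from the environment rather than passing the literal
# string "$ORCAROUTER_API_KEY", which Python does not expand.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)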

response = client.chat.completions.create(
    model="z-ai/glm-5.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Pricing

Input / 1M tokens	$1.40
Output / 1M tokens	$4.40
Cache read / 1M	$0.260
Currency	USD

Performance

Last 7 days

p50 TTFT	5.60 s
p95 TTFT	7.54 s
Output speed	96.0 tok/s

Error rate
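As a rough illustration of how the latency figures above combine, the sketch below estimates end-to-end response time as time to first token plus generation time. It assumes the 7-day p50 TTFT (5.60 s) and output speed (96.0 tok/s) hold for your request and that decoding speed is steady, neither of which is guaranteed; the function name is hypothetical.

```python
# 7-day aggregates taken from the Performance figures on this page.
P50_TTFT_S = 5.60
OUTPUT_TOK_PER_S = 96.0


def estimate_response_seconds(
    output_tokens: int,
    ttft_s: float = P50_TTFT_S,
    tok_per_s: float = OUTPUT_TOK_PER_S,
) -> float:
    """Approximate wall-clock time: wait for the first token, then decode."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tok_per_s


if __name__ == "__main__":
    # 960 output tokens: 5.60 s + 960 / 96.0 s
    print(f"{estimate_response_seconds(960):.1f} s")
```

Real requests, especially ones with very long prompts, may take considerably longer than this estimate suggests.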

Public benchmarks

Last evaluated 2026-06-15

AIME 2026	99.2
CritPt	16.7
DeepSWE	46.2
FrontierSWE (Dominance)	74.4
GPQA-Diamond	91.2
HLE	40.5
HLE (w/ Tools)	54.7
HMMT Feb. 2026	92.5
HMMT Nov. 2025	94.4
IMOAnswerBench	91.0
MCP-Atlas (Public Set)	76.8
NL2Repo	48.9
PostTrainBench	34.3
ProgramBench	63.7
SWE-bench Pro	62.1
SWE-Marathon	13.0
Terminal Bench 2.1 (Best Reported)	82.7
Terminal Bench 2.1 (Terminus-2)	81.0
Tool-Decathlon	48.2

Source: artificialanalysis.ai

More from Z.ai

GLM 5.1 (Flagship)	z-ai/glm-5.1	$1.40 in · $4.40 out / 1M	200K ctx · quality 9/10
GLM 5	z-ai/glm-5	$1.00 in · $3.20 out / 1M	200K ctx · quality 8/10
GLM 4.5 (Cheapest)	z-ai/glm-4.5	$0.60 in · $2.20 out / 1M	128K ctx · quality 7/10

FAQ

What is the cost per token for GLM 5.2?

Input tokens cost $1.40 per million tokens, and output tokens cost $4.40 per million tokens. There is no markup by OrcaRouter; you pay Z.ai’s provider rate.
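To make the per-token arithmetic concrete, here is a minimal sketch that estimates the cost of a single request from the two rates quoted above. The function name and the example token counts are illustrative assumptions, and the sketch ignores any cache-read pricing.

```python
# Rates quoted on this page, in USD per 1,000,000 tokens.
INPUT_USD_PER_M = 1.40
OUTPUT_USD_PER_M = 4.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Example: an 800,000-token prompt with a 20,000-token reply.
    print(f"${estimate_cost_usd(800_000, 20_000):.4f}")
```

The example works out to $1.12 for input plus $0.088 for output, so most of the cost of a long-context call usually comes from the prompt.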

What is the model’s context window size?

The context window is 1,000,000 tokens (combined input and output). The maximum output is 128,000 tokens per request.

What are the model’s strengths?

Its main strengths are the large context window (1M tokens) and the high output limit (128k tokens), which together allow very long documents or conversations to be processed in a single call. It is text‑only.

How does GLM 5.2 compare to other models with smaller context windows?

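It has a much larger context window, making it suitable for tasks that require reading entire books or large codebases. Models with smaller context windows can be cheaper and faster for tasks that fit within their limits: among the other Z.ai models listed on this page, GLM 5 and GLM 4.5 have lower per‑token rates, while GLM 5.1 is priced the same as GLM 5.2.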
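One way to act on this trade-off is to pick the cheapest listed model whose context window still fits the request. The sketch below uses the prices and context sizes from this page's Z.ai listing, reading "200K" and "128K" as 200,000 and 128,000 tokens (an assumption). The helper name is hypothetical, and the sketch ignores quality differences and per-model output caps, which the listing does not give for the other models.

```python
from typing import Optional

# (model ID, context window in tokens, input USD per 1M, output USD per 1M)
MODELS = [
    ("z-ai/glm-4.5", 128_000, 0.60, 2.20),
    ("z-ai/glm-5", 200_000, 1.00, 3.20),
    ("z-ai/glm-5.1", 200_000, 1.40, 4.40),
    ("z-ai/glm-5.2", 1_000_000, 1.40, 4.40),
]


def cheapest_model_that_fits(prompt_tokens: int, max_tokens: int) -> Optional[str]:
    """Return the lowest-cost model whose context holds prompt plus output."""
    needed = prompt_tokens + max_tokens
    candidates = [m for m in MODELS if m[1] >= needed]
    if not candidates:
        return None  # nothing listed is large enough; chunk the input instead

    def cost(model: tuple) -> float:
        _, _, in_rate, out_rate = model
        return prompt_tokens * in_rate + max_tokens * out_rate

    return min(candidates, key=cost)[0]


if __name__ == "__main__":
    print(cheapest_model_that_fits(50_000, 4_000))
    print(cheapest_model_that_fits(500_000, 8_000))
```

Only requests that exceed 200,000 total tokens fall through to z-ai/glm-5.2 under this rule; whether a cheaper model is good enough for a given task is a separate judgment.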

Does OrcaRouter cache tokens or offer discounts?

The pricing table lists a cache read rate of $0.26 per million tokens. No volume discounts are advertised for this model; input and output tokens are billed at the provider’s rate with zero markup.

How do I call GLM 5.2 through OrcaRouter?

Use the OpenAI‑compatible API at base URL https://api.orcarouter.ai/v1, model ID “z-ai/glm-5.2”. Send a standard chat completion request with your API key.

What input modalities does the model support?

Z.ai: GLM 5.2 supports only text input. It cannot process images, audio, or other non‑text inputs.

Are there any known benchmark scores?

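Yes. The Public benchmarks section of this page lists scores sourced from artificialanalysis.ai and last evaluated on 2026-06-15, for example 99.2 on AIME 2026, 91.2 on GPQA-Diamond, and 62.1 on SWE-bench Pro. Users should still evaluate the model on their own datasets.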

Can I stream the output?

Yes, set `stream: true` in your API call. The response will be sent as server‑sent events, identical to OpenAI’s streaming format.
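With the OpenAI Python SDK, streaming is requested by passing `stream=True`, and the call then returns an iterator of chunks. The following is a minimal sketch of collecting the streamed text; the helper names `collect_stream_text` and `main` are hypothetical, and the sketch assumes an `ORCAROUTER_API_KEY` environment variable and that OrcaRouter's chunks follow the OpenAI chunk shape, as the answer above states.

```python
import os
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:  # the first and last chunks may have no text
            parts.append(delta)
    return "".join(parts)


def main() -> None:
    # Imported here so the helper above can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.orcarouter.ai/v1",
        api_key=os.environ["ORCAROUTER_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="z-ai/glm-5.2",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
    )
    print(collect_stream_text(stream))

# Call main() with ORCAROUTER_API_KEY set to run against the live API.
```

To display tokens as they arrive instead of collecting them, print each delta inside the loop with `end=""` and `flush=True`.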

What happens if I exceed the 1M token limit?

The request fails with an error. Keep the number of tokens in your messages plus max_tokens at or below 1,000,000, and keep max_tokens itself at or below the 128,000‑token output limit.
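A pre-flight check can catch an oversized request before it is sent. The sketch below computes the largest `max_tokens` value that stays within both the 1,000,000-token context window and the 128,000-token output limit stated on this page. It assumes you already have a prompt token count from a tokenizer of your choice (this page does not specify one), and the function names are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # combined input and output tokens
MAX_OUTPUT = 128_000        # per-request output cap


def max_output_allowed(prompt_tokens: int) -> int:
    """Largest max_tokens that keeps the request inside both limits."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


def fits(prompt_tokens: int, max_tokens: int) -> bool:
    """True if the request respects the context window and output cap."""
    return 0 < max_tokens <= max_output_allowed(prompt_tokens)


if __name__ == "__main__":
    print(max_output_allowed(900_000))  # room left in the window
    print(fits(950_000, 128_000))       # prompt too long for a full-size reply
```

If `max_output_allowed` returns 0, the prompt alone fills the window and has to be shortened or split.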

