qwen/qwen3-max-preview

Name: qwen/qwen3-max-preview API
Brand: qwen

Tools · JSON · Reasoning

by qwen

Qwen3 Max Preview: proprietary chat model in preview, 256K (262,144-token) context, thinking mode and function calling.

Endpoints: /v1/chat/completions

Context: 262.1K tokens

Max output: 65.5K tokens

Input: text

Output: text

p50 TTFT: 3.67 s

Input: $0.86 / 1M tokens (base tier)

Output: $3.44 / 1M tokens (base tier)

p50 TTFT: 3.67 s (7d)

p95 TTFT: 10.00 s (7d)

Traffic: 420.7K tokens / 7d

What is Qwen3-Max-Preview?

Qwen3-Max-Preview is a text-only large language model from the Qwen family, developed by Alibaba Cloud's Qwen team. It is offered as a preview, giving early access to new capabilities before a stable release, so its performance and behavior may change over time. It accepts text input and returns text output, with a context window of 262,144 tokens and a maximum output length of 65,536 tokens. It scores 83.8 on MMLU-Pro, a harder, ten-option successor to MMLU that tests knowledge and reasoning across 14 subject areas.
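The context window and the output cap are two separate limits, so a caller may want to size `max_tokens` from the prompt length. The sketch below assumes, as is common but not stated on this page, that prompt and completion tokens share the 262,144-token window; `output_budget` is a hypothetical helper, and the input token count has to come from your own estimate.

```python
CONTEXT_WINDOW = 262_144  # tokens, from the model card
MAX_OUTPUT = 65_536       # tokens, from the model card


def output_budget(input_tokens: int) -> int:
    """Largest max_tokens value that respects both limits.

    Assumption (unconfirmed): prompt and completion share the context window.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - input_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills the context window")
    return min(MAX_OUTPUT, remaining)


print(output_budget(20_000))   # 65536: the output cap is the binding limit
print(output_budget(250_000))  # 12144: the context window is the binding limit
```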

Who should use this model?

Qwen3-Max-Preview suits developers and researchers who work with very long documents or multi-step reasoning. Its context window can hold an entire book, a long legal contract, a large codebase, or an extended conversation history in a single request, so inputs within the 262,144-token limit need no manual chunking. It also fits tasks that depend on broad knowledge and logical deduction, such as advanced question answering, scientific literature review, and detailed report generation. Because it is a preview, it is best used for prototyping and evaluation before committing to a stable model.

What makes it distinct?

Qwen3-Max-Preview combines a very large context window (262,144 tokens) with a high output limit (65,536 tokens), and its 83.8 MMLU-Pro score indicates strong general knowledge and reasoning. As a preview release, it carries the Qwen team's newest changes, which may not yet be stable. Because it is text-only rather than multimodal, it targets text-intensive applications. OrcaRouter exposes it through an OpenAI-compatible API, so existing OpenAI SDK code can call it by changing only the base URL and model id.

Code samples

Call from any SDK

OpenAI-compatible: keep the SDK you already use.

OpenAI SDK: https://api.orcarouter.ai/v1

import os

from openai import OpenAI

# Read the key from the environment; Python does not expand "$VAR" inside a string literal.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="qwen/qwen3-max-preview",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
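Given the multi-second TTFT, streaming lets a client show output as it arrives; `stream` and `stream_options` appear in the supported-parameter list below. A minimal sketch, assuming OrcaRouter relays OpenAI-style stream chunks unchanged. `collect_stream` and `stream_reply` are hypothetical helpers; the first works on any iterable of chunk-shaped objects, so it can be checked offline.

```python
import os
from types import SimpleNamespace
from typing import Iterable, Optional, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, Optional[object]]:
    """Concatenate delta text from OpenAI-style chunks; return (text, usage)."""
    parts = []
    usage = None
    for chunk in chunks:
        # With stream_options={"include_usage": True}, the last chunk has an
        # empty `choices` list and carries token usage instead.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
        for choice in chunk.choices:
            if choice.delta.content:
                parts.append(choice.delta.content)
    return "".join(parts), usage


def stream_reply(prompt: str) -> str:
    """Live call; needs `pip install openai` and ORCAROUTER_API_KEY set."""
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.orcarouter.ai/v1",
        api_key=os.environ["ORCAROUTER_API_KEY"],
    )
    stream = client.chat.completions.create(
        model="qwen/qwen3-max-preview",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
    )
    text, _usage = collect_stream(stream)
    return text


# Offline check with fake chunks shaped like the SDK's chunk objects.
fake = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))], usage=None),
    SimpleNamespace(choices=[], usage=SimpleNamespace(completion_tokens=2)),
]
print(collect_stream(fake)[0])  # Hello
```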

Supported parameters

enable_search
enable_thinking
include_reasoning
logprobs
max_tokens
n
parallel_tool_calls
presence_penalty
reasoning
repetition_penalty
response_format
seed
stop
stream
stream_options
temperature
thinking_budget
tool_choice
tools
top_k
top_logprobs
top_p
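Some of these parameters, such as `enable_thinking` and `thinking_budget`, are not typed arguments of the OpenAI Python SDK's `create()` method, so with that SDK they would go through its `extra_body` argument, which merges extra fields into the JSON request body. A sketch, assuming OrcaRouter forwards these fields to the model unchanged; `build_thinking_request` and the numeric values are illustrative, not documented defaults.

```python
from typing import Any, Dict


def build_thinking_request(prompt: str, budget_tokens: int = 4096) -> Dict[str, Any]:
    """Keyword arguments for client.chat.completions.create().

    enable_thinking and thinking_budget are not typed SDK arguments, so they
    travel in extra_body. Assumption: OrcaRouter passes them through as-is.
    """
    return {
        "model": "qwen/qwen3-max-preview",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8192,  # illustrative value
        "extra_body": {
            "enable_thinking": True,
            "thinking_budget": budget_tokens,  # illustrative cap on reasoning tokens
        },
    }


kwargs = build_thinking_request("Prove that the square root of 2 is irrational.")
print(sorted(kwargs["extra_body"]))  # ['enable_thinking', 'thinking_budget']
```

With the client from the SDK sample above, the live call would be `client.chat.completions.create(**kwargs)`.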

Pricing

Tier	Input / 1M tokens	Output / 1M tokens
≤ 32K	$0.861	$3.441
≤ 128K	$1.434	$5.735
≤ 256K	$2.151	$8.602
Tier selected by input token count of each request
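Reading the note above as "the whole request is billed at the rates of the tier its input token count falls into", per-request cost is a lookup followed by one multiplication per token type. A sketch under that reading; `request_cost` is a hypothetical helper, and treating "32K" as 32,768 tokens (K = 1,024) is an assumption the table does not settle. With 20 input and 500 output tokens it gives about $0.001738, the figure shown in the estimator below.

```python
# Tiered list prices from the table above, in USD per 1M tokens:
# (max input tokens, input price, output price).
# Assumption: "32K", "128K", "256K" mean multiples of 1,024 tokens.
TIERS = [
    (32 * 1024, 0.861, 3.441),
    (128 * 1024, 1.434, 5.735),
    (256 * 1024, 2.151, 8.602),
]


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request; the tier is chosen by input tokens only."""
    for limit, in_price, out_price in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the largest pricing tier")


print(f"{request_cost(20, 500):.6f}")         # 0.001738 (base tier)
print(f"{request_cost(100_000, 2_000):.4f}")  # 0.1549 (128K tier)
```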

Cost calculator

Tokens / month: 10M

70%

Estimate based on list price

Tiered pricing: this estimate uses base-tier rates.
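A base-tier monthly estimate only needs the monthly token volume and how it splits between input and output. The sketch assumes the unlabeled 70% figure above is the input share (so 30% is output); that reading is a guess, and `monthly_cost` is a hypothetical helper. Requests above 32K input tokens would cost more than this estimate.

```python
BASE_INPUT = 0.861   # USD per 1M input tokens, base (<= 32K) tier
BASE_OUTPUT = 3.441  # USD per 1M output tokens, base (<= 32K) tier


def monthly_cost(tokens_per_month: int, input_share: float) -> float:
    """Base-tier monthly estimate; input_share is the fraction of input tokens."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = tokens_per_month * input_share
    output_tokens = tokens_per_month - input_tokens
    return (input_tokens * BASE_INPUT + output_tokens * BASE_OUTPUT) / 1_000_000


print(f"${monthly_cost(10_000_000, 0.70):.2f}")  # $16.35
```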

Token & cost estimator

Expected output tokens

Input tokens: 20
Cost per request: $0.001738

Estimate only; actual token counts depend on the provider's tokenizer.
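Because the estimator's token counts are approximate, exact per-request cost can instead be computed from the `usage` object that OpenAI-style responses carry (`prompt_tokens`, `completion_tokens`). A sketch that applies base-tier rates only; `cost_from_usage` is a hypothetical helper, and reading "32K" as 32,768 tokens is an assumption.

```python
from types import SimpleNamespace

# Base-tier (<= 32K input) list prices in USD per 1M tokens, from the pricing table.
BASE_INPUT = 0.861
BASE_OUTPUT = 3.441
BASE_TIER_LIMIT = 32 * 1024  # assumption: "32K" means 32,768 tokens


def cost_from_usage(usage) -> float:
    """Cost of one completed request from its OpenAI-style `usage` object."""
    if usage.prompt_tokens > BASE_TIER_LIMIT:
        raise ValueError("prompt is above the base tier; apply the higher tier's rates")
    return (usage.prompt_tokens * BASE_INPUT
            + usage.completion_tokens * BASE_OUTPUT) / 1_000_000


# Stand-in for `response.usage` from a real chat.completions.create() call.
usage = SimpleNamespace(prompt_tokens=20, completion_tokens=500)
print(f"{cost_from_usage(usage):.6f}")  # 0.001738
```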

Performance

last 7 days

p50 TTFT

3.67 s

Output speed

69.5 tok/s

p95 TTFT

10.00 s

Error rate
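These figures give a rough end-to-end latency: time to first token plus output length divided by output speed. The sketch assumes generation runs at the median 69.5 tok/s for the whole response; pairing the p95 TTFT with that median speed is only a rough upper-range guess, not a true p95 total. `estimated_latency` is a hypothetical helper.

```python
P50_TTFT_S = 3.67        # seconds, last 7 days
P95_TTFT_S = 10.00       # seconds, last 7 days
OUTPUT_TOK_PER_S = 69.5  # tokens per second, last 7 days


def estimated_latency(output_tokens: int, ttft_s: float = P50_TTFT_S) -> float:
    """Rough end-to-end seconds: time to first token plus generation time."""
    return ttft_s + output_tokens / OUTPUT_TOK_PER_S


print(f"{estimated_latency(1_000):.1f} s")              # 18.1 s with p50 TTFT
print(f"{estimated_latency(1_000, P95_TTFT_S):.1f} s")  # 24.4 s with p95 TTFT
```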

Public benchmarks

Last evaluated 2025-09-05

Index	Score	Percentile	Rank
AA Coding	25.5	Better than 27% of models compared	#77 of 106
AA Intelligence	26.1	Better than 24% of models compared	#84 of 110
AA Math	75.0	Better than 69% of models compared	#25 of 81

Benchmark	Score
AIME 2025	75.0
GPQA Diamond	76.4
Humanity's Last Exam	9.3
IFBench	48.0
LiveCodeBench	65.1
Long-Context Recall	39.7
MMLU-Pro	83.8
SciCode	37.0
TerminalBench Hard	19.7
τ²-Bench	32.7

Source: artificialanalysis.ai

How it compares

	qwen/qwen3-max-preview	Qwen3.5 397B A17B	qwen/qwen3.5-plus	Qwen3.6 35B A3B
Input $/M	$0.86	$0.17	$0.12	$0.25
Output $/M	$3.44	$1.03	$0.69	$1.49
Context	262K	33K	1.0M	262K
Quality	8/10	8/10	8/10	8/10

More from qwen

Qwen3.6 Plus

qwen/qwen3.6-plus

$0.50 in · $3.00 out / 1M

1.05M ctx · quality 8/10

Qwen3.7 Plus (cheapest)

qwen/qwen3.7-plus

$0.35 in · $1.42 out / 1M

$0.36 in · $1.43 out / 1M

262.1K ctx · quality 7/10

FAQ

What is the cost to use qwen/qwen3-max-preview on OrcaRouter?

Pricing is tiered by the input token count of each request. Up to 32K input tokens, it costs $0.861 per 1M input tokens and $3.441 per 1M output tokens; up to 128K, $1.434 and $5.735; up to 256K, $2.151 and $8.602.

What is the context window size of Qwen3-Max-Preview?

The model supports a context window of 262,144 tokens.

What is the maximum output length?

The model can generate up to 65,536 tokens in a single response.

What are the model's main strengths?

It combines an extremely large context window (262K tokens) with a high MMLU-Pro score (83.8) and a large output limit, making it strong for long-document analysis and complex reasoning.

How does this model compare to other Qwen models?

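It is the most expensive model in the comparison table above, at $0.86 input and $3.44 output per 1M tokens versus $0.12 to $0.25 input for the other listed Qwen models. Its 262K context matches Qwen3.6 35B A3B and is smaller than that of qwen/qwen3.5-plus (1.0M). As a preview, it may also be less stable than released models.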

Does the model support image or audio inputs?

No, it is text-only. It accepts only text input and produces text output.

How do I call this model using an OpenAI-compatible API?

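Use OrcaRouter's API at https://api.orcarouter.ai/v1 with the model id 'qwen/qwen3-max-preview'. The endpoint accepts the standard OpenAI chat completion parameters plus the non-standard ones listed under Supported parameters, such as enable_thinking, thinking_budget, enable_search, top_k, and repetition_penalty.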

What data handling policies apply?

Data handling follows OrcaRouter's terms of service and privacy policy. This page does not describe specific data retention or processing practices; consult OrcaRouter's documentation.

Is this model suitable for production use?

It is a preview release, which may be less stable than a production version. Evaluate it on your specific workload before deploying in production.

What benchmarks are available?

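Artificial Analysis results (last evaluated 2025-09-05) include MMLU-Pro 83.8, GPQA Diamond 76.4, AIME 2025 75.0, LiveCodeBench 65.1, IFBench 48.0, Long-Context Recall 39.7, SciCode 37.0, τ²-Bench 32.7, TerminalBench Hard 19.7, and Humanity's Last Exam 9.3, plus the composite AA Intelligence (26.1), AA Coding (25.5), and AA Math (75.0) indices.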

Embed this badge

Paste into your blog post

qwen/qwen3-max-preview • $0.86/M in • 3666 ms p50 • via OrcaRouter

HTML <a href="https://www.orcarouter.ai/models/qwen/qwen3-max-preview" target="_blank"> <img src="https://www.orcarouter.ai/embed/qwen/qwen3-max-preview.svg" alt="qwen/qwen3-max-preview on OrcaRouter" /> </a>

Markdown [![qwen/qwen3-max-preview](https://www.orcarouter.ai/embed/qwen/qwen3-max-preview.svg)](https://www.orcarouter.ai/models/qwen/qwen3-max-preview)

Model card as data

GET /api/public/models/qwen/qwen3-max-preview

Machine-readable: /llms.txt, /llms-full.txt
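The model card endpoint can be fetched with a plain HTTP GET. A stdlib-only sketch, assuming the API is served from the same host as the badge links (`https://www.orcarouter.ai`) and returns JSON; the response fields are not documented on this page, so the code does not rely on any of them. `model_card_url` and `fetch_model_card` are hypothetical helpers.

```python
import json
import urllib.request

# Assumption: the public API lives on the same host as the badge links above.
BASE = "https://www.orcarouter.ai"


def model_card_url(model_id: str) -> str:
    """URL of the machine-readable model card for an OrcaRouter model id."""
    return f"{BASE}/api/public/models/{model_id}"


def fetch_model_card(model_id: str) -> dict:
    """GET the card and parse the body as JSON (schema not documented here)."""
    with urllib.request.urlopen(model_card_url(model_id), timeout=10) as resp:
        return json.load(resp)


print(model_card_url("qwen/qwen3-max-preview"))
```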
