Qwen3 Max

Name: Qwen: Qwen3 Max API
Brand: Qwen

qwen/qwen3-max

Tools · JSON · Reasoning

by Qwen · 2025-09-23

Qwen3 Max: proprietary flagship chat model with a 262,144-token (256K) context window, thinking mode, and function calling.

Endpoints: /v1/chat/completions

Context: 262.1K tokens

Max output: 65.5K tokens

Input: text

Output: text


Input: $0.36 / 1M tokens

Output: $1.43 / 1M tokens

p50 TTFT (last 7 days): 1.92 s

p95 TTFT (last 7 days): 10.00 s

Traffic (last 7 days): 768.3K tokens


What is Qwen3 Max?

Qwen3 Max is a Mixture-of-Experts (MoE) language model from Alibaba's Qwen team. It is designed for tasks that require long context and multi-step reasoning. It accepts text-only input and has a context window of 262,144 tokens. That is enough to process long documents, books, or extended multi-turn conversations in a single request. A single response can contain up to 65,536 tokens. OrcaRouter provides access through an OpenAI-compatible API at the base URL https://api.orcarouter.ai/v1, using the model identifier 'qwen/qwen3-max'. OrcaRouter handles hosting, inference, and billing. As a result, existing OpenAI clients work as a drop-in replacement once the base URL and model name are changed. According to OrcaRouter, inputs and outputs are processed only to serve requests and are not used for training.
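As a rough client-side check of those limits, the sketch below picks a `max_tokens` value before a request is sent. It rests on two assumptions. First, you already know the prompt's token count; exact counts depend on Qwen's tokenizer, so a previous response's `usage.prompt_tokens` is the most reliable source. Second, it conservatively assumes that prompt and completion share the 262,144-token window. The helper name `clamp_max_tokens` is hypothetical.

```python
CONTEXT_WINDOW = 262_144  # total context listed for qwen/qwen3-max
MAX_OUTPUT = 65_536       # maximum generated tokens per request


def clamp_max_tokens(prompt_tokens: int, requested: int = MAX_OUTPUT) -> int:
    """Return a max_tokens value that respects both listed limits.

    Assumption: prompt and completion share the context window, so the
    room left for output is the window minus the prompt.
    """
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested >= 1")
    room = CONTEXT_WINDOW - prompt_tokens
    if room < 1:
        raise ValueError(f"a {prompt_tokens}-token prompt leaves no room for output")
    return min(requested, MAX_OUTPUT, room)


print(clamp_max_tokens(1_000))    # 65536: the output cap is the binding limit
print(clamp_max_tokens(250_000))  # 12144: the remaining window is the binding limit
```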

Who should use Qwen3 Max?

Qwen3 Max suits developers and researchers who need a large, capable model for tasks that demand both broad knowledge and long-range attention. Good candidates include legal document analysis, summarizing academic research, reviewing code across large parts of a codebase, and complex problem solving that chains many reasoning steps. An MoE model activates only a subset of its parameters for each token. This lets it reach high capability at a lower inference cost per token than a dense model of equivalent capability. OrcaRouter lets you call Qwen3 Max from any OpenAI SDK by changing only the base URL and model name. Evaluate whether its extra capability over smaller or cheaper models justifies the cost for your specific workload.

What is OrcaRouter's role?

OrcaRouter is the platform that provides API access to Qwen3 Max. It exposes an OpenAI-compatible endpoint at https://api.orcarouter.ai/v1. Any tool or library built for OpenAI's chat completions API can therefore switch to Qwen3 Max by updating the base URL and model ID. OrcaRouter manages the underlying infrastructure, load balancing, and scaling. It does not train on user data and processes all requests in real time. The platform supports streamed responses, function calling, and the standard chat completions parameters. Billing is usage-based; per-token rates are listed in the Pricing section and on OrcaRouter's pricing page. You can also combine multiple models in a single application through OrcaRouter's routing features.
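When `stream=True` is set, OpenAI-compatible endpoints conventionally send the reply as server-sent events. Each event is a `data:` line carrying a JSON chunk, and the stream ends with `data: [DONE]`. The OpenAI SDK parses these for you; the sketch below shows the idea for a raw HTTP client. It makes three assumptions:

- the standard chunk shape (`choices[].delta.content`);
- a single choice (`n=1`);
- no need to capture reasoning text, which thinking mode may stream in other fields and which this sketch ignores.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent event lines (assumes n=1)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comments such as ": keep-alive"
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        # A final usage-only chunk may have an empty "choices" list.
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content_deltas(sample)))  # Hello
```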

What input modalities are supported?

Qwen3 Max accepts text-only input; it does not natively process images, audio, or other modalities. The model operates on tokenized text, with a maximum context of 262,144 tokens and a maximum generation of 65,536 tokens. That makes it a good fit for tasks where the input is entirely textual, such as long-form documents, code, conversation histories, or structured data like JSON or CSV. If you need multimodal input, other models in the Qwen family (such as Qwen2-VL) or alternative providers on OrcaRouter are more appropriate. For text-only tasks that need a very long context window, Qwen3 Max is a strong option.
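OpenAI-style messages may carry `content` either as a plain string or as a list of typed parts such as `{"type": "image_url", ...}`. Because Qwen3 Max is text-only, a small client-side guard can reject non-text parts before they reach the API. This is a minimal sketch. The function name is hypothetical, and this page does not document how OrcaRouter itself responds to non-text parts for this model.

```python
def ensure_text_only(messages: list) -> None:
    """Raise ValueError if any message carries a non-text content part."""
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if content is None or isinstance(content, str):
            continue  # plain string content, or None (e.g. assistant tool calls)
        for part in content:
            if part.get("type") != "text":
                raise ValueError(
                    f"message {i} has a {part.get('type')!r} part; "
                    "qwen/qwen3-max accepts text only"
                )


ensure_text_only([
    {"role": "system", "content": "You are concise."},
    {"role": "user", "content": [{"type": "text", "text": "Summarize this."}]},
])  # passes silently
```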

Code samples

Call from any SDK

OpenAI-compatible: keep the SDK you already use

OpenAI SDK (base URL: https://api.orcarouter.ai/v1)

import os

from openai import OpenAI

# Point the OpenAI SDK at OrcaRouter's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],  # read the key from the environment
)

response = client.chat.completions.create(
    model="qwen/qwen3-max",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
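If you would rather not install the SDK, the same call can be made over plain HTTPS. The sketch below uses only the Python standard library. It assumes the endpoint accepts the same `Authorization: Bearer` header that the OpenAI SDK sends. The function names `build_chat_request` and `send` are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.orcarouter.ai/v1"


def build_chat_request(messages, model="qwen/qwen3-max", api_key=None, **params):
    """Build (but do not send) a POST request to the chat completions endpoint."""
    key = api_key if api_key is not None else os.environ["ORCAROUTER_API_KEY"]
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send(build_chat_request([{"role": "user", "content": "Hello"}], max_tokens=256))` returns the parsed JSON, and the reply text is at `["choices"][0]["message"]["content"]`. Building the request separately from sending it makes the payload easy to inspect or test offline.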

Supported parameters

enable_search
enable_thinking
include_reasoning
logprobs
max_tokens
n
parallel_tool_calls
presence_penalty
reasoning
repetition_penalty
response_format
seed
stop
stream
stream_options
temperature
thinking_budget
tool_choice
tools
top_k
top_logprobs
top_p
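Several of these parameters are not part of OpenAI's standard chat completions signature, including `enable_thinking`, `thinking_budget`, `enable_search`, and `top_k`. The OpenAI Python SDK therefore will not accept them as named arguments. Its `extra_body` argument forwards extra JSON fields instead. The sketch below validates names against the list above and splits them into the two groups. The standard/non-standard split is an assumption based on the OpenAI API, not an OrcaRouter specification. The example values (such as `thinking_budget=2048`) are illustrative; check OrcaRouter's documentation for accepted ranges.

```python
# Parameters listed on this page for qwen/qwen3-max.
SUPPORTED = {
    "enable_search", "enable_thinking", "include_reasoning", "logprobs",
    "max_tokens", "n", "parallel_tool_calls", "presence_penalty", "reasoning",
    "repetition_penalty", "response_format", "seed", "stop", "stream",
    "stream_options", "temperature", "thinking_budget", "tool_choice", "tools",
    "top_k", "top_logprobs", "top_p",
}
# Assumed non-standard for the OpenAI SDK, so they must travel in extra_body.
NON_OPENAI = {
    "enable_search", "enable_thinking", "include_reasoning", "reasoning",
    "repetition_penalty", "thinking_budget", "top_k",
}


def split_params(**params):
    """Validate params and split them into SDK keyword arguments and extra_body."""
    unknown = set(params) - SUPPORTED
    if unknown:
        raise ValueError(f"not listed as supported: {sorted(unknown)}")
    kwargs = {k: v for k, v in params.items() if k not in NON_OPENAI}
    extra_body = {k: v for k, v in params.items() if k in NON_OPENAI}
    return kwargs, extra_body


kwargs, extra = split_params(temperature=0.6, max_tokens=4096,
                             enable_thinking=True, thinking_budget=2048)
print(kwargs)  # {'temperature': 0.6, 'max_tokens': 4096}
print(extra)   # {'enable_thinking': True, 'thinking_budget': 2048}
```

With the client from the code sample above, the call becomes `client.chat.completions.create(model="qwen/qwen3-max", messages=messages, **kwargs, extra_body=extra)`.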

Pricing

Tier (input tokens per request)	Input / 1M tokens	Output / 1M tokens
≤ 32K	$0.359	$1.434
≤ 128K	$0.574	$2.294
≤ 256K	$1.004	$4.014
The tier is selected by the input token count of each request.
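Because the tier depends on each request's input size, per-request cost is a step function. The sketch below encodes the table under two assumptions. First, "32K", "128K" and "256K" are read as binary thousands (32,768, 131,072 and 262,144 tokens, the last matching the context window). Second, the selected tier's rates apply to both input and output tokens. For example, 20 input tokens plus 500 output tokens cost about $0.000724 at base-tier rates.

```python
# (max input tokens, input $/1M, output $/1M), copied from the pricing table.
# Assumption: "32K", "128K", "256K" mean 32,768, 131,072 and 262,144 tokens.
TIERS = [
    (32_768, 0.359, 1.434),
    (131_072, 0.574, 2.294),
    (262_144, 1.004, 4.014),
]


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a single request.

    The tier is chosen by the request's input token count, and that
    tier's rates are applied to both input and output tokens.
    """
    for limit, in_rate, out_rate in TIERS:
        if input_tokens <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("input exceeds the largest pricing tier")


print(f"{request_cost(20, 500):.6f}")        # 0.000724 (base tier)
print(f"{request_cost(40_000, 1_000):.6f}")  # 0.025254 (second tier)
```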

Cost calculator

Tokens per month: 10M


Estimate based on list price.

Pricing is tiered; this estimate uses base-tier rates.

Token & cost estimator

Expected output tokens

Input tokens: 20
Cost per request: $0.000724

Estimate only — actual token counts depend on the provider's tokenizer.

Performance

last 7 days

p50 TTFT: 1.92 s

Output speed: 86.8 tok/s

p95 TTFT: 10.00 s

Error rate
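The p50 TTFT and output speed above can be combined into a back-of-the-envelope latency estimate: time to first token plus output tokens divided by output speed. The sketch assumes a constant generation rate. It ignores network overhead and any hidden reasoning tokens produced in thinking mode, so long thinking-mode answers will take longer than it predicts. Swapping in the p95 TTFT gives a sense of the slow tail.

```python
def estimate_latency(output_tokens: int, ttft_s: float = 1.92,
                     tok_per_s: float = 86.8) -> float:
    """Rough end-to-end seconds: time to first token plus generation time."""
    return ttft_s + output_tokens / tok_per_s


print(f"{estimate_latency(500):.2f} s")               # 7.68 s (p50 TTFT)
print(f"{estimate_latency(500, ttft_s=10.0):.2f} s")  # 15.76 s (p95 TTFT)
```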

Public benchmarks

Last evaluated 2025-09-23

AA Coding: 26.4 (#75 of 106; better than 29% of models compared)
AA Intelligence: 31.4 (#75 of 110; better than 32% of models compared)
AA Math: 80.7 (#23 of 81; better than 70% of models compared)

Benchmark	Score
AIME 2025	80.7
GPQA Diamond	76.4
Humanity's Last Exam	11.1
IFBench	44.1
LiveCodeBench	76.7
Long-Context Recall	46.7
MMLU-Pro	84.1
SciCode	38.3
TerminalBench Hard	20.5
τ²-Bench	74.3

Source: artificialanalysis.ai

How it compares

	Qwen3 Max	qwen/qwen3-max-preview	Qwen3.5 397B A17B	qwen/qwen3.5-plus
Input $/M	$0.36	$0.86	$0.17	$0.12
Output $/M	$1.43	$3.44	$1.03	$0.69
Context	262K	262K	33K	1.0M
Quality	7/10	8/10	8/10	8/10

More from Qwen


Qwen3.6 35B A3B (cheapest)

qwen/qwen3.6-35b-a3b

$0.25 in · $1.49 out / 1M

262.1K ctx · quality 8/10

Qwen3.6 Plus

qwen/qwen3.6-plus

$0.50 in · $3.00 out / 1M

1.05M ctx · quality 8/10

Qwen3.7 Plus

qwen/qwen3.7-plus

$0.35 in · $1.42 out / 1M

1M ctx · quality 8/10

FAQ

What is the cost per token for Qwen3 Max on OrcaRouter?

Qwen3 Max is billed per token with no flat fee: you pay only for input and output tokens. Rates are tiered by the input token count of each request:

- up to 32K input tokens: $0.359 input and $1.434 output per 1M tokens;
- up to 128K: $0.574 and $2.294;
- up to 256K: $1.004 and $4.014.

Among the models in the comparison table, it costs less than qwen/qwen3-max-preview but more than Qwen3.5 397B A17B and qwen/qwen3.5-plus. Check OrcaRouter's pricing page for current rates.

What is the context window size of Qwen3 Max?

Qwen3 Max has a context window of 262,144 tokens, which must hold every message in the request (system, user, assistant, and tool messages). A single response can contain at most 65,536 tokens.

What are the main strengths of Qwen3 Max?

Its main strengths are a very large context window (262k tokens), high output limit (65k tokens), strong performance on professional reasoning benchmarks (MMLU-Pro 84.1), and an MoE architecture that balances capability with inference efficiency. It is ideal for long-document analysis and complex multi-step reasoning.

How does Qwen3 Max compare to GPT-4 or Claude 3?

It depends on the specific variant. Qwen3 Max has a larger context window than most GPT-4 versions (GPT-4 Turbo offers 128K tokens). It also has a much higher output limit than Claude 3 Opus (4,096 tokens). Its MMLU-Pro score (84.1) is comparable to top-tier models, but real-world performance varies by task. Because OrcaRouter offers multiple models behind the same API, you can test them side-by-side on your own prompts.

Does OrcaRouter train on my data when I use Qwen3 Max?

No. OrcaRouter does not use customer data for training or improving models. All inputs and outputs are processed solely for inference and logged for billing and operational purposes. Data is not shared with third parties beyond the necessary infrastructure. See OrcaRouter's privacy policy for details.

How do I call Qwen3 Max using an OpenAI-compatible API?

Set the base URL to https://api.orcarouter.ai/v1, provide your OrcaRouter API key, and use the model ID 'qwen/qwen3-max'. OpenAI SDKs and direct HTTP clients need no other changes. Example: client = OpenAI(base_url='https://api.orcarouter.ai/v1', api_key='...') followed by client.chat.completions.create(model='qwen/qwen3-max', messages=[...]).

What are the supported input and output modalities?

Qwen3 Max accepts text-only input and generates text-only output. It does not process images, audio, or video. For multimodal tasks, consider models like Qwen2-VL available on OrcaRouter.

Can I use function calling with Qwen3 Max?

Yes, Qwen3 Max supports the OpenAI-compatible function calling and tool use format. You can define functions in the 'tools' parameter, and the model can request to call them. This works through OrcaRouter's API without any extra configuration.
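A tool-calling round trip has two halves. First, you send tool definitions in `tools`. Second, when the reply contains `tool_calls`, you run them locally and send the results back as `role: "tool"` messages. The sketch below handles the second half for messages in the OpenAI format, where `arguments` arrives as a JSON string. `get_weather` is a hypothetical stand-in for your own function.

```python
import json


def get_weather(city: str) -> dict:
    """Hypothetical local tool; replace with a real implementation."""
    return {"city": city, "forecast": "sunny"}


# Send this list as the `tools` parameter of the chat completions request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather forecast for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each tool call in an assistant message and build the tool replies."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments are a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


example = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Hangzhou"}'},
    }],
}
print(run_tool_calls(example))
```

After running the tools, append the assistant message and the returned tool messages to `messages`, then call the endpoint again with the same `tools`. With the OpenAI Python SDK, the response message is an object; `message.model_dump()` converts it to the dict shape used here.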

Is there a rate limit for Qwen3 Max on OrcaRouter?

OrcaRouter applies rate limits to ensure fair usage. These limits are typically based on tokens per minute and requests per minute. Exact limits depend on your plan. Check OrcaRouter's documentation or your dashboard for specific rates.
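Because the exact limits depend on your plan, clients should expect occasional HTTP 429 responses. The sketch below retries with capped exponential backoff and full jitter. It assumes the raised exception exposes the status as `.code`, as `urllib.error.HTTPError` does. The retry statuses and delays are illustrative defaults, not OrcaRouter recommendations. The OpenAI Python SDK already retries some errors on its own (see its `max_retries` client option), so this helper is mainly useful with raw HTTP clients.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      retry_statuses=(429, 500, 502, 503, 504),
                      sleep=time.sleep, rand=random.random):
    """Call fn(); on a retryable HTTP status, wait and retry up to max_retries times.

    Assumes the exception carries the HTTP status in `.code`
    (true for urllib.error.HTTPError); adapt the check for other clients.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "code", None)
            if status not in retry_statuses or attempt == max_retries:
                raise  # not retryable, or out of attempts
            # Full jitter: sleep a random fraction of the capped exponential delay.
            sleep(rand() * min(max_delay, base_delay * 2 ** attempt))
```

If a 429 response includes a `Retry-After` header, waiting for that interval instead of the computed delay is usually the better choice.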

What are the limitations of Qwen3 Max?

Like all LLMs, it may hallucinate or produce incorrect information, especially on obscure topics. It has a training cutoff (not publicly disclosed), so it cannot access real-time events without context. The large context can lead to 'lost in the middle' effects. It is text-only and not designed for real-time applications without streaming.

Embed this badge

Paste into your blog post

Qwen: Qwen3 Max · $0.36/M in · 1916 ms p50 · via OrcaRouter

HTML: <a href="https://www.orcarouter.ai/models/qwen/qwen3-max" target="_blank"> <img src="https://www.orcarouter.ai/embed/qwen/qwen3-max.svg" alt="Qwen: Qwen3 Max on OrcaRouter" /> </a>

Markdown: [![Qwen: Qwen3 Max](https://www.orcarouter.ai/embed/qwen/qwen3-max.svg)](https://www.orcarouter.ai/models/qwen/qwen3-max)

Model card as data

GET /api/public/models/qwen/qwen3-max

Machine-readable: /llms.txt, /llms-full.txt
