OrcaRouter — One AI gateway: adaptive LLM routing & governance

AI gateway for production

Smart routing and automatic failover on every request.

Routing that's measurably more accurate.

Every prompt is embedded and routed by a model that keeps learning online from real traffic. On the public RouterArena leaderboard (Jun 2026) it leads on accuracy — ahead of GPT-5, Azure, Martian and NotDiamond — at 75.5%.

contextual embeddingsonline learning<1ms overheadRouterArena

* Based on RouterArena leaderboard data, June 2026.

A provider goes down. No one notices.

When a provider rate-limits or 5xxs, OrcaRouter retries the request against a healthy model across 200+ options before the response starts — so transient upstream outages don't surface to your users.

200+ modelsauto-failoverno 429

See and prove every call — cost, model, latency, and why.

See everything. Prove anything.

See exactly what every request cost, which model served it, how long it took, and why it failed — full structured logs you can filter, replay, and copy as a runnable cURL. A route is never a black box.

Per-request logsgrade · model · costcopy-as-cURL

Zero markup. Zero black boxes.

You pay each provider their exact price — we add $0 per token, ever. Every request shows the grade, chosen model, provider, latency, and price, so cost is glass-box, not an opaque blended rate.

$0 / tokenprovider costglass-box receipt

Versioned prompts and caching — without a redeploy.

Change prompts. Not code.

Version prompts behind named labels with A/B splits and one-click rollback. Move a label and every request picks it up instantly — no redeploy, no code change, no client update.

VersionedA/BInstant rollbackNo deploy

Pay once. Reuse for free.

Repeated and cached prompt tokens bill at the provider's cache rate — often a fraction of the input price — across 5-minute and 1-hour ephemeral windows. Same answers, less spend, with cached_tokens on every receipt.

cache_controlcached_tokens5m / 1h windows

Guardrails, budgets, and an agent firewall that enforces.

Guardrails that stop things.

PII Shield and content policies run before the upstream call is billed. A blocked request returns a clean 400 and is never charged — guardrails enforced inline, not logged after the fact.

PII Shieldenforced pre-billingclean 400

Safe for your team. And your agents.

Budgets and roles for people; a risk-scored firewall for agents. Every tool and MCP call is graded ALLOW, REVIEW, or BLOCK before it runs, and anomaly detection flags rate and cost spikes against learned hour-of-week baselines.

ALLOW · REVIEW · BLOCKMCP gatinganomaly detection

Built for the agent era. Before you needed it.

Setup

Live in 60 seconds.

One URL change. Your existing SDK, model names, and streaming all work exactly as before.

Step 1

🔗

Point your SDK at us

Set base_url to api.orcarouter.ai/v1 and swap your API key. No other code changes needed.

→

Step 2

⚡

We route, guard & observe

Every call is routed to the best model, checked against your guardrails, and metered — graded in under 1ms, with failover, caching and full logs built in.

→

Step 3

✓

You ship, on one endpoint

Traffic goes direct to each provider's first-party API at their published rate — we add $0 per token. One OpenAI-compatible endpoint for routing, observability and governance.

Every model. One price list.

200+ models with live, side-by-side pricing — what you'd pay the provider directly. We add $0 on top.

View all 200+ models →

Model	Routed to	Input /M	Output /M	Context	Quality
kimi/kimi-k2.7-codeNEW	Moonshot	$0.950	$4.00	262K	8.0
anthropic/claude-fable-5NEW	Anthropic Direct	$10.00	$50.00	1M	10.0
qwen/qwen3.7-plusNEW	Alibaba Cloud	$0.350	$1.42	1M	8.0
minimax/minimax-m3NEW	—	$0.300	$1.20	1M	9.0
anthropic/claude-opus-4.8NEW	Anthropic Direct	$5.00	$25.00	1M	10.0
google/gemini-3.5-flashNEW	Google Direct	$1.50	$9.00	1M	9.0
qwen/qwen3.7-maxNEW	Alibaba Cloud	$1.25	$3.75	1M	5.0
qwen/qwen3.7-max-2026-05-20NEW	Alibaba Cloud	$1.25	$3.75	1M	5.0
qwen/qwen3.6-flash	Alibaba Cloud	$0.250	$1.50	1M	7.0
qwen/qwen3.6-35b-a3b	Alibaba Cloud	$0.248	$1.48	262K	8.0
openai/gpt-5.5-pro	OpenAI Direct	$30.00	$180.00	—	10.0
openai/gpt-5.5	OpenAI Direct	$5.00	$30.00	—	10.0
deepseek/deepseek-v4-pro	DeepSeek	$0.435	$0.870	1M	9.0
deepseek/deepseek-v4-flash	DeepSeek	$0.140	$0.280	1M	8.0
anthropic/claude-opus-4.7	Anthropic Direct	$5.00	$25.00	1M	10.0
+ 194 more models · Prices update every 60 seconds

Everything your OpenAI client already calls.

Streaming, tool calls, structured outputs, vision, embeddings and audio — routed unchanged across every model.

Model	Streaming	Tools	Structured	Vision	Embeddings	Audio
anthropic/claude-opus-4.8	supported	supported	supported	supported	not supported	not supported
google/gemini-3.1-pro-preview	supported	supported	supported	supported	not supported	supported
grok/grok-4.3	supported	supported	supported	supported	not supported	not supported
anthropic/claude-fable-5	supported	supported	supported	supported	not supported	not supported
anthropic/claude-opus-4.7	supported	supported	supported	supported	not supported	not supported

Pricing

Routing is free.
Pay for features.

We never take a cut of your token spend. Our revenue comes from optional team features.

Zero markup guarantee

You pay providers directly at their published rates. We add nothing on top of token costs. Routing is free; the optional Team plan funds the platform.

$0.00routing fee

Hacker

Free

Forever. Zero markup on all tokens.

✓ Route — 200+ models, auto-failover

✓ Observe — basic dashboard

✓ Manage — prompt versioning

✓ 3 API keys · 0% token markup

Start free

Team

$499/mo

Still zero markup. Pay for features.

✓ Everything in Hacker

✓ Up to 10 team seats

✓ Compliance enforcement & reports

✓ Unlimited API keys

✓ Priority support

Get started →

Enterprise

Custom

SLA commitments + private deployment.

✓ Everything in Team

✓ Private / on-prem deployment

✓ 99.99% uptime SLA

✓ Dedicated infrastructure

✓ Dedicated support & custom pricing

One Gateway. Every Model. Route Smarter. Ship Safer. Spend Less.

Works with the tools you already use

Routing that's measurably more accurate.

A provider goes down. No one notices.

See everything. Prove anything.

Zero markup. Zero black boxes.

Change prompts. Not code.

Pay once. Reuse for free.

Guardrails that stop things.

Safe for your team. And your agents.

Live in 60 seconds.

Point your SDK at us

We route, guard & observe

You ship, on one endpoint

Every model. One price list.

Everything your OpenAI client already calls.

Routing is free.
Pay for features.

Hacker

Team

Enterprise

Independently audited. Continuously compliant.

Smarter, safer, cost-efficient.

Product

Resources

Legal

Connect

One Gateway. Every Model. Route Smarter. Ship Safer. Spend Less.

Works with the tools you already use

Routing that's measurably more accurate.

A provider goes down. No one notices.

See everything. Prove anything.

Zero markup. Zero black boxes.

Change prompts. Not code.

Pay once. Reuse for free.

Guardrails that stop things.

Safe for your team. And your agents.

Live in 60 seconds.

Point your SDK at us

We route, guard & observe

You ship, on one endpoint

Every model. One price list.

Everything your OpenAI client already calls.

Routing is free.Pay for features.

Hacker

Team

Enterprise

Independently audited. Continuously compliant.

Smarter, safer, cost-efficient.

Product

Resources

Legal

Connect

Routing is free.
Pay for features.