One gateway · every model · all your AI traffic

One Gateway. Every Model. Route Smarter. Ship Safer. Spend Less.

OrcaRouter grades every prompt and routes it intelligently. Frontier-quality AI at up to 40% lower cost. Adaptive routing, load balancing, guardrails, agent firewall, observability, and governance — all through a single OpenAI-compatible endpoint.

No credit card · live in 60 secs

Beats GPT-5 & Azure on RouterArenaBacked by published research
- client = OpenAI(api_key="sk-...")
+ client = OpenAI(
+ base_url="https://api.orcarouter.ai/v1",
+ api_key="sk-orca-..."
+ )
 
# Everything else stays the same.
response = client.chat.completions.create(
model="orcarouter/auto", # router picks the best model per request
messages=[{"role": "user", "content": "..."}]
)
# → orcarouter/auto grades the prompt → frontier or open-source, zero token markup ✓

One line. We grade each prompt, route to frontier or OSS, and add $0.

Anthropic: Claude Opus 4.8$5.00 in·$25.00 outAnthropic Direct
Google: Gemini 3.1 Pro Preview$2.00 in·$12.00 outGoogle Direct
grok/grok-4.3$1.25 in·$2.50 out
Anthropic: Claude Fable 5$10.00 in·$50.00 outAnthropic Direct
Anthropic: Claude Opus 4.7$5.00 in·$25.00 outAnthropic Direct
Anthropic: Claude Opus 4.8$5.00 in·$25.00 outAnthropic Direct
Google: Gemini 3.1 Pro Preview$2.00 in·$12.00 outGoogle Direct
grok/grok-4.3$1.25 in·$2.50 out
Anthropic: Claude Fable 5$10.00 in·$50.00 outAnthropic Direct
Anthropic: Claude Opus 4.7$5.00 in·$25.00 outAnthropic Direct
200+
models, one endpoint
0%
token markup, ever
75.5%
routing accuracy
<50ms
mid-stream failover
Building on this? Talk to us.
Feedback shapes the next ship.
Integrations

Works with the tools you already use

Drop-in OpenAI-compatible, or connect agents over the OrcaRouter MCP server — keep your SDK, framework and editor.

OrcaRouter MCP serverOpenAI SDKGoogle GenAI SDKAnthropic SDKLangChainLlamaIndexVercel AI SDKCamelAIDifyCursorOpenCodePromptfooOpenClawOpenHumanGitHubcURLand more
AI gateway for production

Smart routing and automatic failover on every request.

Routing that's measurably more accurate.

Every prompt is embedded and routed by a model that keeps learning online from real traffic. On the public RouterArena leaderboard (Jun 2026) it leads on accuracy — ahead of GPT-5, Azure, Martian and NotDiamond — at 75.5%.

contextual embeddingsonline learning<1ms overheadRouterArena
* Based on RouterArena leaderboard data, June 2026.

A provider goes down. No one notices.

When a provider rate-limits or 5xxs, OrcaRouter retries the request against a healthy model across 200+ options before the response starts — so transient upstream outages don't surface to your users.

200+ modelsauto-failoverno 429

See and prove every call — cost, model, latency, and why.

See everything. Prove anything.

See exactly what every request cost, which model served it, how long it took, and why it failed — full structured logs you can filter, replay, and copy as a runnable cURL. A route is never a black box.

Per-request logsgrade · model · costcopy-as-cURL

Zero markup. Zero black boxes.

You pay each provider their exact price — we add $0 per token, ever. Every request shows the grade, chosen model, provider, latency, and price, so cost is glass-box, not an opaque blended rate.

$0 / tokenprovider costglass-box receipt

Versioned prompts and caching — without a redeploy.

Change prompts. Not code.

Version prompts behind named labels with A/B splits and one-click rollback. Move a label and every request picks it up instantly — no redeploy, no code change, no client update.

VersionedA/BInstant rollbackNo deploy

Pay once. Reuse for free.

Repeated and cached prompt tokens bill at the provider's cache rate — often a fraction of the input price — across 5-minute and 1-hour ephemeral windows. Same answers, less spend, with cached_tokens on every receipt.

cache_controlcached_tokens5m / 1h windows

Guardrails, budgets, and an agent firewall that enforces.

Guardrails that stop things.

PII Shield and content policies run before the upstream call is billed. A blocked request returns a clean 400 and is never charged — guardrails enforced inline, not logged after the fact.

PII Shieldenforced pre-billingclean 400

Safe for your team. And your agents.

Budgets and roles for people; a risk-scored firewall for agents. Every tool and MCP call is graded ALLOW, REVIEW, or BLOCK before it runs, and anomaly detection flags rate and cost spikes against learned hour-of-week baselines.

ALLOW · REVIEW · BLOCKMCP gatinganomaly detection
Built for the agent era. Before you needed it.

Setup

Live in 60 seconds.

One URL change. Your existing SDK, model names, and streaming all work exactly as before.

Step 1
🔗

Point your SDK at us

Set base_url to api.orcarouter.ai/v1 and swap your API key. No other code changes needed.

Step 2

We route, guard & observe

Every call is routed to the best model, checked against your guardrails, and metered — graded in under 1ms, with failover, caching and full logs built in.

Step 3

You ship, on one endpoint

Traffic goes direct to each provider's first-party API at their published rate — we add $0 per token. One OpenAI-compatible endpoint for routing, observability and governance.


Every model. One price list.

200+ models with live, side-by-side pricing — what you'd pay the provider directly. We add $0 on top.

View all 200+ models →
ModelRouted toInput /MOutput /MContextQuality
kimi/kimi-k2.7-codeNEWMoonshot$0.950$4.00262K8.0
anthropic/claude-fable-5NEWAnthropic Direct$10.00$50.001M10.0
qwen/qwen3.7-plusNEWAlibaba Cloud$0.350$1.421M8.0
minimax/minimax-m3NEW$0.300$1.201M9.0
anthropic/claude-opus-4.8NEWAnthropic Direct$5.00$25.001M10.0
google/gemini-3.5-flashNEWGoogle Direct$1.50$9.001M9.0
qwen/qwen3.7-maxNEWAlibaba Cloud$1.25$3.751M5.0
qwen/qwen3.7-max-2026-05-20NEWAlibaba Cloud$1.25$3.751M5.0
qwen/qwen3.6-flashAlibaba Cloud$0.250$1.501M7.0
qwen/qwen3.6-35b-a3bAlibaba Cloud$0.248$1.48262K8.0
openai/gpt-5.5-proOpenAI Direct$30.00$180.0010.0
openai/gpt-5.5OpenAI Direct$5.00$30.0010.0
deepseek/deepseek-v4-proDeepSeek$0.435$0.8701M9.0
deepseek/deepseek-v4-flashDeepSeek$0.140$0.2801M8.0
anthropic/claude-opus-4.7Anthropic Direct$5.00$25.001M10.0
+ 194 more models · Prices update every 60 seconds

Everything your OpenAI client already calls.

Streaming, tool calls, structured outputs, vision, embeddings and audio — routed unchanged across every model.

ModelStreamingToolsStructuredVisionEmbeddingsAudio
anthropic/claude-opus-4.8supportedsupportedsupportedsupportednot supportednot supported
google/gemini-3.1-pro-previewsupportedsupportedsupportedsupportednot supportedsupported
grok/grok-4.3supportedsupportedsupportedsupportednot supportednot supported
anthropic/claude-fable-5supportedsupportedsupportedsupportednot supportednot supported
anthropic/claude-opus-4.7supportedsupportedsupportedsupportednot supportednot supported
Pricing

Routing is free.
Pay for features.

We never take a cut of your token spend. Our revenue comes from optional team features.

Zero markup guarantee
You pay providers directly at their published rates. We add nothing on top of token costs. Routing is free; the optional Team plan funds the platform.
$0.00routing fee

Hacker

Free
Forever. Zero markup on all tokens.
✓ Route — 200+ models, auto-failover
✓ Observe — basic dashboard
✓ Manage — prompt versioning
✓ 3 API keys · 0% token markup
Start free

Enterprise

Custom
SLA commitments + private deployment.
✓ Everything in Team
✓ Private / on-prem deployment
✓ 99.99% uptime SLA
✓ Dedicated infrastructure
✓ Dedicated support & custom pricing
Trust & Compliance

Independently audited. Continuously compliant.

Audit reports available under NDA — request a copy below.

Smarter, safer, cost-efficient.

Swap one line. That's the migration.

Sign up with GitHub — $5 in tokens free. No credit card required. You're live in under a minute.