Model Fusion in Production: Inside OrcaRouter Fusion and the Routing DSL
Three frontier models in parallel, one answer back. Call it in one line — or compose your own.
TL;DR. Claude Fable 5 was delisted. The answer isn't a bigger model — it's a panel: run several frontier models in parallel and let a judge return the strongest answer. OrcaRouter ships this two ways: built-in orcarouter/fusion routers you call like any model, and a Routing DSL to compose your own. This is the field guide to both — with copy-paste recipes, the five arbiters (including synthesize, the Mixture-of-Agents fuse), and how to roll it out without betting your SLA.
Part 1 — Call it in one line: the built-in Fusion routers
Fable 5 was discontinued and restricted, so it's no longer broadly callable. Fusion rebuilds that level from the models you can still call — a drop-in, OpenAI-compatible router that runs a panel of frontier models in parallel and returns the strongest answer. Three curated tiers ship in every workspace:
The three Fusion tiers (panel composition × context window)
Claude Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro
Context Window: 1,000,000
Best for: Fable-5 level Max intelligence
Claude Opus 4.8 + GPT-5.5
Context Window: 1,000,000
Best for: Fable-5 level balanced inference
Gemini 3.5 Flash + MiniMax M2.7 + GLM 5.1
Context Window: 200,000
Best for: Fable-5 level fast + cheap inference
(Context window = the smallest panel member's — the binding constraint on a fan-out.)
These aren't marketing slugs; they're pre-compiled DSL routers managed centrally. Here's the actual orcarouter/fusion program, verbatim:
version: 1
rules:
- id: hard_panel
when: task_class == "code" || task_class == "agent" || code_keyword_density >= 0.3 || has_tools || difficulty >= 0.3
use:
parallel:
- { model: "anthropic/claude-opus-4.8" }
- { model: "openai/gpt-5.5" }
- { model: "google/gemini-3.1-pro-preview" }
arbiter:
strategy: best_of_n
model: "anthropic/claude-opus-4.8"
template: best_answer_v1
max_latency_ms: 120000
default:
delegate: balancedTwo design choices worth calling out:
It only fans out on real work. The when: gate fires the panel for code, agent, tool-using, code-dense, or high-difficulty (difficulty >= 0.3) prompts; everything else falls through to the workspace's balanced default. You pay panel price exactly where it helps, not on "hi."
The judge serves a real answer, verbatim. best_of_n runs an LLM judge (here, Opus 4.8 with the best_answer_v1 template) that picks the single strongest candidate and serves it as-is — never a diluted merge. The output is always a real model's answer.
Part 2 — Select vs. Fuse: best_of_n and the synthesize arbiter
The Fusion routers select. But OrcaRouter also ships a fuse strategy — synthesize, the Mixture-of-Agents pattern added in the routing engine (service/dispatch_parallel/synthesize.go). The difference is the whole game:
Exhibit 2 — Select vs. Fuse
best_of_n (SELECT) synthesize (FUSE)
┌─ Opus 4.8 ─┐ ┌─ Opus 4.8 ─┐
├─ GPT-5.5 ─┼─► judge picks leg k ├─ GPT-5.5 ─┼─► aggregator LLM writes
└─ Gemini ─┘ └─► serve leg k verbatim └─ Gemini ─┘ ONE new fused answer
output = a real model's answer output = a new answer better than any legRecipe for true fusion:
use:
parallel:
- { model: "anthropic/claude-opus-4.8" }
- { model: "openai/gpt-5.5" }
- { model: "google/gemini-3.1-pro-preview" }
arbiter:
strategy: synthesize
model: "anthropic/claude-opus-4.8" # aggregator: fuses candidates into one new answer
template: synthesize_v1Honest caveats:
- Billing is N+1 — every leg bills, plus the aggregator as an extra call.
- OpenAI chat-format only in V1 — the aggregator emits an OpenAI chat completion; Claude/Gemini native clients degrade to serve-first-successful (legs still billed).
The aggregator must be in the router's authorized candidate set, or it degrades.
When to use which: best_of_n when one model's answer is likely fully right (code, factual Q&A) — you want a clean, real answer. synthesize when answers are complementary (research, analysis, long-form) and merging strengths beats any single take.
Part 3 — Build your own: the Routing DSL playbook
Don't want the curated panel? Start from the "Claude Fable 5 Level" templates in the Routing DSL editor (they ship in every workspace and mirror the Fusion routers), then specialize. Six copy-paste patterns:
1 — Ship code that actually runs → fan out, let the tests pick the winner:
- id: hard_code
when: task_class == "code" && difficulty > 0.6
use:
parallel:
- { model: "anthropic/claude-opus-4.8", thinking_budget_tokens: 16000 }
- { model: "openai/gpt-5.5", reasoning_effort: high }
- { model: "google/gemini-3.1-pro-preview" }
arbiter: { strategy: tests_pass }tests_pass is execution-grounded — it serves the candidate that passes your harness, no judge LLM needed.
2 — Stop overpaying for easy prompts → difficulty gate (the Fusion pattern, your models):
- id: easy
when: difficulty < 0.3
use: { delegate: cheapest }
- id: hard
when: difficulty >= 0.3
use:
parallel:
- { model: "anthropic/claude-opus-4.8" }
- { model: "openai/gpt-5.5" }
arbiter: { strategy: best_of_n, model: "anthropic/claude-opus-4.8", template: best_answer_v1 }3 — Keep long agent runs on the rails → escalate only when it wobbles:
- id: agent
when: task_class == "agent" && agent_state.consecutive_errors == 0
use: { model: "anthropic/claude-sonnet-4.6", affinity_ttl: "5m" }
on_low_confidence:
signals: [next_turn_test_failed, self_doubt]
use: { model: "anthropic/claude-opus-4.8", thinking_budget_tokens: 24000 }4 — Make flaky outputs deterministic → vote, escalate on a split:
- id: extract
when: task_class == "rag"
use:
parallel:
- { model: "anthropic/claude-opus-4.8" }
- { model: "openai/gpt-5.5" }
- { model: "google/gemini-3.1-pro-preview" }
arbiter: { strategy: majority }
on_disagreement: { model: "anthropic/claude-opus-4.8", thinking_budget_tokens: 32000 }5 — Beat tail latency & provider blips → race, serve the first responder:
- id: race
when: request.stream == true && difficulty < 0.5
use:
parallel:
- { model: "google/gemini-3.5-flash" }
- { model: "minimax/minimax-m2.7" }
- { model: "z-ai/glm-5.1" }
arbiter: { strategy: first }6 — Roll out without betting the SLA → shadow (evaluate alongside live traffic, log what it would pick + cost delta, serve the live pick) → canary % (dsl_canary_pct 5 → 25 → 100, crypto-random per request). Migrate on measured divergence, roll back instantly.
The cheat-sheet: five arbiters

Economics & honesty
Difficulty-gated fan-out keeps the bill flat (Illustrative; cost = real token-price math) — blended cost = easy_share × cheap + hard_share × panel:

A 70%-easy workload runs the full panel for a third of the all-panel bill.
