Question 1

What is the needle-in-a-haystack benchmark?

Accepted Answer

A test where a unique fact (the 'needle') is placed at varying depths inside a long document, and the model is asked to retrieve it. Accuracy at increasing context length reveals whether the model's full window is usable or whether it degrades in the middle.

Question 2

Does OrcaRouter charge extra for long-context calls?

Accepted Answer

No. Long-context tokens are billed at the upstream provider's per-token rate, same as any other call. OrcaRouter adds zero markup.

Question 3

Why does long-context cost more per token at most providers?

Accepted Answer

Self-attention is quadratic in context length so the per-call compute cost grows fast. Providers price the input tokens higher (often 1.5-2× the standard rate) above certain thresholds (typically 128K).