Best AI models for code — 2026 ranking
The best AI models for coding in 2026, ranked by HumanEval+, SWE-bench Verified, LiveCodeBench, and real-time latency. Selected by OrcaRouter from public benchmarks.
Top 8 coding models
Ordered by composite score across SWE-bench Verified (50% weight), HumanEval+ (25%), LiveCodeBench (15%), and median latency at 8K-token context (10%). All numbers are pulled from public eval reports as of the most recent monthly refresh.
- claude-opus-4-7 — SWE-bench Verified: 81.2%, HumanEval+: 96.4%. Best at multi-file refactors and long-context PR review. Slightly slower median latency than gpt-5.5 but ahead on every code-quality metric.
- claude-sonnet-4-6 — SWE-bench Verified: 76.9%, HumanEval+: 95.1%. Best price-performance for everyday coding agents — Cursor / Cline default. ~3× cheaper than opus-4-7 at 90%+ of the quality.
- gpt-5.5 — SWE-bench Verified: 78.4%, HumanEval+: 96.0%. OpenAI flagship; best on greenfield code generation, slightly behind Claude on refactor-heavy PR-style tasks.
- deepseek-v4-pro — SWE-bench Verified: 73.1%, HumanEval+: 93.8%. Best open-weights coder. Strong chain-of-thought reasoning on algorithm-style tasks; cheapest of the top 5.
- gpt-5.5-mini — SWE-bench Verified: 68.0%, HumanEval+: 91.2%. OpenAI's cheap-fast tier. Solid for autocomplete and pair-programming loops where latency matters more than ceiling quality.
- gemini-3.1-pro-preview — SWE-bench Verified: 71.6%, HumanEval+: 92.4%. Best long-context coder — its 2M-token context window unlocks whole-repo reasoning that shorter-context models can't match.
- qwen3.6-plus — SWE-bench Verified: 65.5%, HumanEval+: 90.0%. Best non-English / multilingual code generation. Strong on Chinese-language docstrings and comments.
- claude-haiku-4-5 — SWE-bench Verified: 62.1%, HumanEval+: 88.7%. Cheapest viable coding model. Use it when you need >100 RPS and can accept the quality drop-off.
How we rank
The composite score weights SWE-bench Verified at 50% (it's the closest public benchmark to real-world software engineering tasks), HumanEval+ at 25% (algorithmic correctness), LiveCodeBench at 15% (newer problems, less likely to have leaked into training data), and OrcaRouter-measured p50 (median) latency at 10%. We re-pull benchmark numbers monthly from the official eval cards; latency is measured continuously across the OrcaRouter routing fleet.
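To make the weighting concrete, here is a minimal Python sketch of the composite score. Only the SWE-bench Verified and HumanEval+ numbers come from the list above; the LiveCodeBench scores and p50 latencies are hypothetical placeholders, and the latency normalization is an assumption, since the exact scaling isn't published here.

```python
# Minimal sketch of the composite score, assuming one way the latency term
# could be mapped onto the same 0-100 scale as the benchmarks.
# SWE-bench Verified and HumanEval+ figures come from the list above; the
# LiveCodeBench scores and p50 latencies marked * are HYPOTHETICAL
# placeholders for illustration only.

WEIGHTS = {"swe": 0.50, "he_plus": 0.25, "lcb": 0.15, "latency": 0.10}

# name: (SWE-bench Verified, HumanEval+, LiveCodeBench*, p50 ms*)
MODELS = {
    "claude-opus-4-7":   (81.2, 96.4, 80.0, 2400),
    "claude-sonnet-4-6": (76.9, 95.1, 70.0, 1500),
    "gpt-5.5":           (78.4, 96.0, 76.0, 1900),
}

def composite(swe, he_plus, lcb, p50_ms, fastest_p50):
    # Lower latency is better; score it as the fastest model's p50 over
    # this model's p50 (an assumed normalization, not a published one).
    latency_score = 100.0 * fastest_p50 / p50_ms
    return (WEIGHTS["swe"] * swe
            + WEIGHTS["he_plus"] * he_plus
            + WEIGHTS["lcb"] * lcb
            + WEIGHTS["latency"] * latency_score)

fastest = min(stats[3] for stats in MODELS.values())
for name, stats in sorted(MODELS.items(),
                          key=lambda kv: composite(*kv[1], fastest),
                          reverse=True):
    print(f"{name}: {composite(*stats, fastest):.2f}")
```

Because latency carries only 10% of the weight, a slower model like claude-opus-4-7 can still rank first when its benchmark lead is large enough, which is why the top of the table isn't simply the fastest model.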
When to pick which model
For an autocomplete-style coding loop where every keystroke triggers a model call, latency dominates — pick claude-haiku-4-5 or gpt-5.5-mini. For a research-style coding agent that runs for minutes on a single task and writes hundreds of lines, claude-opus-4-7 or gpt-5.5 will save you debugging time even at 5× the per-token cost. For pair-programming with a long codebase already in context, gemini-3.1-pro-preview's 2M context wins — claude-opus-4-7 caps at 200K.
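As a rough illustration of that decision logic, here is a small picker sketch. The pick_model helper and its workload labels are hypothetical, not an OrcaRouter API; the thresholds mirror the prose above.

```python
# Hypothetical sketch of the guidance above. pick_model and the workload
# labels are illustrative only, not part of any OrcaRouter API.

def pick_model(workload: str, context_tokens: int = 0) -> str:
    # A long codebase already in context: only gemini-3.1-pro-preview's
    # 2M-token window fits past claude-opus-4-7's 200K cap.
    if context_tokens > 200_000:
        return "gemini-3.1-pro-preview"
    # Keystroke-triggered autocomplete: latency dominates ceiling quality.
    if workload == "autocomplete":
        return "gpt-5.5-mini"  # or claude-haiku-4-5 if you need >100 RPS
    # Minutes-long agent runs writing hundreds of lines: top-end quality
    # saves more debugging time than the ~5x per-token cost.
    if workload == "agent":
        return "claude-opus-4-7"  # or gpt-5.5
    # Everyday default: best price-performance per the list above.
    return "claude-sonnet-4-6"

assert pick_model("autocomplete") == "gpt-5.5-mini"
assert pick_model("agent", context_tokens=500_000) == "gemini-3.1-pro-preview"
```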