Anthropic's Claude Fable 5: 1M-context model scoring 85.0 on OSWorld-Verified, accessed via OrcaRout…
Claude Fable 5 is a large language model from Anthropic that emphasizes extended context and multimodal input. Its 1,000,000-token context window allows a single request to include entire books, extensive codebases, or long conversation histories. The model accepts text, images, and file uploads, and generates up to 128,000 tokens per response. Pricing is transparent: $10.00 per million input tokens and $50.00 per million output tokens, passed through from Anthropic with no additional markup by OrcaRouter. This model is appropriate for teams working on high-stakes document review, complex analysis, or advanced agentic systems. Because of its premium pricing, it should be reserved for tasks that leverage its exceptional context length and multimodal capabilities, rather than for routine or low-cost operations.
Claude Fable 5 accepts three input modalities: text, image, and file. Text can be provided as plain strings or structured messages. Image inputs support common formats such as JPEG, PNG, GIF, and WebP, and the model can reason about visual content including diagrams, charts, and natural scenes. File input allows uploading documents (e.g., PDF, Word, or plain text) which are processed as part of the context. Together, these modalities enable tasks like extracting information from scanned documents, analyzing figures, or processing mixed-media reports. All inputs are counted against the 1,000,000-token limit. For images, token consumption depends on image size and detail level, following Anthropic's tokenization rules. This flexibility makes the model suitable for domains like healthcare imaging, legal case files, and technical documentation.
The 1,000,000-token context window allows Claude Fable 5 to process entire datasets or long-running conversations without chunking or summarization. In practice, this means the model can maintain coherence over extremely long sequences, which is critical for tasks like analyzing research literature, auditing long code files, or simulating dialogues over many turns. However, longer contexts increase inference latency and token costs. The model outputs up to 128,000 tokens per request, enabling detailed responses. While the large context is a differentiator, users should be aware that not all use cases need the full window; for shorter inputs, a smaller model may offer faster and cheaper performance. On OSWorld-Verified, the model scored 85.0, suggesting strong capability in tasks that require sustained attention and gradual reasoning.
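Before sending a very large request, it can help to check roughly whether it fits. The sketch below is only an illustration. It assumes about four characters per token (a crude heuristic, not Anthropic's tokenizer) and assumes that input and reserved output share the 1,000,000-token window. The helper names `estimate_tokens` and `fits_in_context` are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, as stated for Claude Fable 5
MAX_OUTPUT = 128_000        # tokens, as stated for Claude Fable 5
CHARS_PER_TOKEN = 4         # assumption: rough heuristic for English text


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; a real tokenizer will give different counts."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_in_context(documents: list[str], reserved_output: int = 4_000) -> bool:
    """Return True if estimated input plus reserved output fits the window.

    Assumption: input and output tokens count against the same window.
    """
    if reserved_output > MAX_OUTPUT:
        raise ValueError("reserved_output exceeds the 128K output limit")
    total_input = sum(estimate_tokens(doc) for doc in documents)
    return total_input + reserved_output <= CONTEXT_WINDOW
```

A check like this is only a pre-filter; the token counts reported in the API response are the authoritative numbers for billing.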
Claude Fable 5 excels at tasks requiring deep contextual understanding across long texts or multimodal inputs. Examples include summarizing multi-volume legal documents, performing code review on massive repositories, integrating information from multiple images and text, and planning complex multi-step processes where earlier steps inform later ones. The model’s 85.0 score on OSWorld-Verified indicates it can successfully complete realistic computer tasks that involve browsing, file manipulation, and software interaction over many steps. It also performs well on standard language understanding and generation benchmarks, though specific scores beyond OSWorld are not publicly detailed. For tasks that do not need the large context or multimodal inputs, smaller models such as Claude 3 Haiku or GPT-4o mini would be more cost-effective.
Given the premium pricing of $10/$50 per million tokens, Claude Fable 5 is not optimal for short, high-volume prompts such as simple classification, translation of single sentences, or routine customer support. If your task fits within an 8K-32K context window and does not require image understanding, smaller models like Claude 3 Sonnet, GPT-4o, or the latest lightweight variants will deliver faster responses at a fraction of the cost. Likewise, if you need fine-tuning, which Fable 5 does not support, or high throughput across many concurrent requests, where its cost and latency may be prohibitive, another model is the better choice. Reserve this model for jobs where the unique value of extreme context, multimodal reasoning, or top-tier agentic performance clearly justifies the expense. OrcaRouter's API allows you to dynamically switch models per request, so you can use Fable 5 only for complex tasks and route simpler ones elsewhere.
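The per-request switching described above can be sketched as a small routing function. This is a minimal illustration, not OrcaRouter's routing feature. The 32K threshold comes from the guidance above, `CHEAPER_MODEL` is a hypothetical placeholder ID, and `choose_model` is an invented helper name.

```python
FABLE_5 = "anthropic/claude-fable-5"     # model ID given in this document
CHEAPER_MODEL = "example/smaller-model"  # hypothetical placeholder, not a real ID

SHORT_CONTEXT_LIMIT = 32_000  # tokens; upper end of the 8K-32K range above


def choose_model(input_tokens: int, needs_images: bool, agentic: bool = False) -> str:
    """Pick Fable 5 only when the request needs what it is priced for."""
    if agentic or needs_images or input_tokens > SHORT_CONTEXT_LIMIT:
        return FABLE_5
    return CHEAPER_MODEL
```

The returned string would be passed as the `model` field of each request, so routine traffic never reaches the premium model.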
Yes, the model is well-suited for agentic workflows that require maintaining state over many steps. Its context window can hold a complete agent log, tool call history, and environmental observations, enabling coherent decision-making across extended interactions. The 85.0 score on OSWorld-Verified directly measures performance on realistic, multi-step computer tasks—such as composing an email, editing a spreadsheet, or navigating a web interface—which are core to agentic systems. The 128K maximum output allows the model to produce long action sequences or detailed reports. However, developers should account for increased latency and cost per turn. For simpler agentic loops with short contexts, smaller models can suffice. OrcaRouter's API supports streaming and custom parameters (e.g., temperature, max_tokens) to fine-tune behavior.
Strengths include one of the largest commercially available context windows (1M tokens), strong multimodal understanding (text, image, file), and a high score on OSWorld-Verified (85.0), which reflects robust real-world task performance. The model also features a generous 128K output limit, enabling lengthy generations. Limitations include high cost: at $10/$50 per million tokens, it is several times more expensive than mid-tier models. Additionally, latency increases with context length; processing nearly 1M tokens can take tens of seconds or more. The model lacks modalities like audio or video; it accepts only text, images, and files. It also does not support fine-tuning or custom training; it is a fixed, API-only model. For tasks that do not exploit its unique capabilities, the cost may exceed the benefit.
OSWorld-Verified is a benchmark that evaluates AI agents on realistic, multi-step computer tasks across operating systems. A score of 85.0 indicates that Claude Fable 5 successfully completes 85% of the evaluated tasks without human intervention. This is a high score, suggesting the model can effectively use tools, browse files, manipulate UI elements, and reason across long sequences. For context, many other models score below 50 on similar benchmarks. The score reflects both the model's large context (allowing it to remember earlier steps) and its reasoning capability. However, OSWorld-Verified is only one metric; it does not measure factuality, safety, or domain-specific knowledge. Users should evaluate performance on their own tasks. The benchmark was performed by Anthropic using an independent evaluation framework.
Exact latency figures for Claude Fable 5 are not publicly disclosed by Anthropic, but general expectations can be inferred. For a typical short prompt (e.g., a few hundred tokens) the model may respond in seconds. As context length approaches the 1M limit, latency rises significantly, potentially exceeding a minute for processing and generation combined. Output length also affects speed; generating 128K tokens can take substantial time. OrcaRouter's API supports streaming responses, so users can process partial results as they arrive. For real-time applications, consider using a smaller model. The high output price ($50 per MTok) further incentivizes keeping responses short. Caching is not applied automatically; the pricing table lists separate cache read and write rates, so check current documentation for how caching can be enabled for repeat requests.
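To act on partial results as they arrive, a client can read the text deltas out of each streamed chunk. The sketch below assumes OrcaRouter's stream follows the OpenAI chat-completions chunk shape (`choices[0].delta.content`), which the document's OpenAI-compatibility claim suggests but does not spell out. `iter_text` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield text pieces from OpenAI-style streaming chunks as they arrive."""
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices (e.g. usage only)
            continue
        piece = chunk.choices[0].delta.content
        if piece:              # skip role-only or empty deltas
            yield piece


# Usage with the OpenAI SDK (requires a configured client and network access):
# stream = client.chat.completions.create(
#     model="anthropic/claude-fable-5",
#     messages=[{"role": "user", "content": "Hello"}],
#     stream=True,
# )
# for piece in iter_text(stream):
#     print(piece, end="", flush=True)
```

Keeping the chunk handling in a generator makes it easy to exercise without the network, by feeding it any objects that have the same attributes.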
Despite its strengths, Claude Fable 5 has limitations. First, it is not a multitask all-rounder; it excels at long-context and multimodal reasoning but may be outperformed by specialized models on tasks like math (e.g., GPT-4o) or code generation (e.g., Code Llama) if those models are fine-tuned. Second, the model's knowledge cutoff is not specified; assume it may not be current with very recent events. Third, like all large language models, it can hallucinate or produce inaccurate information, especially when pushed beyond its training distribution. Fourth, image understanding is powerful but not perfect—fine details in low-resolution images may be missed. Finally, the cost and latency make it impractical for high-throughput production systems. Always validate outputs critically. OrcaRouter's platform lets you switch to alternative models as needed.
The OSWorld-Verified benchmark is relatively new, so direct comparisons with many models are limited. However, known results from public leaderboards indicate that scores above 70 are considered very strong. For instance, earlier models like GPT-4V and Claude 3 Opus have reported scores below 60 on similar agentic benchmarks (e.g., OSWorld). Claude Fable 5's 85.0 suggests it is one of the top-performing models for agentic tasks. That said, benchmark scores do not guarantee performance in every scenario; real-world tasks vary in complexity and domain. The model also likely scores highly on general language understanding benchmarks (e.g., MMLU, HellaSwag), though specific numbers are not provided. Users should conduct their own evaluation using representative samples. OrcaRouter's API allows you to test models side-by-side with identical prompts.
Pricing is straightforward: $10.00 per 1 million input tokens and $50.00 per 1 million output tokens. These are the exact provider rates from Anthropic; OrcaRouter adds zero markup. Input tokens include all text, image, and file content processed by the model. Output tokens count each generated token. For example, a request with 10,000 input tokens and 2,000 output tokens would cost $0.10 + $0.10 = $0.20. There are no hidden fees or usage minimums. Payment is billed via your OrcaRouter account. If you are a high-volume user, OrcaRouter may offer discounted tiers—contact support for details. Note that because the model is expensive, it is important to monitor token usage. OrcaRouter provides usage logs and cost breakdowns in the dashboard.
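The arithmetic in the example above can be reproduced with a few lines of Python. This is a simple sketch using the two rates quoted in this section. It ignores any cache pricing, and `request_cost` is a hypothetical helper name.

```python
INPUT_USD_PER_MTOK = 10.00   # rate quoted above
OUTPUT_USD_PER_MTOK = 50.00  # rate quoted above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the quoted per-token rates."""
    input_cost = input_tokens * INPUT_USD_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MTOK / 1_000_000
    return input_cost + output_cost


# 10,000 input tokens and 2,000 output tokens: $0.10 + $0.10
print(f"${request_cost(10_000, 2_000):.2f}")
```

Because output is priced at five times the input rate, 2,000 generated tokens cost as much as 10,000 prompt tokens in this example.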
The primary trade-off is between capability and cost. Claude Fable 5 is one of the most expensive models available due to its large context and high per-token price. For tasks that use fewer than 32K tokens, you will pay a premium without leveraging the model's key strength. Consider using a cheaper model (e.g., Claude 3 Haiku at $0.25/$1.25 per MTok) for those cases. Also, output tokens are five times more expensive than input tokens, so optimizing generation length—by using structured prompts or lower max_tokens settings—can reduce costs. Additionally, avoid including unnecessary context; each image consumes substantial tokens (often thousands). OrcaRouter's API supports a `max_tokens` parameter to limit output. For workloads with frequent repeated prompts, consider whether caching or custom fine-tuning of a smaller model could be more cost-effective.
OrcaRouter does not automatically cache prompts for Claude Fable 5. The pricing table does list separate cache read ($1.00 per 1M tokens) and cache write ($12.50 per 1M tokens) rates, so cached tokens are billed differently where caching is enabled; consult OrcaRouter's documentation or contact support for how to enable it and which requests qualify. Without caching, every request is charged fully per token. If your use case involves many identical prefixes (e.g., a shared system prompt for many user messages), you could restructure your calls to minimize repeated token costs, but without caching the model will still process the full input each time. For high-volume applications, explore fine-tuning a smaller model or using a cheaper model with a shorter context. OrcaRouter allows you to programmatically switch models to optimize cost.
No, OrcaRouter does not charge any additional fees on top of the provider rate. You pay exactly $10.00 per 1M input tokens and $50.00 per 1M output tokens. There are no per-request charges, no subscription minimums, and no hidden costs. Standard OrcaRouter account fees (if any) apply to all API usage but are separate from model-specific pricing. If you use features like custom model routing, streaming, or multi-turn conversations, they are included in the token pricing. Always check the OrcaRouter pricing page for the most up-to-date information. For enterprise customers, OrcaRouter may offer volume discounts or dedicated pricing—contact sales for a quote. In summary, the cost is transparent and predictable based on token usage.
You access Claude Fable 5 via OrcaRouter's OpenAI-compatible API. The base URL is https://api.orcarouter.ai/v1. Use the model ID "anthropic/claude-fable-5" in your requests. The API accepts standard OpenAI chat completion parameters: `messages`, `max_tokens`, `temperature`, `top_p`, `stop`, etc. For multimodal inputs, include `content` arrays with entries of type "text", "image_url", or "file_url" (depending on your SDK version). Authentication requires an API key from OrcaRouter, passed in the Authorization header as "Bearer YOUR_API_KEY". Example curl command: curl https://api.orcarouter.ai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer $ORCAROUTER_API_KEY" -d '{"model":"anthropic/claude-fable-5","messages":[{"role":"user","content":"Hello"}],"max_tokens":100}'. Streaming is supported by setting `stream: true`.
Claude Fable 5 supports the standard parameters of the OpenAI chat completions API. The most relevant are: `max_tokens` (up to the 128K output limit), `temperature` (range 0-2, recommended 0-1), `top_p` (alternative to temperature), `stop` sequences, `presence_penalty`, and `frequency_penalty`. For multimodal requests, the `content` field can contain arrays of objects with `type` and the appropriate data (e.g., `{"type":"text","text":"Describe this chart"}` and `{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,..."}}`). File uploads can be done via URL or base64-encoded data. There is also a `stream` parameter for receiving partial responses. Not all parameters are supported; for example, functions/tools are not currently available for this model, so check OrcaRouter documentation for updates. All parameters are passed in the request body.
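A multimodal `content` array of the shape described above can be assembled with the standard library. The sketch below builds one user message containing a text part and a base64 data-URL image part. `image_message` is a hypothetical helper name, and the default MIME type is an assumption.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build an OpenAI-style user message with a text part and an inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Usage: read a local file and pass the result in the `messages` list.
# with open("chart.png", "rb") as f:
#     message = image_message("Describe this chart", f.read(), "image/png")
```

Inline base64 images travel inside the request body, so large files increase payload size as well as token usage; a hosted URL may be preferable where the API accepts one.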
Migrating is straightforward because OrcaRouter's API is OpenAI-compatible. Start by obtaining an OrcaRouter API key from your account dashboard. Replace your current base URL with https://api.orcarouter.ai/v1 and change the model identifier to "anthropic/claude-fable-5". If you were previously using Anthropic's native API, note that OrcaRouter uses the OpenAI message format instead of Anthropic's format. You will need to adapt any code that structures messages. For image inputs, convert them to the OpenAI format (e.g., base64 or URL). OrcaRouter handles the underlying translation to Anthropic's API. Test with a single request before migrating production workloads. OrcaRouter may also offer additional features like rate limiting, usage analytics, and model fallbacks. Refer to OrcaRouter's documentation for details on message formatting and authentication.
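For code that currently builds requests in Anthropic's native format, the message adaptation mentioned above might look like the following sketch. It handles only plain text and base64 image blocks, moves Anthropic's top-level `system` string into a leading system message, and raises on anything else. `anthropic_to_openai` is a hypothetical helper, not part of either SDK.

```python
from typing import Optional


def anthropic_to_openai(system: Optional[str], messages: list[dict]) -> list[dict]:
    """Convert Anthropic-style messages to OpenAI chat-completions messages."""
    converted: list[dict] = []
    if system:
        # Anthropic takes the system prompt as a separate parameter;
        # the OpenAI format expects it as the first message.
        converted.append({"role": "system", "content": system})
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            converted.append({"role": message["role"], "content": content})
            continue
        parts = []
        for block in content:
            if block["type"] == "text":
                parts.append({"type": "text", "text": block["text"]})
            elif block["type"] == "image" and block["source"]["type"] == "base64":
                source = block["source"]
                url = f"data:{source['media_type']};base64,{source['data']}"
                parts.append({"type": "image_url", "image_url": {"url": url}})
            else:
                raise ValueError(f"unsupported content block: {block['type']}")
        converted.append({"role": message["role"], "content": parts})
    return converted
```

Failing loudly on unknown block types is deliberate: silently dropping content during a migration is harder to detect than an exception in a test run.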
Claude Fable 5 offers a larger context window (1M vs 200K for Claude 3 Opus) and a higher score on OSWorld-Verified (85.0 vs an estimated ~55-60 for Opus). It supports image and file inputs just like Opus. The output token limit is also higher (128K vs 4,096 for Opus). On price, Fable 5 is the cheaper of the two: $10/$50 per MTok against Claude 3 Opus's standard rate of $15/$75, roughly 33% less for both input and output. Opus therefore has no per-token cost advantage, although it remains capable for tasks that fit within 200K tokens. Fable 5's main advantages are context length and agentic performance; if you need extreme context or top agentic scores, Fable 5 is the better choice.
GPT-4o (from OpenAI) has a 128K context window, compared to Fable 5's 1M. GPT-4o supports audio input in addition to text and images, whereas Fable 5 accepts text, images, and files but not audio. The output limit for GPT-4o is 4,096 tokens, far smaller than Fable 5's 128K. On benchmarks, GPT-4o scores around 87.1 on MMLU but does not have a publicly reported OSWorld-Verified score. In terms of pricing, GPT-4o costs $5/$15 per MTok (input/output), cheaper than Fable 5's $10/$50. GPT-4o is therefore faster and cheaper for shorter tasks, while Fable 5 excels in long-context and agentic scenarios. For multimodal work that includes audio, GPT-4o is the better choice. Both are accessible via OrcaRouter, allowing you to choose per request.
Within OrcaRouter, you can choose from various Anthropic models (Claude 3.5 Sonnet, Claude 3 Haiku) and OpenAI models (GPT-4o, GPT-4o mini, GPT-4 Turbo), as well as open-source models. For most tasks, Claude 3.5 Sonnet offers a good balance of capability and cost ($3/$15 per MTok) with a 200K context. For extremely long contexts, Fable 5 is unmatched. For high-throughput classification or extraction, Claude 3 Haiku ($0.25/$1.25 per MTok) is cheap and fast. For code generation, GPT-4o or Code Llama may be specialized. OrcaRouter lets you set fallback models: if Fable 5 fails or times out, you can route to a cheaper model. Evaluate your specific requirements for context length, input modality, and benchmark performance. The best model depends on your use case, not just the highest benchmark.
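The fallback idea above can also be approximated on the client side. The sketch below is not OrcaRouter's built-in fallback configuration, which this page does not document. It tries each model ID in order and returns the first successful result. The `send` callable and the second model ID in the usage note are hypothetical.

```python
from typing import Callable, Optional, Sequence


def complete_with_fallback(
    send: Callable[[str], str], models: Sequence[str]
) -> tuple[str, str]:
    """Try each model in order; return (model_id, response_text) for the first success."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # e.g. timeouts or provider errors
            last_error = exc
    raise RuntimeError("all models failed") from last_error


# Usage (hypothetical): `send` wraps the actual API call for a fixed prompt.
# def send(model: str) -> str:
#     response = client.chat.completions.create(
#         model=model, messages=[{"role": "user", "content": "Hello"}]
#     )
#     return response.choices[0].message.content
#
# model, text = complete_with_fallback(
#     send, ["anthropic/claude-fable-5", "example/cheaper-model"]
# )
```

Passing the API call in as a function keeps the retry logic independent of any particular SDK and makes it straightforward to test with stubs.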
```python
import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-fable-5",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

| Item | Value |
| --- | --- |
| Input / 1M tokens | $10.00 |
| Output / 1M tokens | $50.00 |
| Cache read / 1M tokens | $1.00 |
| Cache write / 1M tokens | $12.50 |
| Currency | USD |