Google Gemini 3.1 Pro Preview: flagship multimodal model with 1M context window and 95.6 τ²-Bench score, accessed via OrcaRouter API.
Google Gemini 3.1 Pro Preview is a flagship model from Google, offered in preview form. It is a multimodal model capable of processing text, image, video, audio, and file inputs. The model is categorized as flagship tier, indicating that it is designed for high-demand, complex applications where performance and capacity are critical. As a preview, it may have limitations in stability or availability compared to stable releases. Access is provided through the OrcaRouter API.
This model is intended for developers and enterprises that need to handle large context windows up to 1,048,576 tokens and require multimodal understanding. Use cases include long-document analysis, video moderation, advanced chatbots with memory of entire conversations, and complex data extraction from mixed media. The preview status makes it suitable for experimentation and early integration, but production deployments should evaluate stability. It is also ideal for teams already using OrcaRouter’s OpenAI-compatible API who want to test Google’s latest flagship capabilities.
The model supports a context window of 1,048,576 tokens (input) and a maximum output of 65,536 tokens. It accepts input in multiple modalities: audio, file (e.g., PDF, code files), image, text, and video. The headline benchmark score is 95.6 on τ²-Bench, a metric that measures task completion performance. The model is classified as flagship tier by its provider, Google. It is accessed via OrcaRouter’s API at base URL https://api.orcarouter.ai/v1 with model ID "google/gemini-3.1-pro-preview".
As a preview version of Gemini 3.1 Pro, this model sits at the top of Google’s current lineup among preview releases. It offers a 1M-token context window and a 65K-token output limit. This page does not list the corresponding limits for earlier Gemini 2.0 models or Gemini 3.0 previews, so check each model’s listing before assuming a difference. The τ²-Bench score of 95.6 provides a quantitative benchmark for task-oriented performance. Compared to other preview models from Google, this one targets the most demanding use cases where both breadth of context and depth of reasoning are required.
Gemini 3.1 Pro Preview is multimodal and can process audio, files (including documents, code, and spreadsheets), images, text, and video inputs. This allows it to reason across different data types within a single conversation. For example, you can upload an image along with a text prompt asking about its contents, or analyze a video alongside a transcript. The file input modality supports structured and unstructured data, making it useful for document analysis and data extraction tasks.
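The image-plus-prompt case can be sketched as a single user message in the OpenAI chat completions content-array format. This is a minimal sketch: the helper name `image_message` is hypothetical, and it assumes OrcaRouter accepts base64 data URLs in `image_url` parts for this model, which this page does not confirm.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message containing a text part and an inline image part.

    Assumption: the router accepts OpenAI-style "image_url" parts with data URLs.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict can be placed in the `messages` list of a chat completions request.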
The model supports a context window of 1,048,576 tokens for input. This is one of the largest context windows available in a flagship model. It enables processing of very long documents, entire codebases, or hours of video transcript in a single request. When combined with the 65,536 token output limit, it allows for extensive generation of reports, summaries, or multi-step reasoning chains without needing to paginate or chunk inputs.
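Before sending a very large input, it can help to check that it plausibly fits the window. The sketch below uses the limits quoted above and a crude characters-per-token heuristic. The ratio of 4 characters per token and the helper names are assumptions, and the model’s real tokenizer will produce different counts.

```python
# Limits quoted on this page for google/gemini-3.1-pro-preview.
MAX_INPUT_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate only; the actual tokenizer will give different numbers."""
    return int(len(text) / chars_per_token) + 1


def fits_in_context(texts: list[str], safety_margin: int = 0) -> bool:
    """Return True if the estimated input tokens, plus a margin, fit the input window."""
    total = sum(rough_token_count(t) for t in texts)
    return total + safety_margin <= MAX_INPUT_TOKENS
```

Because the estimate is rough, leaving a generous `safety_margin` is safer than filling the window to the last token.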
Ideal use cases include long-document summarization, multi-turn conversational agents with memory of entire user histories, video content analysis, complex data extraction from mixed media, and agentic tasks that require high accuracy (as reflected in the τ²-Bench score). The model also excels in tasks that combine multiple input types, such as analyzing a chart in an image while reading a related text passage. For simpler tasks, a cheaper model may be more cost-effective, but the overhead of the large context is justified for sophisticated applications.
For tasks that require only short text generation, simple classification, or low-latency responses, a smaller or non-flagship model may be more appropriate. The Gemini 3.1 Pro Preview’s large context window and multimodal capacity come with higher computational cost per request. If your use case does not need the full 1M token context or output of 65K tokens, consider using a lighter model available through OrcaRouter, such as Gemini 2.0 Flash or other cost-efficient alternatives. Always evaluate the cost-performance trade-off based on your average input and output token usage.
The model achieved a score of 95.6 on τ²-Bench. τ²-Bench is a benchmark for conversational agents that must use tools and follow domain policies while interacting with a simulated user to complete a task. This page does not state which τ²-Bench domain or evaluation settings the 95.6 figure refers to. The score positions this model as a strong performer for structured decision-making and multi-step, tool-using tasks, and it serves as a quantitative indicator of the model’s capabilities relative to other large models evaluated under the same settings.
No latency figures are published here for Gemini 3.1 Pro Preview. Response times will vary with input length, requested output length, and server load, and processing very long inputs or generating large outputs will take longer than short requests. For real-time applications, consider using a faster model. OrcaRouter’s API does not provide specific latency guarantees for this preview model.
The model’s strengths, inferred from its specifications, include very large context capacity (1,048,576 tokens), high output token limit (65,536 tokens), multimodal input support, and a strong τ²-Bench score (95.6). These features make it suitable for complex tasks that require reasoning over long contexts and multiple data types. The preview status may allow early access to advanced capabilities before stable release. The flagship tier classification suggests it is designed for high-demand applications.
As a preview model, Gemini 3.1 Pro Preview may not have the same stability, availability, or support as a stable release. It could experience changes or deprecation without notice. No specific latency or throughput numbers are given, so performance under load is unknown. The benchmark score on τ²-Bench is a single metric and may not reflect performance on all tasks. Additionally, the large context window may increase cost and response time. Users should test thoroughly before production use.
Pricing for Gemini 3.1 Pro Preview is tiered by the input token count of each request, as shown in the pricing table at the end of this page. Requests with up to 200K input tokens are billed at $2.00 per 1M input tokens and $12.00 per 1M output tokens; larger requests are billed at $4.00 and $18.00 respectively. The large context window (1M tokens) and output limit (65K tokens) can lead to significant token usage per request. Whether multimodal inputs carry any surcharge is not stated here. Users should consult OrcaRouter’s pricing page for current rates.
When using Gemini 3.1 Pro Preview, the largest cost driver is token consumption. A single request that uses the full 1M token context will incur high input token costs, and it falls into the higher pricing tier because the tier is selected by each request’s input token count. Similarly, generating up to 65K output tokens will increase output costs. For use cases that do not require the full context or output, users can reduce costs by trimming inputs or setting a lower max_tokens. The pricing table lists cache read rates well below the standard input rate, so caching repeated input may reduce cost; how caching is enabled is not described here. Evaluate average usage patterns to decide if a cheaper model is more economical.
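A rough per-request estimate can be computed from the rates in the pricing table on this page. This is a sketch under one stated assumption: that the whole request is billed at the rate of the tier chosen by its input token count. The function and constant names are illustrative, and cache pricing is ignored.

```python
# USD per 1M tokens, copied from the pricing table on this page.
TIER_LIMIT = 200_000  # input tokens; requests above this use the higher tier
RATES = {
    "up_to_200k": {"input": 2.00, "output": 12.00},
    "over_200k": {"input": 4.00, "output": 18.00},
}


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (assumes tier applies to the whole request)."""
    tier = "up_to_200k" if input_tokens <= TIER_LIMIT else "over_200k"
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000
```

For example, `estimate_cost(100_000, 2_000)` evaluates to roughly 0.224 USD, while a request that fills the full 1,048,576-token window costs over 4 USD in input tokens alone under these rates.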
The pricing table lists separate cache read and cache write rates for Gemini 3.1 Pro Preview, which indicates that cached input tokens are billed differently from regular input tokens. This page does not describe how caching is enabled or which inputs qualify. Many API providers offer token caching for repeated input prefixes, which can lower costs and improve latency, so caching is most beneficial for use cases with frequently repeated instructions or system prompts. Users should check OrcaRouter’s documentation for caching details. Without a cache hit, the full input rate applies to every input token on each request.
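To get a feel for what a cached prefix could save, the sketch below compares input cost with and without cached tokens, using the cache read rate from the pricing table on this page for the ≤ 200K tier. It assumes cached tokens are billed at the cache read rate in place of the input rate and ignores the one-time cache write charge. Neither billing detail is confirmed here, and the function name is hypothetical.

```python
# USD per 1M tokens for the <= 200K tier, copied from the pricing table on this page.
INPUT_RATE = 2.00
CACHE_READ_RATE = 0.20


def input_cost(total_input_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate input cost in USD when part of the prompt is served from cache.

    Assumption: cached tokens are billed at the cache read rate and not the input rate.
    The cache write charge is not included.
    """
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    fresh_tokens = total_input_tokens - cached_tokens
    return (fresh_tokens * INPUT_RATE + cached_tokens * CACHE_READ_RATE) / 1_000_000
```

Under these assumptions, a 100,000-token prompt with an 80,000-token cached prefix costs about 0.056 USD in input instead of 0.20 USD.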
No specific price comparisons are provided. Generally, flagship models are more expensive per token than smaller models. Gemini 3.1 Pro Preview, being a flagship preview, likely has higher per-token cost than Gemini 2.0 Flash or Gemini 2.0 Pro. However, because it is a preview, pricing may be promotional or subject to change. Users should compare OrcaRouter’s listed prices for each Google model to determine the most cost-effective option for their workload.
To use Gemini 3.1 Pro Preview on OrcaRouter, make requests to the OpenAI-compatible API endpoint at https://api.orcarouter.ai/v1/chat/completions. Set the model parameter to "google/gemini-3.1-pro-preview". The API accepts standard parameters such as messages, max_tokens, temperature, and top_p. For multimodal inputs, use the content array with appropriate type (text, image_url, etc.). Example code and SDKs are available in OrcaRouter’s documentation.
You can configure maximum output tokens up to 65,536 using the max_tokens parameter. The model supports temperature, top_p, and other common sampling parameters. For multimodal input, specify the content type in the messages array. The context window of 1,048,576 tokens applies to all input tokens combined. All parameters follow the OpenAI chat completions specification. Refer to OrcaRouter’s API reference for any model-specific limitations or additional parameters.
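The parameters above can be collected into one request payload. The following is a minimal sketch: `build_request` is a hypothetical helper, and the default values are arbitrary. It only clamps `max_tokens` to the 65,536 limit quoted on this page and returns keyword arguments in the chat completions format.

```python
MODEL_ID = "google/gemini-3.1-pro-preview"
MAX_OUTPUT_TOKENS = 65_536  # output limit quoted on this page


def build_request(
    prompt: str,
    max_tokens: int = 1024,
    temperature: float = 0.7,
    top_p: float = 1.0,
) -> dict:
    """Return chat completions keyword arguments with max_tokens clamped to the limit."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": min(max_tokens, MAX_OUTPUT_TOKENS),
        "temperature": temperature,
        "top_p": top_p,
    }
```

With the OpenAI Python client, the result can be passed as `client.chat.completions.create(**build_request("Summarize this report"))`.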
Migrating to OrcaRouter is straightforward because it uses an OpenAI-compatible API. Simply change the base URL to https://api.orcarouter.ai/v1 and update the model ID to "google/gemini-3.1-pro-preview". Authentication methods (API key) are similar. If you were using a different Google model, you may need to adjust for different capabilities (e.g., context window size, multimodal handling). Test with sample requests to ensure compatibility. OrcaRouter’s documentation provides migration guides for common setups.
As a preview model, Gemini 3.1 Pro Preview may have lower rate limits and less reliability than a stable release, and it may change without notice. It is intended for testing and evaluation. If you need a stable production model, consider using a non-preview model. Response times may fluctuate with load, so monitor performance and configure a fallback model. OrcaRouter may update the model ID or deprecate preview versions; plan accordingly.
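One way to keep a fallback ready is to try a list of model IDs in order and return the first successful response. The sketch below is illustrative only: `complete_with_fallback` and the `send` callable are hypothetical names, and the caller supplies `send` as a function that performs the actual API request for a given model ID.

```python
from typing import Callable, Optional, Sequence


def complete_with_fallback(
    send: Callable[[str], str],
    models: Sequence[str],
) -> tuple[str, str]:
    """Try each model ID in order; return (model_id, response_text) for the first success."""
    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # in real code, catch the client's specific error types
            last_error = exc
    raise RuntimeError("all models failed") from last_error
```

A caller would list "google/gemini-3.1-pro-preview" first and a stable model of its choice second. Catching specific client errors in place of the broad `Exception` is usually preferable.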
Compared to earlier Google models like Gemini 2.0 Pro, this preview is listed with a 1M-token context window and a 65K-token output limit; the limits of the earlier models are not given on this page and should be checked against each model’s listing. It accepts audio, file, image, text, and video inputs. The τ²-Bench score of 95.6 is specific to this model and indicates strong task performance. However, as a preview, it may lack the stability of stable Gemini releases. The flagship tier places it above Gemini 2.0 Flash in capability and cost.
No direct benchmark comparisons are provided. The model’s 1M token context window is among the largest available, rivaling or exceeding many competitors. Its multimodal input support is broad (audio, file, image, text, video). The τ²-Bench score of 95.6 offers a point of comparison for agentic tasks, but without other models’ scores on the same benchmark, a full comparison is not possible. Users should evaluate based on their specific use case requirements.
Choose this model when your task requires the largest possible context window (up to 1M tokens) and high output generation (up to 65K tokens). It is also the best choice when you need to handle multiple input modalities – especially file and video – in a single reasoning pass. The high τ²-Bench score suggests it excels at complex agentic tasks. If you are already using OrcaRouter and want to test Google’s latest flagship capabilities, this preview is a good starting point.
Opt for an alternative if you need a stable, production-verified model (since this is a preview). If your use case has low latency requirements or small token usage, a cheaper model like Gemini 2.0 Flash or a non-Google model would be more cost-effective. Also, if your task does not require the full 1M token context or multimodal input, a smaller model may provide faster and cheaper responses. Evaluate the trade-offs between capability, cost, and reliability for your specific application.
import os

from openai import OpenAI

# Read the API key from the ORCAROUTER_API_KEY environment variable.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="google/gemini-3.1-pro-preview",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

| Tier | Input / 1M tokens | Output / 1M tokens | Cache read / 1M | Cache write / 1M |
|---|---|---|---|---|
| ≤ 200K | $2.00 | $12.00 | $0.200 | $0.375 |
| > 200K | $4.00 | $18.00 | $0.400 | $0.375 |

The tier is selected by the input token count of each request.