Google Gemini 3.1 Pro Preview Custom Tools – 1M context, 95.6 τ²-Bench, multimodal via OrcaRouter.
Google Gemini 3.1 Pro Preview Custom Tools is a preview-stage large language model developed by Google. It is designed for tasks that require long-form reasoning, large context windows, and integration with external tools. The model accepts text, audio, image, video, and file inputs and generates text, so it can reason over mixed media in a single request. Through OrcaRouter, you can call the model using an OpenAI-compatible API at base URL https://api.orcarouter.ai/v1 with the model ID "google/gemini-3.1-pro-preview-customtools". This compatibility simplifies integration for teams already using the OpenAI SDK or similar clients. As a preview model, it may have more limited availability or less consistent performance than stable releases.
This model is suited for developers, data scientists, and enterprise teams who need to process very long documents (up to 1,048,576 tokens) or combine multiple input modalities (text, audio, image, video, files) in a single reasoning step. It is particularly valuable for custom tool use, where the model must decide when and how to call external functions or APIs. Teams working on research, legal analysis, media processing, or advanced automation will find the large context and strong benchmark performance useful. Because it is a preview, it is better suited to prototyping and evaluation than to production systems that require guaranteed uptime or latency.
The model offers a context window of 1,048,576 tokens and a maximum output of 65,536 tokens. Input modalities cover text, audio, image, video, and file uploads. The headline benchmark score is 95.6 on τ²-Bench, a test of tool‑use reasoning. Pricing is $4.00 per 1M input tokens and $18.00 per 1M output tokens, with zero markup when accessed through OrcaRouter. The API is OpenAI‑compatible, and the model ID is "google/gemini-3.1-pro-preview-customtools". As a preview, it reflects the latest capabilities but may be subject to change.
Gemini 3.1 Pro Preview Custom Tools accepts input in text, audio, image, video, and file formats. You can include audio recordings, photographs, video clips, and uploaded documents alongside text prompts in a single request, and the model reasons across these modalities to produce text output. For example, you can ask a question about an image, transcribe and analyze an audio recording, or pair a video with a textual instruction. Exact resolution, codec, and file-size limits are not documented here, so check them before sending large media files.
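A mixed text-and-image request might be assembled as below. This is a minimal sketch that assumes OrcaRouter passes through OpenAI's content-array message format, with an `image_url` part carrying a base64 data URL; the helper name `build_image_message` is hypothetical, and the part types for audio, video, and files are not shown because they are not documented here.

```python
import base64


def build_image_message(question: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message pairing a text question with an inline image.

    Assumes the OpenAI-style content array (`text` and `image_url` parts)
    is accepted unchanged by OrcaRouter's OpenAI-compatible endpoint.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {
                "type": "image_url",
                # Inline the image as a data URL instead of hosting it somewhere.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Usage with an OpenAI-compatible client (network call, not run here):
# with open("chart.png", "rb") as f:
#     msg = build_image_message("What trend does this chart show?", f.read())
# client.chat.completions.create(
#     model="google/gemini-3.1-pro-preview-customtools", messages=[msg]
# )
```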
The "Custom Tools" designation means the model is optimised to invoke user‑defined functions or APIs as part of its reasoning. In a typical workflow, you provide a set of function definitions (including names, parameters, and descriptions), and the model decides when to call them to fulfil a request. This capability enables autonomous workflows such as querying a database, sending an email, or executing a code snippet. The model can chain multiple tool calls together. The high τ²-Bench score (95.6) indicates strong performance on tasks that require planning and tool orchestration.
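The workflow above can be sketched in the OpenAI-compatible `tools` format: you declare each function with a JSON Schema for its parameters, and when the model responds with a tool call you run the matching local function and return the result as a `role: "tool"` message. The function `get_order_status`, its return value, and the helper `run_tool_call` are hypothetical; the sketch assumes arguments arrive as a JSON-encoded string, as they do in OpenAI's format.

```python
import json


def get_order_status(order_id: str) -> dict:
    """Hypothetical user-defined function; stands in for a real database or API lookup."""
    return {"order_id": order_id, "status": "shipped"}


# Function definitions in the OpenAI-compatible `tools` format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order identifier."}
                },
                "required": ["order_id"],
            },
        },
    }
]

# Map tool names the model may emit to local Python callables.
HANDLERS = {"get_order_status": get_order_status}


def run_tool_call(call_id: str, name: str, arguments_json: str) -> dict:
    """Execute one tool call and wrap its result as a `tool` message."""
    handler = HANDLERS.get(name)
    if handler is None:
        result = {"error": f"unknown tool: {name}"}
    else:
        # The model sends arguments as a JSON-encoded string.
        result = handler(**json.loads(arguments_json))
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(result)}
```

In a live loop you would pass `tools=TOOLS` to `client.chat.completions.create`, call `run_tool_call(tc.id, tc.function.name, tc.function.arguments)` for each entry in `response.choices[0].message.tool_calls`, append the assistant message and the tool messages to the conversation, and send it again. Because the model can chain calls, repeat until a response arrives without `tool_calls`.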
The model supports a context window of 1,048,576 tokens (2^20, usually quoted as 1M). This allows you to pass in entire books, long codebases, multi-turn conversations, or extensive logs as context. The maximum output is 65,536 tokens per request. These limits are among the largest currently available. The large context is useful for tasks like summarizing a full transcript, answering questions over a large document set, or keeping a very long conversation history without truncation.
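Before sending a very large prompt, a cheap pre-flight check can catch requests that would exceed either limit. The sketch below treats the input and output limits separately, as the listing states them, and estimates tokens with a rough four-characters-per-token heuristic; that ratio is an assumption, since the real count depends on Gemini's tokenizer. The function names are hypothetical.

```python
CONTEXT_WINDOW = 1_048_576  # maximum input tokens, per the model listing
MAX_OUTPUT = 65_536         # maximum output tokens per request
CHARS_PER_TOKEN = 4         # rough heuristic only; the real ratio depends on the tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def check_request(prompt: str, max_output: int, safety_margin: float = 0.9) -> tuple[bool, str]:
    """Return (ok, reason) for a prompt and output budget before calling the API."""
    est = estimate_tokens(prompt)
    if max_output > MAX_OUTPUT:
        return False, f"max_output {max_output} exceeds the {MAX_OUTPUT}-token limit"
    # Leave headroom because the character heuristic can undercount.
    if est > CONTEXT_WINDOW * safety_margin:
        return False, f"~{est} estimated tokens is too close to the {CONTEXT_WINDOW}-token window"
    return True, f"~{est} estimated tokens fits"
```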
At $4.00 per 1M input tokens and $18.00 per 1M output tokens, Gemini 3.1 Pro Preview Custom Tools is a premium offering. For simpler tasks such as short-form text classification, basic summarization, or single-turn chat, a smaller, cheaper model may be more cost-effective. If you do not need the 1M context window, multimodal input, or the tool-use benchmark performance, consider lighter alternatives available through OrcaRouter, such as Gemini 1.5 Flash (lower cost, lower latency), depending on what the catalog currently offers. Use this model when the task complexity justifies the higher per-token cost.
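The decision rule above can be encoded as a small router that sends a task to the premium model only when it needs one of its distinguishing capabilities. This is a sketch: `CHEAP_MODEL` is a hypothetical placeholder ID (substitute a real lower-cost model from OrcaRouter's catalog), and the 128k-token threshold is an illustrative assumption, not a published limit.

```python
from dataclasses import dataclass

PREMIUM_MODEL = "google/gemini-3.1-pro-preview-customtools"
# Hypothetical placeholder: replace with a real lower-cost model ID from OrcaRouter.
CHEAP_MODEL = "example/lightweight-model"

# Illustrative threshold: above this, assume a long-context model is required.
LONG_CONTEXT_THRESHOLD = 128_000


@dataclass
class Task:
    input_tokens: int
    needs_tools: bool = False
    has_media: bool = False  # audio, image, video, or file input


def choose_model(task: Task) -> str:
    """Pick the premium model only when the task needs long context, tools, or media."""
    if task.input_tokens > LONG_CONTEXT_THRESHOLD or task.needs_tools or task.has_media:
        return PREMIUM_MODEL
    return CHEAP_MODEL
```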
The model achieved a headline score of 95.6 on τ²-Bench. This benchmark evaluates a model's ability to perform tool-use reasoning: planning and executing sequences of function calls to accomplish a realistic task. The high score suggests strong competence in autonomous task completion and decision-making. τ²-Bench is a newer benchmark that focuses on the complexity of real-world scenarios. 95.6 is a high score, but no single benchmark captures every aspect of model quality, and the model may perform differently on benchmarks not listed here.
Based on the τ²-Bench result, the model excels at tasks requiring structured reasoning and tool orchestration. This includes multi‑step retrieval, data transformation, and API calling. The large context window also allows it to handle very long instructions or external data without losing coherence. The multimodal input capability is another strength, enabling it to reason across different media types. For use cases like analyzing a video clip and answering questions about it, or processing an audio file alongside a text query, this model is well‑positioned compared to text‑only alternatives.
No benchmark or model is perfect. The τ²-Bench score of 95.6 does not guarantee the same performance on every real-world task, especially those outside the benchmark's scope. The model may underperform on tasks requiring very specific domain knowledge or on safety-oriented evaluations not covered by τ²-Bench. As a preview model, it may have higher latency or lower reliability than a fully released model. No latency figures are published for it, so test with your own workloads. Additionally, filling the large context window increases processing time and cost, and not all tasks benefit from the full million-token capacity.
No latency figures are published for Gemini 3.1 Pro Preview Custom Tools. In general, requests that fill a large share of a very long context window take longer to process. Latency also depends on request complexity, the number of tool calls, and current server load. Streaming responses (the `stream` parameter) can reduce time to first token. For real-time applications, compare performance with smaller models, and run your own latency tests with typical prompts to determine whether the speed meets your requirements.
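A simple way to run such a latency test is to time a streamed response. The helper below is a sketch that assumes OrcaRouter streams chunks in the OpenAI shape, where new text arrives in `chunk.choices[0].delta.content`; it accepts any iterable of chunks so it can be exercised without a network call. The name `measure_stream` is hypothetical.

```python
import time
from typing import Iterable, Optional


def measure_stream(chunks: Iterable, start: Optional[float] = None) -> dict:
    """Measure time to first token and total time for a streamed completion.

    `chunks` is the iterator returned by
    client.chat.completions.create(..., stream=True). Pass `start` as a
    time.perf_counter() value taken just before that call so the request
    itself is included in the timing.
    """
    start = time.perf_counter() if start is None else start
    first_token_s = None
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some streams send chunks with no choices (e.g. usage only)
        text = chunk.choices[0].delta.content
        if text:
            if first_token_s is None:
                first_token_s = time.perf_counter() - start
            parts.append(text)
    return {
        "time_to_first_token_s": first_token_s,
        "total_s": time.perf_counter() - start,
        "text": "".join(parts),
    }


# Usage (network call, not run here):
# t0 = time.perf_counter()
# stream = client.chat.completions.create(model=..., messages=..., stream=True)
# print(measure_stream(stream, start=t0))
```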
Pricing for Gemini 3.1 Pro Preview Custom Tools is $4.00 per 1 million input tokens and $18.00 per 1 million output tokens. These rates are billed at the provider rate with zero markup when accessed through OrcaRouter. That means the price you see is the price Google charges, with no additional fee from OrcaRouter. Input tokens include all tokens in the prompt (text, image tokens, audio tokens, etc.). Output tokens are the generated response. The model's maximum output is 65,536 tokens, so a single request could cost up to 65,536 / 1,000,000 * 18.00 = approximately $1.18 in output tokens, plus input token costs.
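The same arithmetic generalizes to any request size. This sketch uses only the listed input and output rates; it ignores cache-read pricing and any separate gateway fees, and `request_cost` is a hypothetical helper name.

```python
INPUT_PER_M = 4.00    # USD per 1M input tokens
OUTPUT_PER_M = 18.00  # USD per 1M output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the listed per-token rates."""
    return input_tokens * INPUT_PER_M / 1_000_000 + output_tokens * OUTPUT_PER_M / 1_000_000


# Worst case: a full context window in and a maximum-length answer out.
worst = request_cost(1_048_576, 65_536)
print(f"${worst:.2f}")  # $5.37
```

The worst case combines about $4.19 of input with about $1.18 of output.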
"Zero markup" means OrcaRouter passes through the exact per‑token cost from the provider (Google) to you, without adding any surcharge. You pay $4.00 per 1M input tokens and $18.00 per 1M output tokens—the same rate as if you were calling Google's API directly. OrcaRouter may have separate subscription or usage fees for the gateway service, but the model's per‑token price is not inflated. This pricing structure is transparent and helps you budget accurately. Always check OrcaRouter's current terms for any additional charges.
The high per-token cost means you should estimate your usage carefully. A prompt that fills the full 1,048,576-token context window costs about $4.19 in input tokens alone (1,048,576 / 1,000,000 × $4.00). If your task can be accomplished with a smaller context, consider truncating the input or using a cheaper model. The pricing table lists a cache-read rate of $0.40 per 1M tokens, one tenth of the standard input rate, which could substantially reduce costs for repeated inputs; how caching is enabled through OrcaRouter is not documented here. Also, because the model is a preview, pricing may change when a stable version is released. Evaluate your workload's typical token count to decide whether the cost is justified.
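To see what caching could be worth, the sketch below compares repeated questions over the same large document with and without cache reads. It assumes cached prompt tokens are billed at the listed cache-read rate instead of the input rate and that every request is a cache hit, which is optimistic; any cache write or storage fees are not listed and are not modeled. `cost_with_cache` is a hypothetical helper.

```python
INPUT_PER_M = 4.00       # USD per 1M uncached input tokens
CACHE_READ_PER_M = 0.40  # USD per 1M input tokens served from cache (assumed billing model)
OUTPUT_PER_M = 18.00     # USD per 1M output tokens


def cost_with_cache(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """USD cost assuming `cached_tokens` of the prompt are billed at the cache-read rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (
        uncached * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


# 100 questions over the same 500k-token document, 1k-token answers each.
no_cache = 100 * cost_with_cache(500_000, 0, 1_000)
with_cache = 100 * cost_with_cache(500_000, 500_000, 1_000)
print(f"no cache: ${no_cache:.2f}, cached: ${with_cache:.2f}")  # no cache: $201.80, cached: $21.80
```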
You access the model through OrcaRouter's OpenAI-compatible API. Set your base URL to `https://api.orcarouter.ai/v1` and use the model ID `google/gemini-3.1-pro-preview-customtools`. The API accepts standard OpenAI-style request formats. An example using Python's `openai` library:

```python
import openai

client = openai.OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key="YOUR_ORCAROUTER_KEY",
)
response = client.chat.completions.create(
    model="google/gemini-3.1-pro-preview-customtools",
    messages=[{"role": "user", "content": "Hello"}],
)
```

You need a valid OrcaRouter API key. Authentication is via the `Authorization` header.
Since the API is OpenAI‑compatible, you can use standard parameters such as `temperature`, `top_p`, `max_tokens`, `stop`, `frequency_penalty`, `presence_penalty`, and `stream`. For multimodal requests, you can include images, audio, video, or files in the message content using the array format. For tool use, define functions in the `tools` parameter as a list of JSON objects. The model may return `tool_calls` in the response. Parameters specific to Google's own API (like `safetySettings`) may or may not be available; consult OrcaRouter's documentation for details. The exact parameter support may vary for preview models.
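One way to keep these parameters consistent across calls is to assemble them in a single helper and unpack the result into `client.chat.completions.create(**kwargs)`. This sketch clamps `max_tokens` to the listed 65,536-token ceiling and only includes optional fields when they are set, keeping requests minimal since parameter support may vary for preview models. `build_request` is a hypothetical name, and the default values are illustrative.

```python
from typing import Optional

MODEL = "google/gemini-3.1-pro-preview-customtools"
MAX_OUTPUT = 65_536  # listed per-request output ceiling


def build_request(
    messages: list,
    *,
    temperature: float = 0.2,
    max_tokens: int = 4_096,
    stop: Optional[list] = None,
    stream: bool = False,
    tools: Optional[list] = None,
) -> dict:
    """Assemble keyword arguments for client.chat.completions.create(**kwargs)."""
    kwargs = {
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
        # Clamp to the model's output ceiling rather than risk a rejected request.
        "max_tokens": min(max_tokens, MAX_OUTPUT),
        "stream": stream,
    }
    # Send optional fields only when set, keeping the request minimal.
    if stop:
        kwargs["stop"] = stop
    if tools:
        kwargs["tools"] = tools
    return kwargs
```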
Migrating from the standard OpenAI API is straightforward. With the current (1.x) Python SDK, change `base_url` to `https://api.orcarouter.ai/v1`, set the `model` parameter to `google/gemini-3.1-pro-preview-customtools`, and replace your API key with an OrcaRouter key. Code built on `client.chat.completions.create` should work with minimal changes; legacy code using `openai.ChatCompletion.create` (SDK versions before 1.0) sets `openai.api_base` instead of `base_url`. If you use tool calls, the format follows OpenAI's. However, this model uses a different tokenizer and may produce different output for the same prompt, so test thoroughly before switching.
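Because only the base URL, key, and model ID change, the switch can be reduced to configuration. The sketch below keeps both providers in one table and reads keys from environment variables; the variable name `ORCAROUTER_API_KEY` is an assumption, `gpt-4o` is just an example OpenAI model, and `client_settings` and `LLM_PROVIDER` are hypothetical names.

```python
import os

# Per-provider settings; only the base URL, key variable, and model ID differ.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_env": "OPENAI_API_KEY",
        "model": "gpt-4o",
    },
    "orcarouter": {
        "base_url": "https://api.orcarouter.ai/v1",
        "key_env": "ORCAROUTER_API_KEY",  # assumed variable name
        "model": "google/gemini-3.1-pro-preview-customtools",
    },
}


def client_settings(provider: str) -> tuple[dict, str]:
    """Return (kwargs for openai.OpenAI(...), model ID) for the chosen provider."""
    cfg = PROVIDERS[provider]
    api_key = os.environ.get(cfg["key_env"])
    if not api_key:
        raise RuntimeError(f"set {cfg['key_env']} before using provider {provider!r}")
    return {"base_url": cfg["base_url"], "api_key": api_key}, cfg["model"]


# Usage (requires the openai package and a real key):
# from openai import OpenAI
# kwargs, model = client_settings(os.environ.get("LLM_PROVIDER", "orcarouter"))
# client = OpenAI(**kwargs)
# client.chat.completions.create(model=model, messages=[{"role": "user", "content": "Hello"}])
```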
OrcaRouter uses API key authentication. Include your key in the request header as `Authorization: Bearer YOUR_ORCAROUTER_API_KEY`. You obtain a key by signing up for OrcaRouter. The key should be kept secret and not exposed in client‑side code. The exact authentication method may vary; always refer to OrcaRouter's current API documentation. Some endpoints may support additional auth methods, but the OpenAI‑compatible endpoint uses the standard bearer token pattern. Make sure your requests are sent over HTTPS.
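Without an SDK, the bearer-token pattern amounts to one HTTP header. This sketch builds (but does not send) the request with Python's standard `urllib.request`; the `/chat/completions` path is assumed from the OpenAI-compatible convention, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.orcarouter.ai/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS POST authenticated with a bearer token."""
    body = json.dumps({
        "model": "google/gemini-3.1-pro-preview-customtools",
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",  # assumed OpenAI-compatible path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it (network call; keep the key server-side, e.g. in an environment variable):
# import os
# req = build_chat_request(os.environ["ORCAROUTER_API_KEY"], "Hello")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```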
Gemini 1.5 Pro also supports a context window of at least 1M tokens and multimodal input. Gemini 3.1 Pro Preview Custom Tools scores 95.6 on τ²-Bench; no τ²-Bench score for the 1.5 series is given here, so the size of any improvement cannot be quantified. The "Custom Tools" optimisation is the key differentiator and targets tool-use tasks. Gemini 1.5 Pro is typically cheaper, so if you do not need the latest tool-use performance it may be the more cost-effective choice. Because 3.1 Pro is a preview, it may offer less stability and weaker uptime guarantees than the stable 1.5 Pro.
GPT-4o also supports multimodal input and tool use, but its context window is 128k tokens, far smaller than this model's 1,048,576. No τ²-Bench score for GPT-4o is given here, so a direct comparison on that benchmark is not possible. The larger context window makes Gemini 3.1 Pro Preview Custom Tools better suited to long-document tasks, while GPT-4o may perform better on certain language benchmarks or benefit from broader ecosystem support. Compare per-token prices for your workload, bearing in mind that this model's output rate ($18 per 1M tokens) is relatively high.
Claude 3 Opus supports a context window of 200k tokens, far less than the 1,048,576 tokens of Gemini 3.1 Pro Preview Custom Tools. No τ²-Bench score for Claude 3 Opus is given here, so direct comparisons on tool use are speculative. Claude is known for strong reasoning and instruction following. Choosing between them depends on whether you need a 1M-token context and broad multimodal input, or value Claude's particular strengths in safety, writing style, or ecosystem. If your use case requires processing very large documents or multiple media types, the Gemini model's larger context and multimodal support are advantages. Cost and availability through OrcaRouter are also factors.
```python
import os

from openai import OpenAI

# Read the key from the environment; a literal "$ORCAROUTER_API_KEY" string is not expanded.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="google/gemini-3.1-pro-preview-customtools",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

| Pricing | Rate |
| --- | --- |
| Input / 1M tokens | $4.00 |
| Output / 1M tokens | $18.00 |
| Cache read / 1M tokens | $0.40 |
| Currency | USD |