Google Gemini 3 Flash Preview – Multimodal model with 1M token context, 88.2 MMLU-Pro, accessible via OrcaRouter.
Google Gemini 3 Flash Preview is a multimodal model developed by Google, optimized for speed and large-context processing. It accepts input in text, image, file, audio, and video formats, and can generate up to 65,536 tokens of output. The model has a context window of 1,048,576 tokens, allowing it to reason across very long sequences. It scores 88.2 on the MMLU-Pro benchmark, indicating strong performance across a wide range of academic and reasoning tasks. This preview version is available through OrcaRouter's OpenAI-compatible API under the model ID google/gemini-3-flash-preview.
Gemini 3 Flash Preview targets developers and organizations building applications that require fast, multimodal reasoning over large contexts. It is well-suited to use cases like video analysis, long-document summarization, and real-time audio-video understanding. The model's pricing ($0.50 per million input tokens and $3.00 per million output tokens) keeps it within reach of startups and enterprises alike. Because it is a preview, early adopters can evaluate its capabilities before a stable release. OrcaRouter provides access to this model through OpenAI-compatible endpoints, with zero markup on provider rates.
Gemini 3 Flash Preview supports five input modalities: text, image, file, audio, and video. Text can be plain or structured; images can include photos, diagrams, and screenshots; files cover formats like PDFs and documents; audio includes speech and music; video can be processed with both visual and audio tracks. The model can combine multiple modalities in a single prompt—for example, analyzing a video while also reading an attached PDF. This versatility allows it to handle complex, mixed-media tasks without requiring separate pipelines. Input tokens are counted based on each modality's specific tokenizer rules.
Gemini 3 Flash Preview is a pre-release version of Google's third-generation Flash model. As a preview, it may undergo changes in behavior, performance, and availability. Google typically updates preview models based on user feedback, and they may eventually replace preview endpoints with stable releases. While the model is functional and suitable for testing and development, production deployments should monitor for updates. OrcaRouter mirrors the provider’s endpoint, ensuring that any changes from Google are reflected promptly. The model ID google/gemini-3-flash-preview will remain consistent unless Google modifies its naming.
The model can process text and images together for tasks like captioning, visual question answering, and document extraction. It can read text from scanned documents, interpret charts, and answer questions about the content. For text-only inputs, it supports language understanding, summarization, translation, and code generation. The large context window (1,048,576 tokens) allows it to handle very long conversations, full books, or extensive codebases. Its MMLU-Pro score of 88.2 suggests robust reasoning across a broad set of subjects, including science, math, and humanities.
Audio input can be direct speech or recorded audio; the model can transcribe, translate, or analyze the content. Video input combines visual frames and audio track—suitable for summarizing video content, detecting objects, or understanding scenes with spoken narration. The context window means long videos or audio files can be ingested in a single turn, as long as the token count fits within the limit. Output is text-based; the model does not generate audio or video. OrcaRouter's API supports sending audio files (e.g., MP3, WAV) and video files (e.g., MP4) as part of the message content.
The Flash variant is optimized for speed and cost, making it ideal for real-time applications: live transcription, interactive multimodal chatbots, quick document summarization, and content moderation across media types. It also excels in scenarios requiring large context, such as analyzing entire meeting transcripts or processing lengthy research papers with embedded figures. Use cases that benefit from both speed and multimodal reasoning—like video captioning or legal document review—are a strong fit. However, for tasks that require deeper reasoning on a single modality (e.g., pure code generation), a specialized model might perform better.
Gemini 3 Flash Preview is priced at $0.50/1M input and $3.00/1M output, which is low for a multimodal model but not the lowest available. If your use case is purely text-only and requires even lower latency or cost, consider dedicated text models like Gemini 2.0 Flash (if available) or similarly priced alternatives. On the other hand, if you need superior reasoning on complex benchmarks (e.g., MATH, GPQA) and have a larger budget, you might opt for a larger model like Gemini 3 Pro or GPT-4o. For high-volume, latency-sensitive, multimodal workloads, this Flash model strikes a good balance.
MMLU-Pro is a harder successor to the Massive Multitask Language Understanding (MMLU) benchmark. Where MMLU spans 57 subjects with four answer options per question, MMLU-Pro draws on 14 disciplines and uses more reasoning-heavy questions with up to ten answer options. A score of 88.2 indicates that the model correctly answered 88.2% of the questions, placing it among the top-performing models in this evaluation. It reflects strong knowledge and reasoning across diverse domains, from law to physics. This score is competitive with other frontier models, especially considering that Flash models are optimized for speed rather than maximum accuracy. It is the headline benchmark figure for this model and should be read as a general indicator of capability, not a guarantee for every specific task.
While specific latency numbers are not provided, Flash models from Google are designed for high throughput and low latency. The model is intended to be faster than larger counterparts like Gemini 3 Pro, making it suitable for real-time interactions. Users can expect lower per-request times compared to non-Flash variants, though actual speed depends on factors such as input length, output length, and concurrent usage. OrcaRouter does not introduce additional latency beyond the provider's API. For best performance, keep prompts concise and use streaming responses. The large output limit (65,536 tokens) may increase generation time for longer answers.
The MMLU-Pro score (88.2) suggests strong reasoning and general knowledge. The model's ability to handle a 1M-token context and multiple input modalities (text, image, file, audio, video) gives it an edge in multimodal tasks over models that only support text. Flash models traditionally excel at speed and cost efficiency. The high output token limit (65,536) allows generation of long-form summaries or extended analyses. These strengths make it a versatile option for applications that need to process varied data types quickly, at scale.
As a Flash preview, it may not match the accuracy of larger, non-Flash models on specialized benchmarks (e.g., coding competitions, multi-step math reasoning). The model does not generate images or audio—only text outputs. Its preview status means it could have intermittent availability or partial feature coverage. Also, while the context window is large, very long inputs will be truncated if they exceed 1,048,576 tokens. The MMLU-Pro score is a single data point; real-world performance can vary. For tasks requiring absolute precision in niche domains, validation is recommended.
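The limits mentioned above can be checked before a request is sent. The following is a minimal sketch, assuming the caller already has token counts from some tokenizer or from a previous response's usage data; `check_request_budget` is a hypothetical helper name, not part of any SDK, and the two constants are simply the figures quoted in this article.

```python
# Limits quoted in this article for google/gemini-3-flash-preview.
CONTEXT_WINDOW_TOKENS = 1_048_576
MAX_OUTPUT_TOKENS = 65_536


def check_request_budget(input_tokens: int, requested_output_tokens: int) -> dict:
    """Hypothetical helper: compare caller-supplied token counts with the limits."""
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return {
        # True when the input alone fits inside the context window.
        "input_fits": input_tokens <= CONTEXT_WINDOW_TOKENS,
        # Number of tokens by which the input exceeds the window (0 if it fits).
        "input_overflow": max(0, input_tokens - CONTEXT_WINDOW_TOKENS),
        # Requested output length clamped to the documented output cap.
        "max_tokens": min(requested_output_tokens, MAX_OUTPUT_TOKENS),
    }
```

Running a check like this up front lets an application split or summarize oversized inputs deliberately, which may be preferable to discovering a truncated input after the fact.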
Pricing is $0.50 per million input tokens and $3.00 per million output tokens. These rates are provided by Google and are billed at the provider rate—OrcaRouter adds zero markup. Input tokens include all text and visual/audio tokens encoded from files, images, and video. Output tokens are only the text generated by the model. There are no additional fees for API access through OrcaRouter beyond the per-token costs. This transparent pricing allows you to estimate costs easily: for example, a 1,000-token input and 500-token output would cost roughly $0.0005 + $0.0015 = $0.002.
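The arithmetic above can be wrapped in a small function. This is a minimal sketch using only the two per-million rates stated in this section; `estimate_cost_usd` is a hypothetical helper name, and the rates should be re-checked against current pricing before use.

```python
# Per-million-token rates stated in this article (USD); subject to change.
INPUT_USD_PER_MILLION = 0.50
OUTPUT_USD_PER_MILLION = 3.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of one request in USD."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


# The worked example from the text: 1,000 input tokens and 500 output tokens.
print(f"${estimate_cost_usd(1_000, 500):.4f}")  # prints "$0.0020"
```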
At $0.50/1M input and $3.00/1M output, Gemini 3 Flash Preview is priced competitively for a multimodal model with a 1M context window. Larger models like Gemini 3 Pro or GPT-4o typically cost more per token, especially for output. Smaller text-only models may be cheaper (e.g., Gemini 2.0 Flash at $0.10/$0.40 per 1M tokens, if applicable). For multimodal workloads, this model offers a cost-effective middle ground. The zero markup from OrcaRouter ensures you pay exactly Google's rate. If your usage is high, even a small per-token difference can matter, so compare against your specific task's token profile.
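To compare against a specific token profile, a sketch like the one below may help. It uses only per-million rates quoted in this article (the GPT-4o figure appears in the GPT-4o comparison elsewhere on this page). The dictionary keys are illustrative labels rather than confirmed OrcaRouter model IDs, and all rates are assumptions to re-verify before making a decision.

```python
# (input, output) USD per 1M tokens, as quoted in this article; re-verify before use.
PRICES_USD_PER_MILLION = {
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-2.0-flash": (0.10, 0.40),
    "gpt-4o": (2.50, 10.00),
}


def compare_costs(input_tokens: int, output_tokens: int) -> dict:
    """Hypothetical helper: cost per model for one token profile, cheapest first."""
    costs = {}
    for name, (input_rate, output_rate) in PRICES_USD_PER_MILLION.items():
        costs[name] = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    # Sort by cost so the cheapest option for this profile comes first.
    return dict(sorted(costs.items(), key=lambda item: item[1]))
```

For an input-heavy profile such as `compare_costs(800_000, 4_000)`, the input rate dominates the total; for chat-style workloads with long answers, the output rate matters more.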
The per-token rates above do not include volume tiers. The pricing table for this model lists a cache read rate of $0.05 per million tokens, one tenth of the standard input rate; how and when cached reads apply is not described here, so confirm the behavior before relying on it. OrcaRouter's pricing reflects the raw per-token cost with no markup, so you are not paying extra for the gateway. For large-scale deployments, contact Google directly about potential enterprise agreements. Always check the latest pricing on OrcaRouter's pricing page or in your account dashboard, as rates are subject to change by the provider.
You use OrcaRouter's OpenAI-compatible API at the base URL https://api.orcarouter.ai/v1. The model ID is "google/gemini-3-flash-preview". Authentication is handled via an API key from OrcaRouter. For example, with curl you can send a POST request to /v1/chat/completions. The request format follows OpenAI's Chat Completions structure. You must include the model parameter set to the exact model ID. OrcaRouter handles the routing to Google's endpoint. Ensure your API key has appropriate permissions. Streaming is supported by setting stream: true in the request body.
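When `stream: true` is set, OpenAI-style endpoints return server-sent events: lines of the form `data: {...}` ending with a `data: [DONE]` sentinel. The sketch below parses such lines and assumes OrcaRouter emits the same framing as OpenAI's Chat Completions streaming, which this page implies but does not spell out. `iter_stream_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel that ends an OpenAI-style stream
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Each streamed chunk carries an incremental "delta" object.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

If you use an OpenAI SDK instead of raw HTTP, the SDK performs this parsing for you and you iterate over chunk objects directly.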
You can use standard OpenAI Chat Completions parameters: model, messages (with role: system, user, assistant), temperature, top_p, max_tokens (capped at 65,536), stop sequences, frequency_penalty, presence_penalty, logit_bias, and stream. For multimodal messages, include base64-encoded data or file IDs in the content array. The model automatically detects input modality. Note that not all OpenAI features (like function calling) may be supported—check OrcaRouter documentation. The context window of 1,048,576 tokens is applied to the total message token count. If exceeded, the oldest messages are truncated.
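As an illustration of a multimodal content array, the sketch below builds a user message that pairs a text prompt with a base64-encoded image. The content-part shape follows OpenAI's Chat Completions image input format (`image_url` with a data URL). Whether OrcaRouter accepts the same shape for audio, video, and file inputs is not confirmed here, so treat this as an assumption and check its documentation. `build_image_message` is a hypothetical helper name.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Hypothetical helper: build an OpenAI-style user message with text plus one image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Data URL: the MIME type followed by the base64 payload.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary can be placed in the `messages` list of a Chat Completions request. For real use, read the bytes from a file, for example `open("chart.png", "rb").read()`.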
If you are already using Google's Vertex AI or Gemini API, migrating requires minimal changes. Adjust your API base URL to https://api.orcarouter.ai/v1, point to the model ID "google/gemini-3-flash-preview", and replace your Google authentication with an OrcaRouter API key. The message format is similar—OrcaRouter translates between OpenAI and Google formats. For multimodal content, ensure you follow OrcaRouter's attachment guidelines (e.g., base64-encoded data with proper MIME types). Test with a small number of requests to confirm parity. OrcaRouter provides support documentation and example code for various languages.
The response structure matches OpenAI's Chat Completion format: an object with choices, usage, and id. Each choice includes a message object with role and content. Token usage is reported as prompt_tokens and completion_tokens. The finish_reason field indicates why generation stopped (stop, length). Streaming responses emit delta objects. If you're using an OpenAI SDK, you only need to change the API key and base URL. OrcaRouter's endpoint behaves like an OpenAI API, simplifying integration. Any quirks specific to Google's model (e.g., safety filters) are preserved; check the response for potential refusal messages.
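The fields described above can be pulled out of a parsed JSON response as follows. This is a minimal sketch that operates on a plain dictionary, such as the result of `json.loads` on the HTTP body, and uses only the field names listed in this section. `summarize_completion` is a hypothetical helper name.

```python
def summarize_completion(response: dict) -> dict:
    """Hypothetical helper: extract the commonly used fields from a chat completion."""
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    finish_reason = choice.get("finish_reason")
    return {
        "id": response.get("id"),
        "text": choice["message"].get("content") or "",
        "finish_reason": finish_reason,
        # "length" means generation stopped at the token limit, so the text may be cut off.
        "truncated": finish_reason == "length",
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }
```

Checking `truncated` before using the text is a cheap way to notice answers that hit `max_tokens` and may need a follow-up request.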
Gemini 3 Flash Preview is the next generation of Google's Flash line. Its 1,048,576-token context window matches or exceeds earlier Flash versions, which ranged from 32K to 1M tokens depending on the version, and it offers improved multimodal support, including video. The MMLU-Pro score of 88.2 suggests stronger reasoning than 2 Flash, although no 2 Flash score is provided here for a direct comparison. Pricing for 2 Flash is lower per token, making it more budget-friendly for simple tasks. Gemini 3 Flash Preview is the more capable option for complex multimodal reasoning, while 2 Flash remains a cost-effective alternative for text-only or simple image tasks.
GPT-4o from OpenAI also supports multimodal inputs (text, image, audio) and has a context window of 128K tokens, significantly smaller than Gemini 3 Flash Preview's 1M tokens. GPT-4o pricing varies but is generally higher per token (e.g., $2.50/1M input, $10/1M output). Gemini 3 Flash Preview's lower cost and larger context make it more suitable for long-form or high-volume multimodal tasks. However, GPT-4o may have different strengths in creative writing or code generation, and its benchmarks (e.g., MMLU) are comparable. The choice depends on context size needs and integration preferences.
Within Google's lineup, Gemini 3 Pro is a larger, more expensive model designed for maximum accuracy (higher MMLU-Pro scores). Flash is the cost- and speed-optimized variant. Gemini 2 Flash is older and cheaper, with possibly lower benchmark scores. Gemini 3 Flash Preview offers a middle ground: near-Pro-level reasoning (88.2 MMLU-Pro) at a fraction of the cost. For users who need a large context and fast responses, 3 Flash Preview is a strong fit. For premium reasoning on smaller inputs, 3 Pro may be better. For simple tasks, 2 Flash or other lightweight models could suffice.
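```python
import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

# Send a single-turn chat request to the model through OrcaRouter.
response = client.chat.completions.create(
    model="google/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```

| Pricing | Value |
| --- | --- |
| Input / 1M tokens | $0.50 |
| Output / 1M tokens | $3.00 |
| Cache read / 1M tokens | $0.05 |
| Currency | USD |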