Gemini 3 Flash Preview

google/gemini-3-flash-preview
by Google · 2025-12-17

Google Gemini 3 Flash Preview – Multimodal model with 1M token context, 88.2 MMLU-Pro, accessible via OrcaRouter.

Context: 1.05M tokens
Input: text + image + file + audio + video
Output: text
Input price: $0.50 / 1M tokens
Output price: $3.00 / 1M tokens
p50 TTFT (7d): 3.75 s
p95 TTFT (7d): 10.00 s
Traffic (7d): 1.1M tokens

Model details

What is Google Gemini 3 Flash Preview?

Google Gemini 3 Flash Preview is a multimodal model developed by Google, optimized for speed and large-context processing. It accepts input in text, image, file, audio, and video formats, and can generate up to 65,536 tokens of output. The model has a context window of 1,048,576 tokens, allowing it to reason across very long sequences. It scores 88.2 on the MMLU-Pro benchmark, indicating strong performance across a wide range of academic and reasoning tasks. This preview version is available through OrcaRouter's OpenAI-compatible API under the model ID google/gemini-3-flash-preview.

Who is the target audience for this model?

Gemini 3 Flash Preview targets developers and organizations building applications that need fast multimodal reasoning over large contexts. It is well suited to use cases such as video analysis, long-document summarization, and real-time audio-video understanding. At $0.50 per million input tokens and $3.00 per million output tokens, it is priced within reach of both startups and enterprises. Because it is a preview, early adopters can evaluate its capabilities before a stable release. OrcaRouter provides access through OpenAI-compatible endpoints with zero markup on provider rates.

What multimodal inputs does it support?

Gemini 3 Flash Preview supports five input modalities: text, image, file, audio, and video. Text can be plain or structured; images can include photos, diagrams, and screenshots; files cover formats like PDFs and documents; audio includes speech and music; video can be processed with both visual and audio tracks. The model can combine multiple modalities in a single prompt, for example analyzing a video while also reading an attached PDF. This lets it handle complex, mixed-media tasks without separate pipelines. Input tokens are counted according to each modality's own tokenization rules.
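To make the mixed-modality prompt concrete, here is a minimal sketch of a single user message that carries both text and an inline image. It uses the content-part layout from OpenAI's Chat Completions schema (`type: "text"` and `type: "image_url"` with a base64 data URL). That OrcaRouter forwards this layout to Gemini unchanged is an assumption, and `build_image_message` is a hypothetical helper name. Audio, video, and file parts are left out because this page does not specify how they are encoded.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Return one user message with a text part and an inline image part.

    Assumption: the endpoint accepts OpenAI-style `image_url` content parts
    carrying a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


# Placeholder bytes keep the sketch runnable; a real call would read an image file.
message = build_image_message("Describe this diagram.", b"\x89PNG placeholder")
print(message["content"][0])
```

The resulting dictionary can be passed as one element of the `messages` list in a chat completion request.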

What is the preview status and how stable is it?

Gemini 3 Flash Preview is a pre-release version of Google's third-generation Flash model. As a preview, it may undergo changes in behavior, performance, and availability. Google typically updates preview models based on user feedback, and they may eventually replace preview endpoints with stable releases. While the model is functional and suitable for testing and development, production deployments should monitor for updates. OrcaRouter mirrors the provider’s endpoint, ensuring that any changes from Google are reflected promptly. The model ID google/gemini-3-flash-preview will remain consistent unless Google modifies its naming.

Code samples

import os

from openai import OpenAI

# Read the API key from the ORCAROUTER_API_KEY environment variable.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="google/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
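The SDK call above can also be expressed as a plain HTTP request, which may help when the `openai` package is not available. The sketch below only builds the request with the standard library and does not send it. The `/chat/completions` path and the `Authorization: Bearer` header follow OpenAI's conventions; that OrcaRouter uses exactly these is an assumption based on its stated OpenAI compatibility. `build_chat_request` and the `sk-placeholder` fallback are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.orcarouter.ai/v1"
MODEL_ID = "google/gemini-3-flash-preview"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions POST request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth header
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The placeholder key keeps the sketch runnable without credentials.
request = build_chat_request("Hello", os.environ.get("ORCAROUTER_API_KEY", "sk-placeholder"))
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```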

Pricing

Input / 1M tokens: $0.50
Output / 1M tokens: $3.00
Cache read / 1M tokens: $0.05
Currency: USD
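As a worked example of these rates, the sketch below estimates the cost of a single request. It assumes that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate; this page does not spell out that rule, so treat it as an assumption. `estimate_cost` is a hypothetical helper.

```python
# Rates from the pricing table, in USD per 1M tokens.
INPUT_RATE = 0.50
OUTPUT_RATE = 3.00
CACHE_READ_RATE = 0.05


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_tokens` counts input tokens served from cache,
    billed at the cache-read rate rather than the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000


# A 200,000-token prompt with a 4,000-token answer, without and with caching.
print(round(estimate_cost(200_000, 4_000), 4))
print(round(estimate_cost(200_000, 4_000, cached_tokens=150_000), 4))
```

Under these assumptions, output tokens dominate the bill only when responses are long; for long-context prompts the input side is usually the larger term.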

Performance

p50 TTFT: 3.75 s
p95 TTFT: 10.00 s
Output speed: 851 tok/s
Error rate: 0%
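These figures can be combined into a rough end-to-end latency estimate: time to first token plus the time to stream the remaining output. The sketch below assumes the output speed is constant and is measured after the first token arrives, which this page does not state; `estimate_latency` is a hypothetical helper, and the numbers are a 7-day snapshot rather than a guarantee.

```python
# Snapshot values from the performance figures on this page.
P50_TTFT_S = 3.75
P95_TTFT_S = 10.00
OUTPUT_TOKENS_PER_S = 851.0


def estimate_latency(output_tokens: int, ttft_s: float = P50_TTFT_S) -> float:
    """Estimate seconds until the full response has streamed.

    Assumption: generation proceeds at a constant OUTPUT_TOKENS_PER_S
    once the first token has been produced.
    """
    return ttft_s + output_tokens / OUTPUT_TOKENS_PER_S


# Typical and slow-case estimates for a 2,000-token answer.
print(round(estimate_latency(2_000), 2))
print(round(estimate_latency(2_000, ttft_s=P95_TTFT_S), 2))
```

For short answers the time to first token dominates, so the gap between p50 and p95 matters more than the output speed.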

Public benchmarks

AA Coding: 37.8 (better than 47% of models compared)
AA Intelligence: 35.0 (better than 35% of models compared)
AA Math: 55.7 (better than 32% of models compared)
AIME 2025: 55.7
GPQA Diamond: 81.2
Humanity's Last Exam: 14.1
IFBench: 55.1
LiveCodeBench: 79.7
Long-Context Recall: 48.0
MMLU-Pro: 88.2
SciCode: 49.9
TerminalBench Hard: 31.8
τ²-Bench: 43.3
Source: artificialanalysis.ai

FAQ

What is the cost to use Gemini 3 Flash Preview?
Pricing is $0.50 per million input tokens and $3.00 per million output tokens, billed at the provider rate with zero markup added by OrcaRouter.
What is the context window size?
The context window is 1,048,576 tokens for input and the model can generate up to 65,536 output tokens.
What are the supported input modalities?
Text, image, file, audio, and video are all accepted as input. Output is text-only.
How does it compare to Gemini 2 Flash?
Gemini 3 Flash Preview has a context window of 1,048,576 tokens (Gemini 2 Flash also reaches about 1M, so any gap depends on the variant and deployment), a higher MMLU-Pro score (88.2), and multimodal input that includes video. It is faster and more capable for complex tasks, but Gemini 2 Flash is cheaper per token.
How does OrcaRouter handle data privacy?
OrcaRouter passes your requests to Google's API. Data handling follows Google's privacy policy. OrcaRouter does not log or store your content beyond what is necessary to process the request. Review both providers' policies for details.
Can I call Gemini 3 Flash Preview using an OpenAI-compatible API?
Yes. Use OrcaRouter's API at https://api.orcarouter.ai/v1 with model ID "google/gemini-3-flash-preview". Authentication uses an OrcaRouter API key. The request and response formats follow OpenAI's Chat Completions schema.
What are the model's main strengths?
High inference speed, large 1M-token context, multimodal input (text, image, file, audio, video), strong MMLU-Pro benchmark (88.2), and low cost relative to larger models.
Is Gemini 3 Flash Preview available for production?
It is a preview version, meaning it may have changes, intermittent availability, or limited support. It is suitable for testing and development; for critical production workloads, consider using the stable release once available.
How do I estimate token usage for multimodal inputs?
Each modality has its own tokenization. Images, audio, and video are split into tokens based on resolution and duration. OrcaRouter reports token usage in the API response. You can also consult Google's documentation for detailed token counting rules.
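Since usage is reported in the API response, a small helper can pull the counts out for logging. The sketch below assumes the response follows the Chat Completions schema, where `usage` holds `prompt_tokens`, `completion_tokens`, and `total_tokens`. `summarize_usage` is a hypothetical helper and the sample numbers are made up for illustration.

```python
def summarize_usage(response: dict) -> dict:
    """Extract token counts from a Chat Completions style response dict.

    Missing fields default to 0 so the helper also works on partial responses.
    """
    usage = response.get("usage") or {}
    return {
        "input": usage.get("prompt_tokens", 0),
        "output": usage.get("completion_tokens", 0),
        "total": usage.get("total_tokens", 0),
    }


# Hand-written sample response; the numbers are illustrative only.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Hi"}}],
    "usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500},
}
print(summarize_usage(sample_response))
```

With the OpenAI Python SDK, the response object can be converted to a dictionary with `response.model_dump()` before passing it in.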
What happens if I exceed the context window?
Inputs exceeding 1,048,576 tokens will be truncated from the oldest content. The model will ignore the excess tokens. Ensure your messages fit within the limit by monitoring total tokens in your request.
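One way to stay under the limit is to trim the conversation on the client before sending it. The sketch below drops the oldest non-system messages until an estimated token total fits a budget. `rough_token_count` is a crude stand-in (about four characters per token) and not Gemini's tokenizer, `trim_to_budget` is a hypothetical helper, and the example handles plain-text message content only.

```python
CONTEXT_LIMIT = 1_048_576  # input context window stated on this page


def rough_token_count(text: str) -> int:
    """Crude heuristic: about 4 characters per token. Not Gemini's tokenizer."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[dict], budget: int = CONTEXT_LIMIT) -> list[dict]:
    """Drop the oldest non-system messages until the estimate fits the budget.

    System messages are always kept and are placed first in the result.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list[dict]) -> int:
        return sum(rough_token_count(m["content"]) for m in msgs)

    while rest and total(system + rest) > budget:
        rest.pop(0)  # discard the oldest remaining message
    return system + rest


history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "a" * 400},
    {"role": "assistant", "content": "b" * 400},
    {"role": "user", "content": "c" * 40},
]
# A tiny budget is used here so the trimming is visible.
trimmed = trim_to_budget(history, budget=120)
print([m["role"] for m in trimmed])
```

In practice the budget would also leave headroom for estimation error, since the heuristic can undercount tokens for non-English text or code.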

Embed this badge

Google: Gemini 3 Flash Preview · $0.50/M in · 3750 ms p50 · via OrcaRouter
HTML <a href="https://www.orcarouter.ai/models/google/gemini-3-flash-preview" target="_blank"> <img src="https://www.orcarouter.ai/embed/google/gemini-3-flash-preview.svg" alt="Google: Gemini 3 Flash Preview on OrcaRouter" /> </a>
Markdown [![Google: Gemini 3 Flash Preview](https://www.orcarouter.ai/embed/google/gemini-3-flash-preview.svg)](https://www.orcarouter.ai/models/google/gemini-3-flash-preview)