Gemini 3 Flash Preview

Name: Google: Gemini 3 Flash Preview API
Brand: Google

google/gemini-3-flash-preview

by Google · 2025-12-17

Google Gemini 3 Flash Preview – Multimodal model with 1M token context, 88.2 MMLU-Pro, accessible via OrcaRouter.

Endpoints: /v1/chat/completions, /v1beta/models/{model}:generateContent

Context: 1.05M tokens

Input: text + image + file + audio + video

Output: text

p50 TTFT: 3.75 s

Input: $0.50 / 1M tokens

Output: $3.00 / 1M tokens

p50 TTFT: 3.75 s (7d)

p95 TTFT: 10.00 s (7d)

Traffic: 1.1M tokens / 7d


What is Google Gemini 3 Flash Preview?

Google Gemini 3 Flash Preview is a multimodal model developed by Google, optimized for speed and large-context processing. It accepts input in text, image, file, audio, and video formats, and can generate up to 65,536 tokens of output. The model has a context window of 1,048,576 tokens, allowing it to reason across very long sequences. It scores 88.2 on the MMLU-Pro benchmark, indicating strong performance across a wide range of academic and reasoning tasks. This preview version is available through OrcaRouter's OpenAI-compatible API under the model ID google/gemini-3-flash-preview.

Who is the target audience for this model?

Gemini 3 Flash Preview targets developers and organizations building applications that need fast multimodal reasoning over large contexts. It is well suited to video analysis, long-document summarization, and real-time audio and video understanding. At $0.50 per million input tokens and $3.00 per million output tokens, it is priced within reach of startups and enterprises alike. Because it is a preview, early adopters can evaluate its capabilities before a stable release. OrcaRouter exposes the model through OpenAI-compatible endpoints and adds no markup to provider rates.

What multimodal inputs does it support?

Gemini 3 Flash Preview supports five input modalities: text, image, file, audio, and video. Text can be plain or structured; images include photos, diagrams, and screenshots; files cover formats such as PDFs and other documents; audio includes speech and music; video is processed with both its visual and audio tracks. Multiple modalities can be combined in a single prompt, for example analyzing a video while also reading an attached PDF. This lets the model handle mixed-media tasks without separate pipelines. Input tokens are counted according to each modality's tokenization rules.
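A minimal sketch of a combined text-and-image message, assuming OrcaRouter accepts the OpenAI Chat Completions content-part format (`type: "text"` and `type: "image_url"` with a base64 data URL). The helper name `build_image_message` is hypothetical, and this page does not say how audio, video, or file parts are encoded, so those are left out.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build one user message that pairs a text prompt with an inline image.

    Assumption: the endpoint accepts OpenAI-style content parts with a
    base64 data URL in `image_url`. This helper is illustrative only.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict can be passed as `messages=[build_image_message(...)]` to `client.chat.completions.create` with the model ID google/gemini-3-flash-preview.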

What is the preview status and how stable is it?

Gemini 3 Flash Preview is a pre-release version of Google's third-generation Flash model. As a preview, its behavior, performance, and availability may change. Google typically updates preview models based on user feedback and may eventually replace preview endpoints with stable releases. The model is functional and suitable for testing and development, but teams running it in production should watch for changes. OrcaRouter mirrors the provider’s endpoint, so changes from Google are reflected promptly. The model ID google/gemini-3-flash-preview will remain consistent unless Google modifies its naming.
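One way to guard against preview instability is to try a second model when the first call fails. The sketch below is an illustration, not an OrcaRouter feature: `complete_with_fallback` and the `send` callable are hypothetical names, and which model makes a sensible fallback is your call.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(send: Callable[[str], str], models: Sequence[str]) -> Tuple[str, str]:
    """Try each model ID in order; return (model_id, reply) from the first success.

    `send` is any callable that takes a model ID and returns the reply text,
    raising an exception on failure.
    """
    errors = []
    for model in models:
        try:
            return model, send(model)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

In practice `send` would wrap `client.chat.completions.create(model=model_id, ...)` and return `response.choices[0].message.content`, with a list such as `["google/gemini-3-flash-preview", "google/gemini-3.5-flash"]`.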

Code samples

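import os

from openai import OpenAI

# Read the key from the environment; a literal "$ORCAROUTER_API_KEY" string is not expanded by Python.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="google/gemini-3-flash-preview",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)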

Pricing

Input / 1M tokens	$0.50
Output / 1M tokens	$3.00
Cache read / 1M tokens	$0.05
Currency	USD
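The rates above translate into a per-request cost with simple arithmetic. This is a rough sketch: the function name is hypothetical, and it assumes cached tokens are the portion of the input billed at the cache-read rate instead of the input rate, which this page does not spell out.

```python
# Rates in USD per 1M tokens, copied from the pricing table.
INPUT_RATE = 0.50
OUTPUT_RATE = 3.00
CACHE_READ_RATE = 0.05


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: `cached_tokens` is the part of `input_tokens` served from
    cache, billed at the cache-read rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_RATE
        + cached_tokens * CACHE_READ_RATE
        + output_tokens * OUTPUT_RATE
    )
    return total / 1_000_000
```

For example, a 200,000-token prompt with a 2,000-token reply and no cache hits comes to about $0.106 ($0.10 for input plus $0.006 for output).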

Performance

Last 7 days

p50 TTFT	3.75 s
p95 TTFT	10.00 s
Output speed	851 tok/s
Error rate	(not shown)
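The TTFT figures here are OrcaRouter's aggregate measurements; your own latency depends on prompt size, region, and load. A minimal sketch for measuring time to first token yourself follows, assuming the endpoint supports `stream=True` in the OpenAI-compatible API. Both function names are hypothetical, and `client` is expected to be an `openai.OpenAI` instance pointed at OrcaRouter.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def time_to_first_chunk(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], str]:
    """Consume text chunks; return (seconds until first non-empty chunk, full text)."""
    start = clock()
    first_at = None
    parts = []
    for chunk in chunks:
        if chunk and first_at is None:
            first_at = clock() - start
        parts.append(chunk)
    return first_at, "".join(parts)


def stream_text(client, prompt: str, model: str = "google/gemini-3-flash-preview"):
    """Yield text deltas from a streaming chat completion.

    Assumption: `client` is an OpenAI SDK client configured for OrcaRouter
    and the endpoint supports streaming responses.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for event in stream:
        if event.choices and event.choices[0].delta.content:
            yield event.choices[0].delta.content
```

Because `stream_text` is a generator, the request is only issued once `time_to_first_chunk(stream_text(client, "Hello"))` starts iterating, so the timer covers the full round trip to the first token.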

Public benchmarks

Last evaluated 2025-12-17

Index	Score	Relative standing
AA Coding	37.8	Better than 47% of models compared
AA Intelligence	35.0	Better than 35% of models compared
AA Math	55.7	Better than 32% of models compared

Benchmark	Score
AIME 2025	55.7
GPQA Diamond	81.2
Humanity's Last Exam	14.1
IFBench	55.1
LiveCodeBench	79.7
Long-Context Recall	48.0
MMLU-Pro	88.2
SciCode	49.9
TerminalBench Hard	31.8
τ²-Bench	43.3

Source: artificialanalysis.ai

More from Google


Gemini 3.1 Pro Preview (Flagship)
google/gemini-3.1-pro-preview
$2.00 in · $12.00 out / 1M
1.05M ctx · quality 10/10

Gemini 3.1 Pro Preview Custom Tools
google/gemini-3.1-pro-preview-customtools
$4.00 in · $18.00 out / 1M
1.05M ctx · quality 10/10

Gemini 3.5 Flash (Cheapest)
google/gemini-3.5-flash
$1.50 in · $9.00 out / 1M
1.05M ctx · quality 9/10

FAQ

What is the cost to use Gemini 3 Flash Preview?

Pricing is $0.50 per million input tokens and $3.00 per million output tokens, billed at the provider rate with zero markup added by OrcaRouter.

What is the context window size?

The context window is 1,048,576 input tokens, and the model can generate up to 65,536 output tokens.

What are the supported input modalities?

Text, image, file, audio, and video are all accepted as input. Output is text-only.

How does it compare to Gemini 2 Flash?

Gemini 3 Flash Preview has a 1M-token context window (Gemini 2 Flash offers up to 1M, though some variants and deployments expose less), a higher MMLU-Pro score (88.2), and multimodal input that includes video. It is positioned as the faster and more capable option for complex tasks, while Gemini 2 Flash is cheaper per token.

How does OrcaRouter handle data privacy?

OrcaRouter passes your requests to Google's API. Data handling follows Google's privacy policy. OrcaRouter does not log or store your content beyond what is necessary to process the request. Review both providers' policies for details.

Can I call Gemini 3 Flash Preview using an OpenAI-compatible API?

Yes. Use OrcaRouter's API at https://api.orcarouter.ai/v1 with model ID "google/gemini-3-flash-preview". Authentication uses an OrcaRouter API key. The request and response formats follow OpenAI's Chat Completions schema.
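If you would rather not depend on the OpenAI SDK, the same call can be made as a plain HTTPS POST. The sketch below only builds the request; it assumes the standard OpenAI conventions of a `/chat/completions` path under the base URL and a `Bearer` token in the `Authorization` header. The function name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.orcarouter.ai/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions POST request."""
    body = {
        "model": "google/gemini-3-flash-preview",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(request)` and parsing the JSON body should give a response whose reply text sits at `choices[0].message.content`, as in the OpenAI schema.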

What are the model's main strengths?

High inference speed, large 1M-token context, multimodal input (text, image, file, audio, video), strong MMLU-Pro benchmark (88.2), and low cost relative to larger models.

Is Gemini 3 Flash Preview available for production?

It is a preview version, so its behavior may change, availability may be intermittent, and support may be limited. It is suitable for testing and development; for critical production workloads, consider using the stable release once available.

How do I estimate token usage for multimodal inputs?

Each modality has its own tokenization. Images, audio, and video are split into tokens based on resolution and duration. OrcaRouter reports token usage in the API response. You can also consult Google's documentation for detailed token counting rules.
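A small sketch of reading the reported token counts, assuming the response carries the standard OpenAI `usage` fields (`prompt_tokens`, `completion_tokens`, `total_tokens`). The helper name is hypothetical, and the stand-in response below only illustrates the shape; it is not real output.

```python
from types import SimpleNamespace


def summarize_usage(response) -> dict:
    """Pull token counts out of a Chat Completions response object.

    Assumption: the response exposes the standard OpenAI `usage` fields.
    Missing fields are reported as None rather than guessed.
    """
    usage = getattr(response, "usage", None)
    return {
        "prompt_tokens": getattr(usage, "prompt_tokens", None),
        "completion_tokens": getattr(usage, "completion_tokens", None),
        "total_tokens": getattr(usage, "total_tokens", None),
    }


# Stand-in for a real response, used here only to show the shape.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=1200, completion_tokens=300, total_tokens=1500)
)
print(summarize_usage(fake_response))
```

Logging these counts per request is the most reliable way to learn how many tokens your images, audio, and video actually consume.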

What happens if I exceed the context window?

Inputs exceeding 1,048,576 tokens will be truncated from the oldest content. The model will ignore the excess tokens. Ensure your messages fit within the limit by monitoring total tokens in your request.
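To keep control over what gets dropped, you can trim the conversation yourself before sending it. The sketch below is a rough illustration: it handles text-only messages with string content, keeps system messages, and uses a crude estimate of about 4 characters per token, which is an assumption and not Gemini's tokenizer. Both function names are hypothetical.

```python
CONTEXT_LIMIT = 1_048_576  # input context window stated on this page


def rough_token_count(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list, budget: int = CONTEXT_LIMIT) -> list:
    """Drop the oldest non-system messages until the estimated total fits the budget.

    Assumes system messages, if any, come first in the conversation.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    def total(msgs: list) -> int:
        return sum(rough_token_count(m["content"]) for m in msgs)

    while rest and total(system) + total(rest) > budget:
        rest.pop(0)  # remove the oldest remaining turn
    return system + rest
```

For accurate budgeting, replace `rough_token_count` with counts taken from the `usage` field of earlier responses or from Google's token counting rules.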

Embed this badge

Paste into your blog post

Google: Gemini 3 Flash Preview • $0.50/M in • 3750ms p50 • via OrcaRouter

HTML

<a href="https://www.orcarouter.ai/models/google/gemini-3-flash-preview" target="_blank"><img src="https://www.orcarouter.ai/embed/google/gemini-3-flash-preview.svg" alt="Google: Gemini 3 Flash Preview on OrcaRouter" /></a>

Markdown

[![Google: Gemini 3 Flash Preview](https://www.orcarouter.ai/embed/google/gemini-3-flash-preview.svg)](https://www.orcarouter.ai/models/google/gemini-3-flash-preview)

