Gemini 3.5 Flash

Name: Gemini 3.5 Flash API
Brand: google

google/gemini-3.5-flash

by google · 2026-05-23

Google's efficient multimodal model with a 1M-token context window, up to 65,536 output tokens, and cost-effective pricing via OrcaRouter.

Endpoints: /v1/chat/completions, /v1beta/models/{model}:generateContent

Context: 1.05M tokens

Input: text + image + video + file + audio

Output: text

p50 TTFT: 10.00 s

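import os

from openai import OpenAI

# Read the key from the environment instead of passing a literal placeholder string.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)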

Input price: $1.50 / 1M tokens

Output price: $9.00 / 1M tokens

p50 TTFT (7d): 10.00 s

p95 TTFT (7d): 10.00 s

Traffic (7d): 4.5M tokens


What is Gemini 3.5 Flash?

Gemini 3.5 Flash is a large language model developed by Google, fine-tuned for speed and efficiency. It belongs to the Gemini family and is designed to handle multimodal inputs—text, image, video, file, and audio—while delivering fast responses. The model supports a context window of 1,048,576 tokens, enabling it to process very long sequences, such as entire books, hour-long videos, or extensive code repositories. Its maximum output length of 65,536 tokens allows for lengthy generations, including full reports or extended code files. Gemini 3.5 Flash is accessed through OrcaRouter's OpenAI-compatible API, which means you can integrate it into existing applications with minimal code changes.

Who should use Gemini 3.5 Flash?

Gemini 3.5 Flash is ideal for developers and organizations that need a balance between high throughput, low latency, and cost. It is particularly suited for production environments where inference speed matters, such as real-time chatbots, content moderation pipelines, or automated customer support. The generous context window benefits users who need to analyze large datasets, long documents, or extensive conversation histories without chunking. Additionally, teams building multimodal applications—like image captioning, video summarization, or audio transcription—can leverage its native support for multiple input types. If your workload demands extremely high reasoning capability or complex mathematics, consider a more powerful, slower model instead.

What input modalities does Gemini 3.5 Flash support?

Gemini 3.5 Flash accepts five input modalities: text, image, video, file, and audio. Text inputs can be plain strings or structured messages. Images can be passed as base64-encoded data or URLs; the model can interpret visual content like charts, diagrams, or photographs. Video inputs are supported as sequences of frames or compressed video files, allowing the model to analyze motion and temporal changes. File inputs cover common formats such as PDF, DOCX, or code files; the model can extract and reason over their content. Audio inputs can be raw or compressed (e.g., MP3, WAV), enabling speech transcription and sound analysis. All modalities can be combined in a single request, making Gemini 3.5 Flash a versatile tool for multimodal tasks.
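As a rough illustration of the base64 route described above, the sketch below builds a chat message that pairs a text prompt with an inline image. It assumes OrcaRouter accepts the OpenAI-style `image_url` content part carrying a data URL; that follows from the OpenAI-compatible API but is not confirmed on this page. The helper name `image_message` is hypothetical.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> list[dict]:
    """Build a one-turn message list containing a text part and an inline image.

    Assumption: the gateway accepts OpenAI-style content parts with a data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        }
    ]
```

The returned list can be passed as `messages` to a chat completions call. To reference a hosted image instead, the same `url` field would hold an ordinary https URL.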

How is Gemini 3.5 Flash accessed through OrcaRouter?

OrcaRouter exposes Gemini 3.5 Flash via its OpenAI-compatible API. The base URL is https://api.orcarouter.ai/v1, and the specific model ID is "google/gemini-3.5-flash". You can call it using any OpenAI SDK or direct HTTP requests, simply by changing the base URL and model name. Authentication is handled through an API key provided by OrcaRouter. The API supports standard chat completions endpoints, streaming, and optional parameters such as temperature, top_p, and max_tokens. OrcaRouter adds zero markup to the provider rate, so you pay exactly $1.50 per 1M input tokens and $9.00 per 1M output tokens. No additional gateway fees are applied.
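The streaming and sampling options mentioned above could be exercised along these lines. This is a minimal sketch using the OpenAI Python SDK; the helper names `make_client` and `stream_reply` are hypothetical, and the parameter values are illustrative rather than recommended settings.

```python
import os
from typing import Iterator


def make_client():
    """Create an OpenAI SDK client pointed at OrcaRouter (requires `pip install openai`)."""
    from openai import OpenAI  # imported lazily so stream_reply can be tested offline

    return OpenAI(
        base_url="https://api.orcarouter.ai/v1",
        api_key=os.environ["ORCAROUTER_API_KEY"],
    )


def stream_reply(client, prompt: str) -> Iterator[str]:
    """Yield text fragments from a streamed chat completion."""
    stream = client.chat.completions.create(
        model="google/gemini-3.5-flash",
        messages=[{"role": "user", "content": prompt}],
        stream=True,      # receive the reply incrementally
        temperature=0.2,  # illustrative values only
        top_p=0.9,
        max_tokens=256,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage (performs a real network call):
#   for piece in stream_reply(make_client(), "Hello"):
#       print(piece, end="", flush=True)
```

Passing the client in as an argument keeps the streaming loop separate from credentials, so it can be checked against a stub before any tokens are billed.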

Code samples

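import os

from openai import OpenAI

# Point the OpenAI SDK at OrcaRouter and read the key from the environment.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

# Send a single-turn chat request to Gemini 3.5 Flash.
response = client.chat.completions.create(
    model="google/gemini-3.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)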

Pricing

Input / 1M tokens	$1.50
Output / 1M tokens	$9.00
Cache read / 1M	$0.150
Cache write / 1M	$0.083
Currency	USD
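To make the per-token arithmetic concrete, here is a small sketch that estimates the cost of one request from the listed input and output rates. It deliberately ignores the cache rates, since this page does not say how cached tokens are counted; the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Listed rates in USD per 1M tokens.
INPUT_PER_MILLION = Decimal("1.50")
OUTPUT_PER_MILLION = Decimal("9.00")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request, excluding any cache pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = Decimal(input_tokens) * INPUT_PER_MILLION + Decimal(output_tokens) * OUTPUT_PER_MILLION
    return total / MILLION
```

For example, a request with 200,000 input tokens and 4,000 output tokens works out to $0.30 + $0.036 = $0.336. Output tokens cost six times as much as input tokens, so long generations dominate the bill faster than long prompts.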

Performance

last 7 days

p50 TTFT	10.00 s
p95 TTFT	10.00 s
Output speed	10766 tok/s
Error rate	0.44%

Public benchmarks

Last evaluated 2026-06-25

AA Coding	49.0	Better than 68% of models compared
AA Intelligence	47.0	Better than 58% of models compared
AA Math	51.0	Better than 27% of models compared
GPQA Diamond	45.0 index
MMLU-Pro	59.0 index
τ²-Bench	42.0 index

Source: artificialanalysis.ai

More from google


Gemini 3.1 Pro Preview (Flagship)
google/gemini-3.1-pro-preview
$2.00 in · $12.00 out / 1M
1.05M ctx · quality 10/10

Gemini 3.1 Pro Preview Custom Tools
google/gemini-3.1-pro-preview-customtools
$4.00 in · $18.00 out / 1M
1.05M ctx · quality 10/10

Gemini 3 Flash Preview (Cheapest)
google/gemini-3-flash-preview
$0.50 in · $3.00 out / 1M
1.05M ctx · quality 9/10

FAQ

How much does Gemini 3.5 Flash cost on OrcaRouter?

Input tokens are $1.50 per 1 million tokens; output tokens are $9.00 per 1 million tokens. OrcaRouter bills at the provider rate with zero markup. There are no additional fees.

What is the context window size of Gemini 3.5 Flash?

It supports a context window of 1,048,576 tokens (about 1 million tokens). This includes both input and output tokens combined.
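Because input and output share the same window, it can help to reserve room for the reply before sending a long prompt. The sketch below shows that budget check using the 1,048,576-token window and the 65,536-token output cap stated on this page. The function names are hypothetical, and token counts are assumed to come from whatever tokenizer or usage report you already rely on.

```python
CONTEXT_WINDOW = 1_048_576  # total tokens, input and output combined
MAX_OUTPUT = 65_536         # maximum tokens per response


def max_input_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Return how many input tokens fit once `reserved_output` tokens are set aside."""
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be between 0 and {MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


def fits_in_context(input_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Check whether a prompt leaves enough room for the reserved output."""
    return 0 <= input_tokens <= max_input_tokens(reserved_output)
```

Reserving the full output cap leaves 983,040 tokens for the prompt; reserving less frees up correspondingly more input room.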

What are the main strengths of Gemini 3.5 Flash?

It is optimized for low latency, high throughput, and cost efficiency. It supports multimodal inputs (text, image, video, file, audio) and a large context window, making it ideal for real-time applications and long-document processing.

How does Gemini 3.5 Flash compare to Gemini 3.5 Pro?

Flash is faster and cheaper but has lower benchmark performance on complex reasoning and mathematical tasks. Pro is more accurate but slower and more expensive. Flash is better for high-volume, latency-sensitive applications.

How is data handled when using Gemini 3.5 Flash via OrcaRouter?

OrcaRouter acts as a proxy and does not store your data. However, Google's data handling policies apply to the underlying model. OrcaRouter recommends reviewing Google's terms for data retention and privacy.

How do I call Gemini 3.5 Flash using an OpenAI-compatible API?

Use base URL https://api.orcarouter.ai/v1, model ID "google/gemini-3.5-flash", and pass an OrcaRouter API key in the Authorization header. The API supports standard chat completions and streaming.
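For callers who prefer plain HTTP over an SDK, the request described above might look roughly like this with only the Python standard library. The `Bearer` scheme and the response shape follow the usual OpenAI-compatible convention and are assumptions here; the helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.orcarouter.ai/v1"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request."""
    body = {
        "model": "google/gemini-3.5-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: the key is sent as a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (assumes an OpenAI-style response body):
#   reply = send(build_chat_request("Hello", os.environ["ORCAROUTER_API_KEY"]))
#   print(reply["choices"][0]["message"]["content"])
```

Splitting request construction from sending makes it easy to inspect the URL, headers, and body before spending any tokens.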

What output length can Gemini 3.5 Flash generate?

It can generate up to 65,536 tokens per response. This is significantly larger than many models, allowing for long-form content, code, or extended reasoning.

Is there any discount for repeated or cached tokens?

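The pricing table lists separate cache rates: $0.150 per 1M tokens for cache reads and $0.083 per 1M tokens for cache writes, both below the standard $1.50 input rate. No volume discounts are listed; uncached tokens are billed at the standard input and output rates.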

Embed this badge

Paste into your blog post

Gemini 3.5 Flash • $1.50/M in • 10000ms p50 • via OrcaRouter

HTML <a href="https://www.orcarouter.ai/models/google/gemini-3.5-flash" target="_blank"> <img src="https://www.orcarouter.ai/embed/google/gemini-3.5-flash.svg" alt="Gemini 3.5 Flash on OrcaRouter" /> </a>

Markdown [![Gemini 3.5 Flash](https://www.orcarouter.ai/embed/google/gemini-3.5-flash.svg)](https://www.orcarouter.ai/models/google/gemini-3.5-flash)

