DeepSeek V4 Flash

Name: DeepSeek: DeepSeek V4 Flash API
Brand: DeepSeek

deepseek/deepseek-v4-flash

Tools · JSON · Reasoning

by DeepSeek · 2026-04-24

DeepSeek V4 Flash is an efficient MoE model: 284B total / 13B active params, 1M context, optimized for fast everyday workloads.

Endpoints: /v1/chat/completions, /v1/responses
Context: 1.05M tokens
Max output: 384K tokens
Input: text
p50 TTFT: 496 ms

Input: $0.15 / 1M tokens
Output: $0.29 / 1M tokens
p50 TTFT (7d): 496 ms
p95 TTFT (7d): 1.95 s
Traffic (7d): 2439.9M tokens


What is DeepSeek V4 Flash?

DeepSeek V4 Flash is a large language model from the Chinese AI company DeepSeek. It processes text inputs only and is designed for scenarios that demand a large context window (1,048,576 tokens) and a high maximum output (384,000 tokens). The model is accessible through OrcaRouter's OpenAI-compatible API, using the model ID "deepseek/deepseek-v4-flash". It is billed at provider rates with zero markup, meaning users pay exactly what OrcaRouter pays DeepSeek.

Who should use DeepSeek V4 Flash?

Developers and organizations that work with very long documents, such as legal firms processing contracts, researchers analyzing full-length papers, or engineers debugging extensive codebases, will find the 1M-token context window useful. The 384K output limit also suits tasks like generating long-form reports or detailed step-by-step plans. Users who need a cost-effective option for extended interactions (input at $0.147/M tokens, output at $0.295/M tokens) without markup on OrcaRouter may prefer this model over alternatives with higher per-token pricing.

What input modalities does DeepSeek V4 Flash support?

DeepSeek V4 Flash accepts text input only. Images, audio, and video are not supported.

Code samples

Call from any SDK

OpenAI-compatible: keep the SDK you already use

OpenAI SDK: https://api.orcarouter.ai/v1

import os

from openai import OpenAI

# Read the API key from the environment instead of hard-coding it.
client = OpenAI(
    base_url="https://api.orcarouter.ai/v1",
    api_key=os.environ["ORCAROUTER_API_KEY"],
)

# Send a single-turn chat request to DeepSeek V4 Flash.
response = client.chat.completions.create(
    model="deepseek/deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Supported parameters

include_reasoning
logprobs
max_tokens
reasoning
response_format
stop
stream
stream_options
temperature
thinking
tool_choice
tools
top_logprobs
top_p
user_id
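
The parameter list above can be used to check a request before it is sent. The following is a minimal sketch, not OrcaRouter's own validation logic: `build_chat_body` and `SUPPORTED_PARAMS` are hypothetical names, and the assumption is that anything outside the list should be rejected client-side.

```python
# Parameter names copied from the "Supported parameters" list above.
SUPPORTED_PARAMS = {
    "include_reasoning", "logprobs", "max_tokens", "reasoning",
    "response_format", "stop", "stream", "stream_options", "temperature",
    "thinking", "tool_choice", "tools", "top_logprobs", "top_p", "user_id",
}


def build_chat_body(prompt, **params):
    """Build a JSON-serializable body for POST /v1/chat/completions.

    Hypothetical helper: raises ValueError for any parameter that is not
    in the supported list, so typos fail before a request is made.
    """
    unsupported = sorted(set(params) - SUPPORTED_PARAMS)
    if unsupported:
        raise ValueError(f"unsupported parameters: {unsupported}")
    return {
        "model": "deepseek/deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


body = build_chat_body("Summarize this contract.", temperature=0.2, max_tokens=1024)
print(sorted(body))
```

When using the OpenAI Python SDK rather than raw HTTP, fields that are not part of the SDK's own signature (for example `thinking` or `include_reasoning`) may need to be passed through the SDK's `extra_body` argument.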

Pricing

Input / 1M tokens	$0.147
Output / 1M tokens	$0.295
Cache read / 1M	$0.020
Currency	USD
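
As a rough illustration of how the list prices above translate into a per-request cost, here is a small sketch. The function name is hypothetical, and it assumes that cached input tokens are billed at the cache-read rate instead of the regular input rate; actual billing may differ.

```python
from decimal import Decimal

# List prices in USD per 1M tokens, copied from the pricing table above.
INPUT_PER_M = Decimal("0.147")
OUTPUT_PER_M = Decimal("0.295")
CACHE_READ_PER_M = Decimal("0.020")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request from token counts.

    Assumption: cached_input_tokens is the portion of input_tokens served
    from cache and billed at the cache-read rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input = input_tokens - cached_input_tokens
    total = (
        fresh_input * INPUT_PER_M
        + cached_input_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / MILLION


# A 200K-token document with a 4K-token answer.
print(f"${estimate_cost_usd(200_000, 4_000):.6f}")
```

Using `Decimal` avoids floating-point drift when many small per-request costs are summed.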


Performance

Last 7 days

p50 TTFT: 496 ms
p95 TTFT: 1.95 s
Output speed: 97.7 tok/s
Error rate: 0.16%
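
The TTFT and output-speed figures can be combined into a back-of-the-envelope estimate of total response time. This sketch assumes a constant generation rate after the first token, which real traffic will not match exactly; the function name is illustrative.

```python
# 7-day figures copied from the Performance section above.
P50_TTFT_S = 0.496
P95_TTFT_S = 1.95
OUTPUT_TOKENS_PER_S = 97.7


def estimate_latency_s(output_tokens, ttft_s=P50_TTFT_S,
                       tokens_per_s=OUTPUT_TOKENS_PER_S):
    """Approximate wall-clock seconds: time to first token plus generation time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


for n in (500, 4_000, 384_000):
    typical = estimate_latency_s(n)
    slow_start = estimate_latency_s(n, ttft_s=P95_TTFT_S)
    print(f"{n:>7} tokens: ~{typical:.1f} s (p50 TTFT), ~{slow_start:.1f} s (p95 TTFT)")
```

At the measured speed, a response that uses the full 384,000-token output limit would take over an hour to generate, so streaming is worth considering for long outputs.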

Public benchmarks

Last evaluated 2026-06-25

Benchmark	Score	Rank	Percentile
AA Coding	42.0	#51 of 106	Better than 52% of models compared
AA Intelligence	47.0	#47 of 110	Better than 57% of models compared
AA Math	50.0	#60 of 81	Better than 26% of models compared

Benchmark	Score
GPQA Diamond	41.0 index
Humanity's Last Exam	32.1
IFBench	79.2
Long-Context Recall	63.0
MMLU-Pro	57.0 index
SciCode	44.9
TerminalBench Hard	35.6
τ²-Bench	34.0 index

Source: artificialanalysis.ai
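
The "Better than N%" figures are consistent with a simple calculation from each rank: the share of compared models that rank below this one. The sketch below reproduces the three published percentages under that assumption; it is not necessarily how the page computes them.

```python
def percent_better_than(rank, total):
    """Share of compared models ranked below `rank`, as a whole percent."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


# (benchmark, rank, number of models compared) from the benchmarks above.
for name, rank, total in [
    ("AA Coding", 51, 106),
    ("AA Intelligence", 47, 110),
    ("AA Math", 60, 81),
]:
    print(f"{name}: better than {percent_better_than(rank, total)}% of models")
```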

Community buzz

What developers are saying this week

Hacker News: 50 mentions · 7d

Reddit: 73 mentions · 7d

How it compares

	DeepSeek V4 Flash	DeepSeek V4 Pro	DeepSeek V3	deepseek/deepseek-reasoner
Input $/M	$0.15	$0.44	$0.15	$0.15
Output $/M	$0.29	$0.88	$0.29	$0.29
Context	1.0M	1.0M	1.0M	1.0M
Quality	7/10	8/10	5/10	5/10

FAQ

How much does DeepSeek V4 Flash cost on OrcaRouter?

Input tokens cost $0.147 per 1 million tokens, and output tokens cost $0.295 per 1 million tokens. OrcaRouter charges exactly the provider rate with zero markup.

What is the context window of DeepSeek V4 Flash?

The context window is 1,048,576 tokens (about 1.05 million tokens). The maximum output per request is 384,000 tokens.
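
To show how the two limits interact, here is a minimal sketch that picks a safe `max_tokens` value for a given prompt size. It assumes that prompt and output tokens share the same context window, which this page does not state explicitly; the function name is hypothetical.

```python
# Limits copied from the answer above.
CONTEXT_WINDOW = 1_048_576
MAX_OUTPUT = 384_000


def max_output_budget(prompt_tokens):
    """Largest max_tokens value that fits alongside a prompt of this size.

    Assumption: input and output tokens count against one shared window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    return min(MAX_OUTPUT, remaining)


print(max_output_budget(10_000))     # short prompt: capped by the output limit
print(max_output_budget(1_000_000))  # long prompt: capped by remaining context
```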

What are the main strengths of DeepSeek V4 Flash?

Its strengths are the very large context window and high output token limit, combined with a low price point and a strong τ²-Bench score of 95.0, indicating good reasoning and tool-use capabilities.

How does DeepSeek V4 Flash compare to GPT-4 or Claude?

DeepSeek V4 Flash offers much larger context (1M vs 128k/200k) and output (384k vs ~4k) at a fraction of the cost. However, it is text-only and may have less broad general knowledge or safety tuning.

Does OrcaRouter mark up the price of DeepSeek V4 Flash?

No. OrcaRouter passes through the provider rate with zero markup. You pay $0.147 per 1M input tokens and $0.295 per 1M output tokens, exactly as charged by DeepSeek.

How do I call DeepSeek V4 Flash via the OrcaRouter API?

Use the OpenAI-compatible base URL https://api.orcarouter.ai/v1, set the model parameter to "deepseek/deepseek-v4-flash", and include your OrcaRouter API key in the Authorization header.
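
For callers who are not using an SDK, the same request can be sketched with the Python standard library. The `Bearer` scheme in the Authorization header is an assumption based on the usual OpenAI-compatible convention, and `build_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request

CHAT_URL = "https://api.orcarouter.ai/v1/chat/completions"


def build_request(prompt, api_key):
    """Build (but do not send) a chat completions request."""
    body = {
        "model": "deepseek/deepseek-v4-flash",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumed scheme: "Bearer <key>", as in other OpenAI-compatible APIs.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__" and "ORCAROUTER_API_KEY" in os.environ:
    req = build_request("Hello", os.environ["ORCAROUTER_API_KEY"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```

Separating request construction from sending keeps the helper easy to test without network access.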

What data handling policies apply to DeepSeek V4 Flash?

Data passes through OrcaRouter to DeepSeek's servers in China. Review OrcaRouter's privacy policy and DeepSeek's terms. No additional data protections are explicitly offered.

Is DeepSeek V4 Flash multimodal?

No, it only accepts text inputs. For images, audio, or video, you would need to preprocess them into text or use a different model.

What parameters can I set when using DeepSeek V4 Flash?

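The supported parameters are max_tokens, temperature, top_p, stop, stream, stream_options, response_format, tools, tool_choice, logprobs, top_logprobs, reasoning, include_reasoning, thinking, and user_id, alongside the required model and messages fields. The max_tokens value cannot exceed 384,000.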

Which use cases are best suited for DeepSeek V4 Flash?

Long-document analysis, code generation with extended reasoning, multi-turn conversations needing deep context, and tasks that produce large outputs such as detailed reports or plans.

Embed this badge

Paste into your blog post

DeepSeek: DeepSeek V4 Flash • $0.15/M in • 496 ms p50 • via OrcaRouter

HTML <a href="https://www.orcarouter.ai/models/deepseek/deepseek-v4-flash" target="_blank"> <img src="https://www.orcarouter.ai/embed/deepseek/deepseek-v4-flash.svg" alt="DeepSeek: DeepSeek V4 Flash on OrcaRouter" /> </a>

Markdown [![DeepSeek: DeepSeek V4 Flash](https://www.orcarouter.ai/embed/deepseek/deepseek-v4-flash.svg)](https://www.orcarouter.ai/models/deepseek/deepseek-v4-flash)

Model card as data

GET /api/public/models/deepseek/deepseek-v4-flash
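
A sketch of fetching this endpoint from Python follows. The host `www.orcarouter.ai` is an assumption taken from the badge URLs above, and the response is assumed to be JSON; the page does not document its schema, so the code only decodes it without reading specific fields.

```python
import json
import urllib.request

# Assumed host, taken from the embed badge URLs; only the path is documented.
BASE_URL = "https://www.orcarouter.ai"


def model_card_url(model_id):
    """Return the public model card URL for a model ID such as 'deepseek/deepseek-v4-flash'."""
    return f"{BASE_URL}/api/public/models/{model_id}"


def fetch_model_card(model_id):
    """Fetch and decode the model card, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(model_card_url(model_id), timeout=10) as resp:
        return json.load(resp)


print(model_card_url("deepseek/deepseek-v4-flash"))
```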

Machine-readable: /llms.txt, /llms-full.txt

DeepSeek V4 Flash

What is DeepSeek V4 Flash?

Who should use DeepSeek V4 Flash?

What input modalities does DeepSeek V4 Flash support?

What are the key strengths of DeepSeek V4 Flash?

When should you consider a cheaper or smaller model instead?

What best practices improve results with DeepSeek V4 Flash?

What does the τ²-Bench score of 95.0 represent?

How fast is DeepSeek V4 Flash?

What are the honest limitations of DeepSeek V4 Flash?

How is DeepSeek V4 Flash priced on OrcaRouter?

Does OrcaRouter offer caching or discounts for DeepSeek V4 Flash?

How does the cost compare to other models on OrcaRouter?

How do I call DeepSeek V4 Flash via OrcaRouter's API?

What parameters are available for DeepSeek V4 Flash?

Can I migrate an existing OpenAI application to use DeepSeek V4 Flash via OrcaRouter?

How does OrcaRouter handle data with DeepSeek V4 Flash?

How does DeepSeek V4 Flash compare to GPT-4 Turbo?

How does DeepSeek V4 Flash compare to Claude 3 Opus?

How does DeepSeek V4 Flash compare to Mistral Large?

When should I choose DeepSeek V4 Flash over other models on OrcaRouter?