
AI became the attack surface in 2025. In 2026, we are making the defense free.
Prompt injection is now the #1 risk to LLM applications — and it cannot be patched. Today, OrcaRouter Security Research is releasing our agent Firewall and input/output Guardrails free to every user: same API key, one switch in your console, no code changes. This is the threat landscape that made it non-negotiable — and the architecture that contains it.
By OrcaRouter Security Research · June 2026
In June 2025, researchers showed how attackers could exfiltrate corporate data from Microsoft 365 Copilot without the victim doing anything wrong. The victim doesn't click a link, open an attachment, or approve a prompt. They receive an email. Their AI assistant later reads it and obeys the instructions hidden inside it. The chain, disclosed by Aim Security as EchoLeak (CVE-2025-32711), gathered sensitive context from mail, files, and chat history and smuggled it out through an auto-loading image URL. Zero clicks.
EchoLeak was not an outlier. It was a preview. A year later, we can say plainly what the public incident record now demonstrates: your AI systems are your attack surface, and most organizations cannot see the attacks against them. Today we are publishing The AI Threat Report 2026 and, alongside it, releasing the two controls we built to contain these attacks — free, at the gateway, for every OrcaRouter user.
The year the attacks went agentic — and the leaks got industrial
The 2026 incident record reads like a stress test of every assumption enterprise security was built on:
- Chat & Ask AI left roughly 300 million private chat messages from more than 25 million users exposed through a Firebase misconfiguration (404 Media; Malwarebytes, Jan 2026).
- Sears Home Services exposed 3.7 million AI chat transcripts and call recordings — names, addresses, emails — spanning 2024–2026 (ExpressVPN; Cybernews, Mar 2026).
- An attacker chained a single CVE (CVE-2026-39987 in the marimo notebook tool) into a live LLM agent that extracted cloud credentials, pulled an SSH key from AWS Secrets Manager, and exfiltrated an entire internal PostgreSQL database in under two minutes (Sysdig; The Hacker News, May 2026).
- Microsoft and Salesforce both shipped patches for AI-agent data-leak flaws. In CVE-2026-21520, a poisoned SharePoint field steered Copilot into emailing customer data to an attacker — and the data left even after a safety mechanism flagged the attack (Dark Reading).
The economics underneath these headlines have inverted in the attacker's favor. Telemetry from production LLM applications shows the average successful attack completing in 42 seconds, with 90% of successful attacks leaking sensitive data (Pillar Security). Thirteen percent of organizations have already been breached through an AI model or application, and 97% of those lacked basic AI access controls (IBM, 2025). OWASP's Q1 2026 round-up put numbers on the trend: prompt-injection attacks rose 340% year over year.
And a new loss class needs no breach at all. Denial-of-wallet — a hijacked or runaway agent that simply spends — has been observed burning $46,000 a day (Sysdig, "LLMjacking"). No data is stolen. There is only a bill.

Why your current stack can't see any of it
Traditional security assumes a boundary: trusted inside, untrusted outside, controls at the seam. Language models dissolve that boundary, because a model's input is also its programming. Every email, document, web page, and tool result an agent reads can carry instructions it will follow. There is no reliable, general mechanism by which today's models separate content to process from commands to obey.
That is why prompt injection holds the #1 position in the OWASP Top 10 for LLM Applications — and why it will not be "patched" the way a buffer overflow is patched. It is a structural property of the medium. Your web application firewall inspects the request and sees a perfectly valid API call; the attack is in the words. Your per-request checks pass every single step of a chained attack, because the damage lives in the sequence — volume, repetition, and spend against time — not in any one call.
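To make the "damage lives in the sequence" point concrete, here is a minimal sketch of a sliding-window counter sitting next to a stateless per-request check. Everything in it is an assumption for illustration: the class name, the 10-second window, the limit of 5 calls, and the `agent-1:read_file` key are hypothetical and are not OrcaRouter's implementation.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts calls per key inside a rolling time window (illustrative only)."""

    def __init__(self, window_seconds: float, max_calls: int):
        self.window_seconds = window_seconds
        self.max_calls = max_calls
        self._events: dict[str, deque] = {}

    def record(self, key: str, timestamp: float) -> bool:
        """Record one call; return True if the sequence now exceeds the limit."""
        events = self._events.setdefault(key, deque())
        events.append(timestamp)
        # Drop calls that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_calls


def request_looks_valid(tool: str, allowed_tools: set[str]) -> bool:
    """Stand-in for a per-request check: it only ever sees one call."""
    return tool in allowed_tools


# Hypothetical thresholds: more than 5 identical calls in 10 seconds is suspicious.
detector = SlidingWindowCounter(window_seconds=10.0, max_calls=5)
allowed = {"read_file"}

# Eight identical, individually valid calls, one second apart.
per_request = [request_looks_valid("read_file", allowed) for _ in range(8)]
sequence_flags = [detector.record("agent-1:read_file", float(t)) for t in range(8)]

print(per_request)     # every call passes on its own
print(sequence_flags)  # the sixth call onward is flagged
```

Each call passes the per-request check, yet the sequence trips the counter from the sixth call onward. A production detector would use learned baselines rather than a fixed threshold; the fixed limit here only keeps the sketch short.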
The conclusion is uncomfortable but clear: AI security is not a model-training problem. It is an architecture problem — and it is solvable with the same discipline enterprises already apply to every other production system.

The defense is architectural: two planes, six layers, at the gateway
Every attack above succeeds against unscoped authority and fails against scoped, policed, audited authority. Containing them requires controlling two distinct planes:
The content plane — what the model reads and writes. This is the job of Guardrails.
The action plane — what the agent does: the tools it calls, the networks it reaches, the money it spends. This is the job of the Firewall.
A defense that watches only one plane will miss the chained attacks that produce headlines, because the most damaging incidents cross both: an injection arrives as content, then cashes out as an action. OrcaRouter places six independent, auditable layers between a request and a regret:
1. Scoped identity — every agent calls through its own key carrying allowed models, an IP allow-list, a hard spend cap, and an expiry. An out-of-scope request dies before any content is read.
2. Input guardrails — injection and jailbreak rules, PII detection and masking, secret blocking, and a semantic LLM-judge that catches what regex cannot.
3. The action firewall — every tool call, MCP dispatch, and network egress is judged against ordered, default-deny policy with six verdicts: allow, audit, deny, sanitize (redact arguments and proceed), pending-approval (hold irreversible steps for a human), and cap-cost (hard-stop a run at a spend ceiling). A hijacked agent cannot reach a tool, a host, or a dollar you never listed.
4. Output guardrails — the reply is screened on the way out for unsafe output, PII, and secrets, with grounding checks. This is the layer that catches EchoLeak's exfiltration URL before it leaves.
5. Anomaly detection — behavioral baselines flag what static rules can't predict: the same call hammered in a tight window, spend spiking against a learned hour-of-week baseline, a tool-to-tool transition the workspace has never made.
6. Signed audit — every match, verdict, approval, and policy change lands in a tamper-evident trail, correlated by agent run and session, exportable as evidence.
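The action firewall in layer 3 can be pictured as an ordered rule list with a default-deny fallthrough. The sketch below is a simplified illustration, not OrcaRouter's policy engine: the `Rule` shape, the glob matching, the first-match-wins ordering, and the tool and host names are all assumptions. Only the six verdict names come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Verdict(Enum):
    ALLOW = "allow"
    AUDIT = "audit"
    DENY = "deny"
    SANITIZE = "sanitize"
    PENDING_APPROVAL = "pending-approval"
    CAP_COST = "cap-cost"


@dataclass(frozen=True)
class Rule:
    tool_pattern: str  # glob matched against the tool name (hypothetical naming)
    host_pattern: str  # glob matched against the egress host; "*" matches any
    verdict: Verdict


def evaluate(rules: list[Rule], tool: str, host: str = "") -> Verdict:
    """Return the verdict of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if fnmatchcase(tool, rule.tool_pattern) and fnmatchcase(host, rule.host_pattern):
            return rule.verdict
    return Verdict.DENY  # default-deny: unlisted tools and hosts are unreachable


# Hypothetical policy; order matters because the first match wins.
POLICY = [
    Rule("payments.refund", "*", Verdict.PENDING_APPROVAL),  # irreversible: hold for a human
    Rule("crm.*", "*", Verdict.SANITIZE),                    # redact arguments, then proceed
    Rule("http.get", "*.example.com", Verdict.ALLOW),        # egress only to listed hosts
    Rule("search.*", "*", Verdict.AUDIT),                    # permit, but record
]

print(evaluate(POLICY, "http.get", "api.example.com").value)
print(evaluate(POLICY, "http.get", "attacker.test").value)
print(evaluate(POLICY, "shell.exec").value)
```

The second and third calls fall through every rule and are denied, which is the property the list relies on: a hijacked agent cannot reach a tool or host that was never listed.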
The decisive property is placement. These controls live at the gateway, in the request path, so they bind to credentials rather than application code — enforceable across every team and framework, with no agent rewrites.
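Binding controls to credentials means the scope check can run on request metadata alone, before any content is read. The following sketch assumes a hypothetical `ScopedKey` record carrying the four properties named in layer 1 (allowed models, IP allow-list, spend cap, expiry). The field names, the order of checks, and the sample values are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class ScopedKey:
    allowed_models: frozenset   # model names this key may call
    allowed_networks: tuple     # CIDR strings forming the IP allow-list
    spend_cap_usd: float        # hard ceiling for this key
    spent_usd: float            # spend recorded so far
    expires_at: datetime


def check_scope(key, model, client_ip, estimated_cost_usd, now):
    """Return (ok, reason). Only metadata is inspected, never prompt content."""
    if now >= key.expires_at:
        return False, "key expired"
    if model not in key.allowed_models:
        return False, "model not allowed"
    addr = ip_address(client_ip)
    if not any(addr in ip_network(cidr) for cidr in key.allowed_networks):
        return False, "ip not allowed"
    if key.spent_usd + estimated_cost_usd > key.spend_cap_usd:
        return False, "spend cap exceeded"
    return True, "ok"


# Hypothetical key using documentation-reserved IP ranges.
now = datetime(2026, 6, 1, tzinfo=timezone.utc)
key = ScopedKey(
    allowed_models=frozenset({"model-a"}),
    allowed_networks=("203.0.113.0/24",),
    spend_cap_usd=100.0,
    spent_usd=99.5,
    expires_at=datetime(2026, 7, 1, tzinfo=timezone.utc),
)

print(check_scope(key, "model-a", "203.0.113.9", 0.25, now))
print(check_scope(key, "model-a", "198.51.100.7", 0.25, now))
```

Because the check needs nothing from the application, the same key-level scope applies regardless of which team or framework sends the request.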
We don't grade our own homework
Security claims are worth exactly as much as the evidence behind them, so we put ours in the open. OrcaRouter's Guardrails and Firewall ship with an evaluation harness that scores them against more than 80 open-source red-team corpora — every one cited and licensed:
- HarmBench (MIT; ICML 2024), JailbreakBench (NeurIPS 2024), and AdvBench (Zou et al., 2023) for harmful-behavior and jailbreak robustness;
- NVIDIA's garak (Apache-2.0), the open LLM vulnerability scanner, for injection and encoding attacks;
- AgentDojo (NeurIPS 2024), the agent prompt-injection benchmark the US and UK AI Safety Institutes used in joint red-teaming, to grade the action-plane firewall specifically;
- TruthfulQA and others for grounding and hallucination.
OrcaRouter itself integrates open tooling directly: OSV for dependency CVEs and Semgrep for code that transits a prompt. No black box. No "trust us."

Built for the audit that's coming
On August 2, 2026, the EU AI Act becomes fully applicable, and "show me" replaces "tell me" as the regulatory baseline. The same evidentiary instinct is spreading through SOC 2 scopes, cyber-insurance questionnaires, and procurement reviews. OrcaRouter ships 36 compliance framework packs — including OWASP LLM Top 10, NIST AI RMF, ISO/IEC 42001, EU AI Act, SOC 2, HIPAA, PCI DSS, and GDPR — that materialize controls into your workspace and generate signed evidence. One well-placed control layer produces the attestation for all of them at once.
What's launching today — and why it's free
OrcaRouter Firewall + Guardrails are now free for every user. Same API key. One switch in your console. No code to change.
We made them free deliberately. The report's data is unambiguous on this point: prohibition without a paved road produces more shadow AI, not less, and shadow AI already drives one in five breaches at a $670,000 cost premium (IBM, 2025). The remedy that works is economic as much as technical: make the governed path the easiest path. A control you have to pay extra for, integrate by hand, and justify to a budget committee is a control most teams will skip, and skipping it is exactly how organizations end up living through the incidents this report warned about in advance.
So there is nothing to integrate and nothing to buy. You attach Guardrails and a Firewall policy to the key you already use and follow the rollout that survives contact with production: observe (run in audit mode and let your real traffic write the baseline), shadow (run the real policy in would-block mode until false positives approach zero), then enforce (flip verdicts live, with human approval reserved for the genuinely irreversible). Most teams convert in weeks — and keep the controls on.
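The observe, shadow, enforce rollout can be sketched as a small mapping from policy verdict to effective behavior, plus a false-positive measure for deciding when to leave shadow mode. This is an illustrative model only: the `Mode` enum, the set of verdicts treated as blocking, and the shape of the review data are assumptions, not the console's actual settings.

```python
from enum import Enum


class Mode(Enum):
    OBSERVE = "observe"  # audit only; real traffic writes the baseline
    SHADOW = "shadow"    # real policy runs, but only records would-block
    ENFORCE = "enforce"  # verdicts take effect


# Assumption: these two verdicts are the ones that stop or hold a request.
BLOCKING = {"deny", "pending-approval"}


def apply_mode(mode: Mode, policy_verdict: str) -> dict:
    """Translate a policy verdict into what happens to the request at this stage."""
    would_block = policy_verdict in BLOCKING
    if mode is Mode.ENFORCE:
        return {"effective": policy_verdict, "would_block": would_block}
    if mode is Mode.SHADOW:
        return {"effective": "allow", "would_block": would_block}
    # Observe: nothing is blocked and no would-block signal is recorded yet.
    return {"effective": "allow", "would_block": None}


def false_positive_rate(reviewed: list[tuple[bool, bool]]) -> float:
    """Share of benign requests that shadow mode would have blocked.

    `reviewed` holds (would_block, was_malicious) pairs from human review.
    """
    benign = [would_block for would_block, malicious in reviewed if not malicious]
    return sum(benign) / len(benign) if benign else 0.0


print(apply_mode(Mode.SHADOW, "deny"))
print(apply_mode(Mode.ENFORCE, "deny"))
```

In this model, moving from shadow to enforce changes only the `effective` field, so the would-block log collected in shadow mode is a direct preview of what enforcement will do.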
The bottom line
The 2026 threat landscape is not a reason to slow AI adoption. It is the operating manual for surviving it. Every attack in this report beats unscoped authority and dies against scoped, policed, audited authority — and that property is buildable now, at the gateway, in weeks, for free.
Read the full report: The AI Threat Report 2026 · Turn it on: OrcaRouter 🐋
