Home/Blog/Prompt Injection And Authorization
AI Security

Why Prompt Injection Is An Authorization Problem

Prompt injection influences what a model wants to do. The incident happens when the surrounding system lets that influenced model use someone else's authority to read data, call tools, or change state.

The dangerous part is borrowed authority

NIST defines prompt injection as an attack that exploits untrusted input being combined with a prompt from a higher-trust party. Its work on agent hijacking shows the operational consequence: malicious instructions inside data can cause an agent to take unintended actions. A 2026 NIST security presentation goes one step further and describes indirect prompt injection as usually involving a confused-deputy issue.

The deputy is the agent. It can read the user's inbox, query internal data, call an MCP tool, or approve a workflow because the system gave it credentials. The attacker does not need those credentials directly. They only need to influence the deputy that already has them.

Attack path versus least-privilege path
Broad authority: successful confused deputy
  1. User asks agent to summarize an external document.
  2. Document contains hidden instructions to export secrets.
  3. Agent treats untrusted data as a new command.
  4. Tool accepts the agent's broad credential.
  5. Secrets leave the system under legitimate identity.
Scoped authority: injection contained
  1. User grants a read-only, task-specific capability.
  2. Document may still influence the model.
  3. Agent proposes an export outside original intent.
  4. Policy checks scope, resource, destination, and risk.
  5. Action is denied or requires explicit approval.

Prompt defenses reduce probability; authorization limits impact

Classifiers, structured prompts, content sanitization, model training, and red teaming all matter. Google describes a layered defense that also includes confirmation for risky operations. Anthropic explicitly warns that no browser agent is immune to prompt injection. That means backend design must assume a malicious instruction can occasionally reach the planning loop.

Authorization is the deterministic boundary after that probabilistic failure. Every proposed action must be checked against the original principal, current task, permitted resources, destination, data class, side-effect level, budget, and expiration. The model can suggest an action; it cannot grant itself permission.

PrincipalWhose authority is being used, and can responsibility be attributed?
IntentDoes this action still match the user's approved task?
CapabilityIs the exact tool, resource, operation, and destination allowed?
RiskDoes the action require step-up approval, dry-run, or denial?

Authorize the action, not the conversation

A user approving “help with my email” is not approving every future action an email might instruct the agent to take. Authorization cannot be inferred from conversational proximity. It must be evaluated again at the action boundary, with credentials scoped to that action and short enough to expire with the task.

The most useful design separates data-reading from action-taking. A low-privilege component can inspect untrusted documents. A privileged executor receives a structured action proposal and checks policy without inheriting the document's instructions. High-risk actions require an independent approval channel that shows the target, side effect, and data being released.

LayerControlWhat it prevents or contains
Input boundaryClassifiers, sanitization, provenance labels, instruction/data separation.Reduces the chance that hostile content influences the model.
Agent identityDedicated identity, responsible principal, short-lived task credentials.Prevents anonymous or permanently privileged actions.
Tool gatewayPer-action authorization, argument validation, destination allowlists.Blocks actions outside task intent even after model compromise.
Data boundaryRow, tenant, field, and purpose-level access controls.Limits what a hijacked agent can read or disclose.
Approval boundaryStep-up confirmation for destructive, external, or financial actions.Stops silent privilege use and makes side effects visible.
Evidence boundaryLog principal, input provenance, decision, tool call, result, and policy.Makes confused-deputy incidents detectable and reconstructable.

What I would build

I would build an authorization broker in front of every agent tool. The broker would exchange a user's broad session for a narrow capability token tied to one task, one tool, allowed resources, approved destinations, a maximum side-effect class, and a short expiration. Each tool call would be checked independently.

The visual security product would show two traces for every action: the influence trace explaining which user input, retrieved document, memory, or tool output shaped the proposal; and the authority trace explaining which identity, scope, policy, and approval allowed or denied execution.

The design principle

You may not be able to guarantee that an agent never believes a hostile instruction. You can guarantee that believing it is not enough to obtain authority. Prompt injection becomes manageable when model influence and backend permission are treated as separate systems.