5 Practical Defenses for Prompt Injection in LLMs

...explained visually, with usage.

May 04, 2026

InsForge: The first backend built for AI coding agents

InsForge (open-source) solves the most frustrating bottleneck in AI-assisted development: backend configuration.

Agents can build a beautiful frontend in minutes, set up API routes, and lay out the component architecture. But the moment it needs to enable auth or configure a database, it completely falls apart.

The reason is that every backend platform today (Firebase, Supabase, AWS) was designed for humans clicking through dashboards. When agents try to interact with these platforms through MCP servers, they get fragmented context like table names without schema details or auth endpoints without security configs. So agents end up guessing, hallucinating, and generating broken code.

InsForge fixes this at the infrastructure level rather than the tooling level. It introduces a semantic layer where every backend primitive (auth, database, storage, AI features) is exposed as structured, machine-readable capabilities with metadata, constraints, and documentation baked in.

Primitives are also aware of each other, so auth knows about database permissions and storage understands access policies.

Because agents get a complete, structured context instead of inferring what’s missing, InsForge delivers:

Roughly 2x more accuracy than Supabase MCP
1.6x faster task completion
30% better token efficiency

To test this out, we built a full ChatGPT clone with auth, database, storage, and AI integration, built entirely with Claude Code using InsForge as the backend. No manual configuration was needed, not because of any magic, but because the agent could reason about the entire backend as one coherent system.

InsForge works with any AI coding agent, including Cursor, Claude Code, Windsurf, and Codex. You can use all the primitives together or just pick what you need, like database only or auth only.

It’s fully open-source under Apache 2.0.

Find GitHub repo here → (don’t forget to star it ⭐️)

Practical defenses for prompt injection

OWASP ranks prompt injection as the #1 threat to LLM applications.

Most agent deployments still rely on a single defense, i.e., the system prompt instructions telling the model to ignore suspicious input.

That’s a behavioral constraint on the model, not a structural one, and an attacker only needs to bypass it once.

There are five composable defenses that move the security boundary to actual architectural enforcement. Let’s walk through each.

To learn these defenses from full LLMOps principles with code, start here →
We published the above LLMOps course, which covers the fundamentals of AI engineering & LLMs, Building blocks of LLMs like tokenization, embeddings, attention, architectural designs and training, decoding, generation parameters, the LLM Application Lifecycle, context engineering, prompt management, defense, control, memory, temporal context, evaluation, tool use, red teaming, Adaptive LLMs, and Serving.

Label before use

The simplest defense is to wrap untrusted text in explicit tags before it enters the prompt.

A developer prompt, for instance, would sandwich email body content inside <Untrusted> and </Untrusted> delimiters.

This gives the model a structural signal that the enclosed content is data, not instructions.

Google’s Spotlight research formalized this idea and showed that even simple delimiters reduce injection success rates significantly.

Encoding untrusted input in Base64 before passing it to the model is a more aggressive variant of the same principle.

Instruction hierarchy

Not all instructions should carry equal weight.

The instruction hierarchy defense assigns explicit trust ranks, like developer/system prompts rank highest, user messages rank medium, and third-party context (tool outputs, retrieved documents) ranks lowest.

When instructions conflict, the highest-ranked source wins.

OpenAI published a paper on this approach, training models to respect a tiered instruction priority.

The core insight is that an instruction embedded in a fetched webpage should never override a constraint set by the developer.

Principle of least privilege

If your agent only needs to read emails, don’t give them tools to send, draft, or delete them.

This is borrowed directly from OS-level security, and it’s the easiest defense to implement in practice.

Most agent frameworks let you define which tools are available per session or per task.

Restricting the tool set to the minimum required capability means that even a successful injection can’t escalate beyond the agent’s permissions.

An attacker who tricks the model into calling send_email gets nothing if that tool was never mounted.

Human in the loop

For sensitive actions (sending messages, modifying data, making purchases), require explicit user approval before execution.

The agent generates a plan, the user reviews it, and only then does the system run it.

This is a runtime defense, not a model-level one. It doesn’t prevent injection from happening, but it prevents injected instructions from causing real-world harm.

The tradeoff is latency and user friction, so most implementations apply it selectively to high-risk actions rather than every tool call.

Planner-executor separation

This is the most architecturally robust defense. Instead of one LLM that both reasons over untrusted data and calls tools, you split the work across two models.

The Planner (reasoning layer) sees untrusted data but has no tool access.

It produces a structured plan of actions. The Executor (action layer) has tools but never directly consumes untrusted input. It only executes the structured plan the Planner produced.

Google DeepMind’s CaMeL framework formalized this pattern, drawing from traditional security concepts like control flow integrity and information flow control.

CaMeL practically solved the AgentDojo security benchmark by ensuring that untrusted data can never influence the program’s control flow.

The Dual LLM pattern (in 2023) proposed the same core idea, where a privileged LLM plans and a quarantined LLM handles untrusted content, with strict isolation between them.

Combining defenses

No single defense is sufficient on its own. The strongest setups layer all five to label untrusted content, enforce instruction hierarchy, restrict tools to what’s needed, require approval for sensitive actions, and separate planning from execution when the stakes justify the complexity.

To learn these defenses from full LLMOps principles with code, start here →

We published the above LLMOps course, which covers the fundamentals of AI engineering & LLMs, Building blocks of LLMs like tokenization, embeddings, attention, architectural designs and training, decoding, generation parameters, the LLM Application Lifecycle, context engineering, prompt management, defense, control, memory, temporal context, evaluation, tool use, red teaming, Adaptive LLMs, and Serving.

👉 Over to you: Which of these five defenses are you using in your agent pipelines today, and which ones have you been putting off?

The main visual of this newsletter was inspired by ByteByteGo’s post on a similar topic.

Thanks for reading!

Daily Dose of Data Science

Discussion about this post

Ready for more?