Hands-on Guide to Red Teaming LLM Apps
...explained in just 10 lines of code.
MongoDB AI Learning Hub
MongoDB’s new AI learning hub is a goldmine for devs who want to build intelligent, next-gen AI apps:
It includes self-paced tracks, beginner guides, and skill badges to help you level up across key AI topics like vector search fundamentals, RAG, and much more!
For instance:
"What is an AI stack" helps you understand the ecosystem behind generative AI apps.
"Intro to Memory Management for Agents" teaches you how to manage memory in AI agents for better context and performance.
And there’s also a hands-on tutorial that guides you on how to build an AI agent using LangGraph.js and MongoDB.
You can find the full repository here →
Thanks to MongoDB for partnering today!
Hands-on guide to Red Teaming LLM apps
In August, OpenAI launched a red-teaming challenge (with a $500k prize pool) to find vulnerabilities in gpt-oss-20b.
And it's not alone: Meta hired hundreds of red teamers for Llama 2, and Google runs an internal AI red team.
The thing is, evaluating LLMs on correctness, faithfulness, or factual accuracy gives you a good model…
…but not the IDEAL model.
Because none of these metrics tells you how easily the model can be exploited into doing something it should never do.
A well-crafted prompt can make even the safest model leak PII, generate harmful content, or give away internal data. That’s why every major AI lab treats red teaming as a core part of model development.
In practice, fixing this demands testing with SOTA adversarial strategies like prompt injection, jailbreaking, response manipulation, etc.
Alongside these strategies, you need well-crafted and clever prompts that mimic real hackers.
This helps evaluate the LLM's responses for PII leakage, bias, toxic outputs, unauthorized access, and harmful content generation.
Lastly, a single-turn chatbot needs different tests than a multi-turn conversational agent.
For instance, single-turn tests focus on immediate jailbreaks, while multi-turn tests manipulate LLMs through conversational grooming and trust-building.
Setting all of this up is a lot of work…
…but all of it is already implemented in DeepTeam, a recently trending open-source framework that performs end-to-end LLM red teaming in a few lines of code.
You can use it to:
Detect 40+ vulnerabilities (like bias, misinformation, PII leakage, over-reliance on context, and harmful content generation)
Simulate 10+ attack methods (like jailbreaking, prompt injection, automated evasion, data extraction, and response manipulation)
Run red teaming without creating any dataset
Generate detailed risk assessments
…and much more.
Let’s see this in practice.
Get started by installing DeepTeam:
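The package is published on PyPI, so a standard install works (use uv add instead if you manage the project with uv, as in the run command later):

```
pip install deepteam
```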
Below, we have our LLM app we want to perform red teaming on:
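For illustration, here's a minimal, hypothetical version of such an app. DeepTeam only needs an async callback that takes a prompt string and returns your app's reply as a string; the OpenAI model and system prompt below are placeholder assumptions you'd swap for your own application logic.

```python
# Hypothetical LLM app under test: an async callback that forwards a prompt
# to an OpenAI model and returns the reply.
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def model_callback(input: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "You are a helpful customer-support assistant."},
            {"role": "user", "content": input},
        ],
    )
    return response.choices[0].message.content
```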

So we define the vulnerabilities we want to detect (Bias and Toxicity) and the attack method we want to probe them with (Prompt Injection in this case, which means bias- and toxicity-eliciting instructions are cleverly injected into the prompts):
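A minimal sketch of that setup, using DeepTeam's Bias and Toxicity vulnerabilities and its PromptInjection attack (the specific types passed in are illustrative):

```python
from deepteam.vulnerabilities import Bias, Toxicity
from deepteam.attacks.single_turn import PromptInjection

bias = Bias(types=["race"])               # vulnerability 1: biased outputs
toxicity = Toxicity(types=["profanity"])  # vulnerability 2: toxic outputs
prompt_injection = PromptInjection()      # attack method used to trigger them
```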
Bias also accepts "gender", "politics", and "religion" as types.
Toxicity accepts "profanity", "insults", "threats", and "mockery" as types.
You can specify multiple types during instantiation.
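Putting it all together (saved, say, as llm_tests.py so the run command below matches), the red_team function wires the callback, vulnerabilities, and attack into a single run:

```python
from deepteam import red_team

# Simulates prompt-injection attacks against model_callback and scores the
# responses for the bias and toxicity vulnerabilities defined above.
# NOTE: DeepTeam itself uses an LLM to simulate attacks and grade responses
# (OpenAI models by default, I believe), so OPENAI_API_KEY should be set.
risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[bias, toxicity],
    attacks=[prompt_injection],
)
```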
Done!
Running this script (uv run llm_tests.py) produces a detailed report showing the exact adversarial prompt DeepTeam generated, the LLM's response, whether the test passed, and the reason for the pass/fail.
You can also generate a summary of the risk_assessment object as follows:
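Roughly, that looks like the snippet below. The attribute names are assumptions about the risk_assessment object, so double-check the exact accessors in the DeepTeam docs:

```python
# Assumed accessors on the returned risk_assessment object (verify in the docs):
print(risk_assessment.overview)               # aggregate pass/fail stats per vulnerability
for test_case in risk_assessment.test_cases:  # individual attack prompt / response pairs
    print(test_case)
```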
Lastly, you can further assess the risk report by logging everything to your Confident AI dashboard.
You can find the full documentation here →
The framework also implements all SOTA red teaming techniques from the latest research.
Once you’ve uncovered your vulnerabilities, DeepTeam also offers guardrails to prevent issues in production.
Notably, the setup does not require any dataset because adversarial attacks are simulated dynamically at run-time based on the vulnerabilities you specify.
You can see the full implementation on GitHub and try it yourself!
But the core insight applies regardless of the framework you use:
LLM security is a red teaming problem, not a benchmarking problem.
You need to think like an attacker from day one.
You can find the DeepTeam GitHub repo here →
And you can find the full documentation here →
👉 Over to you: How do you red team your LLM app?
Thanks for reading!









