The Web MCP is here with 5,000 Monthly Credits
Bright Data has launched a free tier of The Web MCP, the first and only MCP designed to give LLMs and autonomous agents unblocked, real-time access to the web.
Now you can /scrape, /search, /crawl, and /navigate the live web with 5,000 free monthly credits.
Built for developers and researchers working with open-source tools.
Key features:
Integrates seamlessly with your workflow, including LangChain, AutoGPT, OpenAgents, and custom stacks.
Enables agents to dynamically expand their context with live web data
All major LLMs and IDEs are supported (locally hosted, SSE, and Streamable HTTP)
No setup fees, no credit card required.
Whether you're building agentic workflows, RAG pipelines, or real-time assistants, The Web MCP is the protocol layer that connects your models to the open web.
Start building with 5,000 free monthly credits here →
Thanks to Bright Data for partnering today!
12 MCP, RAG, and Agents Cheat Sheets for AI Engineers
Here’s a recap of several visual summaries posted in the Daily Dose of Data Science newsletter.
1) Function calling & MCP for LLMs:
Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access.
Now, MCP (Model Context Protocol) is introducing a shift in how developers structure tool access and orchestration for Agents.
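To make the "traditional Function Calling" side concrete, here's a minimal, provider-agnostic sketch in Python: the model emits a JSON tool call, and our code dispatches it. The tool name, schema shape, and the faked model output are illustrative assumptions, not any specific vendor's API.

```python
import json

# A hypothetical tool the model is allowed to call.
def get_weather(city: str) -> str:
    # Stubbed response; a real tool would hit a weather API.
    return f"22°C and sunny in {city}"

# Tool registry with a JSON-schema-style description (formats vary by provider).
TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "description": "Get the current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }
}

def dispatch(model_output: str) -> str:
    """Parse the model's tool-call JSON and run the matching function."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]["fn"]
    return tool(**call["arguments"])

# Pretend the model decided to call the tool with these arguments.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(fake_model_output))  # -> "22°C and sunny in Paris"
```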
2) 4 stages of training LLMs from scratch
This visual covers the 4 stages of building LLMs from scratch that make them usable for real-world applications.
These are:
Pre-training
Instruction fine-tuning
Preference fine-tuning
Reasoning fine-tuning
3) 3 prompting techniques for reasoning in LLMs
A large part of what makes LLM apps so powerful isn't just their ability to predict the next token accurately, but their ability to reason through a problem before answering.
This visual covers three popular prompting techniques that help LLMs think more clearly before they answer.
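The techniques themselves are in the visual, but as a quick illustration, here's what one of the most common ones (chain-of-thought-style prompting) can look like; the question and wording are just a made-up example.

```python
question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Standard prompt: the model jumps straight to an answer.
standard_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: nudge the model to lay out intermediate steps
# before committing to a final answer.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

print(cot_prompt)
```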
4) Train LLMs using other LLMs
LLMs don't just learn from raw text; they also learn from each other:
Llama 4 Scout and Maverick were trained using Llama 4 Behemoth.
Gemma 2 and 3 were trained using Google's proprietary Gemini.
Distillation helps us do so, and the visual below depicts three popular techniques.
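As a rough sketch of the most common flavor (soft-label / logit distillation), the student is trained to match the teacher's softened output distribution with a KL-divergence loss. The tensor shapes and temperature below are illustrative, not any particular model's setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Dummy logits: 4 token positions over a 32k-token vocabulary.
student_logits = torch.randn(4, 32_000, requires_grad=True)
teacher_logits = torch.randn(4, 32_000)   # teacher is frozen, no grad needed

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```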
5) Supervised & Reinforcement fine-tuning in LLMs
RFT lets us transform any open-source LLM into a reasoning powerhouse without any labeled data.
This visual covers the differences between supervised fine-tuning and reinforcement fine-tuning.
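The "no labeled data" part typically comes from rule-based, verifiable rewards rather than per-token supervision. Here's a toy example of such a reward; the output format it checks for is an assumption, and real pipelines usually add task-specific checks (running unit tests, verifying a math result, etc.).

```python
import re

def format_reward(completion: str) -> float:
    """Rule-based reward: no labels needed, just a verifiable check.

    The rule here is purely structural: did the model produce a
    <think>...</think> block followed by a final 'Answer:' line?
    """
    has_reasoning = re.search(r"<think>.+?</think>", completion, re.DOTALL) is not None
    has_answer = re.search(r"Answer:\s*\S+", completion) is not None
    return float(has_reasoning and has_answer)

print(format_reward("<think>60 km in 0.75 h is 80 km/h</think>\nAnswer: 80 km/h"))  # 1.0
```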
6) Transformer vs. Mixture of Experts
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.
Experts are feed-forward networks, but smaller than the single feed-forward network in a traditional Transformer block.
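Here's a minimal PyTorch sketch of a sparse MoE layer with top-k routing; the dimensions, expert count, and top-k value are illustrative, not any particular model's configuration.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Sparse Mixture-of-Experts layer with top-k routing (illustrative sizes)."""

    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)      # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```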
7) RAG vs Agentic RAG
Naive RAG retrieves once and generates once; it cannot dynamically search for more information, and it cannot reason through complex queries.
Also, there's little adaptability. The LLM can't modify its strategy based on the problem at hand.
Agentic RAG solves this.
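A hypothetical sketch of the agentic loop: the model can keep retrieving with reformulated queries until it judges the context sufficient. The `llm` and `retrieve` functions are stand-ins, not a specific library.

```python
def retrieve(query: str) -> list[str]:
    """Stand-in retriever; a real one would query a vector store."""
    return [f"chunk about: {query}"]

def llm(prompt: str) -> str:
    """Stand-in LLM call; a real one would hit an inference API."""
    if prompt.startswith("Answer"):
        return "Rayleigh scattering (grounded in the retrieved chunks)."
    return "SUFFICIENT" if "chunk" in prompt else "rewrite: " + prompt

def agentic_rag(question: str, max_steps: int = 3) -> str:
    context: list[str] = []
    query = question
    for _ in range(max_steps):
        context += retrieve(query)
        verdict = llm(f"Question: {question}\nContext: {context}\nEnough to answer?")
        if verdict.startswith("SUFFICIENT"):
            break
        query = verdict.removeprefix("rewrite: ")   # agent reformulates the query
    return llm(f"Answer using context: {context}\nQuestion: {question}")

print(agentic_rag("Why is the sky blue?"))
```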
8) 5 Agentic AI design patterns
Agentic behaviors allow LLMs to refine their output by incorporating self-evaluation, planning, and collaboration!
This visual depicts the 5 most popular design patterns employed in building AI agents.
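As one example of these patterns, here's a hypothetical reflection loop in which the model critiques and revises its own draft; `llm` is a stubbed stand-in for any chat-model call.

```python
def llm(prompt: str) -> str:
    """Stand-in for a chat-model call; returns canned text for the sketch."""
    if prompt.startswith("Critique"):
        return "LGTM"
    return "draft text"

def reflect(task: str, max_rounds: int = 2) -> str:
    draft = llm(f"Write a response to: {task}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this draft for errors and gaps:\n{draft}")
        if critique.strip() == "LGTM":          # model is satisfied with its own output
            break
        draft = llm(f"Revise the draft using this critique:\n{critique}\nDraft:\n{draft}")
    return draft

print(reflect("Summarize what KV caching does."))
```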
9) 5 levels of Agentic AI systems
Agentic systems don't just generate text; they make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency—from simple responders to fully autonomous agents.
10) Traditional RAG vs HyDE
One critical problem with the traditional RAG system is that questions are not semantically similar to their answers. As a result, irrelevant chunks can have a higher cosine similarity to the query than the documents that actually contain the answer, so they get retrieved instead.
HyDE solves this by generating a hypothetical response first.
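A hypothetical sketch of the HyDE flow: embed a generated (possibly imperfect) answer instead of the raw question, then retrieve with that embedding. `llm`, `embed`, and `vector_search` are all stand-ins, not a specific library's API.

```python
def llm(prompt: str) -> str:
    """Stand-in generator; a real call would hit an LLM API."""
    return "Photosynthesis converts light into chemical energy stored as glucose."

def embed(text: str) -> list[float]:
    """Stand-in embedding model."""
    return [float(len(text) % 7), 0.1, 0.2]

def vector_search(query_vec: list[float], k: int = 5) -> list[str]:
    """Stand-in vector store lookup."""
    return ["chunk 1", "chunk 2"][:k]

def hyde_retrieve(question: str) -> list[str]:
    # 1) Generate a hypothetical answer; it may contain wrong facts,
    #    but it lives in the same embedding neighborhood as real answers.
    hypothetical = llm(f"Write a short passage answering: {question}")
    # 2) Embed the hypothetical answer, not the question.
    query_vec = embed(hypothetical)
    # 3) Retrieve real chunks that are close to that embedding.
    return vector_search(query_vec)

print(hyde_retrieve("How do plants make energy?"))
```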
11) RAG vs Graph RAG
Answering questions that need global context is difficult with traditional RAG since it only retrieves the top-k relevant chunks.
Graph RAG makes retrieval more robust by building a graph structure over the corpus, which captures long-range relationships between entities instead of the local text chunks that traditional RAG relies on.
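A toy sketch of the graph-retrieval idea using networkx; the entities and relations are made up, and real Graph RAG pipelines additionally extract the graph with an LLM and summarize communities of nodes.

```python
import networkx as nx

# Toy knowledge graph; real pipelines extract entities/relations with an LLM.
g = nx.Graph()
g.add_edge("Acme Corp", "Jane Doe", relation="CEO of")
g.add_edge("Jane Doe", "Project Atlas", relation="leads")
g.add_edge("Project Atlas", "Berlin office", relation="based in")

def graph_retrieve(entity: str, hops: int = 2) -> list[str]:
    """Pull facts up to `hops` edges away, giving the LLM broader context."""
    nodes = nx.single_source_shortest_path_length(g, entity, cutoff=hops)
    facts = []
    for u, v, data in g.edges(data=True):
        if u in nodes and v in nodes:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

# A question about Acme Corp can now surface the Berlin office,
# even though no single text chunk mentions both.
print(graph_retrieve("Acme Corp", hops=3))
```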
12) KV caching
KV caching is a technique used to speed up LLM inference.
In short: instead of redundantly recomputing the KV vectors of all context tokens at every decoding step, we cache them, which saves time during inference.
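Here's a minimal sketch of the idea for a single attention head during autoregressive decoding; the shapes and random projection matrices are illustrative.

```python
import torch
import torch.nn.functional as F

d = 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))   # illustrative projections

k_cache, v_cache = [], []                             # the KV cache

def decode_step(x_new):
    """x_new: (1, d) embedding of the newest token only."""
    q = x_new @ Wq
    # Compute K/V just for the new token; earlier tokens are already cached.
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = torch.cat(k_cache)                            # (seq_len, d)
    V = torch.cat(v_cache)
    attn = F.softmax(q @ K.T / d**0.5, dim=-1)        # (1, seq_len)
    return attn @ V                                   # (1, d)

# Each step costs only one K/V projection instead of recomputing the whole prefix.
for _ in range(5):
    out = decode_step(torch.randn(1, d))
print(out.shape)  # torch.Size([1, 64])
```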
Thanks for reading!