The Web MCP is here with 5,000 Monthly Credits
Bright Data has launched a free tier of The Web MCP, the first and only MCP designed to give LLMs and autonomous agents unblocked, real-time access to the web.
Now you can /scrape, /search, /crawl, and /navigate the live web with 5,000 free monthly credits.
Built for developers and researchers working with open-source tools.
Key features:
Integrates seamlessly with your workflow, including LangChain, AutoGPT, OpenAgents, and custom stacks.
Enables agents to dynamically expand their context with live web data
All major LLMs and IDEs are supported (locally hosted, SSE, and Streamable HTTP)
No setup fees, no credit card required.
Whether you're building agentic workflows, RAG pipelines, or real-time assistants, The Web MCP is the protocol layer that connects your models to the open web.
Start building with 5,000 free monthly credits here →
Thanks to Bright Data for partnering today!
12 MCP, RAG, and Agents Cheat Sheets for AI Engineers
Here’s a recap of several visual summaries posted in the Daily Dose of Data Science newsletter.
1) Function calling & MCP for LLMs:
Before MCPs became popular, AI workflows relied on traditional Function Calling for tool access.
Now, MCP (Model Context Protocol) is introducing a shift in how developers structure tool access and orchestration for Agents.
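To make the "traditional Function Calling" side concrete, here's a minimal, provider-agnostic sketch in Python: the model emits a JSON tool call, and our code dispatches it. The tool name, schema shape, and the faked model output are illustrative assumptions, not any specific vendor's API.

```python
import json

# A hypothetical tool the model is allowed to call.
def get_weather(city: str) -> str:
    # Stubbed response; a real tool would hit a weather API.
    return f"22°C and sunny in {city}"

# Tool registry with a JSON-schema-style description (formats vary by provider).
TOOLS = {
    "get_weather": {
        "fn": get_weather,
        "description": "Get the current weather for a city.",
        "parameters": {"city": {"type": "string"}},
    }
}

def dispatch(model_output: str) -> str:
    """Parse the model's tool-call JSON and run the matching function."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]["fn"]
    return tool(**call["arguments"])

# Pretend the model decided to call the tool with these arguments.
fake_model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(fake_model_output))  # -> "22°C and sunny in Paris"
```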
2) 4 stages of training LLMs from scratch
This visual covers the 4 stages of building LLMs from scratch that make them usable for real-world applications.
These are:
Pre-training
Instruction fine-tuning
Preference fine-tuning
Reasoning fine-tuning
3) 3 prompting techniques for reasoning in LLMs
A large part of what makes LLM apps so powerful isn't just their ability to predict the next token accurately, but their ability to reason through a problem before answering.
This visual covers three popular prompting techniques that help LLMs think more clearly before they answer.
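The techniques themselves are in the visual, but as a quick illustration, here's what one of the most common ones (chain-of-thought-style prompting) can look like; the question and wording are just a made-up example.

```python
question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

# Standard prompt: the model jumps straight to an answer.
standard_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: nudge the model to lay out intermediate steps
# before committing to a final answer.
cot_prompt = f"{question}\nLet's think step by step, then give the final answer."

print(cot_prompt)
```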
4) Train LLMs using other LLMs
LLMs don't just learn from raw text; they also learn from each other:
Llama 4 Scout and Maverick were trained using Llama 4 Behemoth.
Gemma 2 and 3 were trained using Google's proprietary Gemini.
Distillation helps us do so, and the visual below depicts three popular techniques.
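As a rough sketch of the most common flavor (soft-label / logit distillation), the student is trained to match the teacher's softened output distribution with a KL-divergence loss. The tensor shapes and temperature below are illustrative, not any particular model's setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Dummy logits: 4 token positions over a 32k-token vocabulary.
student_logits = torch.randn(4, 32_000, requires_grad=True)
teacher_logits = torch.randn(4, 32_000)   # teacher is frozen, no grad needed

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```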
5) Supervised & Reinforcement fine-tuning in LLMs
RFT lets us transform any open-source LLM into a reasoning powerhouse without any labeled data.
This visual covers the differences between supervised fine-tuning and reinforcement fine-tuning.
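The "no labeled data" part typically comes from rule-based, verifiable rewards rather than per-token supervision. Here's a toy example of such a reward; the output format it checks for is an assumption, and real pipelines usually add task-specific checks (running unit tests, verifying a math result, etc.).

```python
import re

def format_reward(completion: str) -> float:
    """Rule-based reward: no labels needed, just a verifiable check.

    The rule here is purely structural: did the model produce a
    <think>...</think> block followed by a final 'Answer:' line?
    """
    has_reasoning = re.search(r"<think>.+?</think>", completion, re.DOTALL) is not None
    has_answer = re.search(r"Answer:\s*\S+", completion) is not None
    return float(has_reasoning and has_answer)

print(format_reward("<think>60 km in 0.75 h is 80 km/h</think>\nAnswer: 80 km/h"))  # 1.0
```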
6) Transformer vs. Mixture of Experts
Mixture of Experts (MoE) is a popular architecture that uses different "experts" to improve Transformer models.
Experts are feed-forward networks, but smaller than the single feed-forward network in a traditional Transformer block.
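Here's a minimal PyTorch sketch of a sparse MoE layer with top-k routing; the dimensions, expert count, and top-k value are illustrative, not any particular model's configuration.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Sparse Mixture-of-Experts layer with top-k routing (illustrative sizes)."""

    def __init__(self, d_model=512, d_hidden=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)      # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```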
7) RAG vs Agentic RAG
Naive RAG retrieves once and generates once; it cannot dynamically search for more information, and it cannot reason through complex queries.
Also, there's little adaptability. The LLM can't modify its strategy based on the problem at hand.
Agentic RAG solves this.
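A hypothetical sketch of the agentic loop: the model can keep retrieving with reformulated queries until it judges the context sufficient. The `llm` and `retrieve` functions are stand-ins, not a specific library.

```python
def retrieve(query: str) -> list[str]:
    """Stand-in retriever; a real one would query a vector store."""
    return [f"chunk about: {query}"]

def llm(prompt: str) -> str:
    """Stand-in LLM call; a real one would hit an inference API."""
    if prompt.startswith("Answer"):
        return "Rayleigh scattering (grounded in the retrieved chunks)."
    return "SUFFICIENT" if "chunk" in prompt else "rewrite: " + prompt

def agentic_rag(question: str, max_steps: int = 3) -> str:
    context: list[str] = []
    query = question
    for _ in range(max_steps):
        context += retrieve(query)
        verdict = llm(f"Question: {question}\nContext: {context}\nEnough to answer?")
        if verdict.startswith("SUFFICIENT"):
            break
        query = verdict.removeprefix("rewrite: ")   # agent reformulates the query
    return llm(f"Answer using context: {context}\nQuestion: {question}")

print(agentic_rag("Why is the sky blue?"))
```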
8) 5 Agentic AI design patterns
Agentic behaviors allow LLMs to refine their output by incorporating self-evaluation, planning, and collaboration!
This visual depicts the 5 most popular design patterns employed in building AI agents.
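As one example of these patterns, here's a hypothetical reflection loop in which the model critiques and revises its own draft; `llm` is a stubbed stand-in for any chat-model call.

```python
def llm(prompt: str) -> str:
    """Stand-in for a chat-model call; returns canned text for the sketch."""
    if prompt.startswith("Critique"):
        return "LGTM"
    return "draft text"

def reflect(task: str, max_rounds: int = 2) -> str:
    draft = llm(f"Write a response to: {task}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this draft for errors and gaps:\n{draft}")
        if critique.strip() == "LGTM":          # model is satisfied with its own output
            break
        draft = llm(f"Revise the draft using this critique:\n{critique}\nDraft:\n{draft}")
    return draft

print(reflect("Summarize what KV caching does."))
```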
9) 5 levels of Agentic AI systems
Agentic systems don't just generate text; they make decisions, call functions, and even run autonomous workflows.
The visual explains 5 levels of AI agency—from simple responders to fully autonomous agents.
10) Traditional RAG vs HyDE
One critical problem with the traditional RAG system is that questions are not semantically similar to their answers. As a result, irrelevant chunks can have a higher cosine similarity to the query than the documents that actually contain the answer, so they get retrieved instead.
HyDE solves this by generating a hypothetical response first.
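A hypothetical sketch of the HyDE flow: embed a generated (possibly imperfect) answer instead of the raw question, then retrieve with that embedding. `llm`, `embed`, and `vector_search` are all stand-ins, not a specific library's API.

```python
def llm(prompt: str) -> str:
    """Stand-in generator; a real call would hit an LLM API."""
    return "Photosynthesis converts light into chemical energy stored as glucose."

def embed(text: str) -> list[float]:
    """Stand-in embedding model."""
    return [float(len(text) % 7), 0.1, 0.2]

def vector_search(query_vec: list[float], k: int = 5) -> list[str]:
    """Stand-in vector store lookup."""
    return ["chunk 1", "chunk 2"][:k]

def hyde_retrieve(question: str) -> list[str]:
    # 1) Generate a hypothetical answer; it may contain wrong facts,
    #    but it lives in the same embedding neighborhood as real answers.
    hypothetical = llm(f"Write a short passage answering: {question}")
    # 2) Embed the hypothetical answer, not the question.
    query_vec = embed(hypothetical)
    # 3) Retrieve real chunks that are close to that embedding.
    return vector_search(query_vec)

print(hyde_retrieve("How do plants make energy?"))
```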
11) RAG vs Graph RAG
Answering questions that need global context is difficult with traditional RAG since it only retrieves the top-k relevant chunks.
Graph RAG makes retrieval more robust by building a graph structure over the corpus, which captures long-range relationships between entities instead of the local text chunks that traditional RAG relies on.
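A toy sketch of the graph-retrieval idea using networkx; the entities and relations are made up, and real Graph RAG pipelines additionally extract the graph with an LLM and summarize communities of nodes.

```python
import networkx as nx

# Toy knowledge graph; real pipelines extract entities/relations with an LLM.
g = nx.Graph()
g.add_edge("Acme Corp", "Jane Doe", relation="CEO of")
g.add_edge("Jane Doe", "Project Atlas", relation="leads")
g.add_edge("Project Atlas", "Berlin office", relation="based in")

def graph_retrieve(entity: str, hops: int = 2) -> list[str]:
    """Pull facts up to `hops` edges away, giving the LLM broader context."""
    nodes = nx.single_source_shortest_path_length(g, entity, cutoff=hops)
    facts = []
    for u, v, data in g.edges(data=True):
        if u in nodes and v in nodes:
            facts.append(f"{u} --{data['relation']}--> {v}")
    return facts

# A question about Acme Corp can now surface the Berlin office,
# even though no single text chunk mentions both.
print(graph_retrieve("Acme Corp", hops=3))
```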
12) KV caching
KV caching is a technique used to speed up LLM inference.
In short: instead of redundantly recomputing the KV vectors of all context tokens at every decoding step, we cache them, which saves time during inference.
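Here's a minimal sketch of the idea for a single attention head during autoregressive decoding; the shapes and random projection matrices are illustrative.

```python
import torch
import torch.nn.functional as F

d = 64
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))   # illustrative projections

k_cache, v_cache = [], []                             # the KV cache

def decode_step(x_new):
    """x_new: (1, d) embedding of the newest token only."""
    q = x_new @ Wq
    # Compute K/V just for the new token; earlier tokens are already cached.
    k_cache.append(x_new @ Wk)
    v_cache.append(x_new @ Wv)
    K = torch.cat(k_cache)                            # (seq_len, d)
    V = torch.cat(v_cache)
    attn = F.softmax(q @ K.T / d**0.5, dim=-1)        # (1, seq_len)
    return attn @ V                                   # (1, d)

# Each step costs only one K/V projection instead of recomputing the whole prefix.
for _ in range(5):
    out = decode_step(torch.randn(1, d))
print(out.shape)  # torch.Size([1, 64])
```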
Thanks for reading!