In today’s newsletter:
Trace and evaluate any AI/LLM application.
Get Agents to browse the web like humans.
9 MCP projects for AI Engineers.
Trace and evaluate any AI/LLM application!
Building an Agent is only the first step.
The real challenge begins when you test it on real-world scenarios at scale.
LangWatch lets you add evaluation tracking to any of your existing LLM workflows.
You can keep using pandas and your favorite tools; just add a few lines to start tracking your experiments:
Initialize an eval (line 3)
Decorate the LLM workflow logic method (line 5)
Log the evaluation (line 11)
Done!
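To make the three steps concrete, here is a minimal sketch of the same pattern in plain Python. The `EvalTracker` class, its `track` and `log` methods, and the toy `answer` function are hypothetical stand-ins written for illustration; they are not the LangWatch API, so check the repo for the real calls.

```python
from dataclasses import dataclass, field
from functools import wraps


@dataclass
class EvalTracker:
    """Hypothetical in-memory tracker; a stand-in, NOT the LangWatch API."""
    name: str
    records: list = field(default_factory=list)

    def track(self, fn):
        """Decorator that records the inputs and output of each workflow call."""
        @wraps(fn)
        def wrapper(*args, **kwargs):
            output = fn(*args, **kwargs)
            self.records.append({"inputs": args, "output": output, "scores": {}})
            return output
        return wrapper

    def log(self, metric, score):
        """Attach a metric score to the most recent tracked call."""
        self.records[-1]["scores"][metric] = score


# Step 1: initialize an eval.
evaluation = EvalTracker("qa-eval")


# Step 2: decorate the LLM workflow logic.
@evaluation.track
def answer(question):
    # Placeholder for a real LLM call (assumption for this sketch).
    return {"What is 2 + 2?": "4"}.get(question, "unknown")


dataset = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "Capital of France?", "expected": "Paris"},
]

for row in dataset:
    prediction = answer(row["question"])
    # Step 3: log the evaluation.
    evaluation.log("exact_match", float(prediction == row["expected"]))

accuracy = sum(r["scores"]["exact_match"] for r in evaluation.records) / len(evaluation.records)
print(f"{evaluation.name}: exact_match = {accuracy:.2f}")
```

The point of the decorator is that the workflow function itself stays unchanged; tracking is layered on from the outside.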
You can also integrate LangWatch evaluations into CI/CD workflows so every model update is automatically checked before deployment.
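One common way to wire evaluations into CI is a small gate script that fails the build when the average score drops below a threshold. The sketch below is a generic illustration of that idea; the `gate` and `main` helpers and the 0.8 threshold are assumptions, not part of LangWatch.

```python
def gate(scores, threshold=0.8):
    """Return (passed, mean) for a list of per-example evaluation scores."""
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean


def main(scores, threshold=0.8):
    """Print a summary and return a process exit code (0 = pass, 1 = fail)."""
    passed, mean = gate(scores, threshold)
    status = "PASS" if passed else "FAIL"
    print(f"{status}: mean score {mean:.3f} (threshold {threshold})")
    return 0 if passed else 1


# Hypothetical scores produced by an evaluation run.
exit_code = main([0.9, 0.85, 0.7])
```

In a real pipeline, the script would end with `sys.exit(main(scores))` so that a failing evaluation blocks the deployment step.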
GitHub repo → (don’t forget to star)
Get Agents to browse the web like humans
Stagehand allows Agents to independently control a browser.
Integrating its MCP server with Claude Desktop delivers a more reliable alternative to OpenAI Operator.
Here's us testing it live:
Stagehand bridges the gap between:
brittle traditional automation like Playwright, Selenium, etc., and
unpredictable full-agent solutions like OpenAI Operator.
You can always decide how much control to give to the AI.
100% open-source with 12k+ stars!
9 MCP projects for AI Engineers
We have covered several MCP projects in this newsletter before.
Here’s a recap along with visuals & full code walk-through issues:
#1) 100% local MCP client
An MCP client is a component in an AI app (like Cursor) that establishes connections to external tools. Learn how to build it 100% locally.
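Under the hood, an MCP client talks to servers using JSON-RPC 2.0 messages such as `tools/list` and `tools/call`. The sketch below shows only the message shapes against an in-process fake server. The `add` tool, the `fake_server` function, and the trimmed-down responses are assumptions for illustration; a real client also performs an initialization handshake and uses a transport such as stdio or HTTP.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def make_request(method, params=None):
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Hypothetical tool registry living on the "server" side.
TOOLS = {"add": lambda a, b: a + b}


def fake_server(raw):
    """In-process stand-in for an MCP server (simplified responses)."""
    request = json.loads(raw)
    if request["method"] == "tools/list":
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        params = request["params"]
        value = TOOLS[params["name"]](**params["arguments"])
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


# Client side: discover the tools, then call one.
listing = json.loads(fake_server(make_request("tools/list")))
call = json.loads(
    fake_server(make_request("tools/call", {"name": "add", "arguments": {"a": 2, "b": 3}}))
)
print(listing["result"]["tools"], call["result"]["content"][0]["text"])
```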
#2) MCP-powered Agentic RAG
Learn how to create an MCP-powered Agentic RAG that searches a vector database and falls back to web search if needed.
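The fallback logic can be sketched as follows: search the vector store first and go to the web only when the best match is too weak. Everything here is a toy assumption, including the two-dimensional embeddings, the `web_search` stub, and the 0.75 similarity threshold; in the actual project, the agent decides which tool to call.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "vector database": (embedding, document text) pairs.
VECTOR_DB = [
    ([1.0, 0.0], "MCP connects AI apps to external tools."),
    ([0.0, 1.0], "RAG retrieves documents before generating."),
]


def web_search(query):
    """Stub for a real web search tool (assumption for this sketch)."""
    return f"[web result for: {query}]"


def retrieve(query, query_vec, threshold=0.75):
    """Return (source, text): vector DB hit if similar enough, else web search."""
    score, text = max((cosine(query_vec, vec), doc) for vec, doc in VECTOR_DB)
    if score >= threshold:
        return "vector_db", text
    return "web_search", web_search(query)


hit = retrieve("what is MCP", [0.9, 0.1])
miss = retrieve("weather today", [0.7, 0.7])
print(hit[0], miss[0])
```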
#3) MCP-powered financial analyst
Build an MCP-powered AI agent that fetches and analyzes stock market data and generates insights on market trends, right from Cursor or Claude Desktop.
#4) MCP-powered Voice Agent
This project teaches you how to build an MCP-driven voice Agent that queries a database and falls back to web search if needed.
#5) A unified MCP server
This project builds an MCP server to query and chat with 200+ data sources using natural language through a unified interface powered by MindsDB and Cursor IDE.
#6) MCP-powered shared memory for Claude Desktop and Cursor
Devs use Claude Desktop and Cursor independently, with no context shared between them. Learn how to add a common memory layer so you can switch between the two without losing context.
#7) MCP-powered RAG over complex docs
Learn how to use MCP to power a RAG app over complex documents with tables, charts, images, complex layouts, and more.
#8) MCP-powered synthetic data generator
Learn how to build an MCP server that can generate any type of synthetic dataset. It uses Cursor as the MCP host and SDV to generate realistic tabular synthetic data.
#9) MCP-powered deep researcher
ChatGPT has a deep research feature. It helps you get detailed insights on any topic. Learn how you can build a 100% local alternative to it.
👉 Over to you: What other MCP projects would you like to learn about?
Thanks for reading!