Building a Browser Automation Agent

...powered by 100% local stack.

Get 10M Free GPT-5 Tokens to Build in Factory

GPT-5 is now the default model in Factory. It has been found to be highly agentic, detail-oriented, and comprehensive, particularly when searching and planning.

Within Factory, GPT-5 enables you to delegate software development tasks to Droids with confidence.

GPT-5 works best in Factory for:

  • Complex refactoring

  • Incident response

  • Technical architecture design

The team is also offering 10M free tokens for folks to get started with GPT-5!

Redeem your 10M tokens to build production-grade software here →

Thanks to FactoryAI for partnering today!


Building a Browser Automation Agent

The browser is still the most universal interface, with 4.3 billion pages visited every day!

The video above gives a quick demo of how we can completely automate it with a local stack:

  • Stagehand for open-source AI browser automation.

  • CrewAI for orchestration.

  • Ollama to run gpt-oss.

Let's build it!

System overview:

  • The user enters an automation query.

  • Planner Agent creates an automation plan.

  • The Browser Automation Agent executes it using the Stagehand tool.

  • The Response Agent generates a response.
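The steps above boil down to a three-stage pipeline: plan, execute, respond. Here is a minimal, framework-free sketch of that data flow. The stage functions are hypothetical stubs standing in for the three agents, and none of the names below come from the original code.

```python
from dataclasses import dataclass, field


@dataclass
class AutomationRun:
    """Carries data between the three stages (field names are illustrative)."""
    query: str
    plan: list[str] = field(default_factory=list)
    raw_result: str = ""
    response: str = ""


def plan_stage(run: AutomationRun) -> AutomationRun:
    # Stand-in for the Planner Agent: a real planner would ask an LLM.
    run.plan = [f"Open a page relevant to: {run.query}", "Extract the requested data"]
    return run


def browser_stage(run: AutomationRun) -> AutomationRun:
    # Stand-in for the Browser Automation Agent: a real one would drive Stagehand.
    run.raw_result = f"(stub) executed {len(run.plan)} steps"
    return run


def response_stage(run: AutomationRun) -> AutomationRun:
    # Stand-in for the Response Agent: a real one would polish the raw result.
    run.response = f"Result for '{run.query}': {run.raw_result}"
    return run


def run_pipeline(query: str) -> AutomationRun:
    """Pass one run object through the stages in order."""
    run = AutomationRun(query=query)
    for stage in (plan_stage, browser_stage, response_stage):
        run = stage(run)
    return run
```

Each stage only reads what the previous one wrote, which is the property that lets the real agents be swapped in one at a time.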

Now, let's dive into the code!

Define LLM

We use three LLMs:

  • Planner LLM: Creates a structured plan for an automation task.

  • Automation LLM: Executes the plan using the Stagehand tool.

  • Response LLM: Synthesizes the final response.
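As a rough sketch, the three LLM handles could be configured with CrewAI's `LLM` class pointed at a local Ollama server. The model tag `ollama/gpt-oss`, the default Ollama port, and the per-role temperatures are assumptions for illustration, not values taken from the original code.

```python
# Assumed values: Ollama's default local endpoint and a gpt-oss model tag.
OLLAMA_BASE_URL = "http://localhost:11434"
MODEL = "ollama/gpt-oss"

# Hypothetical per-role settings; the original may use identical settings for all three.
LLM_SETTINGS = {
    "planner": {"temperature": 0.2},
    "automation": {"temperature": 0.0},
    "response": {"temperature": 0.3},
}


def llm_kwargs(role: str) -> dict:
    """Return the keyword arguments used to construct the LLM for a role."""
    return {"model": MODEL, "base_url": OLLAMA_BASE_URL, **LLM_SETTINGS[role]}


def build_llm(role: str):
    """Create a CrewAI LLM handle; crewai is imported lazily."""
    from crewai import LLM

    return LLM(**llm_kwargs(role))
```

With Ollama running and the model pulled, `build_llm("planner")` would give you the handle to pass to the planner agent.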

Define Automation Planner Agent

The planner agent receives an automation task from the user and creates a structured layout for execution by the browser agent.
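A planner along these lines could be sketched with CrewAI's `Agent` and `Task` classes. The role, goal, backstory, and task wording below are illustrative guesses, not the original prompts.

```python
# Hypothetical prompt text; the original agent definition is not reproduced here.
PLANNER_SPEC = {
    "role": "Automation Planner",
    "goal": "Turn a user's browser automation request into a short, ordered plan",
    "backstory": (
        "You break web tasks into concrete steps: which URL to open, "
        "what to click or type, and what information to extract."
    ),
}

# CrewAI fills the {query} placeholder from the inputs passed at kickoff.
PLAN_TASK_TEMPLATE = (
    "Create a step-by-step browser automation plan for this request: {query}. "
    "Name the starting URL, the actions to perform, and the data to extract."
)


def build_planner(llm):
    """Return (agent, task) for the planning step; crewai is imported lazily."""
    from crewai import Agent, Task

    agent = Agent(llm=llm, **PLANNER_SPEC)
    task = Task(
        description=PLAN_TASK_TEMPLATE,
        expected_output="A numbered list of browser steps",
        agent=agent,
    )
    return agent, task
```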

Define Stagehand Browser Tool

This is a custom CrewAI tool that uses AI to interact with web pages.

It leverages Stagehand's computer-use agentic capabilities to autonomously navigate URLs, perform page actions, and extract data to answer questions.
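Here is one way such a tool might be structured. The sketch below codes against a small page interface with `goto`, `act`, and `extract` methods, which are assumed to mirror the operations Stagehand's page object exposes. `open_session` is a hypothetical factory that you would implement to start and close a Stagehand session, and it is not part of any library.

```python
import asyncio
from contextlib import AbstractAsyncContextManager
from typing import Any, Callable, Protocol


class BrowserPage(Protocol):
    """The three page operations this sketch relies on (assumed interface)."""

    async def goto(self, url: str) -> Any: ...
    async def act(self, instruction: str) -> Any: ...
    async def extract(self, instruction: str) -> Any: ...


SessionFactory = Callable[[], AbstractAsyncContextManager[BrowserPage]]


async def run_browser_task(
    open_session: SessionFactory, url: str, action: str = "", question: str = ""
) -> str:
    """Navigate to a URL, optionally act on the page, then optionally extract an answer."""
    # The session is opened and closed inside this coroutine so that the
    # browser lives in the same event loop that drives it.
    async with open_session() as page:
        await page.goto(url)
        if action:
            await page.act(action)
        if question:
            return str(await page.extract(question))
        return f"Opened {url}"


def build_browser_tool(open_session: SessionFactory):
    """Wrap run_browser_task as a CrewAI tool; crewai is imported lazily."""
    from crewai.tools import tool

    @tool("Stagehand Browser Tool")
    def stagehand_browser_tool(url: str, action: str = "", question: str = "") -> str:
        """Open a URL, optionally perform an action, and optionally extract an answer."""
        return asyncio.run(run_browser_task(open_session, url, action, question))

    return stagehand_browser_tool
```

Keeping the browser logic in a plain coroutine means it can be exercised with a fake page object before a real browser session is wired in.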

Define Browser Automation Agent

The Browser Automation Agent uses the Stagehand tool defined above to control the browser autonomously and execute the plan.
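In CrewAI terms, the key difference from the planner is the `tools` argument. Here is a hedged sketch, with made-up prompt text and a `browser_tool` argument standing in for whatever tool object you built:

```python
# Hypothetical prompt text, for illustration only.
BROWSER_AGENT_SPEC = {
    "role": "Browser Automation Specialist",
    "goal": "Execute the automation plan in a real browser and report what was found",
    "backstory": "You follow plans step by step and use your browser tool for every page interaction.",
}

# CrewAI fills the {plan} placeholder from the inputs passed at kickoff.
BROWSER_TASK_TEMPLATE = (
    "Execute this browser automation plan using your tool: {plan}. "
    "Return the raw information you extracted."
)


def build_browser_agent(llm, browser_tool):
    """Return (agent, task) for plan execution; crewai is imported lazily."""
    from crewai import Agent, Task

    # tools=[...] is what gives this agent browser access.
    agent = Agent(llm=llm, tools=[browser_tool], **BROWSER_AGENT_SPEC)
    task = Task(
        description=BROWSER_TASK_TEMPLATE,
        expected_output="The raw data extracted from the page",
        agent=agent,
    )
    return agent, task
```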

Define Response Synthesis Agent

The Response Synthesis Agent acts as a final quality-control step, refining the output from the Browser Automation Agent into a polished response.
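One way to sketch this step is a single-agent crew that receives both the user's query and the raw browser output. The prompts and the `synthesize` helper below are illustrative assumptions, not the original code.

```python
# Hypothetical prompt text, for illustration only.
RESPONSE_SPEC = {
    "role": "Response Synthesizer",
    "goal": "Turn raw browser output into a clear answer for the user",
    "backstory": "You review automation results for accuracy and present them concisely.",
}

# CrewAI fills {query} and {raw_result} from the inputs passed at kickoff.
RESPONSE_TASK_TEMPLATE = (
    "The user asked: {query}\n"
    "The browser agent returned: {raw_result}\n"
    "Write a clear, concise answer based only on that result."
)


def synthesize(llm, query: str, raw_result: str) -> str:
    """Run a one-agent crew and return its text output; crewai is imported lazily."""
    from crewai import Agent, Crew, Task

    agent = Agent(llm=llm, **RESPONSE_SPEC)
    task = Task(
        description=RESPONSE_TASK_TEMPLATE,
        expected_output="A short, polished answer for the user",
        agent=agent,
    )
    crew = Crew(agents=[agent], tasks=[task])
    result = crew.kickoff(inputs={"query": query, "raw_result": raw_result})
    return result.raw
```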

Create CrewAI Agentic Flow

Finally, we connect our Agents within a workflow using CrewAI Flows.
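A Flow chains methods with the `@start` and `@listen` decorators and shares data through a typed state object. Below is a minimal sketch of how the three steps might be wired together. The state fields, method names, and the three injected step callables are assumptions, and the original flow may be organized differently.

```python
from typing import Callable


def build_flow(
    plan_step: Callable[[str], str],
    browse_step: Callable[[str], str],
    respond_step: Callable[[str, str], str],
):
    """Return a Flow class chaining plan -> browse -> respond; crewai is imported lazily."""
    from crewai.flow.flow import Flow, listen, start
    from pydantic import BaseModel

    class AutomationState(BaseModel):
        query: str = ""
        plan: str = ""
        raw_result: str = ""
        response: str = ""

    class BrowserAutomationFlow(Flow[AutomationState]):
        @start()
        def create_plan(self):
            # Entry point: turn the user's query into a plan.
            self.state.plan = plan_step(self.state.query)
            return self.state.plan

        @listen(create_plan)
        def run_browser(self, plan):
            # Runs once create_plan finishes and receives its return value.
            self.state.raw_result = browse_step(plan)
            return self.state.raw_result

        @listen(run_browser)
        def synthesize(self, raw_result):
            # Final step: polish the raw result into the user-facing answer.
            self.state.response = respond_step(self.state.query, raw_result)
            return self.state.response

    return BrowserAutomationFlow
```

With CrewAI installed, you would instantiate the returned class and call `flow.kickoff(inputs={"query": "..."})`, with each step callable wrapping one of the agents.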

Done!

Here’s our multi-agent browser automation workflow in action, where we asked it to find the top contributor on the Stagehand GitHub repo:

It initiated a local browser session, navigated the web page, and extracted the information.

You can find the Stagehand GitHub repo here →

You can find the code in the GitHub Repository →

Thanks for reading!


P.S. For those wanting to develop “Industry ML” expertise:

At the end of the day, all businesses care about impact. That’s it!

  • Can you reduce costs?

  • Drive revenue?

  • Can you scale ML models?

  • Predict trends before they happen?

We have covered several other topics (with implementations) that align with these goals.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data.

  • So many real-world NLP systems rely on pairwise context scoring. Learn scalable approaches here.

  • Learn how to run large models on small devices using Quantization techniques.

  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust using Conformal Predictions.

  • Learn how to identify causal relationships and answer business questions using causal inference in this crash course.

  • Learn how to scale and implement ML model training in this practical guide.

  • Learn techniques to reliably test new models in production.

  • Learn how to build privacy-first ML systems using Federated Learning.

  • Learn 6 techniques with implementation to compress ML models.

All these resources will help you cultivate the key skills that businesses care about the most.
