Today, we're building a brand monitoring app that scrapes web mentions of a brand at scale and produces insights about the company.
Tech stack:
Bright Data to scrape data at scale.
CrewAI for orchestration.
Ollama to serve DeepSeek-R1 locally.
Here's the workflow:
Use Bright Data to scrape brand mentions across X, Instagram, YouTube, websites, etc.
Invoke platform-specific Crews to analyze the data and generate insights.
Merge all insights to get the final report.
Let's implement this!
The code is linked later in the issue.
Scraping tool
To monitor a brand, we must scrape data across various sources—X, YouTube, Instagram, websites, etc.
Thus, we'll first gather recent search results from Bright Data's SERP API.
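As a rough illustration of this first step, the sketch below builds a time-restricted Google search URL for a brand and hands it to a fetcher. The names `build_search_url`, `search_brand`, and `fetch` are hypothetical. The actual call to Bright Data's SERP API (endpoint, zone, credentials) is not shown here and is assumed to live inside whatever `fetch` you pass in.

```python
from typing import Callable
from urllib.parse import urlencode

# Google's "tbs" time filters for recent results.
RECENCY_FILTERS = {"day": "qdr:d", "week": "qdr:w", "month": "qdr:m"}


def build_search_url(brand: str, recency: str = "week") -> str:
    """Build a Google search URL for recent mentions of a brand."""
    params = {
        "q": f'"{brand}"',                # exact-phrase match on the brand name
        "tbs": RECENCY_FILTERS[recency],  # restrict results to the chosen window
    }
    return "https://www.google.com/search?" + urlencode(params)


def search_brand(brand: str, fetch: Callable[[str], str], recency: str = "week") -> str:
    """Fetch the results page for a brand.

    `fetch` is a placeholder (assumption) for the function that sends the URL
    through Bright Data's SERP API and returns the raw response.
    """
    return fetch(build_search_url(brand, recency))
```

Keeping the URL construction separate from the network call makes the query logic easy to check without spending API credits.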
Platform-specific scraping function
The search results will contain links to web pages, X posts, YouTube videos, Instagram posts, etc.
To scrape those sources, we use Bright Data's platform-specific scrapers.
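Before calling a platform-specific scraper, the links need to be routed to the right one. Here is a minimal sketch of that routing step, assuming the search results have already been reduced to a list of URLs. The domain table and the function names are illustrative, and the Bright Data scraper calls themselves are left out.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Illustrative mapping from domain to platform label (an assumption, extend as needed).
PLATFORM_DOMAINS = {
    "x.com": "x",
    "twitter.com": "x",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "instagram.com": "instagram",
}


def platform_for(url: str) -> str:
    """Return the platform label for a URL, or "web" for any other site."""
    host = (urlparse(url).hostname or "").lower()
    for domain, platform in PLATFORM_DOMAINS.items():
        # Match the bare domain or any subdomain such as www.youtube.com.
        if host == domain or host.endswith("." + domain):
            return platform
    return "web"


def group_by_platform(urls):
    """Group URLs by platform so each group can go to its own scraper."""
    groups = defaultdict(list)
    for url in urls:
        groups[platform_for(url)].append(url)
    return dict(groups)
```

Each group can then be passed to the matching scraper, with the "web" group going to a general-purpose page scraper.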
Set up DeepSeek-R1 locally
We'll serve R1 locally through Ollama.
To do this:
First, we download it locally.
Next, we define it with CrewAI's LLM class.
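A minimal sketch of these two steps, assuming the model was downloaded with `ollama pull deepseek-r1` and that Ollama is listening on its default port. The model tag and the helper names are assumptions. The first two helpers use Ollama's `/api/tags` endpoint to confirm the model is present, and `build_llm` wraps it in CrewAI's `LLM` class.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
MODEL_NAME = "deepseek-r1"             # assumed tag; use the variant you pulled


def model_is_available(tags_payload: dict, model: str = MODEL_NAME) -> bool:
    """Return True if `model` appears in an Ollama /api/tags response."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(n == model or n.startswith(model + ":") for n in names)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models it has downloaded."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def build_llm():
    """Create the CrewAI LLM object that points at the local model."""
    # Imported here so the checks above also work without CrewAI installed.
    from crewai import LLM

    return LLM(model=f"ollama/{MODEL_NAME}", base_url=OLLAMA_URL)
```

Calling `model_is_available(fetch_tags())` before `build_llm()` gives a clear error early if the download step was skipped.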
Crew Setup
We will have multiple Crews, one for each platform (X, Instagram, YouTube, etc.).
Each Crew will have two Agents:
Analysis Agent → It analyzes the scraped content.
Writer Agent → It produces insights from the analysis.
Below, let's implement the X Crew!
Note: The implementation for other Crews is available in the GitHub repo linked later.
X Analyst Agent
This Agent analyzes the posts scraped by Bright Data and extracts key insights. It is also assigned a task to do so.
X Writer Agent
This Agent takes the output of the X Analyst Agent and turns it into written insights.
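The two Agents and their Tasks could be wired together roughly as follows. This is a sketch, not the repo's implementation: the roles, goals, backstories, prompts, and the `{brand_name}` / `{x_posts}` input names are all assumptions made for illustration.

```python
# Prompt text is illustrative; CrewAI fills the {placeholders} from kickoff inputs.
ANALYSIS_PROMPT = (
    "Analyze the following X posts that mention {brand_name}. "
    "Identify sentiment, recurring themes, and notable posts.\n\n{x_posts}"
)
WRITING_PROMPT = (
    "Using the analysis of X posts about {brand_name}, "
    "write a concise list of insights for the brand team."
)


def build_x_crew(llm):
    """Assemble the X Crew: an analyst Agent followed by a writer Agent."""
    # Imported here so the prompts above can be inspected without CrewAI installed.
    from crewai import Agent, Crew, Process, Task

    analyst = Agent(
        role="X Analyst",
        goal="Extract key insights from X posts about a brand",
        backstory="An analyst who studies social media conversations.",
        llm=llm,
    )
    writer = Agent(
        role="X Writer",
        goal="Turn the analysis into clear, actionable insights",
        backstory="A writer who summarizes research for brand teams.",
        llm=llm,
    )
    analysis_task = Task(
        description=ANALYSIS_PROMPT,
        expected_output="A structured analysis of the posts.",
        agent=analyst,
    )
    writing_task = Task(
        description=WRITING_PROMPT,
        expected_output="A short report of insights.",
        agent=writer,
        context=[analysis_task],  # the writer receives the analyst's output
    )
    return Crew(
        agents=[analyst, writer],
        tasks=[analysis_task, writing_task],
        process=Process.sequential,  # run the analysis first, then the write-up
    )
```

With a CrewAI `LLM` object in hand, the crew would be run with something like `build_x_crew(llm).kickoff(inputs={"brand_name": "Acme", "x_posts": posts_text})`.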
Create a Flow
Finally, we use CrewAI Flows to orchestrate the workflow:
We start the Flow by using the Scraping tool.
Next, we invoke platform-specific scrapers.
Finally, we invoke platform-specific Crews.
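The three stages above can be sketched as a plain Python function. In CrewAI Flows, each stage would instead be a method on a `Flow` subclass, marked with the `@start()` and `@listen(...)` decorators. Ordinary function calls are used here so the control flow is easy to follow and runs without CrewAI. Every parameter name is hypothetical.

```python
from typing import Callable, Dict, List


def run_brand_monitoring(
    brand: str,
    search: Callable[[str], List[str]],
    classify: Callable[[str], str],
    scrapers: Dict[str, Callable[[List[str]], str]],
    crews: Dict[str, Callable[[str, str], str]],
) -> str:
    """Run search, platform scraping, and platform Crews, then merge the results."""
    # Stage 1: gather links that mention the brand.
    urls = search(brand)

    # Group the links by platform so each group goes to the right scraper.
    grouped: Dict[str, List[str]] = {}
    for url in urls:
        grouped.setdefault(classify(url), []).append(url)

    sections = []
    for platform, links in grouped.items():
        # Skip platforms that have no scraper or no Crew configured.
        if platform not in scrapers or platform not in crews:
            continue
        content = scrapers[platform](links)         # Stage 2: platform-specific scraper
        insights = crews[platform](brand, content)  # Stage 3: platform-specific Crew
        sections.append(f"{platform}\n{insights}")

    # Merge all platform insights into the final report.
    return "\n\n".join(sections)
```

Passing the search, scraper, and crew functions in as arguments keeps the orchestration logic testable with simple stand-ins before real API calls are plugged in.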
We wrap the app in a clean Streamlit interface for interactivity and run the Flow.
When Agents use tools to access the web, they run into issues like IP blocks, bot detection, CAPTCHAs, etc. This hinders the Agent's execution.
Bright Data addresses these issues. It lets you:
Scrape data for Agents at scale without getting blocked.
Simulate user behavior using advanced browser tools.
Build Agentic apps with real-time and historical web data.
Thanks to Bright Data for working with us on this demo.
Find the code in this GitHub repo: Brand monitoring repo.
Thanks for reading!