In today’s newsletter:
Fire Enrich: Open-source data enrichment tool.
Build an MCP-powered audio analysis toolkit.
Discriminative vs. Generative Models.
MCP and A2A, explained visually.
Fire Enrich: Open-source data enrichment tool
Firecrawl released an open-source Clay alternative.
Just upload a CSV with emails, and AI agents automatically fill in missing data like decision makers, company size, and more.
GitHub repo → (don’t forget to star)
Thanks to Firecrawl for partnering today!
Build an MCP-powered audio analysis toolkit
Today, we are releasing another MCP demo: an audio analysis toolkit that accepts an audio file and lets you:
1. Transcribe it
2. Perform sentiment analysis
3. Summarize it
4. Identify named entities mentioned
5. Extract broad ideas
6. Interact with it
…all via MCP.
Here’s our tech stack:
AssemblyAI for transcription and audio analysis.
Claude Desktop as the MCP host.
Here's our workflow:
User's audio input is sent to AssemblyAI via a local MCP server.
AssemblyAI transcribes it and returns the summary, speaker labels, sentiment, and topics.
Post-transcription, the user can also chat with the audio.
Let’s implement this!
Transcription MCP tool
This tool accepts an audio input from the user and transcribes it using AssemblyAI.
We also store the full transcript to use in the next tool.
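Here's a minimal sketch of what this tool could look like, assuming the official MCP Python SDK (FastMCP) and the AssemblyAI Python SDK. The module-level `TRANSCRIPTS` store and the exact config flags are our illustrative choices, not necessarily the repo's exact code:

```python
# Sketch of the transcription tool (illustrative, not the repo's exact code).
import assemblyai as aai
from mcp.server.fastmcp import FastMCP

aai.settings.api_key = "YOUR_ASSEMBLYAI_API_KEY"

mcp = FastMCP("audio-analysis")

# Store the latest transcript so the analysis tool can reuse it.
TRANSCRIPTS: dict[str, aai.Transcript] = {}

@mcp.tool()
def transcribe_audio(file_path: str) -> str:
    """Transcribe a local audio file with AssemblyAI and return the text."""
    config = aai.TranscriptionConfig(
        speaker_labels=True,      # speaker insights for the next tool
        sentiment_analysis=True,  # sentiment for the next tool
        iab_categories=True,      # topics for the next tool
        summarization=True,       # summary for the next tool
    )
    transcript = aai.Transcriber().transcribe(file_path, config)
    TRANSCRIPTS["latest"] = transcript
    return transcript.text
```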
Audio analysis tool
Next, we have a tool that returns specific insights from the transcript, like speaker labels, sentiment, topics, and summary.
Based on the user’s input query, the corresponding flags are automatically set to True when the agent prepares the tool call via MCP:
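A sketch of this tool, continuing the server above (the flag names and return shape are our assumptions):

```python
@mcp.tool()
def get_audio_data(
    speakers: bool = False,
    sentiment: bool = False,
    topics: bool = False,
    summary: bool = False,
) -> dict:
    """Return the requested insights from the stored transcript.

    The MCP host flips these flags to True based on the user's query.
    """
    transcript = TRANSCRIPTS.get("latest")
    if transcript is None:
        return {"error": "No transcript yet. Run transcribe_audio first."}

    insights: dict = {}
    if speakers:
        insights["speakers"] = [
            {"speaker": u.speaker, "text": u.text}
            for u in transcript.utterances
        ]
    if sentiment:
        insights["sentiment"] = [
            {"text": s.text, "sentiment": s.sentiment.value}
            for s in transcript.sentiment_analysis
        ]
    if topics:
        insights["topics"] = list(transcript.iab_categories.summary.keys())
    if summary:
        insights["summary"] = transcript.summary
    return insights
```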
Create MCP Server
Now, we’ll set up an MCP server to use the tools we created above.
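With FastMCP, this is just a run call at the bottom of the same script (stdio is the transport Claude Desktop uses for local servers):

```python
if __name__ == "__main__":
    # Serve both tools over stdio so Claude Desktop can launch this script.
    mcp.run(transport="stdio")
```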
Integrate MCP server with Claude Desktop
Go to File → Settings → Developer → Edit Config and add the following configuration.
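This opens `claude_desktop_config.json`. An illustrative entry, assuming the server above lives in a file called `server.py` (the path here is a placeholder):

```json
{
  "mcpServers": {
    "audio-analysis": {
      "command": "python",
      "args": ["/absolute/path/to/server.py"]
    }
  }
}
```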
Once the server is configured, Claude Desktop will show the two tools we built above in the Tools menu:
transcribe_audio
get_audio_data
And now you can interact with it:
We have also created a Streamlit UI for the audio analysis app.
You can upload the audio, extract insights, and chat with it using AssemblyAI’s LeMUR.
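As a rough sketch, chatting with the audio boils down to a LeMUR task call on the transcript (the prompt below is just an example):

```python
# Sketch: "chat with audio" via AssemblyAI's LeMUR (example prompt).
response = transcript.lemur.task(
    prompt="What were the key decisions discussed in this audio?"
)
print(response.response)
```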
And that was our MCP-powered audio analysis toolkit.
Here's the workflow again for your reference:
User-provided audio is sent to AssemblyAI through the MCP server.
AssemblyAI processes it, and the MCP host returns the requested insights.
You can find the code in this repo →
Discriminative vs. Generative Models
Here’s a visual that depicts how generative and discriminative models differ:
We have seen this topic come up in several interviews, so let’s learn more.
Discriminative models:
learn decision boundaries that separate different classes.
maximize the conditional probability: P(Y|X) — Given X, maximize the probability of label Y.
are specifically meant for classification tasks.
Generative models:
maximize the joint probability: P(X, Y)
learn the class-conditional distribution P(X|Y)
are typically not preferred for downstream classification tasks.
Since generative models learn the underlying distribution, they can generate new samples. But this is not possible with discriminative models.
Furthermore, generative models possess discriminative properties, i.e., they can be used for classification tasks (if needed). But discriminative models do not possess generative properties.
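One way to see why: by Bayes' rule, the joint distribution a generative model learns already contains the discriminative quantity:

```latex
P(X, Y) = P(X \mid Y)\,P(Y)
\qquad\Longrightarrow\qquad
P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)} \propto P(X \mid Y)\,P(Y)
```

Picking the class Y that maximizes the right-hand side is exactly how a naive Bayes classifier makes predictions, while P(Y|X) alone gives you no way to sample new X.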
We covered this in more detail in this newsletter issue →
MCP and A2A, explained visually
In a nutshell:
The Agent2Agent (A2A) protocol lets AI agents connect to other agents.
The Model Context Protocol (MCP) lets AI agents connect to tools/APIs.
So while two agents are talking to each other over A2A, each of them might itself be communicating with MCP servers.
In that sense, the two protocols complement rather than compete with each other.
A key property of the A2A protocol is that agents can communicate and collaborate even if they are built on different platforms or frameworks.
In MCP, tools (functions) are represented with docstrings.
In A2A, Agents are represented using an Agent Card, which is a JSON file that lists the Agent's capabilities, input, authentication schemes, etc.
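For illustration, a minimal Agent Card might look like this (fields abbreviated from the A2A spec; all values here are made up):

```json
{
  "name": "AudioAnalysisAgent",
  "description": "Transcribes and analyzes audio files",
  "url": "http://localhost:10000",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "authentication": { "schemes": ["bearer"] },
  "skills": [
    {
      "id": "transcribe",
      "name": "Audio transcription",
      "description": "Convert speech to text"
    }
  ]
}
```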
For practical details, we built an Agent network with A2A Protocol here →
Thanks for reading!