Categorization of Clustering Algorithms

6 types of clustering algorithms in a single frame.

Nov 14, 2024

Categorization of clustering algorithms

There’s a whole world of clustering algorithms beyond KMeans that a data scientist should be familiar with.

In the following visual, we have summarized 6 different types of clustering algorithms:

1) Centroid-based: Cluster data points based on proximity to centroids.
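
To make the centroid-based idea concrete, here is a minimal sketch of KMeans (Lloyd's algorithm) in plain Python, written without NumPy so it has no dependencies. The function name `kmeans`, its parameters, and the toy points are hypothetical choices for illustration. The initialization simply samples k data points at random (not, for example, k-means++).

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # naive init: k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point joins its closest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change: converged
        centroids = new_centroids
    return labels, centroids
```

On the points `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, the sketch places the first three points in one cluster and the last three in the other.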

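2) Connectivity-based: Cluster points based on proximity between clusters, typically by building a hierarchy in which the closest clusters are merged step by step (e.g., agglomerative hierarchical clustering).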
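
Below is a minimal sketch of one connectivity-based method: agglomerative clustering with single linkage. It assumes that "proximity between clusters" is measured as the smallest distance between their members; other linkages, such as complete or average, are equally common. The name `single_linkage` and the stopping rule (merge until `n_clusters` remain) are illustrative assumptions. This naive version is roughly cubic in the number of points and is only meant for tiny inputs.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with one cluster per point and
    repeatedly merge the two clusters whose closest members are nearest."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Single-linkage distance between two clusters = min pairwise point distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                math.dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a stays valid
    # Convert the list of member lists into one label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For example, `single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (20, 0)], 3)` keeps the two nearby pairs together and leaves the distant point on its own.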

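3) Density-based: Cluster points based on their density, grouping points in dense regions that are separated by sparser regions. It is more robust than centroid-based clustering to clusters of arbitrary shape and to outliers, which it can label as noise.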

DBSCAN is a popular algorithm here, but it can be slow on large datasets.
DBSCAN++ addresses this.
It is a faster and more scalable alternative to DBSCAN.
We covered both DBSCAN and DBSCAN++ in detail here.
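
For reference, here is a minimal sketch of plain DBSCAN (not DBSCAN++) in Python. It assumes Euclidean distance and the usual `eps`/`min_pts` parameters. It precomputes every ε-neighborhood with a brute-force pairwise scan, which costs O(n²) distance computations: the kind of expense that makes DBSCAN slow on large datasets. The function name and toy data are hypothetical.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # eps-neighborhood of every point (each point counts itself).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n  # None = not visited yet
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        # i is a core point: grow a new cluster outward from it.
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned (or just converted border point)
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
        cluster += 1
    return labels
```

With `eps=1.5` and `min_pts=3`, two tight 2×2 grids of points become clusters 0 and 1, while an isolated point is labeled `-1` (noise).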

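4) Graph-based: Cluster points based on graph distance, i.e., represent the data as a graph (for instance, a similarity or nearest-neighbor graph) and group nodes that are closely connected.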
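
Graph-based methods vary widely; spectral clustering is a well-known example. The sketch below shows one of the simplest variants, assumed here for illustration. It connects each point to its `k` nearest neighbors, then treats each connected component of that graph as a cluster. Two points therefore share a cluster exactly when a path (a finite graph distance) links them. The name `knn_graph_clusters` and the choice of `k` are hypothetical.

```python
import math
from collections import deque


def knn_graph_clusters(points, k):
    """Build an undirected k-nearest-neighbor graph and return its
    connected components as cluster labels."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize the k-NN relation
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Breadth-first search labels every node reachable from `start`.
        labels[start] = cluster
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    queue.append(v)
        cluster += 1
    return labels
```

A small `k` keeps distant groups disconnected. A large `k` tends to add edges between groups and merge them.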

5) Distribution-based: Cluster points based on their likelihood of belonging to the same distribution.

Gaussian Mixture Models (GMMs) are one example.
We discussed them in detail and implemented them from scratch (using only NumPy) here: Gaussian Mixture Models.
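
The post's own implementation uses NumPy. As a dependency-free stand-in, here is a minimal sketch of expectation-maximization (EM) for a two-component, one-dimensional Gaussian mixture. The function name `gmm_1d` is an assumption, as are the initialization at the data extremes and the small variance floor (`1e-6`). These choices keep the example short and numerically safe. Each point's cluster is the component with the highest responsibility.

```python
import math


def gmm_1d(xs, n_iters=100):
    """EM for a two-component 1-D Gaussian mixture.
    Returns (weights, means, variances, responsibilities)."""
    # Crude initialization: means at the data extremes, shared overall variance.
    means = [min(xs), max(xs)]
    mu = sum(xs) / len(xs)
    var_all = sum((x - mu) ** 2 for x in xs) / len(xs)
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iters):
        # E-step: posterior probability (responsibility) of each component per point.
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, variances from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk + 1e-6
    return weights, means, variances, resp
```

On values clustered around 1 and 5, the two fitted means land close to 1 and 5. Taking the argmax of each row of `resp` recovers the hard cluster labels.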

6) Compression-based: Transform the data into a lower-dimensional space, then perform clustering there.
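
As one hedged illustration, the sketch below uses PCA as the compression step. This is an assumption; learned encoders such as autoencoders are another common choice. The sketch finds the leading principal component by power iteration, projects the points onto it, and runs a 2-means split on the resulting 1-D codes. All function names are hypothetical. Power iteration from an all-ones start vector assumes that vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(points, n_iters=100):
    """Center the data and return (centered points, leading covariance eigenvector)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(n_iters):
        # Power iteration: repeatedly multiply by the covariance and renormalize.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # degenerate start vector or zero-variance data
        v = [x / norm for x in w]
    return centered, v


def compress_then_cluster(points, n_iters=100):
    """Project points onto 1 dimension with PCA, then run 2-means on the 1-D codes."""
    centered, v = first_principal_component(points)
    z = [sum(c[j] * v[j] for j in range(len(v))) for c in centered]  # 1-D embedding
    centers = [min(z), max(z)]
    labels = []
    for _ in range(n_iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in z]
        for k in (0, 1):
            members = [x for x, lab in zip(z, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels
```

Clustering in the compressed space is cheaper and can ignore noisy directions. The trade-off is that structure discarded by the projection cannot be recovered.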

👉 Over to you: What other clustering algorithms will you include here?

P.S. For those wanting to develop “Industry ML” expertise:

We have previously covered several related topics (with implementations) that build this kind of expertise.

Here are some of them:

Learn sophisticated graph architectures and how to train them on graph data: A Crash Course on Graph Neural Networks – Part 1
Learn techniques to run large models on small devices: Quantization: Optimize ML Models to Run Them on Tiny Hardware
Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust: Conformal Predictions: Build Confidence in Your ML Model’s Predictions
Learn how to identify causal relationships and answer business questions: A Crash Course on Causality – Part 1
Learn how to scale ML model training: A Practical Guide to Scaling ML Model Training
Learn techniques to reliably roll out new models in production: 5 Must-Know Ways to Test ML Models in Production (Implementation Included)
Learn how to build privacy-first ML systems: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning
Learn how to compress ML models and reduce costs: Model Compression: A Critical Step Towards Efficient Machine Learning

All these resources will help you cultivate the skills that businesses care about most.

Daily Dose of Data Science
