Daily Dose of Data Science


Categorization of Clustering Algorithms

6 types of clustering algorithms in a single frame.

Avi Chawla
Nov 14, 2024


Categorization of clustering algorithms

There’s a whole world of clustering algorithms beyond KMeans, and every data scientist should be familiar with the main families.

The visual accompanying this post summarizes six types of clustering algorithms:

1) Centroid-based: Cluster data points based on proximity to centroids.

2) Connectivity-based: Cluster points by repeatedly merging (or splitting) clusters based on the proximity between them, as in hierarchical clustering.

3) Density-based: Cluster points based on the density of their neighborhood, so that dense regions separated by sparse regions form clusters. It handles arbitrarily shaped clusters and outliers better than centroid-based clustering.

  • DBSCAN is a popular algorithm here, but its run-time grows quickly with dataset size (quadratic in the number of points in the worst case).

  • DBSCAN++ addresses this: it is a faster and more scalable alternative to DBSCAN.

  • We covered both DBSCAN and DBSCAN++ in detail here.

4) Graph-based: Cluster points based on graph distance.

5) Distribution-based: Cluster points based on their likelihood of belonging to the same distribution.

  • Gaussian Mixture Models (GMMs) are one example.

  • We discussed them in detail and implemented them from scratch (only NumPy) here: Gaussian Mixture Models.

6) Compression-based: Transform data to a lower-dimensional space and then perform clustering.
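
To make the contrast between the centroid-based (1) and density-based (3) families concrete, here is a minimal from-scratch sketch using only the Python standard library. It is not an optimized or reference implementation. The toy dataset (two concentric rings), the helper names (`make_rings`, `kmeans`, `dbscan`, `same_partition`), and the parameter values (`eps=1.0`, `min_samples=3`) are all assumptions chosen for illustration.

```python
import math
from collections import deque


def make_rings(n_inner=20, n_outer=40, r_inner=1.0, r_outer=5.0):
    """Toy data (an assumption for this sketch): two concentric rings."""
    points, truth = [], []
    for ring, (n, r) in enumerate([(n_inner, r_inner), (n_outer, r_outer)]):
        for i in range(n):
            angle = 2 * math.pi * i / n
            points.append((r * math.cos(angle), r * math.sin(angle)))
            truth.append(ring)
    return points, truth


def kmeans(points, k, n_iter=100):
    """Centroid-based: Lloyd's algorithm with a fixed, evenly spaced init."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(axis) / len(members)
                                     for axis in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def dbscan(points, eps, min_samples):
    """Density-based: grow clusters outward from core points; -1 is noise."""
    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


def same_partition(a, b):
    """True if two labelings define the same groups, ignoring label names."""
    return len(set(zip(a, b))) == len(set(a)) == len(set(b))


points, truth = make_rings()
km_labels = kmeans(points, k=2)
db_labels = dbscan(points, eps=1.0, min_samples=3)
print("k-means recovers the rings:", same_partition(km_labels, truth))
print("DBSCAN recovers the rings: ", same_partition(db_labels, truth))
```

On this data, the k-means line prints `False` and the DBSCAN line prints `True`. Two centroids always split the plane with a straight boundary, which cannot separate a ring from another ring that surrounds it, while DBSCAN follows the chain of nearby points around each ring. For real workloads, a library implementation such as scikit-learn's `KMeans` and `DBSCAN` would be the usual choice.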

👉 Over to you: What other clustering algorithms would you include here?



P.S. For those wanting to develop “Industry ML” expertise:

We have covered several other topics (with implementations) in the past that help build this kind of expertise.

Here are some of them:

  • Learn sophisticated graph architectures and how to train them on graph data: A Crash Course on Graph Neural Networks – Part 1

  • Learn techniques to run large models on small devices: Quantization: Optimize ML Models to Run Them on Tiny Hardware

  • Learn how to generate prediction intervals or sets with strong statistical guarantees for increasing trust: Conformal Predictions: Build Confidence in Your ML Model’s Predictions.

  • Learn how to identify causal relationships and answer business questions: A Crash Course on Causality – Part 1

  • Learn how to scale ML model training: A Practical Guide to Scaling ML Model Training.

  • Learn techniques to reliably roll out new models in production: 5 Must-Know Ways to Test ML Models in Production (Implementation Included)

  • Learn how to build privacy-first ML systems: Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.

  • Learn how to compress ML models and reduce costs: Model Compression: A Critical Step Towards Efficient Machine Learning.

All these resources will help you cultivate key skills that businesses and companies care about the most.



