The Limitations of the Elbow Curve and What You Should Replace It With
A better alternative to the Elbow curve.
We commonly use the Elbow curve to determine the number of clusters (k) for KMeans.
However, the Elbow curve:
has a subjective interpretation
involves ambiguity in determining the Elbow point accurately
only considers within-cluster distance, ignoring how well separated the clusters are.
Silhouette score is an alternative measure used to evaluate clustering quality.
It is computed as follows:
For every data point (i), find:
a(i): average distance to every other data point within the same cluster
b(i): average distance to every data point in the nearest other cluster.
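Silhouette score for a specific data point (i) is:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
Silhouette score for the whole clustering is the average of s(i) over all N data points:
$$S = \frac{1}{N} \sum_{i=1}^{N} s(i)$$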
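To make the computation concrete, here is a minimal sketch in plain Python. It assumes Euclidean distance and treats the "nearest cluster" as the other cluster with the smallest average distance to the point; the per-point score is (b(i) - a(i)) divided by the larger of the two. The function names `silhouette_per_point` and `silhouette` and the four toy points are illustrative, not taken from any library.

```python
from math import dist
from statistics import mean


def silhouette_per_point(points, labels):
    """Return s(i) for every point, assuming Euclidean distance."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette score needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        same_cluster = [j for j in clusters[label] if j != i]
        if not same_cluster:
            # Common convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a(i): average distance to the other points in the same cluster.
        a = mean(dist(points[i], points[j]) for j in same_cluster)
        # b(i): smallest average distance to the points of any other cluster.
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette(points, labels):
    """Silhouette score of the whole clustering: the mean of s(i)."""
    return mean(silhouette_per_point(points, labels))


# Toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette(points, [0, 1, 0, 1])   # groups distant points together
```

On this toy data, the labelling that groups nearby points scores close to 1, while the labelling that pairs distant points scores below 0.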
Some properties of the Silhouette score are:
it ranges from -1 to 1
a higher score indicates better clustering
it can be used as an evaluation metric for clustering in the absence of ground truth labels
In contrast to the Elbow curve, the Silhouette score:
provides a quantitative (and objective) measure
involves no ambiguity
considers BOTH within-cluster and between-cluster distance.
The visual below compares the Elbow curve and the Silhouette plot.
In this example, the Elbow curve is clearly misleading.
In a dataset with 25 clusters:
The Elbow curve suggests 4 as the optimal number of clusters.
The Silhouette curve suggests 25 as the optimal number of clusters.
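A comparison like this can be reproduced on a smaller scale with scikit-learn. The sketch below is not the code behind the visual: it assumes a synthetic dataset of 4 well-separated blobs (the centers, sample size, and range of k are arbitrary choices for illustration) and records both the KMeans inertia, which is what the Elbow curve plots, and the Silhouette score for each k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed toy dataset: 4 well-separated blobs in 2D.
centers = [(-5, -5), (-5, 5), (5, -5), (5, 5)]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.5, random_state=0)

inertias = {}     # within-cluster sum of squares (the Elbow curve's y-axis)
silhouettes = {}  # mean Silhouette score for each k

for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
    silhouettes[k] = silhouette_score(X, model.labels_)

# The Silhouette score picks k directly: take the k with the highest score.
best_k = max(silhouettes, key=silhouettes.get)

for k in sorted(inertias):
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k with the highest Silhouette score:", best_k)
```

Inertia keeps shrinking as k grows, so reading k off the Elbow curve means judging where the drop flattens. The Silhouette score instead peaks at a specific k, which can be selected with a simple `max`.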
Get started with Silhouette score here: Sklearn Docs.
👉 Over to you: What are some other measures to evaluate clustering quality?