How To Avoid Getting Misled by t-SNE Projections?
Some key and lesser-known observations from t-SNE results.
t-SNE is among the most powerful dimensionality reduction techniques to visualize high-dimensional datasets.
In my experience, most folks have at least heard of the t-SNE algorithm.
In fact, did you know that it was first proposed 15 years ago?
So there’s definitely a reason why it continues to be one of the most powerful dimensionality reduction approaches today.
If you are curious to learn more, I have a full 25-minute deep dive on t-SNE that explains everything from scratch: Formulating and Implementing the t-SNE Algorithm From Scratch.
Despite its popularity, many people consistently draw misleading conclusions from the t-SNE projections of their high-dimensional data.
In this post, I want to point out a few of these mistakes so that you never make them yourself.
To begin, the performance of the t-SNE algorithm relies primarily on perplexity, a hyperparameter of t-SNE. That is why it is considered the most important hyperparameter of the algorithm.
Simply put, the perplexity value provides a rough estimate of the number of neighbors a point may have in a cluster.
And different values of perplexity create very different low-dimensional cluster spaces, as depicted below:
As shown above, most projections do depict the original clusters. However, they vary significantly in shape.
There are five takeaways from the above image:
1. NEVER make any conclusions about the original cluster shape by looking at these projections.
Different projections have different low-dimensional cluster shapes, and they do not resemble the original cluster shape.
For low perplexity values (5 and 10), the cluster shapes differ significantly from the original ones.
In this case, the clusters were color-coded, which provided more clarity, but that may not always be the case, since t-SNE is an unsupervised algorithm.
2. Cluster sizes in a t-SNE plot do not convey anything either.
3. The dimensions (or coordinates of data points) created by t-SNE in low dimensions have no inherent meaning.
The axis tick labels of the low-dimensional plots are different and somewhat random.
Similar to PCA’s principal components, they offer no interpretability.
4. The distances between clusters in a projection do not mean anything.
In the original dataset, the blue and red clusters are close. Yet, most projections do not preserve the global structure of the original dataset (a quick way to verify this is sketched just below this list).
5. Strange things happen at perplexity=2 and perplexity=100.
At perplexity=2, the low-dimensional mapping conveys nothing. As discussed earlier, the perplexity value provides a rough estimate of the number of neighbors a point may have in a cluster, so here t-SNE tries to maintain approximately two points per cluster, which is why the projection is so distorted.
At perplexity=100, the global structure is preserved, but the local structure gets distorted.
Thus, tweaking the perplexity hyperparameter is extremely critical here.
That is why I mentioned above that it is the most important hyperparameter of this algorithm.
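To see the fourth takeaway in action, here is a minimal sketch, assuming a synthetic make_blobs dataset (so the data and values are illustrative, not those from the figure above), that compares the distances between cluster centroids in the original space and in the t-SNE embedding:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Toy stand-in for a high-dimensional dataset: three Gaussian clusters in 10-D.
X, y = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

# 2-D t-SNE embedding with a typical perplexity value.
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Pairwise distances between cluster centroids, before and after t-SNE.
orig_centroids = np.array([X[y == k].mean(axis=0) for k in np.unique(y)])
emb_centroids = np.array([Z[y == k].mean(axis=0) for k in np.unique(y)])

print("Centroid distances (original 10-D):", np.round(pdist(orig_centroids), 2))
print("Centroid distances (t-SNE 2-D)    :", np.round(pdist(emb_centroids), 2))
```

The two sets of distances generally do not agree, not even up to a common scale, which is exactly why inter-cluster distances in a t-SNE plot should not be over-interpreted.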
As a concluding note, the ideal perplexity values are typically found to lie in the range [5, 50].
So try experimenting in that range and see what looks promising.
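For instance, here is a minimal sketch of such an experiment, again on a hypothetical make_blobs dataset (the dataset, values, and plot layout are illustrative), that sweeps several perplexity values, including the extremes discussed above, and plots the resulting projections side by side:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Hypothetical high-dimensional dataset with a few well-separated clusters.
X, y = make_blobs(n_samples=500, centers=4, n_features=10, random_state=42)

# A few values inside the recommended [5, 50] range, plus 2 and 100 for contrast.
perplexities = [2, 5, 10, 30, 50, 100]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))
for ax, perp in zip(axes.ravel(), perplexities):
    Z = TSNE(n_components=2, perplexity=perp, random_state=42).fit_transform(X)
    ax.scatter(Z[:, 0], Z[:, 1], c=y, s=8, cmap="tab10")
    ax.set_title(f"perplexity = {perp}")
    ax.set_xticks([])  # the axes carry no inherent meaning anyway
    ax.set_yticks([])

plt.tight_layout()
plt.show()
```

Visually comparing the panels usually makes it clear which perplexity values give stable, well-separated clusters, but keep the takeaways above in mind: the shapes, sizes, and inter-cluster distances you see still should not be over-read.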
Next time you use t-SNE, consider the above points, as these plots can get tricky to interpret.
This is especially true if you don’t understand the internal workings of this algorithm.
Understanding the algorithm, on the other hand, will massively help you develop an intuition for interpreting its results.
If you are curious to learn more, I have a full 25-minute deep dive on t-SNE that explains everything from scratch: Formulating and Implementing the t-SNE Algorithm From Scratch.
In fact, here’s an intriguing thing we cover.
Consider the image below, taken from the sklearn documentation of t-SNE:
“Why is the learning rate so high compared to the typical range?”
Ever wondered about this?
It will become super clear once you understand the internal mechanics.
👉 Over to you: What are some other common mistakes people make when using t-SNE?
Thanks for reading!
Are you preparing for ML/DS interviews or want to upskill at your current job?
Every week, I publish in-depth ML dives. The topics align with the practical skills that typical ML/DS roles demand.
Join below to unlock all full articles:
Here are some of the top articles:
[FREE] A Beginner-friendly and Comprehensive Deep Dive on Vector Databases.
A Detailed and Beginner-Friendly Introduction to PyTorch Lightning: The Supercharged PyTorch
Don’t Stop at Pandas and Sklearn! Get Started with Spark DataFrames and Big Data ML using PySpark.
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations.
Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit
Join below to unlock all full articles:
👉 If you love reading this newsletter, share it with friends!
👉 Tell the world what makes this newsletter special for you by leaving a review here :)