How To Avoid Getting Misled by t-SNE Projections?

Some key and lesser-known observations from t-SNE results.

Avi Chawla

Apr 17, 2024


t-SNE is among the most powerful dimensionality reduction techniques to visualize high-dimensional datasets.

In my experience, most folks have at least heard of the t-SNE algorithm.

In fact, did you know that it was first proposed back in 2008, more than 15 years ago?

So there’s definitely a reason why it continues to be one of the most powerful dimensionality reduction approaches today.

If you are curious to learn more, I have a full 25-minute deep dive on t-SNE that explains everything from scratch: Formulating and Implementing the t-SNE Algorithm From Scratch.

Despite its popularity, many people consistently draw misleading conclusions from the t-SNE projections of their high-dimensional data.

In this post, I want to point out a few of these mistakes so that you can avoid them.

To begin, the performance of the t-SNE algorithm depends primarily on perplexity, one of its hyperparameters.

That is why it is considered the most important hyperparameter in the t-SNE algorithm.

Simply put, the perplexity value provides a rough estimate for the number of neighbors a point may have in a cluster.
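
To make the "number of neighbors" reading concrete, here is a minimal sketch. In t-SNE, perplexity is defined as 2 raised to the Shannon entropy (in bits) of a point's neighbor distribution. The function name `perplexity` and the two toy distributions are illustrative assumptions, not part of any library.

```python
import numpy as np

def perplexity(p):
    """Perplexity 2**H(p) of a discrete distribution p (entropy in bits)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    entropy = -np.sum(p * np.log2(p))
    return 2.0 ** entropy

# A point that spreads its similarity evenly over 30 neighbors.
uniform_30 = np.full(30, 1 / 30)

# A point that puts almost all of its similarity on 2 of those 30 neighbors.
peaked = np.full(30, 0.02 / 28)
peaked[:2] = 0.49

print(perplexity(uniform_30))  # about 30: effectively 30 neighbors
print(perplexity(peaked))      # a little above 2: effectively ~2 neighbors
```

A uniform distribution over k neighbors has perplexity k, which is why the value can be read as an effective neighbor count.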

And different values of perplexity create very different low-dimensional cluster spaces, as depicted below:

As shown above, most projections do depict the original clusters. However, they vary significantly in shape.
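
If you want to reproduce this kind of comparison yourself, the sketch below runs scikit-learn's `TSNE` at several perplexity values. The synthetic `make_blobs` dataset is an assumption standing in for the data in the figure, and the value 30 is included only because it is scikit-learn's default.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in for the figure's dataset (an assumption):
# 300 points, 3 Gaussian clusters, 10 dimensions.
X, y = make_blobs(n_samples=300, centers=3, n_features=10, random_state=0)

# Perplexity must be smaller than the number of samples.
perplexities = [2, 5, 10, 30, 100]

embeddings = {}
for perp in perplexities:
    tsne = TSNE(n_components=2, perplexity=perp, init="pca", random_state=0)
    embeddings[perp] = tsne.fit_transform(X)

# The axis ranges alone already differ from run to run.
for perp, emb in embeddings.items():
    span = emb.max(axis=0) - emb.min(axis=0)
    print(f"perplexity={perp:>3}: axis ranges = {np.round(span, 1)}")
```

Plotting each entry of `embeddings` colored by `y` gives a side-by-side view of how the same clusters change shape with perplexity.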

There are five takeaways from the above image:

1. NEVER draw any conclusions about the original cluster shape by looking at these projections.
   - Different projections have different low-dimensional cluster shapes, and they do not resemble the original cluster shape.
   - For low perplexity values (5 and 10), cluster shapes differ significantly from the original ones.
   - In this case, the clusters were color-coded, which provided more clarity. That will not always be the case, as t-SNE is an unsupervised algorithm.
2. Cluster sizes in a t-SNE plot do not convey anything either.
3. The dimensions (or coordinates of data points) created by t-SNE in low dimensions have no inherent meaning.
   - The axis tick labels of the low-dimensional plots are different and somewhat arbitrary.
   - Similar to PCA’s principal components, they offer no interpretability.
4. The distances between clusters in a projection do not mean anything.
   - In the original dataset, the blue and red clusters are close.
   - Yet, most projections do not preserve the global structure of the original dataset.
5. Strange things happen at perplexity=2 and perplexity=100.
   - At perplexity=2, the low-dimensional mapping conveys nothing.
     - As discussed earlier, the perplexity value provides a rough estimate of the number of neighbors a point may have in a cluster.
     - t-SNE tries to maintain approximately 2 neighbors per point, which explains the distortion.
   - At perplexity=100, the global structure is preserved, but the local structure gets distorted.
   - Thus, tuning the perplexity hyperparameter is critical here.
   - That is why I mentioned above that it is the most important hyperparameter of this algorithm.
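
The points about cluster sizes and inter-cluster distances can be checked numerically. The sketch below is a rough illustration on hypothetical data: three clusters with very different spreads, two placed close together and one far away. The helper `cluster_stats` and all dataset parameters are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Hypothetical data: cluster 1 sits near cluster 0, cluster 2 sits far away,
# and the three clusters have very different spreads.
centers = np.zeros((3, 10))
centers[1, 0] = 10.0
centers[2, 0] = 100.0
X, y = make_blobs(n_samples=300, centers=centers,
                  cluster_std=[0.5, 2.0, 5.0], random_state=0)

def cluster_stats(points, labels):
    """Return per-cluster RMS radius and the matrix of centroid distances."""
    ids = np.unique(labels)
    centroids = np.array([points[labels == k].mean(axis=0) for k in ids])
    radii = np.array([
        np.sqrt(((points[labels == k] - c) ** 2).sum(axis=1).mean())
        for k, c in zip(ids, centroids)
    ])
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    return radii, dists

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

radii_hi, dists_hi = cluster_stats(X, y)      # original 10-D space
radii_lo, dists_lo = cluster_stats(emb, y)    # 2-D t-SNE projection

# Ratios are scale-free, so the two spaces can be compared directly.
print("size ratio (cluster 2 / cluster 0):",
      radii_hi[2] / radii_hi[0], "vs", radii_lo[2] / radii_lo[0])
print("distance ratio (0-2 / 0-1):",
      dists_hi[0, 2] / dists_hi[0, 1], "vs", dists_lo[0, 2] / dists_lo[0, 1])
```

In the original space both ratios are close to 10 by construction. If the projection preserved sizes and distances, the second number on each line would match the first, so any gap between them shows how much the projection changed.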

As a concluding note, good perplexity values are typically found in the range [5, 50].

So try experimenting in that range and see what looks promising.
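
One way to make "see what looks promising" a bit less subjective is to score each projection alongside the visual check. The sketch below uses scikit-learn's `trustworthiness`, which measures how well local neighborhoods are retained, on a subset of the digits dataset. Both the metric and the dataset are choices made for this example, not something prescribed by t-SNE itself.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

# Small subset of the 64-dimensional digits data to keep the runs quick.
X, _ = load_digits(return_X_y=True)
X = X[:300]

scores = {}
for perp in (5, 15, 30, 50):
    emb = TSNE(n_components=2, perplexity=perp, random_state=0).fit_transform(X)
    # Trustworthiness lies in [0, 1]; higher means local neighborhoods
    # in the projection better match those in the original space.
    scores[perp] = trustworthiness(X, emb, n_neighbors=10)

for perp, score in scores.items():
    print(f"perplexity={perp:>2}: trustworthiness={score:.3f}")
```

Treat the score as a complement to looking at the plots, since it only captures local structure.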

Next time you use t-SNE, consider the above points, as these plots can get tricky to interpret.

This is especially true if you don’t understand the internal workings of this algorithm.

Understanding the algorithm will greatly help you develop an intuition for interpreting its output.

If you are curious to learn more, I have a full 25-minute deep dive on t-SNE that explains everything from scratch: Formulating and Implementing the t-SNE Algorithm From Scratch.

In fact, here’s an intriguing thing we cover.

Consider the image below, taken from the sklearn documentation of t-SNE: