A few days back, we discussed accelerating the tSNE algorithm with GPUs.
Here’s the visual from that post for a quick recap:
In short, the idea was to use tSNE-CUDA, an optimized CUDA implementation of the tSNE algorithm that, as the name suggests, can leverage hardware accelerators.
And why is an optimized implementation needed in the first place?
It’s needed because the biggest issue with tSNE (which we also discussed here) is that the run-time of the exact algorithm grows quadratically with the number of data points.
Thus, it can get pretty difficult to use tSNE from Sklearn for large datasets.
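To get a feel for what quadratic growth means here, the sketch below counts how many point pairs an exact pairwise comparison has to consider as the dataset grows. The helper name `pairwise_cost` and the 8-bytes-per-value dense matrix are illustrative assumptions, not how any particular library stores its affinities.

```python
def pairwise_cost(n_points, bytes_per_value=8):
    """Return (unordered point pairs, bytes for a dense n x n matrix).

    Hypothetical helper for illustration; bytes_per_value=8 assumes float64.
    """
    n_pairs = n_points * (n_points - 1) // 2
    dense_bytes = n_points * n_points * bytes_per_value
    return n_pairs, dense_bytes


for n in (1_000, 10_000, 100_000, 1_000_000):
    pairs, dense_bytes = pairwise_cost(n)
    print(f"{n:>9,} points -> {pairs:>15,} pairs, {dense_bytes / 1e9:,.3f} GB dense")
```

Every 10x increase in the number of points multiplies the pair count by roughly 100, which is why a run that is comfortable at a few thousand points becomes impractical at a few hundred thousand.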
tSNE-CUDA addressed this by providing immense speedups over the standard Sklearn implementation using a GPU.
But what if you don’t have access to a GPU?
openTSNE is another optimized Python implementation of t-SNE. It provides massive speed improvements and lets us scale t-SNE to millions of data points, a scale the Sklearn implementation may never reach.
The effectiveness is evident from the image below:
As depicted above, the openTSNE implementation:
is 20 times faster than the Sklearn implementation.
produces clustering of similar quality to the Sklearn implementation.
The authors have also provided the following benchmarking results:
As depicted above, openTSNE can produce a low-dimensional visualization of a million data points in just ~15 minutes.
By contrast, their benchmarks show the run-time of the Sklearn implementation already reaching a couple of hours at just ~250k data points.
That is a massive speedup, and it comes without ever using a GPU.
Download the notebook here to try out openTSNE: openTSNE Jupyter notebook.
👉 Over to you: What are some other ways to boost the tSNE algorithm?