Make Sklearn KMeans 20x times faster
The KMeans algorithm is commonly used to cluster unlabeled data. But with large datasets, scikit-learn takes plenty of time to train and predict.
To speed-up KMeans, use Faiss by Facebook AI Research. It provides faster nearest-neighbor search and clustering.
Faiss uses "Inverted Index", an optimized data structure to store and index the data points. This makes performing clustering extremely efficient.
Additionally, Faiss provides parallelization and GPU support, which further improves the performance of its clustering algorithms.
Read more: GitHub.
Share this post on LinkedIn: Post Link.
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn.