It’s SO EASY to accelerate model training with GPUs today. All it takes is a simple .cuda() call:
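For instance, here is a minimal PyTorch sketch (the model and tensor shapes are placeholders for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)   # any torch module works the same way
x = torch.randn(64, 1024)       # a batch of inputs, currently on the CPU

# One call each moves the model's parameters and the data to the GPU
model = model.cuda()
x = x.cuda()

y = model(x)                    # this forward pass now runs on the GPU
```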
But have you ever looked at how GPUs actually accelerate computing tasks? More specifically, what happens under the hood when we call .cuda()?
For those who are curious to learn, I have written an article that covers the internal mechanics of GPU programming: Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming.
We cover the end-to-end details of CUDA and get hands-on by writing parallelized implementations of operations we typically perform in deep learning.
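To give a flavor of what that looks like, here is a minimal sketch of a parallelized vector addition. It uses Numba's CUDA bindings for brevity rather than the raw CUDA C covered in the article, and the array size and launch configuration are illustrative assumptions:

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)        # this thread's global index
    if i < out.size:        # guard against out-of-range threads
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

# Launch enough blocks so every element gets its own thread
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)
```

Each GPU thread handles exactly one element of the output, which is the kind of data parallelism the article builds up from scratch.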
The article is beginner-friendly, so if you have never written a CUDA program before, that’s okay.
Read it here: Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming.
In my opinion, GPUs are among the biggest black boxes in deep learning.
Most real-world models rely on GPUs for faster training.
Yet, how they work is possibly the aspect of deep learning most overlooked by practitioners.
If you are curious about such underlying details, this article is for you.
Learn how parallelized CUDA implementations are written here: Implementing Parallelized CUDA Programs From Scratch Using CUDA Programming.
Have a good day!
Avi