Discussion about this post

Neural Foundry

Really solid breakdown on this one. The pipeline parallelism piece is underrated tho. Most people just default to data parallelism without thinking about GPU idle time. The animations make it super clear how much compute gets wasted when layers are just sitting there doing nothing between transfers.

