Today, instead of releasing a new newsletter issue, I decided to open up access to three of my favorite deep dives, which I write alongside this newsletter.
Here they are:
Model Compression: A Critical Step Towards Efficient Machine Learning
Learn 4 techniques to build less memory-intensive models that are suitable for deployment.
Many ML companies use them to save thousands of dollars every month on deployment costs.
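To give you a flavor, here's a tiny NumPy sketch of one widely used compression technique, post-training int8 quantization. This toy snippet is mine, not taken from the deep dive, and production techniques involve far more care:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: store each weight in 1 byte
    # (int8) plus a single float scale, instead of 4 bytes (float32).
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)  # a toy weight matrix
q, scale = quantize_int8(w)

print(w.nbytes / q.nbytes)                     # 4.0 -> 4x less memory
print(np.abs(w - dequantize(q, scale)).max())  # small rounding error
```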
Why Bagging is So Ridiculously Effective At Variance Reduction?
Ever wondered what the mathematical foundation behind Bagging is? You will learn it here, and I am sure you will have an eye-opening moment after reading this one.
You will also understand why we sample rows from the training dataset with replacement.
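Here's a quick teaser (a toy simulation of mine, not taken from the article): averaging many estimates fit on bootstrap samples, i.e., rows drawn with replacement, visibly shrinks the variance of the final estimate. The deep dive builds the full mathematical picture; one well-known piece of it is that the variance of an average of n estimators, each with variance σ² and pairwise correlation ρ, is ρσ² + (1 - ρ)σ²/n.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=50)  # a small training set

def bootstrap_estimate():
    # One "model": estimate the mean from a bootstrap sample,
    # i.e., rows drawn from the training data WITH replacement.
    return rng.choice(data, size=len(data), replace=True).mean()

def bagged_estimate(n_models=25):
    # Bagging: average the outputs of many bootstrap models.
    return np.mean([bootstrap_estimate() for _ in range(n_models)])

single = [bootstrap_estimate() for _ in range(2000)]
bagged = [bagged_estimate() for _ in range(2000)]

print(np.var(single))  # variance of a single model's estimate
print(np.var(bagged))  # roughly 25x smaller for the bagged ensemble
```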
One of my subscribers (Anket Hirulkar) shared his story: He read this article and described the mathematical foundation behind Bagging in his senior data scientist interview. To his surprise, even the interviewers didn’t know the true origin of Bagging and why it is so effective. They extended his interview by 15-20 minutes to learn more. And yes, he secured the job :)
Of course, I am not saying he secured the position just because he knew Bagging.
However, the interviewers were genuinely impressed that he made that additional effort to understand core details, as people tend to build only a shallow understanding of critical concepts.
It is really hard for me to cover every little detail in a short daily email. So, if you are serious and can spare just 30-40 minutes a week (which I am sure you can), start extracting value from the deep dives. There are already 30-35 in-depth articles, and trust me, the longer you delay, the harder it will be to catch up.
We get into sooo much detail, cover the intuition, include catchy diagrams, and, at times, share implementations from scratch (only Python and NumPy, no sklearn) so that you build a true understanding of the internal mechanisms.
You Cannot Build Large Data Projects Until You Learn Data Version Control!
ML programs are a combination of code and data.
To ensure reproducibility, we must version datasets as well as code, but this is infeasible with tools like Git because datasets can run into gigabytes.
Learn about Data Version Control (DVC) to build more reliable and reproducible ML pipelines.
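If you'd like a taste right away: DVC commits lightweight pointer files to Git while the actual data lives in a cache or remote storage, so any Git revision pins an exact dataset version. Here is a minimal sketch using DVC's Python API; the repo URL, file path, and tag below are hypothetical:

```python
import dvc.api  # pip install dvc

# Read data/train.csv exactly as it existed at Git tag v1.0,
# regardless of what the working copy currently contains.
with dvc.api.open(
    "data/train.csv",                           # a DVC-tracked file (hypothetical)
    repo="https://github.com/example/project",  # hypothetical repo
    rev="v1.0",                                 # any Git revision: tag, branch, commit
) as f:
    print(f.readline())
```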
Please note that these three deep dives will be locked after 5 days.
In addition to these three, my deep dive on vector databases has always been open. Read it here in case you missed it: A Beginner-friendly and Comprehensive Deep Dive on Vector Databases.
Have a good day!
Avi