Many people struggle to understand the core idea behind bagging and boosting. Here’s a simplified visual guide to what goes on under the hood.
In a nutshell, an ensemble combines multiple models to build a more powerful model.
Ensembles are built on the idea that aggregating the predictions of multiple models mitigates the weaknesses of any individual model, so the combined model is expected to perform better overall.
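To put a rough number on that intuition, here is a minimal sketch. It assumes every model is correct with the same probability p and that the models make their mistakes independently of each other, which is an idealization that real ensembles only approximate. The function name majority_vote_accuracy is mine, purely for illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models is correct.

    Assumptions (idealized): each model is correct with probability p,
    errors are independent, and n_models is odd so ties cannot happen.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Each individual model is only slightly better than a coin flip (p = 0.6).
for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions the majority vote gets more accurate as models are added. In practice the models are correlated, so the gain is smaller, which is why bagging and boosting both work hard to make the individual models differ from each other.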
Whenever I wish to intuitively illustrate their immense power, I use the following image:
Ensembles are primarily built using two different strategies:
Bagging
Boosting
1) Bagging (short for Bootstrap Aggregating):
creates different subsets of the data by sampling with replacement (this is called bootstrapping)
trains one model per subset
aggregates all predictions to get the final prediction (typically a majority vote for classification or an average for regression)
Some common models that leverage Bagging are:
Random Forests
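Extra Trees (when bootstrapping is enabled; scikit-learn's implementation trains each tree on the full dataset by default)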
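The three bagging steps above can be sketched in plain Python. This is a toy illustration, not how Random Forests are implemented: the base model is a one-feature decision stump, and the helper names (fit_stump, bagging_fit, bagging_predict) and the tiny dataset are made up for this example.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump on 1-D data: predict `left` if x <= t, else `right`."""
    best = None
    for t in sorted(set(X)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                (left if x <= t else right) != label for x, label in zip(X, y)
            )
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bagging_fit(X, y, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap, i.e. draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: train one model per bootstrap sample.
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    # Step 3: aggregate the individual predictions by majority vote.
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): small x is mostly class 0, large x mostly class 1.
X = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

models = bagging_fit(X, y)
print([bagging_predict(models, x) for x in (1.0, 4.5)])
```

Because every stump sees a slightly different resample, the stumps disagree near the noisy middle of the data, and the vote smooths those disagreements out.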
2) Boosting:
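is an iterative training process in which models are trained one after another
each subsequent model focuses more on the samples the previous models got wrong (AdaBoost increases the weights of misclassified samples; gradient boosting fits the remaining errors)
the final prediction is a weighted combination of all the models' predictions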
Some common models that leverage Boosting are:
XGBoost
AdaBoost
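The boosting loop described above can be sketched as a bare-bones AdaBoost-style classifier in plain Python. Treat it as an illustration under simplifying assumptions: labels are +1/-1, the data has a single feature, the base model is a decision stump, and the helper names and toy dataset are invented for this example. Gradient boosting libraries such as XGBoost follow the same sequential idea but fit gradients of a loss rather than reweighting samples.

```python
import math


def fit_weighted_stump(X, y, w):
    """Return (weighted_error, threshold, sign) of the best stump on 1-D data.

    The stump predicts `sign` when x > threshold and `-sign` otherwise.
    """
    best = None
    for t in sorted(set(X)):
        for sign in (1, -1):
            err = sum(
                wi
                for x, yi, wi in zip(X, y, w)
                if (sign if x > t else -sign) != yi
            )
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best


def adaboost_fit(X, y, n_rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, t, sign = fit_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # this model's vote weight
        ensemble.append((alpha, t, sign))
        # Misclassified samples get heavier, correctly classified ones lighter.
        w = [
            wi * math.exp(-alpha * yi * (sign if x > t else -sign))
            for x, yi, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    # Final prediction: sign of the weighted sum of all stump predictions.
    score = sum(alpha * (sign if x > t else -sign) for alpha, t, sign in ensemble)
    return 1 if score >= 0 else -1


# Toy data (hypothetical): no single threshold separates the two classes.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(ensemble, x) for x in X])
```

On this toy dataset, the best single stump gets 7 of the 10 points right, while three boosted stumps fit all ten training points. That is the "focus on previous mistakes" idea at work, though fitting the training data perfectly says nothing by itself about performance on unseen data.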
Overall, ensembles often deliver better predictive performance than a single model. Bagging mainly reduces variance, which tends to make the model more robust and less prone to overfitting, while boosting mainly reduces bias, though it can overfit noisy data if it runs for too many rounds.
👉 Over to you: What are some challenges/limitations of using ensembles? Let’s discuss it today :)
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.