With default settings, a decision tree almost always overfits.
This is because a decision tree (in sklearn’s implementation, for instance) is allowed to grow until all leaves are pure.
Since the model then classifies every training instance correctly, it also fits the noise in the training data, which leads to poor generalization.
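To make this concrete, here is a minimal sketch that fits sklearn’s DecisionTreeClassifier with default settings. The synthetic dataset from make_classification and all of its parameters are assumptions chosen for illustration, not the data used in this post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (an illustrative assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Defaults: max_depth=None, min_samples_leaf=1, so the tree grows until leaves are pure.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

The gap between the two scores is the thing to look at. Its exact size depends on the data.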
Random Forest addresses this by introducing randomness in two ways:
While creating a bootstrapped dataset.
While deciding a node’s split criteria by choosing candidate features randomly.
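The two sources of randomness above can be sketched with plain NumPy. This is a simplified illustration, not sklearn’s internal code. The array sizes and the square-root rule for the number of candidate features are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 8, 6
X = rng.normal(size=(n_samples, n_features))

# 1) Bootstrapped dataset: draw row indices WITH replacement,
#    so some rows repeat and others are left out.
boot_idx = rng.integers(0, n_samples, size=n_samples)
X_boot = X[boot_idx]

# 2) Candidate features for one node: a random subset WITHOUT replacement.
#    sqrt(n_features) is a common heuristic (assumed here).
n_candidates = int(np.sqrt(n_features))
candidate_features = rng.choice(n_features, size=n_candidates, replace=False)

print("bootstrap row indices:", boot_idx)
print("candidate features at this node:", candidate_features)
```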
This aids the Bagging objective, whose mathematical foundations we covered in this detailed article: Why Bagging is So Ridiculously Effective At Variance Reduction?
That said, there’s another algorithm that injects even more randomness than a random forest does.
It’s called the Extra Trees algorithm.
Note: Extra Trees does not mean more trees. Instead, it’s short for Extremely Randomized Trees.
Extra Trees are Random Forests with an additional source of randomness.
Here’s how it works:
Create a bootstrapped dataset for each tree (same as RF)
Select candidate features randomly for node splitting (same as RF)
Now, Random Forest searches for the best split threshold for each candidate feature.
But Extra Trees draws this split threshold at random for each candidate feature instead.
This is the source of extra randomness.
After that, the best of these randomly generated splits is selected. The extra randomness further reduces the variance of the model, usually at the cost of slightly higher bias.
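The difference in threshold selection can be sketched as follows. This is a toy illustration, not sklearn’s implementation. The helper names (gini, split_impurity, best_threshold, random_threshold), the Gini criterion, the uniform draw between a feature’s minimum and maximum, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return 1.0 - np.sum(p ** 2)

def split_impurity(x, y, threshold):
    """Weighted Gini impurity after splitting on x <= threshold."""
    left = x <= threshold
    return (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / y.size

def best_threshold(x, y):
    """Random Forest style: scan every midpoint between sorted unique values."""
    values = np.unique(x)
    candidates = (values[:-1] + values[1:]) / 2
    scores = [split_impurity(x, y, t) for t in candidates]
    return candidates[int(np.argmin(scores))]

def random_threshold(x, rng):
    """Extra Trees style: draw one threshold uniformly in [min, max)."""
    return rng.uniform(x.min(), x.max())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
candidate_features = [0, 1, 2]  # pretend these were sampled for this node

rf_splits = {f: best_threshold(X[:, f], y) for f in candidate_features}
et_splits = {f: random_threshold(X[:, f], rng) for f in candidate_features}

# Both methods then keep the candidate feature whose split has the lowest impurity.
rf_choice = min(rf_splits, key=lambda f: split_impurity(X[:, f], y, rf_splits[f]))
et_choice = min(et_splits, key=lambda f: split_impurity(X[:, f], y, et_splits[f]))
print("Random Forest picks feature", rf_choice, "at", rf_splits[rf_choice])
print("Extra Trees picks feature", et_choice, "at", et_splits[et_choice])
```

The random draw skips the threshold scan entirely, so each node is cheaper to build. In exchange, each individual split is generally less optimal.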
Below, I have compared three models (decision tree, random forest, and Extra Trees) on a dummy dataset:
The decision tree overfits entirely.
The random forest works better.
Extra Trees performs marginally better still.
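A comparison along these lines could be set up as follows. This is a sketch under assumptions. The synthetic dataset, the split, and the hyperparameters are illustrative and are not the ones behind the results above, so the scores you get will differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dummy dataset (an assumption, not the one used in the post).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5, flip_y=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # bootstrap=True so that each tree sees a bootstrapped sample, as in RF.
    "Extra Trees": ExtraTreesClassifier(
        n_estimators=100, bootstrap=True, random_state=42
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:14s} train={results[name][0]:.3f}  test={results[name][1]:.3f}")
```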
⚠️ A cautionary measure while using Extra Trees from sklearn.
By default, the bootstrap flag is set to False.
Make sure you run it with bootstrap=True; otherwise, it will use the whole dataset for each tree.
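A quick way to check the defaults and set the flag explicitly is shown below. The variable names are arbitrary.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rf = RandomForestClassifier()
et_default = ExtraTreesClassifier()
et_bagged = ExtraTreesClassifier(bootstrap=True)  # opt in to bootstrapped samples

print("RandomForest bootstrap:", rf.bootstrap)                  # True
print("ExtraTrees default bootstrap:", et_default.bootstrap)    # False
print("ExtraTrees with bootstrap=True:", et_bagged.bootstrap)   # True
```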
If you want to get into the mathematical foundations of Bagging, which will also help you build your own Bagging models, we covered it here: Why Bagging is So Ridiculously Effective At Variance Reduction?
👉 Over to you: Can you think of another way to add randomness to Random Forest?