Although decision trees are simple and intuitive, they call for a bit of extra caution. Here's what you should keep in mind when training them.
In sklearn's implementation, a decision tree is by default (max_depth=None) allowed to grow until all leaves are pure. This leads to overfitting, as the model attempts to perfectly classify every sample in the training set.
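Here's a minimal sketch of this behavior. The dataset (sklearn's built-in breast cancer dataset) and the random_state are just illustrative choices; the train/test gap you see will vary with the data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Default settings: max_depth=None, so the tree keeps splitting
# until every leaf is pure.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

print(tree.score(X_train, y_train))  # typically 1.0: every training sample memorized
print(tree.score(X_test, y_test))    # noticeably lower: the overfitting gap
```

The perfect training accuracy is the telltale sign: the tree has memorized the training set rather than learned a generalizable rule.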
There are various techniques to avoid this, such as pruning (e.g., cost-complexity pruning via ccp_alpha) and ensembling (e.g., random forests). If you use sklearn's implementation, make sure to tune hyperparameters like max_depth, min_samples_leaf, and ccp_alpha, as sketched below.
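A minimal sketch of tuning those hyperparameters with a grid search, reusing X_train/y_train from the snippet above. The grid values here are illustrative, not recommendations:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Constrain growth and prune, instead of letting all leaves become pure.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "ccp_alpha": [0.0, 0.001, 0.01],  # cost-complexity pruning strength
}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # usually beats the unconstrained tree
```

For the ensembling route, swapping in RandomForestClassifier with its defaults is often a stronger baseline than any single tuned tree, since averaging many decorrelated trees reduces variance.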
This was a gentle reminder, as many of us tend to use sklearn's implementations in their default configuration.
It is always good practice to know what a default implementation is hiding underneath.
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can connect with me on LinkedIn and Twitter.