Linear Regression is one of the most widely used ML algorithms.
But it is sensitive to outliers.
In fact, even a few outliers can significantly distort the fitted line and hurt Linear Regression performance.
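To make this concrete, here is a minimal sketch in plain Python. The toy data and the `fit_line` helper are assumptions made up for illustration: 20 points lie exactly on y = 2x + 1, and then only the last three targets are corrupted.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical toy data: 20 points lying exactly on y = 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
clean_slope, clean_intercept = fit_line(xs, ys)

# Corrupt only the last three targets with a large error.
ys_corrupted = ys[:17] + [y - 60.0 for y in ys[17:]]
bad_slope, bad_intercept = fit_line(xs, ys_corrupted)

print(f"clean fit:     slope={clean_slope:.2f}, intercept={clean_intercept:.2f}")
print(f"with outliers: slope={bad_slope:.2f}, intercept={bad_intercept:.2f}")
```

Three bad points out of twenty are enough to pull the estimated slope far away from the true value of 2.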
Instead, try RANSAC (Random Sample Consensus) Regression. It is non-deterministic (it relies on random sampling), iterative, and robust to outliers.
It works as follows:
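Select a random subset of the data
Fit a model to that subset
Calculate the residuals of all data points against the fitted model
Classify points as inliers or outliers by applying a threshold to the residuals
Repeat (until a maximum number of iterations is reached or a stopping condition is met), keeping the model with the most inliers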
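The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the Sklearn implementation: the names `fit_line` and `ransac_line`, the threshold of 1.0, the sample size of 2, and the toy data are all assumptions, and the final refit on the inliers is one common design choice among several.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ransac_line(xs, ys, n_sample=2, threshold=1.0, max_iters=100, seed=0):
    """Toy RANSAC for a 1-D line; all parameter defaults are hypothetical."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    best_inliers = []
    for _ in range(max_iters):
        # Step 1: select a random subset of the data.
        sample = rng.sample(indices, n_sample)
        sample_x = [xs[i] for i in sample]
        if len(set(sample_x)) < 2:
            continue  # degenerate subset: cannot define a line
        # Step 2: fit a model to the subset.
        slope, intercept = fit_line(sample_x, [ys[i] for i in sample])
        # Steps 3 and 4: compute residuals for every point and threshold them.
        inliers = [
            i for i in indices
            if abs(ys[i] - (slope * xs[i] + intercept)) <= threshold
        ]
        # Step 5: keep the candidate with the largest set of inliers.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the best inlier set only (assumed final step).
    slope, intercept = fit_line(
        [xs[i] for i in best_inliers], [ys[i] for i in best_inliers]
    )
    return slope, intercept, best_inliers

# Hypothetical toy data: y = 2x + 1 with the last three targets corrupted.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
ys = ys[:17] + [y - 60.0 for y in ys[17:]]

slope, intercept, inliers = ransac_line(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, inliers={len(inliers)}")
```

Because the subsets are drawn at random, different seeds can visit different candidate models, which is why the method is described as non-deterministic.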
As shown above, while the Linear Regression fit is pulled toward the outliers, the RANSAC Regression fit largely isn't.
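The same comparison can be reproduced with Sklearn's `RANSACRegressor`. The snippet below is a sketch on synthetic data invented for this example (a noisy line y = 2x + 1 with five corrupted targets). It relies on the estimator's defaults, where the base model is `LinearRegression` and the residual threshold is the median absolute deviation of `y`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Hypothetical synthetic data: noisy line y = 2x + 1.
rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=50)
y[-5:] -= 100.0  # corrupt the last five targets to act as outliers

# Plain least squares uses every point, including the outliers.
ols = LinearRegression().fit(X, y)

# RANSAC fits on random subsets and keeps the best consensus set.
ransac = RANSACRegressor(random_state=0).fit(X, y)

ols_slope = ols.coef_[0]
ransac_slope = ransac.estimator_.coef_[0]
n_outliers = int((~ransac.inlier_mask_).sum())

print(f"Linear Regression slope: {ols_slope:.2f}")
print(f"RANSAC slope:            {ransac_slope:.2f}")
print(f"points flagged as outliers: {n_outliers}")
```

On real data, `residual_threshold`, `min_samples`, and `max_trials` usually deserve tuning, since the default threshold depends on the spread of `y`.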
Nonetheless, it is worth experimenting with several robust methods to see which one fits your data best.
Having said that, what are some other popular models that are robust to outliers? Let me know :)
👉 Get started with RANSAC: Sklearn Docs.
Thanks for reading!