Visualize The Performance Of Linear Regression With This Simple Plot
Assumption turned into performance validation.
Linear regression assumes that the model residuals (= actual - predicted) are normally distributed.
If the model is underperforming, it may be due to a violation of this assumption.
A QQ plot (short for Quantile-Quantile plot) is a great way to verify this assumption and also gauge how well the model fits.
As the name suggests, it depicts the quantiles of the observed distribution (residuals in this case) against the quantiles of a reference distribution, typically the standard normal distribution.
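Here is a minimal sketch of how the points of such a plot can be computed for regression residuals. The data is synthetic, and the helper names (`fit_residuals`, `qq_points`, `plot_qq`) and the plotting positions `(i - 0.5) / n` are assumptions made for illustration, not something prescribed by this post.

```python
import numpy as np
from statistics import NormalDist


def fit_residuals(X, y):
    """Fit ordinary least squares with an intercept; return actual - predicted."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ coef


def qq_points(residuals):
    """Return (theoretical, observed) quantiles for a normal QQ plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    # Standardize the residuals so the reference line is simply y = x.
    observed = (r - r.mean()) / r.std(ddof=1)
    # Plotting positions (i - 0.5) / n: one common convention among several.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, observed


def plot_qq(theoretical, observed):
    """Draw the QQ plot (requires matplotlib, imported only when called)."""
    import matplotlib.pyplot as plt

    plt.scatter(theoretical, observed, s=8)
    plt.plot(theoretical, theoretical, color="red")  # reference line y = x
    plt.xlabel("Standard normal quantiles")
    plt.ylabel("Standardized residual quantiles")
    plt.show()


# Synthetic data, made up for illustration: 5 features and Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.5, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=1.0, size=500)

residuals = fit_residuals(X, y)
theo, obs = qq_points(residuals)
print("Largest deviation from the reference line:", np.max(np.abs(obs - theo)))
```

Calling `plot_qq(theo, obs)` draws the plot. If SciPy is available, `scipy.stats.probplot(residuals, dist="norm", plot=plt)` should produce a comparable figure with less code.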
A good QQ plot will:
Show minimal deviations from the reference line, indicating that the residuals are approximately normally distributed.
A bad QQ plot will:
Exhibit significant deviations, indicating a departure from the normality of residuals.
Show systematic patterns, such as ends that curve away from the reference line, which point to skewness or heavy tails.
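The contrast between a good and a bad QQ plot can be sketched numerically. The example below fits the same line to two synthetic datasets, one with Gaussian noise and one with right-skewed (centered exponential) noise, and summarizes each QQ plot by the correlation between theoretical and observed quantiles. Both the data and the choice of correlation as a summary are assumptions for illustration only; the `qq_summary` and `ols_residuals` helpers are hypothetical names.

```python
import numpy as np
from statistics import NormalDist


def qq_summary(residuals):
    """Return (correlation, theoretical, observed) for a normal QQ plot."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = len(r)
    observed = (r - r.mean()) / r.std(ddof=1)  # standardized residuals
    probs = (np.arange(1, n + 1) - 0.5) / n    # assumed plotting positions
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    corr = float(np.corrcoef(theoretical, observed)[0, 1])
    return corr, theoretical, observed


def ols_residuals(x, y):
    """Fit a straight line with least squares; return actual - predicted."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


rng = np.random.default_rng(42)
n = 1000
x = rng.uniform(0, 10, size=n)

# "Good" case: normally distributed noise around the line.
y_good = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=n)
# "Bad" case: right-skewed noise (exponential, shifted to mean zero).
y_bad = 2.0 * x + 1.0 + (rng.exponential(scale=1.0, size=n) - 1.0)

r_good, _, _ = qq_summary(ols_residuals(x, y_good))
r_bad, theo_bad, obs_bad = qq_summary(ols_residuals(x, y_bad))

print("QQ correlation, normal noise:", round(r_good, 4))
print("QQ correlation, skewed noise:", round(r_bad, 4))
# With right-skewed residuals, both ends sit above the reference line y = x.
print("Lower-end gap:", obs_bad[0] - theo_bad[0])
print("Upper-end gap:", obs_bad[-1] - theo_bad[-1])
```

A correlation close to 1 corresponds to points hugging the reference line, while the skewed case shows the diverging ends described above.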
Thus, the more closely the points of the QQ plot follow the reference line, the more confident you can be about your model.
This is especially useful when the regression line is difficult to visualize, for example, in a high-dimensional dataset.
So remember...
After running a linear model, always check the distribution of the residuals.
This will help you:
Validate the model's assumptions
Determine how good your model is
Find ways to improve it (if needed)
👉 Over to you: What are some other ways/plots to determine the linear model's performance?
I covered another way in one of my previous posts: Visualize The Performance Of Any Linear Regression Model With This Simple Plot.
Find the code for my tips here: GitHub.
I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.
Great post as usual, Avi. I enjoy reading all your "daily dose" posts.
I just want to comment that it seems that the link to the Jupyter notebook is broken...Thanks for sharing your knowledge! Please, keep going.
Unbiasedness and minimal variance of a linear regression model have nothing to do with normality of residuals. Normality only gives you another way to derive the parameters using MLE and makes your hypothesis tests valid.