Exploring and analyzing data is a fundamental aspect of data science.
Here, visualizations play a crucial role in understanding complex patterns and relationships.
They offer a concise way to:
understand the intricacies of statistical models,
validate model assumptions,
evaluate model performance, and much more.
The visual above depicts 9 of the most important and must-know plots in data science.
KS Plot: It compares the empirical cumulative distribution function (CDF) of a dataset against a theoretical CDF, or the empirical CDFs of two datasets against each other, to assess how much the distributions differ.
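As a rough illustration of what a KS plot measures, here is a minimal sketch that computes the largest vertical gap between a sample's empirical CDF and a standard normal CDF. The helper names (`normal_cdf`, `ks_statistic`) and the synthetic samples are assumptions made for this example; plotting the two CDFs is left to whichever plotting library you prefer.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d

# Synthetic data (an assumption for this sketch): one sample that really is
# standard normal, and one uniform sample that is not.
random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(500)]
uniform_sample = [random.uniform(-3, 3) for _ in range(500)]

d_normal = ks_statistic(normal_sample, normal_cdf)
d_uniform = ks_statistic(uniform_sample, normal_cdf)
print(f"KS distance, normal sample:  {d_normal:.3f}")
print(f"KS distance, uniform sample: {d_uniform:.3f}")
```

A small distance suggests the sample is consistent with the reference distribution; a large one suggests it is not.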
SHAP Plot: It summarizes how much each feature contributes to a model’s predictions, while accounting for interactions and dependencies between features.
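SHAP plots are built on Shapley values. The sketch below computes exact Shapley values for a tiny hypothetical model by enumerating every feature ordering. It does not use the SHAP library, which relies on faster approximations and a background dataset; the toy model, the all-zero baseline, and the function names are assumptions for illustration only.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    A feature that is "absent" takes its baseline value. Every ordering of the
    features is enumerated, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i on
            new = predict(current)
            phi[i] += new - prev       # marginal contribution in this ordering
            prev = new
    return [p / len(orderings) for p in phi]

def toy_model(f):
    # Hypothetical model with an interaction between features 1 and 2.
    return 2.0 * f[0] + 1.0 * f[1] + 3.0 * f[1] * f[2]

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved.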
QQ Plot: It is used to assess the distributional similarity between observed data and a theoretical distribution.
Here, we plot the quantiles of the two distributions against each other.
Deviations from the straight line indicate a departure from the assumed distribution.
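To make the quantile-against-quantile idea concrete, here is a minimal sketch that builds the points of a normal QQ plot using only the standard library. The function names, the `(i + 0.5) / n` plotting positions, and the synthetic samples are assumptions for this example; the correlation is used only as a quick numeric stand-in for "how straight is the line".

```python
import math
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """Return (theoretical, observed) quantiles against a fitted normal."""
    observed = sorted(sample)
    n = len(observed)
    fitted = NormalDist(mean(observed), stdev(observed))
    # Plotting positions (i + 0.5) / n keep the probabilities inside (0, 1).
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed

def pearson_r(a, b):
    """Pearson correlation, used here as a rough measure of straightness."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

random.seed(0)
normal_sample = [random.gauss(10, 2) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

r_normal = pearson_r(*qq_points(normal_sample))
r_skewed = pearson_r(*qq_points(skewed_sample))
print(f"normal sample: r = {r_normal:.4f}")
print(f"skewed sample: r = {r_skewed:.4f}")
```

Plotting `theoretical` against `observed` gives the QQ plot itself; the skewed sample bends away from the diagonal, which shows up here as a lower correlation.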
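Cumulative Explained Variance Plot: It shows the fraction of total variance retained as more principal components are kept, which helps decide how many dimensions to keep in PCA. I covered this in a detailed post before: How Many Dimensions Should You Reduce Your Data To When Using PCA?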
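A minimal sketch of how the curve can be computed with NumPy, assuming synthetic data with two dominant latent directions and a 95% variance target (both are assumptions for this example, not values from the linked post):

```python
import numpy as np

def cumulative_explained_variance(X):
    """Cumulative fraction of variance captured by the leading principal components."""
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance along each principal component.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return np.cumsum(var) / var.sum()

# Hypothetical data: 5 observed features driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

cum_var = cumulative_explained_variance(X)
# Smallest number of components whose cumulative variance reaches 95%.
n_components = int(np.searchsorted(cum_var, 0.95) + 1)
print(np.round(cum_var, 4), n_components)
```

Plotting `cum_var` against the component index gives the cumulative explained variance plot; the point where it crosses the chosen target suggests how many dimensions to keep.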
Gini-Impurity vs. Entropy: Both measure the impurity or disorder of a node or split in a decision tree.
The plot compares Gini impurity and Entropy across different splits. This provides insights into the tradeoff between these measures.
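Both measures are easy to compute from class proportions. Here is a small sketch for the two-class case, where the proportion of the positive class is swept from 0 to 1; the function names and the grid of 21 points are assumptions for this example.

```python
import math

def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(q * q for q in proportions)

def entropy(proportions):
    """Shannon entropy in bits; classes with zero proportion contribute nothing."""
    return -sum(q * math.log2(q) for q in proportions if q > 0)

# Sweep the positive-class proportion p over [0, 1] for a two-class node.
curve = []
for i in range(21):
    p = i / 20
    curve.append((p, gini([p, 1 - p]), entropy([p, 1 - p])))

for p, g, h in curve:
    print(f"p={p:.2f}  gini={g:.3f}  entropy={h:.3f}")
```

Both curves are zero for a pure node and peak at p = 0.5, where Gini impurity reaches 0.5 and entropy reaches 1 bit.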
Bias-Variance Tradeoff: It plots a model's bias and variance against model complexity, and is used to find the right balance between the two.
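One way to see the tradeoff numerically is to refit polynomials of increasing degree on many noisy training sets and measure how the average prediction (bias) and the spread of predictions (variance) change. The sketch below assumes a sine target, Gaussian noise, and polynomial degrees 1, 5 and 9; all of these are choices made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def bias_variance(degree, n_repeats=200, n_train=20, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of a given degree."""
    rng = np.random.default_rng(seed)
    true_fn = lambda x: np.sin(2 * np.pi * x)   # assumed ground-truth function
    x_train = np.linspace(0, 1, n_train)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        # A fresh noisy training set each time; the inputs stay fixed.
        y_train = true_fn(x_train) + rng.normal(scale=noise, size=n_train)
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_test)
    bias_sq = float(np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 5, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree}: bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

Plotting the two quantities against the degree gives the familiar picture: the simplest model has high bias and low variance, and the most flexible one has the opposite profile.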
ROC Curve: It depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across different classification thresholds.
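The points of a ROC curve can be obtained by sweeping the threshold over the predicted scores. Here is a minimal pure-Python sketch on four hand-made labels and scores (the data and function names are assumptions for this example); in practice a library routine would normally be used instead.

```python
def roc_curve_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

roc_points = roc_curve_points(y_true, scores)
roc_auc = auc(roc_points)
print(roc_points)
print(roc_auc)
```

Each threshold contributes one point; lowering the threshold moves along the curve toward the top-right corner, where everything is predicted positive.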
Precision-Recall Curve: It depicts the trade-off between Precision and Recall across different classification thresholds.
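The same threshold sweep yields the precision-recall points. This is a small sketch on hand-made data (the labels, scores, and function name are assumptions for this example):

```python
def precision_recall_points(y_true, scores):
    """(threshold, precision, recall) triples, from the strictest threshold down."""
    total_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        # Labels of every example predicted positive at this threshold.
        predicted_pos = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(predicted_pos)
        precision = tp / len(predicted_pos)
        recall = tp / total_pos
        points.append((t, precision, recall))
    return points

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

pr_points = precision_recall_points(y_true, scores)
for t, precision, recall in pr_points:
    print(f"threshold={t:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

As the threshold drops, recall can only rise or stay flat, while precision may move in either direction, which is why the curve is often jagged.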
Elbow Curve: The plot helps identify the optimal number of clusters for the k-means algorithm.
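The elbow curve plots the within-cluster sum of squares (inertia) against the number of clusters k. Below is a minimal NumPy sketch of that computation on three synthetic blobs. The farthest-first initialisation is a simple deterministic stand-in for k-means++ chosen for this example, and the data and function name are likewise assumptions.

```python
import numpy as np

def kmeans_inertia(X, k, n_iter=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    # Farthest-first initialisation: start from the first point, then keep
    # adding the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster ends up empty
                centers[j] = members.mean(axis=0)
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).sum())

# Hypothetical data: three well-separated 2-D blobs of 100 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [2.0, 6.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(100, 2)) for c in blob_centers])

inertias = [kmeans_inertia(X, k) for k in range(1, 7)]
for k, inertia in enumerate(inertias, start=1):
    print(f"k={k}: inertia={inertia:.1f}")
```

Inertia falls steeply up to the true number of blobs and only slowly afterwards; the bend in that curve is the "elbow".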
Over to you: What other plots would you include here?
Thanks a lot, really very useful.
This summary is really awesome, so many useful ways to understand ML!
However, I would advise against the elbow method, as many articles have shown how wrong it can be. Here is a link to an excellent and recent article, but there are many more:
https://towardsdatascience.com/are-you-still-using-the-elbow-method-5d271b3063bd
Thanks again for your excellent work 😊