9 Most Important Plots in Data Science

...in a single frame

Avi Chawla

May 26, 2023

Exploring and analyzing data is a fundamental aspect of data science.

Here, visualizations play a crucial role in understanding complex patterns and relationships.

They offer a concise way to:

understand the intricacies of statistical models,
validate model assumptions,
evaluate model performance, and much more.

The visual above depicts 9 of the most important and must-know plots in data science.

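KS Plot: It compares the empirical cumulative distribution function (CDF) of a dataset with the CDF of a theoretical distribution, or the CDFs of two datasets with each other, to assess how much the distributions differ. The Kolmogorov-Smirnov statistic is the largest vertical gap between the two curves.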
SHAP Plot: It summarizes how much each feature contributes to a model's predictions, taking interactions/dependencies between features into account.
QQ Plot: It is used to assess the distributional similarity between observed data and a theoretical distribution.
- Here, we plot the quantiles of the two distributions against each other.
- Deviations from the straight line indicate a departure from the assumed distribution.
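Cumulative Explained Variance Plot: It shows the fraction of the total variance captured by the first k principal components in PCA as k grows. I covered this in a detailed post before: How Many Dimensions Should You Reduce Your Data To When Using PCA?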
Gini Impurity vs. Entropy: Both measure the impurity or disorder of a node or split in a decision tree.
- The plot compares Gini impurity and entropy across different splits, giving insight into the trade-off between the two measures (for instance, Gini impurity needs no logarithm, so it is slightly cheaper to compute).
Bias-Variance Tradeoff: It plots a model's bias and variance against model complexity to help find the right balance between the two, typically where the total error is lowest.
ROC Curve: It depicts the trade-off between the true positive rate (TPR) and the false positive rate (FPR) across different classification thresholds.
Precision-Recall Curve: It depicts the trade-off between Precision and Recall across different classification thresholds.
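Elbow Curve: It plots the within-cluster sum of squares (inertia) against the number of clusters k. The "elbow", where adding more clusters stops reducing the inertia substantially, suggests a good number of clusters for the k-means algorithm.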
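For the KS plot described above, here is a minimal sketch, assuming two plain Python lists of numbers. It computes the points of each empirical CDF (what the plot draws) and the two-sample KS statistic (the largest vertical gap between them). The helper names `ecdf_points` and `ks_statistic` are illustrative only; in practice, `scipy.stats.ks_2samp` returns the same statistic along with a p-value.

```python
import bisect


def ecdf_points(sample):
    """Sorted values and their empirical CDF heights, ready for a step plot."""
    xs = sorted(sample)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest vertical gap between the two ECDFs.

    Both ECDFs are step functions that only jump at observed values, so
    checking every value in the pooled sample finds the maximum gap.
    """
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sorted(set(sa) | set(sb)):
        fa = bisect.bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect.bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        gap = max(gap, abs(fa - fb))
    return gap


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```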
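For the QQ plot, here is a minimal sketch that assumes the theoretical distribution is a normal fitted to the sample's mean and standard deviation, so a perfect fit falls on the line y = x. It pairs each sorted observation with the matching theoretical quantile. `normal_qq_points` is a hypothetical helper, and the plotting positions (i - 0.5) / n are one common convention among several; `scipy.stats.probplot` is a library alternative.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """(theoretical, observed) quantile pairs for a QQ plot against a fitted normal.

    Plotting positions (i - 0.5) / n are one common convention among several.
    """
    observed = sorted(sample)
    n = len(observed)
    ref = NormalDist(mean(observed), stdev(observed))  # normal with the sample's mean and std
    theoretical = [ref.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# If the data are roughly normal, these points hug the line y = x.
for t, o in normal_qq_points([2.1, 2.9, 3.0, 3.2, 3.8, 4.1]):
    print(f"{t:6.2f}  {o:6.2f}")
```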
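For the cumulative explained variance plot, here is a minimal NumPy sketch: the eigenvalues of the feature covariance matrix are the variances along the principal components, so their running sum divided by the total gives the curve. The function name and the synthetic data (5 features driven mostly by 2 hidden directions) are hypothetical.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Fraction of total variance captured by the first k principal components, k = 1..d."""
    cov = np.cov(X, rowvar=False)                # d x d covariance (np.cov centers the data)
    eigvals = np.linalg.eigvalsh(cov)[::-1]      # eigvalsh returns ascending order; flip it
    eigvals = np.clip(eigvals, 0.0, None)        # drop tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()


# Hypothetical data: 5 features, but most variance lives in 2 latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
print(np.round(cumulative_explained_variance(X), 3))
```

With scikit-learn, `np.cumsum(PCA().fit(X).explained_variance_ratio_)` should give the same curve.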
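For the Gini impurity vs. entropy comparison, one common way to draw it (an assumption here, since the exact axes of the visual are not described) is to plot both measures for a two-class node as a function of the proportion p of one class. A minimal sketch of the two formulas:

```python
import math


def gini(p):
    """Gini impurity of a two-class node whose class-1 proportion is p."""
    return 1.0 - p**2 - (1.0 - p) ** 2


def entropy(p):
    """Shannon entropy in bits of the same node; 0 * log2(0) is taken as 0."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)


for p in [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]:
    print(f"p={p:.2f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}")
```

Both curves vanish for pure nodes (p = 0 or 1) and peak at p = 0.5, where Gini impurity is 0.5 and entropy is 1 bit.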
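For the bias-variance tradeoff plot, here is a minimal simulation sketch under assumed settings: the true function is sin(2*pi*x), each training set is a noisy sample of it, and a polynomial's degree stands in for model complexity. Refitting on many fresh training sets, the squared gap between the average prediction and the truth estimates bias squared, and the spread of the predictions estimates variance. The function name and all numbers are illustrative.

```python
import numpy as np


def bias_variance(degree, n_datasets=200, n_points=30, noise=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by refitting on fresh datasets."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)            # noise-free target at the test points
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n_points)
        preds[i] = np.polyval(np.polyfit(x, y, degree), x_test)
    bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)   # error of the average model
    variance = np.mean(preds.var(axis=0))                   # spread across training sets
    return bias_sq, variance


for d in [1, 3, 5, 9]:
    b, v = bias_variance(d)
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}")
```

Plotting bias squared and variance against the degree should show bias falling and variance rising as complexity grows.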
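For the ROC curve, here is a minimal sketch of how it is traced, assuming binary labels in {0, 1} and one real-valued score per example (higher means more likely positive): sweep the threshold over every distinct score and record (FPR, TPR) at each step. `roc_points` and `trapezoid_auc` are hypothetical helpers; `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` are the usual library route.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, from the strictest threshold to the loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) pairs sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(pts))  # 0.75
```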
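For the precision-recall curve, here is a minimal sketch under the same assumptions (binary labels in {0, 1}, higher score means more likely positive): at each distinct threshold, precision is TP divided by the number of predicted positives, and recall is TP divided by the number of actual positives. `pr_points` is a hypothetical helper; `sklearn.metrics.precision_recall_curve` is the usual library route.

```python
def pr_points(y_true, scores):
    """(recall, precision) pairs, one per distinct threshold, strictest first."""
    pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        predicted = [y for y, s in zip(y_true, scores) if s >= t]  # labels of predicted positives
        tp = sum(predicted)
        points.append((tp / pos, tp / len(predicted)))
    return points


print(pr_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# [(0.5, 1.0), (0.5, 0.5), (1.0, 0.6666666666666666), (1.0, 0.5)]
```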
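For the elbow curve, here is a minimal NumPy sketch of the data behind it, assuming hypothetical 2-D data: run k-means for several values of k and record the within-cluster sum of squares (inertia). The hand-rolled `kmeans_inertia` (k-means++ seeding plus Lloyd iterations, keeping the best of several restarts) is illustrative only; scikit-learn's `KMeans(n_clusters=k).fit(X).inertia_` is the usual library route.

```python
import numpy as np


def kmeans_inertia(X, k, n_init=5, n_iter=100, seed=0):
    """Best within-cluster sum of squares over `n_init` runs of k-means (k-means++ seeding)."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _run in range(n_init):
        # k-means++ seeding: each new center is drawn with probability proportional
        # to its squared distance from the nearest center chosen so far.
        centers = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        # Lloyd iterations: assign points to the nearest center, then move centers to cluster means.
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).min(axis=1).sum()
        best = min(best, inertia)
    return best


# Hypothetical data: three well-separated 2-D blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in [(0, 0), (10, 0), (0, 10)]])
ks = range(1, 7)
inertias = [kmeans_inertia(X, k) for k in ks]
print([round(float(i), 1) for i in inertias])
```

Plotting `inertias` against `ks` draws the curve; on these blobs the inertia should drop sharply up to k = 3 and flatten afterwards. On real data the elbow is a visual heuristic and is not always clear-cut.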

Over to you: What other plots would you include here?

👉 Read what others are saying about this post on LinkedIn and Twitter.

👉 Tell the world what makes this newsletter special for you by leaving a review here :)


👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights. The button is located towards the bottom of this email.

👉 If you love reading this newsletter, feel free to share it with friends!


👉 Sponsor the Daily Dose of Data Science Newsletter. More info here: Sponsorship details.

Find the code for my tips here: GitHub.

I like to explore, experiment and write about data science concepts and tools. You can read my articles on Medium. Also, you can connect with me on LinkedIn and Twitter.

CHINTALAPATI SUBRAMANYAM

Thanks a lot, really very useful.


Anthony Lannes

This summary is really awesome, so many useful ways to understand ML!

However, I would advise against the elbow method, as many articles have shown how wrong it can be. Here is a link to an excellent recent article, but there are many more:

https://towardsdatascience.com/are-you-still-using-the-elbow-method-5d271b3063bd

Thanks again for your excellent work 😊

