Scatter plots, bar plots, line plots, box plots, and heatmaps are the most frequently used plots for data visualization.
Although they are simple and known to almost everyone, I believe they are not the right choice for every scenario.
Many other plots derive from these standard ones and, when used appropriately, can be far more suitable.
Therefore, today, let’s discuss a few alternatives to these popular plots.
I will also explain the specific situations where they are more useful than the standard plots.
This post is a consolidation of some of my previous plotting posts published in this newsletter.
If you have never seen them before, then there’s new information for you.
If you have seen them before, then this will be a good refresher for you.
In any case, a consolidated guide is easier to look back on later than scrolling through individual newsletter issues.
Also, before I begin, this post is not intended to discourage the use of these traditional plots. They will always have their place.
Instead, it is to highlight specific situations where they can be replaced with better plotting ideas.
Let’s begin!
#1) Size-encoded heatmaps
A traditional heatmap represents the values using a color scale. Yet, mapping the cell color to exact numbers is still challenging.
Adding a size component to a heatmap can be extremely helpful in such cases.
In essence, the bigger the cell marker, the higher the absolute value.
This also makes heatmaps cleaner, because the many values near zero immediately shrink out of the way.
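Here is a minimal sketch of the idea, assuming matplotlib and NumPy are available. The data is synthetic, and names like `sizes` and the scaling factor of 600 are my own illustrative choices, not a fixed recipe. Each cell is drawn as a square marker whose color carries the signed value and whose area carries the absolute value:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 6))          # synthetic data, purely illustrative
corr = np.corrcoef(data, rowvar=False)    # 6 x 6 correlation matrix
labels = [f"f{i}" for i in range(corr.shape[0])]

# One marker per cell: color carries the signed value, area carries |value|.
rows, cols = np.indices(corr.shape)
sizes = np.abs(corr).ravel() * 600        # 600 is an arbitrary maximum marker area

fig, ax = plt.subplots(figsize=(5, 4))
points = ax.scatter(cols.ravel(), rows.ravel(), s=sizes, c=corr.ravel(),
                    cmap="coolwarm", vmin=-1, vmax=1, marker="s")
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.invert_yaxis()                         # first row on top, like a heatmap
fig.colorbar(points, ax=ax, label="correlation")
fig.savefig("size_heatmap.png", dpi=150)
```

With random features, most off-diagonal correlations are close to zero, so their markers nearly vanish and only the diagonal stands out.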
#2) Waterfall charts
To visualize the change in value over time, a line (or bar) plot may not always be an apt choice.
This is because a line plot (or bar plot) depicts the actual values in the chart. Thus, it is difficult to visually estimate the scale and direction of incremental changes.
Instead, you can use a waterfall chart.
It shows these rolling differences directly, one bar per change.
Here, the start and final values are represented by the first and last bars.
Also, the consecutive changes are automatically color-coded, making them easier to interpret.
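A rough sketch of a waterfall chart built from plain matplotlib bars is shown next. The monthly values are hypothetical, and the green/red/gray color choice is my assumption about a sensible encoding. The trick is to draw each change as a floating bar that starts at the previous value:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical monthly values; the middle bars show only the changes between them.
labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
values = np.array([100, 120, 110, 135, 130, 150])

deltas = np.diff(values)                  # change from one period to the next
starts = values[:-1]                      # each delta bar starts at the previous value
colors = ["tab:green" if d >= 0 else "tab:red" for d in deltas]

fig, ax = plt.subplots(figsize=(6, 3.5))
ax.bar(0, values[0], color="tab:gray")                 # first bar: start value
x_mid = np.arange(1, len(values))
ax.bar(x_mid, deltas, bottom=starts, color=colors)     # floating bars: changes
ax.bar(len(values), values[-1], color="tab:gray")      # last bar: final value

ax.set_xticks(range(len(values) + 1))
ax.set_xticklabels([f"{labels[0]} (start)"] + labels[1:] + ["Final"])
ax.set_ylabel("value")
fig.savefig("waterfall.png", dpi=150)
```

Because the start value plus all the deltas equals the final value, the last floating bar always ends exactly at the height of the final bar.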
#3) Bump charts
When visualizing the change in rank over time of multiple categories, using a bar chart may not be appropriate.
This is because bar charts quickly become cluttered with many categories.
Instead, try bump charts. They are specifically designed to visualize the rank of different items over time.
Compared with a bar chart, a bump chart makes the change in rank far easier to interpret.
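To make this concrete, here is a small sketch of a bump chart, assuming matplotlib and NumPy. The categories and scores are made up for illustration. The scores are first converted to per-year ranks, and each category then becomes one line across the years:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

years = [2019, 2020, 2021, 2022, 2023]
# Hypothetical scores: one entry per category, one value per year.
scores = {
    "A": [50, 55, 70, 65, 80],
    "B": [60, 58, 62, 75, 78],
    "C": [40, 65, 60, 70, 60],
    "D": [55, 50, 45, 50, 72],
}
names = list(scores)
matrix = np.array([scores[n] for n in names])   # shape: (categories, years)

# Rank within each year (1 = highest score); argsort of argsort yields ranks.
order = np.argsort(-matrix, axis=0)
ranks = np.argsort(order, axis=0) + 1

fig, ax = plt.subplots(figsize=(6, 3.5))
for name, rank_row in zip(names, ranks):
    ax.plot(years, rank_row, marker="o", linewidth=2)
    ax.text(years[-1] + 0.15, rank_row[-1], name, va="center")  # label at line end
ax.invert_yaxis()                                # rank 1 at the top
ax.set_xticks(years)
ax.set_yticks(range(1, len(names) + 1))
ax.set_ylabel("rank")
fig.savefig("bump_chart.png", dpi=150)
```

Note that this simple ranking assumes there are no ties within a year. With tied scores you would need to decide how to break them.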
#4) Raincloud plots
Visualizing data distributions using box plots and histograms can be misleading at times.
This is because:
It is possible to get the same box plot with entirely different data.
Altering the number of bins changes the shape of a histogram.
Thus, to avoid misleading conclusions, it is recommended to show the data distribution from several complementary angles at once.
A raincloud plot does this by combining:
Box plots for data statistics.
Strip plots for data overview.
KDE plots for the probability distribution of data.
With Raincloud plots, you can:
Combine multiple plots to prevent incorrect/misleading conclusions
Reduce clutter and enhance clarity
Improve comparisons between groups
Capture different aspects of the data through a single plot
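Below is one possible way to assemble a raincloud plot by hand with matplotlib, offered as a sketch rather than a reference implementation. The two sample groups are synthetic, and the half-violin is obtained by clipping the violin's outline, which is a common workaround rather than an official API:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic groups with similar spread but very different shapes.
groups = {
    "unimodal": rng.normal(0, 1, 300),
    "bimodal": np.concatenate([rng.normal(-2, 0.5, 150), rng.normal(2, 0.5, 150)]),
}

fig, ax = plt.subplots(figsize=(6, 4))
for pos, (name, sample) in enumerate(groups.items(), start=1):
    # "Cloud": a violin (KDE estimate) clipped to its right half.
    violin = ax.violinplot(sample, positions=[pos], showextrema=False, widths=0.8)
    for body in violin["bodies"]:
        verts = body.get_paths()[0].vertices
        verts[:, 0] = np.clip(verts[:, 0], pos, None)   # keep only x >= pos
        body.set_alpha(0.6)
    # Box plot for the summary statistics, centered on the group position.
    ax.boxplot(sample, positions=[pos], widths=0.08,
               showfliers=False, manage_ticks=False)
    # "Rain": jittered raw points to the left of the group position.
    jitter = rng.uniform(0.1, 0.35, size=sample.size)
    ax.scatter(pos - jitter, sample, s=4, alpha=0.4)

ax.set_xticks([1, 2])
ax.set_xticklabels(list(groups))
ax.set_ylabel("value")
fig.savefig("raincloud.png", dpi=150)
```

The bimodal group is the interesting one here: its box looks unremarkable, while the cloud and the rain immediately reveal the two clusters.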
#5-6) Hexbin and density plots
Scatter plots can get too dense to interpret when you have thousands of data points.
Instead, you can replace them with hexbin plots.
Hexbin plots bin the area of a chart into hexagonal regions. Each region is assigned a color intensity based on the method of aggregation used (the number of points, for instance).
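As a quick sketch, matplotlib's `hexbin` does this binning for you. The data below is synthetic, and `gridsize=40` is an arbitrary choice on my part. By default, each hexagon is colored by the number of points that fall inside it:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)                       # synthetic, correlated cloud
y = 0.6 * x + rng.normal(scale=0.8, size=20_000)

fig, ax = plt.subplots(figsize=(5, 4))
# mincnt=1 leaves hexagons without any points blank instead of coloring them.
hb = ax.hexbin(x, y, gridsize=40, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per hexagon")
counts = hb.get_array()                           # one count per drawn hexagon
fig.savefig("hexbin.png", dpi=150)
```

To aggregate something other than the point count, `hexbin` also accepts a value array `C` together with `reduce_C_function` (for example `np.mean`).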
Another choice is a density plot, which illustrates the distribution of points in a two-dimensional space.
A contour is created by connecting points of equal density. In other words, a single contour line depicts an equal density of data points.
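The contour idea can be sketched with NumPy and matplotlib alone. Note the assumption: to keep dependencies minimal, this version estimates the density with a simple 2D histogram instead of a kernel density estimate, so the contours are rougher than what a KDE-based tool would draw. The data and bin count are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=20_000)                       # synthetic, correlated cloud
y = 0.6 * x + rng.normal(scale=0.8, size=20_000)

# Binned density estimate: density=True normalizes so it integrates to 1.
hist, xedges, yedges = np.histogram2d(x, y, bins=40, density=True)
xc = 0.5 * (xedges[:-1] + xedges[1:])             # bin centers along x
yc = 0.5 * (yedges[:-1] + yedges[1:])             # bin centers along y

fig, ax = plt.subplots(figsize=(5, 4))
# contour expects Z with shape (len(y), len(x)), hence the transpose.
cs = ax.contour(xc, yc, hist.T, levels=6, cmap="viridis")
fig.colorbar(cs, ax=ax, label="estimated density")
fig.savefig("density_contours.png", dpi=150)
```

Each line connects locations with the same estimated density. For smoother contours, a KDE-based estimate (for instance via `scipy.stats.gaussian_kde` or seaborn's `kdeplot`) is the usual choice.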
#7-8) Bubble charts and dot plots
As discussed above, bar plots quickly get messy and cluttered as the number of categories increases.
A bubble plot is often a better alternative in such cases.
It is like a scatter plot, but:
with one categorical axis
and one continuous axis
With many categories and time steps, a bar plot is difficult to interpret because too many bars are packed into a small space, whereas size-encoded bubbles make it fairly easy to see the change over time.
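Here is a minimal bubble chart sketch along those lines, assuming matplotlib and NumPy. The regions, years, and values are invented for illustration, and the factor of 4 on the marker area is arbitrary. The y-axis is categorical, the x-axis is continuous (time), and the bubble area encodes the value:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2000, 2020)
categories = ["North", "South", "East", "West"]   # hypothetical categories
rng = np.random.default_rng(3)
values = rng.integers(10, 100, size=(len(categories), len(years)))

fig, ax = plt.subplots(figsize=(8, 3))
for row, name in enumerate(categories):
    # One row of bubbles per category; marker area is proportional to the value.
    ax.scatter(years, np.full(len(years), row), s=values[row] * 4, alpha=0.7)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
ax.set_xlabel("year")
fig.savefig("bubble_chart.png", dpi=150)
```

The same 80 values as a grouped bar plot would need 80 bars. Here they fit into four tidy rows.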
Another alternative to bar plots in such situations is dot plots.
Both dot plots and bubble charts are based on the idea that, when a bar plot has many bars, we rarely pay attention to the individual bar lengths.
Instead, we mostly look at the endpoints that denote the total value.
These plots depict exactly that, while eliminating the long bars that add little to no value.
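A dot plot is about as simple as it gets in matplotlib. This sketch uses made-up item names and totals, and draws only the endpoint of each would-be bar:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

categories = [f"item {i}" for i in range(1, 21)]  # hypothetical categories
rng = np.random.default_rng(4)
totals = np.sort(rng.integers(20, 200, size=len(categories)))  # sorted for readability

positions = np.arange(len(categories))
fig, ax = plt.subplots(figsize=(5, 6))
ax.plot(totals, positions, "o")                   # only the endpoints, no bars
ax.set_yticks(positions)
ax.set_yticklabels(categories)
ax.grid(axis="x", alpha=0.3)                      # light guides for reading values
ax.set_xlabel("total value")
fig.savefig("dot_plot.png", dpi=150)
```

Sorting the categories by value is optional, but it usually makes the pattern much easier to read.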
👉 Over to you: Are there any other lesser-known yet valuable plots that I haven’t covered here? If yes, when do you use them?