Discussion about this post

User's avatar
David Holmer's avatar

Your point about histograms is entirely accurate, but I find it odd to recommend a KDE plot as an alternative as it has exactly the same issue except with “smoothing bandwidth” instead of “bin width”.

You made an excellent analogy of checking if a regression summery is likely to be accurate by using a scatter plot to check for outliers. Similarity, for this case I find it’s best to check if binning will be accurate using a CDF plot.

Like a scatter plot CDF has the advantage of being “full resolution” with no rounding or binning and showing ALL the data points, so it shows the texture of your underlying data much better. If it’s generally smooth over a range then it’s “safe” to generate a histogram/KDE of the data in that range. But if it has sudden jumps then that’s where you need to be aware that different bin width / bandwidths may show different stories.

Expand full comment
2 more comments...

No posts