Discussion about this post

David Holmer

Your point about histograms is entirely accurate, but I find it odd to recommend a KDE plot as an alternative, since it has exactly the same issue, just with “smoothing bandwidth” in place of “bin width”.
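To make the bandwidth point concrete, here is a minimal sketch in plain Python. The sample values, the two bandwidths, and the helper names `kde` and `count_modes` are all made up for illustration; it uses a textbook Gaussian kernel rather than any particular plotting library's KDE. It counts how many peaks the smoothed curve has at each bandwidth.

```python
import math

def kde(x, data, bandwidth):
    """Gaussian kernel density estimate at x (plain formula, no library)."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data) / norm

def count_modes(data, bandwidth, grid):
    """Count strict local maxima of the KDE along an evenly spaced grid."""
    dens = [kde(x, data, bandwidth) for x in grid]
    return sum(1 for i in range(1, len(dens) - 1)
               if dens[i - 1] < dens[i] > dens[i + 1])

# Made-up sample: two tight clusters, around 1 and around 3.
data = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
grid = [i / 20 for i in range(-40, 121)]  # -2.0 to 6.0 in steps of 0.05

narrow = count_modes(data, 0.3, grid)  # small bandwidth keeps the clusters apart
wide = count_modes(data, 2.0, grid)    # large bandwidth blurs them into one bump
print(narrow, wide)
```

On this toy sample the narrow bandwidth shows two peaks and the wide one shows a single peak, which is the same kind of story change you get from changing the bin width of a histogram.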

You made an excellent analogy with checking whether a regression summary is likely to be accurate by using a scatter plot to look for outliers. Similarly, for this case I find it’s best to check whether binning will be accurate by using a CDF plot.

Like a scatter plot, a CDF plot has the advantage of being “full resolution”: there is no rounding or binning and it shows ALL the data points, so it shows the texture of your underlying data much better. If it’s generally smooth over a range, then it’s “safe” to generate a histogram/KDE of the data in that range. But if it has sudden jumps, that’s where you need to be aware that different bin widths / bandwidths may tell different stories.
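For anyone who wants to try this, here is a rough sketch of the empirical CDF in plain Python. The function names `ecdf` and `largest_jumps` and the sample data are my own, purely illustrative. The first function returns the points you would plot as a step curve; the second lists the values where the curve rises most in a single step, which are the "sudden jumps" to look out for.

```python
from collections import Counter

def ecdf(values):
    """Return sorted unique values and the fraction of points <= each one."""
    n = len(values)
    counts = Counter(values)
    xs = sorted(counts)
    ys = []
    running = 0
    for x in xs:
        running += counts[x]      # every data point contributes, no binning
        ys.append(running / n)
    return xs, ys

def largest_jumps(values, top=3):
    """Return (value, jump height) for the biggest single steps in the ECDF."""
    n = len(values)
    return [(x, c / n) for x, c in Counter(values).most_common(top)]

# Made-up sample with a pile-up at 2.0 and a smaller one at 5.0.
data = [1.0, 2.0, 2.0, 2.0, 2.0, 3.5, 4.0, 5.0, 5.0, 6.0]

xs, ys = ecdf(data)
jumps = largest_jumps(data, top=2)
print(list(zip(xs, ys)))
print(jumps)
```

Here the step at 2.0 accounts for 40% of the data on its own, so any bin or kernel that straddles 2.0 will look quite different depending on its width. Note that this only flags exact repeated values; a near-vertical stretch made of many close but distinct values would still need to be spotted by eye on the plot.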
