When two distributions A and B have the same number of samples, would plotting their CDFs against each other produce a valid QQ plot?
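For reference, when both samples have the same size, a QQ plot needs no CDFs or interpolation at all: sort each sample and pair the values index by index, so the i-th smallest value of A is plotted against the i-th smallest value of B. This is a minimal Python sketch under that equal-size assumption; `qq_points` and the random samples are hypothetical names chosen for illustration.

```python
import random

def qq_points(a, b):
    """Pair the i-th smallest value of `a` with the i-th smallest value of `b`.

    With equal sample sizes the empirical quantiles line up one-to-one,
    so no interpolation between order statistics is needed.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return list(zip(sorted(a), sorted(b)))

# Hypothetical samples, for illustration only.
random.seed(0)
sample_a = [random.gauss(0, 1) for _ in range(100)]
sample_b = [random.gauss(0, 1) for _ in range(100)]
points = qq_points(sample_a, sample_b)
# Points close to the line y = x suggest the two samples share a distribution.
```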
When we plot the CDFs of two distributions, we get a KS plot, not a QQ plot.
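In a KS plot both empirical CDFs share the same axes, and the Kolmogorov-Smirnov statistic is the largest vertical gap between the two step curves. This is a minimal pure-Python sketch; `ecdf` and `ks_statistic` are hypothetical helper names, not a library API.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of `a` and `b`.

    Both ECDFs are step functions that only jump at observed values, so
    evaluating at every pooled observation is enough to find the maximum.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

For real analyses, `scipy.stats.ks_2samp` computes the same two-sample statistic together with a p-value.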
I looked up KS plots, and it appears that two CDFs are plotted on the same axes. This is different from what I was thinking; I may have caused confusion by the way I worded my question.
I should have clarified that by "against" I mean that one distribution's CDF gives the x-values of the plot and the other distribution's CDF gives the y-values.
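A plot with one CDF on the x-axis and the other on the y-axis, both evaluated at the same values of the random variable, is usually called a probability-probability (P-P) plot. This is a minimal sketch built from empirical CDFs, assuming the CDFs are evaluated at every pooled observed value; `ecdf` and `pp_points` are hypothetical names.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def pp_points(a, b):
    """For each pooled value x, pair F_A(x) (x-axis) with F_B(x) (y-axis)."""
    fa, fb = ecdf(a), ecdf(b)
    return [(fa(x), fb(x)) for x in sorted(set(a) | set(b))]
```

If the two samples come from the same distribution, the points cluster around the diagonal y = x, just as they do in a QQ plot.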
Got it. Sorry, I misunderstood your point.
To be honest, I haven't seen such a plot before, but thinking about it as I write, I expect it would be quite similar in nature to a QQ plot.
I say this because in a QQ plot we plot percentiles, and a percentile value indicates "how many" values lie below a particular value. If you were to plot a CDF instead, the quantity being accumulated would be probability density, which again reflects how many values have accumulated up to a particular value of the random variable.
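One way to state the relationship precisely: a QQ plot fixes a probability $p$ and compares the values at which each CDF reaches it, while the CDF-against-CDF plot fixes a value $x$ and compares the probabilities. Here $F_A$ and $F_B$ denote the two CDFs and $F^{-1}(p) = \inf\{x : F(x) \ge p\}$ is the quantile (percentile) function.

```latex
\begin{aligned}
\text{Q-Q plot:}\quad & \bigl(F_A^{-1}(p),\; F_B^{-1}(p)\bigr), \qquad p \in (0, 1) \\
\text{P-P plot:}\quad & \bigl(F_A(x),\; F_B(x)\bigr), \qquad x \in \mathbb{R}
\end{aligned}
```

Both sets of points fall on the diagonal when $F_A = F_B$, but in general they trace different curves: a pure location shift, for example, gives a straight line offset from the diagonal in a Q-Q plot and a bent curve in a P-P plot.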
For instance, if these were your probability densities:
- P(X=1) = 2/8
- P(X=2) = 3/8
- P(X=3) = 1/8
- P(X=4) = 2/8
The CDF would look like:
- CDF(X<=1) = 2/8
- CDF(X<=2) = 5/8
- CDF(X<=3) = 6/8
- CDF(X<=4) = 8/8
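The CDF above is just the running total of the probability masses. This is a small sketch that reproduces it with exact fractions; note that `Fraction` reduces automatically, so 2/8 prints as 1/4, 6/8 as 3/4 and 8/8 as 1.

```python
from fractions import Fraction
from itertools import accumulate

# PMF from the example above: P(X = k) for k = 1..4.
pmf = {1: Fraction(2, 8), 2: Fraction(3, 8), 3: Fraction(1, 8), 4: Fraction(2, 8)}

# The CDF is the cumulative sum of the PMF in increasing order of k.
values = sorted(pmf)
cdf = dict(zip(values, accumulate(pmf[k] for k in values)))

for k in values:
    print(f"CDF(X<={k}) = {cdf[k]}")
```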
Notice that the numerator here indicates a "count", which is exactly what you would expect to see in percentiles too. Does that make sense?
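To make the "count" reading concrete, this sketch assumes a hypothetical sample of 8 observations whose value counts match the masses above (two 1s, three 2s, one 3, two 4s). The empirical CDF is then literally a count divided by 8, and reading it backwards gives percentiles. `ecdf` and `quantile` are illustrative names, not a library API.

```python
from bisect import bisect_right
from fractions import Fraction

# Hypothetical sample whose counts match the PMF above.
sample = sorted([1, 1, 2, 2, 2, 3, 4, 4])
n = len(sample)

def ecdf(x):
    """Fraction of observations <= x, i.e. a count divided by n."""
    return Fraction(bisect_right(sample, x), n)

def quantile(p):
    """Smallest observed value whose ECDF reaches p (assumes 0 < p <= 1)."""
    return next(x for x in sample if ecdf(x) >= p)
```

For example, `quantile(Fraction(1, 2))` returns 2, because only 2 of the 8 observations are at most 1, while 5 of the 8 are at most 2.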
Yeah, looks like we're on the same page. I like how you broke it down further by involving the pdf :)
Perfect, Adam :)
And yes, where I wrote "probability densities" for (X=1), (X=2)..., I meant probability masses, since we are considering a discrete random variable.
Ok thanks. I'll look up KS plots.
I always appreciate such a straightforward explanation with visual representations. Great job!