A Visual Guide to Joint, Marginal and Conditional Probabilities
...and how they are used in data science.
The concepts of probability are fundamental to machine learning and data science.
While it is easy to understand and model a single random variable, in practice, we usually have many random variables that may interact with each other.
Thus, it is crucial to know the terminology and techniques for reasoning about the probabilities of multiple random variables together.
Let’s begin.
Essentially, there are three main types of probabilities that are used for multiple random variables:
1) Joint probability:
It is the probability of two or more events occurring together.
Denoted as P(X=a and Y=b), often written more compactly as P(X=a, Y=b).
For instance, in data science, the probability of observing an entire row of data denotes a joint probability over multiple random variables.
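To make this concrete, here is a minimal sketch that estimates joint probabilities by counting rows. The dataset, the column meanings (weather, played_outside) and the helper name joint_probabilities are all hypothetical, made up purely for illustration.

```python
from collections import Counter

# Hypothetical toy dataset: each row is (weather, played_outside).
rows = [
    ("sunny", "yes"),
    ("sunny", "yes"),
    ("sunny", "no"),
    ("rainy", "no"),
    ("rainy", "no"),
    ("rainy", "yes"),
    ("sunny", "yes"),
    ("rainy", "no"),
]

def joint_probabilities(rows):
    """Estimate P(X=a, Y=b) as the fraction of rows equal to (a, b)."""
    counts = Counter(rows)
    total = len(rows)
    return {pair: count / total for pair, count in counts.items()}

joint = joint_probabilities(rows)

# 3 of the 8 rows are ("sunny", "yes").
print(joint[("sunny", "yes")])
```

Counting like this only gives an empirical estimate from the sample; with few rows, rare combinations may get a probability of zero even if they are possible.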
2) Marginal probability:
It is the probability of a specific value of one random variable (X), regardless of the outcome of another random variable (Y).
Denoted as P(X=a). It is obtained by summing the joint probability P(X=a, Y=b) over all outcomes b of Y.
For instance, the probability that a single column takes a specific value, ignoring every other column in the row, is a marginal probability.
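As a rough illustration, the sketch below computes a marginal by summing a joint table over the other variable. The joint table values and the helper name marginal_x are assumptions chosen for the example, not taken from any real dataset.

```python
from collections import defaultdict

# Hypothetical joint distribution P(X=weather, Y=played_outside).
joint = {
    ("sunny", "yes"): 0.375,
    ("sunny", "no"): 0.125,
    ("rainy", "yes"): 0.125,
    ("rainy", "no"): 0.375,
}

def marginal_x(joint):
    """Compute P(X=a) by summing P(X=a, Y=b) over every outcome b of Y."""
    marginal = defaultdict(float)
    for (x, _y), p in joint.items():
        marginal[x] += p
    return dict(marginal)

p_x = marginal_x(joint)

# P(X=sunny) = P(sunny, yes) + P(sunny, no)
print(p_x["sunny"])
```

Note that the outcome of Y never appears in the result: it has been summed out.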
3) Conditional probability:
It is the probability of an event given that another event has occurred.
Denoted as P(X | Y): Read as P(X given Y). It is computed as P(X=a | Y=b) = P(X=a, Y=b) / P(Y=b), provided P(Y=b) > 0.
Predictive modeling is centered on conditional probabilities: we estimate the probability of an output given an input.
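A small sketch of the conditional case, assuming a hypothetical joint table over (weather, played_outside): we keep only the entries where Y takes the observed value and renormalize them by P(Y=b). The helper name conditional_x_given_y is made up for this example.

```python
# Hypothetical joint distribution P(X=weather, Y=played_outside).
joint = {
    ("sunny", "yes"): 0.375,
    ("sunny", "no"): 0.125,
    ("rainy", "yes"): 0.125,
    ("rainy", "no"): 0.375,
}

def conditional_x_given_y(joint, y_value):
    """Compute P(X=a | Y=y_value) = P(X=a, Y=y_value) / P(Y=y_value)."""
    # Marginal probability of the conditioning event.
    p_y = sum(p for (_x, y), p in joint.items() if y == y_value)
    if p_y == 0:
        raise ValueError(f"P(Y={y_value!r}) is zero; the conditional is undefined.")
    # Keep only the entries consistent with Y=y_value and renormalize.
    return {x: p / p_y for (x, y), p in joint.items() if y == y_value}

cond = conditional_x_given_y(joint, "yes")

# P(sunny | yes) = P(sunny, yes) / P(yes)
print(cond["sunny"])
```

The conditional probabilities over X sum to 1 for a fixed value of Y, which is what the division by P(Y=b) guarantees.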
So remember…
When dealing with multiple random variables, there are three main types of probabilities to consider:
Joint Probability: Probability of events X=a and Y=b occurring together.
Marginal Probability: Probability of an event X=a irrespective of the outcome of another random variable Y.
Conditional Probability: Probability of event X=a given that event Y=b has occurred.
👉 Over to you: Can you tell why marginal probability is called “marginal”?
Thanks for reading!