Are You Using Probability and Likelihood Interchangeably?
If yes, don't. Here's the difference.
In data science and statistics, many folks often use “probability” and “likelihood” interchangeably.
However, likelihood and probability DO NOT convey the same meaning.
The mix-up is somewhat understandable, given that the two words carry similar meanings in everyday language.
While writing today’s newsletter, I searched for their meaning in the Cambridge Dictionary.
Here’s what it says:
Probability: the level of possibility of something happening or being true.
Likelihood: the chance that something will happen.
It amused me that “likelihood” is the only synonym of “probability”.
Anyway.
In my opinion, it is crucial to understand that probability and likelihood convey very different meanings in data science and statistics.
Let’s understand!
Probability is used in contexts where you wish to know the possibility/odds of an event.
For instance, what is the:
Probability of obtaining an even number in a die roll?
Probability of drawing an ace of diamonds from a deck?
and so on…
When translated to ML, probability can be thought of as:
What is the probability that a transaction is fraudulent?
What is the probability that an image depicts a cat?
and so on…
Essentially, many classification models, such as logistic regression or a neural network classifier, assign a probability for each label to an input.
When calculating probability, the model’s parameters are known.
Also, we assume that they are trustworthy.
For instance, to determine the probability of a head in a coin toss, we typically assume and trust that the coin is fair.
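To make this concrete, here is a minimal sketch: with a known, trusted parameter (p = 0.5 for a fair coin, an assumption), the probability of any future outcome is fully determined by the binomial formula.

```python
from math import comb

def binomial_prob(k, n, p):
    """Probability of exactly k heads in n tosses of a coin with head-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With known, trusted parameters (p = 0.5), probability answers:
# "How likely is this event?"
print(binomial_prob(5, 10, 0.5))  # probability of exactly 5 heads in 10 tosses
```

Notice that the parameter p never comes into question here; it is taken as given.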
Likelihood, on the other hand, is about explaining events that have already occurred.
Unlike probability (where parameters are known and assumed to be trustworthy)...
…likelihood helps us determine if we can trust the parameters in a model based on the observed data.
Let me elaborate more on that.
Assume you have collected some 2D data and wish to fit a straight line with two parameters: slope (m) and intercept (c).
Here, likelihood is defined as the support provided by a data point for some particular parameter values in your model.
Here, you will ask questions like:
If I model this data with the parameters:
m = 2 and c = 1, what is the likelihood of observing the data?
m = 3 and c = 2, what is the likelihood of observing the data?
and so on…
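Under one common assumption (Gaussian noise around the line, with an assumed noise scale of sigma = 1, and hypothetical data points), comparing parameter candidates by likelihood can be sketched as:

```python
import math

# Hypothetical 2D data points (x, y) for illustration
data = [(0, 1.1), (1, 3.2), (2, 4.9), (3, 7.1)]

def log_likelihood(m, c, points, sigma=1.0):
    """Log-likelihood of the data under y = m*x + c with Gaussian noise of std sigma."""
    ll = 0.0
    for x, y in points:
        residual = y - (m * x + c)
        ll += -0.5 * math.log(2 * math.pi * sigma**2) - residual**2 / (2 * sigma**2)
    return ll

# Higher (log-)likelihood = stronger support from the data for those parameters.
print(log_likelihood(2, 1, data))  # candidate 1: m=2, c=1
print(log_likelihood(3, 2, data))  # candidate 2: m=3, c=2
```

Log-likelihood is used instead of raw likelihood purely for numerical stability; the comparison between candidates is unchanged.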
The above formulation popularly translates into the maximum likelihood estimation (MLE), which we discussed in this newsletter here.
In maximum likelihood estimation, you have some observed data and you are trying to determine the specific set of parameters (θ) that maximizes the likelihood of observing the data.
Using the term “likelihood” is like:
I have a possible explanation for my data. (In the above illustration, “explanation” can be thought of as the parameters you are trying to determine)
How well does my explanation explain what I’ve already observed? This is precisely quantified with likelihood.
For instance:
Observation: The outcomes of 10 coin tosses are “HHHHHHHTHH”.
Explanation: I think it is a fair coin (p=0.5).
How well does this explanation account for the observed data? That is precisely what the likelihood quantifies.
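A minimal sketch of this comparison: compute the likelihood of the observed sequence under the "fair coin" explanation and under an alternative, and note that for independent tosses the maximum-likelihood estimate of p is simply the observed proportion of heads.

```python
def likelihood(p, heads, tails):
    """Likelihood of a specific toss sequence with the given head/tail counts,
    under a coin with head-probability p."""
    return p**heads * (1 - p)**tails

tosses = "HHHHHHHTHH"
heads, tails = tosses.count("H"), tosses.count("T")

print(likelihood(0.5, heads, tails))  # support for the "fair coin" explanation
print(likelihood(0.9, heads, tails))  # support for a heavily biased coin

# The maximum-likelihood estimate is the p with the highest likelihood;
# for independent coin tosses, that is the observed fraction of heads.
p_mle = heads / (heads + tails)
print(p_mle)
```

For this sequence, the biased-coin explanation (p = 0.9) earns a higher likelihood than the fair-coin one, which is exactly what MLE formalizes.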
To summarize…
It is immensely important to understand that in data science and statistics, likelihood and probability DO NOT convey the same meaning.
As explained above, they are pretty different.
In probability:
We determine the possibility of an event.
We know the parameters associated with the event and assume them to be trustworthy.
In likelihood:
We have some observations.
We have an explanation (or parameters).
Likelihood helps us quantify whether the explanation is trustworthy.
Hope that helped!
👉 Over to you: I would love to hear your explanation of probability and likelihood. Feel free to share :)
👉 If you liked this post, don’t forget to leave a like ❤️. It helps more people discover this newsletter on Substack and tells me that you appreciate reading these daily insights.
The button is located towards the bottom of this email.
Thanks for reading!
Latest full articles
If you’re not a full subscriber, here’s what you missed last month:
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning
You Cannot Build Large Data Projects Until You Learn Data Version Control!
Why Bagging is So Ridiculously Effective At Variance Reduction?
Sklearn Models are Not Deployment Friendly! Supercharge Them With Tensor Computations.
Deploy, Version Control, and Manage ML Models Right From Your Jupyter Notebook with Modelbit
Gaussian Mixture Models (GMMs): The Flexible Twin of KMeans.
To receive all full articles and support the Daily Dose of Data Science, consider subscribing:
👉 Tell the world what makes this newsletter special for you by leaving a review here :)
👉 If you love reading this newsletter, feel free to share it with friends!