Loss functions are a vital component of ML algorithms.
They specify the objective an algorithm should aim to optimize during its training.
In other words, loss functions explicitly tell the algorithm what it should minimize to improve its performance.
Therefore, it is crucial to know which loss functions are (typically) best suited to specific ML algorithms.
The visual below depicts the loss functions most commonly used by various ML algorithms.
Linear Regression: Mean Squared Error (MSE). It can be used with or without regularization, depending on the situation.
Why MSE? We covered it here: MSE post.
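To make the MSE objective concrete, here is a minimal NumPy sketch; the parameters w and b and the tiny dataset are hypothetical, purely for illustration:

```python
import numpy as np

# A minimal sketch of the MSE objective for linear regression.
# "w" and "b" are hypothetical parameter names used only for illustration.
def mse_loss(X, y, w, b):
    preds = X @ w + b                 # linear predictions
    return np.mean((y - preds) ** 2)  # average squared residual

# Tiny made-up example
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(mse_loss(X, y, w=np.array([1.5]), b=0.0))  # loss for one candidate fit
```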
Logistic Regression: Cross-entropy loss (log loss), with or without regularization.
Why log loss? We covered its origin here: Why Do We Use log-loss to Train Logistic Regression?
Also, did you know logistic regression can be trained without specifying a learning rate? We covered it here: Why Sklearn’s Logistic Regression Has no Learning Rate Hyperparameter?
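Here is a rough sketch of binary cross-entropy (log loss); the eps clipping is only there to guard against log(0), and the numbers are made up:

```python
import numpy as np

# Binary cross-entropy (log loss) over predicted probabilities of class 1.
def log_loss(y_true, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.6])  # predicted P(class = 1)
print(log_loss(y_true, y_prob))
```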
Decision Tree and Random Forest:
Classification: Gini impurity or information gain (a short sketch follows below).
Regression: Mean Squared Error (MSE).
Further reading on Random Forest: Why Bagging is So Ridiculously Effective At Variance Reduction?
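Here is a quick sketch of the two classification criteria mentioned above, Gini impurity and information gain; the toy labels and the split are made up for illustration:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Information gain of a split = parent entropy - weighted child entropy
def information_gain(parent, left, right):
    n = len(parent)
    return entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)

parent = np.array([0, 0, 1, 1, 1, 0])
left, right = parent[:3], parent[3:]
print(gini(parent), information_gain(parent, left, right))
```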
Support Vector Machines (SVMs): Hinge loss. It penalizes both wrong predictions and correct but low-confidence predictions, which makes it well suited for building max-margin classifiers like SVMs.
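A minimal sketch of the hinge loss (labels in {-1, +1}, scores are raw signed decision values, and the numbers are made up):

```python
import numpy as np

# Hinge loss used by max-margin classifiers: max(0, 1 - y * score).
def hinge_loss(y, scores):
    return np.mean(np.maximum(0, 1 - y * scores))

y = np.array([1, -1, 1])
scores = np.array([2.0, -0.5, 0.3])  # the last one is correct but not confident
print(hinge_loss(y, scores))         # it still incurs a penalty
```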
k-Nearest Neighbors (kNN): No loss function. kNN is a non-parametric, lazy learning algorithm. It works by retrieving instances from the training data and making predictions based on the k nearest neighbors of the test instance.
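For intuition, here is a bare-bones sketch of kNN classification; note there is no training objective, only distance-based retrieval at prediction time (the toy data is made up):

```python
import numpy as np

# kNN prediction: find the k closest training points and take a majority vote.
def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)  # distances to all training points
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]                   # majority vote

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
```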
Naive Bayes: No loss function. Can you answer why?
Neural Networks: They can use a variety of loss functions depending on the type of problem (a short sketch follows this list). The most common ones are:
Regression: Mean Squared Error (MSE).
Classification: Cross-Entropy Loss.
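A small sketch using PyTorch's built-in losses; the framework choice here is just an assumption for illustration, and any deep learning library exposes equivalents:

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()
ce = nn.CrossEntropyLoss()  # expects raw logits + integer class targets

# Regression: predictions vs. continuous targets
print(mse(torch.tensor([2.5, 0.0]), torch.tensor([3.0, -0.5])))

# Classification: logits of shape (batch, classes) vs. class indices
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])
print(ce(logits, targets))
```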
AdaBoost: Exponential loss. AdaBoost is an ensemble learning algorithm that combines multiple weak classifiers to form a strong classifier. In each iteration, AdaBoost increases the weights of the instances misclassified in the previous iteration. Next, it trains a new weak classifier that minimizes the weighted exponential loss. We covered it here in detail: A Visual and Overly Simplified Guide to The AdaBoost Algorithm.
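Here is a simplified sketch of that weighted exponential loss; labels are in {-1, +1}, and the names F and w are just placeholders for the ensemble's current prediction and the sample weights:

```python
import numpy as np

# Weighted exponential loss: sum_i w_i * exp(-y_i * F(x_i)).
def weighted_exponential_loss(y, F, sample_weights):
    return np.sum(sample_weights * np.exp(-y * F))

y = np.array([1, -1, 1, 1])
F = np.array([0.8, -0.3, -0.2, 1.2])  # the third point is misclassified
w = np.full(4, 0.25)                  # uniform weights at the start
print(weighted_exponential_loss(y, F, w))
```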
Other Boosting Algorithms:
Regression: Mean Squared Error (MSE).
Classification: Cross-Entropy Loss.
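As a rough illustration, scikit-learn's gradient boosting estimators expose these objectives directly; the parameter values shown here assume a recent scikit-learn version:

```python
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.datasets import make_classification, make_regression

# Regression with a squared-error (MSE-style) objective
Xr, yr = make_regression(n_samples=100, n_features=5, random_state=0)
reg = GradientBoostingRegressor(loss="squared_error").fit(Xr, yr)

# Classification with a cross-entropy (log loss) objective
Xc, yc = make_classification(n_samples=100, n_features=5, random_state=0)
clf = GradientBoostingClassifier(loss="log_loss").fit(Xc, yc)
```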
👉 Over to you: Can you tell which loss function is used in KMeans?
Thanks for reading!
Whenever you are ready, here’s one more way I can help you:
Every week, I publish 1-2 in-depth deep dives (typically 20+ mins long). Here are some of the latest ones that you will surely like:
[FREE] A Beginner-friendly and Comprehensive Deep Dive on Vector Databases.
A Detailed and Beginner-Friendly Introduction to PyTorch Lightning: The Supercharged PyTorch
You Are Probably Building Inconsistent Classification Models Without Even Realizing
PyTorch Models Are Not Deployment-Friendly! Supercharge Them With TorchScript.
Federated Learning: A Critical Step Towards Privacy-Preserving Machine Learning.
You Cannot Build Large Data Projects Until You Learn Data Version Control!
To receive all full articles and support the Daily Dose of Data Science, consider subscribing:
👉 If you love reading this newsletter, feel free to share it with friends!
👉 Tell the world what makes this newsletter special for you by leaving a review here :)