Lovely, Avi :)

Do you have a summary of when it is best to apply each of these loss functions so that we can apply it to real-world scenarios?

Hi Clinton

If you are doing regression, then in most real-world situations you will find Mean Squared Error and Huber to be the most prevalent. I would recommend reading the Huber regression issue to learn more about when it is useful: https://www.blog.dailydoseofds.com/p/a-simple-technique-to-robustify-linear. Log cosh is also used at times, but it is more expensive to compute. That said, if you are sure that you need a Huber-like loss function but cannot settle on a suitable threshold, Log cosh can be a great choice, since it is roughly quadratic for small errors and roughly linear for large ones without any threshold to tune.
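To make the regression comparison concrete, here is a minimal sketch of the three losses evaluated on a single residual (prediction minus target). The function names and the default `delta=1.0` are my own choices for illustration, not taken from any particular library, and the squared error is scaled by 1/2 so it lines up with Huber.

```python
import math

def squared_error(residual):
    # Half squared error: grows quadratically, so outliers dominate.
    return 0.5 * residual ** 2

def huber(residual, delta=1.0):
    # Quadratic inside the threshold delta, linear outside it.
    # delta is the threshold you have to pick (assumed default: 1.0).
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)

def log_cosh(residual):
    # log(cosh(r)), written in a numerically stable form:
    # log(cosh(r)) = |r| + log(1 + exp(-2|r|)) - log(2)
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

if __name__ == "__main__":
    for r in (0.1, 1.0, 10.0):
        print(f"r={r}: squared={squared_error(r):.4f}, "
              f"huber={huber(r):.4f}, log_cosh={log_cosh(r):.4f}")
```

For a small residual all three are close to r²/2. For a large residual the squared error explodes, while Huber and Log cosh both grow roughly linearly, and Log cosh gets there without a threshold.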

For classification, BCE, Hinge and Cross-entropy cover almost all use cases. KL Divergence is more useful when you want to measure the difference between two distributions and maybe use that difference as a loss function to learn model parameters. This is precisely what we do in t-SNE. You can read this article if you are interested in learning more: https://www.dailydoseofds.com/formulating-and-implementing-the-t-sne-algorithm-from-scratch/
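As a rough illustration of how KL Divergence relates to Cross-entropy, here is a small sketch for discrete distributions. The function names, the `eps` guard against log(0), and the two example distributions are assumptions made for this example only.

```python
import math

def entropy(p):
    # H(p) = -sum p_i * log(p_i), skipping zero-probability terms.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum p_i * log(q_i); eps guards against log(0).
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    p = [0.7, 0.2, 0.1]  # hypothetical target distribution
    q = [0.5, 0.3, 0.2]  # hypothetical model distribution
    print("KL(p || q):", kl_divergence(p, q))
    print("KL(q || p):", kl_divergence(q, p))
    print("H(p, q) - H(p):", cross_entropy(p, q) - entropy(p))
```

Two things to notice: KL Divergence is not symmetric, so KL(p || q) and KL(q || p) differ, and Cross-entropy equals the entropy of p plus KL(p || q). When p is a fixed set of labels, minimizing Cross-entropy and minimizing KL Divergence lead to the same parameters.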