3 Comments
Roman de las Heras

Indeed this is very interesting, thanks for sharing! So when you observe perfectly separable classes, ideally you should switch to SVMs, right? They are effective at handling well-separated classes because they try to find the hyperplane that optimally separates the classes. So the instances near the decision boundary, which are hard for logistic regression to classify, would actually become the support vectors for the SVM.
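A rough sketch of this idea, using a small made-up dataset of perfectly separable classes: fit a linear SVM with scikit-learn's SVC and check which points end up as support vectors.

import numpy as np
from sklearn.svm import SVC

# Hypothetical, perfectly separable 1-D data: class 0 below the gap, class 1 above it.
X = np.array([[0.0], [1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear").fit(X, y)

# The points closest to the decision boundary become the support vectors,
# here the two samples bordering the gap: [2.0] and [4.0].
print(svm.support_vectors_)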

Joe Corliss

This result from scikit-learn may be due to regularization. The loss function is minimized at c = -2.33 and m -> infinity, but sklearn.linear_model.LogisticRegression is regularized by default (C = 1.0), which prevents m from growing too large, so the fit stops at a suboptimal solution. See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
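A quick sketch of the effect, with made-up separable data: under the default C = 1.0 the fitted slope stays finite, while a very large C (weaker regularization) lets it grow toward the unregularized behaviour where m -> infinity.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, perfectly separable 1-D data.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

for C in (1.0, 1e6):
    clf = LogisticRegression(C=C, max_iter=10_000).fit(X, y)
    print(f"C={C:g}: slope m = {clf.coef_[0, 0]:.2f}, intercept c = {clf.intercept_[0]:.2f}")

# With C=1.0 the slope stays modest; with C=1e6 it becomes much larger,
# approaching the unregularized solution where the slope diverges.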

Shouvik

Thanks for sharing! In hindsight, have you ever encountered a real-life dataset with well-separated classes? Curious to hear about it if so!
