Indeed this is very interesting, thanks for sharing! So when you observe perfectly separable classes, it sounds like you should switch to SVMs, right? They handle well-separated classes effectively because they look for the hyperplane that separates the classes with the maximum margin. The instances near the decision boundary that are hard for logistic regression to classify would actually become the support vectors for the SVM.
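To make that concrete, here's a minimal sketch (the toy data is my own, not from the original post) fitting a linear SVM on perfectly separable 1-D classes and inspecting which points become the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated 1-D classes (illustrative data)
X = np.array([[-3.0], [-2.5], [-2.0], [2.0], [2.5], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

# Only the points closest to the decision boundary end up as support vectors
print(clf.support_vectors_)
```

Here only the innermost points (-2 and 2) are retained as support vectors; the rest of the data doesn't affect the boundary at all.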

This result from scikit-learn may be due to regularization. The loss function is minimized at c = -2.33 and m -> infinity, but sklearn.linear_model.LogisticRegression applies L2 regularization by default (C = 1.0), which penalizes large values of m, so the optimizer stops at a finite, suboptimal solution. See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
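You can see this directly by weakening the regularization (larger C) on a separable toy dataset (illustrative data, not the poster's): the fitted slope keeps growing instead of settling at a finite value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Perfectly separable 1-D classes (illustrative data)
X = np.array([[-3.0], [-2.0], [-1.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

for C in (1.0, 100.0, 10000.0):
    m = LogisticRegression(C=C).fit(X, y).coef_[0, 0]
    print(f"C={C:>7}: slope m = {m:.2f}")
# The slope grows with C, consistent with the unregularized
# optimum being m -> infinity on separable data.
```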

Thanks for sharing! Out of curiosity, have you ever encountered a real-life dataset with perfectly separable classes? I'd be curious to hear about it!