3 Comments

Better still is not to impute anything, but rather to leave it up to each model how to treat missing values. For example, distance functions can be customised, and in some cases made asymmetric (e.g. to reflect some aspect of the application domain). Preprocessing the data presupposes downstream purposes, and those might change over time.
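To illustrate, one common convention for a NaN-aware distance is to compare only the dimensions both points have observed, then rescale to the full dimensionality. A minimal sketch (my own illustration, not from the post; the rescaling convention is an assumption):

```python
import numpy as np

def nan_aware_distance(a, b):
    """Euclidean distance over dimensions where BOTH vectors are
    observed, rescaled to the full dimensionality (one common
    convention; an assumption here, not the only choice)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mask = ~(np.isnan(a) | np.isnan(b))
    if not mask.any():
        return np.nan  # no shared observed dimensions
    d = np.sqrt(np.sum((a[mask] - b[mask]) ** 2))
    # Rescale so sparsely observed pairs aren't artificially close.
    return d * np.sqrt(len(a) / mask.sum())

# The missing middle dimension is simply skipped, no imputation needed.
print(nan_aware_distance([1.0, np.nan, 3.0], [2.0, 5.0, 3.0]))
```

A model using a distance like this never needs the data filled in, so the "downstream purpose" question disappears.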


An outlier-tolerant statistic such as the median would be better than the mean, though in your example that would just place the spike in a "better" place.
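The difference is easy to see on a toy feature with one outlier (a sketch of my own, not from the post):

```python
import numpy as np

# A feature with an outlier (100.0) and one missing value.
x = np.array([1.0, 2.0, 3.0, 100.0, np.nan])

mean_fill = np.nanmean(x)      # 26.5: dragged far up by the outlier
median_fill = np.nanmedian(x)  # 2.5: unaffected by the outlier

# Median imputation fills the gap with a value typical of the bulk of
# the data, but (as noted above) the spike itself is still there.
x_imputed = np.where(np.isnan(x), median_fill, x)
print(mean_fill, median_fill, x_imputed)
```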


I've always wanted to try something like this. The iterative approach is interesting. I wonder how much the iteration improves on a single step with a model trained on the non-missing values.

Also, if every feature has some missing data, I guess you can't use MissForest without some modification. Maybe there's another algorithm for this case.
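For what it's worth, scikit-learn's `IterativeImputer` handles the case where every feature has missing values: it starts from a simple initial fill (the mean by default) and then round-robins a regression model over the features, which also makes it easy to compare a single pass against full iteration. A sketch (my toy data, not from the post):

```python
import numpy as np
# IterativeImputer is still experimental, so this enabling import is required.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Every column has at least one missing value.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0],
              [7.0, 2.0, 6.0]])

# Roughly "a single step": one round-robin pass over the features...
single = IterativeImputer(max_iter=1, random_state=0).fit_transform(X)

# ...versus iterating until the imputations stabilise (or max_iter hits).
iterated = IterativeImputer(max_iter=20, random_state=0).fit_transform(X)

print(single)
print(iterated)
```

Comparing `single` and `iterated` on held-out masked values would be one way to measure how much the iteration actually buys you.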
