Discussion about this post

David Esp

Better still is not to impute at all, and instead to leave it to each model to decide how to treat missing values. For example, "distance functions" can be customised to handle missing entries, and in some cases made asymmetric (e.g. to reflect some aspect of the application domain). Preprocessing the data presupposes its downstream purposes, which might change over time.
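
To illustrate the point about customised distance functions, here is a minimal sketch of a distance that tolerates missing values instead of requiring imputation. The helper name `nan_aware_distance` and the rescaling rule are assumptions made for this example, not something specified in the comment: coordinates missing in either vector are skipped, and the result is scaled up by the fraction of coordinates that were actually compared.

```python
import math


def nan_aware_distance(a, b):
    """Euclidean-style distance that ignores coordinates missing (NaN) in either vector.

    Hypothetical helper for illustration only. The sum of squared differences
    over the shared coordinates is rescaled by len(a) / shared, so that pairs
    with fewer shared coordinates are not automatically "closer".
    """
    sq_sum = 0.0
    shared = 0
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            continue  # skip this coordinate rather than imputing a value
        sq_sum += (x - y) ** 2
        shared += 1
    if shared == 0:
        return math.nan  # nothing to compare, so the distance is undefined
    return math.sqrt(sq_sum * len(a) / shared)


# No missing values: plain Euclidean distance.
print(nan_aware_distance([0.0, 0.0], [3.0, 4.0]))
# One missing coordinate: compared on the other two, then rescaled.
print(nan_aware_distance([0.0, 0.0, math.nan], [3.0, 4.0, 1.0]))
```

An asymmetric variant could, for instance, skip a coordinate only when it is missing in the first argument; which rule is appropriate depends on the application domain.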

David Esp

An outlier-tolerant statistic such as the median would be better than the mean, though in your example that would just place the spike in a "better" place.
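
A small sketch of this effect, using made-up numbers (the `values` list and the `impute` helper are illustrative assumptions, not data from the post): with one large outlier, the mean fill lands far from the bulk of the data, while the median fill lands inside it. Either way, every missing entry receives the same value, so a spike appears at the fill value.

```python
import statistics
from collections import Counter

# Made-up sample: three missing entries (None) and one large outlier.
values = [1.0, 2.0, 2.5, 3.0, None, None, None, 100.0]
observed = [v for v in values if v is not None]


def impute(data, fill):
    """Replace every missing entry (None) with a single constant fill value."""
    return [fill if v is None else v for v in data]


mean_fill = statistics.mean(observed)      # pulled towards the outlier
median_fill = statistics.median(observed)  # stays within the bulk of the data

mean_counts = Counter(impute(values, mean_fill))
median_counts = Counter(impute(values, median_fill))

# How many entries now sit exactly at each fill value (the "spike").
print("mean fill:", mean_fill, "count:", mean_counts[mean_fill])
print("median fill:", median_fill, "count:", median_counts[median_fill])
```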
