8 Comments
Nov 11, 2023 · Liked by Avi Chawla

Let's see if I understand this. You say "Simply put, if we randomly shuffle just one feature and everything else stays the same..." Are you saying, you shuffle ONLY one column (feature) in the dataset, correct? This means after the shuffle, the dataset would no longer be valid, since the observations will no longer match the original observations. But I'm thinking this is okay if the "shuffled" observations are only used to determine feature importance, and are not used for modeling. Correct? Please comment.

author

Yes, that's correct, George. We have already trained a model on the non-shuffled dataset, and we then use a shuffled copy to measure how shuffling one specific feature affects the accuracy. We repeat this multiple times per feature to average out the effects of randomness.
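
For readers who want to see the procedure spelled out, here is a minimal sketch of it. It assumes a toy regression dataset, a least-squares fit as a stand-in for the trained model, and R² as a stand-in for accuracy; the names `permutation_importance`, `r2`, and `predict` are illustrative and not taken from the post.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination, used here as the score (an assumption)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Mean drop in score when one column at a time is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()  # every other column stays untouched
            X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
            drops.append(baseline - score(y, predict(X_shuffled)))
        importances[j] = np.mean(drops)  # average out shuffle randomness
    return importances

# Toy data (hypothetical): y depends strongly on column 0,
# weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

# Train once, on the non-shuffled data only.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
predict = lambda A: A @ coef

# The shuffled copies are used for scoring only, never for training.
importances = permutation_importance(predict, X_test, y_test, r2)
print(importances)
```

The model is never refit here: the shuffled rows are invalid observations, as George points out, but they only ever pass through `predict`.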

Got it. Thanks.

I agree. You're absolutely right about that according to quantum theory.

Why not simply leave the observed feature out? Drop it from the dataframe and test the model without it.
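
A rough sketch of that drop-the-column idea, for comparison. A model fitted on all columns generally cannot predict from fewer columns, so this version retrains once per feature. The toy data, the least-squares model, and the helper `fit_and_score` are assumptions made for illustration only.

```python
import numpy as np

def fit_and_score(X_train, y_train, X_test, y_test):
    """Fit least squares on the training split, return R^2 on the test split."""
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_test - X_test @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# Toy data (hypothetical): column 0 matters most, column 2 not at all.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

baseline = fit_and_score(X_train, y_train, X_test, y_test)

drop_importance = []
for j in range(X.shape[1]):
    keep = [k for k in range(X.shape[1]) if k != j]
    # Retrain from scratch without column j, then compare with the baseline.
    score_without = fit_and_score(X_train[:, keep], y_train,
                                  X_test[:, keep], y_test)
    drop_importance.append(baseline - score_without)

print(drop_importance)
```

The trade-off in this sketch is one extra training run per feature, whereas shuffling reuses the model that has already been trained.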

Very interesting. How would this work (or would it) with time series data? Thanks!

Basically the same. Of course, you cannot shuffle the variable that holds the time information. But I think you are talking about multivariate time series, where we have a time variable t and a feature matrix X, so you can apply the same technique: if a feature is important, the model will not be able to predict y at time t_n as well as before.
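
A small sketch of what that could look like, under assumptions: a synthetic two-feature series, a chronological train/test split, a least-squares model, and mean squared error as the metric. The variable names (`x_a`, `x_b`, `split`) are made up for the example. The time index `t` is never shuffled; only one feature column at a time is permuted within the held-out window.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 600
t = np.arange(n)                                   # time index, never shuffled
x_a = np.sin(t / 20.0) + rng.normal(scale=0.1, size=n)
x_b = rng.normal(size=n)                           # y does not depend on this
y = 2.0 * x_a + rng.normal(scale=0.1, size=n)
X = np.column_stack([x_a, x_b])

split = 450                                        # train on the past only
coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)

def mse(A, target):
    """Mean squared error of the fitted model on features A."""
    return float(np.mean((A @ coef - target) ** 2))

X_test, y_test = X[split:], y[split:]
baseline = mse(X_test, y_test)

increase = []
for j in range(X.shape[1]):
    errors = []
    for _ in range(20):
        X_shuffled = X_test.copy()
        # Break the link between feature j and y at each time step t_n.
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        errors.append(mse(X_shuffled, y_test))
    increase.append(np.mean(errors) - baseline)    # error increase per feature

print(increase)
```

If the model used lagged copies of a feature, each lag column would presumably need the same treatment, which this sketch does not cover.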

Makes sense. Thanks!