There is a slightly improved version of the Probe Method.
The original one has one problem.
If you have a thousand features and rerun the Probe Method a couple of times, you'll get a different number of useful features each time, because there is some randomness in the process. So you can insert not one but, say, 3-5 noise features and drop only the features that score below the worst (least important) of them. That is the least aggressive, least greedy variant possible.
Then repeat, as in the article.
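A rough sketch of the idea, in case it helps. Everything here is an assumption made for illustration: the function names `abs_corr`, `probe_select` and `probe_repeat` are made up, and absolute Pearson correlation is only a stand-in for whatever importance score your model actually gives you (tree importances, permutation importance, and so on).

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return abs(cov) / (vx * vy) ** 0.5


def probe_select(features, target, n_probes=5, seed=0):
    """Keep features that score at least as high as the weakest noise probe."""
    rng = random.Random(seed)
    n = len(target)
    probe_scores = [
        abs_corr([rng.gauss(0, 1) for _ in range(n)], target)
        for _ in range(n_probes)
    ]
    threshold = min(probe_scores)  # the "worst" probe gives the gentlest cut
    return {name: col for name, col in features.items()
            if abs_corr(col, target) >= threshold}


def probe_repeat(features, target, n_probes=5, max_rounds=10, seed=0):
    """Repeat the selection with fresh probes until nothing more is dropped."""
    for r in range(max_rounds):
        kept = probe_select(features, target, n_probes, seed + r)
        if len(kept) == len(features):
            break
        features = kept
    return features
```

Here `features` is a plain dict of column name to list of floats. Swapping `min` for `max` in `probe_select` would give the most aggressive variant instead.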
Pretty interesting, Sergey. I haven't heard of this before but of course, it makes total sense to me. Let me try this out on one of my models :)
Thanks so much!
It's a pleasure to give something valuable back to you! :)
I like your articles very much! I find the same top quality here as in the materials from 3Blue1Brown, StatQuest and Andrew Ng!
What happens if the random feature happens to be a very important feature? Should the engineer make sure that the random feature is far away from some existing feature?
The last time my team tried this, the importance of the random feature was zero :/
Maybe what you can do to ameliorate this is to take an actual feature and blend it with random noise as a weighted average (`rand_f = some_f*K + (1-K)*rand_noise`). This way, you can also control the degree of noise. Just a thought.
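Something like this, as a minimal sketch. The function name `blended_probe` is made up, and drawing the noise with the same mean and spread as the feature is my own assumption, so the two terms are on a comparable scale.

```python
import random


def blended_probe(some_f, k, seed=0):
    """Return rand_f = some_f*K + (1-K)*rand_noise for a list of floats.

    k=1.0 gives back the original feature, k=0.0 gives pure noise.
    """
    rng = random.Random(seed)
    n = len(some_f)
    mean = sum(some_f) / n
    # Assumption: match the noise scale to the feature; fall back to 1.0
    # if the feature is constant.
    std = (sum((v - mean) ** 2 for v in some_f) / n) ** 0.5 or 1.0
    return [k * v + (1 - k) * rng.gauss(mean, std) for v in some_f]
```

Lower values of `k` push the probe closer to pure noise, and higher values keep it closer to the original feature.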
I like this because you're making sure the random feature is not a known feature.