7 Comments
Sergey Skripko

There is a slightly improved version of the Probe Method.

The original one has a problem.

If you have a thousand features and rerun the Probe Method a couple of times, you'll get a different number of useful features each time, because there is some randomness in the process. So instead of inserting one noise feature, you can insert, say, 3-5 of them and drop only the features that rank below the worst of them. That is the least aggressive and least greedy approach possible.

Then repeat, as in the article.
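
A minimal sketch of the multi-probe idea above, under stated assumptions: the function names (`probe_select`, `abs_corr_importance`) and parameters are hypothetical, and absolute Pearson correlation with the target is only a stand-in for whatever importance score your model provides (for example, a tree ensemble's feature importances). Reading "the worst of them" as the lowest-scoring noise feature is also an interpretation of the comment, not something the article confirms.

```python
import numpy as np

def abs_corr_importance(X, y):
    """Stand-in importance (assumption): |Pearson correlation| of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Guard against constant columns, which would give a zero denominator.
    return np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)

def probe_select(X, y, n_probes=3, max_rounds=10,
                 importance_fn=abs_corr_importance, seed=0):
    """Return indices of columns of X that outrank the weakest noise probe."""
    rng = np.random.default_rng(seed)
    keep = np.arange(X.shape[1])
    for _ in range(max_rounds):
        # Append several pure-noise columns instead of a single one.
        probes = rng.normal(size=(X.shape[0], n_probes))
        scores = importance_fn(np.hstack([X[:, keep], probes]), y)
        real, noise = scores[:len(keep)], scores[len(keep):]
        # Least aggressive cut-off: the lowest-scoring probe.
        threshold = noise.min()
        survivors = keep[real > threshold]
        if len(survivors) == len(keep):
            break  # nothing was dropped this round, so stop repeating
        keep = survivors
    return keep
```

Swapping `noise.min()` for `noise.max()` or `noise.mean()` would give a more aggressive cut, which is the trade-off the comment is pointing at.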

Avi Chawla

Pretty interesting, Sergey. I haven't heard of this before, but of course it makes total sense to me. Let me try this out on one of my models :)

Thanks so much!

Sergey Skripko

It's a pleasure to give something valuable back to you! :)

I like your articles very much! I find the same top-quality material here as from 3Blue1Brown, StatQuest, and Andrew Ng!

Adam

What happens if the random feature happens to be a very important feature? Should the engineer make sure that the random feature is far away from some existing feature?

Joe Corliss

The last time my team tried this, the importance of the random feature was zero :/

Omar AlSuwaidi

Maybe what you can do to ameliorate this is take an actual feature and blend it with random noise as a weighted average (`rand_f = some_f*K + (1-K)*rand_noise`). This way, you can also control the degree of noise. Just a thought.
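
A rough sketch of that blend, assuming `K` lies in [0, 1] and the chosen feature is numeric and not constant. The function name `blended_probe` is hypothetical, and standardizing the feature first is an added assumption so that `K` means roughly the same thing regardless of the feature's scale.

```python
import numpy as np

def blended_probe(feature, k, rng):
    """Mix a real feature with Gaussian noise: k * feature + (1 - k) * noise."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be in [0, 1]")
    # Assumption: put the feature on the same scale as the unit-variance noise.
    z = (feature - feature.mean()) / feature.std()
    noise = rng.normal(size=feature.shape)
    return k * z + (1 - k) * noise
```

With `k=0` this reduces to a pure-noise probe, and with `k=1` it is just the (standardized) original feature, so `k` directly controls how much signal the probe carries.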

Adam

I like this because you're making sure the random feature is not a known feature.
