
There are 2^n ways to select a subset of the original n trees in the random forest, so picking the best-scoring subset by its test-set performance may overfit to the test set. You may want to use a separate validation set to select the top k trees and then evaluate that selection once on a holdout test set.
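Here is a minimal sketch of that protocol in plain Python, just to make the split of roles concrete. It assumes you already have each tree's predictions on the validation and test sets as lists (with scikit-learn these could come from the fitted forest's `estimators_`, though that is not shown here). The helper names `select_top_k` and `vote_accuracy` and the toy data are made up for illustration.

```python
from collections import Counter


def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def select_top_k(val_preds, y_val, k):
    """Indices of the k trees with the highest validation accuracy."""
    ranked = sorted(
        range(len(val_preds)),
        key=lambda i: accuracy(val_preds[i], y_val),
        reverse=True,
    )
    return ranked[:k]


def vote_accuracy(preds, truth, chosen):
    """Accuracy of the majority vote over the chosen trees."""
    voted = [
        Counter(preds[i][j] for i in chosen).most_common(1)[0][0]
        for j in range(len(truth))
    ]
    return accuracy(voted, truth)


# Toy data (hypothetical): 4 trees, 4 validation samples, 4 test samples.
y_val = [0, 1, 1, 0]
val_preds = [
    [0, 1, 1, 0],  # tree 0: 4/4 correct on validation
    [0, 1, 0, 0],  # tree 1: 3/4
    [1, 0, 1, 0],  # tree 2: 2/4
    [0, 1, 1, 1],  # tree 3: 3/4
]
y_test = [1, 0, 0, 1]
test_preds = [
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
]

# Selection looks only at the validation set.
chosen = select_top_k(val_preds, y_val, k=3)
# The test set is used once, after the selection is fixed.
test_acc = vote_accuracy(test_preds, y_test, chosen)
print(chosen, test_acc)
```

The point is that `y_test` never influences which trees are kept, so the reported test accuracy is not biased by the search over subsets.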
