A couple of ideas:
1. Is it possible that those top-k trees will be highly correlated with each other? I mean, will their top predictors and root nodes look similar? From that perspective, wouldn't it be more efficient to take the top-k trees with some step, like every 3rd tree in the ranking, to reduce this effect? Have you checked it?
2. After picking the top-k trees, we could improve the metric even further by calculating the residuals of the top-k ensemble and fitting xgboost (or any other boosting model) to those residuals.
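To make idea 2 concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the toy data, the helper names (`mean_prediction`, `fit_stump`, `boost_residuals`), and the one-feature decision stumps, which stand in for xgboost or any other booster. The top-k trees are represented only by their precomputed predictions.

```python
def mse(y, pred):
    """Mean squared error between targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)

def mean_prediction(tree_preds):
    """Average the per-tree predictions sample by sample."""
    k = len(tree_preds)
    return [sum(col) / k for col in zip(*tree_preds)]

def fit_stump(x, resid):
    """Find the single threshold on x that best fits resid (least squares)."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the max so both sides are non-empty
        left = [r for xi, r in zip(x, resid) if xi <= t]
        right = [r for xi, r in zip(x, resid) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def boost_residuals(x, y, base, rounds=20, lr=0.5):
    """Start from the top-k ensemble prediction and boost on what is left over."""
    pred = list(base)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, lm, rm))
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return stumps, pred

# Toy data (hypothetical): one feature, and two "top-k" trees given by their predictions.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
tree_preds = [
    [1.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5, 2.5],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]

base = mean_prediction(tree_preds)
stumps, boosted = boost_residuals(x, y, base)
base_mse = mse(y, base)
boosted_mse = mse(y, boosted)
print(base_mse, boosted_mse)
```

On training data the boosted error can only go down, so whether this actually helps would need to be checked on a held-out set.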
I agree with your points. I was thinking along the same lines. I was wondering whether the procedure should be to re-sort after adding each tree, so that each tree you add is the best one in combination with the prior k-1. I think this ends up being some kind of hybrid of bagging and boosting, since you are then biased towards adding trees that do better on the cases you were not handling well so far. I’d wonder whether these kinds of techniques are reliably more effective than just running plain boosting, with some tuning to figure out a good small k.
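A minimal sketch of that re-sorting idea, assuming the ensemble averages its trees and that each tree is available as a list of predictions on a validation set. The function names and the toy numbers are made up for illustration. It compares picking the k individually best trees with greedily adding whichever tree most improves the ensemble built so far.

```python
def mse(y, pred):
    """Mean squared error between targets and predictions."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)

def ensemble_mse(tree_preds, chosen, y):
    """Error of the plain average of the chosen trees."""
    k = len(chosen)
    avg = [sum(tree_preds[i][j] for i in chosen) / k for j in range(len(y))]
    return mse(y, avg)

def top_k_individual(tree_preds, y, k):
    """Rank trees by their own error and keep the first k."""
    ranked = sorted(range(len(tree_preds)), key=lambda i: mse(y, tree_preds[i]))
    return ranked[:k]

def greedy_select(tree_preds, y, k):
    """Add, one at a time, the tree that gives the best ensemble with those already chosen."""
    chosen = []
    remaining = set(range(len(tree_preds)))
    while remaining and len(chosen) < k:
        best = min(
            sorted(remaining),  # sorted for deterministic tie-breaking
            key=lambda i: ensemble_mse(tree_preds, chosen + [i], y),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy validation set (hypothetical): trees 0 and 1 are the best individually
# but err in the same direction; tree 2 is worse alone but errs the other way.
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [
    [2.0, 3.0, 4.0, 5.0],   # tree 0: overshoots by 1.0
    [2.0, 3.0, 4.0, 5.1],   # tree 1: nearly identical to tree 0
    [-0.2, 0.8, 1.8, 2.8],  # tree 2: undershoots by 1.2
    [3.0, 4.0, 5.0, 6.0],   # tree 3: overshoots by 2.0
]

top = top_k_individual(tree_preds, y, 2)
greedy = greedy_select(tree_preds, y, 2)
top_mse = ensemble_mse(tree_preds, top, y)
greedy_mse = ensemble_mse(tree_preds, greedy, y)
print(top, top_mse)
print(greedy, greedy_mse)
```

In this toy case the greedy pass skips the near-duplicate tree in favour of one whose errors cancel against the first pick, which is the correlation concern from point 1. It says nothing about whether this beats plain boosting on real data.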