“In this case, the dataset overlap between any two trees is expected to be huge compared to the typical random forest.”
Is this a typo, or did I misunderstand? In a batching context, isn’t the batch size normally much smaller than the whole dataset? And wouldn’t that imply minimal overlap in datasets between trees compared to a typical random forest? I agree, though, that this would aid the bagging objective and reduce bias.
I am sorry, I made a mistake there, Joseph. I meant to write "is NOT expected to be huge."
Thanks so much for pointing that out. Correcting it right away.
That makes sense, glad to help! Keep up the great writing!