Train Classical ML Models on Large Datasets

Apr 18, 2024

Extending the Bagging objective.

3 Comments

“In this case, the dataset overlap between any two trees is expected to be huge compared to the typical random forest.”

Is this a typo, or did I misunderstand? In a batching context, isn’t the batch size normally much smaller than the whole dataset? And wouldn’t that imply minimal overlap in datasets between trees compared to a typical random forest? I agree, though, that this would aid the bagging objective and reduce bias.


I am sorry, I made a mistake there, Joseph. I meant to write "is NOT expected to be huge."

Thanks so much for pointing that out. Correcting it right away.


That makes sense, glad to help! Keep up the great writing!
