5 Comments
Nghia Dang

I wonder why we should add up all predicted values of all boosted models to get the final y.

Avi Chawla

Since every subsequent model is built *on top of* the residual left by the previous models, the final prediction will be a sum of all the individual predictions.

A simple example would look like this:

- Say you train the first model on (x=5, y=10), but so far, the model can only predict (y=8) for x=5. Some part of y is yet to be learned.

- Thus, the second model will be trained on (x=5, y=2), but let's say this model is only able to predict (y=1) for x=5. Again, some part of y is yet to be learned.

- The third model will be trained on (x=5, y=1), and now consider it fits well — it predicts y=1 for x=5.

Now, to get the final prediction for x=5, don't you need to add all of them?

- 8 from the first model

- 1 from the second model

- 1 from the third model.
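The steps above can be sketched as a short loop. The numbers are the hypothetical ones from the example, with each "model" reduced to the constant it predicts for x=5:

```python
# Toy sketch of sequential residual fitting (numbers from the example above).
# Each "model" is just the constant it learned to predict for x=5.
y_true = 10

model_preds = []            # prediction of each boosted model at x=5
residual = y_true           # what the next model is trained to predict
for pred in (8, 1, 1):      # models 1, 2, 3 from the example
    model_preds.append(pred)
    residual -= pred        # the leftover becomes the next model's target

final_prediction = sum(model_preds)
print(final_prediction)     # 8 + 1 + 1 = 10
print(residual)             # 0 — nothing left for a fourth model to learn
```

Because each model was trained on what the previous ones failed to predict, only the sum of all predictions recovers the original target.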

Nghia Dang

How would the process change if we have classification tasks?

Richard Tang

Why is it necessary to use a regression model? Why can't any model be improved with boosting?
