
I wonder why we should add up the predicted values of all the boosted models to get the final y.

Since every subsequent model is built *on top of* the residual left by the previous models, the final prediction will be a sum of all the individual predictions.

A simple example would look like this:

- Say you train the first model on (x=5, y=10), but so far, the model can only predict y=8 for x=5. Some part of y is yet to be learned.

- Thus, the second model is trained on the residual (x=5, y=2), but let's say this model can only predict y=1 for x=5. Again, some part of y is yet to be learned.

- The third model is trained on the remaining residual (x=5, y=1), and now consider that it fits well: it predicts y=1 for x=5.

Now, to get the final prediction for x=5, you add all of them:

- 8 from the first model

- 1 from the second model

- 1 from the third model.

That gives 8 + 1 + 1 = 10, the true value of y.
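The three-step idea above can be sketched in a few lines of code. This is a minimal illustration, not a full gradient boosting implementation: the toy data is made up, and scikit-learn decision stumps stand in for the weak learners.

```python
# A minimal sketch of boosting on residuals, using hypothetical toy data
# and decision stumps (depth-1 trees) as the weak learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

models = []
residual = y.copy()
for _ in range(3):
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residual)          # each model fits what is still unexplained
    models.append(stump)
    # whatever this stump failed to capture is left for the next model
    residual = residual - stump.predict(X)

# The final prediction is the sum of every model's prediction,
# exactly like 8 + 1 + 1 in the worked example above.
final = sum(m.predict(X) for m in models)
```

Each round shrinks the residual, so the summed prediction gets closer to y than any single weak learner could on its own.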

How would the process change for classification tasks?

We covered it here: https://blog.dailydoseofds.com/p/a-visual-guide-to-adaboost

Why must it be a regression model? Why can’t any model be improved with boosting?