Why do we add up the predicted values of all the boosted models to get the final y?
Since every subsequent model is built *on top of* the residual left by the previous models, the final prediction will be a sum of all the individual predictions.
A simple example would look like this:
- Say you train the first model on (x=5, y=10), but so far, the model can only predict y=8 for x=5. Some part of y is yet to be learned.
- Thus, the second model will be trained on the residual (x=5, y=2), but let's say this model is only able to predict y=1 for x=5. Again, some part of y is yet to be learned.
- The third model will be trained on the remaining residual (x=5, y=1), and now suppose it fits well: it predicts y=1 for x=5.
Now to get the final prediction for x=5, don't you need to add all of them?
- 8 from the first model
- 1 from the second model
- 1 from the third model.

That gives 8 + 1 + 1 = 10, which recovers the true target.
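To make the additive logic concrete, here is a minimal sketch of manual boosting with scikit-learn decision trees. The toy data, the number of rounds, and the `predict` helper are illustrative assumptions, not part of the example above; the point is simply that each tree is fit on the residual left by the previous trees, and the final prediction sums every tree's output.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative assumption, not from the example above)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

models = []
residual = y.copy()          # the first model is trained on the raw targets

for _ in range(3):           # three boosting rounds, as in the example above
    tree = DecisionTreeRegressor(max_depth=1)   # a weak learner
    tree.fit(X, residual)
    models.append(tree)
    residual = residual - tree.predict(X)       # the part of y "yet to be learned"

def predict(X_new):
    # Final prediction = sum of the predictions of all boosted models
    return sum(m.predict(X_new) for m in models)

print(predict(np.array([[5.0]])))   # gets closer to 10 as more rounds are added
```

In practice, gradient boosting implementations also scale each tree's contribution by a learning rate, but the additive structure is exactly the same: the ensemble's output is the sum of all the individual predictions.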
How would the process change if we have classification tasks?
We covered it here: https://blog.dailydoseofds.com/p/a-visual-guide-to-adaboost
Why must it be a regression model? Why can't any model be improved with boosting?