4 Comments
Mar 17 · Liked by Avi Chawla

Isn't the same thing used in Adam, along with some other parameters?

author

Yes, it maintains an exponentially decaying average of the previously computed gradients.
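
For illustration, here is a minimal sketch of that running average, which corresponds to Adam's first-moment estimate. It assumes scalar gradients; the function name `ema_of_gradients` and the sample inputs are hypothetical, and `beta1=0.9` is simply the commonly used default rather than anything stated in the article.

```python
def ema_of_gradients(grads, beta1=0.9):
    """Return the bias-corrected exponentially decaying average after each step.

    grads: iterable of scalar gradients (illustrative input, not from the article).
    beta1: decay rate; 0.9 is the commonly used default for Adam.
    """
    m = 0.0  # running average, initialised at zero
    averages = []
    for t, g in enumerate(grads, start=1):
        # Older gradients are down-weighted by beta1 at every step.
        m = beta1 * m + (1 - beta1) * g
        # Bias correction offsets the zero initialisation in early steps.
        averages.append(m / (1 - beta1 ** t))
    return averages


if __name__ == "__main__":
    print(ema_of_gradients([1.0, 0.5, -0.2, 0.3]))
```

Adam additionally keeps a similar decaying average of the squared gradients and uses it to scale the step size per parameter, which is where the extra parameters come in.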

Feb 23 · Liked by Avi Chawla

Great

Feb 22 · Liked by Avi Chawla

It is a very clear and useful article. Thank you.
