anon_bwna

Why Momentum Really Works. The math of gradient descent with momentum.
https://distill.pub/2017/momentum/ posted 2y ago with 3 replies

I keep coming back to this classic every time I end up thinking about gradient descent optimizers. Unfortunately we're still left with this pesky "learning rate" parameter that has to be set empirically by what causes convergence vs divergence.... 2y ago
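
To make the convergence vs divergence point concrete, here is a minimal sketch, assuming the simplest possible setting: heavy-ball momentum on a one-dimensional quadratic f(w) = 0.5 * lam * w**2. The function name `momentum_descent` and the parameter names `alpha` (learning rate), `beta` (momentum), and `lam` (curvature) are chosen for illustration only. In this toy case the stable range can be worked out by hand: the iteration converges when 0 < alpha * lam < 2 + 2 * beta. Real loss surfaces are not this tidy, which is roughly why the learning rate ends up being tuned empirically.

```python
def momentum_descent(alpha, beta, lam=1.0, w0=1.0, steps=200):
    """Run heavy-ball momentum on f(w) = 0.5 * lam * w**2.

    Returns |w| after `steps` updates. All names here are illustrative.
    """
    w, z = w0, 0.0
    for _ in range(steps):
        grad = lam * w          # gradient of the quadratic at w
        z = beta * z + grad     # accumulate the velocity term
        w = w - alpha * z       # step against the accumulated gradient
    return abs(w)


if __name__ == "__main__":
    beta = 0.9
    # For lam = 1 and beta = 0.9 the stability bound is alpha < 2 + 2 * beta = 3.8,
    # so the last value below sits just outside the stable range.
    for alpha in (0.1, 1.0, 3.5, 4.0):
        final = momentum_descent(alpha, beta)
        print(f"alpha={alpha}: |w| after 200 steps = {final:.3e}")
```

With beta = 0 the same function reduces to plain gradient descent, where the bound shrinks to alpha * lam < 2, so momentum widens the range of usable learning rates in this toy problem but does not remove the need to pick one.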

Do you know if there is any predictability to the effect? I would guess it's related to catastrophic forgetting aka the reason stochastic gradient descent has to be stochastic. Basically if you update on one set of evidence without locking in those learnin... 2y ago
