Why Momentum Really Works. The math of gradient descent with momentum.
https://distill.pub/2017/momentum/
posted 1y ago with 3 replies, received 4.8

I keep coming back to this classic every time I end up thinking about gradient descent optimizers. Unfortunately we're still left with this pesky "learning rate" parameter that has to be set empirically by what causes convergence vs divergence....
1y ago, received 2.9

Do you know if there is any predictability to the effect? I would guess it's related to catastrophic forgetting aka the reason stochastic gradient descent has to be stochastic. Basically if you update on one set of evidence without locking in those learnin...
14mo ago, received 2.9
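On the "convergence vs divergence" point: for a plain quadratic the cutoff is at least predictable. Here is a minimal sketch, assuming a one-dimensional objective f(w) = 0.5 * lam * w**2 and the heavy-ball update z ← beta*z + grad, w ← w − alpha*z. In that setting the iteration is stable when 0 < alpha*lam < 2*(1 + beta). The function name and the constants are made up for illustration, and none of this says anything about non-quadratic losses.

```python
def momentum_descent(lam, alpha, beta, steps=500, w0=1.0):
    """Run heavy-ball momentum on f(w) = 0.5 * lam * w**2 and return the final |w|."""
    w, z = w0, 0.0
    for _ in range(steps):
        grad = lam * w          # gradient of the quadratic at w
        z = beta * z + grad     # accumulate velocity
        w = w - alpha * z       # take the step
    return abs(w)


# Illustrative values only (assumptions, not taken from the thread).
lam, beta = 1.0, 0.9
threshold = 2 * (1 + beta) / lam   # stability limit on alpha for this quadratic

converged = momentum_descent(lam, alpha=0.9 * threshold, beta=beta)
diverged = momentum_descent(lam, alpha=1.1 * threshold, beta=beta)

print(f"alpha just below the limit: |w| = {converged:.3e}")
print(f"alpha just above the limit: |w| = {diverged:.3e}")
```

With more than one dimension the same condition has to hold for the largest curvature lam, which is usually unknown for a real model, so in practice the learning rate still ends up being tuned by trial.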