in thread "Why Momentum Really Works. The math of gradient descent with momentum.": I cannot yet post new threads, but in the same broad space as " sensitivity to empirically determined learning rate parameters", I recently learned about "curriculum learning", in which the particular order of ingest of training data affects model performa... 12mo ago (collapse hidden) 11 I cannot yet post ne (view hidden) 11