Momentum can accelerate learning on problems where the high-dimensional “weight space” navigated by the optimization process contains structures that mislead the gradient descent algorithm, such as flat regions or steep curvature. The method of momentum is designed to accelerate learni...
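To make the "steep curvature" case concrete, here is a minimal sketch built on a hypothetical two-dimensional quadratic bowl that is shallow along one axis and 100 times steeper along the other. The helpers `loss`, `grad`, and `descend`, the curvatures, the learning rate, and the momentum coefficient of 0.9 are all illustrative assumptions, not taken from the text. The update shown is classical (heavy-ball) momentum, one common formulation: the steep axis caps the usable learning rate, so plain gradient descent crawls along the shallow axis, while the accumulated "change" keeps pushing in the consistent downhill direction.

```python
# A minimal sketch: gradient descent with and without classical (heavy-ball)
# momentum on an ill-conditioned quadratic bowl. The objective, curvatures,
# learning rate, and momentum coefficient are illustrative assumptions.


def loss(x, curvatures):
    """f(x) = 0.5 * sum(h_i * x_i^2): shallow along h=1, steep along h=100."""
    return 0.5 * sum(h * xi * xi for h, xi in zip(curvatures, x))


def grad(x, curvatures):
    """Gradient of the quadratic: df/dx_i = h_i * x_i."""
    return [h * xi for h, xi in zip(curvatures, x)]


def descend(x0, curvatures, lr, momentum=0.0, steps=100):
    """Run gradient descent; momentum=0.0 recovers plain gradient descent.

    The update keeps a running "change" vector:
        change = lr * grad + momentum * change
        x = x - change
    so earlier steps keep contributing along consistent directions.
    """
    x = list(x0)
    change = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x, curvatures)
        change = [lr * gi + momentum * ci for gi, ci in zip(g, change)]
        x = [xi - ci for xi, ci in zip(x, change)]
    return x


curvatures = [1.0, 100.0]  # condition number of 100
x0 = [1.0, 1.0]
lr = 0.018  # just below the plain-GD stability limit 2/100 set by the steep axis

plain = descend(x0, curvatures, lr, momentum=0.0)
heavy = descend(x0, curvatures, lr, momentum=0.9)

plain_loss = loss(plain, curvatures)
momentum_loss = loss(heavy, curvatures)
print(f"plain GD loss:      {plain_loss:.2e}")
print(f"with momentum loss: {momentum_loss:.2e}")
```

With these assumed settings, plain gradient descent shrinks the shallow coordinate by only a factor of 0.982 per step, so after 100 steps its loss should stay around 1e-2, while the momentum run should finish several times lower. Setting `momentum=0.0` in `descend` is a convenient way to compare both methods with identical code.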