Gradient Descent

An implementation and comparison of the main gradient descent (GD) modifications:

- Gradient Descent
- Stochastic GD
- SGD with Momentum
- Mini-batch GD
- Mini-batch with Momentum
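As a rough illustration of how the five variants listed above relate to each other, here is a minimal NumPy sketch. It is not the repository's code: the function name `gradient_descent`, its parameters (`lr`, `epochs`, `batch_size`, `momentum`, `seed`), the linear least-squares objective, the synthetic noise-free data, and all hyperparameter values are assumptions chosen for the example. The sketch treats the variants as differing only in batch size and in whether a classical (heavy-ball) momentum term is used.

```python
import numpy as np


def gradient_descent(X, y, lr=0.05, epochs=200, batch_size=None,
                     momentum=0.0, seed=0):
    """Fit y ~ X @ w by minimizing mean squared error (hypothetical helper).

    batch_size=None -> full-batch GD, 1 -> stochastic GD, k -> mini-batch GD.
    momentum > 0 adds a classical (heavy-ball) velocity term.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    batch_size = n if batch_size is None else batch_size
    w = np.zeros(d)  # model weights
    v = np.zeros(d)  # velocity, stays zero-driven when momentum == 0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = 2.0 * X[idx].T @ residual / len(idx)  # MSE gradient on the batch
            v = momentum * v - lr * grad
            w = w + v
    return w


# Synthetic, noise-free regression problem (assumed for illustration only).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w

# One configuration per variant; learning rates are illustrative guesses.
variants = {
    "Gradient Descent": dict(lr=0.05),
    "Stochastic GD": dict(lr=0.01, batch_size=1),
    "SGD with Momentum": dict(lr=0.005, batch_size=1, momentum=0.9),
    "Mini-batch GD": dict(lr=0.05, batch_size=32),
    "Mini-batch with Momentum": dict(lr=0.02, batch_size=32, momentum=0.9),
}

results = {name: gradient_descent(X, y, **cfg) for name, cfg in variants.items()}
for name, w in results.items():
    print(f"{name:26s} w = {w}")
```

Keeping a single update loop makes the comparison direct: every variant sees the same data and objective, so differences in the fitted weights or in convergence speed come only from `batch_size`, `momentum`, and `lr`.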