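The last part of this formula explains the name of this technique: at every step, each weight is multiplied by a factor slightly smaller than 1 (one minus the learning rate times the weight decay coefficient), so it decays toward zero before the gradient step is applied.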
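Here is one way to see where the decay factor comes from, as a short derivation. The notation is our own assumption and may differ from the formula above: $\text{lr}$ is the learning rate, $\text{wd}$ the weight decay coefficient, and $L$ the loss without any penalty. Factoring $w_t$ out of the plain SGD update with weight decay gives the multiplicative factor. Adding an L2 penalty $\tfrac{\text{wd}}{2}\lVert w\rVert^2$ to the loss contributes $\text{wd}\cdot w_t$ to the gradient, which produces exactly the same update:

```latex
w_{t+1} = w_t - \text{lr}\,\nabla L(w_t) - \text{lr}\cdot\text{wd}\cdot w_t
        = (1 - \text{lr}\cdot\text{wd})\, w_t - \text{lr}\,\nabla L(w_t)

% L2 penalty folded into the loss instead:
w_{t+1} = w_t - \text{lr}\,\bigl(\nabla L(w_t) + \text{wd}\cdot w_t\bigr)
        = (1 - \text{lr}\cdot\text{wd})\, w_t - \text{lr}\,\nabla L(w_t)
```

For instance, with $\text{lr} = 0.1$ and $\text{wd} = 0.01$, every weight is scaled by $0.999$ at each step before the gradient is subtracted.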

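For SGD, those two formulas are equivalent. However, this equivalence only holds for standard SGD. As we have seen, momentum, RMSProp and Adam all transform the gradient further (with running averages or per-parameter scaling) before applying it. The penalty term added to the gradient is therefore transformed too, instead of simply shrinking each weight by a fixed factor.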
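To check this claim numerically, here is a minimal sketch on a single scalar weight. It assumes a toy loss `0.5 * (w - 3)**2` and uses hypothetical helper names (`sgd_l2`, `sgd_wd`, `adam`) chosen only for illustration. For plain SGD, folding the penalty into the gradient and decaying the weight directly give the same result up to floating-point rounding. For Adam, the two variants diverge from the very first step, because the penalty passes through the moment estimates in one variant and bypasses them in the other. The decoupled variant is the idea usually called AdamW.

```python
import math

def grad(w, target=3.0):
    # Gradient of the toy loss 0.5 * (w - target)**2 (an assumption for illustration)
    return w - target

def sgd_l2(w, lr, wd, steps):
    for _ in range(steps):
        g = grad(w) + wd * w  # L2 penalty folded into the gradient
        w = w - lr * g
    return w

def sgd_wd(w, lr, wd, steps):
    for _ in range(steps):
        w = w - lr * grad(w) - lr * wd * w  # decay applied to the weight directly
    return w

def adam(w, lr, wd, steps, decoupled, b1=0.9, b2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        if not decoupled:
            g = g + wd * w  # L2: the penalty goes through the moment estimates
        m = b1 * m + (1 - b1) * g          # running average of gradients
        v = b2 * v + (1 - b2) * g * g      # running average of squared gradients
        m_hat = m / (1 - b1 ** t)          # bias corrections
        v_hat = v / (1 - b2 ** t)
        step = m_hat / (math.sqrt(v_hat) + eps)
        if decoupled:
            step = step + wd * w  # weight decay bypasses the adaptive scaling
        w = w - lr * step
    return w

print(sgd_l2(1.0, 0.1, 0.01, 100), sgd_wd(1.0, 0.1, 0.01, 100))
print(adam(1.0, 0.1, 0.01, 1, decoupled=False), adam(1.0, 0.1, 0.01, 1, decoupled=True))
```

After one Adam step from `w = 1.0`, the L2 version lands very close to `1.1` and the decoupled version close to `1.099`. In the L2 version, Adam's normalization divides out the penalty along with the gradient, so the decay is not applied as a plain factor.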

Now you know everything that is hidden behind the line !