Relative smoothness
In the last lecture, we find smoothness required for convergence guarantee. How to generalize smoothness? Ans: Bregman divergence.
Bregman divergence


Notice that Bregman divergence is not symmetric in terms of x and y, but can be considered an asymmetric “x to y” connotation.

We can find the second term in the L-smooth requirement as a special case of the Bregman divergence, where h is the half L-scaled-squared Euclidean norm. Thus, by replacing the second term in the L-smooth definition and gradient descent algorithm, we find the relative smoothness and a new type of “mirror” descent algorithm.


Mirror descent convergence proof
The proof follows that of gradient descent and L-smoothness. However, with Bregman divergence, simple concepts (e.g. Pythagorean theorem) now require detailed proofs.