4. mirror descent and relative smoothness

Convex non-smooth and the convergence of mirror descent with relative smoothness

back to 《A Deeper Learner》main page

Relative smoothness

In the last lecture, we find smoothness required for convergence guarantee. How to generalize smoothness? Ans: Bregman divergence.

Bregman divergence

D_h(y,x)\equiv h(y) — h(x) — \langle\nabla h(x), y-x\rangle
D_h(y,x)\equiv h(y) — h(x) — \langle\nabla h(x), y-x\rangle
The Bregman divergence.
The Bregman divergence illustrated by Prof. Yen-Huan Li.

Notice that Bregman divergence is not symmetric in terms of x and y, but can be considered an asymmetric “x to y” connotation.

D_h(y,x) &= \frac{1}{2}\lVert y\rVert² — \frac{1}{2}\lVert x\rVert² — \langle \nabla \frac{1}{2}\lVert x\rVert², y-x\rangl
D_h(y,x) &= \frac{1}{2}\lVert y\rVert² — \frac{1}{2}\lVert x\rVert² — \langle \nabla \frac{1}{2}\lVert x\rVert², y-x\rangl
Bregman divergence returns to the (second term) of the L-smooth definition.

We can find the second term in the L-smooth requirement as a special case of the Bregman divergence, where h is the half L-scaled-squared Euclidean norm. Thus, by replacing the second term in the L-smooth definition and gradient descent algorithm, we find the relative smoothness and a new type of “mirror” descent algorithm.

f(y)\leq f(x)+\langle\nabla f(x),y-x\rangle+LD_h(y,x)
f(y)\leq f(x)+\langle\nabla f(x),y-x\rangle+LD_h(y,x)
The necessary and sufficient condition of relative L smooth condition.
Mirror descent algorithm, expressed with the Bregman divergence.

Mirror descent convergence proof

The proof follows that of gradient descent and L-smoothness. However, with Bregman divergence, simple concepts (e.g. Pythagorean theorem) now require detailed proofs.

Bregman Proximal Equality

Three-point Equality

no time to write will finish latter