4. mirror descent and relative smoothness

Convex non-smooth and the convergence of mirror descent with relative smoothness

Jimmy Yeh
2 min readNov 29, 2020

back to 《A Deeper Learner》main page

Relative smoothness

In the last lecture, we find smoothness required for convergence guarantee. How to generalize smoothness? Ans: Bregman divergence.

Bregman divergence

D_h(y,x)\equiv h(y) — h(x) — \langle\nabla h(x), y-x\rangle
The Bregman divergence.
The Bregman divergence illustrated by Prof. Yen-Huan Li.

Notice that Bregman divergence is not symmetric in terms of x and y, but can be considered an asymmetric “x to y” connotation.

D_h(y,x) &= \frac{1}{2}\lVert y\rVert² — \frac{1}{2}\lVert x\rVert² — \langle \nabla \frac{1}{2}\lVert x\rVert², y-x\rangl
Bregman divergence returns to the (second term) of the L-smooth definition.

We can find the second term in the L-smooth requirement as a special case of the Bregman divergence, where h is the half L-scaled-squared Euclidean norm. Thus, by replacing the second term in the L-smooth definition and gradient descent algorithm, we find the relative smoothness and a new type of “mirror” descent algorithm.

f(y)\leq f(x)+\langle\nabla f(x),y-x\rangle+LD_h(y,x)
The necessary and sufficient condition of relative L smooth condition.
Mirror descent algorithm, expressed with the Bregman divergence.

Mirror descent convergence proof

The proof follows that of gradient descent and L-smoothness. However, with Bregman divergence, simple concepts (e.g. Pythagorean theorem) now require detailed proofs.

Bregman Proximal Equality

Three-point Equality

Sign up to discover human stories that deepen your understanding of the world.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Jimmy Yeh
Jimmy Yeh

Written by Jimmy Yeh

no time to write will finish latter

No responses yet

Write a response