1. Complexity measures of an optimization problem
On this page: how optimization problems arise in machine learning, and which performance measures we care about: Statistical Errors vs. Optimization Errors. {Part of the Convex Optimization (know your Gradient Descents!) series}
The problem
How do optimization problems arise in machine learning? A few term definitions help make this precise.
Term definitions
Data: Pairs recording the correct (or observed) input-output correspondence, e.g. (x_n, y_n) ∈ ℝ^D × {−1, +1} for two-class image-label pairs.
Hypothesis class: A set of hypotheses (classifiers, functions) that map inputs to outputs, e.g. h ∈ H with h: ℝ^D → {−1, +1}.
Loss function: A function that assigns a real number to a hypothesis-datapoint pair, e.g. λ: H × ℝ^D × {−1, +1} → ℝ (a concrete example follows this list).
Risk function: The expectation of the loss over the data distribution, R(h) = E_z[λ(h, z)], where z = (x, y) is a datapoint.

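For concreteness, one common instance of such a loss (an example of my own choosing, not fixed by the definitions above) is the 0-1 loss:

$$
\lambda\bigl(h, (x, y)\bigr) = \mathbf{1}\{h(x) \neq y\},
$$

which equals 1 when the hypothesis mislabels the datapoint and 0 otherwise.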
The goal is to find a hypothesis with minimal loss on unseen datapoints. Since the risk function R captures exactly this expected loss, we can write down the most general optimization problem.
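In symbols (writing z ∼ 𝒟 for a datapoint drawn from the underlying data distribution, a notation assumed here):

$$
\min_{h \in H} R(h) = \min_{h \in H} \mathbb{E}_{z \sim \mathcal{D}}\bigl[\lambda(h, z)\bigr].
$$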
Empirical Risk Minimization (ERM)
By the law of large numbers, we can approximate the true expectation in the risk function by the average over N observed samples.
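Concretely, with samples z_n = (x_n, y_n), the empirical risk and the resulting ERM problem read (the notation R̂_N and ĥ_N is assumed here):

$$
\hat{R}_N(h) = \frac{1}{N} \sum_{n=1}^{N} \lambda(h, z_n), \qquad \hat{h}_N \in \arg\min_{h \in H} \hat{R}_N(h).
$$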

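To make ERM concrete, here is a minimal sketch in Python. The toy setup (1-D inputs, a finite class of threshold classifiers, 0-1 loss) is my own choice and not part of the text above.

```python
import numpy as np

def zero_one_loss(h, x, y):
    """lambda(h, z): 1 if the hypothesis mislabels the datapoint, else 0."""
    return float(h(x) != y)

def empirical_risk(h, X, Y):
    """R_hat_N(h): average loss over the N observed samples."""
    return np.mean([zero_one_loss(h, x, y) for x, y in zip(X, Y)])

def erm(X, Y, thresholds):
    """Return the minimizer of the empirical risk over a finite
    class of 1-D threshold classifiers x -> sign(x - t)."""
    hypotheses = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]
    risks = [empirical_risk(h, X, Y) for h in hypotheses]
    best = int(np.argmin(risks))
    return hypotheses[best], risks[best]

# Toy data: points above 0.5 are labeled +1, below -1, with 5% label noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
Y = np.where(X > 0.5, 1, -1)
Y[rng.random(200) < 0.05] *= -1

h_hat, risk = erm(X, Y, thresholds=np.linspace(0, 1, 101))
print(f"empirical risk of the ERM solution: {risk:.3f}")
```

The ERM solution minimizes the empirical risk, but its true risk R(ĥ_N) can still differ; that gap is exactly the statistical error discussed next.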
Error estimate via the uniform law of large numbers
The statistical error of the above approximation can be bounded as follows.
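One standard way to state it (with ĥ_N the empirical risk minimizer and h* ∈ argmin_{h∈H} R(h) the true risk minimizer; this notation is assumed here):

$$
R(\hat{h}_N) - R(h^*) \;\le\; 2 \sup_{h \in H} \bigl| R(h) - \hat{R}_N(h) \bigr|,
$$

so controlling the statistical error requires the law of large numbers to hold uniformly over the whole hypothesis class H, not just pointwise for a single h.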
