1. Complexity measures of an optimization problem
On this page: how optimization problems arise in machine learning, and which performance measures we care about: Statistical Errors vs. Optimization Errors. {Part of the Convex Optimization (know your Gradient Descents!) series}
The problem
How do optimization problems arise in machine learning? A few term definitions help make this precise.
Term definitions
Data: Pairs recording the correct (or observed) input-output correspondence, e.g. (x_n, y_n) ∈ ℝ^D × {−1, +1} for two-class image-label pairs.
Hypothesis class: A set of hypotheses (classifiers, functions) that map inputs to outputs, e.g. h ∈ H with h: ℝ^D → {−1, +1}.
Loss function: A function that assigns a real number to a hypothesis-datapoint pair, e.g. λ: H × ℝ^D × {−1, +1} → ℝ (a concrete example follows this list).
Risk function: The expectation of the loss over the data distribution, R(h) = E_z[λ(h, z)], where z = (x, y) is a datapoint.

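For concreteness, one common instance of such a loss (an example of my own choosing, not fixed by the definitions above) is the 0-1 loss:

$$
\lambda\bigl(h, (x, y)\bigr) = \mathbf{1}\{h(x) \neq y\},
$$

which equals 1 when the hypothesis mislabels the datapoint and 0 otherwise.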
The goal is to find a hypothesis with minimal loss on unseen datapoints. Since the risk function R captures exactly this expected loss, we can write down the most general optimization problem.
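In symbols (writing z ∼ 𝒟 for a datapoint drawn from the underlying data distribution, a notation assumed here):

$$
\min_{h \in H} R(h) = \min_{h \in H} \mathbb{E}_{z \sim \mathcal{D}}\bigl[\lambda(h, z)\bigr].
$$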
Empirical Risk Minimization (ERM)
By the law of large numbers, we can approximate the true expectation in the risk function by the average over N observed samples.
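Concretely, with samples z_n = (x_n, y_n), the empirical risk and the resulting ERM problem read (the notation R̂_N and ĥ_N is assumed here):

$$
\hat{R}_N(h) = \frac{1}{N} \sum_{n=1}^{N} \lambda(h, z_n), \qquad \hat{h}_N \in \arg\min_{h \in H} \hat{R}_N(h).
$$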

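To make ERM concrete, here is a minimal sketch in Python. The toy setup (1-D inputs, a finite class of threshold classifiers, 0-1 loss) is my own choice and not part of the text above.

```python
import numpy as np

def zero_one_loss(h, x, y):
    """lambda(h, z): 1 if the hypothesis mislabels the datapoint, else 0."""
    return float(h(x) != y)

def empirical_risk(h, X, Y):
    """R_hat_N(h): average loss over the N observed samples."""
    return np.mean([zero_one_loss(h, x, y) for x, y in zip(X, Y)])

def erm(X, Y, thresholds):
    """Return the minimizer of the empirical risk over a finite
    class of 1-D threshold classifiers x -> sign(x - t)."""
    hypotheses = [lambda x, t=t: 1 if x > t else -1 for t in thresholds]
    risks = [empirical_risk(h, X, Y) for h in hypotheses]
    best = int(np.argmin(risks))
    return hypotheses[best], risks[best]

# Toy data: points above 0.5 are labeled +1, below -1, with 5% label noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
Y = np.where(X > 0.5, 1, -1)
Y[rng.random(200) < 0.05] *= -1

h_hat, risk = erm(X, Y, thresholds=np.linspace(0, 1, 101))
print(f"empirical risk of the ERM solution: {risk:.3f}")
```

The ERM solution minimizes the empirical risk, but its true risk R(ĥ_N) can still differ; that gap is exactly the statistical error discussed next.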
Error estimate via the uniform law of large numbers
The statistical error of the above approximation can be bounded as follows.
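One standard way to state it (with ĥ_N the empirical risk minimizer and h* ∈ argmin_{h∈H} R(h) the true risk minimizer; this notation is assumed here):

$$
R(\hat{h}_N) - R(h^*) \;\le\; 2 \sup_{h \in H} \bigl| R(h) - \hat{R}_N(h) \bigr|,
$$

so controlling the statistical error requires the law of large numbers to hold uniformly over the whole hypothesis class H, not just pointwise for a single h.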
