Wednesday, March 14, 2018
Why we use projection to measure error
In classic nonlinear optimization you typically measure the error of a constraint,
$$
f(x) \leq 0,
$$
by the value $\max(0, f(x))$. The problem with this measure is that reformulations such as $10^6 f(x) \leq 0$, or $f(x)^3 \leq 0$, or even $\exp(f(x)) - 1 \leq 0$ report very different errors for the exact same point, which makes the measure highly representation-dependent. In fact, under this measure, the inevitable error introduced by rounding the true algebraic solution $x^*$ to its nearest floating-point representation even depends on the rate of change of $f(x)$. For example, this error grows rapidly for large values of $x^*$ when $f(x) = \exp(x)$, and for small positive values of $x^*$ when $f(x) = 1/x^2$.
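To make this concrete, here is a minimal sketch in plain Python (not MOSEK code); the particular constraint $x - 1 \leq 0$ and the perturbed test point are our own illustrative choices:

```python
# All four constraints below describe the same feasible set {x <= 1},
# yet the classic error measure max(0, f(x)) reports wildly different
# values at the same slightly infeasible point.
import math

x = 1.0 + 1e-9   # the true solution x* = 1, perturbed by rounding

reformulations = {
    "x - 1 <= 0":          x - 1.0,
    "1e6*(x - 1) <= 0":    1e6 * (x - 1.0),
    "(x - 1)^3 <= 0":      (x - 1.0) ** 3,
    "exp(x - 1) - 1 <= 0": math.exp(x - 1.0) - 1.0,
}

for name, fx in reformulations.items():
    print(f"{name:22s} error = {max(0.0, fx):.1e}")
# The reported errors span roughly 1e-27 to 1e-3 for the very same point.
```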
In conic optimization the nonlinearities you model are independent of representation. This is because you only ever constrain variables to a cone, i.e., a set of points. This allows us to change and tune internal representations for optimal performance. Correspondingly, we also want the error reports you receive from MOSEK to be independent of representation. Our choice of projection, i.e., the minimum distance from a point to the cone, is a widely known, computationally inexpensive and mathematically well-defined concept that fits like a glove. This definition, however, does not always match the intuitive understanding of error, as captured by the following question we sometimes get:
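For readers who prefer code, below is a minimal sketch in plain Python/NumPy of the projection-based error measure for the standard quadratic cone $\mathcal{Q}^n = \{(t, z) : t \geq \|z\|_2\}$, using its well-known closed-form projection. It illustrates the concept only and is not MOSEK's internal implementation; the function names are ours. The rotated cone $\mathcal{Q}_r$ appearing in the question below is related to $\mathcal{Q}$ by an orthogonal change of variables, so the same distance measure applies there.

```python
import numpy as np

def project_onto_quadratic_cone(t, z):
    """Euclidean projection of (t, z) onto Q^n = {(t, z) : t >= ||z||_2}."""
    norm_z = np.linalg.norm(z)
    if norm_z <= t:                      # already inside the cone
        return t, z
    if norm_z <= -t:                     # negation lies in the cone: project to the apex
        return 0.0, np.zeros_like(z)
    alpha = (t + norm_z) / 2.0           # otherwise project onto the boundary
    return alpha, alpha * z / norm_z

def distance_to_quadratic_cone(t, z):
    """Projection-based error: distance from (t, z) to Q^n."""
    pt, pz = project_onto_quadratic_cone(t, z)
    return np.sqrt((t - pt) ** 2 + np.linalg.norm(z - pz) ** 2)

# A point just outside the cone has a tiny error, regardless of how the
# cone membership might have been written as an algebraic inequality.
print(distance_to_quadratic_cone(1.0, np.array([1.0 + 1e-9, 0.0])))
```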
Q: The result $\hat x = (0, 1{\rm e}18, 1{\rm e}3)$ fails to satisfy $2 x_1 x_2 \geq x_3^2$; the left is zero and the right is a million! Why does MOSEK claim feasibility when the conic domain $x \in \mathcal{Q}_r^3$ is so clearly violated?
A: Note that $x^* = (5{\rm e-}13, 1{\rm e}18, 1{\rm e}3)$ is feasible, showing that the projection distance to the conic domain is at most $\| x^* - \hat x\|_2 = 5{\rm e-}13$. This is a small violation in floating-point computations. In the particular case of $\hat x$, we may reformulate the constraint to $x_1 \geq \frac{x_3^2}{2x_2}$ and obtain a comparable error measure by directly comparing the two sides.
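As a quick sanity check of these numbers (again plain Python with exact rational arithmetic, not a MOSEK computation), one can verify that $x^*$ lies exactly on the boundary of $2 x_1 x_2 \geq x_3^2$ and differs from $\hat x$ by exactly $5{\rm e-}13$ in the first coordinate:

```python
from fractions import Fraction

x_star = (Fraction(5, 10**13), Fraction(10**18), Fraction(10**3))
x_hat  = (Fraction(0),         Fraction(10**18), Fraction(10**3))

print(2 * x_star[0] * x_star[1] >= x_star[2] ** 2)   # True: x* is feasible (on the boundary)
print(float(abs(x_star[0] - x_hat[0])))              # 5e-13, bounding the projection distance
```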
If you wonder why there are no feasibility tolerances on conic domains, the answer is simple. They are satisfied symbolically in all computations, even if the interior-point solver is terminated before convergence. The small violations you may see are due to floating-point arithmetic.
Finally, if you want to know more about projection theory, its computation, and how we use it in MOSEK, we invite you to Henrik's session at ISMP 2018.