
Thursday, February 27, 2025

New compute cluster for our MIP team

Solving a mixed integer programming (MIP) problem with the so-called branch-and-bound algorithm can be extremely time-consuming. Therefore, a MIP solver like MOSEK incorporates a lot of algorithmic improvements that reduce the solution time. Sometimes those improvements are called tricks.

To evaluate whether a trick benefits the MIP solver, the solver is run with and without the trick on a benchmark set of test problems, and the trick is included in the solver only if the benchmark results indicate that it helps.

Clearly, if all the test problems are solved on one specific computer, then the timing results are comparable. However, drawing robust conclusions requires a large number of carefully selected test problems, which has the unfortunate consequence that evaluating a new trick is very time-consuming. The benchmark problems can of course be solved in parallel, but solving multiple problems in parallel on one computer will not produce reliable timing results that can be compared. The only way of getting comparable timing results quickly is to solve many problems in parallel on a cluster of identical computers.

That is why we at MOSEK recently invested 55K+ USD in a compute cluster made up of identical computers for the MIP development team. The cluster consists of 4 boxes, each containing 8 computational nodes, i.e. it provides 32 identical computers.

Hopefully, you won't have to wait too long to see the benefits in Mosek arising from faster testing of potential new improvements in the MIP solver! 

In the meantime, you can enjoy this photo of our cluster working away :) 



Thursday, February 20, 2025

Should Mosek optimizer parameters be tweaked?

Let us begin by answering the question in the title: no. This post presents the arguments behind that answer.

In a recent Substack post, Ben Recht mentioned that Mosek and optimization software in general have many algorithmic parameters that can be tweaked. Clearly, it may be overwhelming for the average user of optimization software to figure out what the parameters do and how to change them.

Note that the parameters referred to here are those that affect the algorithms of the optimizers, not parameters that control things like the amount of log output.

However, in the case of Mosek a good rule of thumb is to only tweak the parameters if there is a very good reason to do so. The main reasons for this are:

  • The default values of the parameters are chosen after extensive testing. Hence, it is hard to improve the average optimizer performance by tweaking the parameters.
  • If it is easy to devise a rule that automatically improves the performance of Mosek by adjusting a parameter, then such a rule would be built into Mosek.
  • Mosek is only tested extensively for the default parameter settings, and hence weird things can happen for non-default parameter settings. For instance, the stopping criteria employed by the interior-point optimizer normally lead to the most accurate solution possible. Therefore, tightening the stopping criteria in most cases increases the number of iterations (= work) and results in no or minimal benefit.
Given the above arguments, an obvious question is: why are so many parameters made available to the users of Mosek? The most common reasons are:
  • Some parameters are meant to be adapted to the application. For instance, the mixed-integer optimizer can take a lot of time to solve a problem to optimality, and it might therefore be relevant to relax its stopping criteria to obtain a less accurate but decent solution in a reasonable amount of time.
  • Mosek implements many heuristics that make beneficial algorithmic choices. An example is deciding whether to solve the problem on the primal or the dual form in the case of the interior-point optimizer. This heuristic is not perfect, and in some cases it is worthwhile to override it.
  • Occasionally Mosek has a bug or simply performs poorly on specific problems. In such cases, a tweak in the parameter settings may provide a good workaround.
To summarize, based on 25 years of experience, parameter tweaking rarely solves performance or robustness issues; it is rarely the quick fix a user might hope for. The correct fix is in most cases to improve the model formulation, e.g. by avoiding bad scaling, big Ms, etc.

As an exception to the general rule above, let us finish by listing the most common parameter tweaks (a small code sketch follows the list):
  • Relaxing the stopping criteria of the mixed-integer optimizer, i.e. relaxing the relative gap tolerance.
  • Turning off the linear dependency checker in the presolve because the computational cost of the procedure is not worth the benefit. This is often the case when the constraint matrix is generated from a finite element model.
  • Forcing an optimizer to employ a specific number of threads, e.g. setting the number of threads to one when multiple problems are solved in parallel.
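To make the list concrete, here is a minimal sketch of how these three tweaks could look in the Python Optimizer API. The parameter names below exist in recent Mosek versions, but please check them against the documentation of your version; the input file name is hypothetical and error handling is omitted.

import mosek

with mosek.Env() as env:
    with env.Task() as task:
        # Hypothetical input; in practice the task is populated by your own model.
        task.readdata("model.mps.gz")

        # 1. Relax the relative gap tolerance of the mixed-integer optimizer.
        task.putdouparam(mosek.dparam.mio_tol_rel_gap, 1e-3)

        # 2. Turn off the linear dependency checker in the presolve.
        task.putintparam(mosek.iparam.presolve_lindep_use, mosek.onoffkey.off)

        # 3. Force the optimizer to use a single thread, e.g. when several
        #    problems are solved in parallel on the same machine.
        task.putintparam(mosek.iparam.num_threads, 1)

        task.optimize()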

Friday, February 7, 2025

A small breakthrough in the modeling of polynomial denominators

In 2018, we challenged our community to find convex functions that could not be reformulated in terms of the cones supported in MOSEK. A long-standing contender for this challenge was

\frac{1}{x^4 + x^2}\quad \text{ for } x \geq 0, \label{eq:challenge}

but as of today, we are finally able to present a conic reformulation of this function guided by past experiences. In this blog post we take a look at the insights that enabled this.

Before diving into the details, however, we would like to clarify that conic optimization is able to encompass any convex function, including \eqref{eq:challenge}, via perspective reformulation. The problem, however, is that the resulting cones have widely different computational properties (some even being NP-hard to optimize over!). By restricting attention to the set of cones supported in MOSEK, we effectively limit ourselves to representations with practical large-scale performance. As suggested by the community challenge, however, it turns out to be quite hard to find convex functions without a computationally efficient conic representation.

So what makes \eqref{eq:challenge} so difficult?

Insight 1: Polynomial denominators

For a polynomial g(x) with real roots g_1 \leq \ldots \leq g_k, we can represent

t \geq \frac{1}{g(x)} = \frac{1}{\alpha(x - g_1)\cdots(x - g_k)}

on a convex subinterval, [g_p, g_{p+1}], via a single power cone

t \alpha (x - g_1)\cdots(x - g_p)(g_{p+1} - x)\cdots(g_k - x) \geq 1,

by flipping signs of the factors, (x - g_i), so that they are all nonnegative on the subinterval of interest. This method can also represent the left and right tail, (-\infty, g_1] and [g_k, \infty), if convex. The issue with g(x) = x^4 + x^2 is that it has a complex conjugate root pair, \pm i, and so the closest we can get in this direction is

t x^2 (x^2 + 1) \geq 1. \label{eq:step1}
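To make the real-root recipe concrete, here is a small example of our own (not part of the challenge): take g(x) = x^2 - 1 = (x - 1)(x + 1), i.e. \alpha = 1, on the right tail x \geq 1, where 1/g(x) is convex. Both factors are already nonnegative there, so t \geq 1/g(x) becomes

t (x - 1)(x + 1) \geq 1,

or equivalently t^{1/3} (x - 1)^{1/3} (x + 1)^{1/3} \geq 1, which matches the power cone format u_1^{\beta_1} u_2^{\beta_2} u_3^{\beta_3} \geq |z| with \beta_1 = \beta_2 = \beta_3 = \frac{1}{3}.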

Insight 2: Nonlinear substitution

A natural first attempt would be to substitute out the convex term, x^2 + 1, from \eqref{eq:step1} using

t x^2 r \geq 1,\quad r \geq x^2 + 1,

but this fails since r \rightarrow \infty satisfies both constraints without enforcing the desired relation. We thus resort to a lesser-known trick, namely to rewrite x^2 + 1 = f(f^{-1}(x^2 + 1)) and then substitute out f^{-1}(x^2 + 1). With this in mind, it would be natural to try

t x^2 r^p \geq 1,\quad r \leq (x^2 + 1)^{1/p},

for some suitable power p, but this fails because (x^2 + 1)^{1/p} is not concave on all of x > 0 for any p \geq 1. Nevertheless, with a bit of persistence, we finally manage to satisfy all composition rules with

t x r^4 \geq 1,\quad r \leq (x^3 + x)^{1/4}, \label{eq:step2}

where the latter is a convex set in need of a conic reformulation.
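As a quick sanity check of \eqref{eq:step2}, substituting r at its upper bound, r = (x^3 + x)^{1/4}, into the first constraint recovers \eqref{eq:step1}:

t x r^4 = t x (x^3 + x) = t x^2 (x^2 + 1) \geq 1.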

Insight 3: Signed sum-of-quartics decomposition

The signed sum-of-squares decomposition was introduced in a previous talk on our MOSEK channel as a new reformulation tool with great applicability. In the case of (x^3 + x)^{1/4}, the fourth root suggests the need for a signed sum-of-quartics decomposition instead. We do not know of any algorithm to perform this decomposition, but using the method of undetermined coefficients (guessing the final form and mapping coefficients via quantifier elimination), we obtain

x^3 + x = (ax + a)^4 - (ax - a)^4,

where a = \frac{1}{2^{3/4}}; indeed, a^4 = \frac{1}{8} and (x+1)^4 - (x-1)^4 = 8x^3 + 8x. This allows us to rewrite \eqref{eq:step2} as

t x r^4 \geq 1,\quad r^4 + (ax - a)^4 \leq (ax + a)^4, \label{eq:step3}

where the former is a power cone constraint and the latter is the p-norm cone of order 4, which is a well-known conic representable set (we refer to the MOSEK Modeling Cookbook).
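For readers who would like to try \eqref{eq:step3} out, below is a small illustrative sketch in MOSEK's Fusion API for Python. It is our own rendering rather than code from the original derivation: we split t x r^4 \geq 1 into s^2 \leq t x and s r^2 \geq 1 using an auxiliary variable s, and model the 4-norm cone via the standard power-cone construction with auxiliary variables u_1, u_2, so that only 3-dimensional power cones appear. Please check the exact Fusion calls against the documentation of your MOSEK version.

from mosek.fusion import Model, Domain, Expr, ObjectiveSense

a  = 2.0 ** (-0.75)   # a = 1/2^(3/4)
x0 = 2.0              # evaluate the reformulation at x = 2

with Model("polydenom") as M:
    t  = M.variable("t",  1, Domain.greaterThan(0.0))
    x  = M.variable("x",  1, Domain.greaterThan(0.0))
    r  = M.variable("r",  1, Domain.greaterThan(0.0))
    s  = M.variable("s",  1, Domain.greaterThan(0.0))
    u1 = M.variable("u1", 1, Domain.greaterThan(0.0))
    u2 = M.variable("u2", 1, Domain.greaterThan(0.0))

    M.constraint(x, Domain.equalsTo(x0))

    # t*x*r^4 >= 1, split as s^2 <= t*x and s*r^2 >= 1:
    #   (t, x, s) in P^(1/2):  t^(1/2) * x^(1/2) >= |s|
    #   (s, r, 1) in P^(1/3):  s^(1/3) * r^(2/3) >= 1
    M.constraint(Expr.vstack(t, x, s), Domain.inPPowerCone(0.5))
    M.constraint(Expr.vstack(s, r, Expr.constTerm(1, 1.0)),
                 Domain.inPPowerCone(1.0 / 3.0))

    # r^4 + (a*x - a)^4 <= (a*x + a)^4, i.e. ||(r, a*x - a)||_4 <= a*x + a,
    # modeled as u1 + u2 = w, u1*w^3 >= r^4, u2*w^3 >= (a*x - a)^4, w = a*x + a.
    w  = Expr.add(Expr.mul(a, x), Expr.constTerm(1, a))
    z2 = Expr.sub(Expr.mul(a, x), Expr.constTerm(1, a))
    M.constraint(Expr.sub(Expr.add(u1, u2), w), Domain.equalsTo(0.0))
    M.constraint(Expr.vstack(u1, w, r),  Domain.inPPowerCone(0.25))
    M.constraint(Expr.vstack(u2, w, z2), Domain.inPPowerCone(0.25))

    M.objective(ObjectiveSense.Minimize, t)
    M.solve()

    print("conic model :", t.level()[0])
    print("1/(x^4+x^2) :", 1.0 / (x0 ** 4 + x0 ** 2))

For x = 2 both printed values should be 0.05 = 1/(2^4 + 2^2), indicating that the conic model reproduces \eqref{eq:challenge} at that point.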

This concludes our reformulation from the convex function \eqref{eq:challenge} to the conic representation \eqref{eq:step3}. In time, we shall see if this process extends to a wider range of polynomial denominators.