Monday, June 30, 2014

The licensing system overhead


MOSEK employs a license system to make sure users pay for the software so the developers can get payed.  The way the licensing system works is that a valid license token must be obtained either from a file, or from a token server, i.e. a service/daemon running on a computer on the network.
Clearly, checking out a license token costs some time, but for large scale optimization problems the cost is negligible. However in an application where a large number of small problems are solved in a short amount of time, such checkout overhead can be significant. Here will we try quantify the overhead.

Licensing system at glance

First, let's summarize how MOSEK interacts with the license system:
  1. A license is checked out the first time MOSEK tries to optimize a problem.
  2. The check out involves either reading a file or querying a token server on the network.
  3. By default MOSEK employs license caching, i.e. the token is stored in the so called MOSEK environment. A environment can be reused for all the optimizations.
  4. Therefore, the license is released when the MOSEK environment is destroyed and not when a single optimization terminates.
The license check out overhead is split among the first optimization and the environment termination. Many users, for sake of a cleaner and simpler code keep creating and destroying environments at each optimization. This approach also allows to keep licenses free as much as possible. But if you were to solve thousands of small problems, you might experience a performance penalty.

The test

So the question arises: how much overhead the licensing system introduce?

To answer this question we will test MOSEK using the Python Optimizer API. We will call the solver with an empty problem twice, and measure the time spent for the first and the second call. The test is then repeated 10000 times. We use the standard timeit  Python package. The code for the tests is the following:

We execute the test both in the case of a machine running a token server locally, and in the case of a license file stored locally. The latter corresponds to the case of using a trial license for instance. We obtain the following results:

  1. The first line shows the time spent for creating the environment, the task and then running the solver. The license is release as the environment is destroyed.
  2. The second line reports the time when the environment is created only once and kept the same, while a new task is allocated for each problem.
  3. The last line shows the case in which we only create the environment and the task only once.

It can be seen:
  • There is no time significant difference between using a license file or a token server in an ideal case as the one we are using.
  • If the environment is not reused then average time is 8 milliseconds whereas if it is then the average time is $0.1$ milliseconds. 
  • It takes around $8$ microseconds to check out a license.


Therefore, if an optimization problem requires less than say $0.1$ second to solve then license check out time is going to be significant if license caching is disabled, i.e. the MOSEK environment is created and destroyed at each solver run.

Tuesday, June 17, 2014

What if the solver stalls?

In this post we will discuss why the solver get sometimes stalls and how does this relates with the solution quality. What does we mean for stall?

The solver get stalled whenever it cannot make any sensible progress towards the optimal solution.

An iteration of a continuous optimizer in MOSEK consists of
\[ x^+ = x + \alpha d_x\]
  • $x$:  Is the current guess for the optimal solution.
  • $d_x$: Is a direction computed in some way by the optimizer.
  • $\alpha$: Is nonnegative step size.
  • $x^+$: Is the updated guess for the optimal solution.
The interpretation is that the optimizer computes a search direction $d_x$ and then takes a step in that direction. The search direction is computed such that when moving in that direction the current solution will be moved closer towards an optimal solution. Normally the step size $\alpha$ is chosen such that the new trial solution $x^+$ is as close as possible to the optimal solution with respect to some merit function. 

If the search direction is badly chosen then the optimal step size might be very close to $0$, in which case the algorithm is stalled. So why does this happens in practice? Typically the search direction $d_x$ is given as the solution of a system of linear equations, say

\[H d_x = h,\]

where it can be proven that if the linear system is solved accurately then good progress is obtained. Unfortunately in practice due to fact that computations are done in finite precision then the search direction may be of poor quality leading to a stall.

To summarize due to numerical issues MOSEK may not be able to compute a good search direction and if that leads to an extremely short step then MOSEK stops and says STALLED. Thus it does not make sense to continue because the subsequent iterations will also leads to small steps and no progress at all.

Observe that when a stall occurs then usually the current solution is very close to an optimal solution and many cases the reported solution is good enough for most practical purposes. Recall that usually the optimization problem solved is approximation of the true problem and even in the best case MOSEK will only report an approximated solution to approximated problem. 

Let's visualize the algorithm execution using a simplified scheme: the feasible region (primal and dual) is the yellowish one and the solution is the green dot on its corner. Algorithm execution generates a sequence of iterates, the green dots, that hopefully will end up in 10.

Interior-point algorithm execution sketch.

For each iterates, a normalized primal and dual feasibility violation is computed, as well as a measure of optimality violation. Let's call them $r_p,r_d$ and $r_g$ respectively. Ideally, $||(r_p,r_d,r_g)||$ should be driven to zero, but in practice the solver stops when
\[ ||(r_p,r_d,r_g)|| \leq \epsilon_f.\]

Thus $\epsilon_f$ defines a region in which a solution is optimal, i.e. the purple region in the figure. When the solver terminates it provides a solution with a certain solution status. Hopefully, it will be optimal. Otherwise, the solver may still be able to provide a useful solution: for this reason, and to improve flexibility, MOSEK denotes some solutions as near-optimal: the solution would be optimal if we scale up $\epsilon_f$ by a factor $\alpha>1$, i.e.

\[ ||(r_p,r_d,r_g)|| \leq \epsilon_f\alpha.\]

This corresponds to the reddish region. As the algorithm proceeds, we reach the near-optimal region at step (9) and then the optimal one at step (10). If all goes well, we stop in the latter. If we get stalled at (9) we will get a near-optimal solution. But what if the solver get stalled at step (8)? The solution status is unknown in this case.

In general, if the solver is stalled, the solution status is still meaningful since the solver is able to check the solution feasibility and the residual values. In particular a solution could be near-optimal. In our example, step (8) is not even near optimal and could be not very useful, but it is still worth to check it.
In case of stall, user should
  • look at solution summary: provides the primal and dual violation under the column PFEAS and DFEAS, and the MU column measure the optimality violation;
  • retrieve the solution status from the solver.
What can be done to avoid stalls? It is hard to say something about in general. A few good rules to follow are:
  • Avoid nearly feasible or nearly infeasible problems.
  • Make sure the problem is scaled reasonably.
  • Make sure the primal and dual solutions are reasonably bounded i.e. not too large in absolute size.
Users should consider revising their model is stalling happens regularly. If modeling does not help, you are encouraged to contact our support.

What NOT to do when an optimization problem is infeasible!

Given your optimization problem is infeasible then what should do?

A common practice is to look at the infeasible solution returned by the optimizer e.g. MOSEK and then relax the bounds that are violated by the reported solution. This practice is rooted in the fact that in the good old days all linear optimization problems was optimized using the primal simplex algorithm employing a phase 1 and phase 2 approach as described in textbooks. The purpose of the phase 1 is to find a feasible solution and then in phase 2 to locate an optimal solution starting from the phase 1 solution. The implication is if no feasible solution exists, then the reported solution will be as close to as feasible solution as possible in a well defined sense. Therefore, it make sense to relax the problem based on the solution reported by the phase 1 because that leads to the smallest possible relaxation that make the problem feasible.

However, nowadays this does not work in most cases for the following reasons:
  1. Most optimizers apply a presolve to the problem before optimizing. Now finding an optimal phase 1 solution to the presolved problem is almost never the same as finding an optimal phase 1 solution to the original problem  if the problem is infeasible. Therefore, if the presolve has been effective the reported solution is likely to be far from an optimal phase 1 solution to the original problem.
  2. Moreover, if the problem is found to be infeasible during the presolve, then the phase 1 problem is not solved at all and hence the optimal solution to the phase 1 problem is not available.  
  3. Interior-point based optimizers does not use a phase 1 and phase 2 approach, Therefore, the primal solution returned by an interior-point optimizer for infeasible problems is usually not very informative and definitely not close to an optimal solution to a phase 1 problem.
  4. The dual simplex algorithm never solves the phase 1 problem. Rather in case of an infeasible problem it produce a Farkas certificate of the infeasibility.  
The main recomodation is if an optimization problem is diagnosed infeasible by an optimizer e.g. MOSEK, then do NOT use the primal solution for anything unless you are absolutely sure it is an optimal solution to a phase 1 problem. 

Then what should you do? Assuming the problem is a continuous linear or convex optimization problem then there is a so called Farkas certificate that proves the problem is infeasible. Some optimizes like MOSEK will report the Farkas certificate as the dual solution. Such a Farkas certificate can be used to repair the problem as discussed in the technical report.  

Alternatively a phase 1 problem can be formed explicitly and solved if you want to do the traditional repair procedure. By the way the optimal dual solution to the phase 1 problem will be a Farkas certificate of the infeasible status. Hence, the phase 1 problem can seen as computing a Farkas certificate if one is available.

Wednesday, June 4, 2014

New Rmosek release available

We are happy to announce that a new version of Rmosek is available.

The new release is now fully compatible with the latest R version 3.1 recently released.

You can download  the package from

Give it a try!