Friday, August 17, 2018

Solving SDP with millions of matrix variables

Is it feasible in practice to solve semidefinite optimization problems with a huge number of matrix variables?

We recently received a problem from a structural engineering application with approximately the following  parameters:

  • 1 500 000 three-dimensional matrix variables,
  • 750 000 three-dimensional rotated quadratic cones,
  • $8\cdot 10^6$ scalar variables, $15\cdot 10^6$ linear constraints, $45\cdot 10^6$ nonzeros.

On a DELL PowerEdge R730 server with 2 Xeon E5-2687W v4 3.0GHZ the optimal solution is found in about 161 minutes on 24 threads using the latest MOSEK 8.1.0.59 with memory peaking at about 60GB.  Due to the nature of the problem we disabled the linear dependency check and otherwise used all standard parameter settings.

Friday, June 29, 2018

.NET Core support

From release 8.1.0.56 we initialize support for .NET Core, the cross-platform implementation of .NET.

MOSEK for .NET Core is distributed as a platform-independent NuGet package, which can be downloaded directly from our website. See the documentation for .NET APIs for installation instructions.

Friday, June 22, 2018

MOSEK at ISMP 2018

Here is a complete schedule of our talks at ISMP 2018 in chronological order:

SpeakerTitleSessionTimeRoom
Sven WieseThe Mixed-integer Conic Optimizer in MOSEKMixed-Integer Conic OptimizationMon, 5:00PMA. DURKHEIM, Bld. A
Henrik FribergProjection and presolve in MOSEK: exponential and power conesTheory and algorithms in conic linear programming 1Tue, 8:30AMS. LC5, Bld. L
Joachim DahlExtending MOSEK with exponential conesTheory and algorithms in conic linear programming 2Wed, 8:30AMS. 20, Bld. G
Erling AndersenMOSEK version 9Progress in Conic and MIP SolversWed, 3:15PMA. PITRES, Bld. O
Michał AdamaszekExponential cone in MOSEK: overview and applicationsRelative Entropy Optimization IFri, 3:15PMS. LC5, Bld. L

Check the program for any last-minute changes.

The session on Mixed-Integer Conic Optimization (Mon, 5:00PM) is organized by Sven Wiese. The full program of that session is:

  • Lucas Letocart,  Exact methods based on SDP for the k-item quadratic knapsack problem
  • Tristan Gally, Knapsack Constraints over the Positive Semidefinite Cone
  • Sven Wiese, The Mixed-integer Conic Optimizer in MOSEK

Tuesday, June 12, 2018

Elementary intro to infeasibility certificates

We occasionally receive support questions asking about the meaning and practical use of infeasibility certificates, usually when the user expected the problem to be feasible. While an infeasibility certificate is a well-defined mathematical object based on Farkas' lemma, it is not intuitive for everyone how to use it to address the basic question of "What is wrong with my problem formulation?". We will try to explain this on a very simple example.

In this post we don't even consider optimization, but restrict to elementary linear algebra. Suppose we want to find a solution to a system of linear equations:

$$
\begin{array}{llllllllllll}
& 2x_1 & - & x_2 & + & x_3 & - & 3x_4 & + & x_5 & = & 1, \\
& & & x_2 & -& 2x_3 & & & +& x_5 & = & -1,\\
& 3x_1 &  & & - & x_3 & - & x_4 & & & = & 2, \\
& 2x_1 & + & x_2 & - & 3x_3 & - & 3x_4 & + & 3x_5 & = & -0.5.
\end{array}
$$

This system of linear equations is in fact infeasible (has no solution). One way to see it is to multiply the equations by coefficients given on the right-hand side and add them up:

$$
\begin{array}{lllllllllllll}
& 2x_1 & - & x_2 & + & x_3 & - & 3x_4 & + & x_5 & = & 1 & / \cdot (-1) \\
& & & x_2 & -& 2x_3 & & & +& x_5 & = & -1 & / \cdot (-2) \\
& 3x_1 &  & & - & x_3 & - & x_4 & & & = & 2 & / \cdot 0 \\
& 2x_1 & + & x_2 & - & 3x_3 & - & 3x_4 & + & 3x_5 & = & -0.5 & / \cdot 1 \\
\hline
&  &  &  &  &  &  &  &  & 0 & = & 0.5. &
\end{array}
$$

We get an obvious contradiction, which proves the system has no solution. The vector

$$ y = [-1, -2, 0, 1] $$

of weights used above is therefore a proof (certificate) of infeasibility. MOSEK produces such a certificate automatically. Here is a simple script in MATLAB that computes precisely that vector:


As output we get:


Here are a few comments:

  • General theory guarantees that the dual variable y is a convenient place that can store the certificate.
  • When a system of equations has no solution then an appropriate certificate is guaranteed to exist. This is a basic variant of Farkas' lemma, but it should be intuitively clear: your favorite method of solving linear systems (for instance Gaussian elimination) boils down to taking  successive linear combinations of equations, a process which ends up either with a solution or with an "obvious" contradiction of the form $0=1$.


Using the certificate to debug a model 


In the example above $y_3=0$ which means that the third equation does not matter: infeasibility is caused already by some combination of the 1st, 2nd and 4th equation. In many practical situations the infeasibility certificate $y$ will have very few nonzeros, and those nonzeros determine a subproblem (subset of equations) which alone cause infeasibility. The user can configure MOSEK to print an infeasibility report, which in our example will look like:



This report is nothing else than a listing of equations with nonzero entries in $y$, and $y$ is the difference between Dual lower and Dual upper. Analyzing this set (which hopefully is much smaller than the full problem) can help locate a possible modeling error which makes the problem infeasible.

To conclude, let us now phrase the above discussion in matrix notation. A linear equation system

$$ Ax=b $$

is infeasible if and only if there is a vector $y$ such that

$$ A^Ty = 0 \quad \mbox{and}\quad b^Ty \neq 0. $$

The situation is analogous for linear problems with inequality constraints. We will look at an example of that in another blog post.


Thursday, May 31, 2018

Perspective functions (and Luxemburg norms)

If $\varphi:\mathbb{R}_+\to\mathbb{R}$ is a convex function then its perspective function $\tilde{\varphi}:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}$ is defined as
$$\tilde{\varphi}(x,y) = y\varphi(\frac{x}{y}).$$
Moreover, under these conditions, the epigraph of the perspective function
$$\left\{(t,x,y)~:~t\geq y\varphi(\frac{x}{y})\right\}$$
(or, to be precise, its appropriate closure) is a convex cone. Here are some familiar examples:

  • $\varphi(x)=x^2$. Then $\tilde{\varphi}(x,y)=\frac{x^2}{y}$, familiar to some as quad-over-lin. The epigraph of $\tilde{\varphi}$, described by $ty\geq x^2$, is the Lorentz cone (rescaled rotated quadratic cone).
  • $\varphi(x)=x^p$. Then $\tilde{\varphi}(x,y)=\frac{x^p}{y^{p-1}}$ and the epigraph of $\tilde{\varphi}$, described equivalently by $t^{1/p}y^{1-1/p}\geq |x|$ is the 3-dimensional power cone (with parameter $p$).
  • $\varphi(x)=\exp(x)$. Then the epigraph $t\geq y\exp(x/y)$ is the exponential cone.
  • $\varphi(x)=x\log(x)$. Then the epigraph $t\geq x\log(x/y)$ is the relative entropy cone.

We bring this up in connection with a series of blogposts by Dirk Lorenz here and here. For a monotone increasing, convex, nonnegative $\varphi$ with  $\varphi(0)=0$ he defines a norm on $\mathbb{R}^n$ via
$$\|x\|_\varphi=\inf\left\{\lambda>0~:~\sum_i\varphi\left(\frac{|x_i|}{\lambda}\right)\leq 1\right\},$$
and we can ask when the norm bound $t\geq \|x\|_\varphi$ is conic representable. The answer is: if the epigraph of the perspective function $\tilde{\varphi}$ is representable, then so is the epigraph of $\|\cdot\|_\varphi$. The reason is that the inequality
$$\sum_i\varphi\left(\frac{|x_i|}{t}\right)\leq 1$$
is equivalent to
$$
\begin{array}{ll}
w_i\geq |x_i| & i=1,\ldots,n,\\
s_i\geq t\varphi\left(\frac{w_i}{t}\right) & i=1,\ldots,n,\\
t=\sum_i s_i. &
\end{array}
$$
That covers for example $\varphi(x)=x^2$, $\varphi(x)=x^p$ ($p>1$), $\varphi(x)=\exp(x)-1$, $\varphi(x)=x\log(1+x)$ and so on.
Figure: smallest $(\exp(x)-1)$-Luxemburg-norm disk containing a random set of points.

Wednesday, May 9, 2018

New Modeling Cookbook


We released a new, revised version of the MOSEK Modeling Cookbook



HTML     PDF (A4)     PDF (letter)

where you can find practical modeling examples and some theory behind:
  • linear optimization,
  • conic quadratic optimization,
  • the power cone,
  • the exponential cone (relative entropy cone),
  • semidefinite optimization,
  • duality in linear and conic optimization,
  • mixed-integer optimization,
  • quadraticaly constrained optimization,
  • and numerical issues.
The Modeling Cookbook is a general compendium of practical conic optimization. It is a generic mathematical publication without any code samples or references to the MOSEK API, although it covers areas of optimization that are supported by MOSEK 8 or the upcoming MOSEK 9.

Monday, March 26, 2018

Progress on the pooling problem and new reformulation techniques

On the 8th of March 2018, an interesting paper on the pooling problem landed on arxiv. The authors take the pq-formulation, known to have the strongest continuous relaxation, and strengthen it further it by adding convex hull-describing valid inequalities for subproblems -- one for each (product, attibute, pool)-combination. Two of the inequalities are linear, one is conic quadratic, and the final is general convex. We took up the challenge and also cast the latter on conic quadratic form. Here is what we learned.

The inequality is this:
$$
\label{eq:cut}
\overline{\beta}(\overline{\gamma} x - u) + h(y, v) \leq \overline{\beta}(\overline{\gamma} - t),
$$

valid on $y > 0$ when
$$
h(y,v) := A y + B g(y,v),\quad g(y,v) := \frac{yv}{y+v},\quad v \geq 0,
$$

where $A = \overline{\gamma}-\underline{\gamma} > 0$ and $B = \underline{\gamma} < 0$ are coefficients. The challenge is first to put this on conic quadratic form, and then to extend its validity to all of $y \in \mathbb{R}$, while staying tight on $y > 0$. On the first challenge of conic representability, our analysis led to a small surprise, namely, the possibility to represent harmonic means (see this blog post). The inequality $\eqref{eq:cut}$ is thus representable on $y > 0$ by $\overline{\beta}(\overline{\gamma} x - u) + h \leq \overline{\beta}(\overline{\gamma} - t)$ and
$$
\label{eq:repr:ypos} 
h = A y + B g,\quad \left(\begin{matrix}0.5 (y-g)\\ y+v\\ y\end{matrix}\right) \in Q_r^3,\quad v \geq 0.
$$

We stress that the conic form is a proof of convexity. Next is the challenge of extending this representation to be valid on all of $y \in \mathbb{R}$, while staying tight on $y > 0$. That is, as proven in the paper, we need to extend the representation such that $h \leq 0$ on $y \leq 0$. Notably, however, the current representation in $\eqref{eq:repr:ypos}$ fails this criterion as shown by the graph of $h(y, 5)$ where $A=1$ and $B=-3$.

For this task we deploy another reformulation technique which, in lack of better names, may be denoted as stretching the extremum. The idea is to locally, in the representation of $h(y,v)$, exchange $y$ by $\hat{y} = y + s$ for some amount of stretch $\underline{s} \leq s \leq \overline{s}$. To see the effect of this let $y_{min} = \arg\min_{y \in \mathbb{R}} h(y,v)$, and note that
$$
\min_{s \in \mathbb{R}} h(y+s,v) = \begin{cases}
h(y+\overline{s},v), & {\rm if\ } y_{min} \geq y+\overline{s}, \\
h(y_{min},v), & {\rm if\ } y+\underline{s} \leq y_{min} \leq y+\overline{s},\\
h(y+\underline{s},v), & {\rm if\ } y+\underline{s} \geq y_{min}.
\end{cases}
$$

The visual interpretation of this is that we cut the graph of the function in two parts at $y_{min}$, move the two parts away from each other along the y-axis by amounts determined by $\underline{s}$ and $\overline{s}$, and finally fill in the gap with the extremum value $h(y_{min},v)$. In case of $\eqref{eq:repr:ypos}$ we use $\underline{s} = 0$ and $\overline{s} = \infty$ to allow $h$ to take smaller values of $h(y,v)$ at any higher value of $y$. In the graphed example of $h(y, 5)$ above, we get that the lower bound on $h$ is relaxed to the left of the extremum.



As $h(0,v) = 0$ for all $v \geq 0$, we get the desired property that $h \leq 0$ is possible on $y \leq 0$. In conclusion, the globally valid conic quadratic representation of the convex inequality we started out from, is given by $\overline{\beta}(\overline{\gamma} x - u) + h \leq \overline{\beta}(\overline{\gamma} - t)$ and

$$
h = A \hat{y} + B g,\quad \left(\begin{matrix}0.5 (\hat{y}-g)\\ \hat{y}+v\\ \hat{y}\end{matrix}\right) \in Q_r^3,\quad v \geq 0, \quad \hat{y} \geq y.
$$



Friday, March 23, 2018

Easter holidays 2018

Support and sales will be closed for Easter holidays

from Thursday, March 29th until Monday, April 2nd (inclusive)

We wish everyone a good Easter.

MOSEK Team

Thursday, March 15, 2018

AOO Copenhagen 23rd April 2018


Again this year MOSEK is one of the sponsors of the Applications of Optimization day (AOO 2018), the annual meeting of the Danish Operational Research Society (DORS). It is a great opportunity to get in touch with people interested in various aspects of optimization.

The meeting will be held at Industriens Hus, in the very heart of Copenhagen (DK) on April 23rd, 2018. Please check the official web site for more information.

Wednesday, March 14, 2018

Harmonic mean, p-norms and extremely disciplined conic modeling

Contrary to what we inadverently claim in our modeling cookbook, there IS a conic quadratic model of the harmonic mean. More precisely, we can model the constraint
$$0\leq t\leq \frac{n}{\frac{1}{x_1}+\cdots+\frac{1}{x_n}},\quad x_i>0$$
Here is the model, which at the same time demonstrates the non-obvious fact that the right-hand side of (1) is actually concave. We call this approach extremely disciplined modelling, since by definition it prevents users from creating a non-convex model. We first notice that (1) is equivalent to
$$\sum_{i=1}^n\frac{t^2}{x_i}\leq nt$$
which can be written using rotated quadratic cones $2uv\geq w^2$, $u,v\geq 0$, as follows:
$$
\begin{eqnarray}
t^2\leq 2x_iy_i, &\quad i=1,\ldots,n \\
2\sum_i y_i = nt. &
\end{eqnarray}
$$
For the special case $n=2$ we can model $t\leq \frac{2xy}{x+y}$ with just one rotated cone: this inequality is equivalent to $2(x-\frac12 t)(y-\frac12 t)\geq (\frac12t)^2$.

The same way we can model (and prove concativity of) a general "negative $p$-norm":
$$ 0\leq t\leq (\sum_i x_i^{-p})^{-1/p} $$
where $p>0$, $x_i>0$ and the harmonic mean corresponds to $p=1$ (up to a factor $n$). The equivalent version is
$$ \sum_{i=1}^n\frac{t^{p+1}}{x_i^p}\leq t$$
which leads to
$$
\begin{eqnarray}
|t|\leq x_i^{\frac{p}{p+1}} y_i^{\frac{1}{p+1}}, &\quad i=1,\ldots,n \\
\sum_i y_i = t. &
\end{eqnarray}
$$
Constraint (7) defines the three-dimensional power cone with exponents $(\frac{p}{p+1},\frac{1}{p+1})$. Conic models with the power cone will be expressible in Mosek 9, see https://themosekblog.blogspot.dk/2018/01/version-9-roadmap_31.html.


Why we use projection to measure error


In classic nonlinear optimization you typically measure the error of a constraint,

$$
\label{eq:nlpfunc}
f(x) \leq 0,
$$

by the value $\max(0, f(x))$. The problem with this measure is that reformulations such as $1 000 000 f(x) \leq 0$, or $f(x)^3 \leq 0$, or even $\exp(f(x)) - 1 \leq 0$ show very different errors for the exact same point. This makes it highly representation dependent. In fact, in this measure, the inevitable error introduced by rounding the true algebraic solution $x^*$ to its nearest floating-point representation even depends on the rate-of-change in $f(x)$. As example, this error grows significantly with higher values of $x^*$ in $\exp(x)$, as well as with lower positive values of $x^*$ in $1/x^2$.

In conic optimization the nonlinearities you model are independent of representation. This is because you only ever constrain variables to a cone, i.e., a set of points. This allows us to change and tune internal representations for optimal performance. Correspondingly, we also want the error reports you receive from MOSEK to be independent of representation. Our choice of projection, being the minimum distance from point to cone, is a widely known, computationally inexpensive and mathematically well-defined concept that fits the glove. This definition, however, does not always fit the intuitive understanding of error as captured by the following question we sometimes get:

Q: The result $\hat x = (0, 1{\rm e}18, 1{\rm e}3)$ fails to satisfy $2 x_1 x_2 \geq x_3^2$; the left is zero and the right is a million! Why does MOSEK claim feasibility when the conic domain $x \in \mathcal{Q}_r^3$ is so clearly violated?

A: Note that $x^* = (5{\rm e-}13, 1{\rm e}18, 1{\rm e}3)$ is feasible, showing that the projection distance to the conic domain is less than $\| x^* - \hat x\|_2 = 5{\rm e-}13$. This is a small violation in floating-point computations. In the particular case of $\hat x$, we may reformulate the constraint to $x_1 \geq \frac{x_3^2}{2x_2}$ and obtain a comparable error measure by direct comparison of left and right.

If you wonder why there are no feasibility tolerances on conic domains, the answer is simple. They are satisfied symbolically in all computations, even if the interior-point solver is terminated before convergence. The small violations you may see are due to floating-point arithmetic.

Finally, if you want to know more about projection theory, computation and how we use it in MOSEK, we invite you to Henrik's session at ISMP 2018.

Happy Pi Day

Code:

from mosek.fusion import *
from math import sqrt

def pi(n):
  M = Model()
  x = M.variable(n, Domain.inQCone())
  M.constraint(Expr.mulElm( range(1,n), x.slice(1,n) ),
               Domain.greaterThan(sqrt(6)))
  M.objective(ObjectiveSense.Minimize, x.index(0))
  M.solve()
  return M.primalObjValue()

print pi(100)
print pi(1000)
print pi(10000)
print pi(100000)
print pi(200000)


Output:

3.13198074852
3.14063710035
3.14149715507
3.14158310484
3.14158824714


Hint: 

$\sum_{i=1}^\infty\frac{1}{i^2}=\frac{\pi^2}{6}$.

Friday, March 9, 2018

MATLAB Computational Finance Conference 2018

MOSEK is one of the exhibitors at the MATLAB Computational Finance Conference 2018 in London on the 24th May. 

The event will include keynote presentations from The Bank of England and MathWorks as well as sessions with leading financial practitioners and academic institutions.

For details, speakers and registration see





Friday, March 2, 2018

ISMP 2018

MOSEK is one of the sponsors of the 23rd International Symposium on Mathematical Programming ISMP 2018 which takes place in Bordeaux on 1-6 July 2018.

There will be plenty of opportunities to meet us in Bordeaux. Here are our tentative talk titles:
  • Erling Andersen, MOSEK version 9
  • Joachim Dahl, Extending MOSEK with exponential cones
  • Henrik A. Friberg, Projection and presolve in MOSEK: exponential and power cones
  • Sven Wiese, Mixed-integer Conic Optimization in MOSEK in a session on Mixed-integer Conic Optimization Sven is organizing.
  • Michał Adamaszek, Exponential cone in MOSEK: overview and applications
Stay tuned for more details closer to the date.

Monday, February 5, 2018

Modern Optimization in Energy - Summer School

DTU CEE Summer School 2018 Modern Optimization in Energy will be held June 24-29, 2018 at the Technical University of Denmark. This is the third edition of the summer school. See list of speakers, program and how to apply (until March 18th).

Just like last year MOSEK is one of the sponsors of the meeting and of two MOSEK scholarships for outstanding student participants.

Wednesday, January 31, 2018

Version 9 roadmap

MOSEK version 9 is scheduled for release some time during 2018. In this blog post we will outline some of the expected changes. Please note this is subject to change and hence the blogpost will be updated occasionally.

The major new feature in version 9 is support for exponential cone

$$x\geq y\exp(z/y)$$

and power cone

$$x^\alpha y^{1-\alpha}\geq |z|$$

This has the implication that almost any practical convex optimization problem can be formulated directly on conic form. That includes modeling functions such as $x^\alpha$, $x^{1.5}$, $\log{x}$, $e^x$, relative entropy, $p$-norm, logistic, Kullback–Leibler divergence, Cobb-Douglas etc.

Mixed-integer optimization will be available for all conic problems without semi-definite matrix variables. Hence, version 9 should be able to handle almost all mixed-integer optimization problems appearing in practice.

An advantage of moving to the conic form is that it become possible for MOSEK to solve the dual problem if deemed worthwhile. This sometimes leads to dramatic speed ups.

The general nonlinear convex optimizer will be dropped, since optimization problems in conic form are preferred for efficiency and stability reasons. Quadratic and quadratically constrained problems remain unaffected.

Below the planned changes are listed in some detail. Please send us an e-mail to support@mosek.com if you have questions or comments.


Feature changes

  • Adding support for the exponential and power cones in all interfaces.
  • Adding mixed-integer support for new cone types.
  • Dropping support for general convex optimization problems.


Interface changes

OPTIMIZER API
  • Remove SCopt, DGopt, EXPopt which depended on the general nonlinear optimizer. There will be a drop-in replacement for SCopt which converts current data format into conic form.
R
  • Remove scopt which depended on the general nonlinear optimizer.
TOOLBOX FOR MATLAB
  • Remove mskenopt, mskscopt and mskgpopt which depended on the general nonlinear optimizer.
FUSION
  • Fusion for MATLAB will be dropped due to technical issues and low demand. 
  • Improved performance.


Other changes

  • Vastly improved performance on the AMD Ryzen family of CPUs. 
  • New version of the Modeling Cookbook adapted to new conic features.
  • Dropped support reading and writing XML formatted files.