Clearly we are glad they appreciate somehow our product, but also we feel challenged in tackling the problem itself. MOSEK is not specialized for OLS-LASSO, but still it can be very efficient we thought, especially in dual form. Moreover, we are always eager to find good test problems, for the core solver but also for our APIs.
The OLS-LASSO problem
So let's start from the formulation given in the challenge, where we are given $T$ samples in dimension $m$:\begin{equation}\label{main}\tag{OLS-LASSO}
\begin{aligned}
\min_{u,b} & \frac{1}{2} || u ||^2 + \sum_{j=1}^{m} \lambda_j |b_j|\\
&\\
s.t.&\\
& u = y - X b
\end{aligned}
\end{equation}
where $b,\lambda\in \mathbb{R}^m, u,y\in \mathbb{R}^T$. We reformulate problem \ref{main} so to have a linear objective function:
\begin{equation*}
\begin{aligned}
\min_{u,b,z,w} &\quad z + \lambda^T w\\
&\\
s.t.&\\
& u + X b = y\\
& s=1\\
& (s,z,u) \in Q_R^{T+2}\\
& w_i \geq |b_i|& \forall i=1,\ldots,m.
\end{aligned}
\end{equation*}
where $Q_R$ is the rotated cone. There are two way to to get rid of the absolute value:
(1) Linear constraints
\[ w_i\geq |b_i| \Rightarrow \left\{ \begin{array}{l} w \geq b \\ w \geq -b \\\end{array}\right.\]
(2) Second-order conic constraint
\[ w_i \geq |b_i| \Leftrightarrow w_i\geq \sqrt{b_i^2} \equiv (w_i,b_i)\in Q^2\]
First let's fix a partition of the variable set as $[s,z,u,w,b]$, and clearly states the basic vector and
matrices:
(1) cost vector: $c^T= [0 \quad 1 \quad 0_T \quad \lambda^T \quad 0_m]$;
(2) right-hand side: $b=^T[1 \quad y^T]^T$;
(2) matrix coefficients: $A= \left[ \begin{array}{ccccc} 1 & 0&0_T&0_m&0_m\\ 0&0&I_T& 0_m&X\end{array}\right]$;
We then introduce as many dual variables as primal linear constraints:
$\mu_0$ for the first constraint, and $\mu\in \mathbb{R}^T$ for the
second one. We also include additional slack variables $\gamma$
partitioned in the same way as the primal variables, i.e. $\gamma^T=[\gamma_s,\gamma_z,\gamma_u,\gamma_w,\gamma_b]$.
The dual problem is
\begin{equation*}
\begin{aligned}
\max_{\mu_0, \mu, \gamma} & \quad \mu_0 + y^T \mu \\
&\\
s.t.&\\
& c - A^T \left[ \begin{array}{c}\mu_0\\\mu\end{array} \right]= \gamma\\
& (\gamma_s,\gamma_z,\gamma_u) \in Q_R^{T+2}\\
& (\gamma_w,\gamma_b) \in \Pi_{i=1}^m Q^2.
\end{aligned}
\end{equation*}
After some calculations, we obtain:
\begin{equation}\tag{D}
\begin{aligned}
\max_{\mu_0,\mu,\gamma_z} & \quad \mu_0 + y^T \mu \\
&\\
s.t.&\\
& X^T \mu \leq \lambda\\
& \gamma_z= 1\\
& \left(-\mu_0, \gamma_z, -\mu \right) \in Q_R^{2+T}.
\end{aligned}
\end{equation}
Try to do the same with PL, you'll get the same model.
The results
We implement these three models using the MOSEK Fusion Python API and run a batch of tests for $m=[10,25,75,100]$ and $T$ from $10000$ to $100000$. To make it easy, $X,\lambda$ and $y$ are all uniformly generated in $[0,1]$. The code that interface with the model looks likeHere you can see the running time for problem PL in green, PQ in red and D in blue. In the first plot we fix $m$ and let $T$ varies:
and then if we fix $T$ either to $10000$ or $100000$ and we get
But it is also interesting to see how PQ is as good as PL. Red and green lines are almost indistinguishable. Only for larger $m$, i.e. when the number of cones increases, we can see a significant difference. Otherwise the difference is very very small.