Handling Non-Linearities with Gurobi

Gurobi Live Barcelona

Pierre Bonami

Agenda

Introduction
Mixed-Integer Convex Quadratic Optimiziation
Non-Convex Quadratic Optimization
General Functions, piece-wise linear and beyond

LP and MIP

Base problems that Gurobi solves
Simplex and Barrier algorithms for LP
Branch-and-cut for MIP

Nonlinearities

Application	Phenomenon	Nonlinearity
Finance	risk	quadratic (convex)
Truss topology	physical forces	quadratic (convex)
Pooling (petrochemical, mining, agriculture)	mixing products	quadratic non-convex
electricity distribution (ACOPF)	Alternative Current	\(\sin\) and \(\cos\) (can be made quadratic)
machine learning	—	logistic function, \(\tanh\)
chemical engineering	chemical reactions	—

many more…

The MINLP Goal

Ideally, we seek to solve \[ \begin{aligned} & \min f(x)\\ & \text{s.t:}\\ & g_i(x) \le 0, i = 1,\ldots, m \\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

\(f: \mathbb R^n \rightarrow \mathbb R\), \(g_i: \mathbb R^n \rightarrow \mathbb R\), reasonably smooth
If \(l\) or \(u\) is not finite: undecidable in general
Even assuming finiteness it’s very difficult in theory and practice
Note that integer variables are “not convex” and can be represented as a polynomial if bounded

Convex or not convex?

Any segment connecting two points inside the region is inside the region.

There exists two points in the region the segment connecting them is not completely in the region.

Why?

Optimization “easy” (usually)
- Start from any point in the region
- Take steps inside the region improving objective
- When there is no more step global optimum
Interior point methods for any closed convex region
Simplex algorithm for polyhedra

Optimization “hard”
- Start from any point in the region
- Take steps inside the region improving objective
- When there is no more step local optimum is reached
Need a divide-and-conquer algorithm to find a global optimum.

Nonlinearities in Gurobi

Convex

Quadratic objective: \(\min c^T x + x^T Q x\) with \(Q \succeq 0\)
Quadratic constraints: \(a^T x + x^T Q x \le b\) describing a convex region
- \(Q \succeq 0\) is a simple case, can be more complex

Non-Convex

Discrete objects: integer variables, SOS constraints
Bilinear terms: \(z_{ij} = x_i x_j\)
- Non-Convex quadratic forms: \(a^T x + x^T Q x \le b\)
General functions: \(\exp\), \(\log\), \(\cos\),
- Reformulated as PWL in Gurobi 9 and 10
- Treated directly

New in Gurobi 11

Agenda

Introduction
Mixed-Integer Convex Quadratic Optimization
Non-Convex Quadratic Optimization
General Functions, piece-wise linear and beyond

Problem definition

\[ \begin{aligned} & \min c^T x + x^T Q^0 x\\ & \text{s.t:}\\ & {a_k}^T x + x^T Q^k x \le b_k, k = 1,\ldots, m\\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

\(Q^0, Q^1,\ldots, Q^m\) are assumed to be symmetric
- \(Q^0\) is positive semi definite
- The quadratic forms \(a_k^T + x^T Q^k x - b_k\) are second order cone representable.

The second order cone

\(\mathcal L^n = \{x \in \mathbb R^{n+1}: \sum_{j=1}^n x_j^2 \le x_0^2, x_0 \ge 0 \}\)

Through simple algebra, can be represented as SOC:

\(\sum_{i=1}^n x_i^2 \le x_0^2\), with \(x_0\ge 0\)
\(\sum_{i=2}^n x_i^2 \le x_0 x_1\), with \(x_0, x_1 \ge 0\) (rotated SOC)
\(a^T x + x^T Q x \le b\), with \(Q \succeq 0\)
\(x^T Q x \le y^2\), with \(Q \succeq 0, y \ge 0\)

Very powerful but modeling sometimes far from obvious.

Not all forms recognized by solvers

New in Gurobi 11

Note

Our barrier algorithm needs everything converted
to SOC or linear by presolve.

The basic branch-and-bound algorithm

flowchart TD
  A((c* = 0.5,  <br> x*=0.3))
  A -- "x ≤ 0" --> B((c* = 0.6,  <br> z*=2.2))
  A -->|"x ≥ 1"| C((c* = 0.8,  <br> y*=1.6))
  B -->|"z ≤ 2"| D((c* = 2))
  B -->|"z ≥ 3"| E(("c* = +∞"))
  C -->|"y ≤ 1"| F((c* = 2.1))
  C -->|"y ≥ 2"| G((c* = 1.3 <br> t* = 0.5))
  G -->|"t ≤ 0"| H((c* = ?))
  G -->|"t ≥ 1"| I((c* = ?))

  class A branchnode
  class B branchnode
  class C branchnode
  class D intfeasnode
  class E prunenode
  class F prunenode
  class G branchnode

  classDef intfeasnode stroke:#0f0
  classDef prunenode stroke:#f00
  classDef branchnode stroke:#00f

At each node of the tree:

flowchart TD
A[Solve continuous relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes -->F[New solution? <br> Update best known]

In MILP and MIQP continuous relaxation usually solved by simplex.

In MISOCP/MIQCP, continuous relaxation solved by barrier.

The outer approximation cut

Let \(C = \{g(x) \le b: x \in R^n\}\), with \(g\) a convex function
For any \(x^* \in \mathbb R^n\), the constraint: \[\nabla g(x^*) (x - x^*) + g(x^*) \le 0 \] is valid
If \(x^* \not \in C\), it cuts \(x^*\): \[ \nabla g(x^*) (x^* - x^*) + g(x^*) > 0 \]

Outer approximation branch-and-cut

flowchart TD
  A((c* = 0.5,  <br> x*=0.3))
  A -- "x ≤ 0" --> B((c* = 0.6,  <br> z*=2.2))
  A -->|"x ≥ 1"| C((c* = ?))
  B -->|"z ≤ 2"| D((c* = 2 <br> SOCP?))
  B -->|"z ≥ 3"| E((c* = ?))
  D -.-> H((c* = ?))
  D -.-> I((c* = ?))

  class A branchnode
  class B branchnode
  class D intfeasnode

  classDef intfeasnode stroke:#0f0
  classDef prunenode stroke:#f00
  classDef branchnode stroke:#00f

Drop quadratic constraints and solve an LP relaxation at each node.

Integer feasible nodes are not necessarily solutions.

flowchart TD
A[Solve LP relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes --> D[SOCP Feasible?]
D -- yes -->F[New solution? <br> Update best known]
D -- no -->E[Generate OA cuts]

E --> A

Numerical difficulties

OA branch-and-cut builds a cutting plane approximation of smooth functions
It can happen that node solution:
- is integer feasible
- is not SOC feasible
- OA cuts are not cutting enough

New in Gurobi 11

Rely on barrier algorithm for those nodes (usually very few).

flowchart TD
A[Solve LP relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes --> D[SOCP Feasible?]
D -- yes -->F[New solution? <br> Update best known]
D -- no -->E[Generate OA cuts]

E -- "no cutting!" --> A

linkStyle 0,2,4,5 stroke:#dd2113,stroke-width:4px;

Cone disaggregation and outer approximation

An exponential number of cutting planes is needed to approximate a convex quadratic form.

Cone disaggregation

From \[ \sum_{i = 1}^n x_i^2 \le x_0^2, x_0 \ge 0 \]

Create variables \(y_i \ge 0\), such that \(x_i^2 \le y_i x_0\) (rotated SOC)
Replace initial constraint with \(\sum_{i = 1}^n y_i \le x_0\)

Pitfalls of disaggregation

\[ (A) \left\{ \begin{aligned} &\sum_{i = 1}^n x_i^2 \le x_0^2,\\ &x_0 \ge 0 \end{aligned} \right. \]

\[ (B) \left\{ \begin{aligned} & \sum_{i = 1}^n y_i \le x_0\\ & x_i^2 \le y_i x_0 & & i = 1, \ldots, n\\ & y \ge 0, x \ge 0 \end{aligned} \right. \]

The reformulation is correct: every solution of \((B)\) translates to \((A)\)
But a solution of \((B)\) with a small infeasibility can have a large one in \((A)\):
Suppose \(x_0 = 1\), \(y_i = \frac{1}{n}\) and \(x_i = \sqrt{\frac{1}{n} + \epsilon}\):
- Infeasibility in \((B)\) is \(\epsilon\)
- Infeasibility in \((A)\) is \(n \cdot \epsilon\)

Gurobi tries to deal with it but can be an issue.

Options for MISOCP/MIQCQP

MIQCPMethod

\(-1\) Automatic choice (default)
\(0\) Use QCP branch-and-bound
\(1\) Use Outer Approximation

PreMIQCPForm

\(-1\) Automatic choice (default)
\(0\) Leave the model as is (for B&B)
\(1\) Reformulate to SOC
\(2\) Reformulate to SOC and disaggregate

Example

Portfolio Optimization

Agenda

Introduction
Mixed-Integer Convex Quadratic Optimization
Non-Convex Quadratic Optimization
General Functions, piece-wise linear and beyond

Stepping into a non-convex world

Non-Convex MIQCQP

\[ \begin{aligned} & \min c^T x + x^T Q^0 x\\ & \text{s.t:}\\ & {a_k}^T x + x^T Q^k x \le b_k, & &k = 1,\ldots, m\\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

\(Q^0, Q^1,\ldots, Q^m\) are assumed to be symmetric
Continuous relaxation is NP-hard!
Solution strategy:
- Build a convex relaxation
- Refine it through branching.

Note

For simplicity sake, consider model without integer variables, in remainder.

NonConvex parameter in Gurobi

NonConvex

\(-1\) automatic (default)
\(0\) Return error if original model has non-convex Q objective or constraints
\(1\) Return error if presolved model has non-convex Q that cannot be linearized
\(2\) Accept non-convex Q by building a bilinear formulation

New in Gurobi 11

Default behavior change: New default \(2\) (was \(1\))).

Bilinear formulation

For each product \(x_i x_j\) in the model
- Introduce a new variable \(z_{ij}\)
- Add the bilinear constraint \(z_{ij} = x_i x_j\)
- Replace product with \(z_{ij}\)

\[ \begin{aligned} & \min c^T x + \left\langle Q^0, Z \right\rangle\\ & \text{s.t:}\\ & {a_k}^T x + \color{red}{\left\langle Q^k , Z \right\rangle} \le b_k, & &k = 1,\ldots, m\\ & \color{red}{Z = x x^T} \\ & l \le x \le u\\ \\ &(\left\langle Q , Z \right\rangle = \sum_i \sum_j q_{ij} z_{ij}) \end{aligned} \]

More details on bilinear formulation

Try as much as possible to avoid creating bilinear terms:
- if at least one variable is fixed
- if at least one variable is binary (can be reformulated)
- square of binary \(x^2 = x\)
- square term \(q_{ii} x^2\) with \(q_{ii} > 0\) is convex
If \(x_i x_j\) always appears in inequalities with \(q_{ij}\) of same sign relax to:
- \(z_{ij} \ge x_i x_j\), if \(q_{ij} > 0\)
- \(z_{ij} \le x_i x_j\), if \(q_{ij} < 0\)

Warning

Similar to disaggregation, infeasibilities in bilinear formulation can be larger than in the original model

Bilinear relaxation

Relax non-convex constraint \(Z=x x^T\) using convex envelopes. \[ \begin{aligned} & \min c^T x + \left\langle Q^0, Z \right\rangle\\ & \text{s.t:}\\ & {a_k}^T x + \left\langle Q^k , Z \right\rangle \le b_k, & &k = 1,\ldots, m\\ & \color{red}{z^-(x_i, x_j) \le z_{ij} \le z^+(x_i, x_j)}\\ & \color{red}{l \le x \le u}\\ \end{aligned} \]

Convex envelopes: parabola

Consider the square case: \(z = x^2\)

It is convex: \[ z^-(x_i, x_i) = x_i^2 \]

Can be dealt with by OA.

\(z^+(x_i, x_i)\) is given by the secant: \[ z^+_{ii} = (u + l) x_i - l \cdot u \]

Convex envelopes: products (McCormick)

\[ z^-_{ij} = \max \left\{ \begin{aligned} & 𝑙_j x_i+ 𝑙_i 𝑥_j - 𝑙_i 𝑙_j \\ & 𝑢_j x_i + 𝑢_i x_j - u_i u_j \end{aligned} \right\} \le z_{ij} \le z^+_{ij} = \min \left\{ \begin{aligned} & 𝑙_j x_i+ u_i 𝑥_j - u_i l_j \\ & 𝑢_j x_i + l_i x_j - l_i u_j \end{aligned} \right\} \]

Spatial branching

Let \((x^*, z^*)\) be the solution of the bilinear relaxation
If not integer feasible, can branch on an integer variable
Otherwise:
- If \(z^*_{ij} = x^*_i x^*_j\), for all bilinear term, we have a solution
- Otherwise refine our bilinear relaxation:
  - Pick \(x_i\) or \(x_j\) s.t. \(z^*_{ij} \ne x^*_i x^*_j\)
  - Create two child nodes with \(x_i \le x^*_i\) and \(x_i \ge x^*_i\)
  - Refine bilinear relaxation in the two nodes

Other techniques targeted at non-convex MIQCQP

RLTCuts
- Reformulation Linearization Technique (Sherali and Adams, 1990)
- Multiply linear constraint by a variable, linearize resulting products
BQPCuts
- Facets of the Boolean Quadratic Polytope (Padberg 1989)
- Clique cuts from the paper
SDPCuts
- Relax \(Z=x x^T\) to \(Z \succeq x x^T\), and outer approximate resulting cone
OBBT:
- Optimization based bound tightening
- Infer tighter bound on variables involved in products by LP.

Non-convex MIQCQP performance history

New in Gurobi 11

Improved recognition of convexity
Branching improvements
- Strong branching for bilinear terms
- Better choice for deciding to branch on an integer or bilinear

Example solve

Pooling problem from MINLPLIB

m.optimize()

Gurobi Optimizer version 11.0.0 build v11.0.0beta2 (mac64[x86] - macOS 13.6.1 22G313)

CPU model: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 662 rows, 403 columns and 2229 nonzeros
Model fingerprint: 0x883de6ff
Model has 70 quadratic constraints
Variable types: 295 continuous, 108 integer (108 binary)
Coefficient statistics:
  Matrix range     [2e-03, 1e+03]
  QMatrix range    [1e+00, 1e+00]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 5e+00]
  Bounds range     [1e+00, 1e+03]
  RHS range        [1e+00, 1e+03]
Presolve removed 188 rows and 93 columns
Presolve time: 0.01s
Presolved: 754 rows, 310 columns, 2283 nonzeros
Presolved model has 70 bilinear constraint(s)
Variable types: 218 continuous, 92 integer (92 binary)

Root relaxation: objective 1.262419e+04, 542 iterations, 0.01 seconds (0.01 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 12624.1926    0  101          - 12624.1926      -     -    0s
     0     0 10416.0029    0  111          - 10416.0029      -     -    0s
     0     0 10405.6081    0  108          - 10405.6081      -     -    0s
     0     0 10299.1966    0  104          - 10299.1966      -     -    0s
     0     0 10299.1966    0  104          - 10299.1966      -     -    0s
     0     0 10299.1833    0  125          - 10299.1833      -     -    0s
     0     0 10299.1828    0  109          - 10299.1828      -     -    0s
     0     0 10299.0296    0  125          - 10299.0296      -     -    0s
     0     0 10299.0195    0  112          - 10299.0195      -     -    0s
     0     0 10299.0195    0  115          - 10299.0195      -     -    0s
     0     0 10298.8700    0   84          - 10298.8700      -     -    0s
     0     2 10298.8700    0   84          - 10298.8700      -     -    0s
* 1064   307             157    10246.216735 10292.1669  0.45%  18.2    0s
H 1139   270                    10246.219957 10291.0140  0.44%  17.7    0s

Cutting planes:
  Cover: 3
  Implied bound: 18
  Clique: 1
  MIR: 26
  Flow cover: 33
  Flow path: 9
  GUB cover: 1
  Inf proof: 3
  RLT: 47

Explored 1392 nodes (24192 simplex iterations) in 0.84 seconds (0.77 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 10246.2 10246.2 

Optimal solution found (tolerance 1.00e-04)
Best objective 1.024621995655e+04, best bound 1.024621995655e+04, gap 0.0000%

Agenda

Introduction
Mixed-Integer Convex Quadratic Optimization
Non-Convex Quadratic Optimization
General Functions, piece-wise linear and beyond

Function constraints in Gurobi

Since Gurobi 9.0. Allow to state \(𝑦=𝑓(𝑥)\)

\(𝑓\) is a predefined function
\(𝑦\) and \(𝑥\) are one-dimensional variables

Library of predefined functions include:

\(e^x\), \(a^x\), \(\ln(x)\), \(\log_a(x)\), logistic
\(\sin(x)\), \(\cos(x)\), \(\tan(x)\),
monomials \(x^a\), polynomials of one variable \(a_0 + a_1 x^1 + a_2 x^2 + ...\)

Example:

m = gp.Model()
x = m.addVar()
y = m.addVar()
m.addGenConstrLog(x, y)

New treatment in Gurobi 11

Gurobi 9.0-10.0: nonlinear functions replaced during presolve by a piece-wise linear approximation.
Gurobi 11, can treat nonlinear functions directly:
- Set FuncNonLinear=1
- No other changes to users’ code

%%{ init: { 'flowchart': { 'curve': 'linear' } } }%%
flowchart TD
        IV[Integer variables <br> SOS Constraints]
        GF["General function <br> y = f(x)"]

        GF--"Gurobi ≤ 10"--- IV

        MILP[MILP]
            IV --- MILP

        MINLP[MINLP]
            GF --"Gurobi 11 <br> FuncNonLinear=1"---- MINLP

        BB(branch-and-cut)
         MILP --- BB

        SBB(spatial b&b)
            MINLP --- SBB


    class Model subg
    class Convex subg
    class Nonconvex subg
    class Problems subg
    class Algorithms subg

    classDef subg fill:#FFFFFF;
    classDef difficult stroke:#dd2113
    classDef easy stroke:#1675a9

Algorithmic approach

Similar to bilinear formulation/relaxation
For each function, compute lower/upper envelope
Spatial branching to refine them
Additional difficulties:
- detect when functions are convex/concave
- functions can be locally convex

Great powers and great responsabilities

Gurobi 11.0 handles select univariate nonlinear functions
But, those can be composed

E.g.: Consider for \(x\ge 0\): \[ f(x) = \sqrt{1 + x^2} + \ln({x + \sqrt{1 + x^2}}) \le 2 \] We can formulate it:

introduce auxiliary variables \(u, v, w, z \ge 0\)
add constraints \(u = 1 + x^2\), \(v = \sqrt{u}\), \(w = x + v\), \(z = \ln({w})\)
\(f(x) = v + z\)

Caution

Feasibility tolerances!

Decomposition leading to large infeasibility

\[ y = f(x) = \frac{x}{\sin (x)} \] a solution is \(x= 0.0001\), \(y = 1.0000000016666666\).

Now decompose \(f(x)\): \(u = sin(x)\), \(v = \frac{1}{u}\), \(y = x \cdot v\).

And consider:

\(\overline x = 0.0001\)
\(\overline u = 0.000098999999833333343\) (\(\sin (\overline x) - 10^-6\))
\(\overline v = 10101.010118015167\)
\(\overline y = 1.0101010118015167\)

😱

We have \(\overline y - f(\overline x) \approxeq 10^-2\)

Example

Logistic Regression