Handling Non-Linearities with Gurobi

Gurobi Live Barcelona

Pierre Bonami

Agenda

  1. Introduction
  2. Mixed-Integer Convex Quadratic Optimiziation
  3. Non-Convex Quadratic Optimization
  4. General Functions, piece-wise linear and beyond

LP and MIP

  • Base problems that Gurobi solves
  • Simplex and Barrier algorithms for LP
  • Branch-and-cut for MIP

Nonlinearities

Application Phenomenon Nonlinearity
Finance risk quadratic (convex)
Truss topology physical forces quadratic (convex)
Pooling (petrochemical, mining, agriculture) mixing products quadratic non-convex
electricity distribution (ACOPF) Alternative Current \(\sin\) and \(\cos\) (can be made quadratic)
machine learning logistic function, \(\tanh\)
chemical engineering chemical reactions
  • many more…

The MINLP Goal

Ideally, we seek to solve \[ \begin{aligned} & \min f(x)\\ & \text{s.t:}\\ & g_i(x) \le 0, i = 1,\ldots, m \\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

  • \(f: \mathbb R^n \rightarrow \mathbb R\), \(g_i: \mathbb R^n \rightarrow \mathbb R\), reasonably smooth
  • If \(l\) or \(u\) is not finite: undecidable in general
  • Even assuming finiteness it’s very difficult in theory and practice
  • Note that integer variables are “not convex” and can be represented as a polynomial if bounded

Convex or not convex?

Convex Region

Any segment connecting two points inside the region is inside the region.

Non-Convex Region

There exists two points in the region the segment connecting them is not completely in the region.

Why?

  • Optimization “easy” (usually)
    • Start from any point in the region
    • Take steps inside the region improving objective
    • When there is no more step global optimum
  • Interior point methods for any closed convex region
  • Simplex algorithm for polyhedra

  • Optimization “hard”
    • Start from any point in the region
    • Take steps inside the region improving objective
    • When there is no more step local optimum is reached
  • Need a divide-and-conquer algorithm to find a global optimum.

Nonlinearities in Gurobi

Convex

  • Quadratic objective: \(\min c^T x + x^T Q x\) with \(Q \succeq 0\)
  • Quadratic constraints: \(a^T x + x^T Q x \le b\) describing a convex region
    • \(Q \succeq 0\) is a simple case, can be more complex

Non-Convex

  • Discrete objects: integer variables, SOS constraints
  • Bilinear terms: \(z_{ij} = x_i x_j\)
    • Non-Convex quadratic forms: \(a^T x + x^T Q x \le b\)
  • General functions: \(\exp\), \(\log\), \(\cos\),
    • Reformulated as PWL in Gurobi 9 and 10
    • Treated directly

New in Gurobi 11

Agenda

  1. Introduction
  2. Mixed-Integer Convex Quadratic Optimization
  3. Non-Convex Quadratic Optimization
  4. General Functions, piece-wise linear and beyond

Problem definition

\[ \begin{aligned} & \min c^T x + x^T Q^0 x\\ & \text{s.t:}\\ & {a_k}^T x + x^T Q^k x \le b_k, k = 1,\ldots, m\\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

  • \(Q^0, Q^1,\ldots, Q^m\) are assumed to be symmetric
    • \(Q^0\) is positive semi definite
    • The quadratic forms \(a_k^T + x^T Q^k x - b_k\) are second order cone representable.

The second order cone

\(\mathcal L^n = \{x \in \mathbb R^{n+1}: \sum_{j=1}^n x_j^2 \le x_0^2, x_0 \ge 0 \}\)

Through simple algebra, can be represented as SOC:

  • \(\sum_{i=1}^n x_i^2 \le x_0^2\), with \(x_0\ge 0\)
  • \(\sum_{i=2}^n x_i^2 \le x_0 x_1\), with \(x_0, x_1 \ge 0\) (rotated SOC)
  • \(a^T x + x^T Q x \le b\), with \(Q \succeq 0\)
  • \(x^T Q x \le y^2\), with \(Q \succeq 0, y \ge 0\)

Very powerful but modeling sometimes far from obvious.

Not all forms recognized by solvers

New in Gurobi 11

Note

Our barrier algorithm needs everything converted
to SOC or linear by presolve.

The basic branch-and-bound algorithm

flowchart TD
  A((c* = 0.5,  <br> x*=0.3))
  A -- "x ≤ 0" --> B((c* = 0.6,  <br> z*=2.2))
  A -->|"x ≥ 1"| C((c* = 0.8,  <br> y*=1.6))
  B -->|"z ≤ 2"| D((c* = 2))
  B -->|"z ≥ 3"| E(("c* = +∞"))
  C -->|"y ≤ 1"| F((c* = 2.1))
  C -->|"y ≥ 2"| G((c* = 1.3 <br> t* = 0.5))
  G -->|"t ≤ 0"| H((c* = ?))
  G -->|"t ≥ 1"| I((c* = ?))

  class A branchnode
  class B branchnode
  class C branchnode
  class D intfeasnode
  class E prunenode
  class F prunenode
  class G branchnode

  classDef intfeasnode stroke:#0f0
  classDef prunenode stroke:#f00
  classDef branchnode stroke:#00f

At each node of the tree:

flowchart TD
A[Solve continuous relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes -->F[New solution? <br> Update best known]

In MILP and MIQP continuous relaxation usually solved by simplex.

In MISOCP/MIQCP, continuous relaxation solved by barrier.

The outer approximation cut

  • Let \(C = \{g(x) \le b: x \in R^n\}\), with \(g\) a convex function
  • For any \(x^* \in \mathbb R^n\), the constraint: \[\nabla g(x^*) (x - x^*) + g(x^*) \le 0 \] is valid
  • If \(x^* \not \in C\), it cuts \(x^*\): \[ \nabla g(x^*) (x^* - x^*) + g(x^*) > 0 \]

Outer approximation branch-and-cut

flowchart TD
  A((c* = 0.5,  <br> x*=0.3))
  A -- "x ≤ 0" --> B((c* = 0.6,  <br> z*=2.2))
  A -->|"x ≥ 1"| C((c* = ?))
  B -->|"z ≤ 2"| D((c* = 2 <br> SOCP?))
  B -->|"z ≥ 3"| E((c* = ?))
  D -.-> H((c* = ?))
  D -.-> I((c* = ?))

  class A branchnode
  class B branchnode
  class D intfeasnode

  classDef intfeasnode stroke:#0f0
  classDef prunenode stroke:#f00
  classDef branchnode stroke:#00f

Drop quadratic constraints and solve an LP relaxation at each node.

Integer feasible nodes are not necessarily solutions.

flowchart TD
A[Solve LP relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes --> D[SOCP Feasible?]
D -- yes -->F[New solution? <br> Update best known]
D -- no -->E[Generate OA cuts]

E --> A

Numerical difficulties

  • OA branch-and-cut builds a cutting plane approximation of smooth functions
  • It can happen that node solution:
    • is integer feasible
    • is not SOC feasible
    • OA cuts are not cutting enough

New in Gurobi 11

Rely on barrier algorithm for those nodes (usually very few).

flowchart TD
A[Solve LP relaxation]
A --> B[Integer feasible?]
B -- no --> C[Branch]
B -- yes --> D[SOCP Feasible?]
D -- yes -->F[New solution? <br> Update best known]
D -- no -->E[Generate OA cuts]

E -- "no cutting!" --> A

linkStyle 0,2,4,5 stroke:#dd2113,stroke-width:4px;

Cone disaggregation and outer approximation

An exponential number of cutting planes is needed to approximate a convex quadratic form.

Cone disaggregation

From \[ \sum_{i = 1}^n x_i^2 \le x_0^2, x_0 \ge 0 \]

  • Create variables \(y_i \ge 0\), such that \(x_i^2 \le y_i x_0\) (rotated SOC)
  • Replace initial constraint with \(\sum_{i = 1}^n y_i \le x_0\)

Pitfalls of disaggregation

\[ (A) \left\{ \begin{aligned} &\sum_{i = 1}^n x_i^2 \le x_0^2,\\ &x_0 \ge 0 \end{aligned} \right. \]

\[ (B) \left\{ \begin{aligned} & \sum_{i = 1}^n y_i \le x_0\\ & x_i^2 \le y_i x_0 & & i = 1, \ldots, n\\ & y \ge 0, x \ge 0 \end{aligned} \right. \]

  • The reformulation is correct: every solution of \((B)\) translates to \((A)\)
  • But a solution of \((B)\) with a small infeasibility can have a large one in \((A)\):
  • Suppose \(x_0 = 1\), \(y_i = \frac{1}{n}\) and \(x_i = \sqrt{\frac{1}{n} + \epsilon}\):
    • Infeasibility in \((B)\) is \(\epsilon\)
    • Infeasibility in \((A)\) is \(n \cdot \epsilon\)

Gurobi tries to deal with it but can be an issue.

Options for MISOCP/MIQCQP

MIQCPMethod

  • \(-1\) Automatic choice (default)
  • \(0\) Use QCP branch-and-bound
  • \(1\) Use Outer Approximation

PreMIQCPForm

  • \(-1\) Automatic choice (default)
  • \(0\) Leave the model as is (for B&B)
  • \(1\) Reformulate to SOC
  • \(2\) Reformulate to SOC and disaggregate

Example

Portfolio Optimization

Agenda

  1. Introduction
  2. Mixed-Integer Convex Quadratic Optimization
  3. Non-Convex Quadratic Optimization
  4. General Functions, piece-wise linear and beyond

Stepping into a non-convex world

Non-Convex MIQCQP

\[ \begin{aligned} & \min c^T x + x^T Q^0 x\\ & \text{s.t:}\\ & {a_k}^T x + x^T Q^k x \le b_k, & &k = 1,\ldots, m\\ & x_j \in \mathbb Z, j \in \mathcal I\\ & l \le x \le u \end{aligned} \]

  • \(Q^0, Q^1,\ldots, Q^m\) are assumed to be symmetric
  • Continuous relaxation is NP-hard!
  • Solution strategy:
    • Build a convex relaxation
    • Refine it through branching.

Note

For simplicity sake, consider model without integer variables, in remainder.

NonConvex parameter in Gurobi

NonConvex

  • \(-1\) automatic​ (default)
  • \(0\) Return error if original model has non-convex Q objective or constraints​
  • \(1\) Return error if presolved model has non-convex Q that cannot be linearized​
  • \(2\) Accept non-convex Q by building a bilinear formulation

New in Gurobi 11

Default behavior change: New default \(2\) (was \(1\))).

Bilinear formulation

  • For each product \(x_i x_j\) in the model
    • Introduce a new variable \(z_{ij}\)
    • Add the bilinear constraint \(z_{ij} = x_i x_j\)
    • Replace product with \(z_{ij}\)

\[ \begin{aligned} & \min c^T x + \left\langle Q^0, Z \right\rangle\\ & \text{s.t:}\\ & {a_k}^T x + \color{red}{\left\langle Q^k , Z \right\rangle} \le b_k, & &k = 1,\ldots, m\\ & \color{red}{Z = x x^T} \\ & l \le x \le u\\ \\ &(\left\langle Q , Z \right\rangle = \sum_i \sum_j q_{ij} z_{ij}) \end{aligned} \]

More details on bilinear formulation

  • Try as much as possible to avoid creating bilinear terms:
    • if at least one variable is fixed
    • if at least one variable is binary (can be reformulated)
    • square of binary \(x^2 = x\)
    • square term \(q_{ii} x^2\) with \(q_{ii} > 0\) is convex
  • If \(x_i x_j\) always appears in inequalities with \(q_{ij}\) of same sign relax to:
    • \(z_{ij} \ge x_i x_j\), if \(q_{ij} > 0\)
    • \(z_{ij} \le x_i x_j\), if \(q_{ij} < 0\)

Warning

Similar to disaggregation, infeasibilities in bilinear formulation can be larger than in the original model

Bilinear relaxation

  • Relax non-convex constraint \(Z=x x^T\) using convex envelopes. \[ \begin{aligned} & \min c^T x + \left\langle Q^0, Z \right\rangle\\ & \text{s.t:}\\ & {a_k}^T x + \left\langle Q^k , Z \right\rangle \le b_k, & &k = 1,\ldots, m\\ & \color{red}{z^-(x_i, x_j) \le z_{ij} \le z^+(x_i, x_j)}\\ & \color{red}{l \le x \le u}\\ \end{aligned} \]

Convex envelopes: parabola

Consider the square case: \(z = x^2\)

\(z ≥ x^2\)

It is convex: \[ z^-(x_i, x_i) = x_i^2 \]

Can be dealt with by OA.

\(z \le x^2, \, -1 \le x \le 1.5\)

\(z^+(x_i, x_i)\) is given by the secant: \[ z^+_{ii} = (u + l) x_i - l \cdot u \]

Convex envelopes: products (McCormick)

Lower envelope \(z^-_{ij}\)

Upper envelope \(z^+_{ij}\)

\[ z^-_{ij} = \max \left\{ \begin{aligned} & 𝑙_j x_i+ 𝑙_i 𝑥_j - 𝑙_i 𝑙_j \\ & 𝑢_j x_i + 𝑢_i x_j - u_i u_j \end{aligned} \right\} \le z_{ij} \le z^+_{ij} = \min \left\{ \begin{aligned} & 𝑙_j x_i+ u_i 𝑥_j - u_i l_j \\ & 𝑢_j x_i + l_i x_j - l_i u_j \end{aligned} \right\} \]

Spatial branching

  • Let \((x^*, z^*)\) be the solution of the bilinear relaxation
  • If not integer feasible, can branch on an integer variable
  • Otherwise:
    • If \(z^*_{ij} = x^*_i x^*_j\), for all bilinear term, we have a solution
    • Otherwise refine our bilinear relaxation:
      • Pick \(x_i\) or \(x_j\) s.t. \(z^*_{ij} \ne x^*_i x^*_j\)
      • Create two child nodes with \(x_i \le x^*_i\) and \(x_i \ge x^*_i\)
      • Refine bilinear relaxation in the two nodes

Other techniques targeted at non-convex MIQCQP

  • RLTCuts
    • Reformulation Linearization Technique (Sherali and Adams, 1990)
    • Multiply linear constraint by a variable, linearize resulting products
  • BQPCuts
    • Facets of the Boolean Quadratic Polytope (Padberg 1989)
    • Clique cuts from the paper
  • SDPCuts
    • Relax \(Z=x x^T\) to \(Z \succeq x x^T\), and outer approximate resulting cone
  • OBBT:
    • Optimization based bound tightening
    • Infer tighter bound on variables involved in products by LP.

Non-convex MIQCQP performance history

New in Gurobi 11

  • Improved recognition of convexity
  • Branching improvements
    • Strong branching for bilinear terms
    • Better choice for deciding to branch on an integer or bilinear

Example solve

Pooling problem from MINLPLIB

m.optimize()
Gurobi Optimizer version 11.0.0 build v11.0.0beta2 (mac64[x86] - macOS 13.6.1 22G313)

CPU model: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 662 rows, 403 columns and 2229 nonzeros
Model fingerprint: 0x883de6ff
Model has 70 quadratic constraints
Variable types: 295 continuous, 108 integer (108 binary)
Coefficient statistics:
  Matrix range     [2e-03, 1e+03]
  QMatrix range    [1e+00, 1e+00]
  QLMatrix range   [1e+00, 1e+00]
  Objective range  [1e+00, 5e+00]
  Bounds range     [1e+00, 1e+03]
  RHS range        [1e+00, 1e+03]
Presolve removed 188 rows and 93 columns
Presolve time: 0.01s
Presolved: 754 rows, 310 columns, 2283 nonzeros
Presolved model has 70 bilinear constraint(s)
Variable types: 218 continuous, 92 integer (92 binary)

Root relaxation: objective 1.262419e+04, 542 iterations, 0.01 seconds (0.01 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0 12624.1926    0  101          - 12624.1926      -     -    0s
     0     0 10416.0029    0  111          - 10416.0029      -     -    0s
     0     0 10405.6081    0  108          - 10405.6081      -     -    0s
     0     0 10299.1966    0  104          - 10299.1966      -     -    0s
     0     0 10299.1966    0  104          - 10299.1966      -     -    0s
     0     0 10299.1833    0  125          - 10299.1833      -     -    0s
     0     0 10299.1828    0  109          - 10299.1828      -     -    0s
     0     0 10299.0296    0  125          - 10299.0296      -     -    0s
     0     0 10299.0195    0  112          - 10299.0195      -     -    0s
     0     0 10299.0195    0  115          - 10299.0195      -     -    0s
     0     0 10298.8700    0   84          - 10298.8700      -     -    0s
     0     2 10298.8700    0   84          - 10298.8700      -     -    0s
* 1064   307             157    10246.216735 10292.1669  0.45%  18.2    0s
H 1139   270                    10246.219957 10291.0140  0.44%  17.7    0s

Cutting planes:
  Cover: 3
  Implied bound: 18
  Clique: 1
  MIR: 26
  Flow cover: 33
  Flow path: 9
  GUB cover: 1
  Inf proof: 3
  RLT: 47

Explored 1392 nodes (24192 simplex iterations) in 0.84 seconds (0.77 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 10246.2 10246.2 

Optimal solution found (tolerance 1.00e-04)
Best objective 1.024621995655e+04, best bound 1.024621995655e+04, gap 0.0000%

Agenda

  1. Introduction
  2. Mixed-Integer Convex Quadratic Optimization
  3. Non-Convex Quadratic Optimization
  4. General Functions, piece-wise linear and beyond

Function constraints in Gurobi

Since Gurobi 9.0. Allow to state \(𝑦=𝑓(𝑥)\)

  • \(𝑓\) is a predefined function
  • \(𝑦\) and \(𝑥\) are one-dimensional variables

Library of predefined functions include:

  • \(e^x\), \(a^x\), \(\ln(x)\), \(\log_a(x)\), logistic
  • \(\sin(x)\), \(\cos(x)\), \(\tan(x)\),
  • monomials \(x^a\), polynomials of one variable \(a_0 + a_1 x^1 + a_2 x^2 + ...\)

Example:

m = gp.Model()
x = m.addVar()
y = m.addVar()
m.addGenConstrLog(x, y)

New treatment in Gurobi 11

  • Gurobi 9.0-10.0: nonlinear functions replaced during presolve by a piece-wise linear approximation.

  • Gurobi 11, can treat nonlinear functions directly:

    • Set FuncNonLinear=1
    • No other changes to users’ code
%%{ init: { 'flowchart': { 'curve': 'linear' } } }%%
flowchart TD
        IV[Integer variables <br> SOS Constraints]
        GF["General function <br> y = f(x)"]

        GF--"Gurobi ≤ 10"--- IV

        MILP[MILP]
            IV --- MILP

        MINLP[MINLP]
            GF --"Gurobi 11 <br> FuncNonLinear=1"---- MINLP

        BB(branch-and-cut)
         MILP --- BB

        SBB(spatial b&b)
            MINLP --- SBB


    class Model subg
    class Convex subg
    class Nonconvex subg
    class Problems subg
    class Algorithms subg

    classDef subg fill:#FFFFFF;
    classDef difficult stroke:#dd2113
    classDef easy stroke:#1675a9

Algorithmic approach

  • Similar to bilinear formulation/relaxation
  • For each function, compute lower/upper envelope
  • Spatial branching to refine them
  • Additional difficulties:
    • detect when functions are convex/concave
    • functions can be locally convex

\(y = \log(x), 0.8 \le x \le 5\)

Great powers and great responsabilities

  • Gurobi 11.0 handles select univariate nonlinear functions
  • But, those can be composed

E.g.: Consider for \(x\ge 0\): \[ f(x) = \sqrt{1 + x^2} + \ln({x + \sqrt{1 + x^2}}) \le 2 \] We can formulate it:

  • introduce auxiliary variables \(u, v, w, z \ge 0\)
  • add constraints \(u = 1 + x^2\), \(v = \sqrt{u}\), \(w = x + v\), \(z = \ln({w})\)
  • \(f(x) = v + z\)

Caution

Feasibility tolerances!

Decomposition leading to large infeasibility

\[ y = f(x) = \frac{x}{\sin (x)} \] a solution is \(x= 0.0001\), \(y = 1.0000000016666666\).

Now decompose \(f(x)\): \(u = sin(x)\), \(v = \frac{1}{u}\), \(y = x \cdot v\).

And consider:

  • \(\overline x = 0.0001\)
  • \(\overline u = 0.000098999999833333343\) (\(\sin (\overline x) - 10^-6\))
  • \(\overline v = 10101.010118015167\)
  • \(\overline y = 1.0101010118015167\)

😱

We have \(\overline y - f(\overline x) \approxeq 10^-2\)

Example

Logistic Regression