Gurobi Performance on OET Benchmarks

Matthias Miltenberger

Manager of Optimization Support

Agenda

What are we going to cover?


What is the Gurobi Parameter Tuner?

How does the Tuner work?

How much performance can we gain?

How to evaluate benchmarks?

OET LP results

OET MIP results




Gurobi Parameter Tuning Tool

A quick overview

  • Part of the standard Gurobi distribution (often referred to as the “Tuner”)
  • Invoke it from the command line via grbtune or call it from the API (see the sketch below)
  • Tries to find the best parameters within the given time limit
  • Distributable over multiple machines to speed up the tuning process
  • Heavily used by Gurobi Experts to tune customer models
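
As a rough sketch of the API route (assuming gurobipy; the model file name is only a placeholder), the Tuner can be driven like this:

import gurobipy as gp

# Read a model and limit the total tuning time (in seconds)
m = gp.read("model.mps")  # placeholder file name
m.setParam("TuneTimeLimit", 3600)

# Run the Tuner; it keeps the improving parameter sets it found
m.tune()

if m.tuneResultCount > 0:
    # Load the best parameter set into the model, save it, and re-solve with it
    m.getTuneResult(0)
    m.write("tuned.prm")
    m.optimize()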

How the Tuner works

General concepts

  • Start with a baseline run to compare against
  • Tests parameters in a predefined order of priority
  • Quickly discards non-improving parameter sets
  • Primarily tries to reduce time to optimality with minimal parameter changes
  • Different tuning goals or metrics for MIPs that don’t solve to optimality:
    • TuneCriterion=0 – ignore secondary criterion
    • TuneCriterion=1 – optimality gap as secondary criterion (default)
    • TuneCriterion=2 – objective of the best feasible solution found
    • TuneCriterion=3 – best objective bound (dual bound)
  • TuneBaseSettings to pass a set of initial parameter settings to try first
  • TuneTrials to set the number of random seeds tried for each model (both can be set from the API, as sketched below)
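
A minimal sketch of configuring these tuning parameters through the Python API (file names are placeholders):

import gurobipy as gp

m = gp.read("model.mps")  # placeholder file name
m.setParam("TuneCriterion", 1)              # use the optimality gap as the secondary criterion
m.setParam("TuneTrials", 3)                 # try 3 random seeds per parameter set
m.setParam("TuneBaseSettings", "base.prm")  # placeholder .prm file to try first
m.setParam("TuneTimeLimit", 600)            # stop tuning after 600 seconds
m.tune()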

Tuning Demo

6 MIP instances, 3 tune trials, time limit: 600s, tuning time limit: 50 000s

Important things to consider

Some words of caution

  • Tuning many models at the same time takes more time
  • Tuning too many diverse models may not lead to good results
  • Tuning few models can result in over-tuning → use more tune trials (seeds)
  • Balance is important: the default settings have been tested on thousands of models

Performance Variability

The bane of tuning NP-hard problems
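
One simple way to expose this variability is to solve the same model with several random seeds and compare the runtimes; a sketch (the file name is a placeholder):

import gurobipy as gp

runtimes = []
for seed in range(5):
    m = gp.read("model.mps")   # placeholder file name
    m.setParam("Seed", seed)   # perturbs otherwise arbitrary tie-breaking decisions
    m.setParam("OutputFlag", 0)
    m.optimize()
    runtimes.append(m.Runtime)

print("Runtimes over 5 seeds:", runtimes)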

Evaluating and analyzing Gurobi runs

Using the open-source gurobi-logtools

  • gurobi-logtools is a package to
    • parse Gurobi logs into pandas DataFrames or Excel sheets
    • aggregate results over multiple runs, models, or parameters
    • visualize everything in an interactive way
    • complement the tuner by allowing in-depth analytics
import gurobi_logtools as glt

# Parse all Gurobi log files in the current directory
results = glt.parse(["*.log"])

# Aggregate one row per run and show an interactive box plot
summary = results.summary()
glt.plot(summary, type="box")

Gurobi on the OET LP instances

Tuning the sample set (medium run times)

Mean runtime is reduced from 42.29s to 22.73s
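
Comparisons like this can be reproduced from the solver logs with gurobi-logtools; a sketch, assuming separate log directories for the default and tuned runs and that the summary DataFrame exposes a Runtime column:

import gurobi_logtools as glt

# Glob patterns are placeholders for the baseline and tuned log files
baseline = glt.parse(["default/*.log"]).summary()
tuned = glt.parse(["tuned/*.log"]).summary()

print("default mean runtime:", baseline["Runtime"].mean())
print("tuned mean runtime:  ", tuned["Runtime"].mean())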

Gurobi on the OET LP instances

Apply tuned parameters to entire benchmark


Mean runtime is only reduced from 510.49s to 447.24s (including timeouts)

  • only slight improvement on the whole test set (note the log-scale x-axis)
  • models are too diverse to succeed with a single parameter set

This is actually a sign of a good benchmark!

Notes on the LP benchmark

  • All 48 models taken directly from the website
  • Some are in LP format and some in MPS format (LP format is less accurate than MPS)
    • May lead to inconsistent results compared to generating models directly
  • Reading times can be as high as 900 seconds for GenX-elec models
    • May distort performance results when included in the total run time
    • Likely caused by the many warnings triggered by unclean data
  • Warnings about large matrix coefficient ranges (\(10^{-7}\) to \(10^6\), 7 models):
    • TIMES-GEO_E4SMA_Base_scenariocplex
    • TIMES-GEO_E4SMA_NetZero_scenariocplex
    • temoa-US_9R_TS models
  • Warnings about large bounds (up to \(10^9\), all 13 pypsa-eur models)
  • Numerical issues can make performance comparisons difficult (a quick check is sketched after this list)
    • How should slight violations in the reported solution be handled?
    • Safer numerical parameters may lead to worse performance
    • The kea and tui models exhibit numerical issues with some parameter settings
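
A hedged sketch of inspecting such violations for a single model (assuming gurobipy; the file name is a placeholder, and NumericFocus trades some speed for numerical care):

import gurobipy as gp

m = gp.read("model.mps")       # placeholder file name
m.setParam("NumericFocus", 2)  # 0 (automatic, default) up to 3 (most careful)
m.optimize()

# Print the maximum bound, constraint, and integrality violations of the solution
m.printQuality()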

Gurobi on the OET MIP instances

Tuning a set of randomly selected easy instances

Tuning is not very effective and does not result in a clearly improving parameter set

Mean runtime is reduced from 4.68s to 3.24s, but there is no consistent speedup.

Gurobi on the OET MIP instances

Tuning a specific benchmark subset (PowerModel)

Tuning is very effective and results in clearly improving parameter sets

Mean runtime is reduced from 53.81s to 7.3s, and all models are solved faster.

Notes on MIP benchmark

  • 70 instances, some in LP and some in MPS format
  • Reading times can be as high as 90 seconds for the only GenX-elec model
    • Likely caused by the many warnings triggered by unclean data
  • Some GenX models show violations in the final solutions (on the order of \(10^{-4}\))
  • Some Sienna and Tulipa models show solution violations on the order of \(10^{-6}\)
  • Warnings about large bounds (up to \(10^9\), all 3 pypsa-eur models)

Takeaways

  1. Parameter tuning can expose hidden performance potential in all solvers
    • More effective than using more powerful hardware
  2. Tuning can be time-consuming and works best on distributed machines
  3. There are quite a few pitfalls:
    • Watch out for numerical issues and inconsistent results
    • Avoid over-tuning by regularly re-evaluating your settings and using multiple seeds
    • There are likely no perfect fixed settings for diverse sets of models
    • The default values are often chosen automatically based on the specific model
  4. When analyzing aggregated results, do not neglect single-model behavior
  5. Improving your model formulation can be even more effective

Ask the Gurobi Experts for help when stuck or unsure how to start!