Student Enrollment and Non-Linear options

Gurobi Live Barcelona

Pierre Bonami - Gurobi

Student Enrollment and Non-Linear options

In this example, we use the student enrollment problem from Bergman et al. (2020) together with Gurobi Machine Learning to illustrate the use of nonlinear functions and the new FuncNonLinear parameter in Gurobi 11.

The Problem

Data

Data on student admissions to a college is used to predict the probability that a student enrolls in the college.

For each student we have SAT and GPA scores, the scholarship (merit) offered, and whether the student enrolled.

StudentID   SAT   GPA  merit  enroll
        1  1507  3.72   1.64       0
        2  1532  3.93   0.52       0
        3  1487  3.77   1.67       0
        4  1259  3.05   1.21       1
        5  1354  3.39   1.65       1
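The regression code later in this deck assumes this table is available as a pandas DataFrame named historical_data. A minimal sketch built from the five sample rows above (the real dataset contains many more students):

```python
import pandas as pd

# Minimal sketch of the historical admissions data, using only the five
# sample rows shown above; the actual dataset is much larger.
historical_data = pd.DataFrame(
    {
        "SAT": [1507, 1532, 1487, 1259, 1354],
        "GPA": [3.72, 3.93, 3.77, 3.05, 3.39],
        "merit": [1.64, 0.52, 1.67, 1.21, 1.65],
        "enroll": [0, 0, 0, 1, 1],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="StudentID"),
)
```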

The Problem

Regression

Based on this data a logistic regression is trained to predict the probability that a student joins the college.

# Run our regression
from sklearn.linear_model import LogisticRegression

features = ["merit", "SAT", "GPA"]
target = "enroll"

regression = LogisticRegression(random_state=1)
regression.fit(X=historical_data.loc[:, features], y=historical_data.loc[:, target])
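As a sanity check on what the model will later embed, the enrollment probability predicted by the fitted regression is column 1 of predict_proba. A self-contained sketch, refit here on the five sample rows only:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Refit on the five sample rows shown earlier (sketch only; the real
# regression is trained on the full historical dataset).
historical_data = pd.DataFrame(
    {
        "merit": [1.64, 0.52, 1.67, 1.21, 1.65],
        "SAT": [1507, 1532, 1487, 1259, 1354],
        "GPA": [3.72, 3.93, 3.77, 3.05, 3.39],
        "enroll": [0, 0, 0, 1, 1],
    }
)
features = ["merit", "SAT", "GPA"]
regression = LogisticRegression(random_state=1)
regression.fit(historical_data[features], historical_data["enroll"])

# Column 1 of predict_proba is P(enroll = 1), the quantity the
# optimization model embeds for each student.
probs = regression.predict_proba(historical_data[features])[:, 1]
```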

The Problem

The Admission Office has data for SAT and GPA scores of \(n\) admitted students of the incoming class.

  • The office wants to offer scholarships to these students with the goal of maximizing the expected number of students that enroll.
  • The maximal budget for all scholarships is \(0.2 n \, \text{K\$}\).
  • Each student can be offered a scholarship of up to \(2.5 \, \text{K\$}\).
StudentID   SAT   GPA
      756  1049  2.75
     2549  1258  2.99
     4895  1181  2.94
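The optimization code below assumes these admitted students are in a pandas DataFrame named studentsdata, indexed by StudentID. A sketch using the three sample rows (the real instance has \(n = 100\) students, so its budget is 20 K$):

```python
import pandas as pd

# Sketch of the admitted-students table (the real instance has n = 100).
studentsdata = pd.DataFrame(
    {"SAT": [1049, 1258, 1181], "GPA": [2.75, 2.99, 2.94]},
    index=pd.Index([756, 2549, 4895], name="StudentID"),
)

n = len(studentsdata)
budget = 0.2 * n  # total scholarship budget, in K$
```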

Optimization Model

\[ \begin{aligned} &\max \sum_{i=1}^n y_i \\ &\text{subject to:}\\ &\sum_{i=1}^n x_i \le 0.2 \cdot n,\\ &y_i = g(x_i, SAT_i, GPA_i) & & i = 1, \ldots, n,\\ & 0 \le x \le 2.5. \end{aligned} \]

  • \(x\) is the scholarship offered
  • \(y\) the probability that a student joins
  • \(g(x_i, SAT_i, GPA_i)\) the prediction from the logistic regression.

Optimization Model

Formulation with gurobipy

import gurobipy as gp
import gurobipy_pandas as gppd

m = gp.Model()

# The y variables: probability of enrollment of each student.
y = gppd.add_vars(m, studentsdata, name='enroll_probability')

# We add to studentsdata a column of variables to model the "merit" feature.
studentsdata = studentsdata.gppd.add_vars(m, lb=0.0, ub=2.5, name='merit')

# We denote by x this (variable) "merit" feature.
x = studentsdata.loc[:, "merit"]
                               merit   SAT   GPA
StudentID
756          <gurobi.Var merit[756]>  1049  2.75
2549        <gurobi.Var merit[2549]>  1258  2.99
4895        <gurobi.Var merit[4895]>  1181  2.94

Optimization Model

Formulation with gurobipy

Add the objective and the budget constraint:

m.setObjective(y.sum(), gp.GRB.MAXIMIZE)

# Budget constraint: 0.2 * n with n = 100 students.
m.addConstr(x.sum() <= 20)

Optimization Model

Formulation with gurobipy

Finally, insert the constraints from the regression using Gurobi Machine Learning:

from gurobi_ml import add_predictor_constr

pred_constr = add_predictor_constr(
    m, regression, studentsdata, y, output_type="probability_1"
)

pred_constr.print_stats()
Model for log_reg:
300 variables
100 constraints
100 general constraints
Input has shape (100, 3)
Output has shape (100, 1)

Optimization Model

Formulation of the logistic regression constraint

The function \(g\) is a composition of two functions, \(g(x) = f(\phi(x))\):

  • \(\phi : \mathbb R^3 \rightarrow \mathbb R\) is an affine function
  • \(f : \mathbb R \rightarrow \mathbb R\) is the logistic function

\(f(x) = \frac{1}{1+ e^{-x}}\)
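For reference, a direct evaluation of the logistic function in plain Python, independent of Gurobi:

```python
import math

def logistic(z: float) -> float:
    # f(z) = 1 / (1 + exp(-z)): maps any real score to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# logistic(0) = 0.5, and the function is increasing with values in (0, 1).
assert logistic(0.0) == 0.5
assert 0.0 < logistic(-5.0) < 0.5 < logistic(5.0) < 1.0
```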

Formulated in Gurobi:

 log_reg.linreg[0,0]: - 12.45073402807585 merit[5507]
   + 0.0931484531801867 log_reg.SAT[5507]
   + 17.7647402941666 log_reg.GPA[5507] + log_reg.affine_trans[0,0]
   = 168.6220091433963

 log_reg.logistic[(0,0)]:
   enroll_probability[5507] = LOGISTIC ( log_reg.affine_trans[0,0] )

Optimize

By default, Gurobi 11 uses a piecewise-linear (PWL) approximation to formulate the logistic constraints.
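To illustrate what such an approximation looks like, here is a coarse piecewise-linear interpolation of the logistic function with fixed breakpoints. This is a sketch only: Gurobi chooses its breakpoints internally, controlled by parameters such as FuncPieces and FuncPieceError.

```python
import math

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# 25 equally spaced breakpoints on [-6, 6] (step 0.5).
breakpoints = [-6.0 + 0.5 * k for k in range(25)]
values = [logistic(b) for b in breakpoints]

def pwl_logistic(z: float) -> float:
    # Clamp to the breakpoint range, then interpolate linearly between
    # the two surrounding breakpoints.
    z = min(max(z, breakpoints[0]), breakpoints[-1])
    for i in range(len(breakpoints) - 1):
        if z <= breakpoints[i + 1]:
            t = (z - breakpoints[i]) / (breakpoints[i + 1] - breakpoints[i])
            return values[i] + t * (values[i + 1] - values[i])
    return values[-1]

# The approximation error is small but nonzero: this is the source of
# the constraint-violation warnings in the solver log below.
max_err = max(
    abs(pwl_logistic(x / 100) - logistic(x / 100)) for x in range(-600, 601)
)
```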

Gurobi Optimizer version 11.0.0 build v11.0.0beta2 (mac64[x86] - macOS 13.6 22G120)

CPU model: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 101 rows, 500 columns and 500 nonzeros
Model fingerprint: 0x2619fec2
Model has 100 general constraints
Variable types: 500 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [9e-02, 2e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [2e+00, 2e+03]
  RHS range        [2e+01, 2e+02]
Presolve added 82 rows and 826 columns
Presolve time: 0.02s
Presolved: 183 rows, 1326 columns, 3634 nonzeros
Presolved model has 56 SOS constraint(s)
Variable types: 1320 continuous, 6 integer (6 binary)

Root relaxation: objective 5.754080e+01, 225 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   57.54080    0    1          -   57.54080      -     -    0s
H    0     0                      57.5131676   57.54080  0.05%     -    0s
     0     0   57.54080    0    1   57.51317   57.54080  0.05%     -    0s
     0     0   57.54080    0    1   57.51317   57.54080  0.05%     -    0s
     0     0   57.53540    0    1   57.51317   57.53540  0.04%     -    0s
     0     2   57.53540    0    1   57.51317   57.53540  0.04%     -    0s
H    3     2                      57.5253270   57.52729  0.00%   6.0    0s

Explored 5 nodes (433 simplex iterations) in 0.10 seconds (0.03 work units)
Thread count was 8 (of 8 available processors)

Solution count 2: 57.5253 57.5132 

Optimal solution found (tolerance 1.00e-04)
Warning: max constraint violation (3.0434e-04) exceeds tolerance
Warning: max general constraint violation (3.0434e-04) exceeds tolerance
Best objective 5.752532704673e+01, best bound 5.752668626285e+01, gap 0.0024%

Optimize

Remarks

In the log, we see warnings about a significant constraint violation.

We can verify this error using pred_constr.get_error:

sorted(pred_constr.get_error())[-5:]
[array([2.62034763e-08]),
 array([1.86171217e-07]),
 array([3.49585945e-07]),
 array([6.46901614e-07]),
 array([0.00030434])]

Using FuncNonLinear

With Gurobi 11, we can set the FuncNonLinear parameter so that the logistic function is handled as truly nonlinear:

m.Params.FuncNonLinear = 1
m.optimize()
Set parameter FuncNonlinear to value 1
Gurobi Optimizer version 11.0.0 build v11.0.0beta2 (mac64[x86] - macOS 13.6 22G120)

CPU model: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 101 rows, 500 columns and 500 nonzeros
Model fingerprint: 0x2619fec2
Model has 100 general constraints
Variable types: 500 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [9e-02, 2e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [2e+00, 2e+03]
  RHS range        [2e+01, 2e+02]
Presolve removed 100 rows and 348 columns
Presolve time: 0.00s
Presolved: 498 rows, 153 columns, 1124 nonzeros
Presolved model has 76 nonlinear constraint(s)
Variable types: 153 continuous, 0 integer (0 binary)
Found heuristic solution: objective 55.8610394

Root relaxation: objective 5.771753e+01, 71 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   57.71753    0   31   55.86104   57.71753  3.32%     -    0s
H    0     0                      56.7856869   57.71753  1.64%     -    0s
     0     0   57.59774    0   34   56.78569   57.59774  1.43%     -    0s
     0     0   57.59774    0   36   56.78569   57.59774  1.43%     -    0s
H    0     0                      57.3250269   57.59774  0.48%     -    0s
     0     0   57.56029    0   37   57.32503   57.56029  0.41%     -    0s
     0     0   57.56029    0   38   57.32503   57.56029  0.41%     -    0s
     0     0   57.54769    0   38   57.32503   57.54769  0.39%     -    0s
     0     0   57.54504    0   38   57.32503   57.54504  0.38%     -    0s
     0     0   57.54504    0   38   57.32503   57.54504  0.38%     -    0s
     0     0   57.54476    0   38   57.32503   57.54476  0.38%     -    0s
H    0     0                      57.5020506   57.54476  0.07%     -    0s
     0     2   57.54476    0   38   57.50205   57.54476  0.07%     -    0s
H   30     7                      57.5288674   57.54119  0.02%  51.9    0s
H   61    16                      57.5325331   57.54056  0.01%  33.9    0s
H  109    80                      57.5340702   57.54056  0.01%  21.4    0s
H  199   156                      57.5341767   57.54056  0.01%  17.5    0s
H  434   232                      57.5343002   57.53945  0.01%  12.6    0s

Explored 573 nodes (7324 simplex iterations) in 0.14 seconds (0.10 work units)
Thread count was 8 (of 8 available processors)

Solution count 9: 57.5343 57.5342 57.5341 ... 55.861

Optimal solution found (tolerance 1.00e-04)
Best objective 5.753430016394e+01, best bound 5.753945147294e+01, gap 0.0090%

Using FuncNonLinear

Remarks

The solution time is slightly longer, but there are no longer any warnings about violations.

Here are the 5 largest approximation errors:

[array([1.86171217e-07]),
 array([1.89671129e-07]),
 array([2.08461844e-07]),
 array([3.49585945e-07]),
 array([6.46901614e-07])]

Refined PWL Approximation

We could instead try to make the PWL approximation more precise. Would that work?

Using the parameter FuncPieceError, we can require a tighter PWL approximation than before:

# Switch back to the PWL approximation and require a tighter error bound.
m.params.FuncNonLinear = 0
m.params.FuncPieces = -1
m.params.FuncPieceError = 1e-6
m.optimize()
Gurobi Optimizer version 11.0.0 build v11.0.0beta2 (mac64[x86] - macOS 13.6 22G120)

CPU model: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads

Optimize a model with 101 rows, 500 columns and 500 nonzeros
Model fingerprint: 0x2619fec2
Model has 100 general constraints
Variable types: 500 continuous, 0 integer (0 binary)
Coefficient statistics:
  Matrix range     [9e-02, 2e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [2e+00, 2e+03]
  RHS range        [2e+01, 2e+02]
Presolve removed 137 rows and 347 columns (presolve time = 16s) ...
Presolve removed 137 rows and 347 columns (presolve time = 49s) ...
Presolve removed 137 rows and 347 columns (presolve time = 55s) ...
Presolve removed 137 rows and 347 columns (presolve time = 63s) ...
Presolve removed 137 rows and 347 columns (presolve time = 87s) ...
Presolve removed 137 rows and 347 columns (presolve time = 105s) ...
Presolve removed 137 rows and 347 columns (presolve time = 115s) ...
Presolve removed 137 rows and 347 columns (presolve time = 135s) ...
Presolve removed 137 rows and 347 columns (presolve time = 178s) ...
Presolve removed 137 rows and 347 columns (presolve time = 212s) ...
Presolve removed 137 rows and 347 columns (presolve time = 279s) ...
Presolve removed 137 rows and 347 columns (presolve time = 288s) ...
Presolve removed 137 rows and 347 columns (presolve time = 305s) ...
Presolve removed 137 rows and 347 columns (presolve time = 322s) ...
Presolve removed 137 rows and 354 columns (presolve time = 325s) ...
Presolve removed 141 rows and 358 columns (presolve time = 339s) ...
Presolve removed 141 rows and 358 columns (presolve time = 370s) ...
Presolve removed 141 rows and 358 columns (presolve time = 376s) ...
Presolve removed 141 rows and 358 columns (presolve time = 384s) ...
Presolve removed 141 rows and 358 columns (presolve time = 408s) ...
Presolve removed 141 rows and 358 columns (presolve time = 426s) ...
Presolve removed 141 rows and 358 columns (presolve time = 436s) ...
Presolve removed 141 rows and 358 columns (presolve time = 455s) ...
Presolve removed 141 rows and 358 columns (presolve time = 499s) ...
Presolve removed 141 rows and 358 columns (presolve time = 532s) ...
Presolve removed 141 rows and 358 columns (presolve time = 608s) ...
Presolve removed 141 rows and 358 columns (presolve time = 618s) ...
Presolve removed 141 rows and 358 columns (presolve time = 637s) ...
Presolve removed 141 rows and 358 columns (presolve time = 655s) ...
Presolve added 121 rows and 1515593 columns
Presolve time: 658.43s
Presolved: 222 rows, 1516093 columns, 4547837 nonzeros
Presolved model has 71 SOS constraint(s)
Variable types: 1516089 continuous, 4 integer (4 binary)

Root simplex log...

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0    7.8060843e+01   2.519218e+02   0.000000e+00    659s
     118    7.8060832e+01   1.421780e+02   0.000000e+00    685s
    1348    5.7839536e+01   2.120418e+01   0.000000e+00    690s
    1572    5.7543818e+01   0.000000e+00   0.000000e+00    690s

Root relaxation: objective 5.754382e+01, 1572 iterations, 31.29 seconds (120.86 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   57.54382    0    1          -   57.54382      -     -  690s
     0     0   57.54382    0    1          -   57.54382      -     -  691s
     0     2   57.54382    0    1          -   57.54382      -     -  692s
     3     8   57.53739    2    1          -   57.53963      -   405  695s
    15    20   57.53579    4    1          -   57.53579      -   571  702s
*   19    22               4      57.5357879   57.53579  0.00%   632  704s

Explored 23 nodes (15547 simplex iterations) in 704.79 seconds (162.15 work units)
Thread count was 8 (of 8 available processors)

Solution count 1: 57.5358 

Optimal solution found (tolerance 1.00e-04)
Best objective 5.753578785624e+01, best bound 5.753578785624e+01, gap 0.0000%

Refined PWL Approximation

Remarks

  • The MILP generated for the PWL approximation is considerably larger (over 1.5 million columns after presolve)
  • The running time (over 700 seconds, versus a fraction of a second before) is completely prohibitive

We can check the largest errors:

[array([2.62034763e-08]),
 array([1.86171217e-07]),
 array([3.49585945e-07]),
 array([6.46901614e-07]),
 array([7.9213374e-07])]

Thank You!

To know more:

Copyright © 2023 Gurobi Optimization, LLC