Applying proven methodology to model engineering
Five principles from Dave Farley:
But we are different, right?!
What is the smallest model that we can implement and solve?
Document the most important decision variable, constraint and objective
Tip
The model is the blueprint for your algorithm. Carefully checking this before writing code, is an extra check upfront. Documenting the model in mathematical form also simplifies extensions.
Make sure data is realistic in size and structure
We will use data from the State of Georgia (USA), for 99 power plants and 10.8 million people. Data is aggregated into 10 groups of plants.
Tip
The data is essential for proper testing. Mathematical optimization solvers may behave very differently on randomly generated data. Also performance does not scale linearly with the size of your model.
One function or class, turning a problem into a solution
For any optimization project, I start with:
problem = read_problem('path')
parameters = Parameters()
solver = Solver(parameters)
solution = solver.solve(problem)
Note
The mathematical model will be completely encapsulated in the solver
, invisible to the outside world. In fact, the solver may use multiple models, or any other search strategy.
import pandas as pd
from os import path
def read_problem(folder: str) -> Problem:
df_load_curves = pd.read_csv(path.join(folder, 'demand.csv'))
df_subset = df_load_curves[(df_load_curves['YEAR']==2011)&(df_load_curves['MONTH']==7)&(df_load_curves['DAY']==1)]
demand = df_subset.set_index(['HOUR']).LOAD.to_dict()
df_plant_info = pd.read_csv(path.join(folder, 'plant_capacities.csv'))
P = set(df_plant_info['Plant'].unique())
plant_type = df_plant_info.set_index('Plant').PlantType.to_dict()
fuel_type = df_plant_info.set_index('Plant').FuelType.to_dict()
capacity = df_plant_info.set_index('Plant').Capacity.to_dict()
plants = set(
Plant(p, plant_type[p], fuel_type[p], capacity[p])
for p in P
)
return Problem(demand, plants)
import gurobipy as gp
class Solver:
def __init__(self, parameters: Parameters):
self.parameters = parameters
def solve(self, problem: Problem) -> Solution:
with gp.Model("powergeneration") as model:
P = problem.plants
H = set(problem.demand.keys())
z = model.addVars(P, H, name="z", lb=0)
model.addConstrs((z.sum('*',h) == problem.demand[h] for h in H),
name='demand')
model.addConstrs((z[p,h] <= p.capacity for p in P for h in H),
name='capacity')
model.optimize()
power = { (p,h): z[p,h].X for (p,h) in z }
return Solution(problem, power)
Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value X
WLS license X - registered to X
Gurobi Optimizer version 12.0.0 build v12.0.0rc1 (mac64[arm] - Darwin 23.6.0 23G93)
CPU model: Apple M1
Thread count: 8 physical cores, 8 logical processors, using up to 8 threads
WLS license X - registered to X
Optimize a model with 264 rows, 240 columns and 480 nonzeros
Model fingerprint: 0xd1470513
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [0e+00, 0e+00]
Bounds range [0e+00, 0e+00]
RHS range [8e+01, 1e+04]
Presolve removed 264 rows and 240 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Iteration Objective Primal Inf. Dual Inf. Time
0 0.0000000e+00 0.000000e+00 0.000000e+00 0s
Solved in 0 iterations and 0.00 seconds (0.00 work units)
Optimal objective 0.000000000e+00
Focus on the choices made by your model, not just the consequences
Pandas is familiar for most Python developers (especially data scientists) and great for analyzing optimization solutions.
from dataclasses import asdict
class Solution:
def get_dataframe(self) -> pd.DataFrame:
plants = pd.DataFrame.from_records(
[asdict(plant) for plant in self.problem.plants]
).rename(columns={'id':'plant'})
hours = pd.DataFrame({'hour': self.problem.demand.keys()})
df_problem = plants.merge(hours, how='cross')
df_problem = df_problem.set_index(['plant', 'hour'])
df_solution = pd.DataFrame({'power': self.power})
df_solution.index.names = ['plant', 'hour']
df_solution = df_solution.rename(index=
lambda pair: pair.id, level=0)
result = df_problem.join(df_solution)
return result
type | fuel_type | capacity | power | |
---|---|---|---|---|
(‘BIOMASS’, 22) | BIOMASS | BLQ | 470.96 | 470.96 |
(‘BIOMASS’, 23) | BIOMASS | BLQ | 470.96 | 470.96 |
(‘BIOMASS’, 24) | BIOMASS | BLQ | 470.96 | 0 |
(‘Bowen’, 1) | COAL | BIT | 2622.37 | 2622.37 |
(‘Bowen’, 2) | COAL | BIT | 2622.37 | 0 |
(‘Bowen’, 3) | COAL | BIT | 2622.37 | 0 |
(‘Bowen’, 4) | COAL | BIT | 2622.37 | 1494.82 |
Tip
With the data above, we can easily sum
by hour to check power against capacity. We can also do a row-wise check to compare power against capacity.
Export the full mathematical model from Gurobi:
Examine the output in a text editor:
Minimize
Subject To
demand[1]: z[Edwin_I_Hatch,1] + z[OIL,1] + z[Jack_McDonough,1]
+ z[BIOMASS,1] + z[Bowen,1] + z[HYDRO,1] + z[GAS,1]
+ z[OTHER_COAL,1] + z[Scherer,1] + z[Vogtle,1] = 11212.28914
demand[2]: z[Edwin_I_Hatch,2] + z[OIL,2] + z[Jack_McDonough,2]
+ z[BIOMASS,2] + z[Bowen,2] + z[HYDRO,2] + z[GAS,2]
+ z[OTHER_COAL,2] + z[Scherer,2] + z[Vogtle,2] = 10144.43032
...
capacity[Edwin_I_Hatch,1]: z[Edwin_I_Hatch,1] <= 1626.41228
capacity[Edwin_I_Hatch,2]: z[Edwin_I_Hatch,2] <= 1626.41228
...
Extend the problem all the way through to the solution
Lets add fuel cost and health cost to the problem and minimize these as an objective.
# Fuel cost per fuel type
df_fuel_costs = pd.read_csv(path.join(folder, 'fuel_costs.csv'))
dict_fuel_costs = df_fuel_costs[df_fuel_costs['year']==2011].T.to_dict()[9]
# Health cost for each plant/hour
H = demand.keys()
df_health_costs = pd.read_csv(path.join(folder, 'health_costs.csv'))
dict_health_costs = df_health_costs[(df_health_costs['Year']==2007)&(df_health_costs['Day']==1)].set_index(['Plant','Hour']).to_dict()['Cost'] # Health cost/MWh (plant)
dict_health_costs.update({(p,h): 0 for p in P for h in H if p not in ['Bowen','Jack McDonough','Scherer']})
plants = set(
Plant(
(...)
dict_fuel_costs[fuel_type[p]],
{ h: dict_health_costs[p, h] for h in H }
) for p in P
)
with gp.Model("powergeneration") as model:
P = problem.plants
H = set(problem.demand.keys())
PH = [(p,h) for p in P for h in H]
z = model.addVars(P, H, name="z", lb=0)
model.addConstrs((z.sum('*',h) == problem.demand[h] for h in H),
name='demand')
model.addConstrs((z[p,h] <= p.capacity for (p,h) in PH),
name='capacity')
objective = gp.quicksum(p.fuel_cost*z[p,h] for (p,h) in PH)
objective += gp.quicksum(p.health_cost[h]*z[p,h] for (p,h) in PH)
model.setObjective(objective, sense=GRB.MINIMIZE)
model.optimize()
def get_dataframe(self) -> pd.DataFrame:
plants = pd.DataFrame.from_records(
[asdict(plant) for plant in self.problem.plants]).rename(columns={'id':'plant'})
hours = pd.DataFrame({'hour': self.problem.demand.keys()})
df_problem = plants.merge(hours, how='cross')
df_problem = df_problem.set_index(['plant', 'hour'])
df_problem['health_cost'] = df_problem.apply(
lambda row: row.health_cost[row.name[1]], axis=1)
df_solution = pd.DataFrame({'power': self.power})
df_solution.index.names = ['plant', 'hour']
df_solution = df_solution.rename(index=
lambda pair: pair.id, level=0)
result = df_problem.join(df_solution)
result['total_fuel_cost'] = result['fuel_cost'] * result['power']
result['total_health_cost'] = result['health_cost'] * result['power']
result['total_cost'] = result['total_fuel_cost'] + result['total_health_cost']
return result
Our updated DataFrame now contains cost components, both per unit of power generated and total. This will be extended later to include the other cost factors.
fuel_cost | health_cost | power | total_fuel_cost | total_health_cost | total_cost | |
---|---|---|---|---|---|---|
(‘Scherer’, 6) | 18.39 | 3.56861 | 2860.11 | 52597.4 | 10206.6 | 62804 |
(‘Scherer’, 7) | 18.39 | 2.96348 | 2860.11 | 52597.4 | 8475.88 | 61073.3 |
(‘Scherer’, 8) | 18.39 | 2.29133 | 2779.39 | 51112.9 | 6368.49 | 57481.4 |
(‘Scherer’, 9) | 18.39 | 2.02691 | 2807.62 | 51632.2 | 5690.79 | 57322.9 |
(‘Scherer’, 10) | 18.39 | 1.5784 | 2860.11 | 52597.4 | 4514.39 | 57111.8 |
(‘Scherer’, 11) | 18.39 | 1.60359 | 2860.11 | 52597.4 | 4586.45 | 57183.9 |
import plotly.express as px
df_melted = df.reset_index().melt(
id_vars=['plant', 'hour'],
value_vars=['total_fuel_cost', 'total_health_cost'],
var_name='component'
)
fig = px.bar(
df_melted.groupby(['plant', 'component']).sum().reset_index(),
x='plant', y='value', color='component',
title='Cost per plant', width=1000, height=400)
Go from one working version to the next
# Commented out because Q is undefined, need to investigate
# d = model.addVars(P, Q, name="d", lb=0)
# TODO: Read demand from input files
model.addConstrs((z.sum('*',h) == demand[h] for h in H))
# --> I think this line doesn't work properly, help!
model.addConstrs((v[p,h] <= u[p,h]) for p in P for h in H)
model.optimize()
Note
The code we showed earlier is minimal, but works just fine. It is a solid starting point for future development of additional model aspects.
Suppose we want to add a minimum amount of power generated when a plant is on. We also want to consider fixed operating cost per plant.
@dataclass(frozen=True, repr=False)
class Plant:
id: str
type: str
fuel_type: str
capacity: float
fuel_cost: float
health_cost: dict[int, float] = field(hash=False)
operating_cost: float
def is_nuclear(self):
return self.type == 'NUCLEAR'
def get_min_generation_fraction(self):
return 0.8 if self.is_nuclear() else 0.01
Construct tiny problems manually that challenge individual constraints.
import unittest
from ucp import Parameters, Plant, Problem, Solver
class TestCapacityConstraints(unittest.TestCase):
def test_min_power(self):
demand = { 0: 20, 1: 40 }
health_cost = { 0: 0, 1: 0 }
plants = [
Plant('Free', 'COAL', '', 100, 3, health_cost, 0),
Plant('Limited', 'NUCLEAR', '', 50, 1, health_cost, 0)
]
problem = Problem(demand, plants)
parameters = Parameters()
solver = Solver(parameters)
solution = solver.solve(problem)
self.assertEqual(20, solution.power[plants[0], 0])
self.assertEqual(40, solution.power[plants[1], 1])
if __name__ == '__main__':
unittest.main()
z = model.addVars(P, H, name="z", lb=0)
u = model.addVars(P, H, name="u", vtype=GRB.BINARY)
model.addConstrs((z.sum('*', h) == problem.demand[h]) for h in H,
name='demand')
model.addConstrs((z[p,h] <= p.capacity*u[p,h] for (p,h) in PH),
name='max_capacity')
model.addConstrs((z[p,h] >=
p.get_min_generation_fraction()*p.capacity*u[p,h] for (p,h) in PH),
name='min_capacity')
objective = gp.quicksum(p.fuel_cost*z[p,h] for (p,h) in PH)
objective += gp.quicksum(p.health_cost[h]*z[p,h] for (p,h) in PH)
objective += gp.quicksum(p.operating_cost*u[p,h] for (p,h) in PH)
model.setObjective(objective, sense=GRB.MINIMIZE)
model.optimize()
power = { (p,h): z[p,h].X for (p,h) in z }
active = { (p,h): u[p,h].X>.5 for (p,h) in u }
return Solution(problem, power, active)
def get_dataframe(self) -> pd.DataFrame:
(...)
df_solution = pd.DataFrame({
'power': self.power,
'active':self.active
})
(...)
result = df_problem.join(df_solution)
result['total_fuel_cost'] = result['fuel_cost'] * result['power']
result['total_health_cost'] =
result['health_cost'] * result['power']
result['total_operating_cost'] =
result['operating_cost'] * result['active']
result['total_cost'] = result['total_fuel_cost']
+ result['total_health_cost'] + result['total_operating_cost']
return result
Note how the amount of power changes quite rapidly in hours 1-3 and 13-15 for plants of type OTHER COAL
:
hour | operating_cost | active | power | total_operating_cost | capacity |
---|---|---|---|---|---|
1 | 7.3845 | True | 2060.91 | 7.3845 | 4396.83 |
2 | 7.3845 | True | 993.053 | 7.3845 | 4396.83 |
3 | 7.3845 | False | 0 | 0 | 4396.83 |
4 | 7.3845 | False | 0 | 0 | 4396.83 |
12 | 7.3845 | False | 0 | 0 | 4396.83 |
13 | 7.3845 | False | 0 | 0 | 4396.83 |
14 | 7.3845 | True | 991.383 | 7.3845 | 4396.83 |
15 | 7.3845 | True | 2180.32 | 7.3845 | 4396.83 |
16 | 7.3845 | True | 2638.4 | 7.3845 | 4396.83 |
with gp.Model("powergeneration") as model:
(...)
model.addConstrs((z[p,h]-z[p,h-1] >= -p.get_ramp_factor()*p.capacity)
for (p,h) in PH if h>1)
model.addConstrs((z[p,h]-z[p,h-1] <= p.get_ramp_factor()*p.capacity)
for (p,h) in PH if h>1)
model.addConstrs((z[p,h] >=
p.get_min_generation_fraction()*p.capacity)
for p in P if p.is_nuclear() for h in H)
Note how the OTHER COAL
ramp up/down is now correct (steps are never more than 25% of capacity), but we shutdown the GAS
plants for only two hours:
hour | GAS | OTHER COAL |
---|---|---|
1 | 1761.31 | 2060.91 |
2 | 1761.31 | 993.053 |
3 | 1590.96 | 0 |
4 | 902.57 | 0 |
5 | 598.1 | 0 |
6 | 453.394 | 0 |
7 | 86.3351 | 0 |
8 | 0 | 0 |
9 | 0 | 0 |
10 | 292.806 | 0 |
hour | GAS | OTHER COAL |
---|---|---|
11 | 543.982 | 0 |
12 | 786.906 | 0 |
13 | 1472.95 | 0 |
14 | 1671.57 | 1081.12 |
15 | 1761.31 | 2180.32 |
16 | 1761.31 | 2638.4 |
@dataclass(frozen=True, repr=False, order=True)
class Plant:
(...)
startup_cost: float
teardown_cost: float
# Startup/teardown cost per fuel type
df_startup_costs = pd.read_csv(path.join(folder, 'startup_costs.csv'))
dict_startup_costs = df_startup_costs[df_startup_costs['year']==2011].T.to_dict()[9]
plants = set(
Plant(
(...)
dict_startup_costs[fuel_type[p]],
dict_startup_costs[fuel_type[p]],
) for p in P
)
with gp.Model("powergeneration") as model:
(...)
v = model.addVars(P, H, name="v", vtype=GRB.BINARY) # start up
w = model.addVars(P, H, name="w", vtype=GRB.BINARY) # tear down
(...)
model.addConstrs((v[p,h] <= u[p,h]) for (p,h) in PH )
model.addConstrs((w[p,h] <= 1-u[p,h]) for (p,h) in PH )
model.addConstrs((v[p,h] - w[p,h] == u[p,h] - u[p,h-1])
for (p,h) in PH if h > 1)
(...)
objective += gp.quicksum(p.startup_cost*v[p,h] for (p,h) in PH)
objective += gp.quicksum(p.teardown_cost*w[p,h] for (p,h) in PH)
(...)
startup = { (p,h): v[p,h].X>.5 for (p,h) in v }
teardown = { (p,h): w[p,h].X>.5 for (p,h) in w }
return Solution(problem, power, active, startup, teardown)
@dataclass
class Solution:
(...)
startup: dict[tuple[Plant, int], bool]
teardown: dict[tuple[Plant, int], bool]
def get_dataframe(self) -> pd.DataFrame:
(...)
df_solution = pd.DataFrame({
...
'startup':self.startup,
'teardown':self.teardown,
})
(...)
result['total_startup_cost'] = result['startup_cost'] * result['startup']
result['total_teardown_cost'] = result['teardown_cost'] * result['teardown']
result['total_cost'] = result['total_fuel_cost']
+ result['total_health_cost'] + result['total_operating_cost']
+ result['total_startup_cost'] + result['total_teardown_cost']
return result
Streamlit is nice to add a little user interface on top of your algorithm
import streamlit as st
import pandas as pd
import plotly.express as px
from final import read_dataset, Parameters, Solver
problem = read_dataset('data/small')
parameters = Parameters()
solver = Solver(parameters)
solution = solver.solve(problem)
df = solution.get_dataframe()
st.set_page_config(layout="wide")
col_demand, col_cost = st.columns(2)
with col_demand:
st.header('Demand')
df_demand = pd.DataFrame({'demand': pd.Series(problem.demand) })
st.line_chart(df_demand, width=500)
with col_cost:
st.header('Cost summary')
cost_columns = [col for col in df.columns if col.startswith('total_')]
df_melted = df.reset_index().melt(
id_vars=['plant', 'hour'],
value_vars=cost_columns,
var_name='component'
)
fig = px.bar(
df_melted.groupby(['plant', 'component']).sum().reset_index(),
x='plant', y='value', color='component', width=500, height=350
)
st.plotly_chart(fig)
tab_schedule, tab_details = st.tabs(['Schedule', 'Details'])
with tab_schedule:
st.header('Schedule')
schedule = pd.pivot_table(df.reset_index(), index='hour', columns='plant', values='power', aggfunc='sum')
st.dataframe(schedule)
with tab_details:
st.header('Details')
st.write(df.reset_index().sort_values(['plant', 'hour']))from Pandas
“Premature optimization is the root of all evil”
Typical questions:
However:
©️ 2024 Gurobi Optimization