Written by: Paul Rubin

Primary Source: OR in an OB World. 9/12/2018.

I seem to bring up “big M” models a lot, so apologies if I end up repeating myself in places here. Not long ago, someone passed along highlights of a “big M” type model to me and asked if he could somehow reformulate to get rid of \(M\). I did not see any good way to do it, but I did offer up suggestions about choosing values for \(M\), and I thought that might make a decent blog post.

Just for clarity, what I’m referring to is an integer or mixed-integer program (I’ll stick to linear objective and constraints) in which binary variables are used to, in loose terms, turn constraints on or off. So a representative constraint might look like the following:

\(\displaystyle \sum_i a_i x_i \le b + M(1-y)\)

with the \(a_i\) coefficients, the \(x_i\) variables (discrete, continuous or a mix), \(y\) a binary variable and \(M\) a “large” coefficient. Think of \(M\) as a stunt double for infinity. The notion is that if \(y=1\), the constraint is “turned on” and reduces to

\(\displaystyle \sum_i a_i x_i \le b.\)

If \(y=0\), the constraint is “turned off” and reduces to

\(\displaystyle \sum_i a_i x_i \le b + M \approx \infty,\)

which should be satisfied by any choices for \(x\) the solver might make. There are other variations of “big M” constraints, but I’ll stick with this one for illustrative purposes.

### The perils of \(M\)

Choosing a value for \(M\) can be a tricky business. Suppose first that we choose \(M\) small enough that, when \(y=0\) and the constraint should be “off”,

\(\displaystyle \sum_i a_i x_i^* \gt b + M\)

for some solution \(x^*\) that *should* be optimal but now appears infeasible to the solver. The result is an incorrect solution. So let’s refer to a value for \(M\) as **correct** if it is large enough that no potentially optimal solution violates the constraint when \(y=0\). (“Correct” is my term for this. I don’t know if there is an “official” term.) Correctness is essential: choose an incorrect (too small) value for \(M\) and you risk getting an incorrect solution.

Since it is frequently not obvious how large \(M\) needs to be in order to be guaranteed correct, people tend to err on the side of caution by choosing really massive values for \(M\). That brings with it a different set of problems (ones often ignored in introductory text books). First, branch-and-bound (or branch-and-cut) solvers tend to rely on the continuous relaxation of subproblems (dropping integrality constraints but keeping the functional constraints). Large values of \(M\) make for weak relaxations (producing very loose bounds).

Suppose, for instance, that we are solving a model that both builds roads and routes traffic along those roads. \(y\) represents the decision to build a particular road (\(y=1\)) or not (\(y=0\)). If we build the road, a cost \(c\) is incurred (represented by the term \(cy\) in the objective function) and we can send unlimited traffic along it; if not, there is no cost but no traffic is committed. In our constraint, the left side is traffic on the road and \(b=0\), so traffic there can either be up to \(M\) (if built) or 0 (if not built). Now suppose, for example, that the user chooses \(M=10^9\) and the solver, in a continuous relaxation of some subproblem, sets \(y=10^{-6}\). The solution to the relaxed problem pays only one millionth of \(c\) for the privilege of allowing 1,000 units of traffic on this route, which basically allows it to “cheat” (and perhaps grossly underestimates the cost of any actual solution).

A related problem is limited arithmetic precision. Since double-precision floating-point arithmetic (the way the computer does arithmetic with real numbers) has limited precision, and is thus susceptible to rounding errors, solvers have to establish standards for “close enough to feasible” and “close enough to integer”. Continuing the previous example, it is entirely possible that the solver might look at \(y=10^{-6}\), decide that is within integrality tolerance, and treat it as \(y=0\), possibly leading to what it thinks is an “optimal” solution with 1,000 units of traffic running over a road that does not exist. Oops.

Finally, note that \(M\) is part of the coefficient matrix. Mixing very large coefficients (like \(M\)) with much smaller coefficients can create numerical instability, leading the solver to spend more time computing linear program pivots and possibly leading to totally erroneous solutions. That’s too big a topic to get into here, but I’m pretty sure I’ve mentioned it elsewhere.

Despite all this, “big M” models are still very common in practice. There is a nice paper by Codato and Fischetti [1] that shows how a form of Bender’s decomposition can be used to get rid of the \(M\) coefficients. I’ve used it successfully (for instance, in [2]), but Bender’s decomposition is an advanced technique (i.e., not for the faint of heart), and is not always supported by high-level modeling languages.

So, if we are stuck with “big M” models, what can we do to choose values of \(M\) that are both correct and and **tight** (again, my term), meaning small enough that they avoid numerical problems and hopefully produce relaxations with bounds strong enough to do some good?

### Not all \(M\) are created equal

Most “big M” models have more than one constraint (and frequently many) containing a large coefficient \(M\). One of my pet peeves is that authors of text books and journal articles will frequently just us \(M\), with no subscripts, everywhere such a coefficient is needed. This, in turn, breeds a tendency for modelers to choose one large value for \(M\) and use it everywhere.

Back when manuscripts were produced on typewriters, it was a bit of a pain in the ass to put in subscripts, so I can see how the trend would have started. (Quick tangent: Nate Brixius recently did a blog post with a picture of a typewriter, for those too young to be familiar with them. I’ll link it here for the benefit of younger readers … and also note that the one pictured is much more modern than the one I learned on, which was not electric.) Today, when people routinely use LaTeX to write manuscripts, there’s not much excuse for omitting one measly subscript here and there. Anyway, my point is that it is usually better to use a different, customized value of \(M\) for each constraint that needs one.

### Customized how?

In some cases, model context will provide you an obvious choice for \(M\). For example, suppose you are selecting warehouse locations and planning shipments from warehouses to customers. A typical “big M” constraint will look like the following (slightly different from our previous constraint but clearly related:

\(\displaystyle \sum_j x_{ij} \le M_i y_i.\)

Here variable \(x_{ij}\) indicates the amount shipped from warehouse \(i\) to customer \(j\), binary variable \(y_i\) is 1 if we use that warehouse and 0 if we do not, and the intent of the constraint is to say that if we do not use (and pay for) a warehouse, we cannot ship anything from it. One obvious choice for \(M_i\) is the capacity of the warehouse. A better choice might be the smaller of that capacity or the maximum volume of customers demands that might plausibly be assigned to that warehouse. The latter might be based, for example, on knowing that the company would not ship anything to customers outside a certain distance from the warehouse.

In other cases, there may be some less obvious way to extract suitable (correct and hopefully tight) values for the \(M_i\). A while back, I was working on mixed-integer programming models for two group discriminant analysis, and in one paper ([3]) I needed to pick values for \(M\). Without going into gory details, I was able to normalize the coefficients of the (linear) discriminant function under construction and then compute valid coefficients \(M_i\) by looking at the euclidean distances between pairs of observations. I don’t claim that the \(M_i\) I came up with were the tightest possible, but they were correct and they produced faster solution of the models than what I got with some arbitrarily large values I initially tried.

Finally, you can actually solve subproblems to get correct and (hopefully) tight \(M\) values. Whether this is feasible depends on how many you need and how large and scary your model is. Going back to my first example, the trick would work something like this. Relax all integrality constraints. Drop the constraint in question. If you have any other “big M” constraints for which you have not yet computed tight values of \(M\), pick something safely large for those coefficients, trying to avoid numerical problems but not worrying about tightness. Now switch the objective to maximizing \(\sum_i a_i x_i – b\). The optimal objective value is your value for \(M\). It is clearly correct: any feasible solution to the MIP model is a feasible solution to the LP relaxation, and so the value of \(\sum_i a_i x_i – b\) cannot exceed your choice of \(M\) (regardless of whether \(y\) is 0 or 1). Repeat for each additional constraint containing a “big M” coefficient (switching from minimizing to maximizing if the nature of the constraint warrants it).

The process may be time-consuming, but at least you are solving LPs rather than MIPs. It is also a bit trickier than I made it sound, at least when multiple \(M_i\) are involved. You have to guess values for any \(M_i\) for which you have not yet solved, and guessing big values for them may result in a looser relaxation than necessary, which in turn may result in an inflated value for the \(M_i\) you are currently choosing. You should definitely get correct choices for the \(M\) coefficients; it’s just the tightness that is in question. Even so, the values you get for the \(M_i\) just might be tight enough to save you more time in the MIP solution than it costs to solve the LPs.

### References

[1] Codato, G. and Fischetti, M. *Combinatorial Benders’ Cuts for Mixed-Integer Linear Programming*. Operations Research, **2006**, Vol. 54(4), pp. 756-766.

[2] Bai, L. and Rubin, P. A. *Combinatorial Benders Cuts for the Minimum Tollbooth Problem*. Operations Research, **2009**, Vol. 57(6), pp. 1510-1522.

[3] Rubin, P. A. *Heuristic Solution Procedures for a Mixed-Integer Programming Discriminant Model*. Managerial and Decision Economics, **1990**, Vol. 11(4), pp. 255-266.

#### Paul Rubin

#### Latest posts by Paul Rubin (see all)

- A Value-ordered Java Map - October 18, 2019
- Numerical Inaccuracy (Linear Algebra Edition) - October 9, 2019
- Using Java Collections with CPLEX - July 22, 2019