Greedy Methods Can Be Exact

Written by: Paul Rubin

Primary Source: OR in an OB World, 01/07/2020

We generally sort optimization algorithms (as opposed to models) into two or three categories, based on how certain we are that solutions will be either optimal or at least “good”. An answer by Michael Feldmeier to a question I posted on OR Stack Exchange neatly summarizes the categories:

  • exact methods eventually cough up provably optimal solutions;
  • approximate methods eventually cough up solutions with some (meaningful) guarantee regarding how far from optimal they might be; and
  • heuristics provide no worst-case guarantees (but generally are either easy to implement, fast to execute or both).

I should explain my use of “meaningful” (which is not part of Michael’s answer). A common way to estimate the “gap” between a solution and the optimum is to take \(|z - \tilde{z}|/|z|\), where \(z\) is the objective value of the solution produced by the algorithm and \(\tilde{z}\) is some bound (lower bound in a minimization, upper bound in a maximization) on the optimal objective value. Now suppose that we are minimizing a function known to be nonnegative. If we set \(\tilde{z}=0\), we know that any method, no matter how stupid, will have a gap no worse than 100%. To me, that is not a meaningful guarantee. So I’ll leave the definition of “meaningful” to the reader.
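
To make the point about the trivial bound concrete, here is a minimal Python sketch of the gap formula. The function name `relative_gap` and the numbers are made up for illustration; they do not come from any solver.

```python
def relative_gap(z: float, bound: float) -> float:
    """Relative gap |z - bound| / |z| between an objective value z and a bound."""
    if z == 0:
        raise ValueError("the gap is undefined when the objective value is zero")
    return abs(z - bound) / abs(z)


# Hypothetical minimization: the algorithm returns 120 and we know a lower bound of 100.
print(relative_gap(120.0, 100.0))  # roughly 0.167, i.e. a gap of about 16.7%

# With the trivial bound 0 on a nonnegative objective, the gap is always 100%.
print(relative_gap(120.0, 0.0))
```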

What brings all this to mind is a question posted on Mathematics Stack Exchange. The author of the question was trying to solve a nonlinear integer program. He approached it by applying a “greedy algorithm”. Greedy algorithms are generally assumed to be heuristics, since it seldom is possible to provide useful guarantees on performance. In his case, though, the greedy algorithm is provably optimal, mainly due to the objective function being concave and separable. I’ll state the problem and show a proof of optimality below (changing the original notation a bit). Brace yourself: the proof is a bit long-winded.

You start with \(N\) workers to be assigned to \(M\) work stations. The output of work station \(m\), as a function of the number of workers \(x\) assigned to it, is given by

\(\displaystyle f_{m}(x)=a_{m}x+b_{m}-\frac{c_{m}}{x},\)

where \(a_{m},b_{m},c_{m}\) are all positive constants. Since \(f_{m}(x)\to-\infty\) as \(x\to 0^{+}\), we can assume that each work station gets at least one worker (and, consequently, that \(N>M\)). Since \(f_{m}'(x)=a_{m}+c_{m}/x^{2}>0\), each \(f_{m}()\) is monotonically increasing. Thus, we can safely assume that all \(N\) workers will be assigned somewhere. Moreover, \(f_{m}''(x)=-2c_{m}/x^{3}<0\) for \(x>0\), so \(f_{m}()\) is strictly concave (which we will need later). We also note, for future reference, that the impact of adding one worker to a current staff of \(x\) at station \(m\) is

\(\displaystyle \Delta f_{m}(x)=a_{m}+\frac{c_{m}}{x(x+1)}>0.\)

Similarly, the impact of removing one worker at station \(m\) is

\(\displaystyle \delta f_{m}(x)=-a_{m}-\frac{c_{m}}{x(x-1)}<0.\)
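
For readers who want the intermediate steps, both formulas follow directly from the definition of \(f_{m}\) (the second one assumes \(x\ge 2\), so that the station still has a worker after the removal):

\(\displaystyle \Delta f_{m}(x)=f_{m}(x+1)-f_{m}(x)=a_{m}+\frac{c_{m}}{x}-\frac{c_{m}}{x+1}=a_{m}+\frac{c_{m}}{x(x+1)},\)

\(\displaystyle \delta f_{m}(x)=f_{m}(x-1)-f_{m}(x)=-a_{m}+\frac{c_{m}}{x}-\frac{c_{m}}{x-1}=-a_{m}-\frac{c_{m}}{x(x-1)}.\)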

We see that \(\delta f_{m}(x)\) is an increasing function of \(x\) (i.e., it gets less negative as \(x\) gets bigger). We also note that \(\Delta f_{m}(x)=-\delta f_{m}(x+1)\).
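
As a sanity check on these identities, the following Python sketch compares the closed forms against direct differences of \(f_{m}\) using exact rational arithmetic. The constants 3, 5 and 7 are arbitrary illustrative values, and the helper names are my own.

```python
from fractions import Fraction


def f(a, b, c, x):
    """Station output f(x) = a*x + b - c/x, computed exactly."""
    return a * x + b - Fraction(c, x)


def delta_add(a, c, x):
    """Closed form for the gain from adding a worker to a staff of x >= 1."""
    return a + Fraction(c, x * (x + 1))


def delta_remove(a, c, x):
    """Closed form for the (negative) change from removing a worker, x >= 2."""
    return -a - Fraction(c, x * (x - 1))


a, b, c = 3, 5, 7  # arbitrary positive constants, chosen only for illustration
for x in range(2, 50):
    # The closed forms agree with the direct differences of f.
    assert f(a, b, c, x + 1) - f(a, b, c, x) == delta_add(a, c, x)
    assert f(a, b, c, x - 1) - f(a, b, c, x) == delta_remove(a, c, x)
    # Adding at x is the mirror image of removing at x + 1.
    assert delta_add(a, c, x) == -delta_remove(a, c, x + 1)
    # The removal impact is negative and increases with x.
    assert delta_remove(a, c, x) < delta_remove(a, c, x + 1) < 0
print("all identities hold for x = 2..49")
```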

The IP model is easy to state. Let \(x_{m}\) be the number of workers assigned to work station \(m\). The model is

\(\displaystyle \max\sum_{m=1}^{M}f_{m}(x_{m})\)

subject to

\(\displaystyle \sum_{m=1}^{M}x_{m}\le N\)


\(\displaystyle x\in\mathbb{Z}_{+}^{M}.\)

The greedy algorithm starts with a single worker at each station (\(x=(1,\dots,1)\)) and, at each step, adds one worker to the work station where that worker produces the greatest increase in objective value (breaking ties arbitrarily). It stops when all \(N\) workers are assigned. To prove that it actually finds an optimal solution, I’ll use proof by contradiction.
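
The greedy procedure just described might be sketched in Python as follows. This is my own illustration rather than code from the original question: the function names and the sample data are made up, ties are broken by station index (one arbitrary choice among many), and \(b_{m}\) is omitted because it is a constant that does not affect which station receives the next worker.

```python
import heapq


def marginal_gain(a, c, x):
    """Increase in output from adding one worker to a station staffed by x >= 1."""
    return a + c / (x * (x + 1))


def greedy_assignment(a, c, n_workers):
    """Return a list x, where x[m] is the number of workers at station m.

    a, c: lists of positive per-station constants.
    """
    n_stations = len(a)
    if n_workers < n_stations:
        raise ValueError("need at least one worker per station")
    x = [1] * n_stations  # start with one worker everywhere
    # heapq is a min-heap, so store negated gains to pop the largest gain first.
    heap = [(-marginal_gain(a[m], c[m], 1), m) for m in range(n_stations)]
    heapq.heapify(heap)
    for _ in range(n_workers - n_stations):
        _, m = heapq.heappop(heap)
        x[m] += 1
        # Re-insert the station with its gain for the next additional worker.
        heapq.heappush(heap, (-marginal_gain(a[m], c[m], x[m]), m))
    return x


# Made-up instance: three stations, seven workers.
print(greedy_assignment(a=[1.0, 2.0, 1.5], c=[8.0, 1.0, 3.0], n_workers=7))  # [3, 2, 2]
```

Using a heap keeps each step at \(O(\log M)\) time; a plain scan over all stations at each step would work just as well for small instances.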

Let \(x^{(0)},x^{(1)},\dots,x^{(N-M)}\) be the sequence of solutions constructed by the greedy algorithm, with \(x^{(0)}=(1,\dots,1)\), and let \(x^{(k)}\) be the last solution in the sequence for which an optimal solution \(x^{*}\) exists such that \(x^{(k)}\le x^{*}\). The significance of the inequality is that if \(x\le x^{*}\), it is possible to extend the partial solution \(x\) to the optimal solution \(x^{*}\) by adding unassigned workers to work stations. We know that \(k\) is well defined because \(x^{(0)}\le x^{*}\) for any optimal \(x^{*}\). Since we are assuming that the greedy algorithm does not find an optimum, it must be that \(k<N-M\).

Now identify the work station \(j\) to which the greedy algorithm added a worker at step \(k\), meaning that \(x_{j}^{(k+1)}=x_{j}^{(k)}+1\) and \(x_{i}^{(k+1)}=x_{i}^{(k)}\) for \(i\neq j\). Since, by assumption, \(x^{(k)}\le x^{*}\) but \(x^{(k+1)}\not\le x^{*}\), it must be that \(x_{j}^{(k)}=x_{j}^{*}\).

Next, since \(x^{(k)}\le x^{*}\) and \(x^{(k)}\neq x^{*}\) (else \(x^{(k)}\) would be optimal), there is some work station \(h\neq j\) such that \(x_{h}^{(k)}<x_{h}^{*}\). Let \(\tilde{x}\) be the solution obtained from \(x^{(k)}\) by adding a worker to station \(h\): \(\tilde{x}_{h}=x_{h}^{(k)}+1\) and \(\tilde{x}_{i}=x_{i}^{(k)}\) for \(i\neq h\). Observe that \(\tilde{x}\le x^{*}\). The greedy algorithm chose work station \(j\) over work station \(h\) at \(x^{(k)}\), so it must be that

\(\displaystyle \Delta f_{j}(x_{j}^{(k)})\ge\Delta f_{h}(x_{h}^{(k)}). \quad (1)\)

Finally, let \(\hat{x}\) be the result of starting from optimal solution \(x^{*}\) and shifting one worker from station \(h\) to station \(j\). Since

\(\displaystyle x_{j}^{(k+1)}=x_{j}^{(k)}+1=x_{j}^{*}+1=\hat{x}_{j},\)

\(\displaystyle x_{h}^{(k+1)}=x_{h}^{(k)}<x_{h}^{*}\implies x_{h}^{(k+1)}\le x_{h}^{*}-1=\hat{x}_{h},\)


\(\displaystyle x_{i}^{(k+1)}=x_{i}^{(k)}\le x_{i}^{*}=\hat{x}_{i}\,\forall i\notin\{h,j\},\)

we have \(x^{(k+1)}\le\hat{x}\). Under the assumption that \(x^{(k)}\) was the last solution in the greedy sequence that could be extended to an optimal solution, it must be that \(\hat{x}\) is not optimal. Thus the net change to the objective function at \(x^{*}\) when shifting one worker from station \(h\) to station \(j\) must be negative, i.e.,

\(\displaystyle \Delta f_{j}(x_{j}^{*})+\delta f_{h}(x_{h}^{*})<0.\quad (2)\)

We showed previously that, under our assumptions, \(x_{j}^{(k)}=x_{j}^{*}\), from which it follows that

\(\displaystyle \Delta f_{j}(x_{j}^{*})=\Delta f_{j}(x_{j}^{(k)}). \quad (3)\)

We also showed that \(\delta f_{h}()\) is an increasing function. Since \(\tilde{x}_{h}\le x_{h}^{*}\),

\(\displaystyle \delta f_{h}(x_{h}^{*})\ge\delta f_{h}(\tilde{x}_{h})=-\Delta f_{h}(x_{h}^{(k)}). \quad (4)\)

Combining (4) with (2), we have

\(\displaystyle \Delta f_{j}(x_{j}^{*})-\Delta f_{h}(x_{h}^{(k)})\le\Delta f_{j}(x_{j}^{*})+\delta f_{h}(x_{h}^{*})<0,\)

that is,

\(\displaystyle \Delta f_{j}(x_{j}^{*})<\Delta f_{h}(x_{h}^{(k)}). \quad (5)\)

Combining (3) with (5) yields

\(\displaystyle \Delta f_{j}(x_{j}^{(k)})<\Delta f_{h}(x_{h}^{(k)})\)

which contradicts (1).
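
A proof is a proof, but an empirical check is cheap. The Python sketch below compares the greedy objective value with exhaustive enumeration of the IP on small random instances. Everything in it (function names, instance sizes, parameter ranges, the tolerance) is an assumption of mine for illustration, and passing it is of course evidence rather than a substitute for the argument above.

```python
import itertools
import random


def total_output(a, b, c, x):
    """Objective value: sum over stations of a*x + b - c/x."""
    return sum(a[m] * x[m] + b[m] - c[m] / x[m] for m in range(len(x)))


def greedy(a, c, n):
    """Greedy allocation by a plain scan for the largest marginal gain."""
    x = [1] * len(a)
    for _ in range(n - len(a)):
        j = max(range(len(a)), key=lambda m: a[m] + c[m] / (x[m] * (x[m] + 1)))
        x[j] += 1
    return x


def brute_force_value(a, b, c, n):
    """Best objective over all allocations with x_m >= 1 and sum(x) <= n."""
    m = len(a)
    best = float("-inf")
    for x in itertools.product(range(1, n - m + 2), repeat=m):
        if sum(x) <= n:
            best = max(best, total_output(a, b, c, x))
    return best


def max_discrepancy(trials=200, seed=0):
    """Largest |greedy value - optimal value| over random small instances."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        m = rng.randint(2, 4)
        n = rng.randint(m, m + 5)
        a = [rng.uniform(0.1, 5.0) for _ in range(m)]
        b = [rng.uniform(0.1, 5.0) for _ in range(m)]
        c = [rng.uniform(0.1, 5.0) for _ in range(m)]
        gap = abs(total_output(a, b, c, greedy(a, c, n)) - brute_force_value(a, b, c, n))
        worst = max(worst, gap)
    return worst


# Any difference should be at the level of floating-point rounding.
assert max_discrepancy() < 1e-9
```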

I'm an apostate mathematician, retired from a business school after 33 years of teaching mostly (but not exclusively) quantitative methods courses. My academic interests lie in operations research. I also study Tae Kwon Do a bit on the side.
