\begin{align} \min_{x}\ &c^{\top} x + d \\ s.t.\ &Gx \preceq h \\ & Ax = b \end{align}

Linear program: a linear objective function with linear constraints. A single inequality forms a half-space; the entire feasible set is carved out by a collection of linear functions, and these linear (in)equalities are each CONVEX. The resulting feasible set, then, is ALSO convex, meaning any line segment between two points in the set remains within the set. So, any local minimum is a global minimum. This is a convex problem where all constraints and objectives are affine.

Chebyshev center

The Chebyshev center of some polyhedron

\begin{equation} \mathcal{P}= \left\{x \mid a_{i}^{T} x \leq b_{i}, i \in 1 \dots m\right\} \end{equation}

is the center of the largest ball:

\begin{equation} \mathcal{B} = \left\{x_{c} + u \mid \norm{u}_{2} \leq r\right\} \end{equation}

that you can fit inside the polyhedron. Now, a_{i}^{T} x \leq b_{i} for all x \in \mathcal{B} iff:

\begin{equation} \sup\left\{ a_{i}^{T} \left(x_{c}+u\right) \mid \norm{u}_{2} \leq r\right\} = a_{i}^{T} x_{c} + r \norm{a_{i}}_{2} \leq b_{i} \end{equation}

which means we have an LP in x_{c}, r:

\begin{align} &\max r \\ &s.t.\ a_{i}^{T} x_{c} + r \norm{a_{i}}_{2} \leq b_{i} \end{align}
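This LP can be handed directly to an off-the-shelf solver. A minimal sketch using scipy.optimize.linprog, with the 2D unit box as an illustrative (assumed) polyhedron:

```python
import numpy as np
from scipy.optimize import linprog

# illustrative polyhedron {x | a_i^T x <= b_i}: the unit box in 2D
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

# decision variables are (x_c, r); maximizing r == minimizing -r
norms = np.linalg.norm(A, axis=1)
c = np.zeros(A.shape[1] + 1)
c[-1] = -1.0

# each row encodes a_i^T x_c + r ||a_i||_2 <= b_i
A_ub = np.hstack([A, norms[:, None]])
res = linprog(c, A_ub=A_ub, b_ub=b,
              bounds=[(None, None)] * A.shape[1] + [(0, None)])
x_c, r = res.x[:-1], res.x[-1]
```

For the unit box this recovers the center (0, 0) with radius 1.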

3 cases of design points

- points in the interior of the feasible set are never optimal, because we can always move along the gradient c
- points on a face could be optimal IFF the face is perpendicular to c, the gradient of our objective function; but then you can slide along the face, so there are infinitely many solutions on that face (because c^{\top} x doesn't change along it)
- points on a vertex could be optimal

An unclosed feasible set can make the solution unbounded.

linear program equality form

\begin{align} \min_{x}\ &c^{\top} x \\ s.t.\ &Ax = b \\ & x \geq 0 \end{align}

We can convert our standard-form expression into the equality form above by introducing another "slack variable" s, such that we write:

\begin{equation} Ax \leq b \implies Ax + s = b \end{equation}

and introducing s \geq 0 as another constraint.

FONC for linear program

To convert a linear program into an unconstrained program, we form the Lagrangian:

\begin{align} \mathcal{L} = c^{\top} x - \mu^{\top} x - \lambda^{\top} (Ax - b) \end{align}

Recall our KKT conditions:

- feasibility: Ax = b, x \geq 0 (satisfies our constraints)
- dual feasibility: \mu \geq 0
- complementary slackness: \mu \odot x = 0
- stationarity: A^{\top}\lambda + \mu = c (from FONC: we want \nabla_{x}\mathcal{L} = 0)

Recall:

\begin{equation} \min_{x} \max_{\mu \geq 0, \lambda} \mathcal{L}(x,\mu,\lambda) \end{equation}

For linear programs, the FONC are sufficient by themselves!

Dual Certificates

A dual certificate is a way of verifying a proposed solution x^{*} of the primal problem for a linear program using the dual solution. Recall the dual form of the primal problem:

\begin{align} \max_{\lambda}\ &b^{\top}\lambda\\ s.t.\ &A^{\top} \lambda \leq c \end{align}

Typically, this would give us a lower bound on the primal solution. However, for linear programs, the two are equal (strong duality). So, given some (x^{*}, \lambda^{*}), we can verify it by checking:

- primal feasibility: Ax^{*} = b, x^{*} \geq 0
- dual feasibility: A^{\top} \lambda^{*} \leq c
- dual certificate: c^{\top} x^{*} = b^{\top} \lambda^{*}

This allows us to check, for a given solution, whether or not it is actually the correct solution.

simplex algorithm

1. find a feasible vertex
2. check the KKT conditions in FONC for linear program
3. if satisfied, return
4. if not, try to swap vertices while maintaining feasibility

Recall that optimal solutions essentially only exist on vertices, so this algorithm is a way of systematically checking vertices.

Vertex Partitioning

We need to define the act of partitioning before we can perform the simplex algorithm. We have two sets \mathcal{B}, \mathcal{V} which contain indices into x. If i \in \mathcal{B}, then x_{i} \geq 0; if i \in \mathcal{V}, then x_{i} = 0. Consider n, the number of design variables in x, and m, the number of equality constraints in linear program equality form; a vertex is obtained by zeroing out n-m of the entries of x. This means that a valid vertex needs n-m elements in \mathcal{V} and m elements in \mathcal{B} (because that's just n-(n-m) = m). Recall:

\begin{align} \min_{x}\ &c^{\top} x \\ s.t.\ &Ax = b \\ & x \geq 0 \end{align}

Because the entries indexed by \mathcal{B} are the only ones that can be non-zero at a vertex (that is, x_{\mathcal{V}} = 0), we can write:

\begin{equation} Ax = A_{\mathcal{B}} x_{\mathcal{B}} = b \end{equation}

meaning, we can solve for the correct basis by writing:

\begin{equation} x_{\mathcal{B}} = A_{\mathcal{B}}^{-1} b \end{equation}

this will have a unique solution, then, when A_{\mathcal{B}} is an isomorphism (i.e. invertible). Importantly, A_{\mathcal{B}} is not going to be invertible for all choices of the set \mathcal{B}, so not all partitions are valid vertices.

```python
@dataclass
class LinearProgram:
    A: Matrix
    b: Vector
    c: Vector

    def get_valid_vertex(self, basis: List[int]):
        # because we want to index while preserving the
        # original order, we sort
        basis = sorted(basis)

        # then, we want to get our A basis and try to
        # solve for x basis
        A_basis = self.A[:, basis]
        try:
            x_basis = A_basis.invert() @ self.b
        except Exception:
            raise ValueError("Can't invert! Not a valid basis.")

        # we want to set the rest of it as 0
        x_new = zeros_like(self.c)
        x_new[basis] = x_basis
        return x_new
```

Partitioning FONC

Recall our KKT conditions:

- feasibility: Ax = b, x \geq 0 (satisfies our constraints)
- dual feasibility: \mu \geq 0
- complementary slackness: \mu \odot x = 0
- stationarity: A^{\top}\lambda + \mu = c

We want to re-write this in terms of our basis. We can write A^{\top}\lambda + \mu = c as:

\begin{equation} \begin{cases} A_{\mathcal{B}}^{\top} \lambda + \mu_{\mathcal{B}} = c_{\mathcal{B}} \\ A_{\mathcal{V}}^{\top} \lambda + \mu_{\mathcal{V}} = c_{\mathcal{V}} \end{cases} \end{equation}

And our complementary slackness tells us that, elementwise, either \mu_{i} or x_{i} must be zero. Since the basis entries x_{\mathcal{B}} are non-zero (at a non-degenerate vertex), we must have:

\begin{equation} \mu_{\mathcal{B}} = 0 \end{equation}

Therefore, let’s consider the previous statement:

\begin{equation} A_{\mathcal{B}}^{\top} \lambda + \mu_{\mathcal{B}} = c_{\mathcal{B}} \end{equation}

plugging in the \mu_{\mathcal{B}} from above, we obtain:

\begin{align} &A_{\mathcal{B}}^{\top} \lambda + \mu_{\mathcal{B}} = c_{\mathcal{B}} \\ \Rightarrow\ & A_{\mathcal{B}}^{\top} \lambda + 0 = c_{\mathcal{B}} \\ \Rightarrow\ & \lambda = A^{-\top}_{\mathcal{B}} c_{\mathcal{B}} \end{align}

this allows us to check the FONC for linear program directly, since we can compute \lambda for a given problem and basis.

Phase 1: Initialize with a Valid Partition

We need to first choose an initial partition which gives an initial feasible vertex. Recall our normal LP:

\begin{align} \min_{x}\ &c^{\top} x \\ s.t.\ &Ax = b \\ & x \geq 0 \end{align}

Let’s change this problem by simply asking for constraint satisfaction and NOT minimality (i.e. set c = 0), AND let us introduce a helper variable z:

\begin{align} \min_{x,z}\ &\mqty[0^{\top}, 1^{\top}] \mqty[x \\ z] \\ s.t.\ &\mqty[A, Z] \mqty[x \\ z] = b \\ & \mqty[x \\ z] \geq 0 \end{align}

where Z is a diagonal square matrix whose size matches b:

\begin{equation} Z_{ii} = \begin{cases} 1, & \text{if}\ b_{i} \geq 0 \\ -1, & \text{otherwise} \end{cases} \end{equation}

(we do this because without it we can't support negative b). Notice! For A \in \mathcal{L}(n,m), the partition \mathcal{B} = \{n+1, \dots, n+m\} satisfies our constraints! We can see this by setting all x = 0 and initializing z_{j} = |b_{j}|. If the original LP is feasible, the optimal solution to this LP should result in all of the z = 0. So, if we solve the above and still need z \neq 0 to satisfy the constraint Ax + Zz = b, our original LP is infeasible. Once we obtain the \mathcal{B} from the setup above, we can use that set as the initial partition to solve (because the constraints are the same, and z=0, our partition can be recycled):

\begin{align} \min_{x,z}\ &\mqty[c^{\top} & 0^{\top}] \mqty[x \\ z] \\ s.t.\ &\mqty[A & I \\ 0 & I] \mqty[x \\ z] = \mqty[b \\ 0] \\ & \mqty[x \\ z] \geq 0 \end{align}

(we keep z around because our initial partition may include z).

Phase 2: Bounce Between Valid Partitions Until We Are Optimal

Consider an initial partition \mathcal{B}; choose an entering index q \in \mathcal{V} and a leaving index p \in \mathcal{B}. This results in a vertex transition x \to x'. Let's now define x'_{\mathcal{B}}, the elements that survived the transition: formally, this is the slice of the next vertex, projected onto the previous basis. Since p left (i.e. the subspace formed by the new \mathcal{B}, to which x' belongs, does not include index p), we expect that (x'_{\mathcal{B}})_{p} = 0. The middle part of deriving phase 2 essentially asks: "how do we guarantee constraint satisfaction?" Recall Linear Program KKT, and recall:

\begin{align} \min_{x}\ &c^{\top} x \\ s.t.\ &Ax = b \\ & x \geq 0 \end{align}

feasibility: Ax = b, x \geq 0 (satisfies our constraints). So we first need:

\begin{equation} Ax = b = Ax' \end{equation}

specifically, then:

\begin{equation} A_{\mathcal{B}} x_{\mathcal{B}} = b = A_{\mathcal{B}} x'_{\mathcal{B}} + A_{\{q\}} x'_{\{q\}} \end{equation}

(the right side makes sense because, as with the remark above, the leaving index would have a value of 0, meaning it won’t change our computation). Using the equality above, we can write:

\begin{equation} x'_{\mathcal{B}} = x_{\mathcal{B}} - A^{-1}_{\mathcal{B}} A_{\{q\}} x'_{\{q\}} \end{equation}
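As a numeric sanity check with small assumed data, we can confirm that this update preserves feasibility, i.e. that the reassembled x' still satisfies Ax' = b:

```python
import numpy as np

# small assumed problem data, chosen so A_B is invertible
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.5, 1.0]])
b = np.array([4.0, 3.0])

B, q = [0, 1], 2                      # basis indices and entering index
A_B, A_q = A[:, B], A[:, q]

x_B = np.linalg.solve(A_B, b)         # current vertex (x_V = 0)
xq_prime = 0.5                        # an arbitrary step on x_q

# x_B' = x_B - A_B^{-1} A_q x_q'
x_B_prime = x_B - np.linalg.solve(A_B, A_q) * xq_prime

# reassemble x' and check A x' = b still holds
x_prime = np.zeros(3)
x_prime[B] = x_B_prime
x_prime[q] = xq_prime
print(np.allclose(A @ x_prime, b))    # True
```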

objective optimality: c^{\top} x'. We can re-write our objective in terms of the old basis:

\begin{align} c^{\top} x' &= c_{\mathcal{B}}^{\top} x'_{\mathcal{B}} + c_{q} x'_{q} \end{align}

Plugging in our value for x'_{\mathcal{B}} from above, we obtain:

\begin{align} c^{\top} x' &= c_{\mathcal{B}}^{\top} \left(x_{\mathcal{B}} - A^{-1}_{\mathcal{B}} A_{\{q\}} x'_{\{q\}}\right) + c_{q} x'_{q} \end{align}

Simplifying, we will obtain a factor c_{\mathcal{B}}^{\top} A^{-1}_{\mathcal{B}}; recall from above that \lambda = A^{-\top}_{\mathcal{B}} c_{\mathcal{B}} \implies \lambda^{\top} = c_{\mathcal{B}}^{\top} A^{-1}_{\mathcal{B}}, so we can rewrite this as:

\begin{align} c^{\top} x' &= c_{\mathcal{B}}^{\top} x_{\mathcal{B}} - \lambda^{\top} A_{\{q\}} x'_{q} + c_{q} x'_{q} \end{align}

Plugging in our value from stationarity above (A_{\mathcal{V}}^{\top} \lambda + \mu_{\mathcal{V}} = c_{\mathcal{V}}), but restricted to the single index q instead of all of \mathcal{V}, and simplifying, we also have:

\begin{align} c^{\top} x' &= c^{\top} x + \mu_{q} x'_{q} \end{align}
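This identity can also be checked numerically with small assumed data: compute \lambda and \mu_{q} from the basis, take a step in x'_{q}, and confirm the objective changes by exactly \mu_{q} x'_{q}:

```python
import numpy as np

# small assumed problem data, chosen so A_B is invertible
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.5, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([1.0, 2.0, 0.5])

B, q = [0, 1], 2
A_B, A_q = A[:, B], A[:, q]

x = np.zeros(3)
x[B] = np.linalg.solve(A_B, b)          # current vertex

lmbda = np.linalg.solve(A_B.T, c[B])    # lambda = A_B^{-T} c_B
mu_q = c[q] - A_q @ lmbda               # stationarity restricted to index q

xq_prime = 0.5
x_prime = np.zeros(3)
x_prime[B] = x[B] - np.linalg.solve(A_B, A_q) * xq_prime
x_prime[q] = xq_prime

# c^T x' == c^T x + mu_q x_q'
print(np.isclose(c @ x_prime, c @ x + mu_q * xq_prime))  # True
```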

conditions for optimality

The two steps above give:

\begin{equation} x'_{\mathcal{B}} = x_{\mathcal{B}} - A^{-1}_{\mathcal{B}} A_{\{q\}} x'_{q} \end{equation}

and:

\begin{equation} c^{\top} x' - c^{\top} x = \mu_{q} x'_{q} \end{equation}

key idea: the second expression tells us that our objective will only decrease if \mu_{q} is negative (recall that a correctly constrained x'_{q} is non-negative). So, to progress towards optimality, we need to find some q which makes \mu_{q} negative. If all of \mu_{\mathcal{V}} is non-negative, we know we can't decrease the objective anymore, so we have found the optimum.

finishing up

Finally, we need to 1) choose the entering index q and 2) solve for the corresponding leaving index p.

Possible heuristics for choosing q:

- choose the q which reduces c^{\top} x maximally
- Dantzig: choose the q with the most negative \mu_{q}
- Bland: choose the first q with negative \mu_{q}

We will use the first one.

Solving for p

First choose an entering index q, then solve for the leaving index p. Given a choice of q, a correctly chosen p would give (x'_{\mathcal{B}})_{p} = 0. Substituting this fact into the definition of x'_{\mathcal{B}} gives:

\begin{equation} \left(x'_{\mathcal{B}}\right)_{p} = 0 = \left(x_{\mathcal{B}}\right)_{p} - \left(A^{-1}_{\mathcal{B}} A_{\{q\}}\right)_{p} x'_{q} \end{equation}

Rearranging the right hand side gives the following expression, which we call the minimum ratio test:

\begin{equation} x'_{q} = \frac{(x_{\mathcal{B}})_{p}}{(A_{\mathcal{B}}^{-1} A_{\{q\}})_{p}} \end{equation}

This computation, then, allows us to check the value of \mu_{q} x'_{q}, which we want to make as small as possible. For a valid choice of q, \mu_{q} is negative (see conditions for optimality above); therefore, to minimize \mu_{q} x'_{q}, we need to make x'_{q} as large as we can through our choice of p. Hence, to choose p, we iteratively select the p that minimizes the above ratio.

putting it all together

```python
@dataclass
class LinearProgram:
    A: Matrix
    b: Vector
    c: Vector

    def solve(self):
        row, col = self.A.shape

        ### initialize components of the phase-1 problem ###
        # recall we want Z to be an identity matrix, but
        # sign-matched to b to maintain positivity
        Z = diag(self.b / abs(self.b))

        ### set up the phase-1 problem ###
        A_p = cat([self.A, Z], axis=1)
        b_p = self.b
        #              x          z
        c_p = cat([zeros(col), ones(row)])

        # initial basis is the z block: indices col ... col+row-1
        initial_basis = list(range(col, col + row))

        # create and solve the phase-1 problem
        lp = LinearProgram(A_p, b_p, c_p)
        solver = LPSolver(lp, initial_basis)
        x_opt, basis = solver.solve()

        ### check feasibility ###
        # if any z entries are non-zero, the original LP is infeasible
        if any(x_opt[col:] != 0):
            raise Exception("LP constraints not feasible")

        ### solve the surrogate problem ###
        A_p = block_matrix([[self.A, identity(row)],
                            [zeros((row, col)), identity(row)]])
        b_p = cat([self.b, zeros(row)])
        c_p = cat([self.c, zeros(row)])

        # create the actual problem, recycling the phase-1 basis
        lp = LinearProgram(A_p, b_p, c_p)
        solver = LPSolver(lp, basis)
        x_final, _ = solver.solve()

        # return only the x part, and not z
        return x_final[:col]


@dataclass
class LPSolver:
    LP: LinearProgram
    basis: List[int]

    # we solve by iterating upon the basis until we find one which
    # satisfies optimality (no swaps are possible)
    def solve(self):
        # step returns True when optimal
        while not self.step():
            pass

        # solve for our x_B (recall Ax = b)
        Ab = self.LP.A[:, self.basis]
        # we assume that self.basis is a valid basis, so Ab is invertible
        x_basis = Ab.invert() @ self.LP.b

        # fill that in
        x = zeros_like(self.LP.c)
        x[self.basis] = x_basis
        return x, self.basis

    # greedily select the q that most decreases the objective
    def step(self):
        basis_indices = sorted(self.basis)
        # free indices are the indices that are not in the basis
        free_indices = sorted(set(range(len(self.LP.c))) - set(basis_indices))

        # partition our program into basis and free sections
        Ab, Av = self.LP.A[:, basis_indices], self.LP.A[:, free_indices]
        cb, cv = self.LP.c[basis_indices], self.LP.c[free_indices]

        # solve for our x_B (recall Ax = b); we assume Ab is invertible
        x_basis = Ab.invert() @ self.LP.b

        # solve for our Lagrange multipliers, derived in "Partitioning FONC"
        lmbda = Ab.invert().T() @ cb
        mu_v = cv - Av.T() @ lmbda

        # best_delta is c^T x' - c^T x
        (q_indx_best, p_indx_best, xq_prime_best, best_delta) = (
            None, None, float("inf"), float("inf")
        )

        # use the greedy heuristic to choose q: find the most negative
        # mu_q x'_q decrement out of V (recall q in V)
        for q_indx, mu_q in enumerate(mu_v):
            # skip mu_q >= 0: that would literally be an increase
            if mu_q < 0:
                # find the best p given this q
                p_indx, xq_prime = self.solve_for_p_given_q(q_indx)

                # apply greedy heuristic
                if mu_q * xq_prime < best_delta:
                    q_indx_best, p_indx_best, xq_prime_best, best_delta = (
                        q_indx, p_indx, xq_prime, mu_q * xq_prime
                    )

        # if q is still None, we are at optimality (i.e. all choices
        # of mu_q result in a higher objective)
        if q_indx_best is None:
            # satisfies dual feasibility; this is the global optimum
            return True

        # otherwise, if no finite minimum ratio exists, the problem
        # is unbounded
        if xq_prime_best == float("inf"):
            raise Exception("unbounded LP")

        # swap p and q (translate positions back to variable indices)
        basis_indices[p_indx_best] = free_indices[q_indx_best]

        # cache our new basis
        self.basis = basis_indices

        # and then tell the caller we haven't reached optimality yet
        return False

    # use the minimum ratio test
    def solve_for_p_given_q(self, q_cand_indx):
        basis_indices = sorted(self.basis)
        free_indices = sorted(set(range(len(self.LP.c))) - set(basis_indices))

        Ab = self.LP.A[:, basis_indices]
        # this is A_{q}, which only has our one q candidate
        Aq = self.LP.A[:, free_indices[q_cand_indx]]

        # solve for x_B; we assume Ab is invertible
        x_basis = Ab.invert() @ self.LP.b

        # cache all possible denominator values of x'_q,
        # one per candidate index p
        xq_prime_denominator_cache = Ab.invert() @ Aq

        # cache the smallest ratio we have seen
        p_indx_best, xq_prime_best = 0, float("inf")

        # now iteratively find the p with the minimum ratio
        for p_indx, d in enumerate(xq_prime_denominator_cache):
            # d could be zero or negative; we only divide by positive d
            if d > 0:
                min_ratio = x_basis[p_indx] / d
                if min_ratio < xq_prime_best:
                    p_indx_best, xq_prime_best = p_indx, min_ratio

        # return the best we have
        return p_indx_best, xq_prime_best
```
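Since the class above relies on pseudocode types (Matrix, Vector), here is a minimal self-contained numpy sketch of phase 2 alone, assuming a starting feasible basis is already known; it follows the same \lambda/\mu computation and minimum ratio test:

```python
import numpy as np

def simplex_phase2(A, b, c, basis):
    # revised simplex, phase 2 only: iterate from a known feasible basis
    basis = list(basis)
    n = len(c)
    while True:
        B = sorted(basis)
        V = sorted(set(range(n)) - set(B))
        A_B = A[:, B]
        x_B = np.linalg.solve(A_B, b)
        lmbda = np.linalg.solve(A_B.T, c[B])   # lambda = A_B^{-T} c_B
        mu_V = c[V] - A[:, V].T @ lmbda        # reduced costs on free indices
        if np.all(mu_V >= -1e-9):              # FONC satisfied: optimal
            x = np.zeros(n)
            x[B] = x_B
            return x
        q = V[int(np.argmin(mu_V))]            # Dantzig rule: most negative mu_q
        d = np.linalg.solve(A_B, A[:, q])      # A_B^{-1} A_q
        if np.all(d <= 1e-12):
            raise ValueError("unbounded LP")
        ratios = np.full(len(d), np.inf)       # minimum ratio test
        pos = d > 1e-12
        ratios[pos] = x_B[pos] / d[pos]
        p = B[int(np.argmin(ratios))]          # leaving index
        basis = B
        basis[basis.index(p)] = q              # swap p and q

# tiny example: min -x1 - x2  s.t.  x1 + x3 = 1, x2 + x4 = 1, x >= 0
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 1.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])
x = simplex_phase2(A, b, c, basis=[2, 3])      # the slack basis is feasible
print(x)  # [1. 1. 0. 0.]
```

A phase 1 in this style would first solve the auxiliary problem from the notes above to produce the starting basis; here it is simply supplied.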
