
Nonlinear Programming

Recall the unconstrained optimization problem: \[ \inf_{x \in \R^n} \quad f(x) \]

Proposition: Let \(f: \R^n \mapsto \R\) be a continuously differentiable function and \(\bar x \in \R^n\) be an arbitrary point. If there exists a direction \(d \in \R^n \setminus \{ 0 \}\) such that \(\nabla f(\bar x)^T d < 0\), then there exists \(\alpha_0 > 0\) such that \[ f(\bar x + \alpha d) < f(\bar x), \quad \forall \alpha \in (0, \alpha_0] \] Here, \(d\) is called a descent direction of \(f\) at \(\bar x\).
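
As a quick numerical sanity check (a minimal sketch in NumPy; the quadratic \(f\) and the direction \(d\) below are arbitrary illustrations, not from the text), moving a small step along a direction with \(\nabla f(\bar x)^T d < 0\) indeed decreases \(f\):

```python
import numpy as np

# f(x) = x1^2 + 2*x2^2, a simple smooth test function (arbitrary choice)
f = lambda x: x[0]**2 + 2 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 4 * x[1]])

x_bar = np.array([1.0, 1.0])
d = np.array([-1.0, 0.0])                     # grad_f(x_bar)^T d = -2 < 0

print(grad_f(x_bar) @ d)                      # -2.0, so d is a descent direction
for alpha in [1e-1, 1e-2, 1e-3]:
    print(alpha, f(x_bar + alpha * d) < f(x_bar))   # True for all these step sizes
```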

As a result, a necessary condition for \(\bar x\) to be a local minimum is that \(\nabla f(\bar x) = 0\) (first-order necessary condition).

Proposition: Let \(f: \R^n \mapsto \R\) be a convex and continuously differentiable function. Then, \(\bar x\) is a global minimum of \(f\) if and only if \(\nabla f(\bar x) = 0\).

Proposition: second-order sufficient condition. Let \(f: \R^n \mapsto \R\) be a twice continuously differentiable function. If \(\nabla f(\bar x) = 0\) and \(\nabla^2 f(\bar x) \succ 0\), then \(\bar x\) is a local minimum.
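
A minimal sketch of checking the first- and second-order conditions numerically (the strictly convex quadratic below is an arbitrary illustration):

```python
import numpy as np

# f(x) = 0.5 * x^T Q x - b^T x with Q symmetric positive definite (arbitrary choice),
# so the unique stationary point x* = Q^{-1} b is a global minimum.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

x_star = np.linalg.solve(Q, b)                # solves grad f(x) = Q x - b = 0
grad = Q @ x_star - b
hess_eigs = np.linalg.eigvalsh(Q)             # Hessian of f is Q everywhere

print(np.allclose(grad, 0))                   # first-order necessary condition
print(np.all(hess_eigs > 0))                  # second-order sufficient condition
```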

Constrained Optimization

In the constrained case, such simple conditions do not apply; compare, e.g., \(\inf_{x \ge 1} x^2\) and \(\inf_{x \ge -1} x^2\). Consider the following constrained problem: \[ \label{primal} \tag{P} \begin{aligned} \inf_{x \in \R^n} \quad & f(x) \\ \text{s.t.} \quad & g_i(x) \le 0, i=1,\dots,r \\ & h_j(x) = 0, j=1,\dots,s \end{aligned} \] where \(f, g_i, h_j\) are continuously differentiable.

Theorem: Fritz John necessary condition. Let \(\bar x\) be a local minimum of \(\eqref{primal}\). Then there exist multipliers \(u \in \R\), \(v_1, \dots, v_r \in \R\), \(w_1, \dots, w_s \in \R\) such that \[ \begin{gather} u \nabla f(\bar x) + \sum_{i=1}^r v_i \nabla g_i(\bar x) + \sum_{j=1}^s w_j \nabla h_j(\bar x) = 0 \tag{vanishing gradient} \\ [u, v_1, \dots, v_r, w_1, \dots, w_s] \ne 0 \tag{non-trivial solution} \\ u, v_i \ge 0, i=1,\dots,r \tag{non-negativity} \\ v_i g_i(\bar x) = 0, i=1,\dots,r \tag{complementarity} \end{gather} \] \(v_i\) indicates the importance of the inequality constraint \(g_i(x) \le 0\): by complementarity, if \(g_i(\bar x) < 0\) (the constraint is inactive, i.e. satisfied with slack), then \(v_i = 0\).
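
To make the conditions concrete, here is a tiny hand-checkable instance (an arbitrary illustration, not from the text): \(\min x_1\) subject to \(x_1^2 + x_2^2 - 1 \le 0\), whose minimizer is \(\bar x = (-1, 0)\) with the constraint active.

```python
import numpy as np

# Problem (arbitrary illustration): min f(x) = x1  s.t.  g(x) = x1^2 + x2^2 - 1 <= 0.
# The minimizer is x_bar = (-1, 0), where the constraint is active.
x_bar = np.array([-1.0, 0.0])
grad_f = np.array([1.0, 0.0])
grad_g = np.array([2 * x_bar[0], 2 * x_bar[1]])   # = (-2, 0)
g_val = x_bar[0]**2 + x_bar[1]**2 - 1             # = 0 (active)

u, v = 2.0, 1.0                                    # Fritz John multipliers (here u > 0)
print(np.allclose(u * grad_f + v * grad_g, 0))     # vanishing gradient
print(u >= 0 and v >= 0)                           # non-negativity
print(np.isclose(v * g_val, 0))                    # complementarity
```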

The implication is that, at a local minimum, we should not be able to find a direction that decreases the objective value while maintaining feasibility. For simplicity of discussion, we drop the equality constraints below.

  1. Linear dependence

    The vanishing gradient condition together with the non-trivial solution condition in essence rules out linear independence among the gradients. This is easy to interpret: if \(\nabla f(\bar x), \nabla g_1(\bar x), \dots, \nabla g_r(\bar x)\) were linearly independent, it would be easy to find a direction \(d\) in \(\Col^\perp(\nabla g_1(\bar x), \dots, \nabla g_r(\bar x))\) that makes an acute angle with \(-\nabla f(\bar x)\). Moving along \(d\) from \(\bar x\) would then decrease the objective value while maintaining the inequality constraints, contradicting that \(\bar x\) is a local minimum.

  2. Non-negativity + complementarity

    These two components should be discussed together and are a bit intriguing. Please refer to this paper.

Of course, the premise is that \(u\) is nonzero. When the conditions can only be satisfied with \(u = 0\), the corresponding point \(\bar x\) may be a garbage point: the conditions then say nothing about \(f\), and \(\bar x\) need not be a local minimum. In this case, \(\nabla f(\bar x)\) may have a component orthogonal to the \(\nabla g_i(\bar x)\)'s, and we can walk along this component (or its opposite) to decrease \(f\) without compromising the inequality constraints. This motivates the study of constraint qualifications, which aim to ensure a nonzero \(u\).
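
The orthogonal-component argument can be made computational. Below is a minimal sketch (the helper function and the data are illustrative assumptions): project \(-\nabla f(\bar x)\) onto the orthogonal complement of the span of the constraint gradients; if the result is nonzero, it decreases \(f\) to first order while leaving the active constraints unchanged to first order.

```python
import numpy as np

def orthogonal_descent_direction(grad_f, constraint_grads):
    """Project -grad_f onto the orthogonal complement of span(constraint_grads).

    A nonzero result decreases f to first order while keeping the (active)
    constraints unchanged to first order.
    """
    A = np.column_stack(constraint_grads)            # columns are the constraint gradients
    P = A @ np.linalg.pinv(A)                        # orthogonal projector onto Col(A)
    return -(np.eye(len(grad_f)) - P) @ grad_f       # component of -grad_f orthogonal to Col(A)

grad_f = np.array([1.0, 1.0, 0.0])                   # arbitrary illustration
g_grads = [np.array([1.0, 0.0, 0.0])]                # one active constraint gradient

d = orthogonal_descent_direction(grad_f, g_grads)
print(d)                                             # ~ (0, -1, 0)
print(grad_f @ d < 0)                                # a descent direction for f
print(all(abs(g @ d) < 1e-12 for g in g_grads))      # orthogonal to the constraint gradients
```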

Note that in the above discussion, we ignore the effect of \(\nabla h_j(\bar x)\)’s. They only make the choice of direction stricter, since the direction has to be orthogonal to them.

KKT Conditions and Constraint Qualification

One observation is that if the constraint gradients are linearly independent (even though the Fritz John conditions force the whole collection, including \(\nabla f(\bar x)\), to be linearly dependent), then \(u\) cannot be zero: otherwise \(v_1, \dots, v_r, w_1, \dots, w_s\) would all have to be zero due to the linear independence, which violates the non-trivial solution condition. In other words, requiring linear independence of the constraint gradients alone forces \(u > 0\); normalizing \(u = 1\) then yields the KKT conditions below.

Theorem: Karush-Kuhn-Tucker conditions. Let \(\bar x\) be a local minimum of \(\eqref{primal}\). Let \[ I(\bar x) = \{ i: g_i(\bar x) = 0 \} \] be the index set of active inequality constraints. Suppose that \(\{ \nabla g_i(\bar x) \}_{i \in I(\bar x)} \cup \{ \nabla h_j(\bar x) \}_{j=1}^s\) are linearly independent (linear-independence constraint qualification, LICQ). Then, there exist \(v \in \R^r\) and \(w \in \R^s\) such that \[ \nabla f(\bar x) + \sum_{i=1}^r v_i \nabla g_i(\bar x) + \sum_{j=1}^s w_j \nabla h_j(\bar x) = 0 \\ v_i \ge 0, i=1,\dots,r \\ v_i g_i(\bar x) = 0, i=1,\dots,r \]
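
A small sketch of verifying the KKT conditions at a known minimizer (the equality-constrained quadratic below is an arbitrary illustration; LICQ clearly holds since \(\nabla h = (1, 1)^T \ne 0\)):

```python
import numpy as np

# Problem (arbitrary illustration): min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 1 = 0.
x_bar = np.array([0.5, 0.5])                  # candidate local minimum
grad_f = 2 * x_bar                            # = (1, 1)
grad_h = np.array([1.0, 1.0])

w = -1.0                                      # equality multiplier solving the KKT system
print(np.allclose(grad_f + w * grad_h, 0))    # stationarity of the Lagrangian
print(np.isclose(x_bar.sum() - 1, 0))         # primal feasibility
```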

Example: importance of CQ. Consider the problem \[ \begin{aligned} \inf \quad & f(x_1, x_2) = x_1 \\ \text{s.t.} \quad & g_1(x_1, x_2) = (x_1 - 1)^2 + (x_2 - 1)^2 - 1 \le 0 \\ & g_2(x_1, x_2) = (x_1 - 1)^2 + (x_2 + 1)^2 - 1 \le 0 \\ \end{aligned} \] The only feasible and thus optimal solution is \(\bar x = (1, 0)\). In this case, the gradients of the active inequality constraints are \[ \begin{bmatrix} 0 \\ -2 \end{bmatrix}, \begin{bmatrix} 0 \\ 2 \end{bmatrix} \] They are linearly dependent, so LICQ fails; and indeed the KKT conditions do not hold, since both gradients have zero first component while \(\nabla f(\bar x) = (1, 0)^T\) does not. This is also the reason why we intentionally separate equality constraints from inequality constraints. Although \(h(x) = 0 \iff h(x) \le 0 \land -h(x) \le 0\), if we write an equality constraint as two inequality constraints, the gradient of \(h(x)\) is always linearly dependent with that of \(-h(x)\), so LICQ could never hold.
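
One can confirm numerically that no KKT multipliers exist at \(\bar x = (1, 0)\) by posing stationarity as a nonnegative least-squares problem (a sketch using scipy.optimize.nnls; a positive residual means the KKT system is infeasible):

```python
import numpy as np
from scipy.optimize import nnls

# At x_bar = (1, 0) both constraints are active.
grad_f  = np.array([1.0, 0.0])
grad_g1 = np.array([0.0, -2.0])    # gradient of (x1-1)^2 + (x2-1)^2 - 1 at (1, 0)
grad_g2 = np.array([0.0,  2.0])    # gradient of (x1-1)^2 + (x2+1)^2 - 1 at (1, 0)

# KKT stationarity asks for v >= 0 with grad_f + v1*grad_g1 + v2*grad_g2 = 0,
# i.e. a nonnegative least-squares problem with zero residual.
A = np.column_stack([grad_g1, grad_g2])
v, residual = nnls(A, -grad_f)
print(v, residual)                 # residual = 1.0 > 0, so no KKT multipliers exist
```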

One takeaway is that, even in this convex optimization problem, the KKT conditions may fail to hold.

Another takeaway is that it is very convenient to "draw circles" when constructing counterexamples related to constraint qualification; better still, leave only one feasible point.

The problem with LICQ is that it is tedious to check at every candidate point \(\bar x\). We may prefer some "looser" constraint qualifications.

Theorem: Slater constraint qualification. Suppose that \(g_1, \dots, g_r\) are convex and \(h_1, \dots, h_s\) are affine. Let \(\bar x\) be a local minimum. Denote the feasible region as \(S\). Suppose that there exists \(x' \in S\) such that \(g_i(x') < 0\) for \(i=1,\dots,r\). Then, the KKT conditions are necessary for optimality.

This Slater condition quite resembles that in the conic Farkas’ lemma.

Theorem: Suppose that \(g_1, \dots, g_r\) are concave and \(h_1, \dots, h_s\) are affine. Then, the KKT conditions are necessary for optimality.

This theorem is especially useful when all the constraint functions are affine (since affine functions are both convex and concave).

Example: Let \(A \in \R^{m \times n}, b \in \R^m, c \in \R^n\) be given. Consider \[ \begin{aligned} \min \quad & c^T x \\ \text{s.t.} \quad & Ax = b \\ & x \ge 0 \end{aligned} \] Beware of the dimensions of the constraint functions when converting this LP into a nonlinear programming problem: \[ \begin{aligned} \min \quad & c^Tx \\ \text{s.t.} \quad & g_i(x) = -x_i = -e_i^T x \le 0, i=1,\dots,n \\ & h_j(x) = b_j - a_j^T x = 0, j=1,\dots,m \end{aligned} \] where \(a_j^T\) is the \(j\)-th row of \(A\). This is a linearly constrained problem; thus the KKT conditions are necessary for optimality: \[ \begin{gather} c - \sum_{i=1}^n v_i e_i - \sum_{j=1}^m w_j a_j = 0 \label{grad} \\ v_i \ge 0, i=1,\dots,n \label{dual} \\ v_i x_i = 0, i=1,\dots,n \label{compl} \end{gather} \] From \(\eqref{grad}\), \[ c - v - A^T w = 0 \] Given \(v \ge 0\) from \(\eqref{dual}\) and the complementarity from \(\eqref{compl}\), we have \(c \ge A^T w\) and \((c - A^T w)_i x_i = 0\). This essentially recovers the necessary and sufficient optimality conditions for LP (dual feasibility and complementary slackness). But the KKT theorem here only tells us the conditions are necessary.
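
Here is a minimal numeric check of the recovered LP optimality conditions on a tiny instance (the data, the primal solution, and the multipliers below are arbitrary illustrations chosen so they can be verified by hand):

```python
import numpy as np

# Tiny LP (arbitrary data): min [1, 2]^T x  s.t.  x1 + x2 = 1, x >= 0.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

x_bar = np.array([1.0, 0.0])       # primal optimal solution
w_bar = np.array([1.0])            # equality multipliers (the LP dual solution)
v_bar = c - A.T @ w_bar            # inequality multipliers from the stationarity equation

print(np.allclose(A @ x_bar, b) and np.all(x_bar >= 0))   # primal feasibility
print(np.all(v_bar >= 0))                                  # dual feasibility: c >= A^T w
print(np.allclose(v_bar * x_bar, 0))                       # complementary slackness
print(np.isclose(c @ x_bar, b @ w_bar))                    # equal objective values
```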

The question is: when are the KKT conditions sufficient?

Theorem: KKT sufficient conditions. Suppose that \(f, g_1, \dots, g_r\) are convex and \(h_1, \dots, h_s\) are affine. Suppose further that there exists \((\bar x, \bar v, \bar w)\) satisfying the KKT conditions: \[ \begin{gather} g_i(\bar x) \le 0, h_j(\bar x) = 0, i=1,\dots,r, j=1,\dots,s \tag{primal feasibility} \\ \tag{dual feasibility} \left. \begin{gathered} \nabla f(\bar x) + \sum_{i=1}^r \bar v_i \nabla g_i(\bar x) + \sum_{j=1}^s \bar w_j \nabla h_j(\bar x) = 0 \\ \bar v_i \ge 0, i=1,\dots,r \end{gathered} \right\} \\ \bar v_i g_i(\bar x) = 0, i=1,\dots,r \tag{complementary slackness} \end{gather} \] Then \(\bar x\) is a global minimum.
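
A sketch of the sufficiency statement on a small convex instance (the problem and the multiplier below are arbitrary illustrations): once the KKT conditions are verified at \(\bar x\), a brute-force sample of feasible points should never beat \(f(\bar x)\).

```python
import numpy as np

# Convex problem (arbitrary illustration): min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0.
f = lambda x: x[0]**2 + x[1]**2
g = lambda x: 1 - x[0] - x[1]

x_bar = np.array([0.5, 0.5])
v_bar = 1.0
grad_f = 2 * x_bar                                   # = (1, 1)
grad_g = np.array([-1.0, -1.0])

# The KKT conditions hold at (x_bar, v_bar) ...
assert np.allclose(grad_f + v_bar * grad_g, 0)       # stationarity
assert g(x_bar) <= 1e-12 and v_bar >= 0              # primal / dual feasibility
assert np.isclose(v_bar * g(x_bar), 0)               # complementary slackness

# ... so x_bar should be a global minimum; spot-check against random feasible points.
rng = np.random.default_rng(0)
samples = rng.uniform(-5, 5, size=(100_000, 2))
feasible = samples[1 - samples[:, 0] - samples[:, 1] <= 0]
print(np.all(feasible[:, 0]**2 + feasible[:, 1]**2 >= f(x_bar) - 1e-12))   # True
```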

Lagrangian Duality

For simplicity, we rewrite the constrained problem as follows: \[ \label{primalp} \tag{P} \begin{aligned} v_p^* = \inf \quad & f(x) \\ \text{s.t.} \quad & G(x) \le 0 \\ & H(x) = 0 \end{aligned} \] where \(G(x) = [g_1(x), \dots, g_r(x)]\) and \(H(x) = [h_1(x), \dots, h_s(x)]\).

Observe that \[ \eqref{primalp} \equiv \inf_{x \in \R^n} \sup_{v \in \R_+^r, w \in \R^s} \underbrace{f(x) + v^T G(x) + w^T H(x)}_{L(x, v, w)} \] The dual of \(\eqref{primalp}\) is then \[ \begin{equation} v_d^* = \sup_{v \in \R_+^r, w \in \R^s} \inf_{x \in \R^n} L(x, v, w) \label{dualp} \tag{D} \end{equation} \] Observe that \[ \underbrace{\inf_{x \in \R^n}L(x, \bar v, \bar w)}_{\theta(\bar v, \bar w)} \le L(\bar x, \bar v, \bar w) \le \underbrace{\sup_{v \in \R_+^r, w \in \R^s} L(\bar x, v, w)}_{\gamma(\bar x)} \] This implies that \[ v_d^* = \sup_{v \in \R_+^r, w \in \R^s} \theta(v, w) \le \inf_{x \in \R^n} \gamma(x) = v_p^* \]
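
A one-dimensional sketch of the dual function and weak duality (the instance \(\min x^2\) s.t. \(x \ge 1\) is an arbitrary illustration; here \(\theta(v) = v - v^2/4\) can be computed in closed form by minimizing the Lagrangian over \(x\)):

```python
import numpy as np

# Primal (arbitrary illustration): min x^2  s.t.  g(x) = 1 - x <= 0;  v_p* = 1 at x = 1.
# Lagrangian: L(x, v) = x^2 + v*(1 - x); for fixed v it is minimized at x = v/2,
# giving the dual function theta(v) = v - v^2 / 4.
theta = lambda v: v - v**2 / 4

v_grid = np.linspace(0, 10, 1001)
print(np.all(theta(v_grid) <= 1 + 1e-12))    # weak duality: theta(v) <= v_p* for all v >= 0
print(v_grid[np.argmax(theta(v_grid))])      # dual optimum at v = 2, where theta(2) = 1 = v_p*
```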

Theorem: weak duality. Let \(\bar x\) be feasible for \(\eqref{primalp}\) and \((\bar v, \bar w)\) be feasible for \(\eqref{dualp}\). Then, \[ f(\bar x) = \gamma(\bar x) \ge \theta(\bar v, \bar w) \]

Example: illustration of weak duality. Consider a simple case: \[ \begin{gathered} \begin{aligned} v_p^* = \inf \quad & f(x) \\ \text{s.t.} \quad & g(x) \le 0 \end{aligned} \\ \rule{6cm}{0.4pt} \\ \begin{aligned} v_d^* = \sup_{v \ge 0} & \inf_{x \in \R^n} [f(x) + v g(x)] \\ \end{aligned} \end{gathered} \] Let \(\mathcal{G} = \{ (y,z): y = g(x), z = f(x), x \in \R^n \}\). Then, \[ \begin{aligned} \theta(v) &= \inf_{x \in \R^n} [f(x) + v g(x)] \\ &= \inf_{(y, z) \in \mathcal{G}} [z + v y] \end{aligned} \] This set is closely related to the proof of strong duality under the KKT sufficient conditions (convexity) together with the Slater condition.
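
For the same one-dimensional instance as in the earlier sketch (\(f(x) = x^2\), \(g(x) = 1 - x\), an arbitrary illustration), \(\theta(v)\) can also be evaluated directly over a sampled approximation of the set \(\mathcal{G}\):

```python
import numpy as np

# Same instance as above: f(x) = x^2, g(x) = 1 - x.
xs = np.linspace(-10, 10, 20001)
G = np.column_stack([1 - xs, xs**2])          # sampled points (y, z) = (g(x), f(x)) of G

def theta(v):
    # theta(v) = inf_{(y, z) in G} [z + v*y], approximated over the sample
    return np.min(G[:, 1] + v * G[:, 0])

for v in [0.0, 1.0, 2.0, 4.0]:
    print(v, theta(v))                        # theta(2) ~ 1 attains the primal value v_p* = 1
```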
