Lagrange Multiplier
Standard Form
The standard form of the Lagrange multiplier optimization problem is \[ \begin{aligned} &\inf f_0(x) \\ s.t.\quad & f_i(x) \le 0, i = 1,\dots,r \\ & h_i(x) = 0, i = 1,\dots,s \\ \end{aligned} \] Denote the feasible set of \(x\) that satisfies all the requirements in the above problem as \(\mathcal X \subseteq \R^n\). Then the Lagrangian function is \(L:\R^n \times \R^r \times \R^s \mapsto \R\) (there is an implicit constraint that the variables must reside in the natural domain of the functions):
\[ L(x, \lambda, \mu) = f_0(x) + \sum_{i=1}^r\lambda_if_i(x) + \sum_{i=1}^s\mu_ih_i(x) \] Define the function \(P: \mathcal X \mapsto \R\) as \[ P(x) = \sup\limits_{\lambda,\mu;\lambda_i \ge 0}L(x,\lambda,\mu) \] The primal problem is \[ \begin{equation} \inf_{x \in \mathcal X} P(x) \label{primal} \end{equation} \] It is easy to have \(P(x) =f_0(x)\) on \(\mathcal X\). Thus the primal problem is equivalent to the original problem. Denote primal problem \(\eqref{primal}\)’s optimal value as \(p^\star\).
Define the dual function \(D: \R^r \times \R^s \mapsto \R\) as \[ D(\lambda, \mu) = \inf_{x \in \mathcal \R^n}L(x, \lambda, \mu) \] The Lagrangian dual problem is \[ \begin{equation} \sup_{\lambda,\mu;\lambda_i \ge 0} D(\lambda, \mu) \label{dual} \end{equation} \] Denote the dual problem \(\eqref{dual}\)’s optimal value as \(d^\star\). We always have \(p^\star \ge d^\star\).
Proof
For any feasible \(x \in \mathcal X, (\lambda, \mu) \in {\R^r}^+ \times \R^s\), \[ P(x) = \sup\limits_{\lambda',\mu';\lambda'_i \ge 0}L(x,\lambda',\mu') \ge L(x,\lambda,\mu) \ge \inf_{x' \in \R^n}L(x', \lambda, \mu) =D (\lambda, \mu) \] Therefore \(p^\star \ge d^\star\).
Strong Duality
\(p^\star \ge d^\star\) is called weak duality because it always holds. \(p^\star = d^\star\) is called strong duality because it does not hold in general. Assume, though, a strong duality holds, let \(x^\star\) be the primal optima, \((\lambda^\star, \mu^\star)\) be the dual optima, then
\[ f_0(x^\star) = P(x^\star) = D(\lambda^\star, \mu^\star) \\ \]
Proof \[ \begin{gathered} f_0(x^\star) = p^\star = d^\star = D(\lambda^\star, \mu^\star) = \inf_{x \in \R^n}L(x,\lambda^\star,\mu^\star) \le L(x^\star,\lambda^\star,\mu^\star) \\ f_0(x^\star) = p^\star = P(x^\star) = \sup\limits_{\lambda,\mu;\lambda_i \ge 0}L(x^\star,\lambda,\mu) \ge L(x^\star,\lambda^\star,\mu^\star) \\ L(x^\star,\lambda^\star,\mu^\star) \le f_0(x^\star) \le L(x^\star,\lambda^\star,\mu^\star) \end{gathered} \]
Therefore, \[ f_0(x^\star) = P(x^\star) = D(\lambda^\star, \mu^\star) = L(x^\star,\lambda^\star,\mu^\star) \]
If the strong duality holds and \((x^\star,\lambda^\star,\mu^\star)\) is the optima, they must satisfy the KKT conditions, and we can leverage the KKT conditions to solve the optima and optimal value:
Karush-Kuhn-Tucker Conditions
In standard optimization problem, KKT conditions refer to the below four:
primal constraint: \(f_1(x) \dots,f_r(x) \le 0, h_1(x),\dots,h_s(x) = 0\);
dual constraint: \(\lambda_1,\dots,\lambda_r \ge 0\);
complementary slackness: \(\lambda_1f_1(x),\dots,\lambda_rf_r(x) = 0\);
vanishing gradient of Lagrangian w.r.t. \(x\) : \[ \nabla f_0(x) + \sum_{i=1}^r\lambda_i\nabla f_i(x) + \sum_{i=1}^s\mu_i\nabla h_i(x) = 0 \]
Note that solutions satisfying KKT conditions do not imply a strong duality or an optimal point. For a better discussion between strong duality and KKT conditions, please go to this discussion.
When-to-apply
Slater Condition
Strong duality does not hold generally. But it does hold in standard convex optimization problem. In such case, KKT conditions are also sufficient for strong duality provided that \(x^\star\) is an interior point of the feasible region.
General Case
In cases where we cannot tell strong duality directly, we may still try to apply Lagrangian multiplier to convert the primal problem to the less-constrained dual problem (\(\lambda \ge 0\) is much looser than the constraints in the original problem). That is, we solve \(d^\star\) first, and then check if \(d^\star = p^\star\). We do so with the following process:
- firstly take derivative of \(L\) w.r.t. the unconstrained \(x\) and make it zero (vanishing gradient) to obtain the closed-form expression of \(D(\lambda, \mu)\),
- find \(\lambda^\star \ge 0\) (dual constraint) and \(\mu^\star\) that maximizes the \(D(\lambda, \mu)\) and solve \(x^\star\) w.r.t. \(\lambda^\star\) and \(\mu^\star\),
- finally verify that \(x^\star\) satisfies the constraints (primal constraint and the implied complementary slackness) in the original problem and \(f_0(x^\star) = d^\star\) (strong duality).
If we can successfully go through the above process, we can still solve the problem with Lagrangian multiplier.