Optimization Problem

Problem Formulation

The standard optimization problem takes the form \[ \inf_{x \in X} f(x) \] where \(X \subseteq \R^n\) is the feasible region (constraint set) and \(f: \R^n \mapsto \R\) is the objective function. The optimal value is denoted \(v^* = \inf_{x \in X} f(x)\).

Definition: We say that \(x^* \in X\) is an optimal solution to the problem if \(f(x^*) = v^*\). In this case, we also say that \(x^*\) attains the optimal value \(v^*\).

Definition: We say that \(x' \in X\) is a local minimizer if \(\exists \epsilon > 0\) such that \(\forall x \in X \cap B(x', \epsilon), f(x) \ge f(x')\).

Types of Problems

  1. Unconstrained: \(X = \R^n\)

  2. Discrete programming

    \(X\) is a discrete set, which means \(\forall x \in X, \exists \epsilon > 0, X \cap B(x, \epsilon) = \{ x \}\).

    Note: Every feasible solution to a discrete optimization problem is a local minimizer, since \(X \cap B(x, \epsilon) = \{ x \}\) for small enough \(\epsilon\).

  3. Linear programming

    \(X = \{ x \in \R^n | (a^i)^T x \le c_i, i=1,2,\dots,m \}\) is a set defined by a finite number of linear inequalities.

    \(f(x) = b_1 x_1 + b_2 x_2 + \dots + b_n x_n = b^T x\) is a linear function.

  4. Quadratic programming

    \(X\) is the same as that in linear programming.

    \(f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = x^T A x\), where \(A = [a_{ij}] \in \R^{n \times n}\).

    Remark: This form does not include any linear term. Generally, a quadratic function takes on the form \(f(x) = x^T A x + b^T x\).

    Remark: We may assume \(A\) is symmetric: even if it is not, we can take \(A' = \frac{A + A^T}{2}\) and \[ x^T A x = x^T A^T x \implies x^T A x = \frac{1}{2}\left( x^T A x + x^T A^T x \right) = x^T A' x \]

    The first equality holds because \(x^T A x\) is a scalar, so it equals its own transpose \(x^T A^T x\).
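This identity is easy to verify numerically; a minimal sketch (the matrix size and sample points are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a generic, non-symmetric matrix
A_sym = (A + A.T) / 2             # A' = (A + A^T)/2, its symmetric part

# Evaluate x^T A x and x^T A' x on a batch of random points
X = rng.standard_normal((100, 4))
quad_A = np.einsum("ij,jk,ik->i", X, A, X)        # x^T A x per row x
quad_sym = np.einsum("ij,jk,ik->i", X, A_sym, X)  # x^T A' x per row x
max_gap = np.max(np.abs(quad_A - quad_sym))       # ~0 up to rounding
```

The quadratic form sees only the symmetric part of \(A\); the antisymmetric part \(\frac{A - A^T}{2}\) contributes nothing.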

  5. Semi-definite programming

    Definition: Consider \(Q \in \mathcal{S}^{n}\), where \(\mathcal{S}^n\) is the set of \(n \times n\) symmetric matrices. The following are equivalent:

    1. \(Q\) is positive semi-definite (PSD for short, denoted \(Q \succcurlyeq 0\)).
    2. \(\forall x \in \R^n, x^T Q x \ge 0\).
    3. All eigenvalues of \(Q\) are non-negative.
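The equivalence of (2) and (3) can be exercised numerically; a small sketch (the two matrices are arbitrary illustrative examples):

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    """Criterion 3: Q is PSD iff all eigenvalues are non-negative."""
    return bool(np.all(np.linalg.eigvalsh(Q) >= -tol))

Q1 = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3 -> PSD
Q2 = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3 -> not PSD

# Criterion 2 agrees: x^T Q1 x >= 0 for every x, while the witness
# x = (1, -1) gives x^T Q2 x = -2 < 0.
x = np.array([1.0, -1.0])
```

`eigvalsh` is the symmetric eigenvalue routine, appropriate here since \(Q \in \mathcal{S}^n\).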

    Let \(C, A_1, \dots, A_m \in \mathcal{S}^n\) and \(b \in \R^m\) be given; the semi-definite program is \[ \begin{aligned} \inf_{x \in \R^m} \quad & b^T x \\ \textrm{s.t.} \quad & C - \underbrace{\sum_{i=1}^m x_i A_i}_{M(x)} \succcurlyeq 0 \end{aligned} \]

    Remark: The constraint is called a linear matrix inequality. Observe that \(M: \R^m \mapsto \mathcal{S}^n\) satisfies \[ M(\alpha x + \beta y) = \alpha M(x) + \beta M(y), \] i.e., \(M\) is a linear map.

    Remark: Compare between linear programming and positive semi-definite programming:

    \[ \begin{gathered} \begin{aligned}[t] \inf_{x \in \R^n} \quad & b^T x \\ \textrm{s.t.} \quad & (a^i)^T x \le c_i, \\ & i = 1,\dots,m \end{aligned} \quad \quad \begin{aligned}[t] \inf_{x \in \R^n} \quad & b^T x \\ \textrm{s.t.} \quad & C - \sum_{j=1}^n x_j A_j \succcurlyeq 0 \end{aligned} \end{gathered} \]

    In linear programming, construct diagonal matrices, one \(A_j'\) for each variable index \(j = 1, \dots, n\):

    \[ C' = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_m \end{pmatrix}, \quad A_j' = \begin{pmatrix} a_j^1 & & & \\ & a_j^2 & & \\ & & \ddots & \\ & & & a_j^m \end{pmatrix} \] Then \[ C' - \sum_{j=1}^n x_j A_j' = \begin{pmatrix} c_1 - \sum_{j=1}^n x_j a_j^1 & & & \\ & c_2 - \sum_{j=1}^n x_j a_j^2 & & \\ & & \ddots & \\ & & & c_m - \sum_{j=1}^n x_j a_j^m \end{pmatrix} \] so the linear matrix inequality \(C' - \sum_{j=1}^n x_j A_j' \succcurlyeq 0\) holds if and only if \((a^i)^T x \le c_i\) for \(i = 1, \dots, m\), because a diagonal matrix is PSD if and only if all of its diagonal entries are non-negative.

    In this sense, linear programming is a special case of semi-definite programming where \(C\) and the \(A_i\)'s are all diagonal.
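The diagonal embedding can be checked directly; a sketch with made-up LP data (two variables and the two constraints \(x_1 \le 1\), \(x_2 \le 1\)):

```python
import numpy as np

# LP data: (a^i)^T x <= c_i with a^1 = (1, 0), a^2 = (0, 1), c = (1, 1)
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])  # row i is a^i
c = np.array([1.0, 1.0])

# Diagonal embedding: C' = diag(c), A'_j = diag(a_j^1, ..., a_j^m)
C = np.diag(c)
A_diag = [np.diag(a[:, j]) for j in range(a.shape[1])]

def lmi_holds(x):
    """C' - sum_j x_j A'_j >= 0  iff  every LP constraint holds at x."""
    M = C - sum(xj * Aj for xj, Aj in zip(x, A_diag))
    return bool(np.all(np.linalg.eigvalsh(M) >= -1e-10))

feasible = lmi_holds([0.5, 0.5])    # inside the feasible box
infeasible = lmi_holds([2.0, 0.0])  # violates x_1 <= 1
```

The LMI check reduces to the sign of the diagonal entries \(c_i - (a^i)^T x\), matching the LP constraints exactly.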

Examples of Problems

  1. Air traffic control

    • \(n\) airplanes are arriving.
    • \(i\)-th airplane arrives within \([a_i, b_i]\).
    • Assume airplanes arrive and land in order.
    • Let \(t_i\) be the landing time assigned to the \(i\)-th airplane.
    • The metering time is defined to be the time difference between two consecutive airplane landings.

    The implicit constraints derived from the above conditions are \(a_i \le t_i \le b_i\) and \(t_i \le t_{i+1}\). For safety, we want the minimum metering time to be maximized. That is, \[ \begin{aligned} \max_{t} \quad & f(t) \triangleq \min_{1 \le i \le n-1} t_{i+1} - t_i \\ \textrm{s.t.} \quad & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] The constraints are all linear. The min operation in \(f\), however, is inconvenient. We can introduce a new variable \(z\) and convert the original problem to an equivalent one: \[ \begin{aligned} \max_{t, z} \quad & z \\ \textrm{s.t.} \quad & z = \min_{1 \le i \le n-1} t_{i+1} - t_i \\ & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] Further, since the objective maximizes \(z\) and \(z\)'s coefficient is positive, the equality can be relaxed to inequalities without changing the optimal value: \[ \begin{aligned} \max_{t, z} \quad & z \\ \textrm{s.t.} \quad & z \le t_{i+1} - t_i, i=1,\dots,n-1\\ & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] Now the problem is a linear program and easy to solve.
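The final linear program can be handed to any LP solver; a minimal sketch using `scipy.optimize.linprog` with made-up arrival windows for three airplanes:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up arrival windows [a_i, b_i] for n = 3 airplanes
a = [0.0, 1.0, 3.0]
b = [2.0, 4.0, 6.0]
n = len(a)

# Variables: (t_1, ..., t_n, z); maximize z  <=>  minimize -z
cost = np.zeros(n + 1)
cost[-1] = -1.0

A_ub, b_ub = [], []
for i in range(n - 1):
    # z <= t_{i+1} - t_i   ->   t_i - t_{i+1} + z <= 0
    row = np.zeros(n + 1)
    row[i], row[i + 1], row[-1] = 1.0, -1.0, 1.0
    A_ub.append(row)
    b_ub.append(0.0)
    # ordering: t_i <= t_{i+1}
    row = np.zeros(n + 1)
    row[i], row[i + 1] = 1.0, -1.0
    A_ub.append(row)
    b_ub.append(0.0)

bounds = [(a[i], b[i]) for i in range(n)] + [(None, None)]
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
best_spacing = -res.fun  # maximized minimum metering time
```

For these windows the optimum is a spacing of 3, attained e.g. by \(t = (0, 3, 6)\).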

  2. Data fitting problem

    Given data points \((a_i, b_i) \in \R^n \times \R\), \(i = 1, \dots, m\), a typical choice is to fit an affine function \(f(x) = y^T x + t\). Besides the choice of fitting function, the choice of objective function matters too.

    • Least squares

      The objective minimizes the sum of the squares of the errors over all data points: \[ \begin{aligned} \min_{y \in \R^n, t \in \R} \quad & \sum_{i} (y^T a_i + t - b_i)^2 \\ \end{aligned} \] This is an (unconstrained) quadratic programming problem.

    • Minimum absolute deviation \[ \begin{aligned} \min_{y \in \R^n, t \in \R} \quad & \sum_{i} |y^T a_i + t - b_i| \\ \end{aligned} \] Using the same trick of introducing a new variable, the problem is equivalent to \[ \begin{aligned} \min_{y \in \R^n, t \in \R, z \in \R^m} \quad & \sum_{i=1}^m z_i \\ \textrm{s.t.} \quad & z_i = |y^T a_i + t - b_i|, i=1,\dots,m \\ \end{aligned} \] Further, the trick of relaxation applies: \[ \begin{aligned} \min_{y \in \R^n, t \in \R, z \in \R^m} \quad & \sum_{i=1}^m z_i \\ \textrm{s.t.} \quad & y^T a_i + t - b_i \le z_i, i=1,\dots,m \\ & y^T a_i + t - b_i \ge -z_i, i=1,\dots,m \\ \end{aligned} \] This becomes a linear programming problem.
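The relaxed absolute-deviation problem can likewise be solved as an LP; a sketch using `scipy.optimize.linprog` on made-up one-dimensional data containing an outlier:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data (a_i, b_i) with an outlier at a = 3
a_pts = np.array([0.0, 1.0, 2.0, 3.0])
b_pts = np.array([0.0, 1.0, 2.0, 10.0])
m = len(a_pts)

# Variables: (y, t, z_1, ..., z_m); minimize sum_i z_i
cost = np.concatenate([[0.0, 0.0], np.ones(m)])

A_ub, b_ub = [], []
for i in range(m):
    # y*a_i + t - b_i <= z_i   ->   a_i*y + t - z_i <= b_i
    row = np.zeros(m + 2)
    row[0], row[1], row[2 + i] = a_pts[i], 1.0, -1.0
    A_ub.append(row)
    b_ub.append(b_pts[i])
    # y*a_i + t - b_i >= -z_i  ->  -a_i*y - t - z_i <= -b_i
    row = np.zeros(m + 2)
    row[0], row[1], row[2 + i] = -a_pts[i], -1.0, -1.0
    A_ub.append(row)
    b_ub.append(-b_pts[i])

bounds = [(None, None)] * 2 + [(0.0, None)] * m
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
total_deviation = res.fun  # sum of absolute residuals at the optimum
```

On this data the line \(y = 1, t = 0\) fits the first three points exactly and leaves total deviation 7 at the outlier, which is optimal; least squares, by contrast, would be pulled toward the outlier.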

The key takeaway from these examples is the problem-equivalence trick, which includes introducing new variables and relaxing \(\max\) or \(\min\) operations to inequalities.
