Optimization Problem

Problem Formulation

The standard optimization problem takes the form \[ \inf_{x \in X} f(x) \] where \(X \subseteq \R^n\) is the feasible region (constraint set) and \(f: \R^n \mapsto \R\) is the objective function. The optimal value is denoted \(v^* = \inf_{x \in X} f(x)\).

Definition: We say that \(x^* \in X\) is an optimal solution to the problem if \(f(x^*) = v^*\). In this case, we also say that \(x^*\) attains the optimal value \(v^*\).

Definition: We say that \(x' \in X\) is a local minimizer if \(\exists \epsilon > 0\) such that \(\forall x \in X \cap B(x', \epsilon), f(x) \ge f(x')\).

Types of Problems

  1. Unconstrained: \(X = \R^n\)

  2. Discrete programming

    \(X\) is a discrete set, which means \(\forall x \in X, \exists \epsilon > 0, X \cap B(x, \epsilon) = \{ x \}\).

    Note: Every feasible solution to a discrete optimization problem is a local minimizer, since \(X \cap B(x, \epsilon) = \{ x \}\) for small enough \(\epsilon\).

  3. Linear programming

    \(X = \{ x \in \R^n | (a^i)^T x \le c_i, i=1,2,\dots,m \}\) is a set defined by a finite number of linear inequalities.

    \(f(x) = b_1 x_1 + b_2 x_2 + \dots + b_n x_n = b^T x\) is a linear function.

  4. Quadratic programming

    \(X\) is the same as that in linear programming.

    \(f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j = x^T A x\), where \(A = [a_{ij}] \in \R^{n \times n}\).

    Remark: This form does not include any linear term. Generally, a quadratic function takes on the form \(f(x) = x^T A x + b^T x\).

    Remark: We may assume \(A\) is symmetric: even if it is not, we can take \(A' = \frac{A + A^T}{2}\) and \[ x^T A x = x^T A^T x \implies x^T A x = \frac{1}{2}\left( x^T A x + x^T A^T x \right) = x^T A' x \]

    The first equality holds because \(x^T A x\) is a scalar, so it equals its own transpose \(x^T A^T x\).
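This identity is easy to verify numerically; a minimal sketch (the matrix size and sample points are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # a generic, non-symmetric matrix
A_sym = (A + A.T) / 2             # A' = (A + A^T)/2, its symmetric part

# Evaluate x^T A x and x^T A' x on a batch of random points
X = rng.standard_normal((100, 4))
quad_A = np.einsum("ij,jk,ik->i", X, A, X)        # x^T A x per row x
quad_sym = np.einsum("ij,jk,ik->i", X, A_sym, X)  # x^T A' x per row x
max_gap = np.max(np.abs(quad_A - quad_sym))       # ~0 up to rounding
```

The quadratic form sees only the symmetric part of \(A\); the antisymmetric part \(\frac{A - A^T}{2}\) contributes nothing.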

  5. Semi-definite programming

    Definition: Consider \(Q \in \mathcal{S}^{n}\), where \(\mathcal{S}^n\) is the set of \(n \times n\) symmetric matrices. The following are equivalent:

    1. \(Q\) is positive semi-definite (PSD for short, denoted \(Q \succcurlyeq 0\)).
    2. \(\forall x \in \R^n, x^T Q x \ge 0\).
    3. All eigenvalues of \(Q\) are non-negative.
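The equivalence of (2) and (3) can be exercised numerically; a small sketch (the two matrices are arbitrary illustrative examples):

```python
import numpy as np

def is_psd(Q, tol=1e-10):
    """Criterion 3: Q is PSD iff all eigenvalues are non-negative."""
    return bool(np.all(np.linalg.eigvalsh(Q) >= -tol))

Q1 = np.array([[2.0, -1.0], [-1.0, 2.0]])  # eigenvalues 1 and 3 -> PSD
Q2 = np.array([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1 and 3 -> not PSD

# Criterion 2 agrees: x^T Q1 x >= 0 for every x, while the witness
# x = (1, -1) gives x^T Q2 x = -2 < 0.
x = np.array([1.0, -1.0])
```

`eigvalsh` is the symmetric eigenvalue routine, appropriate here since \(Q \in \mathcal{S}^n\).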

    Let \(C, A_1, \dots, A_m \in \mathcal{S}^n\) and \(b \in \R^m\) be given; the semi-definite program is \[ \begin{aligned} \inf_{x \in \R^m} \quad & b^T x \\ \textrm{s.t.} \quad & C - \underbrace{\sum_{i=1}^m x_i A_i}_{M(x)} \succcurlyeq 0 \end{aligned} \]

    Remark: The constraint is called a linear matrix inequality. Observe that \(M: \R^m \mapsto \mathcal{S}^n\) satisfies \[ M(\alpha x + \beta y) = \alpha M(x) + \beta M(y), \] i.e., \(M\) is a linear map.

    Remark: Compare between linear programming and positive semi-definite programming:

    \[ \begin{gathered} \begin{aligned}[t] \inf_{x \in \R^n} \quad & b^T x \\ \textrm{s.t.} \quad & (a^i)^T x \le c_i, \\ & i = 1,\dots,m \end{aligned} \quad \quad \begin{aligned}[t] \inf_{x \in \R^n} \quad & b^T x \\ \textrm{s.t.} \quad & C - \sum_{j=1}^n x_j A_j \succcurlyeq 0 \end{aligned} \end{gathered} \]

    In linear programming, construct diagonal matrices, one \(A_j'\) for each variable index \(j = 1, \dots, n\):

    \[ C' = \begin{pmatrix} c_1 & & & \\ & c_2 & & \\ & & \ddots & \\ & & & c_m \end{pmatrix}, \quad A_j' = \begin{pmatrix} a_j^1 & & & \\ & a_j^2 & & \\ & & \ddots & \\ & & & a_j^m \end{pmatrix} \] Then \[ C' - \sum_{j=1}^n x_j A_j' = \begin{pmatrix} c_1 - \sum_{j=1}^n x_j a_j^1 & & & \\ & c_2 - \sum_{j=1}^n x_j a_j^2 & & \\ & & \ddots & \\ & & & c_m - \sum_{j=1}^n x_j a_j^m \end{pmatrix} \] so the linear matrix inequality \(C' - \sum_{j=1}^n x_j A_j' \succcurlyeq 0\) holds if and only if \((a^i)^T x \le c_i\) for \(i = 1, \dots, m\), because a diagonal matrix is PSD if and only if all of its diagonal entries are non-negative.

    In this sense, linear programming is a special case of semi-definite programming where \(C\) and the \(A_i\)'s are all diagonal.
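The diagonal embedding can be checked directly; a sketch with made-up LP data (two variables and the two constraints \(x_1 \le 1\), \(x_2 \le 1\)):

```python
import numpy as np

# LP data: (a^i)^T x <= c_i with a^1 = (1, 0), a^2 = (0, 1), c = (1, 1)
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])  # row i is a^i
c = np.array([1.0, 1.0])

# Diagonal embedding: C' = diag(c), A'_j = diag(a_j^1, ..., a_j^m)
C = np.diag(c)
A_diag = [np.diag(a[:, j]) for j in range(a.shape[1])]

def lmi_holds(x):
    """C' - sum_j x_j A'_j >= 0  iff  every LP constraint holds at x."""
    M = C - sum(xj * Aj for xj, Aj in zip(x, A_diag))
    return bool(np.all(np.linalg.eigvalsh(M) >= -1e-10))

feasible = lmi_holds([0.5, 0.5])    # inside the feasible box
infeasible = lmi_holds([2.0, 0.0])  # violates x_1 <= 1
```

The LMI check reduces to the sign of the diagonal entries \(c_i - (a^i)^T x\), matching the LP constraints exactly.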

Examples of Problems

  1. Air traffic control

    • \(n\) airplanes are arriving.
    • \(i\)-th airplane arrives within \([a_i, b_i]\).
    • Assume airplanes arrive and land in order.
    • Let \(t_i\) be the landing time assigned to the \(i\)-th airplane.
    • The metering time is defined to be the time difference between two consecutive airplane landings.

    The implicit constraints derived from the above conditions are \(a_i \le t_i \le b_i\) and \(t_i \le t_{i+1}\). For safety, we want the minimum metering time to be maximized. That is, \[ \begin{aligned} \max_{t} \quad & f(t) \triangleq \min_{1 \le i \le n-1} t_{i+1} - t_i \\ \textrm{s.t.} \quad & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] The constraints are all linear. The min operation in \(f\), however, is inconvenient. We can introduce a new variable \(z\) and convert the original problem to an equivalent one: \[ \begin{aligned} \max_{t, z} \quad & z \\ \textrm{s.t.} \quad & z = \min_{1 \le i \le n-1} t_{i+1} - t_i \\ & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] Further, since the objective maximizes \(z\) and \(z\)'s coefficient is positive, the equality can be relaxed to inequalities without changing the optimal value: \[ \begin{aligned} \max_{t, z} \quad & z \\ \textrm{s.t.} \quad & z \le t_{i+1} - t_i, i=1,\dots,n-1\\ & a_i \le t_i \le b_i, i=1,\dots,n \\ & t_i \le t_{i+1}, i=1,\dots,n-1 \end{aligned} \] Now the problem is a linear program and easy to solve.
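The final linear program can be handed to any LP solver; a minimal sketch using `scipy.optimize.linprog` with made-up arrival windows for three airplanes:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up arrival windows [a_i, b_i] for n = 3 airplanes
a = [0.0, 1.0, 3.0]
b = [2.0, 4.0, 6.0]
n = len(a)

# Variables: (t_1, ..., t_n, z); maximize z  <=>  minimize -z
cost = np.zeros(n + 1)
cost[-1] = -1.0

A_ub, b_ub = [], []
for i in range(n - 1):
    # z <= t_{i+1} - t_i   ->   t_i - t_{i+1} + z <= 0
    row = np.zeros(n + 1)
    row[i], row[i + 1], row[-1] = 1.0, -1.0, 1.0
    A_ub.append(row)
    b_ub.append(0.0)
    # ordering: t_i <= t_{i+1}
    row = np.zeros(n + 1)
    row[i], row[i + 1] = 1.0, -1.0
    A_ub.append(row)
    b_ub.append(0.0)

bounds = [(a[i], b[i]) for i in range(n)] + [(None, None)]
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
best_spacing = -res.fun  # maximized minimum metering time
```

For these windows the optimum is a spacing of 3, attained e.g. by \(t = (0, 3, 6)\).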

  2. Data fitting problem

    Given data points \((a_i, b_i) \in \R^n \times \R\), \(i = 1, \dots, m\), a typical choice is to fit an affine function \(f(x) = y^T x + t\). Besides the choice of fitting function, the choice of objective function matters too.

    • Least squares

      The objective minimizes the sum of the squares of the errors over all data points: \[ \begin{aligned} \min_{y \in \R^n, t \in \R} \quad & \sum_{i} (y^T a_i + t - b_i)^2 \\ \end{aligned} \] This is an (unconstrained) quadratic programming problem.

    • Minimum absolute deviation \[ \begin{aligned} \min_{y \in \R^n, t \in \R} \quad & \sum_{i} |y^T a_i + t - b_i| \\ \end{aligned} \] Using the same trick of introducing a new variable, the problem is equivalent to \[ \begin{aligned} \min_{y \in \R^n, t \in \R, z \in \R^m} \quad & \sum_{i=1}^m z_i \\ \textrm{s.t.} \quad & z_i = |y^T a_i + t - b_i|, i=1,\dots,m \\ \end{aligned} \] Further, the trick of relaxation applies: \[ \begin{aligned} \min_{y \in \R^n, t \in \R, z \in \R^m} \quad & \sum_{i=1}^m z_i \\ \textrm{s.t.} \quad & y^T a_i + t - b_i \le z_i, i=1,\dots,m \\ & y^T a_i + t - b_i \ge -z_i, i=1,\dots,m \\ \end{aligned} \] This becomes a linear programming problem.
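The relaxed absolute-deviation problem can likewise be solved as an LP; a sketch using `scipy.optimize.linprog` on made-up one-dimensional data containing an outlier:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up data (a_i, b_i) with an outlier at a = 3
a_pts = np.array([0.0, 1.0, 2.0, 3.0])
b_pts = np.array([0.0, 1.0, 2.0, 10.0])
m = len(a_pts)

# Variables: (y, t, z_1, ..., z_m); minimize sum_i z_i
cost = np.concatenate([[0.0, 0.0], np.ones(m)])

A_ub, b_ub = [], []
for i in range(m):
    # y*a_i + t - b_i <= z_i   ->   a_i*y + t - z_i <= b_i
    row = np.zeros(m + 2)
    row[0], row[1], row[2 + i] = a_pts[i], 1.0, -1.0
    A_ub.append(row)
    b_ub.append(b_pts[i])
    # y*a_i + t - b_i >= -z_i  ->  -a_i*y - t - z_i <= -b_i
    row = np.zeros(m + 2)
    row[0], row[1], row[2 + i] = -a_pts[i], -1.0, -1.0
    A_ub.append(row)
    b_ub.append(-b_pts[i])

bounds = [(None, None)] * 2 + [(0.0, None)] * m
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
total_deviation = res.fun  # sum of absolute residuals at the optimum
```

On this data the line \(y = 1, t = 0\) fits the first three points exactly and leaves total deviation 7 at the outlier, which is optimal; least squares, by contrast, would be pulled toward the outlier.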

The key takeaway from these examples is the problem-equivalence trick, which includes introducing new variables and relaxing \(\max\) or \(\min\) operations to inequalities.
