
Convex Function

Definition: Let \(f: \R^n \mapsto \R_+\) be an extended real-valued function, where \(\R_+ \triangleq \R \cup \{ +\infty \}\) (in convex analysis, usually only \(+\infty\) is adjoined, not \(-\infty\)). We also require that \(f\) is not identically \(+\infty\) (such an \(f\) is called proper). We say that \(f\) is convex if \(\forall x_1, x_2 \in \R^n, \alpha \in [0,1]\), \[ f(\alpha x_1 + (1-\alpha) x_2) \le \alpha f(x_1) + (1-\alpha) f(x_2) \] Here we use the conventions that for \(x \in \R\): \(x < +\infty\), \(x + \infty = \infty + x = \infty\), and \(+\infty \le +\infty\).
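
A quick worked example against the definition: \(f(x) = |x|\) on \(\R\) is convex, since for any \(x_1, x_2 \in \R\) and \(\alpha \in [0,1]\), \[ |\alpha x_1 + (1-\alpha) x_2| \le |\alpha x_1| + |(1-\alpha) x_2| = \alpha |x_1| + (1-\alpha) |x_2| \] by the triangle inequality.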

Interestingly, a convex function is always continuous on the interior of its effective domain; in particular, a finite-valued convex function on all of \(\R^n\) is continuous.

Definition: The epigraph of a function \(f: \R^n \mapsto \R\) is the set \(\epi f \triangleq \{ (x, t) \in \R^n \times \R: f(x) \le t \}\).

Definition: The effective domain of \(f\) is the set \(\dom f \triangleq \{ x \in \R^n: f(x) < \infty \}\).

Note that the real line does not contain \(+\infty\), so a real-valued function never takes the value \(+\infty\). For instance, the effective domain of \(f(x) = x\) is all of \(\R\).

Definition: Let \(S \subseteq \R^n\) be a set. The indicator of \(S\) is the function \[ \mathbb 1_S (x) = \begin{cases} 0, & x \in S \\ +\infty, & \text{otherwise} \end{cases} \]

Using the indicator, we have \[ \underset{\text{constrained}}{\inf_{x \in S} f(x)} \iff \underset{\text{unconstrained}}{\inf_{x \in \R^n} f(x) + \mathbb 1_S(x)} \] That is, we convert a constrained problem into an unconstrained one.
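
As a small numerical illustration (a sketch only; the objective \((x-3)^2\), the set \(S = [1,2]\), and the grid are arbitrary choices, not from the notes), the constrained and the indicator-penalized problems give the same infimum:

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2                  # objective to minimize
in_S = lambda x: (1.0 <= x) & (x <= 2.0)      # membership in S = [1, 2]

xs = np.linspace(-5.0, 5.0, 10001)            # a fine grid of candidate points

# Constrained problem: inf_{x in S} f(x)
constrained = f(xs[in_S(xs)]).min()

# Unconstrained problem: inf_x f(x) + indicator_S(x)
indicator = np.where(in_S(xs), 0.0, np.inf)
unconstrained = (f(xs) + indicator).min()

print(constrained, unconstrained)             # both equal 1.0, attained at x = 2
```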

Proposition (verify it): Let \(f: \R^n \mapsto \R_+\). Then, \(f\) is convex (as a function) if and only if \(\epi f\) is convex (as a set). Moreover, let \(S \subseteq \R^n\) be a set. Then \(S\) is convex (as a set) if and only if \(\mathbb 1_S\) is convex (as a function).

The proposition above associates the convexity of a function with that of its epigraph; and the convexity of a set with that of its indicator.

Corollary (Jensen’s inequality): Let \(f: \R^n \mapsto \R_+\). Then \(f\) is convex if and only if \(f(\sum_{i=1}^m \alpha_i x_i) \le \sum_{i=1}^m \alpha_i f(x_i)\) for any \(m \in \N^+\), any \(x_1, \dots, x_m \in \R^n\), and any \(\alpha_1, \dots, \alpha_m \ge 0\) such that \(\sum_{i=1}^m \alpha_i = 1\).

Proof:

  • Sufficiency

    Sufficiency is immediate: taking \(m = 2\) recovers exactly the defining inequality of a convex function.

  • Necessity

    If \(f(x_i) = +\infty\) for some \(i\), the right-hand side is \(+\infty\) and there is nothing to prove, so assume every \(f(x_i)\) is finite. Then, for \(i=1,\dots,m\), we have \[ \big( x_i, f(x_i) \big) \in \epi f \] Since \(\epi f\) is convex (and a convex set contains all finite convex combinations of its points), \[ \sum_{i=1}^m \alpha_i \big( x_i, f(x_i) \big) = \big( \sum_{i=1}^m \alpha_i x_i, \sum_{i=1}^m \alpha_i f(x_i) \big) \in \epi f \] which means \[ f(\sum_{i=1}^m \alpha_i x_i) \le \sum_{i=1}^m \alpha_i f(x_i) \]
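
A numerical sanity check of Jensen’s inequality (a sketch; the convex function \(f(x) = ||x||_2^2\) and the random weights are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2)                     # a convex function on R^n

n, m = 4, 7
xs = rng.normal(size=(m, n))                     # m points in R^n
alphas = rng.random(m)
alphas /= alphas.sum()                           # non-negative weights summing to 1

lhs = f(alphas @ xs)                             # f(sum_i alpha_i x_i)
rhs = sum(a * f(x) for a, x in zip(alphas, xs))  # sum_i alpha_i f(x_i)
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```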

Convexity-preserving Operations

  1. Non-negative combination

    Let \(f_1, \dots, f_m\) be convex functions, \(\alpha_1, \dots, \alpha_m \ge 0\) be non-negative scalars. Then, \[ f \triangleq \sum_{i=1}^m \alpha_i f_i \text{ is convex.} \]

  2. Pointwise supremum

    Let \(I\) be an index set (either finite or infinite) and \(\{ f_i: i \in I \}\) be a collection of convex functions. Then \[ f \triangleq \sup_{i \in I} f_i \text{ is convex.} \] To show it, let \(x_1, x_2 \in \R^n\) and \(\alpha \in [0, 1]\). Then, \[ \begin{aligned} &f(\alpha x_1 + (1-\alpha) x_2) = \sup_{i \in I} f_i(\alpha x_1 + (1-\alpha) x_2) \\ &\le \sup_{i \in I} \big( \alpha f_i(x_1) + (1-\alpha) f_i(x_2) \big) \\ &\le [\alpha \sup_{i \in I} f_i(x_1)] + [(1-\alpha) \sup_{i \in I} f_i(x_2)] \\ &= \alpha f(x_1) + (1-\alpha) f(x_2) \end{aligned} \]

    Geometrically, taking a pointwise supremum corresponds to intersecting the epigraphs of the \(f_i\)’s.

    > Example: Consider the function \(f: \R^{m \times n} \mapsto \R\) given by \(f(X) = ||X||\), where \(||X||\) is the largest singular value of \(X\). Show that \(f\) is convex.
    >
    > By the variational characterization of the largest singular value (cf. the Courant-Fischer theorem),
    > \[
    > \begin{aligned}
    > ||X|| &= \max_{u \in \R^m, v \in \R^n} u^T X v \\
    > \text{s.t.} &\quad ||u||_2 = 1, ||v||_2 = 1
    > \end{aligned}
    > \]
    > Let \(f_{u,v}(X) = u^T X v\) and \(I = \{ (u,v) \in \R^m \times \R^n: ||u||_2 = 1, ||v||_2 = 1 \}\). Then,
    > \[
    > f(X) = \max_{(u,v) \in I} f_{u,v}(X)
    > \]
    > Each \(f_{u,v}(X)\) is linear and thus convex in \(X\). It follows directly from the pointwise-supremum principle that \(f\) is convex.

  3. Composition with increasing function

    Let \(g: \R^n \mapsto \R\) be convex and \(h: \R \mapsto \R\) be convex. In general, \(h \circ g\) is not convex: take \(h(x) = -x\) and \(g(x) = x^2\), so that \((h \circ g)(x) = -x^2\), which is concave. But if \(h\) is convex and also increasing, then \(h \circ g\) is convex, since \(h(g(\alpha x_1 + (1-\alpha) x_2)) \le h(\alpha g(x_1) + (1-\alpha) g(x_2)) \le \alpha h(g(x_1)) + (1-\alpha) h(g(x_2))\), where the first inequality uses the convexity of \(g\) together with the monotonicity of \(h\), and the second uses the convexity of \(h\).

  4. Restriction on lines

    Given a point \(x_0 \in \R^n\) and a direction \(h \in \R^n \setminus \{ 0 \}\), we call the set \[ \{ x_0 + t h: t \in \R \} \] a line through \(x_0\) in the direction \(h\). Let \(f: \R^n \mapsto \R\) be a function. Define

    \[ \tilde f_{x_0, h}(t) \triangleq f(x_0 + t h) \] as the restriction of \(f\) on the line \(\{ x_0 + t h: t \in \R \}\).

Then, \(f\) is convex if and only if \(\tilde f_{x_0, h}\) is convex for any \(x_0 \in \R^n\) and any \(h \in \R^n \setminus \{ 0 \}\).
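
As a numerical sketch of this criterion (the convex quadratic, the base point, and the direction below are arbitrary choices, not from the notes), one can restrict a convex function to a random line and check midpoint convexity of the resulting one-dimensional function:

```python
import numpy as np

rng = np.random.default_rng(0)

B = rng.normal(size=(3, 3))
A = B @ B.T                        # positive semidefinite, so f(x) = x^T A x is convex
f = lambda x: x @ A @ x

x0 = rng.normal(size=3)            # base point x_0
h = rng.normal(size=3)             # direction h

g = lambda t: f(x0 + t * h)        # restriction of f to the line {x_0 + t h}

# Midpoint-convexity check of the restriction on random pairs of parameters.
for _ in range(1000):
    t1, t2 = rng.uniform(-10.0, 10.0, size=2)
    assert g(0.5 * (t1 + t2)) <= 0.5 * g(t1) + 0.5 * g(t2) + 1e-8
print("the restriction is (numerically) convex")
```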

Differentiable Convex Function

Theorem: Let \(f: \R^n \mapsto \R\) be differentiable, i.e. \(\nabla f\) exists. Then, \(f\) is convex if and only if for every \(x, y \in \R^n\), \[ f(x) \ge f(y) + (\nabla f(y))^T (x - y) \tag{gradient inequality} \]

For a fixed \(y\), the right-hand side of the gradient inequality is an affine function of \(x\). Its graph is the hyperplane \[ t = f(y) + (\nabla f(y))^T (x - y) \\ \Updownarrow \\ \underbrace{[(\nabla f(y))^T, -1]}_{s^T} \underbrace{\begin{bmatrix} x \\ t \end{bmatrix}}_{z} - \underbrace{[(\nabla f(y))^T y - f(y)]}_{c} = 0 \]

Interestingly, the normal vector \(s\) always points downwards (due to the \(-1\)).
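
A hedged numerical check of the gradient inequality (a sketch; the function log-sum-exp and the sampled points are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.log(np.sum(np.exp(x)))           # log-sum-exp, a smooth convex function
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))  # its gradient (the softmax of x)

for _ in range(1000):
    x, y = rng.normal(size=(2, 5))
    assert f(x) >= f(y) + grad_f(y) @ (x - y) - 1e-12
print("gradient inequality holds at the sampled points")
```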

Theorem: Let \(f\) be twice continuously differentiable, i.e. \(\nabla f, \nabla^2 f\) exist and \(\nabla^2 f\) is continuous. Then, \(f\) is convex if and only if \[ \nabla^2 f(x) \succcurlyeq 0, \forall x \in \R^n \]
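
Continuing with log-sum-exp as the test function (again a sketch with arbitrary sample points; the closed form \(\nabla^2 f(x) = \mathrm{diag}(p) - p p^T\) with \(p = \mathrm{softmax}(x)\) is a standard fact), one can also verify the second-order condition numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def hess_logsumexp(x):
    p = np.exp(x - x.max())
    p /= p.sum()                              # softmax, computed stably
    return np.diag(p) - np.outer(p, p)        # Hessian of log-sum-exp at x

for _ in range(100):
    x = rng.normal(size=6)
    eigvals = np.linalg.eigvalsh(hess_logsumexp(x))
    assert eigvals.min() >= -1e-12            # positive semidefinite up to round-off
print("Hessian is positive semidefinite at the sampled points")
```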

Example: Let \(f: \mathcal{S}_{++}^n \mapsto \R\) be given by \(f(X) = -\ln \det (X)\), where \(\mathcal{S}_{++}^n\) is the set of \(n \times n\) positive definite matrices. Show that \(f\) is convex.

This function frequently appears when computing the capacity of a channel in information theory. To show convexity, we examine the restrictions of \(f\) to lines. Let \(X_0 \in \mathcal{S}_{++}^n\) and \(H \in \mathcal{S}^n \setminus \{ 0 \}\), where \(\mathcal{S}^n\) denotes the \(n \times n\) symmetric matrices. Define \[ g(t) \triangleq f(X_0 + t H) = -\ln \det(X_0 + t H) \] on the open interval of \(t\) for which \(X_0 + t H \succ 0\). Since \(X_0 \in \mathcal{S}_{++}^n\), we can write \(X_0 = X_0^{1/2} X_0^{1/2}\) where \(X_0^{1/2} \in \mathcal{S}_{++}^n\). Let \(X_0^{-1/2} = (X_0^{1/2})^{-1}\). Then, \[ \begin{align*} g(t) &= -\ln \det(X_0^{1/2} (I + t X_0^{-1/2} H X_0^{-1/2}) X_0^{1/2}) \\ &= -2 \ln \det(X_0^{1/2}) - \ln \det(I + t X_0^{-1/2} H X_0^{-1/2}) \end{align*} \] Note that \(X_0^{-1/2} H X_0^{-1/2}\) is real symmetric. Let \(\lambda_1, \dots, \lambda_n\) be its eigenvalues. Then the eigenvalues of \(I + t X_0^{-1/2} H X_0^{-1/2}\) are \(t \lambda_1 + 1, \dots, t \lambda_n + 1\), so \[ \begin{align*} g(t) &= -2 \ln \det(X_0^{1/2}) - \ln \det(I + t X_0^{-1/2} H X_0^{-1/2}) \\ &= -2 \ln \det(X_0^{1/2}) - \sum_i \ln(t\lambda_i + 1) \end{align*} \] Each \(-\ln(t\lambda_i + 1)\) is convex in \(t\) on its domain, hence so is \(g\), which concludes the proof.
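
A hedged numerical sketch of this example (the random \(X_0 \succ 0\), the symmetric direction \(H\), and the interval of \(t\) are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

B = rng.normal(size=(n, n))
X0 = B @ B.T + 2 * n * np.eye(n)      # a positive definite matrix
C = rng.normal(size=(n, n))
H = 0.5 * (C + C.T)                   # a non-zero symmetric direction

def g(t):
    sign, logdet = np.linalg.slogdet(X0 + t * H)
    assert sign > 0                   # stay inside the domain S_{++}^n
    return -logdet

# Midpoint-convexity check of g on a small interval around t = 0.
for _ in range(1000):
    t1, t2 = rng.uniform(-0.5, 0.5, size=2)
    assert g(0.5 * (t1 + t2)) <= 0.5 * g(t1) + 0.5 * g(t2) + 1e-9
print("g is (numerically) convex near t = 0")
```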

Differentiability-agnostic Convex Function

“Differentiability-agnostic” means that no premise on the differentiability of the convex function is assumed. In this section, we discuss several “differentiability-agnostic” topics.

Subgradient

Definition: Let \(f: \R^n \mapsto \R\) be a function. A vector \(s \in \R^n\) is called a subgradient of \(f\) at \(x_0\) if \[ f(x) \ge f(x_0) + s^T (x - x_0), \forall x \in \R^n \] The set \[ \partial f(x_0) \triangleq \{ s \in \R^n: \text{$s$ is a subgradient of $f$ at $x_0$} \} \] is called the sub-differential of \(f\) at \(x_0\). The sub-differential is always convex and closed.

The idea of the subgradient is directly inspired by the gradient inequality, and its geometric interpretation (an affine minorant that touches the graph of \(f\) at \(x_0\)) resembles that of the gradient.
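
A standard worked example: for \(f(x) = |x|\) on \(\R\), \[ \partial f(x_0) = \begin{cases} \{ -1 \}, & x_0 < 0 \\ [-1, 1], & x_0 = 0 \\ \{ +1 \}, & x_0 > 0 \end{cases} \] At \(x_0 = 0\), the condition \(|x| \ge s x\) for all \(x\) holds exactly when \(|s| \le 1\); away from \(0\), \(f\) is differentiable and the sub-differential collapses to the singleton containing the derivative.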

Theorem: Let \(f: \R^n \mapsto \R\) be convex. Then, \(\partial f(x)\) is non-empty, convex and compact (non-emptiness follows from the existence of a supporting hyperplane of \(\epi f\) at \((x, f(x))\), and boundedness from the local Lipschitz continuity of \(f\), which holds because \(f\) is finite on all of \(\R^n\)). Moreover,

  1. \(f\) is differentiable at \(x_0\) if and only if \(\partial f(x_0) = \{ \nabla f(x_0) \}\);
  2. \(x_0\) is a global minimizer if and only if \(0 \in \partial f(x_0)\).

Since the sub-differential is non-empty, convex, and compact, projection onto it is well defined; moreover, by the Weierstrass theorem, optimization problems over the sub-differential (such as finding the minimal-norm subgradient) always have solutions.
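
For instance, for \(f(x) = \max(x, -2x)\) on \(\R\), one can check directly from the definition that \(\partial f(0) = [-2, 1]\). Projecting \(0\) onto this interval gives the minimal-norm subgradient \(0\); since \(0 \in \partial f(0)\), item 2 of the theorem above certifies that \(x_0 = 0\) is a global minimizer, as is indeed the case.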

Conjugate Function

If two functions have identical epigraphs, then they are identical; the epigraph thus provides a fully geometric view of a function.

Let \(f: \R^n \mapsto \R\) be convex. \(\epi f\) is known to be convex. Suppose in addition that \(\epi f\) is non-empty and closed. Recall that a non-empty, closed and convex set can be written as the intersection of the lower halfspaces \(H^-(a, c) \triangleq \{ z: a^T z \le c \}\) that contain it: \[ \epi f = \bigcap_{H^-([s^T, y_0]^T,c) \supseteq \epi f} H^-([s^T, y_0]^T, c) \]

where \(s \in \R^n\) and \(y_0 \in \R\). We argue that \(y_0 \le 0\): since the last entry of a point \(z \in \epi f\) can be made arbitrarily large, a halfspace with \(y_0 > 0\) cannot contain \(\epi f\). We may further restrict attention to \(y_0 < 0\): halfspaces with \(y_0 = 0\) are vertical and say nothing about the values of \(f\), and for a proper \(f\) the non-vertical halfspaces alone already recover \(\epi f\). Without loss of generality let \(y_0 = -1\), since for \(y_0 < 0\) we have \(H^-([s^T, y_0]^T,c) = H^-([-s^T/y_0, -1]^T, -c/y_0)\).

Notice that this lower half-space \(H^-([s^T, -1]^T, c)\) is exactly the epigraph of the affine function \[ h_{s, c}(x) = s^T x - c \] because \((x, y) \in \R^n \times \R\) belongs to \(H^-([s^T, -1]^T, c)\) precisely when \[ \begin{aligned} s^T x - y &\le c \\ y &\ge s^T x - c \end{aligned} \] On the other hand, \(\epi f \subseteq H^-([s^T, -1]^T, c) = \epi{h_{s, c}}\) indicates that \(\forall x \in \R^n, f(x) \ge h_{s, c}(x)\). Consequently, \(\epi f\) can be re-written as \[ \begin{aligned} \epi f &= \bigcap_{H^-([s^T, -1]^T,c) \supseteq \epi f} H^-([s^T, -1]^T,c) \\ &= \bigcap_{\epi{h_{s, c}} \supseteq \epi f} \epi{h_{s, c}} \\ &= \bigcap_{f \ge h_{s, c}} \epi{h_{s, c}} \end{aligned} \] Therefore, \(f\) can be written as the pointwise supremum of the affine functions that lie below it: \[ f = \sup_{h_{s,c} \le f} h_{s,c} \] The vectors \(s\) of those affine minorants of \(f\) whose graphs pass through \((x_0, f(x_0))\) form exactly the sub-differential of \(f\) at \(x_0\).
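
A hedged numerical illustration of this representation (a sketch; the function \(f(x) = x^2\) and the grid of tangent points are arbitrary choices, not from the notes): for a differentiable convex \(f\), the tangent lines are affine minorants, and their pointwise maximum recovers \(f\):

```python
import numpy as np

f = lambda x: x ** 2
df = lambda x: 2 * x                       # derivative of f

ys = np.linspace(-3.0, 3.0, 301)           # tangent points y
xs = np.linspace(-2.0, 2.0, 401)           # evaluation points x

# h_y(x) = f(y) + f'(y) (x - y), an affine minorant of f for each y.
tangents = f(ys)[:, None] + df(ys)[:, None] * (xs[None, :] - ys[:, None])
sup_of_minorants = tangents.max(axis=0)    # pointwise maximum over the minorants

# The gap is tiny (limited only by the grid spacing of the tangent points).
print(np.max(np.abs(sup_of_minorants - f(xs))))
```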

Consider the set of coefficient pairs \((y, t)\) whose affine function \(x \mapsto y^T x - t\) lies below \(f\): \[ \begin{aligned} S_f &= \{ (y, t) \in \R^n \times \R: y^T x - t \le f(x), \forall x \in \R^n \} \\ &= \{ (y, t) \in \R^n \times \R: \sup_{x}(y^T x - f(x)) \le t \} \\ &\triangleq \epi{f^*} \end{aligned} \] where \[ f^*(y) \triangleq \sup_{x}(y^T x - f(x)) \] \(f^*\) is called the conjugate function of \(f\).
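
A standard worked example: for \(f(x) = \frac{1}{2} ||x||_2^2\), the supremum in \(f^*(y) = \sup_x (y^T x - \frac{1}{2} ||x||_2^2)\) is attained at \(x = y\), giving \(f^*(y) = \frac{1}{2} ||y||_2^2\); this \(f\) is its own conjugate. Note also that, for any \(f\), \(f^*\) is a pointwise supremum of affine functions of \(y\), and is therefore always convex.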
