
Convex Function

Definition: Let \(f: \R^n \mapsto \R_+\) be an extended real-valued function, where \(\R_+ \triangleq \R \cup \{ +\infty \}\) (in convex analysis, usually only \(+\infty\) is adjoined, not \(-\infty\)). We also require that \(f\) is not identically \(+\infty\) (such an \(f\) is called proper). We say that \(f\) is convex if \(\forall x_1, x_2 \in \R^n, \alpha \in [0,1]\), \[ f(\alpha x_1 + (1-\alpha) x_2) \le \alpha f(x_1) + (1-\alpha) f(x_2) \] Here we use the conventions that for \(x \in \R\): \(x < +\infty\), \(x + \infty = \infty + x = \infty\), and \(+\infty \le +\infty\).
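
A quick worked example against the definition: \(f(x) = |x|\) on \(\R\) is convex, since for any \(x_1, x_2 \in \R\) and \(\alpha \in [0,1]\), \[ |\alpha x_1 + (1-\alpha) x_2| \le |\alpha x_1| + |(1-\alpha) x_2| = \alpha |x_1| + (1-\alpha) |x_2| \] by the triangle inequality.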

Interestingly, a convex function is always continuous on the interior of its effective domain; in particular, a finite-valued convex function on all of \(\R^n\) is continuous.

Definition: The epigraph of a function \(f: \R^n \mapsto \R\) is the set \(\epi f \triangleq \{ (x, t) \in \R^n \times \R: f(x) \le t \}\).

Definition: The effective domain of \(f\) is the set \(\dom f \triangleq \{ x \in \R^n: f(x) < \infty \}\).

Note that the real line does not contain \(+\infty\), so a real-valued function never takes the value \(+\infty\). For instance, the effective domain of \(f(x) = x\) is all of \(\R\).

Definition: Let \(S \subseteq \R^n\) be a set. The indicator of \(S\) is the function \[ \mathbb 1_S (x) = \begin{cases} 0, & x \in S \\ +\infty, & \text{otherwise} \end{cases} \]

Using the indicator, we have \[ \underset{\text{constrained}}{\inf_{x \in S} f(x)} \iff \underset{\text{unconstrained}}{\inf_{x \in \R^n} f(x) + \mathbb 1_S(x)} \] That is, we convert a constrained problem into an unconstrained one.
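
As a small numerical illustration (a sketch only; the objective \((x-3)^2\), the set \(S = [1,2]\), and the grid are arbitrary choices, not from the notes), the constrained and the indicator-penalized problems give the same infimum:

```python
import numpy as np

f = lambda x: (x - 3.0) ** 2                  # objective to minimize
in_S = lambda x: (1.0 <= x) & (x <= 2.0)      # membership in S = [1, 2]

xs = np.linspace(-5.0, 5.0, 10001)            # a fine grid of candidate points

# Constrained problem: inf_{x in S} f(x)
constrained = f(xs[in_S(xs)]).min()

# Unconstrained problem: inf_x f(x) + indicator_S(x)
indicator = np.where(in_S(xs), 0.0, np.inf)
unconstrained = (f(xs) + indicator).min()

print(constrained, unconstrained)             # both equal 1.0, attained at x = 2
```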

Proposition (verify it): Let \(f: \R^n \mapsto \R_+\). Then, \(f\) is convex (as a function) if and only if \(\epi f\) is convex (as a set). Moreover, let \(S \subseteq \R^n\) be a set. Then \(S\) is convex (as a set) if and only if \(\mathbb 1_S\) is convex (as a function).

The proposition above associates the convexity of a function with that of its epigraph; and the convexity of a set with that of its indicator.

Corollary (Jensen’s inequality): Let \(f: \R^n \mapsto \R_+\). Then \(f\) is convex if and only if \(f(\sum_{i=1}^m \alpha_i x_i) \le \sum_{i=1}^m \alpha_i f(x_i)\) for any \(m \in \N^+\), any \(x_1, \dots, x_m \in \R^n\), and any \(\alpha_1, \dots, \alpha_m \ge 0\) such that \(\sum_{i=1}^m \alpha_i = 1\).

Proof:

  • Sufficiency

    Sufficiency is immediate: taking \(m = 2\) recovers exactly the defining inequality of a convex function.

  • Necessity

    If \(f(x_i) = +\infty\) for some \(i\), the right-hand side is \(+\infty\) and there is nothing to prove, so assume every \(f(x_i)\) is finite. Then, for \(i=1,\dots,m\), we have \[ \big( x_i, f(x_i) \big) \in \epi f \] Since \(\epi f\) is convex (and a convex set contains all finite convex combinations of its points), \[ \sum_{i=1}^m \alpha_i \big( x_i, f(x_i) \big) = \big( \sum_{i=1}^m \alpha_i x_i, \sum_{i=1}^m \alpha_i f(x_i) \big) \in \epi f \] which means \[ f(\sum_{i=1}^m \alpha_i x_i) \le \sum_{i=1}^m \alpha_i f(x_i) \]
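
A numerical sanity check of Jensen’s inequality (a sketch; the convex function \(f(x) = ||x||_2^2\) and the random weights are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sum(x ** 2)                     # a convex function on R^n

n, m = 4, 7
xs = rng.normal(size=(m, n))                     # m points in R^n
alphas = rng.random(m)
alphas /= alphas.sum()                           # non-negative weights summing to 1

lhs = f(alphas @ xs)                             # f(sum_i alpha_i x_i)
rhs = sum(a * f(x) for a, x in zip(alphas, xs))  # sum_i alpha_i f(x_i)
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```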

Convexity-preserving Operations

  1. Non-negative combination

    Let \(f_1, \dots, f_m\) be convex functions, \(\alpha_1, \dots, \alpha_m \ge 0\) be non-negative scalars. Then, \[ f \triangleq \sum_{i=1}^m \alpha_i f_i \text{ is convex.} \]

  2. Pointwise supremum

    Let \(I\) be an index set (either finite or infinite) and \(\{ f_i: i \in I \}\) be a collection of convex functions. Then \[ f \triangleq \sup_{i \in I} f_i \text{ is convex.} \] To show it, let \(x_1, x_2 \in \R^n\) and \(\alpha \in [0, 1]\). Then, \[ \begin{aligned} &f(\alpha x_1 + (1-\alpha) x_2) = \sup_{i \in I} f_i(\alpha x_1 + (1-\alpha) x_2) \\ &\le \sup_{i \in I} \big( \alpha f_i(x_1) + (1-\alpha) f_i(x_2) \big) \\ &\le [\alpha \sup_{i \in I} f_i(x_1)] + [(1-\alpha) \sup_{i \in I} f_i(x_2)] \\ &= \alpha f(x_1) + (1-\alpha) f(x_2) \end{aligned} \]

    Geometrically, taking a pointwise supremum corresponds to intersecting the epigraphs of the \(f_i\)’s.

    > Example: Consider the function \(f: \R^{m \times n} \mapsto \R\) given by \(f(X) = ||X||\), where \(||X||\) is the largest singular value of \(X\). Show that \(f\) is convex.
    >
    > By the variational characterization of the largest singular value (cf. the Courant-Fischer theorem),
    > \[
    > \begin{aligned}
    > ||X|| &= \max_{u \in \R^m, v \in \R^n} u^T X v \\
    > \text{s.t.} &\quad ||u||_2 = 1, ||v||_2 = 1
    > \end{aligned}
    > \]
    > Let \(f_{u,v}(X) = u^T X v\) and \(I = \{ (u,v) \in \R^m \times \R^n: ||u||_2 = 1, ||v||_2 = 1 \}\). Then,
    > \[
    > f(X) = \max_{(u,v) \in I} f_{u,v}(X)
    > \]
    > Each \(f_{u,v}(X)\) is linear and thus convex in \(X\). It follows directly from the pointwise-supremum principle that \(f\) is convex.

  3. Composition with increasing function

    Let \(g: \R^n \mapsto \R\) be convex and \(h: \R \mapsto \R\) be convex. In general, \(h \circ g\) is not convex: take \(h(x) = -x\) and \(g(x) = x^2\), so that \((h \circ g)(x) = -x^2\), which is concave. But if \(h\) is convex and also increasing, then \(h \circ g\) is convex, since \(h(g(\alpha x_1 + (1-\alpha) x_2)) \le h(\alpha g(x_1) + (1-\alpha) g(x_2)) \le \alpha h(g(x_1)) + (1-\alpha) h(g(x_2))\), where the first inequality uses the convexity of \(g\) together with the monotonicity of \(h\), and the second uses the convexity of \(h\).

  4. Restriction on lines

    Given a point \(x_0 \in \R^n\) and a direction \(h \in \R^n \setminus \{ 0 \}\), we call the set \[ \{ x_0 + t h: t \in \R \} \] a line through \(x_0\) in the direction \(h\). Let \(f: \R^n \mapsto \R\) be a function. Define

    \[ \tilde f_{x_0, h}(t) \triangleq f(x_0 + t h) \] as the restriction of \(f\) on the line \(\{ x_0 + t h: t \in \R \}\).

Then, \(f\) is convex if and only if \(\tilde f_{x_0, h}\) is convex for any \(x_0 \in \R^n\) and any \(h \in \R^n \setminus \{ 0 \}\).
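
As a numerical sketch of this criterion (the convex quadratic, the base point, and the direction below are arbitrary choices, not from the notes), one can restrict a convex function to a random line and check midpoint convexity of the resulting one-dimensional function:

```python
import numpy as np

rng = np.random.default_rng(0)

B = rng.normal(size=(3, 3))
A = B @ B.T                        # positive semidefinite, so f(x) = x^T A x is convex
f = lambda x: x @ A @ x

x0 = rng.normal(size=3)            # base point x_0
h = rng.normal(size=3)             # direction h

g = lambda t: f(x0 + t * h)        # restriction of f to the line {x_0 + t h}

# Midpoint-convexity check of the restriction on random pairs of parameters.
for _ in range(1000):
    t1, t2 = rng.uniform(-10.0, 10.0, size=2)
    assert g(0.5 * (t1 + t2)) <= 0.5 * g(t1) + 0.5 * g(t2) + 1e-8
print("the restriction is (numerically) convex")
```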

Differentiable Convex Function

Theorem: Let \(f: \R^n \mapsto \R\) be differentiable, i.e. \(\nabla f\) exists. Then, \(f\) is convex if and only if for every \(x, y \in \R^n\), \[ f(x) \ge f(y) + (\nabla f(y))^T (x - y) \tag{gradient inequality} \]

For a fixed \(y\), the right-hand side of the gradient inequality is an affine function of \(x\). Its graph is the hyperplane \[ t = f(y) + (\nabla f(y))^T (x - y) \\ \Updownarrow \\ \underbrace{[(\nabla f(y))^T, -1]}_{s^T} \underbrace{\begin{bmatrix} x \\ t \end{bmatrix}}_{z} - \underbrace{[(\nabla f(y))^T y - f(y)]}_{c} = 0 \]

Interestingly, the normal vector \(s\) always points downwards (due to the \(-1\)).
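
A hedged numerical check of the gradient inequality (a sketch; the function log-sum-exp and the sampled points are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

f = lambda x: np.log(np.sum(np.exp(x)))           # log-sum-exp, a smooth convex function
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))  # its gradient (the softmax of x)

for _ in range(1000):
    x, y = rng.normal(size=(2, 5))
    assert f(x) >= f(y) + grad_f(y) @ (x - y) - 1e-12
print("gradient inequality holds at the sampled points")
```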

Theorem: Let \(f\) be twice continuously differentiable, i.e. \(\nabla f, \nabla^2 f\) exist and \(\nabla^2 f\) is continuous. Then, \(f\) is convex if and only if \[ \nabla^2 f(x) \succcurlyeq 0, \forall x \in \R^n \]
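
Continuing with log-sum-exp as the test function (again a sketch with arbitrary sample points; the closed form \(\nabla^2 f(x) = \mathrm{diag}(p) - p p^T\) with \(p = \mathrm{softmax}(x)\) is a standard fact), one can also verify the second-order condition numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

def hess_logsumexp(x):
    p = np.exp(x - x.max())
    p /= p.sum()                              # softmax, computed stably
    return np.diag(p) - np.outer(p, p)        # Hessian of log-sum-exp at x

for _ in range(100):
    x = rng.normal(size=6)
    eigvals = np.linalg.eigvalsh(hess_logsumexp(x))
    assert eigvals.min() >= -1e-12            # positive semidefinite up to round-off
print("Hessian is positive semidefinite at the sampled points")
```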

Example: Let \(f: \mathcal{S}_{++}^n \mapsto \R\) be given by \(f(X) = -\ln \det (X)\), where \(\mathcal{S}_{++}^n\) is the set of \(n \times n\) positive definite matrices. Show that \(f\) is convex.

This function frequently appears when computing the capacity of a channel in information theory. To show convexity, we examine the restrictions of \(f\) to lines. Let \(X_0 \in \mathcal{S}_{++}^n\) and \(H \in \mathcal{S}^n \setminus \{ 0 \}\), where \(\mathcal{S}^n\) denotes the \(n \times n\) symmetric matrices. Define \[ g(t) \triangleq f(X_0 + t H) = -\ln \det(X_0 + t H) \] on the open interval of \(t\) for which \(X_0 + t H \succ 0\). Since \(X_0 \in \mathcal{S}_{++}^n\), we can write \(X_0 = X_0^{1/2} X_0^{1/2}\) where \(X_0^{1/2} \in \mathcal{S}_{++}^n\). Let \(X_0^{-1/2} = (X_0^{1/2})^{-1}\). Then, \[ \begin{align*} g(t) &= -\ln \det(X_0^{1/2} (I + t X_0^{-1/2} H X_0^{-1/2}) X_0^{1/2}) \\ &= -2 \ln \det(X_0^{1/2}) - \ln \det(I + t X_0^{-1/2} H X_0^{-1/2}) \end{align*} \] Note that \(X_0^{-1/2} H X_0^{-1/2}\) is real symmetric. Let \(\lambda_1, \dots, \lambda_n\) be its eigenvalues. Then the eigenvalues of \(I + t X_0^{-1/2} H X_0^{-1/2}\) are \(t \lambda_1 + 1, \dots, t \lambda_n + 1\), so \[ \begin{align*} g(t) &= -2 \ln \det(X_0^{1/2}) - \ln \det(I + t X_0^{-1/2} H X_0^{-1/2}) \\ &= -2 \ln \det(X_0^{1/2}) - \sum_i \ln(t\lambda_i + 1) \end{align*} \] Each \(-\ln(t\lambda_i + 1)\) is convex in \(t\) on its domain, hence so is \(g\), which concludes the proof.
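
A hedged numerical sketch of this example (the random \(X_0 \succ 0\), the symmetric direction \(H\), and the interval of \(t\) are arbitrary choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

B = rng.normal(size=(n, n))
X0 = B @ B.T + 2 * n * np.eye(n)      # a positive definite matrix
C = rng.normal(size=(n, n))
H = 0.5 * (C + C.T)                   # a non-zero symmetric direction

def g(t):
    sign, logdet = np.linalg.slogdet(X0 + t * H)
    assert sign > 0                   # stay inside the domain S_{++}^n
    return -logdet

# Midpoint-convexity check of g on a small interval around t = 0.
for _ in range(1000):
    t1, t2 = rng.uniform(-0.5, 0.5, size=2)
    assert g(0.5 * (t1 + t2)) <= 0.5 * g(t1) + 0.5 * g(t2) + 1e-9
print("g is (numerically) convex near t = 0")
```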

Differentiability-agnostic Convex Function

“Differentiability-agnostic” means that no premise on the differentiability of the convex function is assumed. In this section, we discuss several “differentiability-agnostic” topics.

Subgradient

Definition: Let \(f: \R^n \mapsto \R\) be a function. A vector \(s \in \R^n\) is called a subgradient of \(f\) at \(x_0\) if \[ f(x) \ge f(x_0) + s^T (x - x_0), \forall x \in \R^n \] The set \[ \partial f(x_0) \triangleq \{ s \in \R^n: \text{$s$ is a subgradient of $f$ at $x_0$} \} \] is called the sub-differential of \(f\) at \(x_0\). The sub-differential is always convex and closed.

The idea of the subgradient is directly inspired by the gradient inequality, and its geometric interpretation (an affine minorant that touches the graph of \(f\) at \(x_0\)) resembles that of the gradient.
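
A standard worked example: for \(f(x) = |x|\) on \(\R\), \[ \partial f(x_0) = \begin{cases} \{ -1 \}, & x_0 < 0 \\ [-1, 1], & x_0 = 0 \\ \{ +1 \}, & x_0 > 0 \end{cases} \] At \(x_0 = 0\), the condition \(|x| \ge s x\) for all \(x\) holds exactly when \(|s| \le 1\); away from \(0\), \(f\) is differentiable and the sub-differential collapses to the singleton containing the derivative.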

Theorem: Let \(f: \R^n \mapsto \R\) be convex. Then, \(\partial f(x)\) is non-empty, convex and compact (non-emptiness follows from the existence of a supporting hyperplane of \(\epi f\) at \((x, f(x))\), and boundedness from the local Lipschitz continuity of \(f\), which holds because \(f\) is finite on all of \(\R^n\)). Moreover,

  1. \(f\) is differentiable at \(x_0\) if and only if \(\partial f(x_0) = \{ \nabla f(x_0) \}\);
  2. \(x_0\) is a global minimizer if and only if \(0 \in \partial f(x_0)\).

Since the sub-differential is non-empty, convex, and compact, projection onto it is well defined; moreover, by the Weierstrass theorem, optimization problems over the sub-differential (such as finding the minimal-norm subgradient) always have solutions.
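
For instance, for \(f(x) = \max(x, -2x)\) on \(\R\), one can check directly from the definition that \(\partial f(0) = [-2, 1]\). Projecting \(0\) onto this interval gives the minimal-norm subgradient \(0\); since \(0 \in \partial f(0)\), item 2 of the theorem above certifies that \(x_0 = 0\) is a global minimizer, as is indeed the case.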

Conjugate Function

If two functions have identical epigraphs, then they are identical; the epigraph thus provides a fully geometric view of a function.

Let \(f: \R^n \mapsto \R\) be convex. \(\epi f\) is known to be convex. Suppose in addition that \(\epi f\) is non-empty and closed. Recall that a non-empty, closed and convex set can be written as the intersection of the lower halfspaces \(H^-(a, c) \triangleq \{ z: a^T z \le c \}\) that contain it: \[ \epi f = \bigcap_{H^-([s^T, y_0]^T,c) \supseteq \epi f} H^-([s^T, y_0]^T, c) \]

where \(s \in \R^n\) and \(y_0 \in \R\). We argue that \(y_0 \le 0\): since the last entry of a point \(z \in \epi f\) can be made arbitrarily large, a halfspace with \(y_0 > 0\) cannot contain \(\epi f\). We may further restrict attention to \(y_0 < 0\): halfspaces with \(y_0 = 0\) are vertical and say nothing about the values of \(f\), and for a proper \(f\) the non-vertical halfspaces alone already recover \(\epi f\). Without loss of generality let \(y_0 = -1\), since for \(y_0 < 0\) we have \(H^-([s^T, y_0]^T,c) = H^-([-s^T/y_0, -1]^T, -c/y_0)\).

Notice that this lower half-space \(H^-([s^T, -1]^T, c)\) is exactly the epigraph of the affine function \[ h_{s, c}(x) = s^T x - c \] because \((x, y) \in \R^n \times \R\) belongs to \(H^-([s^T, -1]^T, c)\) precisely when \[ \begin{aligned} s^T x - y &\le c \\ y &\ge s^T x - c \end{aligned} \] On the other hand, \(\epi f \subseteq H^-([s^T, -1]^T, c) = \epi{h_{s, c}}\) indicates that \(\forall x \in \R^n, f(x) \ge h_{s, c}(x)\). Consequently, \(\epi f\) can be re-written as \[ \begin{aligned} \epi f &= \bigcap_{H^-([s^T, -1]^T,c) \supseteq \epi f} H^-([s^T, -1]^T,c) \\ &= \bigcap_{\epi{h_{s, c}} \supseteq \epi f} \epi{h_{s, c}} \\ &= \bigcap_{f \ge h_{s, c}} \epi{h_{s, c}} \end{aligned} \] Therefore, \(f\) can be written as the pointwise supremum of the affine functions that lie below it: \[ f = \sup_{h_{s,c} \le f} h_{s,c} \] The vectors \(s\) of those affine minorants of \(f\) whose graphs pass through \((x_0, f(x_0))\) form exactly the sub-differential of \(f\) at \(x_0\).
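
A hedged numerical illustration of this representation (a sketch; the function \(f(x) = x^2\) and the grid of tangent points are arbitrary choices, not from the notes): for a differentiable convex \(f\), the tangent lines are affine minorants, and their pointwise maximum recovers \(f\):

```python
import numpy as np

f = lambda x: x ** 2
df = lambda x: 2 * x                       # derivative of f

ys = np.linspace(-3.0, 3.0, 301)           # tangent points y
xs = np.linspace(-2.0, 2.0, 401)           # evaluation points x

# h_y(x) = f(y) + f'(y) (x - y), an affine minorant of f for each y.
tangents = f(ys)[:, None] + df(ys)[:, None] * (xs[None, :] - ys[:, None])
sup_of_minorants = tangents.max(axis=0)    # pointwise maximum over the minorants

# The gap is tiny (limited only by the grid spacing of the tangent points).
print(np.max(np.abs(sup_of_minorants - f(xs))))
```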

Consider the set of coefficient pairs \((y, t)\) whose affine function \(x \mapsto y^T x - t\) lies below \(f\): \[ \begin{aligned} S_f &= \{ (y, t) \in \R^n \times \R: y^T x - t \le f(x), \forall x \in \R^n \} \\ &= \{ (y, t) \in \R^n \times \R: \sup_{x}(y^T x - f(x)) \le t \} \\ &\triangleq \epi{f^*} \end{aligned} \] where \[ f^*(y) \triangleq \sup_{x}(y^T x - f(x)) \] \(f^*\) is called the conjugate function of \(f\).
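
A standard worked example: for \(f(x) = \frac{1}{2} ||x||_2^2\), the supremum in \(f^*(y) = \sup_x (y^T x - \frac{1}{2} ||x||_2^2)\) is attained at \(x = y\), giving \(f^*(y) = \frac{1}{2} ||y||_2^2\); this \(f\) is its own conjugate. Note also that, for any \(f\), \(f^*\) is a pointwise supremum of affine functions of \(y\), and is therefore always convex.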
