Orthogonality and Projection

Orthogonality and Independence

If the nonzero vectors \(\{u_1, u_2, ..., u_k\}\) are pairwise orthogonal, then they are linearly independent.
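As a quick numerical illustration (a minimal NumPy sketch; the vectors are arbitrary choices), pairwise-orthogonal nonzero vectors stack into a matrix of full column rank:

```python
import numpy as np

# Three pairwise-orthogonal nonzero vectors in R^3 (arbitrary example).
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
u3 = np.array([0.0, 0.0, 2.0])

U = np.column_stack([u1, u2, u3])
print(U.T @ U)                    # diagonal: all pairwise dot products are 0
print(np.linalg.matrix_rank(U))   # 3, so the vectors are linearly independent
```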

Orthonormality

An \(m \times n\) matrix \(U\) has orthonormal columns if and only if \(U^TU = I\).

An orthogonal matrix is a square invertible matrix \(U\) such that \(U^{-1} = U^T\). By definition, it has both orthonormal columns and orthonormal rows.
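Both facts are easy to check numerically; here is a minimal sketch using matrices obtained from a QR factorization (the sizes and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal columns from a thin QR factorization of a random 5x3 matrix.
Q, _ = np.linalg.qr(rng.standard_normal((5, 3)))
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: U^T U = I

# A square Q is an orthogonal matrix: Q^{-1} = Q^T.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
print(np.allclose(Q.T @ Q, np.eye(4)))   # orthonormal columns
print(np.allclose(Q @ Q.T, np.eye(4)))   # orthonormal rows
```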

Projection

Projection onto Orthogonal Basis

Let \(\{u_1, u_2, ..., u_k\}\) be an orthogonal basis for a subspace \(W\) of \(\R^n\). Then for each \(y\) in \(W\), the weights in the linear combination \[ y = c_1u_1 + c_2u_2 + ... + c_ku_k \] are given by \[ c_i = \frac{y \cdot u_i}{u_i \cdot u_i} \label{coef} \]
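A minimal sketch of the coefficient formula (NumPy; the basis and \(y\) are arbitrary choices):

```python
import numpy as np

# An orthogonal (not orthonormal) basis of a 2-D subspace W of R^3.
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])

# A vector y known to lie in W.
y = 3.0 * u1 - 2.0 * u2

# Recover the weights via c_i = (y . u_i) / (u_i . u_i).
c1 = y @ u1 / (u1 @ u1)
c2 = y @ u2 / (u2 @ u2)
print(c1, c2)                              # 3.0 -2.0
print(np.allclose(c1 * u1 + c2 * u2, y))   # True
```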

Projection onto Vector

Given a nonzero vector \(u\) in \(\R^n\) and another vector \(y\) in \(\R^n\), we wish to decompose \(y\) as \[ y = \hat y + z \] where \(\hat y = \alpha u\) for some scalar \(\alpha\) and \(z\) is some vector orthogonal to \(u\). Requiring \(z \cdot u = 0\) determines \(\alpha\): \[ \begin{aligned} z &= y - \hat y \\ z \cdot u &= (y - \alpha u) \cdot u = 0 \\ y \cdot u - \alpha u \cdot u &= 0 \\ \alpha &= \frac{y \cdot u}{u \cdot u} \end{aligned} \] \(\hat y = \frac{y \cdot u}{u \cdot u}u\) is called the orthogonal projection of \(y\) onto \(u\), and \(z = y - \hat y\) is called the component of \(y\) orthogonal to \(u\).

The projection of \(y\) onto \(cu\) for any scalar \(c\) is the same as that onto \(u\). Therefore \(\hat y\) is the projection onto the subspace \(L\) spanned by \(u\). In this sense, \(\hat y\) is also denoted as \(\Pi_L(y)\).
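The decomposition is straightforward to compute; a minimal sketch (NumPy, arbitrary vectors):

```python
import numpy as np

def project_onto_vector(y, u):
    """Orthogonal projection of y onto the line spanned by a nonzero u."""
    return (y @ u) / (u @ u) * u

y = np.array([3.0, 1.0])
u = np.array([2.0, 0.0])

y_hat = project_onto_vector(y, u)
z = y - y_hat
print(y_hat, z)    # [3. 0.] [0. 1.]
print(z @ u)       # 0.0: z is orthogonal to u

# Scaling u does not change the projection.
print(np.allclose(project_onto_vector(y, 5.0 * u), y_hat))  # True
```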

Projection onto Subspace

Let \(W\) be a linear subspace of \(\R^n\). Then each \(y\) in \(\R^n\) can be written uniquely in the form: \[ y = \hat y + z \] where \(\hat y\) is in \(W\) and \(z\) is in \(W^\perp\). \(\hat y\) is called the orthogonal projection of \(y\) onto \(W\), denoted as \(\Pi_W(y)\).

\(\hat y\) is also called the best approximation to \(y\) in \(W\), in the sense that: \[ ||y - \hat y||_2 \le ||y - v||_2, \forall v \in W \] This can be shown by writing \[ y - v = (y - \hat y) + (\hat y - v) \] which gives \[ \begin{aligned} &||y - v||_2^2 = ||(y - \hat y) + (\hat y - v)||_2^2 \\ &= ||y - \hat y||_2^2 + ||\hat y - v||_2^2 + 2(y - \hat y)^T (\hat y - v) \\ &\Downarrow_{y - \hat y \in W^\perp, \hat y - v \in W} \\ &= ||y - \hat y||_2^2 + ||\hat y - v||_2^2 \\ &\ge ||y - \hat y||_2^2 \end{aligned} \] with equality if and only if \(v = \hat y\).
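The best-approximation property can be spot-checked numerically; a sketch where \(W\) is the column space of an arbitrary random matrix and \(v\) ranges over random points of \(W\):

```python
import numpy as np

rng = np.random.default_rng(1)

# W = Col A for an arbitrary 4x2 matrix A with independent columns.
A = rng.standard_normal((4, 2))
y = rng.standard_normal(4)

# Orthogonal projection of y onto W via least squares.
x, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ x

# No sampled point v in W gets closer to y than y_hat does.
for _ in range(1000):
    v = A @ rng.standard_normal(2)
    assert np.linalg.norm(y - y_hat) <= np.linalg.norm(y - v) + 1e-12
print("y_hat is the closest sampled point in W to y")
```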

Projection onto Column Space

When \(A \in \R^{n \times m}\) is a matrix, for any \(y \in \R^n\) we may still want to find its projection onto the column space (also a linear subspace) spanned by \(A\)'s columns. As mentioned above, \(y\) can be decomposed into \(\hat y\) and \(z\) such that \(\hat y \in \Col A\) and \(z \in (\Col A)^\perp = \Nul A^T\). Therefore, \[ \begin{gather} y = \hat y + z \label{decomp} \\ A^T z = 0 \label{perp} \\ \exists x \in \R^m, A x = \hat y \label{proj} \end{gather} \] Substituting \(\hat y = A x\) from \(\eqref{proj}\) into \(\eqref{decomp}\), and then \(z = y - A x\) into \(\eqref{perp}\), gives \[ \begin{aligned} A^T (y - A x) &= 0 \\ A^T A x &= A^T y \\ \end{aligned} \] Note that \(\rank {A^T A} = \rank A\). If \(A\)'s columns are linearly independent (i.e., \(A\) has full column rank, so \(A^T A\) is invertible), we can solve the above equation as \[ x = (A^T A)^{-1} A^T y \] Thus, \[ \hat y = A x = A (A^T A)^{-1} A^T y \] For any \(y \in \R^n\), its projection onto \(\Col A\) can be found by left-multiplying by \(A\)'s projection matrix \(P \triangleq A (A^T A)^{-1} A^T\). And we can verify that \(z \in \Nul A^T\): \[ A^T z = A^T (y - \hat y) = A^T (y - A (A^T A)^{-1} A^T y) = A^T y - A^T y = 0 \] There are some interesting properties of this projection matrix \(P\) (verified numerically in the sketch after the list):

  • \(P\) is symmetric;

  • \(P\) is idempotent in that \(P^2 = P\);

  • \(\Col P = \Col A\).

    For every vector \(x \in \Col A\), we can write \(x = A v\) for some \(v\), so \[ P x = A (A^T A)^{-1} A^T A v = A v = x \] which means \(x \in \Col P\) and thus \(\Col A \subseteq \Col P\).

    For every vector \(y \in \Col P\), there exists a vector \(z\) such that \[ P z = y \] By the definition of projection matrix, we know that \((P z) \in \Col A\) and thus \(\Col P \subseteq \Col A\).

    Therefore, \(\Col P = \Col A\). As a direct result, since \(P\) is symmetric and \(P A = A\), we have \[ A^T P = (P^T A)^T = (P A)^T = A^T \]
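A sketch verifying the formula and the properties above (NumPy; \(A\) is an arbitrary random matrix with full column rank, and `np.linalg.inv` is used only to mirror the formula, since `solve` or `lstsq` would be preferable numerically):

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary 5x3 matrix with independent columns.
A = rng.standard_normal((5, 3))

# Projection matrix onto Col A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P, P.T))       # True: P is symmetric
print(np.allclose(P @ P, P))     # True: P is idempotent
print(np.allclose(P @ A, A))     # True: P fixes Col A
print(np.linalg.matrix_rank(P) == np.linalg.matrix_rank(A))  # True: Col P = Col A
print(np.allclose(A.T @ P, A.T))   # True: A^T P = A^T

# The residual of any y lies in Nul A^T.
y = rng.standard_normal(5)
z = y - P @ y
print(np.allclose(A.T @ z, 0))     # True
```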

General Projection Matrix

The projection matrix derived above for a column space is actually an orthogonal projection matrix. The residual \(z\) of the original vector \(y\) after projection is perpendicular to \(\Col A\) and thus to \(\hat y\).

In a more general case, the residual is not necessarily perpendicular to \(\hat y\). Note that \(\hat y\) is also the closest point in \(\Col A\) to \(y\). This closest-point characterization is the defining property of a projection, and it applies to any set of vectors, whether a linear subspace or even a non-convex set.

We can similarly develop the notion of a general projection matrix: any \(n \times n\) matrix \(P\) satisfying \(P^2 = P\) is a projection matrix.

Note that we developed the concept of an orthogonal projection matrix from the orthogonal projection onto a linear subspace. For a general projection matrix, however, we only require that \(P^2 = P\). We do not associate a general projection matrix with a general projection, because for a vector \(y\), \(P y\) may not be the projection of \(y\) onto the projection space \(\{ P x : x \in \R^n \}\).

Example: Let \[ P = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \] \(P\) is a projection matrix, and it transforms every 2-D vector \(y = [y_1, y_2]^T\) to \(\hat y = [y_2, y_2]^T\). The projection space formed by \(P\) is \(\mathcal{P} = \{ x \in \R^2: x_1 = x_2 \}\). In general, \(\hat y\) is not the closest point to \(y\) in \(\mathcal{P}\), and the residual \(y - \hat y\) is not perpendicular to \(\mathcal{P}\).
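A sketch reproducing this example (NumPy):

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [0.0, 1.0]])

print(np.allclose(P @ P, P))   # True: P is idempotent, hence a projection matrix
print(np.allclose(P, P.T))     # False: P is not symmetric, so not orthogonal

y = np.array([3.0, 1.0])
y_hat = P @ y                  # [1. 1.]: P maps y to [y2, y2]
z = y - y_hat                  # residual [2. 0.]

# The residual is not perpendicular to the projection space {x : x1 = x2}.
print(z @ np.array([1.0, 1.0]))   # 2.0, not 0

# The actual closest point to y on the line x1 = x2 is the coordinate average.
closest = np.array([y.mean(), y.mean()])                        # [2. 2.]
print(np.linalg.norm(y - closest) < np.linalg.norm(y - y_hat))  # True
```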

There are some properties of a general projection matrix \(P\):

  • \(P\) is an orthogonal projection matrix if and only if it is symmetric;

  • \(P\) is an orthogonal projection matrix if and only if its singular values are either \(1\) or \(0\) (see the sketch after this list);

    Refer to this or this for proof.

  • \(P\) is invertible only if it is the identity matrix (if \(P^{-1}\) exists, multiplying \(P^2 = P\) on the left by \(P^{-1}\) gives \(P = I\)).
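The first two properties can be spot-checked numerically; a sketch contrasting an orthogonal projection matrix with the non-symmetric example above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Orthogonal projection: symmetric, singular values in {0, 1}.
A = rng.standard_normal((4, 2))
P_orth = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(np.linalg.svd(P_orth, compute_uv=False), 6))  # [1. 1. 0. 0.]

# Oblique (non-symmetric) projection: still idempotent, but one
# singular value exceeds 1, so it can lengthen some vectors.
P_obl = np.array([[0.0, 1.0],
                  [0.0, 1.0]])
print(np.allclose(P_obl @ P_obl, P_obl))       # True
print(np.linalg.svd(P_obl, compute_uv=False))  # [1.41421356 0.]
```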
