Theorem.
QR factorization
If $\boldsymbol{A}$ is an $m\times{n}$ matrix with linearly independent column vectors, then $\boldsymbol{A}$ can be factorized as:
$$\boldsymbol{A}=
\boldsymbol{QR}$$
Where:
- $\boldsymbol{Q}$ is an $m\times{n}$ matrix whose column vectors are orthonormal.
- $\boldsymbol{R}$ is an $n\times{n}$ invertible upper triangular matrix.
Proof. Let $\boldsymbol{A}$ be an $m\times{n}$ matrix with $n$ linearly independent column vectors and let $\boldsymbol{Q}$ be an $m\times{n}$ orthogonal matrixlink whose column vectors are constructed by applying the Gram-Schmidt process on the columns of $\boldsymbol{A}$. These matrices are shown below:
$$\boldsymbol{A}=
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{x}_1&\boldsymbol{x}_2&\cdots&\boldsymbol{x}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix},\;\;\;\;\;\;
\boldsymbol{Q}=
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{q}_1&\boldsymbol{q}_2&\cdots&\boldsymbol{q}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}$$
Note that for the Gram-Schmidt process to work, the column vectors of $\boldsymbol{A}$ must be linearly independent.
The set $\{\boldsymbol{q}_1,\boldsymbol{q}_2,
\cdots,\boldsymbol{q}_n\}$ is an orthonormal basislink for the column space of $\boldsymbol{A}$. Since each column vector of $\boldsymbol{A}$ lies in this space, by theoremlink each column vector can be expressed using this orthonormal basis like so:
$$\begin{align*}
\boldsymbol{x}_1
&=(\boldsymbol{x}_1\cdot\boldsymbol{q}_1)\boldsymbol{q}_1
+(\boldsymbol{x}_1\cdot\boldsymbol{q}_2)\boldsymbol{q}_2
+\cdots+(\boldsymbol{x}_1\cdot\boldsymbol{q}_n)\boldsymbol{q}_n\\
\boldsymbol{x}_2
&=(\boldsymbol{x}_2\cdot\boldsymbol{q}_1)\boldsymbol{q}_1
+(\boldsymbol{x}_2\cdot\boldsymbol{q}_2)\boldsymbol{q}_2
+\cdots+(\boldsymbol{x}_2\cdot\boldsymbol{q}_n)\boldsymbol{q}_n\\
&\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\\
\boldsymbol{x}_n&=(\boldsymbol{x}_n\cdot\boldsymbol{q}_1)\boldsymbol{q}_1
+(\boldsymbol{x}_n\cdot\boldsymbol{q}_2)\boldsymbol{q}_2
+\cdots+(\boldsymbol{x}_n\cdot\boldsymbol{q}_n)\boldsymbol{q}_n
\end{align*}$$
By theoremlink, this can be written as:
$$\begin{equation}\label{eq:LOwUNEaa0XTeKlhE1sn}
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{x}_1&\boldsymbol{x}_2&\cdots&\boldsymbol{x}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}=
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{q}_1&\boldsymbol{q}_2&\cdots&\boldsymbol{q}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}
\begin{pmatrix}
\boldsymbol{x}_1\cdot\boldsymbol{q}_1
&\boldsymbol{x}_2\cdot\boldsymbol{q}_1
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_1\\
\boldsymbol{x}_1\cdot\boldsymbol{q}_2&
\boldsymbol{x}_2\cdot\boldsymbol{q}_2
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_2\\
\vdots&\vdots&\smash\ddots&\vdots\\
\boldsymbol{x}_1\cdot\boldsymbol{q}_n
&\boldsymbol{x}_2\cdot\boldsymbol{q}_n
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_n
\end{pmatrix}
\end{equation}$$
By the propertylink of the Gram-Schmidt process, we have that:
$$\begin{align*}
&\boldsymbol{x}_1
\;\text{ is orthogonal to }\;
\boldsymbol{q}_2,\boldsymbol{q}_3,
\boldsymbol{q}_4,\cdots,\boldsymbol{q}_n\\
&\boldsymbol{x}_2
\;\text{ is orthogonal to }\;
\boldsymbol{q}_3,\boldsymbol{q}_4,\cdots,
\boldsymbol{q}_n\\
&\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots\\
&\boldsymbol{x}_{n-1}
\;\text{ is orthogonal to }\;
\boldsymbol{q}_n
\end{align*}$$
Therefore, \eqref{eq:LOwUNEaa0XTeKlhE1sn} becomes:
$$\begin{equation}\label{eq:YO3WKNRMjx6l4ScZ6pw}
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{x}_1&\boldsymbol{x}_2&\cdots&\boldsymbol{x}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}=
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{q}_1&\boldsymbol{q}_2&\cdots&\boldsymbol{q}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}
\begin{pmatrix}
\boldsymbol{x}_1\cdot\boldsymbol{q}_1
&\boldsymbol{x}_2\cdot\boldsymbol{q}_1
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_1\\
0&
\boldsymbol{x}_2\cdot\boldsymbol{q}_2
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_2\\
\vdots&\vdots&\smash\ddots&\vdots\\
0
&0&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_n
\end{pmatrix}
\end{equation}$$
Notice how the right-hand side is now a product of an orthogonal matrix $\boldsymbol{Q}$ and an upper triangular matrix, which we denote by $\boldsymbol{R}$. Let's rewrite \eqref{eq:YO3WKNRMjx6l4ScZ6pw} in shorthand:
$$\boldsymbol{A}=
\boldsymbol{QR}$$
Next, by theoremlink, we have that $\boldsymbol{x}_i\cdot\boldsymbol{q}_i=\Vert\boldsymbol{v}_i\Vert$ for $i=1,2,\cdots,n$, where $\boldsymbol{v}_i$ is the $i$-th orthogonal vector produced by the Gram-Schmidt process before normalization. Therefore, \eqref{eq:YO3WKNRMjx6l4ScZ6pw} can be written as:
$$\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{x}_1&\boldsymbol{x}_2&\cdots&\boldsymbol{x}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}=
\begin{pmatrix}
\vert&\vert&\cdots&\vert\\
\boldsymbol{q}_1&\boldsymbol{q}_2&\cdots&\boldsymbol{q}_n\\
\vert&\vert&\cdots&\vert\\
\end{pmatrix}
\begin{pmatrix}
\Vert\boldsymbol{v}_1\Vert
&\boldsymbol{x}_2\cdot\boldsymbol{q}_1
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_1\\
0&
\Vert\boldsymbol{v}_2\Vert
&\cdots&\boldsymbol{x}_n\cdot\boldsymbol{q}_2\\
\vdots&\vdots&\smash\ddots&\vdots\\
0&0&\cdots&\Vert\boldsymbol{v}_n\Vert
\end{pmatrix}$$
Note that each $\boldsymbol{v}_i$ is non-zero because the column vectors of $\boldsymbol{A}$ are linearly independent. This means that $\Vert\boldsymbol{v}_i\Vert\gt0$ for $i=1,2,\cdots,n$, so the diagonal entries of $\boldsymbol{R}$ are non-zero. By theoremlink, a triangular matrix with non-zero diagonal entries is invertible, so $\boldsymbol{R}$ is invertible. This completes the proof.
■
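The construction in this proof translates directly into an algorithm. Below is a minimal NumPy sketch of it, assuming (as the theorem requires) that the columns of $\boldsymbol{A}$ are linearly independent; the function name `gram_schmidt_qr` is our own, not a library routine.

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR factorization via the classical Gram-Schmidt process.

    A sketch: assumes the columns of A are linearly independent,
    exactly as the theorem requires.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # Entry x_j . q_i of R, then remove that component from v.
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)  # this is ||v_j||, the diagonal of R
        Q[:, j] = v / R[j, j]        # normalize v_j to get q_j
    return Q, R

# The matrix from the first example below.
A = np.array([[3.0, 2.0], [4.0, 6.0]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))  # True
```

Classical Gram-Schmidt is numerically fragile, so libraries typically compute $\boldsymbol{QR}$ via Householder reflections instead; the resulting factorization is the same up to signs.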
Theorem.
Invertible matrices can be QR-factorized
If $\boldsymbol{A}$ is an invertible matrixlink, then $\boldsymbol{A}$ can be $\boldsymbol{QR}$-factorized.
Proof. By theoremlink, if $\boldsymbol{A}$ is an invertible matrix, then the columns of $\boldsymbol{A}$ are linearly independent. By theoremlink, $\boldsymbol{A}$ can be $\boldsymbol{QR}$-factorized. This completes the proof.
■
Example.
Perform $\boldsymbol{QR}$ factorization on the following matrix:
$$\begin{pmatrix}
3&2\\
4&6
\end{pmatrix}$$
Solution. The first step is to obtain the orthogonal matrix $\boldsymbol{Q}$ by applying the Gram-Schmidt process to the columns of:
$$\boldsymbol{A}=
\begin{pmatrix}
3&2\\4&6
\end{pmatrix}$$
We start by setting $\boldsymbol{v}_1$ to the first column of $\boldsymbol{A}$:
$$\boldsymbol{v}_1=
\begin{pmatrix}
3\\4
\end{pmatrix}$$
Next, we obtain a vector $\boldsymbol{v}_2$ orthogonal to $\boldsymbol{v}_1$ by subtracting from the second column of $\boldsymbol{A}$ its projection onto $\boldsymbol{v}_1$:
$$\begin{align*}
\boldsymbol{v}_2&=
\begin{pmatrix}2\\6\end{pmatrix}-
\frac{(2)(3)+(6)(4)}{3^2+4^2}
\begin{pmatrix}3\\4\end{pmatrix}\\
&=\begin{pmatrix}2\\6\end{pmatrix}-
\frac{30}{25}
\begin{pmatrix}3\\4\end{pmatrix}\\
&=\begin{pmatrix}2\\6\end{pmatrix}-
\begin{pmatrix}18/5\\24/5\end{pmatrix}\\
&=\begin{pmatrix}-8/5\\6/5\end{pmatrix}
\end{align*}$$
Let's turn $\boldsymbol{v}_1$ and $\boldsymbol{v}_2$ into unit vectors $\boldsymbol{q}_1$ and $\boldsymbol{q}_2$ respectively. $\boldsymbol{q}_1$ is:
$$\begin{align*}
\boldsymbol{q}_1&=
\frac{1}{\sqrt{3^2+4^2}}
\begin{pmatrix}3\\4\end{pmatrix}\\
&=\frac{1}{5}
\begin{pmatrix}3\\4\end{pmatrix}\\
&=\begin{pmatrix}3/5\\4/5\end{pmatrix}
\end{align*}$$
Next, $\boldsymbol{q}_2$ is:
$$\begin{align*}
\boldsymbol{q}_2&=
\frac{1}{\sqrt{(-8/5)^2+(6/5)^2}}
\begin{pmatrix}-8/5\\6/5\end{pmatrix}\\
&=\frac{1}{2}
\begin{pmatrix}-8/5\\6/5\end{pmatrix}\\
&=\begin{pmatrix}-4/5\\3/5\end{pmatrix}
\end{align*}$$
Therefore, the associated orthogonal matrix of $\boldsymbol{A}$ is:
$$\boldsymbol{Q}=
\begin{pmatrix}3/5&-4/5\\4/5&3/5\end{pmatrix}$$
Next, the upper triangular matrix $\boldsymbol{R}$ is:
$$\begin{align*}
\boldsymbol{R}&=\begin{pmatrix}
\boldsymbol{x}_1\cdot\boldsymbol{q}_1&
\boldsymbol{x}_2\cdot\boldsymbol{q}_1\\
0&\boldsymbol{x}_2\cdot\boldsymbol{q}_2
\end{pmatrix}\\&=
\begin{pmatrix}
(3)(3/5)+(4)(4/5)&
(2)(3/5)+(6)(4/5)
\\0&(2)(-4/5)+(6)(3/5)
\end{pmatrix}\\&=
\begin{pmatrix}5&6\\0&2\end{pmatrix}
\end{align*}$$
Therefore, the $\boldsymbol{QR}$ factorization of $\boldsymbol{A}$ is:
$$\boldsymbol{A}=
\begin{pmatrix}3/5&-4/5\\4/5&3/5\end{pmatrix}
\begin{pmatrix}5&6\\0&2\end{pmatrix}$$
■
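As a quick sanity check, not part of the original example, we can confirm in NumPy that the factors above multiply back to $\boldsymbol{A}$ and that the columns of $\boldsymbol{Q}$ are orthonormal:

```python
import numpy as np

A = np.array([[3.0, 2.0], [4.0, 6.0]])
Q = np.array([[3/5, -4/5], [4/5, 3/5]])
R = np.array([[5.0, 6.0], [0.0, 2.0]])

print(np.allclose(Q @ R, A))            # True: QR reproduces A
print(np.allclose(Q.T @ Q, np.eye(2)))  # True: Q has orthonormal columns
```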
Example.
Perform $\boldsymbol{QR}$ factorization on the following matrix:
$$\boldsymbol{A}=\begin{pmatrix}
0&1\\1&0\\0&1
\end{pmatrix}
$$
Solution. We must first obtain the associated orthogonal matrix $\boldsymbol{Q}$ of $\boldsymbol{A}$ using the Gram-Schmidt process. Since the first column of $\boldsymbol{A}$ is already a unit vector, we take it directly as $\boldsymbol{q}_1$:
$$\boldsymbol{q}_1=
\begin{pmatrix}
0\\1\\0
\end{pmatrix}$$
Next, the vector $\boldsymbol{v}_2$ orthogonal to $\boldsymbol{q}_1$ is:
$$\begin{align*}
\boldsymbol{v}_2&=
\begin{pmatrix}1\\0\\1\end{pmatrix}
-\frac{(1)(0)+(0)(1)+(1)(0)}{(1)^2}
\begin{pmatrix}0\\1\\0\end{pmatrix}
=\begin{pmatrix}1\\0\\1\end{pmatrix}
\end{align*}$$
Let's convert $\boldsymbol{v}_2$ into a unit vector:
$$\boldsymbol{q}_2=
\frac{1}{\sqrt{1^2+1^2}}
\begin{pmatrix}1\\0\\1\end{pmatrix}=
\begin{pmatrix}1/\sqrt2\\0\\1/\sqrt2\end{pmatrix}$$
Therefore, the associated orthogonal matrix is:
$$\boldsymbol{Q}=
\begin{pmatrix}
0&1/\sqrt2\\1&0\\0&1/\sqrt2
\end{pmatrix}$$
The upper triangular matrix $\boldsymbol{R}$ is:
$$\begin{align*}
\boldsymbol{R}&=\begin{pmatrix}
\boldsymbol{x}_1\cdot\boldsymbol{q}_1&
\boldsymbol{x}_2\cdot\boldsymbol{q}_1\\
0&\boldsymbol{x}_2\cdot\boldsymbol{q}_2
\end{pmatrix}\\&=
\begin{pmatrix}
(0)(0)+(1)(1)+(0)(0)&(1)(0)+(0)(1)+(1)(0)\\
0&(1)(1/\sqrt2)+(0)(0)+(1)(1/\sqrt2)
\end{pmatrix}\\
&=\begin{pmatrix}1&0\\0&\sqrt2\end{pmatrix}
\end{align*}$$
Therefore, the $\boldsymbol{QR}$ factorization of $\boldsymbol{A}$ is:
$$\begin{pmatrix}
0&1\\1&0\\0&1
\end{pmatrix}=
\begin{pmatrix}
0&1/\sqrt2\\1&0\\0&1/\sqrt2
\end{pmatrix}
\begin{pmatrix}1&0\\0&\sqrt2\end{pmatrix}
$$
■
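Again as a check, the factors multiply back to $\boldsymbol{A}$. We can also compare against `numpy.linalg.qr`; note that library routines do not enforce a positive diagonal on $\boldsymbol{R}$, so their factors may differ from ours by sign flips:

```python
import numpy as np

A = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q = np.array([[0, 1/np.sqrt(2)], [1, 0], [0, 1/np.sqrt(2)]])
R = np.array([[1.0, 0.0], [0.0, np.sqrt(2)]])

print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(2)))  # True

# "reduced" mode returns the same m x n and n x n shapes as above;
# the library's factors may differ from ours only by sign flips.
Q2, R2 = np.linalg.qr(A, mode="reduced")
print(np.allclose(Q2 @ R2, A))          # True
```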
Theorem.
Uniqueness of QR factorization
If the diagonal entries of $\boldsymbol{R}$ are required to be positive, as in the Gram-Schmidt construction above, then the $\boldsymbol{QR}$ factorization of a matrix is unique. In other words, there is at most one such $\boldsymbol{QR}$ form of a matrix.
Proof. Suppose $\boldsymbol{A}$ can be $\boldsymbol{QR}$-factorized into $\boldsymbol{A}=\boldsymbol{Q}_1\boldsymbol{R}_1$ and $\boldsymbol{A}=\boldsymbol{Q}_2\boldsymbol{R}_2$. Equating the two expressions and rearranging, which is valid because $\boldsymbol{R}_2$ and $\boldsymbol{Q}_1$ are invertible, gives:
$$\begin{equation}\label{eq:ueU7wc2mRd2QCZaVdNM}
\begin{aligned}[b]
\boldsymbol{Q}_1\boldsymbol{R}_1&=
\boldsymbol{Q}_2\boldsymbol{R}_2\\
\boldsymbol{Q}_1\boldsymbol{R}_1\boldsymbol{R}_2^{-1}&=
\boldsymbol{Q}_2\\
\boldsymbol{R}_1\boldsymbol{R}_2^{-1}&=
\boldsymbol{Q}_1^{-1}\boldsymbol{Q}_2
\end{aligned}
\end{equation}$$
By theoremlink, because $\boldsymbol{R}_2$ is an upper triangular matrix, its inverse $\boldsymbol{R}^{-1}_2$ is also an upper triangular matrix. By theoremlink, $\boldsymbol{R}_1\boldsymbol{R}^{-1}_2$ is upper triangular because the product of two upper triangular matrices is also upper triangular.
Next, by theoremlink, the inverse of an orthogonal matrix is also orthogonal, which means $\boldsymbol{Q}^{-1}_1$ is orthogonal. By theoremlink, $\boldsymbol{Q}^{-1}_1\boldsymbol{Q}_2$ is also orthogonal because the product of two orthogonal matrices is orthogonal.
Therefore, the left-hand side of \eqref{eq:ueU7wc2mRd2QCZaVdNM} is upper triangular while the right-hand side is orthogonal. By theoremlink, upper triangular orthogonal matrices are diagonal matrices with diagonal entries $\pm1$. However, because the diagonal entries of $\boldsymbol{R}_1$ and $\boldsymbol{R}_2^{-1}$ are strictly positive, the diagonal entries of $\boldsymbol{R}_1\boldsymbol{R}^{-1}_2$ are also strictly positive by theoremlink. This means that:
$$\begin{equation}\label{eq:V8pFSzlUhOsQTF5m6aG}
\boldsymbol{R}_1\boldsymbol{R}_2^{-1}=
\boldsymbol{Q}_1^{-1}\boldsymbol{Q}_2=\boldsymbol{I}_n
\end{equation}$$
Where $\boldsymbol{I}_n$ is the $n\times{n}$ identity matrix. Now, \eqref{eq:V8pFSzlUhOsQTF5m6aG} implies:
$$\begin{align*}
\boldsymbol{R}_1\boldsymbol{R}_2^{-1}
&=\boldsymbol{I}_n\\
\boldsymbol{R}_1
&=\boldsymbol{R}_2
\end{align*}$$
Also, \eqref{eq:V8pFSzlUhOsQTF5m6aG} implies:
$$\begin{align*}
\boldsymbol{Q}_1^{-1}\boldsymbol{Q}_2
&=\boldsymbol{I}_n\\
\boldsymbol{Q}_2
&=\boldsymbol{Q}_1
\end{align*}$$
Because $\boldsymbol{R}_1=\boldsymbol{R}_2$ and $\boldsymbol{Q}_1=\boldsymbol{Q}_2$, we conclude that $\boldsymbol{QR}$-factorization is unique. This completes the proof.
■
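Uniqueness thus presumes the Gram-Schmidt convention that the diagonal entries of $\boldsymbol{R}$ are positive. A library QR that violates this convention can always be brought into the unique form by flipping signs, as in this sketch (the helper name `positive_diagonal_qr` is ours):

```python
import numpy as np

def positive_diagonal_qr(A):
    """QR with diag(R) > 0 -- the unique form discussed above."""
    Q, R = np.linalg.qr(A, mode="reduced")
    # Flip the sign of each column of Q (and the matching row of R)
    # wherever the diagonal entry of R is negative. Since the sign
    # matrix S satisfies S^2 = I, we still have (QS)(SR) = QR = A.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0  # defensive; cannot occur for independent columns
    return Q * signs, signs[:, None] * R

A = np.array([[3.0, 2.0], [4.0, 6.0]])
Q, R = positive_diagonal_qr(A)
print(R)  # [[5. 6.], [0. 2.]] -- matches the hand computation above
```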
Theorem.
Using QR factorization to obtain the least squares solution
Suppose $\boldsymbol{A}$ can be $\boldsymbol{QR}$-factorized. For any $\boldsymbol{b}\in\mathbb{R}^m$, the system $\boldsymbol{Ax}=\boldsymbol{b}$ has a unique least squares solution given by:
$$\boldsymbol{x}=
\boldsymbol{R}^{-1}
\boldsymbol{Q}^T\boldsymbol{b}$$
Proof. Recalllink that the least squares solution of the system $\boldsymbol{Ax}=\boldsymbol{b}$ is given by:
$$\boldsymbol{x}
=(\boldsymbol{A}^T\boldsymbol{A})^{-1}
\boldsymbol{A}^T\boldsymbol{b}$$
Now, we substitute $\boldsymbol{A}=\boldsymbol{QR}$ to get:
$$\begin{align*}
\boldsymbol{x}
&=\big((\boldsymbol{QR})^T(\boldsymbol{QR})\big)
^{-1}(\boldsymbol{QR})^T\boldsymbol{b}\\
&=\big(\boldsymbol{R}^T\boldsymbol{Q}^T\boldsymbol{QR}\big)^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\boldsymbol{b}\\
&=\big(\boldsymbol{R}^T\boldsymbol{R}\big)^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\boldsymbol{b}\\
&=\boldsymbol{R}^{-1}
(\boldsymbol{R}^{T})^{-1}\boldsymbol{R}^T\boldsymbol{Q}^T\boldsymbol{b}\\
&=\boldsymbol{R}^{-1}\boldsymbol{Q}^T\boldsymbol{b}
\end{align*}$$
Here, we used the fact that $\boldsymbol{Q}^T\boldsymbol{Q}=\boldsymbol{I}_n$ because the columns of $\boldsymbol{Q}$ are orthonormal, as well as theoremlink and theoremlink. This completes the proof.
■
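In practice, rather than explicitly inverting $\boldsymbol{R}$, one computes $\boldsymbol{Q}^T\boldsymbol{b}$ and then solves the triangular system $\boldsymbol{Rx}=\boldsymbol{Q}^T\boldsymbol{b}$ by back substitution. Here is a minimal sketch, with `lstsq_via_qr` being our own illustrative name:

```python
import numpy as np

def lstsq_via_qr(A, b):
    """Least squares solution x = R^{-1} Q^T b.

    A sketch assuming A has linearly independent columns.
    """
    Q, R = np.linalg.qr(A, mode="reduced")
    y = Q.T @ b
    # Solve Rx = y by back substitution instead of inverting R.
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[3.0, 2.0], [4.0, 6.0]])
b = np.array([1.0, 2.0])
print(lstsq_via_qr(A, b))  # [0.2 0.2] (exact, since this A is invertible)
```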
Example.
Solving the least squares problem using QR factorization
Let's revisit examplelink in which we found the least squares solution to the system $\boldsymbol{Ax}=\boldsymbol{b}$ where:
$$\boldsymbol{A}=\begin{pmatrix}
0&1\\1&0\\0&1
\end{pmatrix},\;\;\;\;\;
\boldsymbol{b}=
\begin{pmatrix}
1\\2\\2
\end{pmatrix}
$$
Find the least squares solution using $\boldsymbol{QR}$ factorization.
Solution. Previously, we found the least squares solution by solving:
$$\begin{equation}\label{eq:YwRlrMsBKoQdhcvtdix}
\boldsymbol{x}
=(\boldsymbol{A}^T\boldsymbol{A})^{-1}\boldsymbol{A}^T\boldsymbol{b}
\end{equation}$$
The least squares solution was:
$$\begin{equation}\label{eq:tfLTHmD6bgmLeH74dOt}
\boldsymbol{x}=\begin{pmatrix}2\\1.5\end{pmatrix}
\end{equation}$$
Let's arrive at the same solution using $\boldsymbol{QR}$ factorization. As found in examplelink, the $\boldsymbol{QR}$ factorization of $\boldsymbol{A}$ is:
$$\begin{pmatrix}
0&1\\1&0\\0&1
\end{pmatrix}=
\begin{pmatrix}
0&1/\sqrt2\\1&0\\0&1/\sqrt2
\end{pmatrix}
\begin{pmatrix}1&0\\0&\sqrt2\end{pmatrix}
$$
In this case, $\boldsymbol{R}$ is a diagonal matrix. By theoremlink, the inverse of $\boldsymbol{R}$ is the diagonal matrix whose diagonal entries are the reciprocals of those of $\boldsymbol{R}$:
$$\boldsymbol{R}^{-1}=
\begin{pmatrix}1&0\\0&1/\sqrt2\end{pmatrix}$$
By theoremlink, the least squares solution is:
$$\begin{align*}
\boldsymbol{x}&=
\boldsymbol{R}^{-1}
\boldsymbol{Q}^T\boldsymbol{b}\\
&=\begin{pmatrix}1&0\\0&1/\sqrt2\end{pmatrix}
\begin{pmatrix}0&1&0\\1/\sqrt2&0&1/\sqrt2\end{pmatrix}
\begin{pmatrix}1\\2\\2\end{pmatrix}\\
&=\begin{pmatrix}
2\\1.5
\end{pmatrix}
\end{align*}$$
Great, this aligns with \eqref{eq:tfLTHmD6bgmLeH74dOt}.
■
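For completeness, the same arithmetic in NumPy, as a check on the hand computation:

```python
import numpy as np

Q = np.array([[0.0, 1/np.sqrt(2)], [1.0, 0.0], [0.0, 1/np.sqrt(2)]])
R_inv = np.array([[1.0, 0.0], [0.0, 1/np.sqrt(2)]])
b = np.array([1.0, 2.0, 2.0])

print(R_inv @ Q.T @ b)  # [2.  1.5]
```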
Benefits of using QR factorization
We won't go into the technical details here, but it turns out that solving certain problems, such as the least squares problem, via $\boldsymbol{QR}$ factorization offers higher numerical accuracy. In contrast, computing \eqref{eq:YwRlrMsBKoQdhcvtdix} directly on a computer may produce inaccurate results due to round-off errors. In addition, an iterative algorithm called the $\boldsymbol{QR}$ algorithm, which is based on $\boldsymbol{QR}$ factorization, is widely used to efficiently find the eigenvalues of large matrices.
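To illustrate the accuracy claim, here is a small experiment of our own, not from the original text, using a nearly rank-deficient matrix. Forming $\boldsymbol{A}^T\boldsymbol{A}$ squares the condition number of $\boldsymbol{A}$, so the normal-equations route loses roughly twice as many digits as the $\boldsymbol{QR}$ route:

```python
import numpy as np

eps = 1e-7
A = np.array([[1.0, 1.0], [eps, 0.0], [0.0, eps]])
b = A @ np.array([1.0, 1.0])  # exact least squares solution is (1, 1)

# Normal equations: cond(A^T A) = cond(A)^2, so accuracy suffers.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR route: x = R^{-1} Q^T b.
Q, R = np.linalg.qr(A, mode="reduced")
x_qr = np.linalg.solve(R, Q.T @ b)

print(x_normal)  # noticeably off from (1, 1)
print(x_qr)      # accurate to near machine precision
```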
We won't go into the technical details here but it turns out that solving certain problems such as the least squares problem via $\boldsymbol{QR}$ factorization offers higher numerical accuracy. In contrast, the standard approach of computing \eqref{eq:YwRlrMsBKoQdhcvtdix} using computer programs is that the result may be inaccurate due to rounding-off errors. In addition, an iterative algorithm called the $\boldsymbol{QR}$ algorithm, which is based on $\boldsymbol{QR}$ factorization, is also widely used to find eigenvalues of larger matrices efficiently.