In this section, we see some issues that can arise when computers use the Gauss-Jordan Elimination algorithm, and we learn some of what computers do to compute solutions more efficiently, especially when we want to solve equations \(AX = B\) for several different column matrices \(B\text{.}\)
Computers are very fast at performing computations, but sometimes our brains are still better. For instance, we know that \(0.1 + 0.2 = 0.3\text{,}\) but execute the code below to see what Python says.
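A minimal check along these lines (a sketch of the comparison; the actual cell in the text may differ) is:

print(0.1 + 0.2)          # prints 0.30000000000000004, not 0.3
print(0.1 + 0.2 == 0.3)   # prints False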
Running the code above shows that, in Python, the computer does not get the same answer for 0.1 + 0.2 as for 0.3. Computers perform arithmetic using base 2 numbers, which means that numbers we enter in decimal form, such as \(0.1\text{,}\) must be converted to base 2. Even though 0.1 has a simple decimal (base 10) form, its representation in base 2 is the repeating binary expansion
\begin{align*}
0.1_{10} = 0.000110011001100\ldots_2 = 0.0\overline{0011}_2\text{.}
\end{align*}
To accurately represent this number inside a computer would require infinitely many digits, but a computer can only hold a finite number of digits. Thus, computers are necessarily using an approximation when internally storing most numbers.
Because computers use a finite number of digits to approximate numbers, they are also prone to round off errors. Gauss-Jordan elimination, when applied to an \(n\times n\) matrix, requires approximately \(\frac 23 n^3\) operations of multiplying and adding numbers. The examples we have seen are small enough to do by hand, but applications in computer graphics, machine learning, and many other areas can easily involve hundreds or thousands of variables. If we have a \(1000\times1000\) matrix, performing the Gauss-Jordan algorithm would take roughly a billion operations, since \(\frac 23 (1000)^3 \approx 6.7\times 10^8\text{,}\) and any error introduced in an operation early on could accumulate and compound as we use that erroneous number in the next operation, and the next, and the next.
There are a few things computers do to mitigate the issue of numerical errors. You may have noticed that when we perform Gauss-Jordan elimination by hand, we primarily use only two of the three row operations: scaling and multiply-and-add. We swap rows only during the forward steps, and only if there is a \(0\) in a position we need to use to eliminate the nonzero numbers still appearing below it in the same column.
However, when computers perform the forward steps, they swap rows so that they are using the number with the largest absolute value to eliminate the other entries in the same column. This is called partial pivoting, and it helps to reduce the number and magnitude of round off errors.
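As a rough illustration of that idea (not the exact routine a numerical library uses; the function name and structure here are just for demonstration), forward elimination with partial pivoting might be sketched in Python as:

import numpy as np

def forward_eliminate_partial_pivot(A, b):
    """Reduce the augmented matrix [A | b] to upper triangular form,
    swapping rows so the largest available entry is used as the pivot.
    An illustrative sketch, not production code."""
    M = np.hstack([A.astype(float), b.astype(float).reshape(-1, 1)])
    n = M.shape[0]
    for col in range(n):
        # Partial pivoting: among the rows at or below this one, pick the row
        # whose entry in this column has the largest absolute value, and swap it up.
        pivot_row = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot_row]] = M[[pivot_row, col]]
        # Eliminate the entries below the pivot with multiply-and-add operations.
        for row in range(col + 1, n):
            M[row] -= (M[row, col] / M[col, col]) * M[col]
    return M

A = np.array([[0.0, 2.0], [1.0, 1.0]])   # a made-up example needing a swap
b = np.array([1.0, 3.0])
print(forward_eliminate_partial_pivot(A, b))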
Another thing computers do to mitigate operational errors is rewrite the coefficient matrix as a product of matrices which are easier to work with, not unlike in algebra when we solve a quadratic equation by factoring it. We are going to use triangular matrices in this section, but there are many other ways to factor, or decompose, a matrix, and each factorization is useful for different applications and objectives.
A matrix which is both lower triangular and upper triangular must have 0's in every entry below and above the main diagonal. It can have any number, zero or nonzero, in the remaining positions, which are the entries on the main diagonal itself.
Matrices that are both upper and lower triangular therefore have the form
\begin{align*}
\begin{bmatrix} d_1 \amp 0 \amp \cdots \amp 0 \\ 0 \amp d_2 \amp \cdots \amp 0 \\ \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp 0 \amp \cdots \amp d_n \end{bmatrix}\text{.}
\end{align*}
There is a nonzero entry above the main diagonal, in the first row and third column, and also below the main diagonal in the second row and first column. This matrix is not triangular.
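If we wanted to check triangularity with a computer, one quick illustrative way is to compare a matrix with its upper or lower triangular part using NumPy; the matrix below is a hypothetical one matching the description above:

import numpy as np

M = np.array([[1, 0, 4],
              [2, 3, 0],
              [0, 0, 5]])   # hypothetical: nonzero at row 1, column 3 and at row 2, column 1

print(np.array_equal(M, np.triu(M)))   # False, so M is not upper triangular
print(np.array_equal(M, np.tril(M)))   # False, so M is not lower triangular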
Recall Example 2.5.1 and Example 2.5.4, where we used elementary matrices to row reduce a matrix. In LU-decomposition, we are going to stop after the forward steps and not continue to full reduced row echelon form.
We know we could do this by augmenting and row reducing, or by calculating the inverse of the coefficient matrix. Let's see yet another way to solve this system.
Take the coefficient matrix \(A=\begin{bmatrix} 1 \amp 2 \\ 3 \amp 5 \end{bmatrix}\) and use an elementary matrix to perform the first step in Gauss-Jordan elimination, which is \(-3R_1+R_2 \rightarrow R_2\text{.}\) Then \(E_1=\begin{bmatrix} 1 \amp 0 \\ -3 \amp 1 \end{bmatrix}\) and \(E_1A\) is
\begin{align*}
E_1A=\begin{bmatrix} 1 \amp 0 \\ -3 \amp 1 \end{bmatrix}\begin{bmatrix} 1 \amp 2 \\ 3 \amp 5 \end{bmatrix} = \begin{bmatrix} 1 \amp 2 \\ 0 \amp -1 \end{bmatrix}\text{.}
\end{align*}
Notice that \(E_1\) is lower-triangular and \(E_1A\) is upper-triangular. Define \(U=E_1A=\begin{bmatrix} 1 \amp 2 \\ 0 \amp -1 \end{bmatrix}\text{.}\)
In our work in Section 2.5 as a whole and in Activity 2.5.5 in particular, we saw that every elementary matrix is invertible. In fact, \(E_1^{-1}=\begin{bmatrix} 1 \amp 0 \\ 3 \amp 1 \end{bmatrix}\text{,}\) which is also lower-triangular. Define \(L=E_1^{-1}\text{.}\)
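As a quick sanity check (a small NumPy sketch, separate from the text's own computations), we can verify that this \(L\) and \(U\) multiply back to \(A\text{:}\)

import numpy as np

A = np.array([[1, 2], [3, 5]])
L = np.array([[1, 0], [3, 1]])   # E_1 inverse, lower triangular
U = np.array([[1, 2], [0, -1]])  # E_1 A, upper triangular

print(L @ U)                      # reproduces A
print(np.array_equal(L @ U, A))   # True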
How has this helped us? A linear system with a triangular coefficient matrix does not need to be row reduced any further. It is more efficient to solve such a system with substitution, coming full-circle to what we did back in Section 1.1 before we ever learned about elimination and row operations.
We are trying to solve the linear system \(AX=B\text{,}\) and we have seen how to rewrite \(A\) as \(LU\text{,}\) so we are now trying to solve \(LUX=B\text{.}\) We know that \(U\) is a \(2\times 2\) matrix and \(X\) is a \(2\times 1\) matrix. That means that \(UX\) is also a \(2\times 1\) matrix. For the moment, let's define \(Y=UX=\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}\text{,}\) so that the system we're trying to solve is \(LY=B\text{.}\)
To see how this is easier to solve, look at the first row of \(LY=B\text{.}\) We see that \(y_1+0y_2=-1\text{,}\) which means that \(y_1=-1\text{.}\) The equation from the second row is
Why would we want to do this when we already have multiple other ways to solve a linear system? For small systems, we wouldn't. However, computers tend to use LU-decomposition when you ask them to solve matrix systems, for the following reason. Suppose we had a \(1000\times 1000\) matrix \(A\) and needed to solve two systems \(AX=B_1\) and \(AX=B_2\text{.}\) Solving the first system would take about a billion operations, whether we used Gauss-Jordan elimination or LU-decomposition. But if we used LU-decomposition and already had \(A=LU\text{,}\) then solving the second system would only take about a million more operations, instead of another billion to perform all of Gauss-Jordan elimination again on the second system. That is a significant savings of time and memory, and there are also orders of magnitude fewer opportunities for numerical errors to propagate.
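In Python, that workflow might look roughly like the following SciPy sketch; the matrix and right-hand sides here are made up for illustration:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1.0, 2.0], [3.0, 5.0]])
b1 = np.array([1.0, 2.0])   # hypothetical right-hand sides
b2 = np.array([4.0, 7.0])

# Factor A once (with partial pivoting) ...
lu, piv = lu_factor(A)

# ... then solve A x = b for as many right-hand sides as we like;
# each solve costs far less than refactoring A from scratch.
x1 = lu_solve((lu, piv), b1)
x2 = lu_solve((lu, piv), b2)
print(x1, x2)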
We could perform one more row operation to scale the third row and put a 1 in the last position, but this is already an upper triangular matrix so we can just stop here and then
If a square matrix doesn't require any interchanging of rows to put it into reduced row echelon form, then it can be written as the product of a lower triangular matrix \(L\) and an upper triangular matrix \(U\text{.}\)
Use elementary matrices \(E_1, E_2, \ldots, E_k\) to perform the forward steps of Gauss-Jordan elimination. This means you'll have something of the form \(E_k\cdots E_2E_1A=U\text{,}\) where \(U\) is upper triangular.
Elementary matrices which perform the forward steps of Gauss-Jordan elimination (scale or multiply-and-add) are all lower-triangular, and so are their inverses. The product of lower triangular matrices is still lower triangular.
To solve \(AX=B\text{,}\) we replace \(A\) with \(LU\) and then \(UX\) with \(Y\text{.}\) We can then solve the simpler systems \(LY=B\) and \(UX=Y\) with substitution.
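To make the two substitution passes concrete, here is an illustrative Python sketch; the helper names and the right-hand side are hypothetical, and \(L\) and \(U\) are the ones from the \(2\times 2\) example above:

import numpy as np

def solve_lower(L, b):
    """Forward substitution for L y = b, with L lower triangular. Illustrative sketch."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def solve_upper(U, y):
    """Back substitution for U x = y, with U upper triangular. Illustrative sketch."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0], [3.0, 1.0]])
U = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([1.0, 2.0])     # a hypothetical right-hand side
y = solve_lower(L, b)        # first solve L y = b
x = solve_upper(U, y)        # then solve U x = y
print(x, L @ U @ x)          # L U x reproduces b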
The forward steps of Gauss-Jordan elimination are \(-R_1+R_3 \rightarrow R_3\text{,}\) then \(\frac{1}{2}R_2 \rightarrow R_2\text{,}\) then \(-2R_2+R_3 \rightarrow R_3\text{.}\)
---
Using elementary matrices to perform the row operations means that \(E_3E_2E_1A=U\text{,}\) where \(E_1\text{,}\) \(E_2\text{,}\) and \(E_3\) are the elementary matrices for the three row operations above, applied in that order.
\begin{align*}
L_1=\begin{bmatrix} a \amp 0 \amp 0 \\ b \amp c \amp 0 \\ d \amp e \amp f \end{bmatrix}, \amp \amp \amp \amp L_2=\begin{bmatrix} g \amp 0 \amp 0 \\ h \amp i \amp 0 \\ j \amp k \amp l \end{bmatrix} \text{.}
\end{align*}
Calculate the product \(L_1L_2\) (or enough of it) and explain why \(L_1L_2\) is lower triangular.
Recall that we defined \(L\) in terms of elementary matrices, which are invertible, but all we've said about \(U\) is that it is the result of applying some row operations.
A matrix obtained from the identity matrix by swapping rows some number of times is called a permutation matrix, and it is always invertible: its inverse performs the same swaps in the reverse order. When computers solve \(AX=B\text{,}\) they often perform a permutation \(P\) and find an LU-decomposition of \(PA\text{,}\) so that they are solving \(PAX=PB\) by solving \(LUX=PB\text{.}\)
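For comparison with the Sage computation asked for below, here is roughly how one could inspect such a factorization in Python with SciPy (a sketch with a made-up matrix; note that scipy.linalg.lu returns \(P\text{,}\) \(L\text{,}\) \(U\) with \(A = PLU\text{,}\) so the permutation applied to \(A\) in the sense above is \(P^{-1}=P^T\)):

import numpy as np
from scipy.linalg import lu

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 4.0, 3.0]])        # a made-up matrix for illustration

P, L, U = lu(A)                        # SciPy's convention: A = P @ L @ U
print(np.allclose(A, P @ L @ U))       # True

# In the notation above, the permutation applied to A is the inverse of P,
# which for a permutation matrix is just its transpose:
print(np.allclose(P.T @ A, L @ U))     # True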
Solve the system in Exercise 5 above again, this time by using Sage to calculate \(P,L\text{,}\) and \(U\text{.}\) Then by hand, calculate \(PB\) and use substitution to solve \(LY=PB\) and \(UX=Y\text{.}\) Verify that you get the same answer you got before.
Computers are fast and can store lots of information, but they are prone to numerical errors due to binary representations and the use of finitely many digits. We can mitigate the effect of numerical errors using partial pivoting and by reducing the number of operations needed where errors could spread.
Every square matrix which doesn't need any row swap operations to perform Gauss-Jordan elimination can be expressed as the product of a lower triangular matrix with an upper triangular matrix.