# FLOPs

A FLOP is one "floating-point operation": a single addition, subtraction, multiplication, or division of two floating-point numbers.

# Basic Linear Algebra Subroutines (BLAS)

## BLAS Level 1: vector-vector operations

- inner product x^{T}y: 2n - 1 flops
- sum x + y and scaling \alpha x: n flops

## BLAS Level 2: matrix-vector operations

Matrix-vector product y = Ax with A \in \mathbb{R}^{m \times n}:

- standard multiply: m\left(2n - 1\right) flops
- 2N flops if A is sparse with N non-zero elements
- 2p\left(n+m\right) flops if A is given in factored form A = UV^{T}, where U \in \mathbb{R}^{m \times p} and V \in \mathbb{R}^{n \times p} (a rank-p factorization; this is distinct from a "banded" matrix, which is sparse everywhere except for a strip next to the diagonal)

## BLAS Level 3: matrix-matrix operations

Product AB with A \in \mathbb{R}^{m\times n} and B \in \mathbb{R}^{n \times p}:

- mp\left(2n-1\right) flops (on the order of 2mnp)
- less if A and/or B is sparse
- \frac{1}{2}m\left(m+1\right)\left(2n-1\right) \approx m^{2}n if m = p and AB is known to be symmetric

## BLAS speeds

- CPU, single thread: around 1-10 GFlops/s
- CPU, multi-threaded: 10-100 GFlops/s
- GPU: typically 100 GFlops/s - 1 TFlops/s

GPU implementations use "blocking" (divide and conquer).

# Solving Linear Equations

The solution of Ax = b is x = A^{-1}b, but no one actually materializes A^{-1}.

- general methods: O\left(n^{3}\right)
- for A with half-bandwidth k: O\left(k^{2}n\right)

For special matrix structures we can do better:

- lower-triangular: forward substitution
- upper-triangular: backward substitution
- orthogonal: 2n^{2} flops when computed via Householder transformations

## Factorization

Factor A as a product of simple matrices A = A_1 \dots A_{k}. Then we can compute:
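The saving from the factored form A = UV^{T} can be checked numerically. A minimal NumPy sketch (hypothetical random data; dimensions m, n, p chosen for illustration) showing that U(V^{T}x) gives the same result as Ax while costing 2p(m+n) rather than roughly 2mn flops:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 500, 400, 10

# Rank-p factored matrix A = U V^T
U = rng.standard_normal((m, p))
V = rng.standard_normal((n, p))
A = U @ V.T
x = rng.standard_normal(n)

# Dense multiply: ~2mn flops
y_dense = A @ x

# Factored multiply: ~2p(n+m) flops -- same result, far cheaper when p is small
y_factored = U @ (V.T @ x)

assert np.allclose(y_dense, y_factored)
```

For these dimensions the factored route does about 2p(n+m) = 18,000 flops versus roughly 2mn = 400,000 for the dense product.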

\begin{equation} x = A^{-1}b = A_{k}^{-1} \dots A_{1}^{-1} b \end{equation}
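This factor-then-solve pattern can be sketched with SciPy's LU routines (a hypothetical example on random data; `lu_factor`/`lu_solve` are the assumed API). The expensive factorization is done once and the cheap triangular solves are reused for every right-hand side:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
n = 300
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, 5))   # five right-hand sides

# Factor once: O(n^3); this step dominates the total cost
lu, piv = lu_factor(A)

# Solving reuses the cached factors: O(n^2) per right-hand side
X = lu_solve((lu, piv), B)

assert np.allclose(A @ X, B)
```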

The cost of the solver is dominated by the factoring, not by the actual solving. Because most of the time goes into the factorization, which can be cached and reused, we can solve a modest number of equations with the same coefficient matrix at essentially the same cost as solving one.

## LU Factorization

Every nonsingular matrix A can be factored as:

\begin{equation} A = PLU \end{equation}

where P is a permutation matrix, L is lower triangular, and U is upper triangular. Solving procedure:

1. LU factorization: A = PLU
2. permutation: solve Pz_1 = b, i.e. z_1 = P^{T}b
3. forward substitution: solve Lz_2 = z_1
4. backward substitution: solve Ux = z_2

## Sparse LU Factorization

For A sparse and invertible, we can factor it into four matrices, A = P_1 L U P_2. Applying both row and column permutations allows L and U to be sparse; P_1 and P_2 are chosen heuristically to yield sparse L and U.

## Cholesky Factorization

Suppose A is positive definite. Then A can be factored as A = LL^{T}, where L is lower triangular with positive diagonal entries.

## Sparse Cholesky Factorization

For sparse positive definite A, factor as A = PLL^{T}P^{T}; adding the permutation matrix P makes L sparser. This is the same as permuting the rows and columns of A to get \tilde{A} = P^{T}AP and then computing the ordinary Cholesky factorization of \tilde{A}. Interestingly, P can be chosen from the sparsity pattern of A alone, without looking at the numerical values.

## LDL^{T} Factorization

Every nonsingular symmetric matrix A can be factored as:

\begin{equation} A = PLDL^{T} P^{T} \end{equation}

with P a permutation matrix, L lower triangular, and D block diagonal with 1 \times 1 or 2 \times 2 diagonal blocks.

# Structured Sub-Blocks

For:

\begin{equation} \mqty(A_{11} & A_{12} \\ A_{21} &A_{22}) \mqty(x_1 \\ x_2) = \mqty(b_1 \\ b_2) \end{equation}

We can solve by block elimination:

1. Form A_{11}^{-1}A_{12} and A_{11}^{-1}b_1 (factor A_{11} once, then solve with each column of A_{12} and with b_1)
2. Form the Schur complement S = A_{22} - A_{21}A^{-1}_{11} A_{12} and \tilde{b} = b_2 - A_{21} A_{11}^{-1}b_1
3. Determine x_2 by solving S x_2 = \tilde{b}
4. Determine x_1 by solving A_{11}x_1 = b_1 - A_{12} x_2

# Matrix Inversion Lemma

\begin{equation} \left(A+BC\right)^{-1} = A^{-1} - A^{-1}B\left(I + CA^{-1}B\right)^{-1}CA^{-1} \end{equation}
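The block-elimination steps above can be sketched in NumPy (a hypothetical example; the diagonal shifts are only there to keep A_{11} and the Schur complement well conditioned, and `np.linalg.solve` stands in for factor-and-solve):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 4, 3

# Hypothetical random block system
A11 = rng.standard_normal((n1, n1)) + 5 * np.eye(n1)
A12 = rng.standard_normal((n1, n2))
A21 = rng.standard_normal((n2, n1))
A22 = rng.standard_normal((n2, n2)) + 5 * np.eye(n2)
b1 = rng.standard_normal(n1)
b2 = rng.standard_normal(n2)

# Step 1: factor A11 once, solve against [A12 | b1] in one call
Z = np.linalg.solve(A11, np.column_stack([A12, b1]))
A11inv_A12, A11inv_b1 = Z[:, :n2], Z[:, n2]

# Step 2: Schur complement and modified right-hand side
S = A22 - A21 @ A11inv_A12
b_tilde = b2 - A21 @ A11inv_b1

# Steps 3-4: solve for x2, then back-substitute for x1
x2 = np.linalg.solve(S, b_tilde)
x1 = np.linalg.solve(A11, b1 - A12 @ x2)

# Verify against the full block system
A = np.block([[A11, A12], [A21, A22]])
assert np.allclose(A @ np.concatenate([x1, x2]), np.concatenate([b1, b2]))
```

The payoff comes when A_{11} has special structure (sparse, banded, diagonal): only the smaller Schur system S x_2 = \tilde{b} needs a general dense solve.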
