Numerical Linear Algebra: QR, Hessenberg, and Schur

This post follows Golub and Van Loan, introducing Householder reflections and Givens rotations, then using these tools to sketch out implementations of the QR, Hessenberg, and Schur decompositions.


import numpy as np
np.set_printoptions(precision=3, linewidth=132, suppress=True)

First, create some test data:


# Set dimensions and create identity matrices
m,n = (8,6)
I_m = np.identity(m)
I_n = np.identity(n)

# Seed the RNG.
rng = np.random.default_rng(111)

# A: a "tall" m x n matrix
data = rng.negative_binomial(10, 0.25, size=m*n)    
A0 = np.array(data, dtype=np.float64).reshape((m,n))

# B: a square n x n matrix
data = rng.negative_binomial(10, 0.25, size=n*n)
B0 = np.array(data, dtype=np.float64).reshape((n,n))

QR Factorization

house computes the ingredients of a Householder matrix: given a vector $x$, it returns a vector $v$ (normalized so that $v_0 = 1$) and a scalar $\beta$ such that $P = I - \beta v v^{\top}$ zeros out all but the first component of $x$.


def house(x):
  """
  Golub and Van Loan, 3rd Ed.  
  Algorithm 5.1.1
  """
  sigma = x[1:] @ x[1:]            # squared norm of the trailing part of x
  v = np.concatenate([[1], x[1:]])

  if sigma == 0:
    beta = 0
  else:
    mu = np.sqrt(x[0]**2 + sigma)  # mu = ||x||
    if x[0] <= 0:
      v[0] = x[0] - mu
    else:
      # Algebraically equal to x[0] - mu, rewritten to avoid cancellation
      v[0] = -sigma / (x[0] + mu)
    beta = 2 * v[0]**2 / (sigma + v[0]**2)
    v = v / v[0]                   # normalize so that v[0] == 1

  return v, beta

When we apply the above to the first column of A, we obtain a vector that has zeros in all but the first component.


# A Householder matrix to zero out elements of the first column of A

# Let `x0` be the first column of A
x0 = A0[:,0].copy()

# Compute the householder matrix
v, beta = house(x0)
P0 = (I_m - beta * np.outer(v,v))

# Its effect: zeroes!
P0 @ x0
array([102.113,   0.   ,   0.   ,   0.   ,  -0.   ,  -0.   ,  -0.   ,   0.   ])

A second Householder matrix can be applied, iteratively, to the next column of A. In the context of a QR factorization, it need only introduce zeros below the diagonal.


P1 = np.identity(m)

# Let `x1` be the next column of A. 
x1 = A0[:,1].copy()

# A Householder matrix to introduce zeros below the diagonal in the second column.
# At this second iteration, we work with the updated column P0 @ x1:
v1, b1 = house((P0 @ x1)[1:m])
P1[1:m, 1:m] = np.identity(m-1) - b1 * np.outer(v1, v1)

# P1 @ P0 @ A0
P1 @ P0 @ A0
array([[102.113,  75.671,  65.506,  74.163,  47.115,  94.631],
       [  0.   ,  65.642,  41.91 ,  42.107,  28.85 ,  30.943],
       [ -0.   ,   0.   ,   3.837,   8.791,   2.362,   2.244],
       [  0.   ,  -0.   , -14.007, -21.777, -11.129,  -9.395],
       [ -0.   ,   0.   ,  44.901,  -5.048,  10.737,  -7.839],
       [  0.   ,   0.   , -10.154,   8.878,  12.918, -16.552],
       [ -0.   ,  -0.   , -15.09 ,   5.039,  -6.707,  -8.635],
       [  0.   ,   0.   ,  20.092,  10.613,   3.509,  17.952]])

With more efficient bookkeeping, the above application of Householder matrices takes the form:


"""
Golub and Van Loan, 3rd Ed.  
Algorithm 5.2.1: Householder QR
"""

# The "R" in a "QR" factorization
A = A0.copy()
R = A                  # R aliases A; the factorization overwrites it in place
V = np.zeros_like(A)   # stores the trailing parts of the Householder vectors
beta = np.zeros(n)

for j in range(n):
  x = A[j:m, j]
  v, beta[j] = house(x)

  R[j:m, j:n] = R[j:m, j:n] - beta[j] * np.outer(v, v @ A[j:m, j:n])

  if j < m:
    R[(j+1):m, j] = 0
    V[(j+1):m, j] = v[1:]   # v[0] == 1 is implicit, so only v[1:] is stored

# The "Q" in "QR" factorization
Q = np.identity(m)
for j in reversed(range(n)):
  v = V[j:m,j].copy()
  v[0] = 1
  Q[j:m,j:m] = Q[j:m,j:m] - beta[j] * np.outer(v, v @ Q[j:m,j:m])
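Accumulating Q in reverse order is what Golub and Van Loan call backward accumulation: since Q starts as the identity, the first few passes of this loop touch only a small trailing block of Q, and the working submatrix grows as j decreases. This is cheaper than applying the reflections in forward order to a full m x m matrix.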

# Q is unitary
np.allclose(I_m, Q.T @ Q)
True

# R is upper triangular
R
array([[102.113,  75.671,  65.506,  74.163,  47.115,  94.631],
       [  0.   ,  65.642,  41.91 ,  42.107,  28.85 ,  30.943],
       [  0.   ,   0.   ,  54.419,   2.924,  12.635,   8.219],
       [  0.   ,   0.   ,   0.   ,  28.023,  10.352,   8.562],
       [  0.   ,   0.   ,   0.   ,   0.   ,  14.215, -18.268],
       [  0.   ,   0.   ,   0.   ,   0.   ,   0.   ,  18.734],
       [  0.   ,   0.   ,   0.   ,   0.   ,   0.   ,   0.   ],
       [  0.   ,   0.   ,   0.   ,   0.   ,   0.   ,   0.   ]])

# Q @ R = A0
np.allclose(Q @ R, A0)
True
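
As an optional cross-check against NumPy's built-in factorization (an aside, not part of GVL's algorithm): a QR factorization is unique only up to sign flips of the rows of R, with matching flips in the columns of Q, so we compare magnitudes of the reduced factor.


# NumPy's reduced factorization: Rn is n x n; our R is m x n with zero rows below
Qn, Rn = np.linalg.qr(A0)

# Rows may differ in sign, so compare absolute values
np.allclose(np.abs(Rn), np.abs(R[:n, :]))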

Hessenberg Reduction

A Hessenberg reduction produces a matrix that is almost upper triangular: its only non-zero entries below the diagonal lie on the first sub-diagonal. It is obtained by applying a sequence of Householder matrices to both sides of the original, square matrix:

$$ U^{\top} B U = H $$

where $U$ is unitary, and $H$ is upper Hessenberg.


"""
Golub and Van Loan, 3rd Ed.  
Algorithm 7.4.2: Householder Reduction to Hessenberg Form
"""

H = B0.copy()

V = np.zeros_like(H)
beta = np.zeros(n-2)

for k in range(n-2):
  x = H[(k+1):n,k].copy()
  v, beta[k] = house(x)

  V[(k+1):n,k] = v

  # Apply the reflection from the left (to rows k+1:n), then from the right
  H[(k+1):n,k:n] = H[(k+1):n,k:n] - beta[k] * np.outer(v, v @ H[(k+1):n,k:n])
  H[0:n,(k+1):n] = H[0:n,(k+1):n] - beta[k] * np.outer(H[0:n,(k+1):n] @ v, v)

# The "U" in U' @ B0 @ U = H
U = np.identity(n)
for j in reversed(range(n-2)):
  # V[j,j] is zero, so the update below acts as the identity on row j and
  # as the Householder reflection on rows j+1:n
  v = V[j:n,j]
  U[j:n,j:n] = U[j:n,j:n] - beta[j] * np.outer(v, v @ U[j:n,j:n])

H
array([[ 57.   ,  62.449,  17.459,  15.444,  15.648,   3.3  ],
       [ 77.006, 140.441,  17.814,  23.75 ,  -3.11 , -12.1  ],
       [  0.   ,  38.265,  12.419,   2.275,  12.307,   9.983],
       [  0.   ,   0.   ,  18.178,   3.63 , -22.4  , -15.5  ],
       [  0.   ,  -0.   ,  -0.   ,  11.187, -12.616,  -2.537],
       [  0.   ,   0.   ,  -0.   ,  -0.   ,  -0.744,   4.125]])

# U' @ B0 @ U = H 
np.allclose(U.T @ B0 @ U, H)
True

# U' @ U = I
np.allclose(U.T @ U, np.identity(n))
True
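
We can also confirm the Hessenberg structure directly: every entry strictly below the first sub-diagonal should vanish.


# np.tril(H, -2) keeps only the entries below the first sub-diagonal
np.allclose(np.tril(H, -2), 0)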

Real Schur Decomposition

A real Schur decomposition goes further, producing an eigenvalue-revealing structure. It can be computed using Givens rotations, which, like Householder reflections, introduce zeros into their target matrices.

The givens function returns a cosine-sine pair $(c, s)$ such that the corresponding rotation zeros out the second component of $(a, b)$:


def givens(a,b):
  """
  Golub and Van Loan, 3rd Ed.  
  Algorithm 5.1.3
  """
  if b == 0:
    c = 1
    s = 0
  else:
    # Work with a ratio (tau) so that sqrt(1 + tau**2) cannot overflow,
    # unlike a direct sqrt(a**2 + b**2)
    if np.abs(b) > np.abs(a):
      tau = -a/b
      s = 1 / np.sqrt(1 + tau**2)
      c = s * tau
    else:
      tau = -b/a
      c = 1 / np.sqrt(1 + tau**2)
      s = c * tau

  return c, s
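
A quick demonstration on the pair $(3, 4)$: the rotation sends it to $(\pm 5, 0)$, where $5$ is the pair's Euclidean norm.


# Build the rotation from (3, 4) and apply it: the second component vanishes
c, s = givens(3.0, 4.0)
G = np.array([[c, s], [-s, c]])
G.T @ np.array([3.0, 4.0])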

The algorithm is iterative: starting from the upper Hessenberg matrix, it applies the following step until convergence. Each step has the effect of computing a QR factorization and replacing its input with the corresponding RQ product.


"""
Golub and Van Loan, 3rd Ed.  
Algorithm 7.4.1
"""

def qrstep(H):
  n = H.shape[0]            # don't rely on the global n
  G = np.zeros((n-1, 2, 2))

  # H = Q @ R: zero the sub-diagonal with a sequence of Givens rotations,
  # leaving the upper triangular factor R in H
  for k in range(n-1):
    c, s = givens(H[k,k], H[k+1,k])
    G[k] = np.array([[c, s], [-s, c]])
    H[k:(k+2), k:n] = G[k].T @ H[k:(k+2), k:n]

  # H = R @ Q: apply the same rotations from the right, restoring Hessenberg form
  for k in range(n-1):
    H[0:(k+2), k:(k+2)] = H[0:(k+2), k:(k+2)] @ G[k]

  return H

Check that the above does what we expect. Entries may differ in sign between the two computations, since the QR factors are unique only up to signs:


# Set the starting point and apply one step
H = U.T @ B0 @ U
qrstep(H)
array([[177.594,  26.677,  27.902,   7.974,   7.042,   7.971],
       [ 40.804,  25.203,  -6.765,   2.318,  16.891,  -0.56 ],
       [ -0.   ,  15.919,   9.842, -11.048,  -8.942,   5.965],
       [  0.   ,   0.   ,   9.737, -15.545,   7.986,   5.693],
       [ -0.   ,  -0.   ,  -0.   , -25.281,   3.223,  19.307],
       [ -0.   ,  -0.   ,  -0.   ,  -0.   ,  -0.136,   4.684]])

# The goal after one step: the same matrix, up to signs
H = U.T @ B0 @ U
Q, R = np.linalg.qr(H)
R @ Q
array([[177.594,  26.677,  27.902,   7.974,  -7.042,   7.971],
       [ 40.804,  25.203,  -6.765,   2.318, -16.891,  -0.56 ],
       [  0.   ,  15.919,   9.842, -11.048,   8.942,   5.965],
       [  0.   ,   0.   ,   9.737, -15.545,  -7.986,   5.693],
       [ -0.   ,  -0.   ,   0.   ,  25.281,   3.223, -19.307],
       [  0.   ,   0.   ,   0.   ,   0.   ,   0.136,   4.684]])

We iterate qrstep to convergence and obtain an upper quasi-triangular, eigenvalue-revealing matrix. This is the real Schur decomposition.

Note that four hundred steps is a lot. However, this is the most naive form of the QR algorithm. Golub and Van Loan describe a shifted QR iteration that converges far more quickly.
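
As a rough sketch of that idea (not GVL's algorithm: a practical implementation would use the Wilkinson shift and deflate converged eigenvalues), a single-shift step subtracts a shift before the QR step and adds it back afterwards. We stick with the unshifted step below.


def qrstep_shifted(H):
  """
  A minimal single-shift QR step, shifting by the trailing diagonal entry.
  A sketch only; see GVL for the Wilkinson shift and deflation.
  """
  n = H.shape[0]
  mu = H[n-1, n-1]           # the shift
  H -= mu * np.identity(n)   # H - mu*I = Q @ R
  qrstep(H)                  # in place: H <- R @ Q
  H += mu * np.identity(n)   # H <- R @ Q + mu*I
  return H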


# Apply 400 steps to obtain a Schur decomposition.
S = U.T @ B0 @ U
for i in range(400):
  S = qrstep(S)
S
array([[185.029,  -4.625, -20.198,   6.178,  18.849,   7.617],
       [ -0.   ,  15.024, -20.639,   7.524,  -0.262,  -2.735],
       [ -0.   ,   9.783,  16.429,   7.656,   7.622,   0.623],
       [  0.   ,   0.   ,  -0.   , -16.901,  14.95 ,   8.47 ],
       [ -0.   ,  -0.   ,  -0.   , -25.157,   0.797,  18.896],
       [ -0.   ,  -0.   ,  -0.   ,  -0.   ,   0.   ,   4.623]])

The eigenvalues can be obtained from the one-by-one and two-by-two blocks in the matrix above.


# E.g., the eigenvalues from the second diagonal block:
np.linalg.eigvals(S[1:3, 1:3])
array([15.726+14.192j, 15.726-14.192j])

# All eigenvalues; these include the ones from the second diagonal block, as expected
np.linalg.eigvals(S)
array([185.029 +0.j   ,  -8.052+17.257j,  -8.052-17.257j,  15.726+14.192j,  15.726-14.192j,   4.623 +0.j   ])
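
To make the block structure explicit, here is a small sketch that reads the full spectrum off the quasi-triangular matrix; the helper name schur_eigvals and the deflation tolerance are our own choices, not GVL's:


def schur_eigvals(S, tol=1e-8):
  """Read the eigenvalues off a real quasi-triangular (Schur form) matrix."""
  n = S.shape[0]
  vals = []
  k = 0
  while k < n:
    # A negligible sub-diagonal entry marks a 1x1 block: a real eigenvalue
    if k == n-1 or np.abs(S[k+1,k]) < tol * (np.abs(S[k,k]) + np.abs(S[k+1,k+1])):
      vals.append(S[k,k])
      k += 1
    # Otherwise a 2x2 block carries a complex-conjugate pair
    else:
      vals.extend(np.linalg.eigvals(S[k:(k+2), k:(k+2)]))
      k += 2
  return np.array(vals)

schur_eigvals(S)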

Finally, we'll show numerically that S and B0 have the same eigenvalues. Again, this should be true given that S and B0 are similar to each other.


# similar matrices have the same eigenvalues: compare the spectra
eb = np.linalg.eigvals(B0)
es = np.linalg.eigvals(S)

import matplotlib as mpl
import matplotlib.pyplot as plt

colors = [c['color'] for c in mpl.rcParams['axes.prop_cycle']]

fig = plt.figure(figsize = (8,8))
ax = fig.gca()
ax.plot(np.real(eb), np.imag(eb), ls="none", marker="o", ms = 8, color = colors[0])
ax.plot(np.real(es), np.imag(es), ls="none", marker="o", ms = 16, color = "none", markeredgecolor = colors[1])
ax.set_xlabel("real")
ax.set_ylabel("imaginary")
ax.set_title("spectra of 'S' and 'B0'")
ax;
(Figure: the spectra of S and B0 plotted in the complex plane; the two sets of eigenvalues coincide.)
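
The plot overlays the two spectra. A direct numeric comparison works too; note that np.sort orders complex values by real part, then imaginary part:


# Sort both spectra and compare element-wise
np.allclose(np.sort(eb), np.sort(es))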