Contents¶
In this notebook, we will learn:
- Key concepts for the solution of PDEs on large-scale computers
- What is a preconditioner
- What is a multi-grid solver
- Parallelization of the conjugate gradient method
- Distributed sparse matrix-vector product
Mini project¶
- Simulating the temperature distribution in a closed room
Problem statement¶
- Given the temperature at the boundary of a room (walls, window, heater)
- Predict the temperature at any point of the room
- Compute it in parallel
Laplace equation¶
$\dfrac{\partial^2 u(x,y)}{\partial x^2} + \dfrac{\partial^2 u(x,y)}{\partial y^2} = 0$
Pierre-Simon, marquis de Laplace (1749–1827). (Picture from Wikipedia.)
Boundary value problem (BVP)¶
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 0 \text{ in the room}$
$u = 5 \text{ on the window}$
$u = 40 \text{ on the heater}$
$u = 20 \text{ on the walls}$
Numerical methods for PDEs¶
- Finite difference method (FDM)
- Finite element method (FEM)
- Finite volume method (FVM)
- Boundary element method (BEM)
- Meshfree methods
Main idea: Transform a PDE into a system of algebraic equations.
Finite Difference method¶
- Pro: Easy to implement and computationally efficient
- Con: Difficult to handle complex geometries
- Goal: Compute the temperature at each grid point
- The (unknown) values can be stored in a computer using an array
- `u[i,j]` is the temperature at grid point $(i,j)$
] add Plots IterativeSolvers Preconditioners Printf SparseArrays LinearAlgebra
N = 8
u = zeros(N,N)
function fill_boundary!(u)
    # Dirichlet boundary values for the three boundary types
    u_window = 5.0
    u_heater = 40.0
    u_wall = 20.0
    # Location of the window (top wall) and the heater (left wall),
    # in unit-square coordinates
    window_span = (0.6,0.9)
    heater_span = (0.2,0.4)
    nx,ny = size(u)
    hx,hy = 1 ./ (nx-1,ny-1) # grid spacing in the unit square
    for j in 1:ny
        y = hy*(j-1)
        for i in 1:nx
            x = hx*(i-1)
            if j==ny && (window_span[1]<=x && x<=window_span[2])
                u[i,j] = u_window
            elseif i==1 && (heater_span[1]<=y && y<=heater_span[2])
                u[i,j] = u_heater
            elseif (j==1||j==ny) || (i==1||i==nx) # remaining boundary points are walls
                u[i,j] = u_wall
            end
        end
    end
    u
end
fill_boundary!(u)
Data Visualization¶
using Plots
function visualize(u;title="Temperature distribution")
    xlabel = "x-coordinate"
    ylabel = "y-coordinate"
    aspectratio = :equal
    plt = plot(;size=1.5 .* (350,300),title,xlabel,ylabel,aspectratio)
    nx,ny = size(u)
    hx,hy = (1,1) ./ (nx-1,ny-1)
    # Physical coordinates of the grid points in the unit square
    x = hx .* ((1:nx) .- 1)
    y = hy .* ((1:ny) .- 1)
    # heatmap expects the first argument to vary along the horizontal axis,
    # hence the transpose
    heatmap!(x,y,transpose(u),color=:thermal)
    plt
end
visualize(u)
How to find the temperature at the interior points?¶
- We have an unknown for each interior point
- We need an equation for each unknown
- How to define these equations?
Finite Difference stencil¶
$\dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 0 $
$\downarrow$ approximate each second derivative by a central difference with grid spacing $h$
$\dfrac{u_{i-1,j} - 2u_{i,j} + u_{i+1,j}}{h^2} + \dfrac{u_{i,j-1} - 2u_{i,j} + u_{i,j+1}}{h^2} = 0$
$\downarrow$ multiply by $h^2$
$u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = 0$
System of linear equations¶
- At each interior point we have an equation
- All these equations can be arranged in array form
$Ax=b$
using SparseArrays
function generate_system_sparse(u)
    nx,ny = size(u)
    stencil = [(-1,0),(1,0),(0,-1),(0,1)] # the four neighbours
    nnz_bound = 5*nx*ny # upper bound: at most 5 entries per row
    nrows = (nx-2)*(ny-2) # one unknown per interior point
    ncols = nrows
    b = zeros(nrows)
    I = zeros(Int,nnz_bound)
    J = zeros(Int,nnz_bound)
    V = zeros(nnz_bound)
    inz = 0
    for j in 2:(ny-1)
        for i in 2:(nx-1)
            row = i-1 + (nx-2)*(j-2) # interior points numbered row by row
            # diagonal entry
            inz += 1
            I[inz] = row
            J[inz] = row
            V[inz] = 4.0
            for (di,dj) in stencil
                on_boundary = i+di in (1,nx) || j+dj in (1,ny)
                if on_boundary
                    # known boundary value moves to the right-hand side
                    b[row] += u[i+di,j+dj]
                    continue
                end
                col = i+di-1 + (nx-2)*(j+dj-2)
                inz += 1
                I[inz] = row
                J[inz] = col
                V[inz] = -1.0
            end
        end
    end
    A = sparse(view(I,1:inz),view(J,1:inz),view(V,1:inz),nrows,ncols)
    A,b
end
N = 5
u = zeros(N,N)
A, b = generate_system_sparse(u)
A
$u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = 0$
Solution methods¶
Two possible solution methods have already been discussed in the course:
- Jacobi
- Gaussian elimination
Algorithmically scalable solver¶
- A solver is algorithmically scalable if its total cost grows at most linearly with the total problem size
Are Jacobi and Gaussian elimination algorithmically scalable? Answer: no.
- Problem size: $R = O(N^2)$ unknowns
- Gaussian elimination: $O(R^3)$ (more than linear)
- Jacobi: $O(N^2) = O(R)$ per iteration. Linear per iteration, but how many iterations do we need?
Complexity of Jacobi method¶
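For reference, the Jacobi update implemented below: every interior value is replaced by the average of its four neighbours from the previous sweep (here $f$ is the right-hand-side contribution, zero for our pure boundary-value problem):
$u^{(k+1)}_{i,j} = \frac{1}{4}\left(u^{(k)}_{i-1,j} + u^{(k)}_{i+1,j} + u^{(k)}_{i,j-1} + u^{(k)}_{i,j+1} + f_{i,j}\right)$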
using LinearAlgebra
function jacobi!(u,f=zeros(size(u));reltol=0.0,maxiters=0)
    u_new = copy(u)
    e = similar(u)
    ni,nj = size(u)
    for iter in 1:maxiters
        # Jacobi sweep: update interior points from the previous iterate
        for j in 2:(nj-1)
            for i in 2:(ni-1)
                u_new[i,j] =
                    0.25*(u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] + f[i,j])
            end
        end
        # Relative change between iterates, used as stopping criterion
        e .= u_new .- u
        relerror = norm(e)/norm(u_new)
        if relerror < reltol
            return u_new, iter
        end
        u, u_new = u_new, u # swap buffers instead of copying
    end
    u, maxiters
end
N = 40
u = zeros(N,N)
fill_boundary!(u)
u, iter = jacobi!(u,reltol=1.0e-5,maxiters=1000000)
println("Jacobi converged in $iter iterations")
visualize(u)
Convergence analysis¶
plt = plot(xlabel="N^2",ylabel="Iterations");
Ns = [20,40,80,160]
reltol = 1.0e-6
iters = zeros(size(Ns))
for (i,N) in enumerate(Ns)
    u = zeros(N,N)
    fill_boundary!(u)
    u, iter = jacobi!(u;reltol=reltol,maxiters=1000000)
    iters[i] = iter
end
plot!(plt,Ns.^2,iters,xaxis=:log10,yaxis=:log10,label="reltol=$reltol (Jacobi)",marker=:auto)
plt
- The number of iterations to achieve a relative error of $10^{-s}$ increases as $O(s N^2)$
- Remember: The cost per iteration is $O(N^2)$
- Total cost is $O(N^4) = O((N^2)^2) $
Complexity of some solvers¶
Work to solve a Laplace equation on a regular mesh of $S$ points ($S=N^d$)
| Solver | 1D | 2D | 3D |
|---|---|---|---|
| Dense Cholesky | $O(S^3)$ | $O(S^3)$ | $O(S^3)$ |
| Sparse Cholesky | $O(S)$ | $O(S^{1.5})$ | $O(S^2)$ |
| Conjugate gradient + multigrid | $O(S)$ | $O(S)$ | $O(S)$ |
Conjugate gradient method¶
- Idea: Transform the problem into an optimization problem
$A x = b$
equivalent to
$x = \arg\min_{y} f(y)$ with $f(y)= \frac{1}{2} y^\mathrm{T}Ay - y^\mathrm{T} b$
Since $\nabla f(y) = Ay - b$ when $A$ is symmetric, the minimizer of $f$ is exactly the solution of $Ax=b$, so we can use some form of gradient descent to solve the system
The conjugate gradient (CG) method is a gradient descent algorithm optimized for symmetric ($A^\mathrm{T}=A$) positive-definite ($y^\mathrm{T}Ay > 0$ for $y \neq 0$) matrices
It is applicable to our problem
It is a type of Krylov subspace method
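As a quick numerical sanity check of applicability, we can verify both properties for our matrix. This sketch reuses `fill_boundary!` and `generate_system_sparse` from above; note that `isposdef` works on a dense copy, so it is only feasible for small grids:
u = zeros(20,20)
fill_boundary!(u)
A,_ = generate_system_sparse(u)
@show A == transpose(A)     # symmetric
@show isposdef(Matrix(A))   # positive-definite (dense check, small N only)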
Top 10 algorithms of the 20th century¶
According to IEEE Computer Society
- Metropolis Algorithm for Monte Carlo
- Simplex Method for Linear Programming
- Krylov Subspace Iteration Methods
- The Decompositional Approach to Matrix Computations
- The Fortran Optimizing Compiler
- QR Algorithm for Computing Eigenvalues
- Quicksort Algorithm for Sorting
- Fast Fourier Transform
- Integer Relation Detection
- Fast Multipole Method
using IterativeSolvers: cg!
N = 80
u = zeros(N,N)
fill_boundary!(u)
A,b = generate_system_sparse(u)
x = zeros(length(b))
_,ch = cg!(x, A, b, reltol=1e-5,log=true)
display(ch)
u[2:end-1,2:end-1] = x
visualize(u)
Convergence analysis¶
#plt = plot(xlabel="N^2",ylabel="Iterations"); # reuse plt from the Jacobi analysis so both curves appear in one plot
Ns = [20,40,80,160]
reltol = 1.0e-5
iters = zeros(size(Ns))
for (i,N) in enumerate(Ns)
    u = zeros(N,N)
    fill_boundary!(u)
    A,b = generate_system_sparse(u)
    x = zeros(length(b))
    _,ch = cg!(x, A, b, reltol=reltol,log=true)
    iters[i] = ch.iters
end
plot!(plt,Ns.^2,iters,xaxis=:log10,yaxis=:log10,label="reltol=$reltol (CG)",marker=:auto)
plt
Number of iterations¶
- The number of iterations to achieve a relative error of $10^{-s}$ increases as $O(s \sqrt{\kappa(A)})$
- $\kappa(A)=\dfrac{\lambda_{max}(A)}{\lambda_{min}(A)}$ is the condition number of A (ratio between the largest and smallest eigenvalues)
- In our example, $\kappa(A) = O(N^2)$
- Thus, the number of iterations scales as $O(s N)$ (a numerical check of $\kappa(A)$ follows below)
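A small sketch to check the claim $\kappa(A) = O(N^2)$ numerically. It densifies the matrix via `Matrix(A)`, so it is only feasible for small $N$; doubling $N$ should roughly quadruple the condition number:
for N in (10,20,40)
    u = zeros(N,N)
    fill_boundary!(u)
    A,_ = generate_system_sparse(u)
    λ = eigvals(Symmetric(Matrix(A))) # dense symmetric eigenvalues, small N only
    println("N = $N: κ(A) = ", maximum(λ)/minimum(λ))
end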
Goal¶
- Find an iterative method whose number of iterations is independent of problem size
Preconditioner¶
A linear function $M$ such that
$M(b) \approx x$ with $Ax=b$ for any $b$
$\downarrow$
$ M(b) \approx A^{-1}b $
$\downarrow$
$ M \approx A^{-1}$
$\downarrow$
$ M A \approx I$ (Identity matrix)
$\downarrow$
$\kappa (MA) \approx 1$
$\downarrow$
Conjugate gradients will be fast when solving
$(MA)x = Mb \Longleftrightarrow Ax=b$
How to build a preconditioner?¶
- $M=A^{-1}$ (exact preconditioner, but as costly as solving the original problem)
- $M=I$ (no extra work, but slow convergence)
- We need a trade-off (maths + computer science problem)
Jacobi Preconditioner¶
struct JacobiPrec{T}
    u::Matrix{T} # work array including the boundary layer
    f::Matrix{T} # right-hand-side array
    niters::Int  # fixed number of Jacobi sweeps per application
end
function jacobi_prec(N;niters)
    u = zeros(N,N)
    f = zeros(N,N)
    JacobiPrec(u,f,niters)
end
# Applying the preconditioner (x ≈ A\b) means running a few Jacobi sweeps
function LinearAlgebra.ldiv!(x,M::JacobiPrec,b)
    M.u[2:end-1,2:end-1] .= 0
    M.f[2:end-1,2:end-1] = b # the vector b fills the interior, column by column
    u,_ = jacobi!(M.u,M.f,reltol=0,maxiters=M.niters)
    x[:] = @view u[2:end-1,2:end-1]
    x
end
N = 80
u = zeros(N,N)
fill_boundary!(u)
A,b = generate_system_sparse(u)
M = jacobi_prec(N,niters=10)
x = zeros(size(b))
ldiv!(x,M,b)
u[2:end-1,2:end-1] = x
visualize(u)
Ns = [20,40,80,160]
reltol = 1.0e-5
iters = zeros(size(Ns))
for (i,N) in enumerate(Ns)
    u = zeros(N,N)
    fill_boundary!(u)
    A,b = generate_system_sparse(u)
    M = jacobi_prec(N,niters=100)
    x = zeros(length(b))
    _,ch = cg!(x, A, b, Pl=M, reltol=reltol,log=true)
    iters[i] = ch.iters
end
end
plot!(plt,Ns.^2,iters,xaxis=:log10,yaxis=:log10,label="reltol=$reltol (CG+Jacobi($(M.niters)))",marker=:auto)
plt
How can we improve the Jacobi method?¶
N = 10
u = zeros(N,N)
fill_boundary!(u)
A,b = generate_system_sparse(u)
M = jacobi_prec(N,niters=80)
x = zeros(size(b))
ldiv!(x,M,b)
u[2:end-1,2:end-1] = x
visualize(u)
With only 80 sweeps, Jacobi essentially solves the problem on this small $10\times 10$ grid. Observation: Jacobi is cheap and effective on coarse grids, which suggests solving on a coarse grid first and interpolating (prolongating) the result to finer grids.
Multi-grid method¶
function prolongate!(u_fine,u_coarse)
    ni_coarse, nj_coarse = size(u_coarse)
    ni_fine, nj_fine = size(u_fine)
    @assert 2*(ni_coarse-1) == (ni_fine-1) # the fine grid halves the coarse spacing
    for j_fine in 1:nj_fine
        j_coarse = div(j_fine-1,2)+1
        j_rem = mod(j_fine-1,2)
        for i_fine in 1:ni_fine
            i_coarse = div(i_fine-1,2)+1
            i_rem = mod(i_fine-1,2)
            # Bilinear interpolation: average the nearest coarse values
            u_fine[i_fine,j_fine] = 0.25*(
                u_coarse[i_coarse,j_coarse] +
                u_coarse[i_coarse+i_rem,j_coarse] +
                u_coarse[i_coarse,j_coarse+j_rem] +
                u_coarse[i_coarse+i_rem,j_coarse+j_rem] )
        end
    end
end
iters_in_level = [2,2,2,2,2,2,2,2]
u_coarse = zeros(2,2)
anim = @animate for (level,iters) in enumerate(iters_in_level)
    N = 2^level
    u = zeros(1+N,1+N)
    prolongate!(u,u_coarse) # interpolate the previous (coarser) solution
    fill_boundary!(u)
    global u_coarse
    u_coarse,_ = jacobi!(u,reltol=0,maxiters=iters) # a few smoothing sweeps
    visualize(u,title="Level $level")
end
gif(anim,"a2.gif",fps=1)
Multi-grid preconditioner¶
using Preconditioners
Ns = [20,40,80,160]
reltol = 1.0e-5
iters = zeros(size(Ns))
for (i,N) in enumerate(Ns)
    u = zeros(N,N)
    fill_boundary!(u)
    A,b = generate_system_sparse(u)
    M = AMGPreconditioner{SmoothedAggregation}(A)
    x = zeros(length(b))
    _,ch = cg!(x, A, b, Pl=M, reltol=reltol,log=true)
    iters[i] = ch.iters
end
end
plot!(plt,Ns.^2,iters,xaxis=:log10,yaxis=:log10,label="reltol=$reltol (CG+MG)",marker=:auto)
plt
Number of iterations¶
- The number of iterations to achieve a relative error of $10^{-s}$ increases as $O(s)$
- The cost of each iteration is proportional to the number of grid points
Fig. from DOI: 10.1177/1094342015593158
Parallel implementation¶
Conjugate gradient method¶
function conjugate_gradient!(x,A,b;M,reltol,maxiters=size(A,1))
    c = similar(x)
    u = zero(x) # search direction, initialized to zero
    r = similar(x)
    mul!(c,A,x)
    r .= b .- c # initial residual r = b - A*x
    norm_r0 = sqrt(dot(r,r))
    ρ = one(norm_r0) # dummy value; since u == 0, the first update gives u = c
    for iter in 1:maxiters
        ldiv!(c,M,r) # apply the preconditioner: c = M \ r
        ρ_prev = ρ
        ρ = dot(c,r)
        β = ρ / ρ_prev
        u .= c .+ β .* u # new search direction
        mul!(c, A, u)
        α = ρ / dot(u,c)
        x .= x .+ α .* u # update solution
        r .= r .- α .* c # update residual
        if sqrt(dot(r,r)) < reltol*norm_r0
            break
        end
    end
    x
end
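As a quick serial sanity check, this sketch solves our problem with the hand-written CG, reusing `fill_boundary!`, `generate_system_sparse`, and the `jacobi_prec` preconditioner defined earlier:
N = 40
u = zeros(N,N)
fill_boundary!(u)
A,b = generate_system_sparse(u)
x = zeros(length(b))
conjugate_gradient!(x,A,b;M=jacobi_prec(N,niters=10),reltol=1.0e-5)
u[2:end-1,2:end-1] = x
visualize(u)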
The phases that are not trivially parallel are:
- Dot products
- Sparse matrix-vector products
- Preconditioners
Dot product¶
$\text{dot}(a,b)= \sum_i a_i b_i$
MPI implementation¶
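A minimal sketch with MPI.jl (this package is not installed above; `a_own` and `b_own` stand for the locally owned blocks of the distributed vectors). Each rank computes its local partial sum, and a single `Allreduce` combines them:
# dot_mpi.jl — run with: mpiexec -n 4 julia dot_mpi.jl
using MPI
using LinearAlgebra
MPI.Init()
comm = MPI.COMM_WORLD
n_own = 5
a_own = rand(n_own) # locally owned block of a
b_own = rand(n_own) # locally owned block of b
dot_own = dot(a_own,b_own) # local partial sum, no communication
dot_total = MPI.Allreduce(dot_own,+,comm) # one global reduction
println("rank $(MPI.Comm_rank(comm)): dot = $dot_total")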
Sparse matrix-vector product¶
- Question: which entries of x does a worker need in order to compute its locally stored rows of Ax?
- Answer: only the entries of x associated with the non-zero columns of the part of A stored in the worker.
Ghost (halo) columns¶
Latency hiding¶
Split the locally stored rows as A = A_own + A_ghost: multiply with A_own (which only needs the locally owned entries of x) while the ghost entries of x are being communicated, then add the contribution of A_ghost.
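A sketch of this latency-hiding pattern, assuming MPI.jl; the helper start_ghost_exchange!, which would post the non-blocking sends and receives for the ghost entries and return the pending requests, is hypothetical:
using MPI, SparseArrays, LinearAlgebra
function spmv!(y,A_own,A_ghost,x_own,x_ghost,comm)
    # 1. start the asynchronous exchange of the ghost entries of x (hypothetical helper)
    reqs = start_ghost_exchange!(x_ghost,x_own,comm)
    # 2. overlap: multiply with the locally owned part while messages are in flight
    mul!(y,A_own,x_own)
    # 3. wait until the ghost entries have arrived, then add their contribution
    MPI.Waitall(reqs)
    mul!(y,A_ghost,x_ghost,1,1) # y += A_ghost * x_ghost (5-argument mul!)
    y
end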
Mesh partition¶
- Question: how should the grid points be partitioned among the workers?
- Answer: 2D block partition (as for the Jacobi method). With $P$ workers, each block owns $(N/\sqrt{P})^2$ points but communicates only its $O(N/\sqrt{P})$ boundary (ghost) values per neighbour.
Remember: The equation associated with point (i,j) is:
$u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j} = 0$
How to partition unstructured meshes?¶
- FEM methods work on unstructured meshes
- One equation per node
- Non-zero columns are associated with nodes connected by mesh edges
k-way graph partitioning problem¶
Given a graph $G$ (i.e. the mesh)
- Partition the vertices of $G$ into $k$ disjoint parts of equal size (load balance)
- Minimize the number of edges with end vertices belonging to different parts (reduce communication)
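This problem is NP-hard, so in practice it is solved approximately with heuristic libraries. A small illustration with the Metis.jl wrapper (not installed above; shown only as a sketch):
using Metis, SparseArrays
# adjacency matrix of a chain graph with 8 vertices
G = spdiagm(-1 => ones(7), 1 => ones(7))
parts = Metis.partition(G,2) # part assignment for each vertex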