mirror of
https://github.com/fverdugo/XM_40017.git
synced 2025-11-24 09:24:32 +01:00
More work in notebooks
@@ -27,9 +27,9 @@
"\n",
"In this notebook, we will learn\n",
"\n",
"- How to parallelize a Jacobi method\n",
"- How to parallelize the Jacobi method\n",
"- How the data partition can impact the performance of a distributed algorithm\n",
"- How to use latency hiding\n",
"- How to use latency hiding to improve parallel performance\n",
"\n"
]
},
@@ -93,9 +93,12 @@
"id": "93e84ff8",
"metadata": {},
"source": [
"When solving a Laplace equation in 1D, the Jacobi method leads to the following iterative scheme: The entry $i$ of vector $u$ at iteration $t+1$ is computed as:\n",
"When solving a [Laplace equation](https://en.wikipedia.org/wiki/Laplace%27s_equation) in 1D, the Jacobi method leads to the following iterative scheme: The entry $i$ of vector $u$ at iteration $t+1$ is computed as:\n",
"\n",
"$u^{t+1}_i = \\dfrac{u^t_{i-1}+u^t_{i+1}}{2}$"
"$u^{t+1}_i = \\dfrac{u^t_{i-1}+u^t_{i+1}}{2}$\n",
"\n",
"\n",
"This iterative scheme is simple, yet it shares fundamental challenges with many other algorithms used in scientific computing. This is why we study it here.\n"
]
},
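The sequential implementation itself is not visible in this diff. A minimal sketch consistent with the scheme above and with the calls `jacobi(5,0)` and `jacobi(5,100)` used below (the boundary values -1 and 1 match the `jacobi_with_tol` variant shown later; the exact body is an assumption):

```julia
# Sketch of the sequential 1D Jacobi solver discussed in this notebook:
# n interior entries plus two boundary entries fixed at -1 and 1.
function jacobi(n, niters)
    u = zeros(n+2)
    u[1] = -1
    u[end] = 1
    u_new = copy(u)
    for t in 1:niters
        for i in 2:(n+1)
            u_new[i] = 0.5*(u[i-1]+u[i+1])  # average of the two neighbors
        end
        u, u_new = u_new, u  # swap buffers without copying
    end
    u
end
```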
{
@@ -130,6 +133,14 @@
"end"
]
},
{
"cell_type": "markdown",
"id": "432bd862",
"metadata": {},
"source": [
"If you run it for zero iterations, you will see the initial condition."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -140,14 +151,78 @@
"jacobi(5,0)"
]
},
{
"cell_type": "markdown",
"id": "c75cb9a6",
"metadata": {},
"source": [
"If you run it for enough iterations, you will see the expected solution of the Laplace equation: values that vary linearly form -1 to 1."
|
||||
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b52be374",
"metadata": {},
"outputs": [],
"source": [
"jacobi(5,100)"
]
},
{
"cell_type": "markdown",
"id": "22fda724",
"metadata": {},
"source": [
"In our version of the jacobi method, we return after a given number of iterations. Other stopping criteria are possible. For instance, iterate until the difference between u and u_new is below a tolerance:"
|
||||
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15de7bf5",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra: norm\n",
|
||||
"function jacobi_with_tol(n,tol)\n",
|
||||
" u = zeros(n+2)\n",
|
||||
" u[1] = -1\n",
|
||||
" u[end] = 1\n",
|
||||
" u_new = copy(u)\n",
|
||||
" increment = similar(u)\n",
|
||||
" while true\n",
|
||||
" for i in 2:(n+1)\n",
|
||||
" u_new[i] = 0.5*(u[i-1]+u[i+1])\n",
|
||||
" end\n",
|
||||
" increment .= u_new .- u\n",
|
||||
" if norm(increment)/norm(u_new) < tol\n",
|
||||
" return u_new\n",
|
||||
" end\n",
|
||||
" u, u_new = u_new, u\n",
|
||||
" end\n",
|
||||
" u\n",
|
||||
"end"
|
||||
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "697ad307",
"metadata": {},
"outputs": [],
"source": [
"n = 5\n",
"tol = 1e-9\n",
"jacobi_with_tol(n,tol)"
]
},
{
"cell_type": "markdown",
"id": "6e085701",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
|
||||
"<b>Note:</b> In our version of the jacobi method, we return after a given number of iterations. Other stopping criteria are possible. For instance, iterate until the difference between u and u_new is below a tolerance.\n",
|
||||
"</div>"
|
||||
"However, we are not going to parallelize this more complex in this notebook (we will consider it later in this course)."
|
||||
]
},
{
@@ -156,7 +231,7 @@
"metadata": {},
"source": [
"\n",
"### Where can we exploit parallelism?\n",
"## Where can we exploit parallelism?\n",
"\n",
"Look at the two nested loops in the sequential implementation:\n",
"\n",
@@ -169,8 +244,8 @@
"end\n",
"```\n",
"\n",
"- The outer loop cannot be parallelized. The value of `u` at step `t+1` depends on the value at the previous step `t`.\n",
"- The inner loop can be parallelized.\n",
"- The outer loop over `t` cannot be parallelized. The value of `u` at step `t+1` depends on the value at the previous step `t`.\n",
"- The inner loop is trivially parallel. The loop iterations are independent (any order is possible); see the sketch below.\n",
"\n"
]
},
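Since the inner-loop iterations are independent, one way to exploit this on shared memory is Julia's `Threads.@threads`. This is a sketch of the idea only, not the approach developed in this notebook, which targets distributed memory with MPI:

```julia
# Shared-memory sketch: the outer loop over t stays sequential,
# the trivially parallel inner loop is split across threads.
using Base.Threads
function jacobi_threaded(n, nsteps)
    u = zeros(n+2); u[1] = -1; u[end] = 1
    u_new = copy(u)
    for t in 1:nsteps
        @threads for i in 2:(n+1)
            u_new[i] = 0.5*(u[i-1]+u[i+1])
        end
        u, u_new = u_new, u
    end
    u
end
```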
@@ -386,19 +461,11 @@
"source": [
"### Communication overhead\n",
"- We update $N/P$ entries in each process at each iteration, where $N$ is the total length of the vector and $P$ the number of processes\n",
"- Thus, computation complexity is $O(N/P)$\n",
"- We need to get remote entries from 2 neighbors (2 messages per iteration)\n",
"- We need to communicate 1 entry per message\n",
"- Communication/computation ratio is $O(P/N)$ (potentially scalable if $P<<N$)\n"
]
},
{
"cell_type": "markdown",
"id": "f6b54b7b",
"metadata": {},
"source": [
"## 1D Implementation\n",
"\n",
"We consider the implementation using MPI. The programming model of MPI is generally better suited for data-parallel algorithms like this one than the task-based model provided by Distributed.jl. In any case, one can also implement it using Distributed, but it requires some extra effort to set up the remote channels correctly for the communication between neighbor processes."
"- Thus, communication complexity is $O(1)$\n",
"- Communication/computation ratio is $O(P/N)$, making the algorithm potentially scalable if $P<<N$.\n"
]
},
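To make the ratio concrete, a worked example with illustrative numbers (the particular $N$ and $P$ are assumptions, not taken from the notebook): with $N=10^6$ and $P=100$, each process updates $N/P=10^4$ entries per iteration but sends and receives only 2 entries in 2 messages, giving a communication/computation ratio of about $2/10^4 = 2\times 10^{-4}$.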
{
@@ -457,8 +524,10 @@
"id": "8ed4129c",
"metadata": {},
"source": [
"### MPI Code\n",
"## MPI implementation\n",
"\n",
"We consider the implementation using MPI. The programming model of MPI is generally better suited for data-parallel algorithms like this one than the task-based model provided by Distributed.jl. In any case, one can also implement it using Distributed.jl, but it requires some extra effort to set up the remote channels correctly for the communication between neighbor processes.\n",
"\n",
"Take a look at the implementation below and try to understand it.\n"
]
},
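The full MPI implementation follows in the notebook and is not part of this diff. As a rough sketch of its core ingredient, the halo (ghost-cell) exchange between neighbor ranks for one Jacobi iteration might look as follows; it assumes MPI.jl's keyword-argument API, and the function name and data layout are hypothetical:

```julia
# Hypothetical sketch: one Jacobi iteration on a local chunk with ghost entries.
# u_loc holds the local entries in positions 2:end-1 plus one ghost entry at each end.
using MPI

function jacobi_step_mpi!(u_loc, u_new, comm)
    rank = MPI.Comm_rank(comm)
    nranks = MPI.Comm_size(comm)
    n = length(u_loc)
    if rank > 0  # exchange one entry with the left neighbor
        MPI.Sendrecv!(view(u_loc, 2:2), view(u_loc, 1:1), comm;
                      dest=rank-1, source=rank-1)
    end
    if rank < nranks-1  # exchange one entry with the right neighbor
        MPI.Sendrecv!(view(u_loc, n-1:n-1), view(u_loc, n:n), comm;
                      dest=rank+1, source=rank+1)
    end
    for i in 2:(n-1)  # update local entries using the fresh ghost values
        u_new[i] = 0.5*(u_loc[i-1]+u_loc[i+1])
    end
    u_new
end
```

A latency-hiding variant, as announced in the introduction, would instead post the exchanges with nonblocking `MPI.Isend`/`MPI.Irecv!`, update the interior entries that do not depend on ghost values, and only then wait for the messages.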
@@ -749,7 +818,7 @@
"```\n",
"\n",
"- The outer loop cannot be parallelized (as in the 1D case).\n",
"- The two inner loops can be parallelized\n"
"- The two inner loops are trivially parallel; see the sketch below.\n"
]
},
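For reference, a sketch of the sequential 2D kernel whose two inner loops are referred to above (the boundary values here are illustrative assumptions):

```julia
# 2D Jacobi sketch: every interior entry is replaced by the average of its
# four neighbors; the iterations over (i,j) are independent of each other.
function jacobi_2d(n, nsteps)
    u = zeros(n+2, n+2)
    u[1, :] .= -1; u[end, :] .= 1  # illustrative boundary values
    u_new = copy(u)
    for t in 1:nsteps
        for j in 2:(n+1), i in 2:(n+1)
            u_new[i, j] = 0.25*(u[i-1, j]+u[i+1, j]+u[i, j-1]+u[i, j+1])
        end
        u, u_new = u_new, u
    end
    u
end
```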
{