Improve explanations in ASP and LEQ notebooks

Gelieza K
2023-11-14 11:16:37 +01:00
parent 123053b9c7
commit 685d3db6a5
3 changed files with 130 additions and 81 deletions


@@ -69,7 +69,7 @@
},
{
"cell_type": "markdown",
"id": "9e0f9545",
"id": "24b5c21a",
"metadata": {},
"source": [
"## The All Pairs of Shortest Paths (ASP) problem\n",
@@ -84,7 +84,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "ade31d26",
"metadata": {},
@@ -410,7 +409,7 @@
},
{
"cell_type": "markdown",
"id": "ebb1f4d7",
"id": "5f26f9b5",
"metadata": {},
"source": [
"### Parallelization strategy\n",
@@ -620,7 +619,7 @@
"id": "6993b9d0",
"metadata": {},
"source": [
"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. Therefore, algorithm is potentially scalable if $P<<N$."
"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. The algorithm is potentially scalable if $P<<N$."
]
},
{
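
To make the asymptotic claim concrete, the following back-of-the-envelope check evaluates the two ratios for some made-up values of $N$ (nodes) and $P$ (processes); the constants hidden by the $O(\cdot)$ notation are ignored:

```julia
# Illustrative only: evaluate the asymptotic ratios for some made-up problem sizes.
# N = number of nodes, P = number of processes.
for (N, P) in [(10_000, 10), (10_000, 100), (10_000, 1_000)]
    send_ratio = P^2 / N   # send/computation ratio, O(P^2/N)
    recv_ratio = P / N     # receive/computation ratio, O(P/N)
    println("N = $N, P = $P:  send/comp ≈ $send_ratio, recv/comp ≈ $recv_ratio")
end
```

For $P=10$ both ratios are negligible, while for $P=1000$ the send ratio is 100 and communication would dominate the computation, which is why the condition $P \ll N$ matters.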
@@ -675,7 +674,7 @@
"source": [
"### Code\n",
"\n",
"We split the code into two functions. The first function is called on the main process (the process running this notebook). It splits the input matrix into blocks of rows. Then, we call `floyd_worker!` (see below) to remotely compute Floyd's algorithm in each worker with its corresponding block of rows.\n"
"We split the code into two functions. The first function is called on the main process (the process running this notebook). It splits the input matrix into blocks of rows. Then, we use a remotecall to compute Floyd's algorithm in each worker with its corresponding block of rows.\n"
]
},
{
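
A minimal sketch of what such a driver could look like is given below. It assumes that `nworkers()` divides the number of rows evenly and that a worker function `floyd_worker!(Cw, rows_w)` returning the updated block has been defined with `@everywhere`; it is an illustration, not the notebook's actual implementation:

```julia
# Illustrative sketch of the driver (not the notebook's actual code).
# Assumes size(C,1) is a multiple of nworkers() and that a function
# floyd_worker!(Cw, rows_w) returning the updated block is defined
# with @everywhere on all workers.
using Distributed

function floyd_dist!(C)
    n = size(C, 1)
    P = nworkers()
    nrows_w = div(n, P)
    # launch one remotecall per worker with its block of rows
    ftrs = map(enumerate(workers())) do (i, w)
        rows_w = (1 + (i - 1) * nrows_w):(i * nrows_w)
        Cw = C[rows_w, :]
        rows_w => remotecall(floyd_worker!, w, Cw, rows_w)
    end
    # collect the updated blocks back into C
    for (rows_w, ftr) in ftrs
        C[rows_w, :] = fetch(ftr)
    end
    C
end
```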
@@ -889,52 +888,10 @@
"source": [
"### Possible solutions\n",
"\n",
"- Use synchronous send MPI_SSEND (less efficient). Note that the blocking send MPI_SEND used above does not guarantee that the message was received. \n",
"- Barrier at the end of each iteration over $k$ (simple solution, but synchronization overhead)\n",
"- Order incoming messages (buffering and extra user code needed)\n",
"- Use a specific rank id instead of `MPI.ANY_SOURCE` or use `MPI.Bcast!` (one needs to know which are the rows owned by the other ranks)"
]
},
{
"cell_type": "markdown",
"id": "aab3fbfb",
"metadata": {},
"source": [
"### Exercise 1\n",
"Rewrite the function `floyd_worker!()` such that it runs correctly using `MPI.Bcast!`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aab1900f",
"metadata": {},
"outputs": [],
"source": [
"@everywhere function floyd_worker!(Cw,rows_w)\n",
" comm = MPI.Comm_dup(MPI.COMM_WORLD)\n",
" rank = MPI.Comm_rank(comm)\n",
" nranks = MPI.Comm_size(comm)\n",
" m,n = size(Cw)\n",
" C_k = similar(Cw,n)\n",
" # TODO: calculate order of source processor ranks\n",
" for k in 1:n\n",
" root_id = # TODO: calculate the rank of processor that owns k\n",
" if k in rows_w\n",
" myk = (k-first(rows_w))+1\n",
" C_k .= view(Cw,myk,:)\n",
" end\n",
" MPI.Bcast!(C_k, root=root_id, comm)\n",
" # TODO: enqueue C_k in list or array\n",
" # TODO: dequeue next C_k from array/list\n",
" for j in 1:n\n",
" for i in 1:m\n",
" @inbounds Cw[i,j] = min(Cw[i,j],Cw[i,k]+C_k[j])\n",
" end\n",
" end\n",
" end\n",
" Cw\n",
"end"
"1. **Synchronous sends**: Use synchronous send MPI_SSEND. This is less efficient because we spend time waiting until each message is received. Note that the blocking send MPI_SEND used above does not guarantee that the message was received. \n",
"2. **MPI.Barrier**: Use a barrier at the end of each iteration over $k$. This is easy to implement, but we get a synchronization overhead.\n",
"3. **Order incoming messages**: The receiver orders the incoming messages, e.g. according to MPI.Status or the sender rank. This requires buffering and extra user code.\n",
"4. **MPI.Bcast!**: Communicate row k using `MPI.Bcast!`. One needs to know which are the rows owned by the other ranks."
]
},
{
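
For the last option, the rank that owns row $k$ can be computed directly from the block partition. One possible sketch (not the notebook's reference solution), assuming every rank owns the same number `m` of consecutive rows, is:

```julia
# One possible sketch of option 4 (not the notebook's reference solution).
# Assumes a uniform block-row partition: every rank owns m = n ÷ nranks
# consecutive rows, so global row k lives on rank div(k-1, m).
# In the notebook this function would be defined with @everywhere so that
# it runs on every worker.
using MPI

function floyd_worker_bcast!(Cw, rows_w)
    comm = MPI.Comm_dup(MPI.COMM_WORLD)
    m, n = size(Cw)
    C_k = similar(Cw, n)
    for k in 1:n
        root_id = div(k - 1, m)               # rank owning global row k
        if k in rows_w
            myk = (k - first(rows_w)) + 1
            C_k .= view(Cw, myk, :)           # owner loads its local copy of row k
        end
        MPI.Bcast!(C_k, comm; root = root_id) # every rank receives row k from its owner
        for j in 1:n, i in 1:m
            @inbounds Cw[i, j] = min(Cw[i, j], Cw[i, k] + C_k[j])
        end
    end
    Cw
end
```

With `MPI.Bcast!` every rank participates in the broadcast for each $k$, so the message-ordering problem of the point-to-point version disappears, at the cost of one collective operation per iteration.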