More changes in ASP and LEQ

Francesc Verdugo 2024-09-19 10:37:09 +02:00
parent d7b9ff635a
commit 70f7004cc3
3 changed files with 3607 additions and 288 deletions

File diff suppressed because one or more lines are too long


@@ -647,7 +647,7 @@
"- On the receive side $O(N)/O(N^2/P) = O(P/N)$\n",
"\n",
"\n",
"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. The algorithm is potentially scalable if $P^2<<N$. Note that this is worse than for matrix-matrix multiplication, which is scalable for $P<<N$."
"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. The algorithm is potentially scalable if $P^2<<N$. Note that this is worse than for matrix-matrix multiplication, which is scalable for $P<<N$. I.e., you need a larger problem size in the current algorithm than in matrix-matrix multiplication."
]
},
{
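As a quick sanity check of this condition (illustrative numbers, not taken from the notebook), take P = 100 ranks:

```latex
% Illustrative check (P = 100 ranks is an assumed value, not from the notebook)
\frac{P^2}{N} \ll 1 \;\Longleftrightarrow\; N \gg P^2 = 10^{4},
\qquad\text{versus}\qquad
\frac{P}{N} \ll 1 \;\Longleftrightarrow\; N \gg P = 10^{2}
\quad\text{for matrix-matrix multiplication.}
```

So, for the same number of ranks, the current algorithm needs a problem roughly a factor of P larger before communication becomes negligible.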
@@ -755,7 +755,7 @@
"### Distributing the input matrix\n",
"\n",
"Since only rank 0 receives the input matrix C, this rank needs to split it row-wise and\n",
"send the pieces to all other ranks. This is done in the function below. We start by communicating the problem size N to all ranks (at the start only rank 0 knows the problem size). This is trivially done with an `MPI.Bcast!`. Once all ranks know the problem size, they can allocate space for their local part of `C` , called `myC` in the code. After this, rank 0 sends the pieces to all other ranks. We do it here with `MPI.Send` and `MPI.Recv!`. This can also be done with `MPI.Scatter!`, but it is more challenging since we are using a row partition and Julia stores the matrices in column major order. Note that this algorithm can also be implemented using a column-wise matrix partition. In this case, using `MPI.Scatter!` would provably be the best option.\n"
"send the pieces to all other ranks. This is done in the function below. We start by communicating the problem size N to all ranks (at the start only rank 0 knows the problem size). This is trivially done with an `MPI.Bcast!`. Once all ranks know the problem size, they can allocate space for their local part of `C` , called `myC` in the code. After this, rank 0 sends the pieces to all other ranks. We do it here with `MPI.Send` and `MPI.Recv!`. This can also be done with `MPI.Scatter!`, but it is more challenging since we are using a row partition and Julia stores the matrices in column major order. Note that this algorithm can also be implemented using a column partition. In this case, using `MPI.Scatter!` would probably be the best option.\n"
]
},
{
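A minimal sketch of this distribution step, assuming MPI.jl, an already initialized MPI session, `Float64` entries, and that N is a multiple of the number of ranks; the helper name `distribute_rows` is hypothetical and this is not necessarily the notebook's exact code:

```julia
using MPI

# Hypothetical helper (not necessarily the notebook's code): split C,
# known only on rank 0, row-wise and send one block of rows to every
# other rank. Assumes MPI.Init() was already called, Float64 entries,
# and that N is a multiple of the number of ranks.
function distribute_rows(C, comm)
    rank = MPI.Comm_rank(comm)
    P = MPI.Comm_size(comm)
    # Broadcast the problem size N (initially only rank 0 knows it).
    Nbuf = [rank == 0 ? size(C, 1) : 0]
    MPI.Bcast!(Nbuf, comm; root=0)
    N = Nbuf[1]
    rows = div(N, P)
    # Every rank allocates its local block of rows.
    myC = Matrix{Float64}(undef, rows, N)
    if rank == 0
        myC .= C[1:rows, :]
        for dest in 1:(P-1)
            # Slicing copies the rows into a contiguous buffer,
            # so a plain Send works despite column-major storage.
            block = C[dest*rows+1:(dest+1)*rows, :]
            MPI.Send(block, comm; dest=dest, tag=0)
        end
    else
        MPI.Recv!(myC, comm; source=0, tag=0)
    end
    return myC
end
```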
@@ -894,7 +894,7 @@
"source": [
"### Collecting back the results\n",
"\n",
"At this point, we have solved the ASP problem, but the solution is cut in different pieces, each one stored on a different MPI rank. It is often useful to gather the solution into a single matrix, e.g., to compare it against the sequential algorithm.The following function collects all pieces and stores them in $C$ on rank 0. Again, we implement this with `MPI.Send` and `MPI.Recv!` as it is easier as we are working with a row-partition. However, we could do it also with `MPI.Gather!`."
"At this point, we have solved the ASP problem, but the solution is cut in different pieces, each one stored on a different MPI rank. It is often useful to gather the solution into a single matrix, e.g., to compare it against the sequential algorithm.The following function collects all pieces and stores them in $C$ on rank 0. Again, we implement this with `MPI.Send` and `MPI.Recv!` as it is easier as we are working with a row partition. However, we could do it also with `MPI.Gather!`."
]
},
{
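A matching sketch for the gather step, under the same assumptions (initialized MPI, even row partition); `collect_rows!` is a hypothetical name, not the notebook's function:

```julia
using MPI

# Hypothetical helper (not necessarily the notebook's code): gather all
# local row blocks back into C on rank 0, using the same row partition.
function collect_rows!(C, myC, comm)
    rank = MPI.Comm_rank(comm)
    P = MPI.Comm_size(comm)
    rows = size(myC, 1)
    if rank == 0
        C[1:rows, :] .= myC               # rank 0 keeps its own block
        buffer = similar(myC)
        for src in 1:(P-1)
            MPI.Recv!(buffer, comm; source=src, tag=0)
            C[src*rows+1:(src+1)*rows, :] .= buffer
        end
    else
        MPI.Send(myC, comm; dest=0, tag=0)
    end
    return C
end
```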
@@ -1024,7 +1024,7 @@
"source": [
"## Is this implementation correct?\n",
"\n",
"In the cell above, the result of the parallel code was provably the same as for the sequential code. However, is this sufficient to assert that the code is correct? Unfortunately no. In fact, the parallel code we implemented is not correct. There is no guarantee that this code computes the correct result. This is why:\n",
"In the cell above, the result of the parallel code was probably the same as for the sequential code. However, is this sufficient to assert that the code is correct? Unfortunately, it is not. In fact, **the parallel code we implemented is not correct!** There is no guarantee that this code computes the correct result. This is why:\n",
"\n",
"In MPI, point-to-point messages are *non-overtaking* between a given sender and receiver. Say that process 1 sends several messages to process 3. All these will arrive in FIFO order. This is according to section 3.5 of the MPI standard 4.0.\n",
"Unfortunately, this is not enough in our case. The messages could arrive in the wrong order *from different senders*. If process 1 sends messages to process 3, and then process 2 sends other messages to process 3, it is not granted that process 3 will receive first the messages from process 1 and then from process 2 (see figure below)."
@@ -1113,7 +1113,7 @@
"\n",
"### Exercise 1\n",
"\n",
"Modify the `floyd_iterations!` function so that it is guaranteed that the result is computed correctly. Use `MPI.Bcast!` to solve the synchronization problem. Note: only use `MPI.Bcast!`in `floyd_iterations!`, do not use other MPI directives. "
"Modify the `floyd_iterations!` function so that it is guaranteed that the result is computed correctly. Use `MPI.Bcast!` to solve the synchronization problem. Note: only use `MPI.Bcast!`in `floyd_iterations!`, do not use other MPI directives. You can assume that the number of rows is a multiple of the number of processes."
]
},
{
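As a hedged hint for the shape of the fix (only the broadcast pattern, not the full `floyd_iterations!` solution; the helper name `broadcast_pivot_row` and its indexing convention are assumptions): at iteration k every rank calls `MPI.Bcast!` with the owner of row k as the root, so the pivot row arrives via a collective instead of point-to-point messages.

```julia
using MPI

# Hypothetical helper (a hint, not the full solution): make global row k
# available on every rank with one MPI.Bcast! rooted at the rank that
# owns it. Assumes the number of rows is a multiple of the number of ranks.
function broadcast_pivot_row(myC, k, comm)
    rank = MPI.Comm_rank(comm)
    rows = size(myC, 1)                  # rows stored locally on each rank
    owner = div(k - 1, rows)             # rank owning global row k
    rowk = zeros(eltype(myC), size(myC, 2))
    if rank == owner
        mylocal = k - owner * rows       # local index of global row k
        rowk .= view(myC, mylocal, :)
    end
    MPI.Bcast!(rowk, comm; root=owner)   # collective: same root on all ranks
    return rowk
end
```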

File diff suppressed because it is too large

Binary image file changed (6.7 MiB before, 6.8 MiB after).