Merge pull request #39 from fverdugo/francesc

Minor fix in matrix-matrix multiplication.
This commit is contained in:
Francesc Verdugo 2024-08-27 09:44:30 +02:00 committed by GitHub
commit 5e44a1946f
GPG Key ID: B5690EEEBB952194


@ -55,18 +55,10 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": null,
"id": "2f8ba040",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"🥳 Well done! \n"
]
}
],
"outputs": [],
"source": [
"using Distributed\n",
"using BenchmarkTools\n",
@ -290,7 +282,7 @@
"\n",
"The matrix-matrix multiplication is an example of [embarrassingly parallel algorithm](https://en.wikipedia.org/wiki/Embarrassingly_parallel). An embarrassingly parallel (also known as trivially parallel) algorithm is an algorithm that can be split in parallel tasks with no (or very few) dependences between them. Such algorithms are typically easy to parallelize.\n",
"\n",
"Which parts of an algorithm are completely independent and thus trivially parallel? To answer this question, it is useful to inspect the for loops, which are potential sources parallelism. If the iterations are independent of each other, then they are trivial to parallelize. An easy check to find out if the iterations are dependent or not is to change their order (for instance changing `for j in 1:n` by `for j in n:-1:1`, i.e. doing the loop in reverse). If the result changes, then the iterations are not independent.\n",
"Which parts of an algorithm are completely independent and thus trivially parallel? To answer this question, it is useful to inspect the for loops, which are potential sources of parallelism. If the iterations are independent of each other, then they are trivial to parallelize. An easy check to find out if the iterations are dependent or not is to change their order (for instance changing `for j in 1:n` by `for j in n:-1:1`, i.e. doing the loop in reverse). If the result changes, then the iterations are not independent.\n",
"\n",
"Look at the three nested loops in the sequential implementation of the matrix-matrix product:\n",
"\n",
@ -320,7 +312,7 @@
"source": [
"### Parallel algorithms\n",
"\n",
"Parallelizing the loops over `i` and `j` means that all the entries of matrix C can be potentially computed in parallel. However, *which it the most efficient solution to solve all these entries in parallel in a distributed system?* To find this we will consider different parallelization strategies:\n",
"The loops over `i` and `j` are trivially parallel implies that all the entries of matrix C can be potentially computed in parallel. However, *which it the most efficient solution to solve all these entries in parallel in a distributed system?* To find this we will consider different parallelization strategies:\n",
"\n",
"- Algorithm 1: each worker computes a single entry of C\n",
"- Algorithm 2: each worker computes a single row of C\n",
@ -352,7 +344,7 @@
"source": [
"### Data dependencies\n",
"\n",
"Moving data through the network is expensive and reducing data movement is one of the key points in a distributed algorithm. To this end, we need to determine which is the minimum data needed by a worker to perform its computations. These are called the *data dependencies*. This will give us later information about the performance of the parallel algorithm.\n",
"Moving data through the network is expensive and reducing data movement is one of the key points to design efficient distributed algorithms. To this end, we need to determine which is the minimum data needed by a worker to perform its computations. These are called the *data dependencies*. This will give us later information about the performance of the parallel algorithm.\n",
"\n",
"In algorithm 1, each worker computes only an entry of the result matrix C."
]
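To make the dependency concrete: computing a single entry C[i,j] requires only row i of A and column j of B. A small illustrative check (sizes and indices chosen arbitrarily for the demo):

```julia
using LinearAlgebra

n = 4
A = rand(n, n)
B = rand(n, n)
i, j = 2, 3

# the only data needed to compute C[i,j]: row i of A and column j of B
Cij = dot(A[i, :], B[:, j])
Cij ≈ (A * B)[i, j]  # true
```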
@ -403,7 +395,7 @@
"Taking into account the data dependencies, the parallel algorithm 1 can be efficiently implemented following these steps from the worker perspective:\n",
"\n",
"1. The worker receives the data dependencies, i.e., the corresponding row A[i,:] and column B[:,j] from the master process\n",
"2. The worker computes the dot product of A[i,:] and B[:,j]\n",
"2. The worker computes the dot product of A[i,:] and B[:,j] locally\n",
"3. The worker sends back the result of C[i,j] to the master process"
]
},
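The three steps above can be sketched with `Distributed` for a single entry. This is a minimal illustration of the communication pattern, not the notebook's full `matmul_dist_1!` implementation; `addprocs(1)` is used here only for the demo:

```julia
using Distributed
addprocs(1)                      # one extra worker, just for the illustration
@everywhere using LinearAlgebra  # the worker needs dot as well

A = rand(4, 4)
B = rand(4, 4)
i, j = 1, 2
w = first(workers())

# steps 1-3: ship A[i,:] and B[:,j] to worker w, compute the dot
# product there, and fetch the result C[i,j] back to the master process
Cij = remotecall_fetch(dot, w, A[i, :], B[:, j])
Cij ≈ (A * B)[i, j]  # true

rmprocs(workers())
```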
@ -423,7 +415,7 @@
"id": "9d22ccea",
"metadata": {},
"source": [
"A possible implementation of this algorithm in Julia is as follows:"
"A possible implementation of this algorithm in Julia is as follows. Try to understand why `@sync` and `@async` are needed here."
]
},
{
@ -483,7 +475,8 @@
"A = rand(N,N)\n",
"B = rand(N,N)\n",
"C = similar(A)\n",
"@test matmul_dist_1!(C,A,B) ≈ A*B"
"matmul_dist_1!(C,A,B)\n",
"@test C ≈ A*B"
]
},
{