Some improvements in the first notebooks

Francesc Verdugo
2024-08-19 15:58:09 +02:00
parent 047d6feadb
commit 5abdc088d2
5 changed files with 307 additions and 203 deletions

@@ -1060,107 +1060,6 @@
"println(\"Efficiency = \", 100*(T1/TP)/P, \"%\")"
]
},
{
"cell_type": "markdown",
"id": "fa8d7f40",
"metadata": {},
"source": [
"### Exercise 2"
]
},
{
"cell_type": "markdown",
"id": "0e7c607e",
"metadata": {},
"source": [
"The implementation of algorithm 1 is very impractical: it needs as many processes as there are entries in the result matrix C. A 1000×1000 matrix would require a supercomputer with one million processes! We can easily fix this by using fewer processes and spawning the computation of each entry on any of the available processes.\n",
"See the following code:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "023b20d1",
"metadata": {},
"outputs": [],
"source": [
"function matmul_dist_1_v2!(C, A, B)\n",
" m = size(C,1)\n",
" n = size(C,2)\n",
" l = size(A,2)\n",
" @assert size(A,1) == m\n",
" @assert size(B,2) == n\n",
" @assert size(B,1) == l\n",
" z = zero(eltype(C))\n",
" @sync for j in 1:n\n",
" for i in 1:m\n",
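" Ai = A[i,:] # copy row i of A; this copy is sent to the worker\n",
" Bj = B[:,j] # copy column j of B; this copy is sent to the worker\n",
" ftr = @spawnat :any begin # run on a worker picked by the scheduler\n",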
" Cij = z\n",
" for k in 1:l\n",
" @inbounds Cij += Ai[k]*Bj[k]\n",
" end\n",
" Cij\n",
" end\n",
" @async C[i,j] = fetch(ftr) # fetch inside a task so the loop does not block\n",
" end\n",
" end\n",
" C\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "52005ca1",
"metadata": {},
"source": [
"With this new implementation, we can multiply matrices of arbitrary size with a fixed number of workers. Test it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c1d3595b",
"metadata": {},
"outputs": [],
"source": [
"using Test\n",
"N = 50\n",
"A = rand(N,N)\n",
"B = rand(N,N)\n",
"C = similar(A)\n",
"@test matmul_dist_1_v2!(C,A,B) ≈ A*B"
]
},
{
"cell_type": "markdown",
"id": "ab609c18",
"metadata": {},
"source": [
"Run the next cell to check the performance of this implementation. Note that we are far from the optimal speedup. Why? To answer this question, compute the theoretical communication-over-computation ratio for this implementation and reason about the result. Hint: in this implementation, each worker is spawned N^2/P times on average."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d7d31710",
"metadata": {},
"outputs": [],
"source": [
"N = 100\n",
"A = rand(N,N)\n",
"B = rand(N,N)\n",
"C = similar(A)\n",
"P = nworkers()\n",
"T1 = @belapsed matmul_seq!(C,A,B)\n",
"C = similar(A)\n",
"TP = @belapsed matmul_dist_1_v2!(C,A,B)\n",
"println(\"Speedup = \", T1/TP)\n",
"println(\"Optimal speedup = \", P)\n",
"println(\"Efficiency = \", 100*(T1/TP)/P, \"%\")"
]
},
{
"cell_type": "markdown",
"id": "8e171362",
@@ -1175,15 +1074,15 @@
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.9.1",
"display_name": "Julia 1.10.0",
"language": "julia",
"name": "julia-1.9"
"name": "julia-1.10"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.9.1"
"version": "1.10.0"
}
},
"nbformat": 4,