diff --git a/docs/src/solutions_for_all_notebooks.md b/docs/src/solutions_for_all_notebooks.md index 12340c3..b27d433 100644 --- a/docs/src/solutions_for_all_notebooks.md +++ b/docs/src/solutions_for_all_notebooks.md @@ -2,19 +2,6 @@ ## Julia Basics -### NB1-Q1 - -In the first, line we assign a variable to a value. In the second line, we assign another variable to the same value. Thus,we have 2 variables associated with the same value. In line 3, we associate `y` to a new value (re-assignment). Thus, we have 2 variables associated with 2 different values. Variable `x` is still associated with its original value. Thus, the value at the final line is `x=1`. - -### NB1-Q2 - -It will be `1` for very similar reasons as in the previous questions: we are reassigning a local variable, not the global variable defined outside the function. - -### NB1-Q3 - -It will be `6`. In the returned function `f2`, `x` is equal to `2`. Thus, when calling `f2(3)` we compute `2*3`. - - ### Exercise 1 ```julia @@ -50,77 +37,10 @@ heatmap(x,y,(i,j)->mandel(i,j,max_iters)) ## Asynchronous programming in Julia -### NB2-Q1 - -Evaluating `compute_π(100_000_000)` takes about 0.25 seconds. Thus, the loop would take about 2.5 seconds since we are calling the function 10 times. - -### NB2-Q2 - -The time in doing the loop will be almost zero since the loop just schedules 10 tasks, which should be very fast. - -### NB2-Q3 - -It will take 2.5 seconds, like in question 1. The `@sync` macro forces to wait for all tasks we have generated with the `@async` macro. Since we have created 10 tasks and each of them takes about 0.25 seconds, the total time will be about 2.5 seconds. - -### NB2-Q4 - -It will take about 3 seconds. The channel has buffer size 4, thus the call to `put!`will not block. The call to `take!` will not block neither since there is a value stored in the channel. The taken value is 3 and therefore we will wait for 3 seconds. 
- -### NB2-Q5 - -The channel is not buffered and therefore the call to `put!` will block. The cell will run forever, since there is no other task that calls `take!` on this channel. - ## Distributed computing in Julia -### NB3-Q1 - -We send the matrix (16 entries) and then we receive back the result (1 extra integer). Thus, the total number of transferred integers in 17. - -### NB3-Q2 - -Even though we only use a single entry of the matrix in the remote worker, the entire matrix is captured and sent to the worker. Thus, we will transfer 17 integers like in Question 1. - -### NB3-Q3 - -The value of `x` will still be zero since the worker receives a copy of the matrix and it modifies this copy, not the original one. - -### NB3-Q4 - -In this case, the code `a[2]=2` is executed in the main process. Since the matrix is already in the main process, it is not needed to create and send a copy of it. Thus, the code modifies the original matrix and the value of `x` will be 2. - -## Distributed computing with MPI - ### Exercise 1 -```julia -using MPI -MPI.Init() -comm = MPI.Comm_dup(MPI.COMM_WORLD) -rank = MPI.Comm_rank(comm) -nranks = MPI.Comm_size(comm) -buffer = Ref(0) -if rank == 0 - msg = 2 - buffer[] = msg - println("msg = $(buffer[])") - MPI.Send(buffer,comm;dest=rank+1,tag=0) - MPI.Recv!(buffer,comm;source=nranks-1,tag=0) - println("msg = $(buffer[])") -else - dest = if (rank != nranks-1) - rank+1 - else - 0 - end - MPI.Recv!(buffer,comm;source=rank-1,tag=0) - buffer[] += 1 - println("msg = $(buffer[])") - MPI.Send(buffer,comm;dest,tag=0) -end -``` - -### Exercise 2 - ```julia f = () -> Channel{Int}(1) chnls = [ RemoteChannel(f,w) for w in workers() ] @@ -160,6 +80,38 @@ end msg = 2 @fetchfrom 2 work(msg) ``` + +## MPI (Point-to-point) + +### Exercise 1 + +```julia +using MPI +MPI.Init() +comm = MPI.Comm_dup(MPI.COMM_WORLD) +rank = MPI.Comm_rank(comm) +nranks = MPI.Comm_size(comm) +buffer = Ref(0) +if rank == 0 + msg = 2 + buffer[] = msg + println("msg = $(buffer[])") 
+ MPI.Send(buffer,comm;dest=rank+1,tag=0) + MPI.Recv!(buffer,comm;source=nranks-1,tag=0) + println("msg = $(buffer[])") +else + dest = if (rank != nranks-1) + rank+1 + else + 0 + end + MPI.Recv!(buffer,comm;source=rank-1,tag=0) + buffer[] += 1 + println("msg = $(buffer[])") + MPI.Send(buffer,comm;dest,tag=0) +end +``` + ## Matrix-matrix multiplication ### Exercise 1 @@ -209,10 +161,6 @@ end end ``` -### Exercise 2 - -At each call to @spawnat we will communicate O(N) and compute O(N) in a worker process just like in algorithm 1. However, we will do this work N^2/P times on average at each worker. Thus, the total communication and computation on a worker will be O(N^3/P) for both communication and computation. Thus, the communication over computation ratio will still be O(1) and thus the communication will dominate in practice, making the algorithm inefficient. - ## Jacobi method ### Exercise 1 diff --git a/notebooks/julia_async.ipynb b/notebooks/julia_async.ipynb index 971d0f7..0862493 100644 --- a/notebooks/julia_async.ipynb +++ b/notebooks/julia_async.ipynb @@ -28,6 +28,56 @@ "Understanding these concepts is important to learn distributed computing later." ] }, + { + "cell_type": "markdown", + "id": "cde5ee75", + "metadata": {}, + "source": [ + "
\n", + "Note: Do not forget to execute the next cell before starting this notebook! \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0b0496c7", + "metadata": {}, + "outputs": [], + "source": [ + "function why_q1()\n", + " msg = \"\"\"\n", + " Evaluating compute_π(100_000_000) takes about 0.25 seconds on the teacher's laptop. Thus, the loop would take about 2.5 seconds, since it calls the function 10 times.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q2()\n", + " msg = \"\"\"\n", + " The time spent in the loop will be almost zero, since the loop only schedules 10 tasks, which is very fast.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q3()\n", + " msg = \"\"\"\n", + " It will take about 2.5 seconds, as in question 1. The @sync macro waits for all the tasks created with the @async macro inside it. These 10 tasks run on the same thread, one after another, so their times add up: 10 x 0.25 seconds is about 2.5 seconds.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q4()\n", + " msg = \"\"\"\n", + " It will take about 3 seconds. The channel has buffer size 4, so the call to put! will not block. The call to take! will not block either, since there is a value stored in the channel. The taken value is 3, and therefore we wait for 3 seconds.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q5()\n", + " msg = \"\"\"\n", + " The channel is not buffered, so the call to put! blocks until another task calls take! on this channel. No such task exists, so the cell runs forever.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "println(\"🥳 Well done! 
\")" + ] + }, { "cell_type": "markdown", "id": "caf64254", @@ -726,6 +776,16 @@ "end" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "d6b8382e", + "metadata": {}, + "outputs": [], + "source": [ + "why_q1()" + ] + }, { "cell_type": "markdown", "id": "5f19d38c", @@ -754,6 +814,16 @@ "end" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "edff9747", + "metadata": {}, + "outputs": [], + "source": [ + "why_q2()" + ] + }, { "cell_type": "markdown", "id": "5041c355", @@ -781,6 +851,16 @@ "end" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "87bc7c5c", + "metadata": {}, + "outputs": [], + "source": [ + "why_q3()" + ] + }, { "cell_type": "markdown", "id": "841b690e", @@ -821,6 +901,16 @@ "end" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "a18a0a7d", + "metadata": {}, + "outputs": [], + "source": [ + "why_q4()" + ] + }, { "cell_type": "markdown", "id": "df663f11", @@ -860,6 +950,26 @@ "end" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8923fae", + "metadata": {}, + "outputs": [], + "source": [ + "why_q5()" + ] + }, + { + "cell_type": "markdown", + "id": "0ee77abe", + "metadata": {}, + "source": [ + "
\n", + "Note: If for some reason a cell keeps running forever, we can stop it with Kernel > Interrupt or Kernel > Restart (see the menu bar above).\n", + "
" + ] + }, { "cell_type": "markdown", "id": "a5d3730b", diff --git a/notebooks/julia_basics.ipynb b/notebooks/julia_basics.ipynb index 2bbf253..c403981 100644 --- a/notebooks/julia_basics.ipynb +++ b/notebooks/julia_basics.ipynb @@ -147,6 +147,44 @@ "foo()" ] }, + { + "cell_type": "markdown", + "id": "d18e679d", + "metadata": {}, + "source": [ + "### A very easy first exercise\n", + "\n", + "Run the following cell. It contains definitions used later in the notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "81678b3d", + "metadata": {}, + "outputs": [], + "source": [ + "function why_q1()\n", + " msg = \"\"\"\n", + " In the first line, we bind a variable (x) to a value. In the second line, we bind another variable (y) to the same value, so 2 variables are associated with the same value. In line 3, we associate y with a new value (re-assignment), so the 2 variables are now associated with 2 different values. Variable x is still associated with its original value, and thus the value at the final line is x=1.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q2()\n", + " msg = \"\"\"\n", + " It will be 1 for very similar reasons as in the previous question: we are assigning to a local variable inside the function, not to the global variable defined outside it.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "function why_q3()\n", + " msg = \"\"\"\n", + " It will be 6. In the returned function f2, x is equal to 2, so calling f2(3) computes 2*3.\n", + " \"\"\"\n", + " println(msg)\n", + "end\n", + "println(\"🥳 Well done! \")" + ] + }, { "cell_type": "markdown", "id": "92112bd1", @@ -467,6 +505,24 @@ "x" ] }, + { + "cell_type": "markdown", + "id": "a2f94960", + "metadata": {}, + "source": [ + "Run the next cell to get an explanation of the answer."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc562337", + "metadata": {}, + "outputs": [], + "source": [ + "why_q1()" + ] + }, { "cell_type": "markdown", "id": "4d2cb752", @@ -586,6 +642,24 @@ "x" ] }, + { + "cell_type": "markdown", + "id": "f69108c2", + "metadata": {}, + "source": [ + "Run the next cell to get an explanation of the answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05c62aa3", + "metadata": {}, + "outputs": [], + "source": [ + "why_q2()" + ] + }, { "cell_type": "markdown", "id": "4fc5eb9b", @@ -1068,6 +1142,24 @@ "x" ] }, + { + "cell_type": "markdown", + "id": "062ff145", + "metadata": {}, + "source": [ + "Run the next cell to get an explanation of the answer." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bf7818e", + "metadata": {}, + "outputs": [], + "source": [ + "why_q3()" + ] + }, { "cell_type": "markdown", "id": "bc8e9bcf", @@ -1257,16 +1349,6 @@ "squares[2:3] = [4,9]" ] }, - { - "cell_type": "markdown", - "id": "92248ad5", - "metadata": {}, - "source": [ - "
\n", - "Tip: Note that Julia array indexing is 1-based (like in Fortran, but unlike C,C++,Python). Love it or hate it. 🙂\n", - "
" - ] - }, { "cell_type": "markdown", "id": "f64021ab", diff --git a/notebooks/julia_distributed.ipynb b/notebooks/julia_distributed.ipynb index 0831a6e..94acbbd 100644 --- a/notebooks/julia_distributed.ipynb +++ b/notebooks/julia_distributed.ipynb @@ -60,7 +60,32 @@ " end |> println\n", "end\n", "q_1_check(answer) = answer_checker(answer,\"a\")\n", - "q_2_check(answer) = answer_checker(answer,\"b\")" + "q_2_check(answer) = answer_checker(answer,\"b\")\n", + "function why_q1()\n", + " msg = \"\"\"\n", + " We send the matrix (16 entries) and then we receive back the result (1 extra integer). Thus, the total number of transferred integers in 17.\n", + " \"\"\"\n", + " display(msg)\n", + "end\n", + "function why_q2()\n", + " msg = \"\"\"\n", + " Even though we only use a single entry of the matrix in the remote worker, the entire matrix is captured and sent to the worker. Thus, we will transfer 17 integers like in Question 1.\n", + " \"\"\"\n", + " display(msg)\n", + "end\n", + "function why_q3()\n", + " msg = \"\"\"\n", + " The value of x will still be zero since the worker receives a copy of the matrix and it modifies this copy, not the original one.\n", + " \"\"\"\n", + " display(msg)\n", + "end\n", + "function why_q4()\n", + " msg = \"\"\"\n", + " In this case, the code a[2]=2 is executed in the main process. Since the matrix is already in the main process, it is not needed to create and send a copy of it. Thus, the code modifies the original matrix and the value of x will be 2.\n", + " \"\"\"\n", + " display(msg)\n", + "end\n", + "println(\"🥳 Well done! \")" ] }, { @@ -791,7 +816,7 @@ "source": [ "\n", "
\n", - "Question: How many integers are transferred between master and worker? Including both directions. \n", + "Question (NB3-Q1): How many integers are transferred between the master and the worker, counting both directions? \n", "
\n", "\n", "\n", @@ -819,13 +844,23 @@ "q_1_check(answer)" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "9c4d4900", + "metadata": {}, + "outputs": [], + "source": [ + "why_q1()" + ] + }, { "cell_type": "markdown", "id": "dbe373d1", "metadata": {}, "source": [ "
\n", - "Question: How many integers are transferred between master and worker? Including both directions. \n", + "Question (NB3-Q2): How many integers are transferred between the master and the worker, counting both directions? \n", "
\n", "\n", "\n", @@ -853,6 +888,16 @@ "q_2_check(answer)" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7c25fc4", + "metadata": {}, + "outputs": [], + "source": [ + "why_q2()" + ] + }, { "cell_type": "markdown", "id": "c561a73d", @@ -860,7 +905,7 @@ "source": [ "\n", "
\n", - "Question: Which value will be the value of `x` ? \n", + "Question (NB3-Q3): What will be the value of `x`? \n", "
\n" ] }, @@ -878,13 +923,23 @@ "x" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b25a83f", + "metadata": {}, + "outputs": [], + "source": [ + "why_q3()" + ] + }, { "cell_type": "markdown", "id": "835080aa", "metadata": {}, "source": [ "
\n", - "Question: Which value will be the value of `x` ? \n", + "Question (NB3-Q4): What will be the value of `x`? \n", "
\n", "\n", "Which value will be the value of `x` ?" @@ -904,6 +959,16 @@ "x" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "96b84cb5", + "metadata": {}, + "outputs": [], + "source": [ + "why_q4()" + ] + }, { "cell_type": "markdown", "id": "9e985c61", diff --git a/notebooks/matrix_matrix.ipynb b/notebooks/matrix_matrix.ipynb index 117df06..7efc0dc 100644 --- a/notebooks/matrix_matrix.ipynb +++ b/notebooks/matrix_matrix.ipynb @@ -1060,107 +1060,6 @@ "println(\"Efficiency = \", 100*(T1/TP)/P, \"%\")" ] }, - { - "cell_type": "markdown", - "id": "fa8d7f40", - "metadata": {}, - "source": [ - "### Exercise 2" - ] - }, - { - "cell_type": "markdown", - "id": "0e7c607e", - "metadata": {}, - "source": [ - "The implementation of algorithm 1 is very impractical. One needs as many processors as entries in the result matrix C. For 1000 times 1000 matrix one would need a supercomputer with one million processes! We can easily fix this problem by using less processors and spawning the computation of an entry in any of the available processes.\n", - "See the following code:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "023b20d1", - "metadata": {}, - "outputs": [], - "source": [ - "function matmul_dist_1_v2!(C, A, B)\n", - " m = size(C,1)\n", - " n = size(C,2)\n", - " l = size(A,2)\n", - " @assert size(A,1) == m\n", - " @assert size(B,2) == n\n", - " @assert size(B,1) == l\n", - " z = zero(eltype(C))\n", - " @sync for j in 1:n\n", - " for i in 1:m\n", - " Ai = A[i,:]\n", - " Bj = B[:,j]\n", - " ftr = @spawnat :any begin\n", - " Cij = z\n", - " for k in 1:l\n", - " @inbounds Cij += Ai[k]*Bj[k]\n", - " end\n", - " Cij\n", - " end\n", - " @async C[i,j] = fetch(ftr)\n", - " end\n", - " end\n", - " C\n", - "end" - ] - }, - { - "cell_type": "markdown", - "id": "52005ca1", - "metadata": {}, - "source": [ - "With this new implementation, we can multiply matrices of arbitrary size with a fixed number of workers. 
Test it:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "c1d3595b", - "metadata": {}, - "outputs": [], - "source": [ - "using Test\n", - "N = 50\n", - "A = rand(N,N)\n", - "B = rand(N,N)\n", - "C = similar(A)\n", - "@test matmul_dist_1_v2!(C,A,B) ≈ A*B" - ] - }, - { - "cell_type": "markdown", - "id": "ab609c18", - "metadata": {}, - "source": [ - "Run the next cell to check the performance of this implementation. Note that we are far away from the optimal speed up. Why? To answer this question compute the theoretical communication over computation ratio for this implementation and reason about the obtained result. Hint: the number of times a worker is spawned in this implementation is N^2/P on average." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "d7d31710", - "metadata": {}, - "outputs": [], - "source": [ - "N = 100\n", - "A = rand(N,N)\n", - "B = rand(N,N)\n", - "C = similar(A)\n", - "P = nworkers()\n", - "T1 = @belapsed matmul_seq!(C,A,B)\n", - "C = similar(A)\n", - "TP = @belapsed matmul_dist_1_v2!(C,A,B)\n", - "println(\"Speedup = \", T1/TP)\n", - "println(\"Optimal speedup = \", P)\n", - "println(\"Efficiency = \", 100*(T1/TP)/P, \"%\")" - ] - }, { "cell_type": "markdown", "id": "8e171362", @@ -1175,15 +1074,15 @@ ], "metadata": { "kernelspec": { - "display_name": "Julia 1.9.1", + "display_name": "Julia 1.10.0", "language": "julia", - "name": "julia-1.9" + "name": "julia-1.10" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", - "version": "1.9.1" + "version": "1.10.0" } }, "nbformat": 4,