diff --git a/notebooks/matrix_matrix.ipynb b/notebooks/matrix_matrix.ipynb
index 6f93974..1cc299c 100644
--- a/notebooks/matrix_matrix.ipynb
+++ b/notebooks/matrix_matrix.ipynb
@@ -72,9 +72,10 @@
  " \"It's not correct. Keep trying! 💪\"\n",
  " end |> println\n",
  "end\n",
+ "alg_0_comp_check(answer) = answer_checker(answer, \"d\")\n",
  "alg_1_deps_check(answer) = answer_checker(answer,\"b\")\n",
- "alg_1_comm_overhead_check(answer) = answer_checker(answer, \"c\")\n",
- "alg_1_comp_check(answer) = answer_checker(answer, \"a\")\n",
+ "alg_1_comm_overhead_check(answer) = answer_checker(answer, \"b\")\n",
+ "alg_1_comp_check(answer) = answer_checker(answer, \"b\")\n",
  "alg_2_complex_check(answer) = answer_checker(answer, \"b\")\n",
  "alg_2_deps_check(answer) = answer_checker(answer,\"d\")\n",
  "alg_3_deps_check(answer) = answer_checker(answer, \"c\")\n",
@@ -88,7 +89,7 @@
  "source": [
  "## Problem Statement\n",
  "\n",
- "Let us consider the (dense) matrix-matrix product `C=A*B`."
+ "Given two $N$-by-$N$ matrices $A$ and $B$, compute the matrix-matrix product $C=AB$ efficiently and in parallel."
  ]
  },
  {
@@ -157,7 +158,7 @@
  "source": [
  "## Serial implementation\n",
  "\n",
- "We start by considering the (naive) sequential algorithm:"
+ "We start by considering the (naive) sequential algorithm, which is based on the mathematical definition of the matrix-matrix product, $C_{ij} = \\sum_k A_{ik} B_{kj}$:"
  ]
  },
  {
@@ -188,6 +189,30 @@
  "end"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "e3b86457",
+ "metadata": {},
+ "source": [
+ "Run the next cell to test the implementation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c5caf799",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "using Test\n",
+ "N = 10\n",
+ "A = rand(N,N)\n",
+ "B = rand(N,N)\n",
+ "C = similar(A)\n",
+ "matmul_seq!(C,A,B)\n",
+ "@test C ≈ A*B"
+ ]
+ },
  {
  "cell_type": "markdown",
  "id": "f967d2ea",
@@ -216,6 +241,32 @@
  "@btime mul!(C,A,B);"
  ]
  },
+ {
+ "cell_type": "markdown",
+ "id": "0ca2fbd4",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Question: What is the complexity (number of operations) of the serial algorithm? Assume that all matrices are $N$-by-$N$ matrices.\n",
+ "\n",
+ "\n",
+ " a) O(1)\n",
+ " b) O(N)\n",
+ " c) O(N²)\n",
+ " d) O(N³)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "078e974e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "answer = \"x\" # replace x with a, b, c, or d \n",
+ "alg_0_comp_check(answer)"
+ ]
+ },
  {
  "cell_type": "markdown",
  "id": "0eedd28a",
@@ -489,10 +540,10 @@
  "Question: How many scalars are communicated from and to a worker? Assume that matrices A, B, and C are N by N matrices.\n",
  "\n",
  "\n",
- " a) 3N\n",
- " b) 2N + 2\n",
- " c) 2N + 1\n",
- " d) N² + 1"
+ " a) O(1)\n",
+ " b) O(N)\n",
+ " c) O(N²)\n",
+ " d) O(N³)"
  ]
  },
  {
@@ -515,9 +566,10 @@
  "Question: How many operations are done in a worker? \n",
  "\n",
  "\n",
- " a) O(N)\n",
- " b) O(N²)\n",
- " c) O(N³)"
+ " a) O(1)\n",
+ " b) O(N)\n",
+ " c) O(N²)\n",
+ " d) O(N³)"
  ]
  },
  {
@@ -905,9 +957,9 @@
  "\n",
  "| Algorithm | Parallelism (#workers) | Communication per worker | Computation per worker | Ratio communication/computation |\n",
  "|---|---|---|---|---|\n",
- "| 1 | N² | 2N + 1 | N | O(1) |\n",
- "| 2 | N | 2N + N² | N² | O(1) |\n",
- "| 3 | P | N² + 2N²/P | N³/P | O(P/N) |\n",
+ "| 1 | N² | O(N) | O(N) | O(1) |\n",
+ "| 2 | N | O(N²) | O(N²) | O(1) |\n",
+ "| 3 | P | O(N²) | O(N³/P) | O(P/N) |\n",
  "\n",
  "\n",
  "- Matrix-matrix multiplication is trivially parallelizable (all entries in the result matrix can be computed in parallel, at least in theory)\n",
@@ -1086,7 +1138,7 @@
  "id": "ab609c18",
  "metadata": {},
  "source": [
- "Run the next cell to check the performance of this implementation. Note that we are far away from the optimal speed up. Why? To answer this question compute the theoretical communication over computation ratio for this implementation and reason about the obtained result. Hint: the number of times a worker is spawned in this implementation is N^3/P on average."
+ "Run the next cell to check the performance of this implementation. Note that we are far away from the optimal speed up. Why? To answer this question compute the theoretical communication over computation ratio for this implementation and reason about the obtained result. Hint: the number of times a worker is spawned in this implementation is N^2/P on average."
  ]
  },
  {