diff --git a/notebooks/LEQ.ipynb b/notebooks/LEQ.ipynb
index d9a4e5d..55a266f 100644
--- a/notebooks/LEQ.ipynb
+++ b/notebooks/LEQ.ipynb
@@ -37,10 +37,21 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "id": "7e93809a",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "ge_dep_check (generic function with 1 method)"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "using Printf\n",
     "function answer_checker(answer,solution)\n",
@@ -149,10 +160,21 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "id": "e4070214",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "gaussian_elimination! (generic function with 1 method)"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "function gaussian_elimination!(B)\n",
     "    n,m = size(B)\n",
@@ -182,12 +204,34 @@
     ""
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "992301dd",
+   "metadata": {},
+   "source": [
+    "You can verify that the algorithm computes the upper triangular matrix correctly for the example in the introduction by running the following code cell. "
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 4,
    "id": "eb30df0d",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "3×4 Matrix{Float64}:\n",
+       " 1.0  3.0  1.0  9.0\n",
+       " 0.0  1.0  2.0  8.0\n",
+       " 0.0  0.0  1.0  4.0"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
    "source": [
     "A = Float64[1 3 1; 1 2 -1; 3 11 5]\n",
     "b = Float64[9,1,35]\n",
     "B = [A b]\n",
     "gaussian_elimination!(B)"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "8d941741",
-   "metadata": {},
-   "source": [
-    "The result is an upper triangular matrix which can be used to solve the system by backward substitution. "
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "39f2e8ef",
    "metadata": {},
@@ -277,7 +313,7 @@
    "The outer loop of the algorithm is not parallelizable, since the iterations depend on the results of the previous iterations. However, we can extract parallelism from the inner loops. Let's have a look at two different parallelization schemes. \n",
    "\n",
    "1. **Block-wise partitioning**: Each processor gets a block of subsequent rows. \n",
-   "2. **Cyclic partitioning**: The rows are cyclicly distributed among the processors. "
+   "2. **Cyclic partitioning**: The rows are alternately assigned to different processors. "
   ]
  },
  {
@@ -302,7 +338,9 @@
   "source": [
    "## What is the work per process at iteration k?\n",
    "To evaluate the efficiency of both partitioning schemes, consider how much work the processors do in the following example. \n",
-   "In any iteration k, which part of the matrix is updated in the inner loops? "
+   "In any iteration k, which part of the matrix is updated in the inner loops? \n",
+   "\n",
+   "### Block-wise partition"
   ]
  },
  {
@@ -325,7 +363,7 @@
   "id": "d9d29899",
   "metadata": {},
   "source": [
-   "It is clear from the code that at a given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. Thus, the work load is _imbalanced_ across the different processes. "
+   "It is clear from the code that at any given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. "
   ]
  },
  {
@@ -350,14 +388,12 @@
   "source": [
    "### Load imbalance\n",
    "\n",
-   "- CPUs with rows \n",
-   "Question: What are the data dependencies of this partitioning?\n",
+   "Question: What are the data dependencies in the block-wise partitioning?\n",
    "\n",
    "\n",
    " a) CPUs with rows >k need all rows <=k\n",
@@ -408,7 +444,7 @@
   "metadata": {},
   "source": [
    "## Conclusion\n",
-   "Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads more flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
+   "Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
    "\n",
    "At the same time, cyclic partitioning is not suitable for all communication patterns. For example, it can lead to a large communication overhead in the parallel Jacobi method, since the computation of each value depends on its neighbouring elements."
   ]
  },
diff --git a/notebooks/asp.ipynb b/notebooks/asp.ipynb
index a6ddd7d..9fce552 100644
--- a/notebooks/asp.ipynb
+++ b/notebooks/asp.ipynb
@@ -69,7 +69,7 @@
  },
  {
   "cell_type": "markdown",
-  "id": "9e0f9545",
+  "id": "24b5c21a",
   "metadata": {},
   "source": [
    "## The All Pairs of Shortest Paths (ASP) problem\n",
@@ -84,7 +84,6 @@
   ]
  },
  {
-  "attachments": {},
   "cell_type": "markdown",
   "id": "ade31d26",
   "metadata": {},
@@ -410,7 +409,7 @@
  },
  {
   "cell_type": "markdown",
-  "id": "ebb1f4d7",
+  "id": "5f26f9b5",
   "metadata": {},
   "source": [
    "### Parallelization strategy\n",
@@ -620,7 +619,7 @@
   "id": "6993b9d0",
   "metadata": {},
   "source": [
-   "In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. Therefore, algorithm is potentially scalable if $P<