mirror of https://github.com/fverdugo/XM_40017.git (synced 2025-11-09 00:24:25 +01:00)

Improve explanations ASP and LEQ notebooks

commit 685d3db6a5 (parent 123053b9c7)
@@ -37,10 +37,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "id": "7e93809a",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"ge_dep_check (generic function with 1 method)"
+]
+},
+"execution_count": 1,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "using Printf\n",
 "function answer_checker(answer,solution)\n",
@@ -149,10 +160,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 2,
 "id": "e4070214",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"gaussian_elimination! (generic function with 1 method)"
+]
+},
+"execution_count": 2,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "function gaussian_elimination!(B)\n",
 " n,m = size(B)\n",
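Only the first lines of `gaussian_elimination!` are visible in this hunk. For reference, a minimal Julia sketch of an in-place Gaussian elimination on the augmented matrix, consistent with the unit-diagonal output recorded further down in this diff (no pivoting; the function name is hypothetical):

# Minimal sketch, assuming B is the n×(n+1) augmented matrix [A b]
# and that no row pivoting is needed.
function gaussian_elimination_sketch!(B)
    n, m = size(B)
    for k in 1:n
        pivot = B[k,k]
        for j in k:m               # normalize the pivot row to get a 1 on the diagonal
            B[k,j] /= pivot
        end
        for i in (k+1):n           # eliminate the entries below the pivot
            factor = B[i,k]
            for j in k:m
                B[i,j] -= factor * B[k,j]
            end
        end
    end
    B
end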
@@ -182,12 +204,34 @@
 "</div>"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "992301dd",
+"metadata": {},
+"source": [
+"You can verify that the algorithm computes the upper triangular matrix correctly for the example in the introduction by running the following code cell. "
+]
+},
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
 "id": "eb30df0d",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"3×4 Matrix{Float64}:\n",
+" 1.0 3.0 1.0 9.0\n",
+" 0.0 1.0 2.0 8.0\n",
+" 0.0 0.0 1.0 4.0"
+]
+},
+"execution_count": 4,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "A = Float64[1 3 1; 1 2 -1; 3 11 5]\n",
 "b = Float64[9,1,35]\n",
@@ -195,14 +239,6 @@
 "gaussian_elimination!(B)"
 ]
 },
-{
-"cell_type": "markdown",
-"id": "8d941741",
-"metadata": {},
-"source": [
-"The result is an upper triangular matrix which can be used to solve the system by backward substitution. "
-]
-},
 {
 "cell_type": "markdown",
 "id": "39f2e8ef",
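The removed cell mentions backward substitution. A minimal sketch of that step, assuming the unit-diagonal reduced matrix produced above (the function name is hypothetical):

# Backward substitution on the reduced augmented matrix returned by
# gaussian_elimination_sketch! (1s on the diagonal assumed).
function backward_substitution(B)
    n, m = size(B)
    x = zeros(n)
    for i in n:-1:1
        s = B[i,m]                 # right-hand side entry of row i
        for j in (i+1):n
            s -= B[i,j] * x[j]
        end
        x[i] = s                   # no division needed since B[i,i] == 1
    end
    x
end

For the example shown above, backward_substitution(B) returns [5.0, 0.0, 4.0], which indeed satisfies A*x == b.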
@@ -277,7 +313,7 @@
 "The outer loop of the algorithm is not parallelizable, since the iterations depend on the results of the previous iterations. However, we can extract parallelism from the inner loops. Let's have a look at two different parallelization schemes. \n",
 "\n",
 "1. **Block-wise partitioning**: Each processor gets a block of subsequent rows. \n",
-"2. **Cyclic partitioning**: The rows are cyclicly distributed among the processors. "
+"2. **Cyclic partitioning**: The rows are alternately assigned to different processors. "
 ]
 },
 {
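To make the two schemes concrete, a small illustration (not part of the notebooks; the helper names are made up) of which processor owns which row for n = 8 rows and P = 4 processors:

block_owner(i, n, P) = div((i-1)*P, n) + 1   # block-wise: contiguous chunks of n/P rows
cyclic_owner(i, P)   = mod(i-1, P) + 1       # cyclic: round-robin assignment

n, P = 8, 4
[block_owner(i, n, P) for i in 1:n]          # [1, 1, 2, 2, 3, 3, 4, 4]
[cyclic_owner(i, P) for i in 1:n]            # [1, 2, 3, 4, 1, 2, 3, 4]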
@@ -302,7 +338,9 @@
 "source": [
 "## What is the work per process at iteration k?\n",
 "To evaluate the efficiency of both partitioning schemes, consider how much work the processors do in the following example. \n",
-"In any iteration k, which part of the matrix is updated in the inner loops? "
+"In any iteration k, which part of the matrix is updated in the inner loops? \n",
+"\n",
+"### Block-wise partition"
 ]
 },
 {
@@ -325,7 +363,7 @@
 "id": "d9d29899",
 "metadata": {},
 "source": [
-"It is clear from the code that at a given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. Thus, the work load is _imbalanced_ across the different processes. "
+"It is clear from the code that at any given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. "
 ]
 },
 {
@@ -350,14 +388,12 @@
 "source": [
 "### Load imbalance\n",
 "\n",
-"- CPUs with rows <k are idle during iteration k\n",
-"- Bad load balance means bad speedups, as some CPUs are waiting instead of doing useful work\n",
-"- Solution: cyclic partition \n",
+"The block-wise partitioning scheme leads to load imbalance across the processes: CPUs with rows $<k$ are idle during any iteration $k$. The bad load balance leads to bad speedups, as some CPUs are waiting instead of doing useful work. \n",
 " \n",
 "### Data dependencies\n",
 " \n",
 "<div class=\"alert alert-block alert-success\">\n",
-"<b>Question:</b> What are the data dependencies of this partitioning?\n",
+"<b>Question:</b> What are the data dependencies in the block-wise partitioning?\n",
 "</div>\n",
 "\n",
 " a) CPUs with rows >k need all rows <=k\n",
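The load imbalance can be checked with a quick count, a sketch using the hypothetical owner maps from the illustration above: at iteration k only rows >= k are updated, and the two partitions spread those rows very differently.

# rows each CPU still updates at iteration k, given an owner map
work_at_k(owners, k, P) = [count(i -> owners[i] == p && i >= k, eachindex(owners)) for p in 1:P]

block  = [1, 1, 2, 2, 3, 3, 4, 4]            # block-wise owners for n = 8, P = 4
cyclic = [1, 2, 3, 4, 1, 2, 3, 4]            # cyclic owners
work_at_k(block, 5, 4)                       # [0, 0, 2, 2]  -> CPUs 1 and 2 idle
work_at_k(cyclic, 5, 4)                      # [1, 1, 1, 1]  -> balanced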
@@ -408,7 +444,7 @@
 "metadata": {},
 "source": [
 "## Conclusion\n",
-"Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads more flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
+"Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
 "\n",
 "At the same time, cyclic partitioning is not suitable for all communication patterns. For example, it can lead to a large communication overhead in the parallel Jacobi method, since the computation of each value depends on its neighbouring elements."
 ]
@@ -69,7 +69,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "9e0f9545",
+"id": "24b5c21a",
 "metadata": {},
 "source": [
 "## The All Pairs of Shortest Paths (ASP) problem\n",
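This hunk only touches the ASP notebook's cell ids, but for orientation, a serial Floyd-Warshall sketch matching the update rule that appears later in this diff (`Cw[i,j] = min(Cw[i,j], Cw[i,k] + C_k[j])`); this exact function is not shown in the hunk:

# Serial Floyd-Warshall sketch: C[i,j] holds the current shortest
# distance from i to j and is relaxed through every intermediate node k.
function floyd!(C)
    n = size(C, 1)
    for k in 1:n
        for j in 1:n, i in 1:n
            @inbounds C[i,j] = min(C[i,j], C[i,k] + C[k,j])
        end
    end
    C
end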
@@ -84,7 +84,6 @@
 ]
 },
 {
-"attachments": {},
 "cell_type": "markdown",
 "id": "ade31d26",
 "metadata": {},
@@ -410,7 +409,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "ebb1f4d7",
+"id": "5f26f9b5",
 "metadata": {},
 "source": [
 "### Parallelization strategy\n",
@@ -620,7 +619,7 @@
 "id": "6993b9d0",
 "metadata": {},
 "source": [
-"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. Therefore, algorithm is potentially scalable if $P<<N$."
+"In summary, the send/computation ratio is $O(P^2/N)$ and the receive/computation ratio is $O(P/N)$. The algorithm is potentially scalable if $P<<N$."
 ]
 },
 {
@@ -675,7 +674,7 @@
 "source": [
 "### Code\n",
 "\n",
-"We split the code into two functions. The first function is called on the main process (the process running this notebook). It splits the input matrix into blocks of rows. Then, we call `floyd_worker!` (see below) to remotely compute Floyd's algorithm in each worker with its corresponding block of rows.\n"
+"We split the code into two functions. The first function is called on the main process (the process running this notebook). It splits the input matrix into blocks of rows. Then, we use a remotecall to compute Floyd's algorithm in each worker with its corresponding block of rows.\n"
 ]
 },
 {
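A hedged sketch of the driver described above, assuming Distributed workers, an n divisible by the number of workers, and a `floyd_worker!(Cw, rows_w)` defined with `@everywhere`; the inter-worker exchange of row k happens inside `floyd_worker!` and is not shown here:

using Distributed

function floyd_dist!(C)
    n = size(C, 1)
    P = nworkers()
    @assert mod(n, P) == 0         # assume equally sized blocks for simplicity
    nrows = div(n, P)
    ftrs = map(enumerate(workers())) do (p, w)
        rows = (1 + (p-1)*nrows):(p*nrows)
        Cw = C[rows, :]            # block of rows for worker w
        remotecall(floyd_worker!, w, Cw, rows)
    end
    for (p, ftr) in enumerate(ftrs)
        rows = (1 + (p-1)*nrows):(p*nrows)
        C[rows, :] = fetch(ftr)    # gather the updated blocks back into C
    end
    C
end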
@@ -889,52 +888,10 @@
 "source": [
 "### Possible solutions\n",
 "\n",
-"- Use synchronous send MPI_SSEND (less efficient). Note that the blocking send MPI_SEND used above does not guarantee that the message was received. \n",
-"- Barrier at the end of each iteration over $k$ (simple solution, but synchronization overhead)\n",
-"- Order incoming messages (buffering and extra user code needed)\n",
-"- Use a specific rank id instead of `MPI.ANY_SOURCE` or use `MPI.Bcast!` (one needs to know which are the rows owned by the other ranks)"
-]
-},
-{
-"cell_type": "markdown",
-"id": "aab3fbfb",
-"metadata": {},
-"source": [
-"### Exercise 1\n",
-"Rewrite the function `floyd_worker!()` such that it runs correctly using `MPI.Bcast!`. "
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "aab1900f",
-"metadata": {},
-"outputs": [],
-"source": [
-"@everywhere function floyd_worker!(Cw,rows_w)\n",
-" comm = MPI.Comm_dup(MPI.COMM_WORLD)\n",
-" rank = MPI.Comm_rank(comm)\n",
-" nranks = MPI.Comm_size(comm)\n",
-" m,n = size(Cw)\n",
-" C_k = similar(Cw,n)\n",
-" # TODO: calculate order of source processor ranks\n",
-" for k in 1:n\n",
-" root_id = # TODO: calculate the rank of processor that owns k\n",
-" if k in rows_w\n",
-" myk = (k-first(rows_w))+1\n",
-" C_k .= view(Cw,myk,:)\n",
-" end\n",
-" MPI.Bcast!(C_k, root=root_id, comm)\n",
-" # TODO: enqueue C_k in list or array\n",
-" # TODO: dequeue next C_k from array/list\n",
-" for j in 1:n\n",
-" for i in 1:m\n",
-" @inbounds Cw[i,j] = min(Cw[i,j],Cw[i,k]+C_k[j])\n",
-" end\n",
-" end\n",
-" end\n",
-" Cw\n",
-"end"
+"1. **Synchronous sends**: Use synchronous send MPI_SSEND. This is less efficient because we spend time waiting until each message is received. Note that the blocking send MPI_SEND used above does not guarantee that the message was received. \n",
+"2. **MPI.Barrier**: Use a barrier at the end of each iteration over $k$. This is easy to implement, but we get a synchronization overhead.\n",
+"3. **Order incoming messages**: The receiver orders the incoming messages, e.g. according to MPI.Status or the sender rank. This requires buffering and extra user code.\n",
+"4. **MPI.Bcast!**: Communicate row k using `MPI.Bcast!`. One needs to know which are the rows owned by the other ranks."
 ]
 },
 {
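A hedged sketch of solution 4, filling in the TODOs of the exercise skeleton removed above, under the assumption of equally sized row blocks so that the owner of global row k is computable from the local block height m (the function name is hypothetical):

@everywhere function floyd_worker_bcast!(Cw, rows_w)
    comm = MPI.Comm_dup(MPI.COMM_WORLD)
    m, n = size(Cw)
    C_k = similar(Cw, n)
    for k in 1:n
        root = div(k - 1, m)                 # rank that owns global row k (0-based, equal blocks)
        if k in rows_w
            myk = k - first(rows_w) + 1
            C_k .= view(Cw, myk, :)          # the owner fills the buffer
        end
        MPI.Bcast!(C_k, comm; root = root)   # every rank receives row k
        for j in 1:n, i in 1:m
            @inbounds Cw[i,j] = min(Cw[i,j], Cw[i,k] + C_k[j])
        end
    end
    Cw
end

Because `MPI.Bcast!` is a collective operation, all ranks process row k at the same logical step, so the message-ordering problem disappears without extra buffering.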
@@ -212,10 +212,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "id": "a50706bc",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"sort_neighbors (generic function with 1 method)"
+]
+},
+"execution_count": 1,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "function sort_neighbors(C)\n",
 " n = size(C,1)\n",
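Only the first lines of `sort_neighbors` appear in this hunk. A reconstruction consistent with the outputs recorded below, for each city the (city, distance) pairs of all cities sorted by increasing distance; treat it as a sketch, not necessarily the notebook's exact code:

function sort_neighbors_sketch(C)
    n = size(C, 1)
    map(1:n) do i
        pairs = [(j, C[i,j]) for j in 1:n]   # (city, distance) pairs for city i
        sort(pairs, by = last)               # nearest first; ties keep city order (stable since Julia 1.9)
    end
end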
@@ -230,10 +241,25 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 2,
 "id": "2eeecdd6",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"4×4 Matrix{Int64}:\n",
+" 0 2 3 2\n",
+" 2 0 4 1\n",
+" 3 4 0 3\n",
+" 2 1 3 0"
+]
+},
+"execution_count": 2,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "C = [\n",
 " 0 2 3 2\n",
@@ -253,10 +279,25 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
 "id": "6dd0288e",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"4-element Vector{Vector{Tuple{Int64, Int64}}}:\n",
+" [(1, 0), (2, 2), (4, 2), (3, 3)]\n",
+" [(2, 0), (4, 1), (1, 2), (3, 4)]\n",
+" [(3, 0), (1, 3), (4, 3), (2, 4)]\n",
+" [(4, 0), (2, 1), (1, 2), (3, 3)]"
+]
+},
+"execution_count": 3,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "C_sorted = sort_neighbors(C)"
 ]
@@ -271,12 +312,27 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
 "id": "00608e1d",
 "metadata": {
 "scrolled": true
 },
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"4-element Vector{Tuple{Int64, Int64}}:\n",
+" (3, 0)\n",
+" (1, 3)\n",
+" (4, 3)\n",
+" (2, 4)"
+]
+},
+"execution_count": 4,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "city = 3\n",
 "C_sorted[city]"