Merge pull request #11 from fverdugo/francesc

More notebooks
2025-11-08 22:14:23 +01:00 · 2023-08-30 12:22:55 +02:00 · 2023-08-30 12:22:55 +02:00 · abd74c81a8
commit abd74c81a8
parent cabefa5acd 9429f1e1e3
3 changed files with 482 additions and 468 deletions
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -121,8 +121,8 @@ makedocs(;
                         "Distributed computing in Julia" => "julia_distributed.md",
                         "Distributed computing with MPI" => "mpi_tutorial.md",
                         "Matrix-matrix multiplication"=>"matrix_matrix.md",
-           #              "Jacobi" => "jacobi_method.md",
-           #              "ASP" => "asp.md",
+                         "Jacobi method" => "jacobi_method.md",
+                         "All pairs of shortest paths" => "asp.md",
           #              "Solutions" => "solutions.md",
                         ],
          ],
--- a/notebooks/asp.ipynb
+++ b/notebooks/asp.ipynb
@@ -65,33 +65,22 @@
   "source": [
    "### Floyd's sequential algorithm\n",
    "\n",
-    "The ASP problem can be solved with the  [Floyd–Warshall algorithm](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm). A sequential implementation of this algorithm is given in this function."
+    "The ASP problem can be solved with the [Floyd–Warshall algorithm](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm). A sequential implementation of this algorithm is given in the following function:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
   "id": "4fe447c5",
   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "floyd! (generic function with 1 method)"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
   "source": [
    "function floyd!(C)\n",
    "  n = size(C,1)\n",
    "  @assert size(C,2) == n\n",
    "  for k in 1:n\n",
-    "    for i in 1:n\n",
-    "      for j in 1:n\n",
+    "    for j in 1:n\n",
+    "      for i in 1:n\n",
    "        @inbounds C[i,j] = min(C[i,j],C[i,k]+C[k,j])\n",
    "      end\n",
    "    end\n",
@@ -110,25 +99,10 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
   "id": "860e537c",
   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "4×4 Matrix{Int64}:\n",
-       "  0  9  6  1\n",
-       "  2  0  8  3\n",
-       "  5  3  0  6\n",
-       " 10  8  5  0"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
+   "outputs": [],
   "source": [
    "inf = 1000\n",
    "C = [\n",
@@ -154,8 +128,8 @@
    "```julia\n",
    "n = size(C,1)\n",
    "for k in 1:n\n",
-    "    for i in 1:n\n",
-    "        for j in 1:n\n",
+    "    for j in 1:n\n",
+    "        for i in 1:n\n",
    "            C[i,j] = min(C[i,j],C[i,k]+C[k,j])\n",
    "        end\n",
    "    end\n",
@@ -248,6 +222,69 @@
    "</div>"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "c7027ac3",
+   "metadata": {},
+   "source": [
+    "### Serial performance\n",
+    "\n",
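+    "This algorithm is memory bound, meaning that its main cost is reading and writing the entries of the input matrix `C`. In such situations, the order in which we traverse the entries of `C` has a significant impact on performance.\n",
+    "\n",
+    "The following function computes the same result as the previous function `floyd!`, but the loops over `i` and `j` are nested in the opposite order. The result is the same because, as long as the diagonal entries of `C` are non-negative, row `k` and column `k` do not change during iteration `k`, so the update order within an iteration does not matter.\n"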
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "75cac17e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "function floyd2!(C)\n",
+    "  n = size(C,1)\n",
+    "  @assert size(C,2) == n\n",
+    "  for k in 1:n\n",
+    "    for i in 1:n\n",
+    "      for j in 1:n\n",
+    "        @inbounds C[i,j] = min(C[i,j],C[i,k]+C[k,j])\n",
+    "      end\n",
+    "    end\n",
+    "  end\n",
+    "  C\n",
+    "end"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "399385e8",
+   "metadata": {},
+   "source": [
+    "Compare the performance of both implementations (run the cell several times, so that compilation time is excluded from the measurement)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "907bc8c9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "n = 1000\n",
+    "C = rand(n,n)\n",
+    "@time floyd!(C)\n",
+    "C = rand(n,n)\n",
+    "@time floyd2!(C);"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad811b10",
+   "metadata": {},
+   "source": [
+    "The performance difference is significant. Matrices in Julia are stored in memory in column-major order (like in Fortran, unlike in C). This means that it is more efficient to also traverse the data in column-major order, i.e., with the row index `i` in the innermost loop (as in function `floyd!`). See this section of [Julia's performance tips](https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-column-major) if you are interested in further details."
+   ]
+  },
  {
   "cell_type": "markdown",
   "id": "0c95ea88",
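The claims in the serial performance cells above (column-major storage, and identical results for both loop orders) can be checked with a short, self-contained script. This is a sketch, not part of the notebook: `floyd_colmajor!` and `floyd_rowmajor!` are hypothetical names for the loop orders of `floyd!` and `floyd2!`, written with Julia's compact multi-variable `for` syntax, where the last variable is the innermost loop.

```julia
# Julia stores a matrix column by column; vec returns the entries in memory order.
A = [1 2; 3 4]
println(vec(A))   # prints [1, 3, 2, 4]

# Same loop order as floyd!: i is the innermost loop, so consecutive
# iterations touch consecutive memory locations of C.
function floyd_colmajor!(C)
    n = size(C, 1)
    @assert size(C, 2) == n
    for k in 1:n, j in 1:n, i in 1:n   # last variable = innermost loop
        @inbounds C[i, j] = min(C[i, j], C[i, k] + C[k, j])
    end
    C
end

# Same loop order as floyd2!: j is the innermost loop, so consecutive
# iterations jump n entries apart in memory.
function floyd_rowmajor!(C)
    n = size(C, 1)
    @assert size(C, 2) == n
    for k in 1:n, i in 1:n, j in 1:n
        @inbounds C[i, j] = min(C[i, j], C[i, k] + C[k, j])
    end
    C
end

# 3-node cycle 1→2→3→1 with unit weights (10 is a large weight standing in for "no direct edge")
C = [0 1 10; 10 0 1; 1 10 0]
@assert floyd_colmajor!(copy(C)) == [0 1 2; 2 0 1; 1 2 0]

# Both loop orders give identical results for non-negative weights
C = rand(200, 200)
@assert floyd_colmajor!(copy(C)) == floyd_rowmajor!(copy(C))
```

To compare timings, call each function once so that it gets compiled, then use `@time` on fresh copies of a larger matrix (e.g. `n = 1000`, as in the notebook cell).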
@@ -264,8 +301,8 @@
    "```julia\n",
    "n = size(C,1)\n",
    "for k in 1:n\n",
-    "    for i in 1:n\n",
-    "        for j in 1:n\n",
+    "    for j in 1:n\n",
+    "        for i in 1:n\n",
    "            C[i,j] = min(C[i,j],C[i,k]+C[k,j])\n",
    "        end\n",
    "    end\n",
@@ -401,8 +438,12 @@
   "source": [
    "- Each process updates $N^2/P$ entries per iteration\n",
    "- 1 process broadcasts a message of length $N$ to $P-1$ processes per iteration\n",
+    "- The send cost at the broadcasting process is $O(N P)$ per iteration (if point-to-point send/receive is used instead of a broadcast)\n",
    "- $P-1$ processes receive one message of length $N$ per iteration\n",
-    "- The receive/computation ration is $O(P/N)$ which would be small if $P<<N$"
+    "- The receive cost is $O(N)$ per iteration at each process\n",
+    "- The send/computation ratio is $O(P^2/N)$\n",
+    "- The receive/computation ratio is $O(P/N)$\n",
+    "- The algorithm is potentially scalable if $P \\ll N$"
   ]
  },
  {
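The ratios in the list above follow from dividing each communication cost per iteration by the computation cost per iteration and process, $O(N^2/P)$. In the derivation below, $T_{\text{comp}}$, $T_{\text{send}}$ and $T_{\text{recv}}$ are shorthand introduced here for these three costs, with $N$ the matrix size and $P$ the number of processes as in the notebook.

```latex
\begin{aligned}
T_{\text{comp}} &= O\!\left(\frac{N^2}{P}\right) \\
\frac{T_{\text{send}}}{T_{\text{comp}}} &= \frac{O(NP)}{O(N^2/P)} = O\!\left(\frac{P^2}{N}\right) \\
\frac{T_{\text{recv}}}{T_{\text{comp}}} &= \frac{O(N)}{O(N^2/P)} = O\!\left(\frac{P}{N}\right)
\end{aligned}
```

The receive ratio is small whenever $P \ll N$, while the send ratio (relevant only when point-to-point sends replace the broadcast) is small only when $P^2 \ll N$.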
@@ -425,50 +466,7 @@
   "id": "494aa965",
   "metadata": {},
   "source": [
-    "## Parallel Implementation"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c624722a",
-   "metadata": {},
-   "source": [
-    "### Generating test data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "09937668",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "function rand_distance_table(n)\n",
-    "  threshold = 0.4\n",
-    "  mincost = 3\n",
-    "  maxcost = 10\n",
-    "  infinity = 10000*maxcost\n",
-    "  C = fill(infinity,n,n)\n",
-    "  for j in 1:n\n",
-    "    for i in 1:n\n",
-    "      if rand() > threshold\n",
-    "        C[i,j] = rand(mincost:maxcost)\n",
-    "      end\n",
-    "    end\n",
-    "    C[j,j] = 0\n",
-    "  end\n",
-    "  C\n",
-    "end"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3116096c",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "rand_distance_table(10)"
+    "## Parallel Implementation\n"
   ]
  },
  {
@@ -511,7 +509,9 @@
   "id": "680e56cf",
   "metadata": {},
   "source": [
-    "### Code"
+    "### Code\n",
+    "\n",
+    "We split the code into two functions. The first one runs on the main process (the process running this notebook) and splits the input matrix into blocks of rows. It then calls `floyd_worker!` (see below) remotely on each worker, passing the corresponding block of rows.\n"
   ]
  },
  {
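The split into blocks of rows described in the "Code" cell can be tried out without MPI. The following sketch simulates the workers inside a single Julia process: `floyd_blocks!` and `floyd_seq!` are hypothetical helpers, copying the owner's row into the buffer `C_k` stands in for the MPI message, and the sketch assumes that the number of rows `n` is divisible by the number of workers `P`.

```julia
# Reference sequential Floyd-Warshall (same loop order as floyd!)
function floyd_seq!(C)
    n = size(C, 1)
    @assert size(C, 2) == n
    for k in 1:n, j in 1:n, i in 1:n
        @inbounds C[i, j] = min(C[i, j], C[i, k] + C[k, j])
    end
    C
end

# Serial simulation of the row-block algorithm: P "workers" are just P
# local matrices. Assumes n divisible by P (m = n ÷ P rows per worker).
function floyd_blocks!(C, P)
    n = size(C, 1)
    @assert size(C, 2) == n
    @assert n % P == 0 "this sketch assumes n divisible by P"
    m = n ÷ P
    # Worker w owns global rows (w-1)*m+1 : w*m, i.e. a local m×n matrix Cw
    blocks = [C[(w-1)*m+1:w*m, :] for w in 1:P]
    C_k = Vector{eltype(C)}(undef, n)   # buffer for row k
    for k in 1:n
        # The owner of row k copies it into the buffer; in the MPI version
        # this row is what the other workers receive.
        owner = (k - 1) ÷ m + 1
        C_k .= view(blocks[owner], k - (owner - 1) * m, :)
        # Every worker updates its own rows (column-major loop order)
        for Cw in blocks
            for j in 1:n, i in 1:m
                @inbounds Cw[i, j] = min(Cw[i, j], Cw[i, k] + C_k[j])
            end
        end
    end
    # Gather the local blocks back into C
    for w in 1:P
        C[(w-1)*m+1:w*m, :] .= blocks[w]
    end
    C
end

C = rand(12, 12)
@assert floyd_blocks!(copy(C), 4) == floyd_seq!(copy(C))
```

Each worker only reads its own rows plus the single row `C_k` of length `n` per iteration, which is why one message of length $N$ per iteration suffices in the parallel version.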
@@ -534,6 +534,14 @@
    "end"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "9fc3be11",
+   "metadata": {},
+   "source": [
+    "The second function, `floyd_worker!`, runs on the workers. Note that it uses MPI for the communication between workers."
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -560,8 +568,8 @@
    "        else\n",
    "            MPI.Recv!(C_k,comm,source=MPI.ANY_SOURCE,tag=0)\n",
    "        end\n",
-    "        for i in 1:m\n",
-    "            for j in 1:n\n",
+    "        for j in 1:n\n",
+    "            for i in 1:m\n",
    "                @inbounds Cw[i,j] = min(Cw[i,j],Cw[i,k]+C_k[j])\n",
    "            end\n",
    "        end\n",
@@ -570,6 +578,39 @@
    "end"
   ]
  },
+  {
+   "cell_type": "markdown",
+   "id": "c624722a",
+   "metadata": {},
+   "source": [
+    "### Testing the implementation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "09937668",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "function rand_distance_table(n)\n",
+    "  threshold = 0.4\n",
+    "  mincost = 3\n",
+    "  maxcost = 10\n",
+    "  infinity = 10000*maxcost\n",
+    "  C = fill(infinity,n,n)\n",
+    "  for j in 1:n\n",
+    "    for i in 1:n\n",
+    "      if rand() > threshold\n",
+    "        C[i,j] = rand(mincost:maxcost)\n",
+    "      end\n",
+    "    end\n",
+    "    C[j,j] = 0\n",
+    "  end\n",
+    "  C\n",
+    "end"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
@@ -654,21 +695,25 @@
    "- Use the synchronous send MPI_SSEND (less efficient). Note that the blocking send MPI_SEND used above does not guarantee that the message was received.\n",
    "- Barrier at the end of each iteration over $k$ (simple solution, but synchronization overhead)\n",
    "- Order incoming messages (buffering and extra user code needed)\n",
-    "- Use a specific rank id instead of `MPI.ANY_SOURCE` (one needs to know which are the rows owned by the other ranks)"
+    "- Use a specific rank id instead of `MPI.ANY_SOURCE`, or use `MPI.Bcast!` (in both cases, each rank needs to know which rows are owned by the other ranks)"
   ]
  },
  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "db2b586f",
+   "cell_type": "markdown",
+   "id": "c789dc7a",
   "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "# License\n",
+    "\n",
+    "\n",
+    "\n",
+    "This notebook is part of the course [Programming Large Scale Parallel Systems](https://www.francescverdugo.com/XM_40017) at Vrije Universiteit Amsterdam and may be used under a [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) license."
+   ]
  }
 ],
 "metadata": {
  "kernelspec": {
-   "display_name": "Julia 1.9.1",
+   "display_name": "Julia 1.9.0",
   "language": "julia",
   "name": "julia-1.9"
  },
@@ -676,7 +721,7 @@
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
-   "version": "1.9.1"
+   "version": "1.9.0"
  }
 },
 "nbformat": 4,
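Two of the fixes listed for the `MPI.ANY_SOURCE` ordering problem (receiving from a specific rank, or calling `MPI.Bcast!`) require every rank to know which rank owns row `k`. A minimal sketch of that lookup follows, assuming the `n` rows are split into `P` contiguous blocks of `n ÷ P` rows and that ranks are numbered from 0 as in MPI; `row_owner` and `local_row` are hypothetical helpers, not part of the notebook or of MPI.jl.

```julia
# Hypothetical helper: 0-based MPI rank that owns global row k (1-based),
# assuming n rows split into P contiguous blocks of n ÷ P rows each.
function row_owner(k, n, P)
    @assert n % P == 0 "this sketch assumes n divisible by P"
    @assert 1 <= k <= n
    m = n ÷ P          # rows per rank
    (k - 1) ÷ m        # ranks are numbered 0:P-1
end

# Hypothetical helper: 1-based index of global row k inside its owner's block
local_row(k, n, P) = k - row_owner(k, n, P) * (n ÷ P)

println([row_owner(k, 8, 4) for k in 1:8])   # prints [0, 0, 1, 1, 2, 2, 3, 3]
```

At iteration `k`, every rank can compute `row_owner(k, n, P)` and use it as the source rank of the receive (or as the root of the broadcast), so that all ranks process the rows in increasing order of `k`.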
--- a/notebooks/jacobi_method.ipynb
+++ b/notebooks/jacobi_method.ipynb