diff --git a/notebooks/jacobi_method.ipynb b/notebooks/jacobi_method.ipynb
index 1c91c5f..08c01da 100644
--- a/notebooks/jacobi_method.ipynb
+++ b/notebooks/jacobi_method.ipynb
@@ -27,9 +27,9 @@
     "\n",
     "In this notebook, we will learn\n",
     "\n",
-    "- How to paralleize a Jacobi method\n",
+    "- How to parallelize the Jacobi method\n",
     "- How the data partition can impact the performance of a distributed algorithm\n",
-    "- How to use latency hiding\n",
+    "- How to use latency hiding to improve parallel performance\n",
     "\n"
    ]
   },
@@ -93,9 +93,12 @@
    "id": "93e84ff8",
    "metadata": {},
    "source": [
-    "When solving a Laplace equation in 1D, the Jacobi method leads to the following iterative scheme: The entry $i$ of vector $u$ at iteration $t+1$ is computed as:\n",
+    "When solving a [Laplace equation](https://en.wikipedia.org/wiki/Laplace%27s_equation) in 1D, the Jacobi method leads to the following iterative scheme: The entry $i$ of vector $u$ at iteration $t+1$ is computed as:\n",
     "\n",
-    "$u^{t+1}_i = \\dfrac{u^t_{i-1}+u^t_{i+1}}{2}$"
+    "$u^{t+1}_i = \\dfrac{u^t_{i-1}+u^t_{i+1}}{2}$\n",
+    "\n",
+    "\n",
+    "This iterative scheme is simple, yet it shares fundamental challenges with many other algorithms used in scientific computing, which is why we study it here.\n"
    ]
   },
   {
@@ -130,6 +133,14 @@
     "end"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "432bd862",
+   "metadata": {},
+   "source": [
+    "If you run it for zero iterations, you will see the initial condition."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -140,14 +151,78 @@
     "jacobi(5,0)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "c75cb9a6",
+   "metadata": {},
+   "source": [
+    "If you run it for enough iterations, you will see the expected solution of the Laplace equation: values that vary linearly from -1 to 1."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b52be374",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "jacobi(5,100)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "22fda724",
+   "metadata": {},
+   "source": [
+    "In our version of the Jacobi method, we return after a given number of iterations. Other stopping criteria are possible. For instance, iterate until the difference between `u` and `u_new` is below a tolerance:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "15de7bf5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "using LinearAlgebra: norm\n",
+    "function jacobi_with_tol(n,tol)\n",
+    "    u = zeros(n+2)\n",
+    "    u[1] = -1\n",
+    "    u[end] = 1\n",
+    "    u_new = copy(u)\n",
+    "    increment = similar(u)\n",
+    "    while true\n",
+    "        for i in 2:(n+1)\n",
+    "            u_new[i] = 0.5*(u[i-1]+u[i+1])\n",
+    "        end\n",
+    "        increment .= u_new .- u\n",
+    "        if norm(increment)/norm(u_new) < tol\n",
+    "            return u_new\n",
+    "        end\n",
+    "        u, u_new = u_new, u\n",
+    "    end\n",
+    "    u\n",
+    "end"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "697ad307",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "n = 5\n",
+    "tol = 1e-9\n",
+    "jacobi_with_tol(n,tol)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "6e085701",
    "metadata": {},
    "source": [
-    "<div class=\"alert alert-block alert-info\">\n",
-    "Note: In our version of the jacobi method, we return after a given number of iterations. Other stopping criteria are possible. For instance, iterate until the difference between u and u_new is below a tolerance.\n",
-    "</div>"
+    "However, we are not going to parallelize this more complex version in this notebook (we will consider it later in this course)."
    ]
   },
   {
@@ -156,7 +231,7 @@
    "metadata": {},
    "source": [
     "\n",
-    "### Where can we exploit parallelism?\n",
+    "## Where can we exploit parallelism?\n",
     "\n",
     "Look at the two nested loops in the sequential implementation:\n",
     "\n",
@@ -169,8 +244,8 @@
     "end\n",
     "```\n",
     "\n",
-    "- The outer loop cannot be parallelized. The value of `u` at step `t+1` depends on the value at the previous step `t`.\n",
-    "- The inner loop can be parallelized.\n",
+    "- The outer loop over `t` cannot be parallelized. The value of `u` at step `t+1` depends on the value at the previous step `t`.\n",
+    "- The inner loop is trivially parallel. The loop iterations are independent (any order is possible).\n",
     "\n"
    ]
   },
@@ -386,19 +461,11 @@
    "source": [
     "### Communication overhead\n",
     "- We update $N/P$ entries in each process at each iteration, where $N$ is the total length of the vector and $P$ the number of processes\n",
+    "- Thus, the computation cost per process and iteration is $O(N/P)$\n",
     "- We need to get remote entries from 2 neighbors (2 messages per iteration)\n",
     "- We need to communicate 1 entry per message\n",
-    "- Communication/computation ration is $O(P/N)$ (potentially scalable if $P<