Improve explanations ASP and LEQ notebooks
@@ -37,10 +37,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "id": "7e93809a",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"ge_dep_check (generic function with 1 method)"
+]
+},
+"execution_count": 1,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "using Printf\n",
 "function answer_checker(answer,solution)\n",
@@ -149,10 +160,21 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 2,
 "id": "e4070214",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"gaussian_elimination! (generic function with 1 method)"
+]
+},
+"execution_count": 2,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "function gaussian_elimination!(B)\n",
 " n,m = size(B)\n",
@@ -182,12 +204,34 @@
 "</div>"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "992301dd",
+"metadata": {},
+"source": [
+"You can verify that the algorithm computes the upper triangular matrix correctly for the example in the introduction by running the following code cell. "
+]
+},
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
 "id": "eb30df0d",
 "metadata": {},
-"outputs": [],
+"outputs": [
+{
+"data": {
+"text/plain": [
+"3×4 Matrix{Float64}:\n",
+" 1.0 3.0 1.0 9.0\n",
+" 0.0 1.0 2.0 8.0\n",
+" 0.0 0.0 1.0 4.0"
+]
+},
+"execution_count": 4,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
 "source": [
 "A = Float64[1 3 1; 1 2 -1; 3 11 5]\n",
 "b = Float64[9,1,35]\n",
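The hunks above capture only the first lines of the kernel's source. For orientation, here is a minimal Julia sketch that is consistent with the new output cell (it normalizes each pivot row and performs no pivoting); the notebook's actual implementation may differ in detail:

function gaussian_elimination!(B)
    n, m = size(B)
    for k in 1:n
        # Normalize the pivot row (assumes a nonzero pivot; no row swaps).
        for t in (k+1):m
            B[k,t] /= B[k,k]
        end
        B[k,k] = 1
        # Eliminate the entries below the pivot.
        for i in (k+1):n
            for t in (k+1):m
                B[i,t] -= B[i,k] * B[k,t]
            end
            B[i,k] = 0
        end
    end
    B
end

Applied to the augmented matrix B = [A b] built from the A and b defined in this cell (the line defining B is elided at the hunk boundary), this sketch reproduces the 3×4 result shown in the new output.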
@@ -195,14 +239,6 @@
 "gaussian_elimination!(B)"
 ]
 },
-{
-"cell_type": "markdown",
-"id": "8d941741",
-"metadata": {},
-"source": [
-"The result is an upper triangular matrix which can be used to solve the system by backward substitution. "
-]
-},
 {
 "cell_type": "markdown",
 "id": "39f2e8ef",
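The deleted cell refers to backward substitution. As context, a minimal sketch (my illustration, not notebook code) of backward substitution on the unit-diagonal upper triangular system produced above; for the example it yields x = [5.0, 0.0, 4.0]:

function backward_substitution(B)
    n, m = size(B)
    x = zeros(n)
    for i in n:-1:1
        # Subtract the contributions of the already-solved unknowns from the
        # right-hand side; the diagonal is 1, so no division is needed.
        x[i] = B[i,m] - sum(B[i,j] * x[j] for j in (i+1):n; init=0.0)
    end
    x
end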
@@ -277,7 +313,7 @@
 "The outer loop of the algorithm is not parallelizable, since the iterations depend on the results of the previous iterations. However, we can extract parallelism from the inner loops. Let's have a look at two different parallelization schemes. \n",
 "\n",
 "1. **Block-wise partitioning**: Each processor gets a block of subsequent rows. \n",
-"2. **Cyclic partitioning**: The rows are cyclicly distributed among the processors. "
+"2. **Cyclic partitioning**: The rows are alternately assigned to different processors. "
 ]
 },
 {
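To make the two schemes concrete, here is a toy illustration (hypothetical helpers, not notebook code) of which rows worker p out of P owns in an n-row matrix:

# Illustrative only; block_rows assumes P divides n evenly.
block_rows(p, P, n) = ((p - 1) * div(n, P) + 1):(p * div(n, P))
cyclic_rows(p, P, n) = p:P:n

# With n = 8 rows and P = 4 workers:
# block_rows(2, 4, 8)  == 3:4    (a contiguous block of rows)
# cyclic_rows(2, 4, 8) == 2:4:8  (rows 2 and 6)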
@@ -302,7 +338,9 @@
 "source": [
 "## What is the work per process at iteration k?\n",
 "To evaluate the efficiency of both partitioning schemes, consider how much work the processors do in the following example. \n",
-"In any iteration k, which part of the matrix is updated in the inner loops? "
+"In any iteration k, which part of the matrix is updated in the inner loops? \n",
+"\n",
+"### Block-wise partition"
 ]
 },
 {
@@ -325,7 +363,7 @@
 "id": "d9d29899",
 "metadata": {},
 "source": [
-"It is clear from the code that at a given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. Thus, the work load is _imbalanced_ across the different processes. "
+"It is clear from the code that at any given iteration `k`, the matrix is updated from row `k` to `n` and from column `k` to `m`. If we look at how that reflects the distribution of work over the processes, we can see that CPU 1 does not have any work, whereas CPU 2 does a little work and CPU 3 and 4 do a lot of work. "
 ]
 },
 {
@@ -350,14 +388,12 @@
 "source": [
 "### Load imbalance\n",
 "\n",
-"- CPUs with rows <k are idle during iteration k\n",
-"- Bad load balance means bad speedups, as some CPUs are waiting instead of doing useful work\n",
-"- Solution: cyclic partition \n",
+"The block-wise partitioning scheme leads to load imbalance across the processes: CPUs with rows $<k$ are idle during any iteration $k$. The bad load balance leads to bad speedups, as some CPUs are waiting instead of doing useful work. \n",
 " \n",
 "### Data dependencies\n",
 " \n",
 "<div class=\"alert alert-block alert-success\">\n",
-"<b>Question:</b> What are the data dependencies of this partitioning?\n",
+"<b>Question:</b> What are the data dependencies in the block-wise partitioning?\n",
 "</div>\n",
 "\n",
 " a) CPUs with rows >k need all rows <=k\n",
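The imbalance can be quantified with a back-of-the-envelope helper (again my illustration, reusing the hypothetical block_rows/cyclic_rows from above): at iteration k, each owned row i >= k is updated across columns k to m.

# Entries a worker touches at iteration k, given the rows it owns
# (each active row spans columns k:m, i.e. m - k + 1 entries).
work_at_iter(rows, k, m) = count(>=(k), rows) * (m - k + 1)

# With n = 8 rows, m = 9 columns (augmented system), P = 4 workers, at k = 5:
# block:  work_at_iter(1:2, 5, 9) == 0   (worker 1 sits idle)
#         work_at_iter(7:8, 5, 9) == 10  (worker 4 carries a full share)
# cyclic: work_at_iter(p:4:8, 5, 9) == 5 for every worker p = 1, 2, 3, 4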
@@ -408,7 +444,7 @@
 "metadata": {},
 "source": [
 "## Conclusion\n",
-"Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads more flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
+"Cyclic partitioning tends to work well in problems with predictable load imbalance. It is a form of **static load balancing** which means using a pre-defined load schedule based on prior information about the algorithm (as opposed to **dynamic load balancing** which can schedule loads flexibly during runtime). The data dependencies are the same as for the 1d block partitioning.\n",
 "\n",
 "At the same time, cyclic partitioning is not suitable for all communication patterns. For example, it can lead to a large communication overhead in the parallel Jacobi method, since the computation of each value depends on its neighbouring elements."
 ]