Francesc Verdugo 2023-07-26 11:25:39 +02:00
parent 413e27df23
commit 5b47a090a5


@ -13,7 +13,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b4fef0c2",
"id": "b4d5f1a1",
"metadata": {},
"outputs": [],
"source": [
@ -22,23 +22,14 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"id": "2f8ba040",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"alg_2_deps_check (generic function with 1 method)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"using Distributed\n",
"using BenchmarkTools\n",
"using Printf\n",
"if procs() == workers()\n",
" addprocs(4)\n",
"end\n",
@ -72,8 +63,7 @@
"\n",
"- Parallelize a simple algorithm\n",
"- Study the performance of different parallelization strategies\n",
"- Implement them using Julia\n",
"- Learn concepts such as communication overhead and parallel speedup. "
"- Implement them using Julia"
]
},
{
@ -103,7 +93,7 @@
},
{
"cell_type": "markdown",
"id": "4cb6e98f",
"id": "a358ee60",
"metadata": {},
"source": [
"### Goals\n",
@ -167,7 +157,7 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"id": "af8dfb37",
"metadata": {},
"outputs": [],
@ -208,12 +198,11 @@
{
"cell_type": "code",
"execution_count": null,
"id": "899235d1",
"id": "725387f6",
"metadata": {},
"outputs": [],
"source": [
"using LinearAlgebra\n",
"using BenchmarkTools\n",
"N = 1000\n",
"A = rand(N,N)\n",
"B = rand(N,N)\n",
@ -290,7 +279,7 @@
"source": [
"### Data dependencies\n",
"\n",
"Moving data through the network is expensive and reducing data movement is one of the key points in distributed algorithm. To this end, we determine which is the minimum data needed by a worker to perform its computations.\n",
"Moving data through the network is expensive and reducing data movement is one of the key points in a distributed algorithm. To this end, we determine which is the minimum data needed by a worker to perform its computations.\n",
"\n",
"In algorithm 1, each worker computes only an entry of the result matrix C."
]
@ -312,7 +301,7 @@
},
{
"cell_type": "markdown",
"id": "28c04679",
"id": "be3c4a01",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-success\">\n",
@ -328,7 +317,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d350220b",
"id": "a8b7d1e1",
"metadata": {},
"outputs": [],
"source": [
@ -463,7 +452,7 @@
},
{
"cell_type": "markdown",
"id": "7192ee22",
"id": "b8eb224d",
"metadata": {},
"source": [
"### Experimental speedup\n",
@ -474,7 +463,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "8fd20dac",
"id": "cc698aa8",
"metadata": {},
"outputs": [],
"source": [
@ -493,7 +482,7 @@
},
{
"cell_type": "markdown",
"id": "044c4d97",
"id": "dac6a50b",
"metadata": {},
"source": [
"### Communication overhead\n",
@ -507,14 +496,14 @@
"3. The worker sends back the result of C[i,j] to the master process\n",
"\n",
"<div class=\"alert alert-block alert-success\">\n",
"<b>Question:</b> How many scalars are communicated from an to a worker? Assume that matrices A, B, and C are N by N matrices.\n",
"<b>Question:</b> How many scalars are communicated from and to a worker? Assume that matrices A, B, and C are N by N matrices.\n",
"</div>\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b05e43f9",
"id": "e78cbc7b",
"metadata": {},
"outputs": [],
"source": [
@ -523,7 +512,7 @@
},
{
"cell_type": "markdown",
"id": "e661d4f9",
"id": "b27a4d3f",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-success\">\n",
@ -534,7 +523,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "2f60d9fc",
"id": "fcc9b903",
"metadata": {},
"outputs": [],
"source": [
@ -543,7 +532,7 @@
},
{
"cell_type": "markdown",
"id": "55eb3ff5",
"id": "d4c301de",
"metadata": {},
"source": [
"From these results we can conclude:\n",
@ -696,7 +685,7 @@
},
{
"cell_type": "markdown",
"id": "c13dd6af",
"id": "8de835b9",
"metadata": {},
"source": [
"Test it using the next cell"
@ -719,7 +708,7 @@
},
{
"cell_type": "markdown",
"id": "e8553faa",
"id": "f1f30faf",
"metadata": {},
"source": [
"### Experimental speedup"
@ -728,7 +717,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "b3c3ffb7",
"id": "fe42f069",
"metadata": {},
"outputs": [],
"source": [
@ -781,7 +770,7 @@
},
{
"cell_type": "markdown",
"id": "1c54b0ae",
"id": "a2038e04",
"metadata": {},
"source": [
"The communication and computation costs are still of the same order of magnitude even though we have increased the grain size. "
@ -789,7 +778,7 @@
},
{
"cell_type": "markdown",
"id": "63f5e59f",
"id": "71088fb9",
"metadata": {},
"source": [
"## Parallel algorithm 3\n",
@ -804,7 +793,7 @@
}
},
"cell_type": "markdown",
"id": "0c38af09",
"id": "f1b8c712",
"metadata": {},
"source": [
"<div>\n",
@ -814,7 +803,7 @@
},
{
"cell_type": "markdown",
"id": "db099c49",
"id": "4d456fed",
"metadata": {},
"source": [
"### Data dependencies"
@ -822,7 +811,7 @@
},
{
"cell_type": "markdown",
"id": "b5290fcf",
"id": "67b65ea6",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-success\">\n",
@ -833,7 +822,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ac898985",
"id": "2825385a",
"metadata": {},
"outputs": [],
"source": [
@ -842,12 +831,12 @@
},
{
"cell_type": "markdown",
"id": "4f8dbc8c",
"id": "429faa32",
"metadata": {},
"source": [
"### Implementation\n",
"\n",
"These are the main steps of the implementation of algorithm 2:\n",
"These are the main steps of the implementation of algorithm 3:\n",
"\n",
"1. The worker receives the corresponding rows A[rows,:] and matrix B from the master process\n",
"2. The worker computes the product of A[rows,:] times B\n",
@ -861,7 +850,7 @@
}
},
"cell_type": "markdown",
"id": "188ce727",
"id": "c14ebcb3",
"metadata": {},
"source": [
"<div>\n",
@ -871,7 +860,7 @@
},
{
"cell_type": "markdown",
"id": "e5e4eafa",
"id": "7d7952c1",
"metadata": {},
"source": [
"The implementation of this variant is left as an exercise (see below)."
@ -879,7 +868,7 @@
},
{
"cell_type": "markdown",
"id": "dce52b5b",
"id": "0323d6d8",
"metadata": {},
"source": [
"### Communication overhead\n",
@ -894,7 +883,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d6dbbf50",
"id": "50b8bf53",
"metadata": {},
"outputs": [],
"source": [
@ -903,7 +892,7 @@
},
{
"cell_type": "markdown",
"id": "ba449065",
"id": "3f0d99e6",
"metadata": {},
"source": [
"In this case, the ratio between communication and computation is O(P/N). If the matrix size N is much larger than the number of workers P, then the communication overhead O(P/N) is negligible. This opens the door to a scalable implementation."
@ -911,7 +900,7 @@
},
{
"cell_type": "markdown",
"id": "706cb6ea",
"id": "7bb65f2e",
"metadata": {},
"source": [
"## Summary\n",
@ -934,7 +923,7 @@
},
{
"cell_type": "markdown",
"id": "8a1048b3",
"id": "8b83e744",
"metadata": {},
"source": [
"## Exercises"
@ -942,7 +931,7 @@
},
{
"cell_type": "markdown",
"id": "3e3a0c49",
"id": "a628a1df",
"metadata": {},
"source": [
"### Implementation of algorithm 3\n",
@ -953,7 +942,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c2941c87",
"id": "8e50b923",
"metadata": {},
"outputs": [],
"source": [
@ -984,7 +973,7 @@
},
{
"cell_type": "markdown",
"id": "9810107f",
"id": "4506dcfb",
"metadata": {},
"source": [
"Use test-driven development to implement the algorithm. Use this test:"
@ -993,7 +982,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "d2df240f",
"id": "28cde36a",
"metadata": {},
"outputs": [],
"source": [
@ -1009,7 +998,7 @@
},
{
"cell_type": "markdown",
"id": "968d8237",
"id": "03952b0b",
"metadata": {},
"source": [
"Measure the performance of your implementation by running the next cell. Do you get close to the optimal speedup?"
@ -1018,7 +1007,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "c0d43911",
"id": "b3aa2b7c",
"metadata": {},
"outputs": [],
"source": [
@ -1055,21 +1044,10 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"id": "023b20d1",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"matmul_dist_1_v2! (generic function with 1 method)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"function matmul_dist_1_v2!(C, A, B)\n",
" m = size(C,1)\n",
@ -1107,21 +1085,10 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "c1d3595b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\u001b[32m\u001b[1mTest Passed\u001b[22m\u001b[39m"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"using Test\n",
"N = 50\n",
@ -1133,7 +1100,7 @@
},
{
"cell_type": "markdown",
"id": "b49ee366",
"id": "ab609c18",
"metadata": {},
"source": [
"Run the next cell to check the performance of this implementation. Note that we are far from the optimal speedup. Why? To answer this question, compute the theoretical communication-to-computation ratio for this implementation and reason about the result. Hint: the number of times a worker is spawned in this implementation is N^3/P on average."
@ -1141,20 +1108,10 @@
},
{
"cell_type": "code",
"execution_count": 18,
"id": "9a4c526c",
"execution_count": null,
"id": "d7d31710",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Speedup = 0.0012612337176382187\n",
"Optimal speedup = 4\n",
"Efficiency = 0.03153084294095547%\n"
]
}
],
"outputs": [],
"source": [
"N = 100\n",
"A = rand(N,N)\n",
@ -1172,7 +1129,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "4f2d0d9b",
"id": "cd31d955",
"metadata": {},
"outputs": [],
"source": []