{ "cells": [ { "cell_type": "markdown", "id": "287c5272", "metadata": {}, "source": [ "\n", "\n", "### Programming large-scale parallel systems\n" ] }, { "cell_type": "markdown", "id": "2133c064", "metadata": {}, "source": [ "# Distributed computing in Julia\n" ] }, { "cell_type": "markdown", "id": "a7b64d5a", "metadata": {}, "source": [ "## Contents\n", "\n", "In this notebook, we will learn the basics of distributed computing in Julia. In particular, we will focus on the Distributed module available in the Julia standard library. The main topics we are going to cover are:\n", "\n", "- How to create Julia processes\n", "- How to execute code remotely\n", "- How to send and receive data\n", "\n", "With this knowledge you will be able to implement simple and complex parallel algorithms in Julia." ] }, { "cell_type": "markdown", "id": "ec225103", "metadata": {}, "source": [ "
\n", "Note: Do not forget to execute the next cell before starting this notebook! \n", "
" ] }, { "cell_type": "code", "execution_count": null, "id": "aebf6482", "metadata": {}, "outputs": [], "source": [ "using Printf\n", "function answer_checker(answer,solution)\n", " if answer == solution\n", " \"🥳 Well done! \"\n", " else\n", " \"It's not correct. Keep trying! 💪\"\n", " end |> println\n", "end\n", "q_1_check(answer) = answer_checker(answer,\"a\")\n", "q_2_check(answer) = answer_checker(answer,\"b\")" ] }, { "cell_type": "markdown", "id": "01af032c", "metadata": {}, "source": [ "## How to create Julia processes\n", "\n", "First of all, we need several processes in order to run parallel algorithms *in parallel*. In this section, we discuss different ways to create new processes in Julia." ] }, { "cell_type": "markdown", "id": "036a25d7", "metadata": {}, "source": [ "### Adding processes locally\n", "\n", " The simplest way of creating processes for parallel computing is to add them locally in the current Julia session. This is done by using the following commands.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c6fed889", "metadata": {}, "outputs": [], "source": [ "using Distributed" ] }, { "cell_type": "code", "execution_count": null, "id": "d16faba9", "metadata": {}, "outputs": [], "source": [ "addprocs(3)" ] }, { "cell_type": "markdown", "id": "f07ac76c", "metadata": {}, "source": [ "Last cell created 3 new Julia processes. By default, they run locally in the same computer as the current Julia session, using multiple cores if possible. However, it is also possible to start the new processes in other machines as long as they are interconnected (more details on this later).\n", "\n", "\n", "
\n", "Tip: We can also start new processes when launching Julia from the command line by suing the `-p` command-line argument. E.g., `$ julia -p 3 ` would launch Julia with 3 extra processes.\n", "
\n" ] }, { "cell_type": "markdown", "id": "d2e507dc", "metadata": {}, "source": [ "### Each process runs a separated Julia instance\n", "\n", "When adding the new processes, you can imagine that 3 new Julia REPLs have started under the hood (see figure below). The main point of the Distributed module is to provide a way of coordinating all these Julia processes to run code in parallel. It is important to note that each process runs in a separated Julia instance. This means that each process has its own memory space and therefore they do not share memory. This results in distributed-memory parallelism, and allows one to run processes in different machines.\n", " \n", "
\n", "\n", "
\n", " \n", " " ] }, { "cell_type": "markdown", "id": "da2ae9aa", "metadata": {}, "source": [ "### Basic info about processes\n", "\n", "The following functions provide basic information about the underlying processes. If more than one process is available, the first process is called the *main* or *master* and the other the *workers*. If only a single process is available, it is the master and the first worker simultaneously." ] }, { "cell_type": "code", "execution_count": null, "id": "9214029b", "metadata": {}, "outputs": [], "source": [ "procs()" ] }, { "cell_type": "code", "execution_count": null, "id": "7a8498e6", "metadata": {}, "outputs": [], "source": [ "workers()" ] }, { "cell_type": "code", "execution_count": null, "id": "c6a8d8e4", "metadata": {}, "outputs": [], "source": [ "nprocs()" ] }, { "cell_type": "code", "execution_count": null, "id": "6f8b2888", "metadata": {}, "outputs": [], "source": [ "nworkers()" ] }, { "cell_type": "code", "execution_count": null, "id": "8583dee0", "metadata": {}, "outputs": [], "source": [ "myid()" ] }, { "cell_type": "code", "execution_count": null, "id": "17f1541f", "metadata": {}, "outputs": [], "source": [ "@everywhere println(myid())" ] }, { "cell_type": "markdown", "id": "44633d04", "metadata": {}, "source": [ "In previous cell, we have used the macro `@everywhere` that evaluates the given code on all processes. As a result, each process will print its own process id." ] }, { "cell_type": "markdown", "id": "2d104013", "metadata": {}, "source": [ "### Creating workers in other machines\n", "\n", "For large parallel computations, one typically needs to use different computers in parallel. Function `addprocs` also provides a low-level method to start workers in other machines. Next code example would create 3 workers in `server1` and 4 new workers in server `server2` (see figure below). Under the hood, Julia connects via ssh to the other machines and starts the new processes there. In order this to work, the local computer and the remote servers need to be properly configured (see the Julia manual for details). \n", "\n", "\n", "\n", "\n", "```julia\n", "using Distributed\n", "machines = [(\"user@server1\",3),(\"user@server2\",4)]\n", "addprocs(machines)\n", "```\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "3869f1f7", "metadata": {}, "source": [ "\n", "
\n", "\n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "6c09b2e5", "metadata": {}, "source": [ "### Adding workers with ClusterManagers.jl\n", "\n", "Previous way of starting workers in other machines is very low-level. Happily, there is a Julia package called [ClusterManagers.jl](https://github.com/JuliaParallel/ClusterManagers.jl) that helps to create workers remotely in number of usual scenarios. For instance, when running the following code from the login node in a computer cluster, it will submit a job to the cluster queue allocating 128 threads. A worker will be generated for each one of these threads. If the compute node have 64 cores, 2 compute nodes will be used to create to contain the 128 workers (see below).\n", "\n", "\n", "```julia\n", "using Distributed\n", "using ClusterManagers\n", "addprocs(SlurmManager(128), partition=\"debug\", t=\"00:5:00\")\n", "```\n" ] }, { "cell_type": "markdown", "id": "7c470ed9", "metadata": {}, "source": [ "\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "8c8bc619", "metadata": {}, "source": [ "## Executing code remotely\n", "\n", "We have added new processes to our Julia session. Let's start using them!" ] }, { "cell_type": "markdown", "id": "76a8bf7a", "metadata": {}, "source": [ "### Function `remotecall`\n", "\n", "The most basic thing we can do with a remote processor is to execute a given function on it. This is done by using function `remotecall`. To make clear how local and remote executions compare, let's call a function locally and then remotely. Next cell uses function `ones` to create a matrix locally.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e2d0803a", "metadata": {}, "outputs": [], "source": [ "a = ones(2,3)" ] }, { "cell_type": "markdown", "id": "a87fe170", "metadata": {}, "source": [ "The next cell does the same operation, but remotely on process 2. Note that `remotecall` takes the function we want to execute remotely, the process id where we want to execute it and, finally, the function arguments." ] }, { "cell_type": "code", "execution_count": null, "id": "78010eb4", "metadata": {}, "outputs": [], "source": [ "proc = 2\n", "ftr = remotecall(ones,proc,2,3)" ] }, { "cell_type": "markdown", "id": "b931f9c5", "metadata": {}, "source": [ "Note that `remotecall` does not return the result of the underlying function, but a `Future`. This object represents a reference to a task running on the remote process. To move a copy of the result to the current process we can use `fetch`." ] }, { "cell_type": "code", "execution_count": null, "id": "e8163967", "metadata": {}, "outputs": [], "source": [ "fetch(ftr)" ] }, { "cell_type": "markdown", "id": "613c4a20", "metadata": {}, "source": [ "### `remotecall` is asynchronous\n", "\n", "It is important to note that `remotecall` does not wait for the remote process to finish. It turns immediately. This can be checked be calling remotely the following function that sleeps for 10 secods and then generates a matrix." ] }, { "cell_type": "code", "execution_count": null, "id": "d918200b", "metadata": {}, "outputs": [], "source": [ "fun = (m,n) -> (sleep(10); ones(m,n))" ] }, { "cell_type": "markdown", "id": "d5fba3bc", "metadata": {}, "source": [ "When running next cell, it will return immediately, event though the remote process will sleep for 10 seconds. We can even run code in parallel. To try this execute the second next cell while the remote call is running in the worker." ] }, { "cell_type": "code", "execution_count": null, "id": "339e425b", "metadata": {}, "outputs": [], "source": [ "proc = 2\n", "ftr = remotecall(fun,proc,2,3)" ] }, { "cell_type": "code", "execution_count": null, "id": "8ff8dc16", "metadata": {}, "outputs": [], "source": [ "1+1" ] }, { "cell_type": "markdown", "id": "40f9d6c4", "metadata": {}, "source": [ "However, when fetching the result, the current process blocks waiting until the result is available in the remote process and arrives to its destination." ] }, { "cell_type": "code", "execution_count": null, "id": "76705f73", "metadata": {}, "outputs": [], "source": [ "fetch(ftr)" ] }, { "cell_type": "markdown", "id": "75cb8b37", "metadata": {}, "source": [ "### Useful macro: `@spawnat`\n", "\n", "You have provably realized that in order to use `remotecall` we have written auxiliary anonymous functions. They are needed to wrap the code we want to execute remotely. Writing these functions can be tedious. Happily, the macro `@spawnat` generates an auxiliary function from the given block of code and calls `remotecall` for us. For instance, the two following cells are equivalent." ] }, { "cell_type": "code", "execution_count": null, "id": "4b550925", "metadata": {}, "outputs": [], "source": [ "@spawnat proc ones(2,3)" ] }, { "cell_type": "code", "execution_count": null, "id": "2394cc7f", "metadata": {}, "outputs": [], "source": [ "fun = () -> ones(2,3)\n", "remotecall(fun,proc)" ] }, { "cell_type": "markdown", "id": "8631fd6f", "metadata": {}, "source": [ "### `@async` vs `@spawnat`\n", "\n", "The relation between `@async` and `@pawnat` is obvious. From the user perspective they work almost in the same way. However, `@async` generates a task that runs asynchronously in the current process, whereas `@spawnat` executes a task in a remote process in parallel. In both cases, the result is obtained using `fetch`. " ] }, { "cell_type": "code", "execution_count": null, "id": "7d681ed9", "metadata": {}, "outputs": [], "source": [ "tsk = @async begin\n", " sleep(3)\n", " zeros(2)\n", "end\n", "fetch(tsk)" ] }, { "cell_type": "code", "execution_count": null, "id": "c8141cff", "metadata": {}, "outputs": [], "source": [ "ftr = @spawnat :any begin\n", " sleep(3)\n", " zeros(2)\n", "end\n", "fetch(ftr)" ] }, { "cell_type": "markdown", "id": "10899cd4", "metadata": {}, "source": [ "### Another usefull macro: `@fetchfrom`\n", "\n", "Macro `@fetchfrom` is the blocking version of `@spawnat`. It blocks and returns the corresponding result instead of a `Future` object. " ] }, { "cell_type": "code", "execution_count": null, "id": "f17da98e", "metadata": { "scrolled": true }, "outputs": [], "source": [ "a = @fetchfrom proc begin\n", " sleep(3)\n", " zeros(2)\n", "end" ] }, { "cell_type": "markdown", "id": "e9d37079", "metadata": {}, "source": [ "## Data movement\n", "\n", "Data movement is a crucial part in distributed-memory computations and it is usually one of its main computational\n", "bottlenecks. Being aware of the data we are moving when using functions such as `remotecall` is important to write efficient distributed algorithms in Julia. Julia also provides a special type of channel, called remote channel, to send and receive data between processes." ] }, { "cell_type": "markdown", "id": "6d6e04d9", "metadata": {}, "source": [ "### Data movement in `remotecall` / `fetch`\n", "\n", "When usig `remotecall` we send to the remote process a function and its arguments. In this example, we send function name `+` and matrices `a` and `b` to proc 4. When fetching the result we receive a copy of the matrix from proc 4." ] }, { "cell_type": "code", "execution_count": null, "id": "fd901fcc", "metadata": {}, "outputs": [], "source": [ "proc = 4\n", "a = rand(10,10)\n", "b = rand(10,10)\n", "ftr = remotecall(+,proc,a,b)\n", "fetch(ftr);" ] }, { "cell_type": "markdown", "id": "eda62267", "metadata": {}, "source": [ "### Implicit data movement\n", "\n", "Be aware that data movements can be implicit. This usually happens when we execute remotely functions that capture variables. In the following example, we are also sending matrices `a` and `b` to proc 4, even though they do not appear as arguments in the remote call. These variables are captured by the anonymous function and will be sent to proc 4." ] }, { "cell_type": "code", "execution_count": null, "id": "fe9866ac", "metadata": {}, "outputs": [], "source": [ "proc = 4\n", "a = rand(10,10)\n", "b = rand(10,10)\n", "fun = () -> a+b\n", "ftr = remotecall(fun,proc)\n", "fetch(ftr);" ] }, { "cell_type": "markdown", "id": "3860df18", "metadata": {}, "source": [ "### Data movement with remote channels\n", "\n", "Another way of moving data between processes is to use remote channels. Their usage is very similar to conventional channels for moving data between tasks, but there are some important differences. In the next cell, we create a remote channel. Process 4 puts several values and closes the channel. Like for conventional channels, calls to `put!` are blocking, but next cell is not blocking the master process since the call to `put!` runs asynchronously on process 4." ] }, { "cell_type": "code", "execution_count": null, "id": "98260fd3", "metadata": {}, "outputs": [], "source": [ "fun = ()->Channel{Int}()\n", "chnl = RemoteChannel(fun)" ] }, { "cell_type": "code", "execution_count": null, "id": "0a729a00", "metadata": {}, "outputs": [], "source": [ "@spawnat 4 begin\n", " for i in 1:5\n", " put!(chnl,i)\n", " end\n", " close(chnl)\n", "end;" ] }, { "cell_type": "markdown", "id": "c2ce01e0", "metadata": {}, "source": [ "We can take values from the remote channel form any process using `take!`. Run next cell several times. The sixth time it should raise and error since the channel was closed." ] }, { "cell_type": "code", "execution_count": null, "id": "82fbbb88", "metadata": {}, "outputs": [], "source": [ "take!(chnl)" ] }, { "cell_type": "markdown", "id": "1b3f8dcc", "metadata": {}, "source": [ "### Remote channels can be buffered\n", "\n", "Just like conventional channels, remote channels can be buffered. The buffer is stored in the process that owns the remote channel. By default this corresponds to process that creates the remote channel, but it can be a different one. For instance, process 3 will be the owner in the following example." ] }, { "cell_type": "code", "execution_count": null, "id": "426274ac", "metadata": {}, "outputs": [], "source": [ "buffer_size = 2\n", "owner = 3\n", "fun = ()->Channel{Int}(buffer_size)\n", "chnl = RemoteChannel(fun,owner)" ] }, { "cell_type": "code", "execution_count": null, "id": "db3b6fd5", "metadata": {}, "outputs": [], "source": [ "@spawnat 4 begin\n", " println(\"start\")\n", " for i in 1:5\n", " put!(chnl,i)\n", " println(\"I have put $i\")\n", " end\n", " close(chnl)\n", " println(\"stop\")\n", "end;" ] }, { "cell_type": "markdown", "id": "7e85654e", "metadata": {}, "source": [ "Note that since the channel is buffered, worker 4 can start putting values into it before any call to `take!`. Run next cell several times until the channel is closed." ] }, { "cell_type": "code", "execution_count": null, "id": "a8500649", "metadata": {}, "outputs": [], "source": [ "take!(chnl)" ] }, { "cell_type": "markdown", "id": "b8b411f1", "metadata": {}, "source": [ "### Remote channels are not iterable\n", "\n", "One main difference with respect to conventional channels is that remote channels cannot be iterated. Let's repeat the example above." ] }, { "cell_type": "code", "execution_count": null, "id": "18c60a68", "metadata": {}, "outputs": [], "source": [ "fun = ()->Channel{Int}()\n", "chnl = RemoteChannel(fun)" ] }, { "cell_type": "code", "execution_count": null, "id": "73088996", "metadata": {}, "outputs": [], "source": [ "@spawnat 4 begin\n", " for i in 1:5\n", " put!(chnl,i)\n", " end\n", " close(chnl)\n", "end;" ] }, { "cell_type": "markdown", "id": "4bb35283", "metadata": {}, "source": [ "Now, try to iterate over the channel in a for loop. It will result in an error since channels are not iterable." ] }, { "cell_type": "code", "execution_count": null, "id": "1a9bcca0", "metadata": {}, "outputs": [], "source": [ "for j in chnl\n", " @show j\n", "end" ] }, { "cell_type": "markdown", "id": "8eaea2fa", "metadata": {}, "source": [ "If we want to take values form a remote channel and stop automatically when the channel is closed, we can combine a while loop and a try-catch statement. This works since `take!` raises an error if the channel is closed, which will execute the `catch` block and breaks the loop." ] }, { "cell_type": "code", "execution_count": null, "id": "72acd664", "metadata": {}, "outputs": [], "source": [ "while true\n", " try\n", " j = take!(chnl)\n", " @show j\n", " catch\n", " break\n", " end\n", "end" ] }, { "cell_type": "markdown", "id": "1a3986c9", "metadata": {}, "source": [ "## Questions" ] }, { "cell_type": "markdown", "id": "fd22b74b", "metadata": {}, "source": [ "\n", "
\n", "Question (Q1): How many integers are transferred between master and worker? Including both directions. \n", "
\n", "\n", "\n", "\n", "```julia\n", "a = rand(Int,4,4)\n", "proc = 4\n", "@fetchfrom proc sum(a^2)\n", "```\n", "\n", " a) 17\n", " b) 32\n", " c) 16^2\n", " d) 65" ] }, { "cell_type": "code", "execution_count": null, "id": "de4e32eb", "metadata": {}, "outputs": [], "source": [ "answer = \"x\" #Replace x with a, b, c, or d\n", "q_1_check(answer)" ] }, { "cell_type": "markdown", "id": "dbe373d1", "metadata": {}, "source": [ "
\n", "Question (Q2): How many integers are transferred between master and worker? Including both directions. \n", "
\n", "\n", "\n", "\n", "```julia\n", "a = rand(Int,4,4)\n", "proc = 4\n", "@fetchfrom proc sum(a[2,2]^2)\n", "```\n", "\n", " a) 2\n", " b) 17\n", " c) 5\n", " d) 32" ] }, { "cell_type": "code", "execution_count": null, "id": "a234ceaa", "metadata": {}, "outputs": [], "source": [ "answer = \"x\" #Replace x with a, b, c, or d\n", "q_2_check(answer)" ] }, { "cell_type": "markdown", "id": "c561a73d", "metadata": {}, "source": [ "\n", "
\n", "Question (Q3): Which value will be the value of `x` ? \n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b71a2171", "metadata": {}, "outputs": [], "source": [ "a = zeros(Int,3)\n", "proc = 3\n", "@sync @spawnat proc a[2] = 2\n", "x = a[2]\n", "x" ] }, { "cell_type": "markdown", "id": "835080aa", "metadata": {}, "source": [ "
\n", "Question (Q4): Which value will be the value of `x` ? \n", "
\n", "\n", "Which value will be the value of `x` ?" ] }, { "cell_type": "code", "execution_count": null, "id": "b8678fd1", "metadata": {}, "outputs": [], "source": [ "a = zeros(Int,3)\n", "proc = myid()\n", "@sync @spawnat proc a[2] = 2\n", "x = a[2]\n", "x" ] }, { "cell_type": "markdown", "id": "9e985c61", "metadata": {}, "source": [ "## Remember: each process runs in a separated Julia instance\n", "\n", "In particular, this means that each process can load different functions or packages. In consequence, it is important to make sure that the code we run is defined in the corresponding process.\n", "\n" ] }, { "cell_type": "markdown", "id": "cdc07cba", "metadata": {}, "source": [ "### Functions are defined in a single process\n", "\n", "This is a very common pitfall when running parallel code. If we define a function in a process, it is not automatically available in the other processes. This is illustrated in the next example. The remote call in the last line in next cell will fail since the function `sleep_ones` is only being defined in the local process. " ] }, { "cell_type": "code", "execution_count": null, "id": "4544ca4c", "metadata": {}, "outputs": [], "source": [ "function sleep_ones(m,n)\n", " sleep(4)\n", " ones(m,n)\n", "end" ] }, { "cell_type": "code", "execution_count": null, "id": "b54a1a84", "metadata": {}, "outputs": [], "source": [ "proc = 3\n", "remotecall_fetch(sleep_ones,proc,3,4)" ] }, { "cell_type": "markdown", "id": "9ec5d8fe", "metadata": {}, "source": [ "To fix this, we can define the function on all processes with the `@everywhere` macro." ] }, { "cell_type": "code", "execution_count": null, "id": "7eb1e210", "metadata": {}, "outputs": [], "source": [ "@everywhere function sleep_ones(m,n)\n", " sleep(4)\n", " ones(m,n)\n", "end" ] }, { "cell_type": "code", "execution_count": null, "id": "b55cbedf", "metadata": {}, "outputs": [], "source": [ "proc = 3\n", "remotecall_fetch(sleep_ones,proc,3,4)" ] }, { "cell_type": "markdown", "id": "9f71e7fd", "metadata": {}, "source": [ "### Anonymous functions are available everywhere\n", "\n", "If a function has a name, Julia only sends the function name to the corresponding process. Then, Julia looks for the corresponding function code in the remote process and executes it. This is why the function needs to be defined also in the remote process. However, if a function is anonymous, Julia needs to send the complete function definition to the remote process. This is why anonymous functions do not need to be defined with the macro `@everywhere` to work in a remote call.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d9734913", "metadata": {}, "outputs": [], "source": [ "fun = (m,n) -> (sleep(4);ones(m,n))" ] }, { "cell_type": "code", "execution_count": null, "id": "f08131c3", "metadata": {}, "outputs": [], "source": [ "proc = 3\n", "remotecall_fetch(fun,proc,3,4)" ] }, { "cell_type": "markdown", "id": "ebec4f5e", "metadata": {}, "source": [ "### Each proc uses packages independently\n", "\n", "When using a package in a process, it is not available in the other ones. For instance, if we load the `LinearAlgebra` package in the current process and use one of its exported functions in another process, we will get an error." ] }, { "cell_type": "code", "execution_count": null, "id": "e0cf3545", "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra" ] }, { "cell_type": "code", "execution_count": null, "id": "0498748c", "metadata": {}, "outputs": [], "source": [ "@fetchfrom 3 norm([1,2,3])" ] }, { "cell_type": "markdown", "id": "97462cbd", "metadata": {}, "source": [ "To fix this, we can load the package on all processors with the `@everywhere` macro." ] }, { "cell_type": "code", "execution_count": null, "id": "14ee3498", "metadata": {}, "outputs": [], "source": [ "@everywhere using LinearAlgebra" ] }, { "cell_type": "code", "execution_count": null, "id": "f8b3ba73", "metadata": {}, "outputs": [], "source": [ "@fetchfrom 3 norm([1,2,3])" ] }, { "cell_type": "markdown", "id": "3743ae18", "metadata": {}, "source": [ "### Each process has its own active package environment\n", "\n", "This is another very common source of errors. You can check that if you activate the current directory, this will have no effect in the other processes." ] }, { "cell_type": "code", "execution_count": null, "id": "abf3afe5", "metadata": {}, "outputs": [], "source": [ "] activate ." ] }, { "cell_type": "markdown", "id": "4d8bfa90", "metadata": {}, "source": [ "We have activated the current folder. Now let's see which is the active project in another process, say process 2. You will see that process 2 is provably still using the global package environment." ] }, { "cell_type": "code", "execution_count": null, "id": "3c6f29fb", "metadata": {}, "outputs": [], "source": [ "@everywhere using Pkg" ] }, { "cell_type": "code", "execution_count": null, "id": "ee04b3cc", "metadata": {}, "outputs": [], "source": [ "@spawnat 2 Pkg.status();" ] }, { "cell_type": "markdown", "id": "8244ae89", "metadata": {}, "source": [ "To fix this, you need to activate the current directory on all processes." ] }, { "cell_type": "code", "execution_count": null, "id": "99991a93", "metadata": {}, "outputs": [], "source": [ "@everywhere Pkg.activate(\".\")" ] }, { "cell_type": "code", "execution_count": null, "id": "dfc8ea20", "metadata": {}, "outputs": [], "source": [ "@spawnat 2 Pkg.status();" ] }, { "cell_type": "markdown", "id": "4bfbe073", "metadata": {}, "source": [ "## Easy ways of parallelizing code\n", "\n", "A part from the low-level parallel routines we have seen so-far, Julia also provides much more simple ways to parallelizing loops and maps." ] }, { "cell_type": "markdown", "id": "89e4b22b", "metadata": {}, "source": [ "### Useful macro: @distributed\n", "\n", "This macro is used when we want to perform a very large for loops made of independent small iterations. To illustrate this, let's consider again the function that computes $\\pi$ with Leibniz formula." ] }, { "cell_type": "code", "execution_count": null, "id": "f5e27d24", "metadata": {}, "outputs": [], "source": [ "function compute_Ï€(n)\n", " s = 1.0\n", " for i in 1:n\n", " s += (isodd(i) ? -1 : 1) / (i*2+1)\n", " end\n", " 4*s\n", "end" ] }, { "cell_type": "markdown", "id": "d0921cb9", "metadata": {}, "source": [ "Paralelizing this function might require some work with low-level functions like `remotecall`, but it is trivial using the macro `@distributed`. This macro runs the for loop using the available processes and optionally reduces the result using a given reduction function (`+` in this case)." ] }, { "cell_type": "code", "execution_count": null, "id": "d4efb990", "metadata": {}, "outputs": [], "source": [ "function compute_Ï€_dist(n)\n", " s = 1.0\n", " r = @distributed (+) for i in 1:n\n", " (isodd(i) ? -1 : 1) / (i*2+1)\n", " end\n", " 4*(s+r)\n", "end" ] }, { "cell_type": "markdown", "id": "8dbc2240", "metadata": {}, "source": [ "Run next cell to measure the performance of the serial function for a large value of `n`. Run it at least 2 times to get rid of compilation times." ] }, { "cell_type": "code", "execution_count": null, "id": "ec604251", "metadata": {}, "outputs": [], "source": [ "@time compute_Ï€(4_000_000_000)" ] }, { "cell_type": "markdown", "id": "c21b12b5", "metadata": {}, "source": [ "Run next cell to measure the performance of the parallel function." ] }, { "cell_type": "code", "execution_count": null, "id": "60fca9ae", "metadata": {}, "outputs": [], "source": [ "@time compute_Ï€_dist(4_000_000_000)" ] }, { "cell_type": "markdown", "id": "f996ec0b", "metadata": {}, "source": [ "### Useful function: `pmap`\n", "\n", "This function is used when we want to call a very expensive function a small number of evaluations and we want to distribute these evaluations over the available processes. To illustrate the usage of `pmap` consider the following example. Next cell generates sixty 30x30 matrices. The goal is to compute the singular value decomposition of all of them. This operation is known to be expensive for large matrices. Thus, this is a perfect scenario for `pmap`." ] }, { "cell_type": "code", "execution_count": null, "id": "3d137f35", "metadata": {}, "outputs": [], "source": [ "a = [ rand(300,300) for i in 1:60];" ] }, { "cell_type": "markdown", "id": "b95d3558", "metadata": {}, "source": [ "First, lets measure the serial performance" ] }, { "cell_type": "code", "execution_count": null, "id": "9f6e302b", "metadata": {}, "outputs": [], "source": [ "using LinearAlgebra" ] }, { "cell_type": "code", "execution_count": null, "id": "74f16257", "metadata": {}, "outputs": [], "source": [ "@time svd.(a);" ] }, { "cell_type": "markdown", "id": "25de96e4", "metadata": {}, "source": [ "If we use `pmap` instead of broadcast, the different calls to `svd` will be distributed over the available processes." ] }, { "cell_type": "code", "execution_count": null, "id": "4f9c5c2d", "metadata": {}, "outputs": [], "source": [ "@time pmap(svd,a);" ] }, { "cell_type": "markdown", "id": "ad00cf8f", "metadata": {}, "source": [ "## Summary\n", "\n", "We have seen the basics of distributed computing in Julia. The programming model is essentially an extension of tasks and channels to parallel computations on multiple machines. The low-level functions are `remotecall` and `RemoteChannel`, but there are other functions and macros like `pmap` and `@distributed` that simplify the implementation of parallel algorithms." ] }, { "cell_type": "code", "execution_count": null, "id": "49d094e4", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.9.0", "language": "julia", "name": "julia-1.9" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.9.0" } }, "nbformat": 4, "nbformat_minor": 5 }