XM_40017/notebooks/julia_intro.ipynb
2023-08-11 12:28:43 +02:00

747 lines
16 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "ae2a0512",
"metadata": {},
"source": [
"\n",
"# Is Julia fast?"
]
},
{
"cell_type": "markdown",
"id": "e040636a",
"metadata": {},
"source": [
"## Contents\n",
"\n",
"With this notebook, you will learn\n",
"\n",
"- Two basic julia concepts related performance:\n",
" - type-inference\n",
" - JIT compilation\n",
"- Some Julia syntax\n",
"- Some useful Julia packages"
]
},
{
"cell_type": "markdown",
"id": "beef4d5e",
"metadata": {},
"source": [
"## Using Jupyter notebooks in Julia\n",
"\n",
"We are going to use Jupyter notebooks in this and other lectures. You provably have worked with notebooks (in Python). If not, here are the basic concepts you need to know to follow the lessons.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"<b>Tip:</b> Did you know that Jupyter stands for Julia, Python and R?\n",
"</div>\n",
"\n",
"### How to start a Jupyter nootebook in Julia\n",
"\n",
"To run a Julia Jupyther notebook, open a Julia REPL and type\n",
"\n",
"```julia\n",
"julia> ]\n",
"pkg> add IJulia\n",
"julia> using IJulia\n",
"julia> notebook()\n",
"```\n",
"A new browser window will open. Navigate to the corresponding notebook and open it.\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
"<b>Warning:</b> Make sure that the notebook is using the same Julia version as the one you used to launch `IJulia`. If it is not the same, go to Kernel > Change Kernel and choose the right version.\n",
"</div>\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "7cdd5195",
"metadata": {},
"source": [
"### Running a cell\n",
"\n",
"To run a cell, click on a cell and press `Shift` + `Enter`. You can also use the \"Run\" button in the toolbar above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec0f9041",
"metadata": {},
"outputs": [],
"source": [
"1+3\n",
"4*5"
]
},
{
"cell_type": "markdown",
"id": "a8474251",
"metadata": {},
"source": [
"As you can see from the output of previous cell, the value of the last line is displayed. We can suppress the output with a semicolon. Try it. Execute next cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb3eaec4",
"metadata": {},
"outputs": [],
"source": [
"1+3\n",
"4*5;"
]
},
{
"cell_type": "markdown",
"id": "0823876a",
"metadata": {},
"source": [
"### Cell order is important\n",
"\n",
"Running the two cells below in reverse order won't work (try it). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc1da6cb",
"metadata": {},
"outputs": [],
"source": [
"foo() = \"Well done!\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ddea57d8",
"metadata": {},
"outputs": [],
"source": [
"foo()"
]
},
{
"cell_type": "markdown",
"id": "629189d9",
"metadata": {},
"source": [
"### REPL modes\n",
"\n",
"This is particular to Julia notebooks. You can use package, help, and shell mode just like in the Julia REPL."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "defa851f",
"metadata": {},
"outputs": [],
"source": [
"] add MPI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1e6fbe66",
"metadata": {},
"outputs": [],
"source": [
"? print"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "123c0f6a",
"metadata": {},
"outputs": [],
"source": [
"; ls"
]
},
{
"cell_type": "markdown",
"id": "44c03fde",
"metadata": {},
"source": [
"## How fast is Julia code?"
]
},
{
"cell_type": "markdown",
"id": "f1e716db",
"metadata": {},
"source": [
"NB. Most of the examples below are taken from the lecture by S.G. Johnson at MIT. See here:\n",
"https://github.com/mitmath/18S096/blob/master/lectures/lecture1/Boxes-and-registers.ipynb"
]
},
{
"cell_type": "markdown",
"id": "6c088d9e",
"metadata": {},
"source": [
"### Example\n",
"\n",
"Sum entries of a given array $a = [a_1,a_2,...,a_n]$\n",
"\n",
" $$s = \\sum_{i=1}^n a_i$$\n",
"\n",
"### "
]
},
{
"cell_type": "markdown",
"id": "34e3c7dd",
"metadata": {},
"source": [
"### Hand-written sum function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a672edac",
"metadata": {
"code_folding": []
},
"outputs": [],
"source": [
"function sum_hand(a)\n",
" s = zero(eltype(a))\n",
" for ai in a\n",
" s += ai\n",
" end\n",
" s\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "0494a1b3",
"metadata": {},
"source": [
"### Test it\n",
"\n",
"The Julia macro `@test` which is provided in the `Test` package is useful to write (unit) tests in Julia."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ebac7b9e",
"metadata": {},
"outputs": [],
"source": [
"using Test"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aa5505d4",
"metadata": {},
"outputs": [],
"source": [
"a = rand(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc11923c",
"metadata": {},
"outputs": [],
"source": [
"@test sum_hand(a) ≈ sum(a)"
]
},
{
"cell_type": "markdown",
"id": "e23f3ddf",
"metadata": {},
"source": [
"## Benchmarking\n",
"\n",
"In Julia, the most straight-forward way of measuring the computation time of a piece of code is with the macro `@time`. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "caff0307",
"metadata": {},
"outputs": [],
"source": [
"a = rand(10^7);"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d073236",
"metadata": {},
"outputs": [],
"source": [
"@time sum_hand(a)"
]
},
{
"cell_type": "markdown",
"id": "443ac8da",
"metadata": {},
"source": [
"Note that `@time` also measures the compile time of a function if it's the first call to that function. So make sure to run `@time` twice on a freshly compiled function in order to get a more meaningful result."
]
},
{
"cell_type": "markdown",
"id": "c664522e",
"metadata": {},
"source": [
"A part of getting rid of compilation time, one typically wants to measure the runtime several times and compute sole. To do this we can call our code in a for-loop and gather the runtimes using the Julia macro `@elapsed`. This measures the runtime of an expression in seconds, just as the `@time` macro, only `@elapsed` discards the result of the computation and returns the elapsed time instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bba864c1",
"metadata": {},
"outputs": [],
"source": [
"@elapsed sum_hand(a)"
]
},
{
"cell_type": "markdown",
"id": "70317d33",
"metadata": {},
"source": [
"## BenchmarkTools\n",
"\n",
"The `BenchmarkTools` extension package provides useful macros for sampling runtimes automatically. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "424ffe03",
"metadata": {},
"outputs": [],
"source": [
"using BenchmarkTools"
]
},
{
"cell_type": "markdown",
"id": "079ef7d0",
"metadata": {},
"source": [
"First of all, the `@benchmark` macro runs the code multiple times and gives out a lot of details: the minimum and maximum time, mean time, median time, number of samples taken, memory allocations, etc. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5bb9a12",
"metadata": {},
"outputs": [],
"source": [
"bch_sum_hand = @benchmark sum_hand($a)"
]
},
{
"cell_type": "markdown",
"id": "c77d454b",
"metadata": {},
"source": [
"For quick sanity checks, one can use the `@btime` macro, which is a convenience wrapper around `@benchmark`. It returns only the minimum execution time and memory allocations. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fee5493d",
"metadata": {},
"outputs": [],
"source": [
"@btime sum_hand($a)"
]
},
{
"cell_type": "markdown",
"id": "3a1a774c",
"metadata": {},
"source": [
"Similar to the `@elapsed` macro, `BenchmarkTool`'s `@belapsed` discards the return value of the function and instead returns the minimum runtime in seconds. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e0cdb8c",
"metadata": {},
"outputs": [],
"source": [
"@belapsed sum_hand($a)"
]
},
{
"cell_type": "markdown",
"id": "fb530b87",
"metadata": {},
"source": [
"As opposed to `@time` and `@elapsed`, `@btime` and `@belapsed` run the code several times and return the minimum runtime, thus eliminating possible compilation times from the measurement. "
]
},
{
"cell_type": "markdown",
"id": "782be14f",
"metadata": {},
"source": [
"### Built-in sum function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "07df4939",
"metadata": {},
"outputs": [],
"source": [
"bch_sum = @benchmark sum($a)"
]
},
{
"cell_type": "markdown",
"id": "18972fa7",
"metadata": {},
"source": [
"### Hand-written sum in Python\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "038702c0",
"metadata": {},
"outputs": [],
"source": [
"using PyCall"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e1dcbfa",
"metadata": {},
"outputs": [],
"source": [
"py\"\"\"\n",
"def sum_py_hand(A):\n",
" s = 0.0\n",
" for a in A:\n",
" s += a\n",
" return s\n",
"\"\"\"\n",
"sum_py_hand = py\"sum_py_hand\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60ca4517",
"metadata": {},
"outputs": [],
"source": [
"@test sum(a) ≈ sum_py_hand(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2f009076",
"metadata": {},
"outputs": [],
"source": [
"bch_sum_py_hand = @benchmark sum_py_hand($a)"
]
},
{
"cell_type": "markdown",
"id": "22d29afa",
"metadata": {},
"source": [
"### Numpy sum "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f7ffb8e",
"metadata": {},
"outputs": [],
"source": [
"using Conda"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e667fc80",
"metadata": {},
"outputs": [],
"source": [
"numpy = pyimport(\"numpy\")\n",
"sum_numpy = numpy[\"sum\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee0361df",
"metadata": {},
"outputs": [],
"source": [
"@test sum_numpy(a) ≈ sum(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d2f4e9ea",
"metadata": {},
"outputs": [],
"source": [
"bch_sum_numpy = @benchmark sum_numpy($a)"
]
},
{
"cell_type": "markdown",
"id": "d7c3cbd6",
"metadata": {},
"source": [
"### Sumary of the results\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ce4c97a",
"metadata": {},
"outputs": [],
"source": [
"timings = [bch_sum_hand,bch_sum,bch_sum_py_hand,bch_sum_numpy]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d5caabd",
"metadata": {},
"outputs": [],
"source": [
"methods = [\"sum_hand\",\"sum\",\"sum_py_hand\",\"sum_numpy\"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b7a3661",
"metadata": {},
"outputs": [],
"source": [
"using DataFrames"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82b11610",
"metadata": {},
"outputs": [],
"source": [
"df = DataFrame(method=methods,time=timings)"
]
},
{
"cell_type": "markdown",
"id": "f03c4281",
"metadata": {},
"source": [
"### Improving the hand-written sum in Julia\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7696719c",
"metadata": {},
"outputs": [],
"source": [
"# ✍️ Exercise 3\n",
"function sum_hand_fast(a)\n",
" s = 0.0\n",
" @simd for ai in a\n",
" s += ai\n",
" end\n",
" s\n",
"end"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b461036",
"metadata": {},
"outputs": [],
"source": [
"@test sum_hand_fast(a) ≈ sum(a)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc228cba",
"metadata": {},
"outputs": [],
"source": [
"@benchmark sum_hand_fast($a)"
]
},
{
"cell_type": "markdown",
"id": "328f9128",
"metadata": {},
"source": [
"## Conlcusions so far\n",
"\n",
"- Julia code (for loops) are much faster than in Python\n",
"- Julia code can be as fast as optimized C code"
]
},
{
"cell_type": "markdown",
"id": "f9ce5464",
"metadata": {},
"source": [
"## Why Julia is fast?\n",
"\n",
"- Julia is a compiled language (like C, C++, Fortran)\n",
"- Julia is JIT compiled (C, C++, Fortran are AOT compiled)\n",
"- Type declarations are optional in Julia\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "aa5254e4",
"metadata": {},
"source": [
"# Conclusion: Why we use Julia in this course\n",
"\n",
"- Julia code is fast (it can be as fast as C)\n",
"- Julia is a high-level language with simpler syntax than C \n",
"- Julia supports different parallel programming models\n",
"\n",
"We will look into the third point in a later section of this course. \n"
]
},
{
"cell_type": "markdown",
"id": "d4efafe2",
"metadata": {},
"source": [
"## Solution to the exercises\n",
"\n",
"### Solution to Exercise 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17f73c0f",
"metadata": {},
"outputs": [],
"source": [
"function sum_hand(a)\n",
" s = 0.0\n",
" for ai in a\n",
" s += ai\n",
" end\n",
" s\n",
"end"
]
},
{
"cell_type": "markdown",
"id": "d12d6400",
"metadata": {},
"source": [
"### Solution to Exercise 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e39094da",
"metadata": {},
"outputs": [],
"source": [
"using Statistics\n",
"\n",
"a = rand(10^7)\n",
"num_it = 15\n",
"runtimes = zeros(num_it)\n",
"for i in 1:num_it\n",
" runtimes[i] = @elapsed sum_hand(a)\n",
"end\n",
"@show mean(runtimes) \n",
"@show std(runtimes)\n",
"@show minimum(runtimes)\n",
"@show maximum(runtimes);"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7943f1cc",
"metadata": {},
"outputs": [],
"source": [
"# ✍️ Exercise 3\n",
"function sum_hand_fast(a)\n",
" s = 0.0\n",
" @simd for ai in a\n",
" s += ai\n",
" end\n",
" s\n",
"end"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Julia 1.9.0",
"language": "julia",
"name": "julia-1.9"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "1.9.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}