{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "acfe5cc9", "metadata": {}, "source": [ "\n", "
\n", "\n", "
\n", "\n" ] }, { "cell_type": "markdown", "id": "ae2a0512", "metadata": {}, "source": [ "### Seminars Computational Science\n", "### Julia tutorial\n", "\n", "# Why is Julia fast?\n", "\n", "by Francesc Verdugo (VU Amsterdam)\n", "\n", "2022-11-24" ] }, { "cell_type": "markdown", "id": "e040636a", "metadata": {}, "source": [ "## Contents\n", "\n", "We will learn:\n", "\n", "- Basic Julia concepts:\n", "  - type inference\n", "  - JIT compilation\n", "- Some Julia syntax\n", "- Some useful Julia packages" ] }, { "cell_type": "markdown", "id": "bc1d9ce2", "metadata": {}, "source": [ "## Using Jupyter notebooks in Julia" ] }, { "cell_type": "markdown", "id": "ff199820", "metadata": {}, "source": [ "### Running a cell\n", "\n", "Click on a cell and press `Shift` + `Enter`." ] }, { "cell_type": "code", "execution_count": null, "id": "a8e4fba2", "metadata": {}, "outputs": [], "source": [ "1+3\n", "4*5" ] }, { "cell_type": "markdown", "id": "0e035ee9", "metadata": {}, "source": [ "### Cell order is important" ] }, { "cell_type": "code", "execution_count": null, "id": "187db91d", "metadata": {}, "outputs": [], "source": [ "foo() = \"Well done!\"" ] }, { "cell_type": "code", "execution_count": null, "id": "a29023dc", "metadata": {}, "outputs": [], "source": [ "foo()" ] }, { "cell_type": "markdown", "id": "0b788d3b", "metadata": {}, "source": [ "### Package and help modes" ] }, { "cell_type": "code", "execution_count": null, "id": "23e1803d", "metadata": {}, "outputs": [], "source": [ "] add BenchmarkTools DataFrames PyCall Conda Test" ] }, { "cell_type": "code", "execution_count": null, "id": "4353e6ec", "metadata": {}, "outputs": [], "source": [ "? print" ] }, { "cell_type": "markdown", "id": "44c03fde", "metadata": {}, "source": [ "## How fast is Julia code?" 
] }, { "cell_type": "markdown", "id": "6c088d9e", "metadata": {}, "source": [ "### Example\n", "\n", "Sum the entries of a given array $a = [a_1,a_2,...,a_n]$:\n", "\n", " $$s = \\sum_{i=1}^n a_i$$" ] }, { "cell_type": "markdown", "id": "34e3c7dd", "metadata": {}, "source": [ "### Hand-written sum function\n", "\n", "### Exercise\n", "Write a function that computes the sum of all elements in the array `a`. You can view the solution at the bottom of the notebook." ] }, { "cell_type": "code", "execution_count": null, "id": "a672edac", "metadata": { "code_folding": [] }, "outputs": [], "source": [ "# ✍️ Exercise 1\n", "function sum_hand(a)\n", "    # TODO\n", "end" ] }, { "cell_type": "markdown", "id": "0494a1b3", "metadata": {}, "source": [ "### Test-driven development\n", "Next, test your solution. You can use the `@test` macro, which is provided by the `Test` package." ] }, { "cell_type": "code", "execution_count": null, "id": "ebac7b9e", "metadata": {}, "outputs": [], "source": [ "using Test" ] }, { "cell_type": "code", "execution_count": null, "id": "aa5505d4", "metadata": {}, "outputs": [], "source": [ "a = rand(5)" ] }, { "cell_type": "code", "execution_count": null, "id": "fc11923c", "metadata": {}, "outputs": [], "source": [ "@test sum_hand(a) ≈ sum(a)" ] }, { "cell_type": "markdown", "id": "e23f3ddf", "metadata": {}, "source": [ "## Benchmarking\n", "To track the performance of your code, it is useful to time the execution of individual functions. In Julia, the most common way of measuring computation time is the `@time` macro. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "caff0307", "metadata": {}, "outputs": [], "source": [ "a = rand(10^7);" ] }, { "cell_type": "code", "execution_count": null, "id": "6d073236", "metadata": {}, "outputs": [], "source": [ "@time sum_hand(a)" ] }, { "cell_type": "markdown", "id": "c664522e", "metadata": {}, "source": [ "Note that `@time` also measures the compilation time of a function if it is the first call to that function. So make sure to run `@time` at least twice on a newly defined function and use a later measurement to get a more meaningful result. \n", "\n", "\n", "To benchmark our code, we need to run it several times. To do this, we can call our code in a for-loop and gather the runtimes using the Julia macro `@elapsed`. Like `@time`, it measures the runtime of an expression in seconds, but `@elapsed` discards the result of the computation and returns the elapsed time instead. " ] }, { "cell_type": "code", "execution_count": null, "id": "bba864c1", "metadata": {}, "outputs": [], "source": [ "@elapsed sum_hand(a)" ] }, { "cell_type": "markdown", "id": "70317d33", "metadata": {}, "source": [ "## BenchmarkTools\n", "\n", "The `BenchmarkTools` package provides useful macros for sampling runtimes automatically. " ] }, { "cell_type": "code", "execution_count": null, "id": "424ffe03", "metadata": {}, "outputs": [], "source": [ "using BenchmarkTools" ] }, { "cell_type": "markdown", "id": "079ef7d0", "metadata": {}, "source": [ "First of all, the `@benchmark` macro runs the code multiple times and reports detailed statistics: the minimum, maximum, mean, and median time, the number of samples taken, memory allocations, etc. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "b5bb9a12", "metadata": {}, "outputs": [], "source": [ "bch_sum_hand = @benchmark sum_hand($a)" ] }, { "cell_type": "markdown", "id": "c77d454b", "metadata": {}, "source": [ "For quick sanity checks, one can use the `@btime` macro, which is a convenience wrapper around `@benchmark`. It prints only the minimum execution time and the memory allocations, and returns the value of the expression. " ] }, { "cell_type": "code", "execution_count": null, "id": "fee5493d", "metadata": {}, "outputs": [], "source": [ "@btime sum_hand($a)" ] }, { "cell_type": "markdown", "id": "3a1a774c", "metadata": {}, "source": [ "Similar to the `@elapsed` macro, `@belapsed` from `BenchmarkTools` discards the return value of the function and instead returns the minimum runtime in seconds. " ] }, { "cell_type": "code", "execution_count": null, "id": "8e0cdb8c", "metadata": {}, "outputs": [], "source": [ "@belapsed sum_hand($a)" ] }, { "cell_type": "markdown", "id": "fb530b87", "metadata": {}, "source": [ "As opposed to `@time` and `@elapsed`, `@btime` and `@belapsed` run the code several times and report the minimum runtime, thus excluding compilation time from the measurement. 
" ] }, { "cell_type": "markdown", "id": "782be14f", "metadata": {}, "source": [ "### Built-in sum function" ] }, { "cell_type": "code", "execution_count": null, "id": "07df4939", "metadata": {}, "outputs": [], "source": [ "bch_sum = @benchmark sum($a)" ] }, { "cell_type": "markdown", "id": "18972fa7", "metadata": {}, "source": [ "### Hand-written sum in Python\n" ] }, { "cell_type": "code", "execution_count": null, "id": "038702c0", "metadata": {}, "outputs": [], "source": [ "using PyCall" ] }, { "cell_type": "code", "execution_count": null, "id": "8e1dcbfa", "metadata": {}, "outputs": [], "source": [ "py\"\"\"\n", "def sum_py_hand(A):\n", "    s = 0.0\n", "    for a in A:\n", "        s += a\n", "    return s\n", "\"\"\"\n", "sum_py_hand = py\"sum_py_hand\"" ] }, { "cell_type": "code", "execution_count": null, "id": "60ca4517", "metadata": {}, "outputs": [], "source": [ "@test sum(a) ≈ sum_py_hand(a)" ] }, { "cell_type": "code", "execution_count": null, "id": "2f009076", "metadata": {}, "outputs": [], "source": [ "bch_sum_py_hand = @benchmark sum_py_hand($a)" ] }, { "cell_type": "markdown", "id": "22d29afa", "metadata": {}, "source": [ "### NumPy sum " ] }, { "cell_type": "code", "execution_count": null, "id": "0f7ffb8e", "metadata": {}, "outputs": [], "source": [ "using Conda" ] }, { "cell_type": "code", "execution_count": null, "id": "e667fc80", "metadata": {}, "outputs": [], "source": [ "numpy = pyimport(\"numpy\")\n", "sum_numpy = numpy[\"sum\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "ee0361df", "metadata": {}, "outputs": [], "source": [ "@test sum_numpy(a) ≈ sum(a)" ] }, { "cell_type": "code", "execution_count": null, "id": "d2f4e9ea", "metadata": {}, "outputs": [], "source": [ "bch_sum_numpy = @benchmark sum_numpy($a)" ] }, { "cell_type": "markdown", "id": "d7c3cbd6", "metadata": {}, "source": [ "### Summary of the results\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5ce4c97a", "metadata": {}, "outputs": [], "source": [ "timings = 
[bch_sum_hand,bch_sum,bch_sum_py_hand,bch_sum_numpy]" ] }, { "cell_type": "code", "execution_count": null, "id": "9d5caabd", "metadata": {}, "outputs": [], "source": [ "methods = [\"sum_hand\",\"sum\",\"sum_py_hand\",\"sum_numpy\"]" ] }, { "cell_type": "code", "execution_count": null, "id": "8b7a3661", "metadata": {}, "outputs": [], "source": [ "using DataFrames" ] }, { "cell_type": "code", "execution_count": null, "id": "82b11610", "metadata": {}, "outputs": [], "source": [ "df = DataFrame(method=methods,time=timings)" ] }, { "cell_type": "markdown", "id": "f03c4281", "metadata": {}, "source": [ "### Improving the hand-written sum in Julia\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7696719c", "metadata": {}, "outputs": [], "source": [ "# ✍️ Exercise 3\n", "function sum_hand_fast(a)\n", "    s = 0.0\n", "    @simd for ai in a\n", "        s += ai\n", "    end\n", "    s\n", "end" ] }, { "cell_type": "code", "execution_count": null, "id": "0b461036", "metadata": {}, "outputs": [], "source": [ "@test sum_hand_fast(a) ≈ sum(a)" ] }, { "cell_type": "code", "execution_count": null, "id": "fc228cba", "metadata": {}, "outputs": [], "source": [ "@benchmark sum_hand_fast($a)" ] }, { "cell_type": "markdown", "id": "328f9128", "metadata": {}, "source": [ "## Conclusions so far\n", "\n", "- Julia code (e.g. for loops) is much faster than the equivalent Python code\n", "- Julia code can be as fast as optimized C code" ] }, { "cell_type": "markdown", "id": "f9ce5464", "metadata": {}, "source": [ "## Why is Julia fast?\n", "\n", "- Julia is a compiled language (like C, C++, Fortran)\n", "- Julia is JIT compiled (C, C++, Fortran are AOT compiled)\n", "- Type declarations are optional in Julia\n", "\n" ] }, { "cell_type": "markdown", "id": "aa5254e4", "metadata": {}, "source": [ "# Conclusion: Why we use Julia in this course\n", "\n", "- Julia code is fast (it can be as fast as C)\n", "- Julia is a high-level language with simpler syntax than C \n", "- Julia supports different parallel programming 
models\n", "\n", "We will look into the third point in a later section of this course. \n" ] }, { "cell_type": "markdown", "id": "d4efafe2", "metadata": {}, "source": [ "## Solutions to the exercises\n", "\n", "### Solution to Exercise 1" ] }, { "cell_type": "code", "execution_count": null, "id": "17f73c0f", "metadata": {}, "outputs": [], "source": [ "function sum_hand(a)\n", "    s = 0.0\n", "    for ai in a\n", "        s += ai\n", "    end\n", "    s\n", "end" ] }, { "cell_type": "markdown", "id": "d12d6400", "metadata": {}, "source": [ "### Solution to Exercise 2" ] }, { "cell_type": "code", "execution_count": null, "id": "e39094da", "metadata": {}, "outputs": [], "source": [ "using Statistics\n", "\n", "a = rand(10^7)\n", "num_it = 15\n", "runtimes = zeros(num_it)\n", "for i in 1:num_it\n", "    runtimes[i] = @elapsed sum_hand(a)\n", "end\n", "@show mean(runtimes)\n", "@show std(runtimes)\n", "@show minimum(runtimes)\n", "@show maximum(runtimes);" ] }, { "cell_type": "code", "execution_count": null, "id": "7943f1cc", "metadata": {}, "outputs": [], "source": [ "# Solution to Exercise 3\n", "function sum_hand_fast(a)\n", "    s = 0.0\n", "    @simd for ai in a\n", "        s += ai\n", "    end\n", "    s\n", "end" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.8.5", "language": "julia", "name": "julia-1.8" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }