Multi-Threading

    By default, Julia starts up with a single thread of execution. This can be verified by using the command Threads.nthreads():
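
    julia> Threads.nthreads()
    1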

    The number of execution threads is controlled either by using the -t/--threads command line argument or by using the JULIA_NUM_THREADS environment variable. When both are specified, then -t/--threads takes precedence.

    Julia 1.5

    The -t/--threads command line argument requires at least Julia 1.5. In older versions you must use the environment variable instead.

    Let's start Julia with 4 threads:

    $ julia --threads 4

    Let’s verify there are 4 threads at our disposal.

    julia> Threads.nthreads()
    4

    But we are currently on the master thread. To check, we use the function Threads.threadid():

    julia> Threads.threadid()
    1

    Note

    If you prefer to use the environment variable you can set it as follows in Bash (Linux/macOS):

    export JULIA_NUM_THREADS=4

    C shell on Linux/macOS, CMD on Windows:
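
    set JULIA_NUM_THREADS=4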

    PowerShell on Windows:

    $env:JULIA_NUM_THREADS=4

    Note that this must be done before starting Julia.

    The number of threads specified with -t/--threads is propagated to worker processes that are spawned using the -p/--procs or --machine-file command line options. For example, julia -p2 -t2 spawns 1 main process with 2 worker processes, and all three processes have 2 threads enabled. For more fine-grained control over worker threads use addprocs and pass -t/--threads as exeflags.
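
    For instance, a minimal sketch of the addprocs route (assuming the Distributed standard library is loaded; the flags are passed verbatim to each spawned julia process):

    julia> using Distributed

    julia> addprocs(2; exeflags=`--threads=2`)  # each worker starts with 2 threads
    2-element Vector{Int64}:
     2
     3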

    Data-race freedom

    You are entirely responsible for ensuring that your program is data-race free, and nothing promised here can be assumed if you do not observe that requirement. The observed results may be highly unintuitive.

    The best way to ensure this is to acquire a lock around any access to data that can be observed from multiple threads. For example, in most cases you should use the following code pattern:

    julia> lock(lk) do
               use(a)
           end

    julia> begin
               lock(lk)
               try
                   use(a)
               finally
                   unlock(lk)
               end
           end

    where lk is a lock (e.g. ReentrantLock()) and a is the data.

    Additionally, Julia is not memory safe in the presence of a data race. Be very careful about reading any data if another thread might write to it! Instead, always use the lock pattern above when changing data (such as assigning to a global or closure variable) accessed by other threads.

    Thread 1:
    global b = false
    global a = rand()
    global b = true

    Thread 2:
    while !b; end
    bad_read1(a) # it is NOT safe to access `a` here!

    Thread 3:
    while !@isdefined(a); end
    bad_read2(a) # it is NOT safe to access `a` here!
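
    A data-race-free version of the same handoff guards both the write and the read with one lock. This is only a minimal sketch (lk, a, b, and use are the illustrative names from above), and Thread 2 must still retry until it observes b == true:

    lk = ReentrantLock()

    Thread 1:
    lock(lk) do
        global a = rand()
        global b = true
    end

    Thread 2:
    lock(lk) do
        if b
            use(a) # safe: the lock orders this read after Thread 1's writes
        end
    end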

    Let’s work a simple example using our native threads. Let us create an array of zeros:

    julia> a = zeros(10)
    10-element Vector{Float64}:
     0.0
     0.0
     0.0
     0.0
     0.0
     0.0
     0.0
     0.0
     0.0
     0.0

    Let us operate on this array simultaneously using 4 threads. We’ll have each thread write its thread ID into each location.

    Julia supports parallel loops using the Threads.@threads macro. This macro is affixed in front of a for loop to indicate to Julia that the loop is a multi-threaded region:
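
    julia> Threads.@threads for i = 1:10
               a[i] = Threads.threadid()
           end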

    The iteration space is split among the threads, after which each thread writes its thread ID to its assigned locations:

    julia> a
    10-element Vector{Float64}:
     1.0
     1.0
     1.0
     2.0
     2.0
     2.0
     3.0
     3.0
     4.0
     4.0

    Note that Threads.@threads does not have an optional reduction parameter like @distributed.
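
    To compute a reduction with Threads.@threads, one common pattern (shown here as a sketch, not the only option) is to accumulate into a per-thread slot and combine the partial results afterwards:

    julia> s = zeros(Threads.nthreads());

    julia> Threads.@threads for i in 1:1000
               s[Threads.threadid()] += i
           end

    julia> sum(s)
    500500.0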

    Julia supports accessing and modifying values atomically, that is, in a thread-safe way to avoid race conditions. A value (which must be of a primitive type) can be wrapped as Threads.Atomic to indicate it must be accessed in this way. Here we can see an example:

    julia> i = Threads.Atomic{Int}(0);

    julia> ids = zeros(4);

    julia> old_is = zeros(4);

    julia> Threads.@threads for id in 1:4
               old_is[id] = Threads.atomic_add!(i, id)
               ids[id] = id
           end

    julia> old_is
    4-element Vector{Float64}:
     0.0
     1.0
     7.0
     3.0

    julia> ids
    4-element Vector{Float64}:
     1.0
     2.0
     3.0
     4.0

    Had we tried to do the addition without the atomic tag, we might have gotten the wrong answer due to a race condition. An example of what would happen if we didn’t avoid the race:

    julia> using Base.Threads

    julia> nthreads()
    4

    julia> acc = Ref(0)
    Base.RefValue{Int64}(0)

    julia> @threads for i in 1:1000
               acc[] += 1
           end

    julia> acc[]
    926

    julia> acc = Atomic{Int64}(0)
    Atomic{Int64}(0)

    julia> @threads for i in 1:1000
               atomic_add!(acc, 1)
           end

    julia> acc[]
    1000

    Not all primitive types can be wrapped in an Atomic tag. Supported types are Int8, Int16, Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64, UInt128, Float16, Float32, and Float64. Additionally, Int128 and UInt128 are not supported on AArch32 and ppc64le.
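
    For example, a Float64 can be wrapped and updated atomically just like an integer:

    julia> x = Threads.Atomic{Float64}(0.5)
    Base.Threads.Atomic{Float64}(0.5)

    julia> Threads.atomic_add!(x, 1.0) # returns the old value
    0.5

    julia> x[]
    1.5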

    When using multi-threading we have to be careful when using functions that are not pure, as we might get a wrong answer. For instance, functions that have a name ending with ! by convention modify their arguments and thus are not pure.
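
    For example, sort! mutates its argument in place, so two threads calling it on the same array would race:

    julia> v = [3, 1, 2];

    julia> sort!(v); # modifies `v` itself, not a pure function

    julia> v
    3-element Vector{Int64}:
     1
     2
     3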

    External libraries, such as those called via ccall, pose a problem for Julia’s task-based I/O mechanism. If a C library performs a blocking operation, that prevents the Julia scheduler from executing any other tasks until the call returns. (Exceptions are calls into custom C code that call back into Julia, which may then yield, or C code that calls jl_yield(), the C equivalent of yield().)

    The @threadcall macro provides a way to avoid stalling execution in such a scenario. It schedules a C function for execution in a separate thread. A threadpool with a default size of 4 is used for this. The size of the threadpool is controlled via environment variable UV_THREADPOOL_SIZE. While waiting for a free thread, and during function execution once a thread is available, the requesting task (on the main Julia event loop) yields to other tasks. Note that @threadcall does not return until the execution is complete. From a user point of view, it is therefore a blocking call like other Julia APIs.
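
    For example, on a POSIX system one could block in C for a second without stalling other Julia tasks (a sketch; usleep lives in the C library and is not available on Windows):

    julia> @threadcall(:usleep, Cint, (Cuint,), 1_000_000) # runs in a threadpool thread; other tasks keep running
    0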

    It is very important that the called function does not call back into Julia, as it will segfault.

    @threadcall may be removed/changed in future versions of Julia.

    At this time, most operations in the Julia runtime and standard libraries can be used in a thread-safe manner, if the user code is data-race free. However, in some areas work on stabilizing thread support is ongoing. Multi-threaded programming has many inherent difficulties, and if a program using threads exhibits unusual or undesirable behavior (e.g. crashes or mysterious results), thread interactions should typically be suspected first.

    There are a few specific limitations and warnings to be aware of when using threads in Julia:

    • Base collection types require manual locking if used simultaneously by multiple threads where at least one thread modifies the collection (common examples include push! on arrays, or inserting items into a Dict); see the sketch after this list.
    • After a task starts running on a certain thread (e.g. via @spawn), it will always be restarted on the same thread after blocking. In the future this limitation will be removed, and tasks will migrate between threads.
    • @threads currently uses a static schedule, using all threads and assigning equal iteration counts to each. In the future the default schedule is likely to change to be dynamic.
    • The schedule used by @spawn is nondeterministic and should not be relied on.
    • Compute-bound, non-memory-allocating tasks can prevent garbage collection from running in other threads that are allocating memory. In these cases it may be necessary to insert a manual call to GC.safepoint() to allow GC to run. This limitation will be removed in the future.
    • Avoid running top-level operations, e.g. include, or eval of type, method, and module definitions in parallel.
    • Be aware that finalizers registered by a library may break if threads are enabled. This may require some transitional work across the ecosystem before threading can be widely adopted with confidence. See the next section for further details.
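
    For example, the first limitation above can be handled with an explicit lock around each mutation of the shared collection (a minimal sketch):

    julia> results = Float64[];

    julia> lk = ReentrantLock();

    julia> Threads.@threads for i in 1:1000
               y = Float64(i)^2        # thread-local work needs no lock
               lock(lk) do
                   push!(results, y)   # mutating the shared Vector requires the lock
               end
           end

    julia> length(results)
    1000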

    Safe use of Finalizers

    Because finalizers can interrupt any code, they must be very careful in how they interact with any global state. Unfortunately, the main reason that finalizers are used is to update global state (a pure function is generally rather pointless as a finalizer). This leads us to a bit of a conundrum. There are a few approaches to dealing with this problem:

    1. When single-threaded, code could call the internal jl_gc_enable_finalizers C function to prevent finalizers from being scheduled inside a critical region. Internally, this is used inside some functions (such as our C locks) to prevent recursion when doing certain operations (incremental package loading, codegen, etc.). The combination of a lock and this flag can be used to make finalizers safe.

    2. A second strategy, employed by Base in a couple places, is to explicitly delay a finalizer until it may be able to acquire its lock non-recursively. The following example demonstrates how this strategy could be applied to Distributed.finalize_ref:

      function finalize_ref(r::AbstractRemoteRef)
          if r.where > 0 # Check if the finalizer is already run
              if islocked(client_refs) || !trylock(client_refs)
                  # delay finalizer for later if we aren't free to acquire the lock
                  finalizer(finalize_ref, r)
                  return nothing
              end
              try # `lock` should always be followed by `try`
                  if r.where > 0 # Must check again here
                      # Do actual cleanup here
                      r.where = 0
                  end
              finally
                  unlock(client_refs)
              end
          end
          nothing
      end
    3. A related third strategy is to use a yield-free queue. We don’t currently have a lock-free queue implemented in Base, but Base.InvasiveLinkedListSynchronized{T} is suitable. This can frequently be a good strategy to use for code with event loops. For example, this strategy is employed by Gtk.jl to manage lifetime ref-counting. In this approach, we don’t do any explicit work inside the finalizer, and instead add it to a queue to run at a safer time. In fact, Julia’s task scheduler already uses this, so defining the finalizer as x -> @spawn do_cleanup(x) is one example of this approach. Note however that this doesn’t control which thread it runs on, so do_cleanup would still need to acquire a lock. That doesn’t need to be true if you implement your own queue, as you can explicitly only drain that queue from your thread; a minimal sketch follows.
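
    For instance, a minimal sketch of the @spawn variant (Resource and do_cleanup are hypothetical names, not part of Base):

    mutable struct Resource
        handle::Int
    end

    # Hypothetical cleanup routine; if it touched shared state it would need a lock.
    do_cleanup(r) = println("closing handle ", r.handle)

    r = Resource(42)
    # The finalizer itself only enqueues work; the real cleanup runs later as a task.
    finalizer(x -> Threads.@spawn(do_cleanup(x)), r)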