Synoema

Thread-based Concurrency

spawn · scope · pmap · race · gather · channels

Synoema has two concurrency layers. This page covers the thread-based layer: real OS threads, deep-copy message passing, and parallel map. For the async/await layer (stackless state machines, IO reactor, async TCP), see /async.

Numbers on this page are measured, not estimated. pmap speedup on 10 cores (PM-4 stress test): 2.04× in JIT (list size 1024, 400-iter per-element loop) and 3.53× in the interpreter (list size 128, 20-iter per-element loop). Source: docs/benchmarks.md §Concurrency.


1. spawn — fire-and-forget OS thread

spawn starts a new OS thread and returns immediately. The caller does not wait for the spawned thread to finish. Use it for background tasks that communicate results via a channel.

| Primitive | Type | Description |
| --- | --- | --- |
| spawn f | (Unit -> a) -> Unit | Start f () in a new OS thread; caller continues immediately |

-- Fire-and-forget: print from a background thread
main = scope {
  spawn (\_ -> print "hello from background")
  print "hello from main"
  -- Both lines print; order is not guaranteed
}

spawn is only valid inside a scope block. All spawned threads are joined when the scope closes, so values computed in a scope can safely outlive the threads that produced them.

spawn with channels

-- worker sends a triangular number over a channel
tri n acc = ? n == 0 -> acc : tri (n - 1) (acc + n)

worker ch n = send ch (tri n 0)

main = scope {
  ch = chan
  spawn (worker ch 15)   -- tri 15 = 120
  spawn (worker ch 16)   -- tri 16 = 136
  a = recv ch
  b = recv ch
  a + b    -- → 256
}

Channels are the primary way to return values from spawned threads. chan creates an unbounded MPSC channel; send ch v enqueues a value; recv ch blocks the caller until a value is available.
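
As a rough analogy outside Synoema, the same worker pattern can be sketched in Rust, whose std::sync::mpsc channel is also an unbounded multi-producer, single-consumer queue. This is not the Synoema runtime's code; the function names tri and sum_of_tri are illustrative, and the explicit sender clone is a Rust detail with no counterpart in the examples above.

```rust
use std::sync::mpsc;
use std::thread;

/// Triangular number: 1 + 2 + ... + n.
fn tri(n: u64) -> u64 {
    (1..=n).sum()
}

/// Starts one worker per input. Each worker sends `tri(n)` on a shared
/// unbounded channel; the caller receives one value per worker and adds them.
fn sum_of_tri(inputs: &[u64]) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    thread::scope(|s| {
        for &n in inputs {
            let tx = tx.clone(); // each producer owns its own sender handle
            s.spawn(move || tx.send(tri(n)).expect("receiver is still alive"));
        }
        // Values arrive in completion order, which is fine for a sum.
        let mut total = 0;
        for _ in inputs {
            total += rx.recv().expect("one value per worker");
        }
        total
    })
}
```

Calling sum_of_tri(&[15, 16]) yields 256, matching the two-worker example above. Because arrival order is not guaranteed, an aggregation that depends on order would need to tag each message with its source.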

2. scope — scoped parallel block

scope { ... } creates a structured concurrency region. All spawn calls inside the block are joined before scope returns. The result of the scope block is the value of the last expression.

| Primitive | Type | Description |
| --- | --- | --- |
| scope { ... } | a | Structured block; all spawned threads are joined on exit; evaluates to the last expression |

-- scope_parallel.sno — 4 parallel workers, channel-aggregated sum
-- tri 15 = 120, tri 16 = 136, tri 31 = 496, tri 32 = 528
-- Expected: 120 + 136 + 496 + 528 = 1280

tri n acc = ? n == 0 -> acc : tri (n - 1) (acc + n)

worker ch n = send ch (tri n 0)

main = scope {
  ch = chan
  spawn (worker ch 15)
  spawn (worker ch 16)
  spawn (worker ch 31)
  spawn (worker ch 32)
  a = recv ch
  b = recv ch
  c = recv ch
  d = recv ch
  a + b + c + d
}
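
The two guarantees of scope (every spawned thread is joined before the block returns, and the block evaluates to its last expression) resemble Rust's std::thread::scope. The sketch below is an analogy under that assumption, not a description of how Synoema implements scope. The name run_scoped is hypothetical, and the shared atomic counter exists only to make the join observable; Synoema user code has no atomics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Spawns `n` scoped threads that each bump a counter, then returns the
/// counter value seen after the scope together with the scope's own value.
fn run_scoped(n: usize) -> (usize, &'static str) {
    let finished = AtomicUsize::new(0);
    let block_value = thread::scope(|s| {
        for _ in 0..n {
            // The closure borrows `finished`; the scope guarantees the
            // thread has ended before that borrow can dangle.
            s.spawn(|| {
                finished.fetch_add(1, Ordering::SeqCst);
            });
        }
        "last expression" // becomes the value of the whole scope
    }); // every spawned thread has been joined by this point
    (finished.load(Ordering::SeqCst), block_value)
}
```

run_scoped(4) returns (4, "last expression"): the count is always complete because the read happens only after the implicit join.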

Deep-copy across thread boundaries

When a value crosses a thread boundary (via send, or captured in a spawned closure), it is deep-copied. There is no shared mutable state between threads. Each thread owns its data independently. This eliminates data races by design — no locks, no reference counting across arenas.

Internally each thread gets its own bump-allocation arena. When send transmits a value, the runtime walks the value graph and copies every node into a fresh arena before the receiving thread touches it. This is the same Perceus-based RC strategy used for WASM heap management.
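
To make the ownership consequence concrete, here is a small Rust sketch in which an explicit clone stands in for the runtime's automatic deep copy. It does not model arenas or the value-graph walk, and send_copy is a hypothetical name. The point it shows is only that the receiver can mutate what it received without affecting the sender's value.

```rust
use std::sync::mpsc;
use std::thread;

/// Sends a copy of `xs` to a worker, which multiplies its copy by 10 and
/// sends it back. Returns (sender's original, worker's result).
fn send_copy(xs: Vec<i64>) -> (Vec<i64>, Vec<i64>) {
    let (to_worker, from_main) = mpsc::channel::<Vec<i64>>();
    let (to_main, from_worker) = mpsc::channel::<Vec<i64>>();
    thread::scope(|s| {
        s.spawn(move || {
            let mut mine = from_main.recv().expect("main sends one value");
            mine.iter_mut().for_each(|x| *x *= 10); // touches only this copy
            to_main.send(mine).expect("main is still receiving");
        });
        // The explicit clone plays the role of the automatic deep copy.
        to_worker.send(xs.clone()).expect("worker is still receiving");
    });
    let received = from_worker.recv().expect("worker sent its result");
    (xs, received)
}
```

send_copy(vec![1, 2, 3]) returns the untouched original alongside the worker's modified copy, which is the independence property the paragraph above describes.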

3. pmap — parallel map

pmap applies a function to each element of a list in parallel across all available cores. It is a drop-in replacement for map when the per-element work is CPU-bound and the list is long enough to amortize thread overhead.

| Primitive | Type | Description |
| --- | --- | --- |
| pmap f xs | (a -> b) -> [a] -> [b] | Apply f to each element of xs in parallel; return results in original order |

-- parallel_compute.sno — CPU-bound pmap example
-- heavy x: sum_{i=1..400} (i*i - i) — result 21333200 independent of x
-- list [1..129] — 129 elements, well above PMAP_PAR_THRESHOLD=64
-- expected sum: 129 * 21333200 = 2751982800

heavy n acc i = ? i == 0 -> acc : heavy n (acc + (i * i - i)) (i - 1)

main = sum (pmap (\x -> heavy x 0 400) [1..129])

Measured speedup

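| Backend | List size | Per-element work | Cores | map (sequential) | pmap (parallel) | Speedup |
| --- | --- | --- | --- | --- | --- | --- |
| JIT | 1024 | 400-iter heavy loop | 10 | 120 ms | 60 ms | 2.04× |
| Interpreter | 128 | 20-iter heavy loop | 10 | 2055 ms | 582 ms | 3.53× |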

Source: PM-4 stress test in docs/benchmarks.md §Concurrency.

Threshold and nested pmap

  • PMAP_PAR_THRESHOLD = 64 — lists shorter than 64 elements run sequentially. pmap on a short list is identical in result to map but carries a small closure-cloning overhead. Use map for short lists (PM-5).
  • Nested pmap falls back to sequential — if a function passed to pmap itself calls pmap, the inner call runs sequentially. This prevents unbounded thread explosion (PM-8).
  • Workers: std::thread::Builder with 64 MB stacks across available_parallelism() cores. Worker count is determined once at program start and is not configurable at runtime.
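
The three rules above (sequential below the threshold, sequential when nested, otherwise one batch of work per worker) can be sketched in Rust as follows. This is a simplified illustration, not the runtime's implementation: the chunking strategy and the thread-local nesting flag are assumptions, PAR_THRESHOLD merely mirrors the documented value of 64, the element type is fixed to u64, and the 64 MB worker stacks are not modeled.

```rust
use std::cell::Cell;
use std::thread;

/// Mirrors the documented PMAP_PAR_THRESHOLD (assumed name for this sketch).
const PAR_THRESHOLD: usize = 64;

thread_local! {
    /// True while the current thread is running as a pmap worker.
    static IN_PMAP_WORKER: Cell<bool> = Cell::new(false);
}

/// Order-preserving parallel map over a slice of u64.
fn pmap<F>(f: F, xs: &[u64]) -> Vec<u64>
where
    F: Fn(u64) -> u64 + Sync,
{
    let nested = IN_PMAP_WORKER.with(|flag| flag.get());
    if xs.len() < PAR_THRESHOLD || nested {
        // Short list or nested call: plain sequential map.
        return xs.iter().map(|&x| f(x)).collect();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_len = (xs.len() + workers - 1) / workers; // ceiling division
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = xs
            .chunks(chunk_len)
            .map(|chunk| {
                s.spawn(move || {
                    IN_PMAP_WORKER.with(|flag| flag.set(true));
                    chunk.iter().map(|&x| f(x)).collect::<Vec<u64>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Because each worker handles a contiguous chunk and chunks are joined in spawn order, the output order matches the input regardless of which worker finishes first.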

4. race and gather

race and gather are task combinators that run a list of async tasks in parallel. They are part of the async layer (Phase H1) but serve as natural complements to the thread-based primitives.

| Combinator | Type | Semantics |
| --- | --- | --- |
| race tasks | [Task a] -> Task a | Run all tasks in parallel; return the first to complete (others are dropped) |
| gather tasks | [Task a] -> Task [a] | Run all tasks in parallel; return a list of results in original order |

-- race and gather in action
-- Run: sno jit examples/async_race.sno

async fn fast_task = 1
async fn slow_task = _ = await (async_sleep 200); 2

async fn main =
  -- race: first to complete wins; slow_task result is discarded
  winner = await (race [slow_task fast_task])

  -- gather: both finish, results collected in order
  both = await (gather [fast_task fast_task])

  winner   -- → 1

race uses an AtomicBool winner flag so only the first completion is propagated. gather uses an AtomicUsize completion counter and collects results in original index order. Each sub-task gets its own OS thread. race and gather require async fn / await — see /async for the full async stack.
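
A minimal Rust sketch of the winner-flag idea follows, assuming one OS thread per task as described above. It is illustrative only: Task, sleepy, race, and gather here are hypothetical stand-ins with results fixed to u32, and this gather detects completion by channel closure rather than the AtomicUsize counter the runtime is described as using.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// A boxed unit of work; stands in for `Task a` with `a = u32`.
type Task = Box<dyn FnOnce() -> u32 + Send>;

/// Hypothetical helper: a task that sleeps for `ms`, then yields `value`.
fn sleepy(ms: u64, value: u32) -> Task {
    Box::new(move || {
        thread::sleep(Duration::from_millis(ms));
        value
    })
}

/// First task to finish wins; later results are dropped.
fn race(tasks: Vec<Task>) -> u32 {
    let won = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    for task in tasks {
        let (won, tx) = (Arc::clone(&won), tx.clone());
        thread::spawn(move || {
            let value = task();
            // Only the thread that flips the flag from false to true publishes.
            if won
                .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                let _ = tx.send(value);
            }
        });
    }
    drop(tx);
    rx.recv().expect("race needs at least one task")
}

/// Runs every task and returns results in the original list order.
fn gather(tasks: Vec<Task>) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    let count = tasks.len();
    for (index, task) in tasks.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            let _ = tx.send((index, task()));
        });
    }
    drop(tx);
    let mut results = vec![0; count];
    // The loop ends once every worker has dropped its sender.
    for (index, value) in rx {
        results[index] = value;
    }
    results
}
```

The compare_exchange call is what makes the flag a winner flag: exactly one thread observes the false-to-true transition, so at most one result is ever published. Tagging each gather result with its index is what restores list order when tasks finish out of order.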

5. Channels

Channels are the primary synchronization primitive. Values sent over a channel are deep-copied, so the sender retains ownership of its copy and the receiver gets a fresh independent copy.

| Primitive | Type | Description |
| --- | --- | --- |
| chan | Chan a | Create an unbounded channel |
| bounded_chan n | Int -> Chan a | Create a bounded channel with capacity n; send blocks when full |
| send ch v | Chan a -> a -> Unit | Send value v (deep-copied); non-blocking on unbounded, blocking on bounded when full |
| recv ch | Chan a -> a | Receive next value; blocks until one is available |
| try_send ch v | Chan a -> a -> Bool | Non-blocking send; returns false if the bounded channel is full |
| try_recv ch | Chan a -> Maybe a | Non-blocking receive; returns Nothing if the channel is empty |
| recv_timeout ms ch | Int -> Chan a -> Maybe a | Receive with timeout in milliseconds; returns Nothing on timeout |
| select chans | [Chan a] -> a | Block until any channel in the list has a value; return the first received value |

-- bounded producer/consumer
main = scope {
  ch = bounded_chan 4     -- capacity 4; send blocks when full
  spawn (\_ ->
    send ch 1
    send ch 2
    send ch 3
    send ch 4
    send ch 5)            -- 5th send blocks until consumer recvs
  a = recv ch
  b = recv ch
  c = recv ch
  d = recv ch
  e = recv ch
  a + b + c + d + e       -- → 15
}
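
For comparison, Rust's standard library offers a close analogue of the bounded and non-blocking operations: sync_channel for bounded_chan, and try_send, try_recv, and recv_timeout under the same names. The sketch below uses those real std APIs to show the backpressure behaviour; the function names bounded_sum and probe are illustrative, and nothing here is Synoema runtime code.

```rust
use std::sync::mpsc::{self, TrySendError};
use std::thread;
use std::time::Duration;

/// Producer sends 1..=5 through a capacity-4 channel; consumer sums them.
fn bounded_sum() -> i32 {
    let (tx, rx) = mpsc::sync_channel::<i32>(4);
    thread::scope(|s| {
        s.spawn(move || {
            for v in 1..=5 {
                // Blocks whenever four values are already buffered.
                tx.send(v).expect("consumer is still alive");
            }
        });
        (0..5)
            .map(|_| rx.recv().expect("five values are sent"))
            .sum::<i32>()
    })
}

/// Exercises the non-blocking and timed operations on a capacity-1 channel.
/// Returns (first send accepted, second send rejected as full,
/// value from try_recv, value from recv_timeout on an empty channel).
fn probe() -> (bool, bool, Option<i32>, Option<i32>) {
    let (tx, rx) = mpsc::sync_channel::<i32>(1);
    let accepted = tx.try_send(1).is_ok();
    let rejected = matches!(tx.try_send(2), Err(TrySendError::Full(_)));
    let received = rx.try_recv().ok();
    let after_timeout = rx.recv_timeout(Duration::from_millis(10)).ok();
    (accepted, rejected, received, after_timeout)
}
```

bounded_sum() returns 15, and probe() returns (true, true, Some(1), None): the second try_send fails because the single slot is occupied, and the timed receive gives up because the only buffered value was already taken.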
-- select: fan-in from multiple channels
main = scope {
  ch1 = chan
  ch2 = chan
  spawn (\_ -> send ch1 "from ch1")
  spawn (\_ -> send ch2 "from ch2")
  first = select [ch1 ch2]   -- whichever arrives first
  first
}
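
Rust's standard library has no built-in select, so the following sketch approximates the fan-in by polling each receiver in turn. This is one possible way to get "first value available across several channels" and is an assumption about behaviour, not how Synoema implements select. The name select_first is hypothetical, and unlike the select primitive above it returns None if every channel is closed and empty instead of blocking forever.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Returns the first value found on any receiver, or `None` once every
/// channel is both empty and disconnected.
fn select_first<T>(inputs: &[Receiver<T>]) -> Option<T> {
    loop {
        let mut open = 0;
        for rx in inputs {
            match rx.try_recv() {
                Ok(value) => return Some(value),
                Err(TryRecvError::Empty) => open += 1, // sender still alive
                Err(TryRecvError::Disconnected) => {}  // nothing more will come
            }
        }
        if open == 0 {
            return None;
        }
        // Back off briefly instead of spinning at full speed.
        thread::sleep(Duration::from_millis(1));
    }
}
```

Polling never consumes a value it does not return, so messages on the other channels stay queued for a later call. The cost is latency of up to the sleep interval and some wasted wake-ups, which a runtime-level select can avoid.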

6. Deep-copy semantics

Synoema's concurrency model has no shared mutable state. Every value that crosses a thread boundary is deep-copied. This is a deliberate design choice that enables the compiler to guarantee data-race freedom without requiring locks, atomic types, or borrow checking.

-- Deep-copy example: sender retains its own copy
main = scope {
  ch = chan
  xs = [1 2 3 4 5]
  spawn (\_ -> send ch xs)   -- xs is deep-copied into ch
  -- xs is still valid here — the send did not move it
  received = recv ch
  -- received is an independent copy of [1, 2, 3, 4, 5]
  sum received               -- → 15
}

Consequences:

  • No data races — by construction, not by convention
  • Large values (big lists, nested records) incur a copy cost on send — use channels for results, not for streaming large data inside a tight loop
  • Each thread has its own bump-allocation arena; the GC (Perceus RC) never crosses thread boundaries
  • This is the same model used for WASM v2/v3 heap management

7. When to use what

| Situation | Primitive | Notes |
| --- | --- | --- |
| Background task, result returned via channel | scope { spawn ... ; recv ch } | Standard pattern; all threads joined on scope exit |
| CPU-bound parallel transform on a list | pmap f xs | Use when the list has at least 64 elements and per-element work is significant |
| Parallel I/O or async operations | async fn / await | Stackless; no OS thread per task; see /async |
| First result wins, others discarded | race [t1 t2 ...] | Requires async fn; each sub-task gets its own OS thread |
| All results needed in parallel, ordered | gather [t1 t2 ...] | Requires async fn; results returned in original list order |
| Producer/consumer with backpressure | bounded_chan n | Blocks producer when capacity n is full |
| Fan-in from multiple channels | select [ch1 ch2 ...] | Returns the first value available across all channels |
| Non-blocking send/receive | try_send / try_recv | Returns Bool / Maybe instead of blocking |
| Receive with deadline | recv_timeout ms ch | Returns Nothing after ms milliseconds |
| Short lists (<64 elements) | map f xs | pmap overhead exceeds benefit below threshold |

8. Limitations (honest)

  • spawn is only valid inside scope. Calling spawn outside a scope block is a runtime error. The structured-concurrency model requires a join point.
  • pmap threshold is fixed at 64. There is no API to override PMAP_PAR_THRESHOLD. Lists shorter than 64 elements always run sequentially.
  • Nested pmap is sequential. The inner pmap call inside a parallel worker falls back to map. This is intentional to prevent unbounded thread creation (PM-8).
  • Deep-copy cost. Large values sent over channels incur a full deep-copy. This is not a problem for typical channel use (sending results, not streaming bulk data in a tight loop).
  • race/gather require async fn. The race and gather combinators work on Task a values. Pure synchronous functions must be wrapped in async fn to use them.
  • No shared memory. There are no atomics, mutexes, or shared references in user code. All cross-thread communication goes through channels.