Thread-based Concurrency
spawn · scope · pmap · race · gather · channels
Synoema has two concurrency layers. This page covers the thread-based layer: real OS threads, deep-copy message passing, and parallel map. For the async/await layer (stackless state machines, IO reactor, async TCP), see /async.
Numbers on this page are measured, not estimated. pmap speedup on 10 cores (PM-4 stress test): 2.04× in the JIT (list size 1024, 400-iteration per-element loop) and 3.53× in the interpreter (list size 128, 20-iteration per-element loop). Source: docs/benchmarks.md §Concurrency.
1. spawn — fire-and-forget OS thread
spawn starts a new OS thread and returns immediately. The caller does not wait for the spawned thread to finish. Use it for background tasks that communicate results via a channel.
| Primitive | Type | Description |
|---|---|---|
| spawn f | (Unit -> a) -> Unit | Start f () in a new OS thread; caller continues immediately |
-- Fire-and-forget: print from a background thread
main = scope {
spawn (\_ -> print "hello from background")
print "hello from main"
-- Both lines print; order is not guaranteed
}
spawn is only valid inside a scope block. All spawned threads are joined when the scope closes, so values computed in a scope can safely outlive the threads that produced them.
spawn with channels
-- worker sends a triangular number over a channel
tri n acc = ? n == 0 -> acc : tri (n - 1) (acc + n)
worker ch n = send ch (tri n 0)
main = scope {
ch = chan
spawn (worker ch 15) -- tri 15 = 120
spawn (worker ch 16) -- tri 16 = 136
a = recv ch
b = recv ch
a + b -- → 256
}
Channels are the primary way to return values from spawned threads. chan creates an unbounded MPSC channel; send ch v enqueues a value; recv ch blocks the caller until a value is available.
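For readers who know Rust, the worker pattern above can be approximated with the standard library's unbounded MPSC channel. This is only an analogy under stated assumptions: `sum_of_tris` is a hypothetical helper, and nothing here is taken from the Synoema runtime.

```rust
use std::sync::mpsc;
use std::thread;

/// Triangular number with an accumulator, mirroring `tri` above.
fn tri(n: u64, acc: u64) -> u64 {
    if n == 0 { acc } else { tri(n - 1, acc + n) }
}

/// Runs one worker thread per input and sums whatever arrives on the channel.
/// (Hypothetical helper, named for this sketch only.)
fn sum_of_tris(inputs: &[u64]) -> u64 {
    let (tx, rx) = mpsc::channel(); // unbounded, multi-producer single-consumer
    thread::scope(|s| {
        for &n in inputs {
            let tx = tx.clone(); // each worker gets its own sender handle
            s.spawn(move || tx.send(tri(n, 0)).expect("receiver is alive"));
        }
        // One recv per worker; arrival order varies, the sum does not.
        (0..inputs.len())
            .map(|_| rx.recv().expect("a worker sent a value"))
            .sum::<u64>()
    })
}
```

Calling `sum_of_tris(&[15, 16])` yields 256, matching the Synoema example.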
2. scope — scoped parallel block
scope { ... } creates a structured concurrency region. All spawn calls inside the block are joined before scope returns. The result of the scope block is the value of the last expression.
| Primitive | Type | Description |
|---|---|---|
| scope { ... } | a | Structured block; all spawned threads are joined on exit; evaluates to the last expression |
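The join-on-exit rule resembles Rust's `std::thread::scope`, which also blocks at the end of the scope until every thread spawned inside it has finished. The sketch below is an analogy, not the runtime's code; `run_scoped` is a hypothetical name, and the shared counter exists only to observe the join (Synoema user code has no shared state).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Spawns `n` fire-and-forget threads inside a scope and reports how many
/// had finished by the time the scope closed.
fn run_scoped(n: usize) -> usize {
    let finished = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..n {
            // spawn returns immediately; nobody calls join() explicitly.
            s.spawn(|| {
                finished.fetch_add(1, Ordering::SeqCst);
            });
        }
        // Leaving the scope blocks until every spawned thread is done.
    });
    finished.load(Ordering::SeqCst)
}
```

Because the scope joins before returning, `run_scoped(4)` always observes all four threads as finished.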
-- scope_parallel.sno — 4 parallel workers, channel-aggregated sum
-- tri 15 = 120, tri 16 = 136, tri 31 = 496, tri 32 = 528
-- Expected: 120 + 136 + 496 + 528 = 1280
tri n acc = ? n == 0 -> acc : tri (n - 1) (acc + n)
worker ch n = send ch (tri n 0)
main = scope {
ch = chan
spawn (worker ch 15)
spawn (worker ch 16)
spawn (worker ch 31)
spawn (worker ch 32)
a = recv ch
b = recv ch
c = recv ch
d = recv ch
a + b + c + d
}
Deep-copy across thread boundaries
When a value crosses a thread boundary (via send, or captured in a spawned closure), it is deep-copied. There is no shared mutable state between threads. Each thread owns its data independently. This eliminates data races by design — no locks, no reference counting across arenas.
Internally each thread gets its own bump-allocation arena. When send transmits a value, the runtime walks the value graph and copies every node into a fresh arena before the receiving thread touches it. This is the same Perceus-based RC strategy used for WASM heap management.
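The copy-on-send behaviour can be illustrated in Rust with a small value graph. This is a sketch under assumptions: the `Value` type and `send_copy` are hypothetical, a derived `Clone` stands in for the runtime's graph walk, and per-thread arenas are not modelled.

```rust
use std::sync::mpsc;
use std::thread;

/// A tiny value graph standing in for runtime values (hypothetical).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Sends a deep copy of `v` to another thread, which mutates its copy and
/// sends it back. Returns (sender's original, receiver's mutated copy).
fn send_copy(v: Value) -> (Value, Value) {
    let (tx, rx) = mpsc::channel();
    let (back_tx, back_rx) = mpsc::channel();
    thread::scope(|s| {
        s.spawn(move || {
            let mut received: Value = rx.recv().expect("sender is alive");
            if let Value::List(items) = &mut received {
                items.push(Value::Int(99)); // only the receiver's copy changes
            }
            back_tx.send(received).expect("main thread is alive");
        });
        // clone() walks the whole graph and copies every node before sending.
        tx.send(v.clone()).expect("receiver is alive");
    });
    (v, back_rx.recv().expect("worker replied"))
}
```

The sender's value is unchanged after the round trip, which is the property the deep copy is meant to guarantee.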
3. pmap — parallel map
pmap applies a function to each element of a list in parallel across all available cores. It is a drop-in replacement for map when the per-element work is CPU-bound and the list is long enough to amortize thread overhead.
| Primitive | Type | Description |
|---|---|---|
| pmap f xs | (a -> b) -> [a] -> [b] | Apply f to each element of xs in parallel; return results in original order |
-- parallel_compute.sno — CPU-bound pmap example
-- heavy x: sum_{i=1..400} (i*i - i) — result 21333200 independent of x
-- list [1..129] — 129 elements, well above PMAP_PAR_THRESHOLD=64
-- expected sum: 129 * 21333200 = 2751982800
heavy n acc i = ? i == 0 -> acc : heavy n (acc + (i * i - i)) (i - 1)
main = sum (pmap (\x -> heavy x 0 400) [1..129])
Measured speedup
| Backend | List size | Per-elem work | Cores | map (sequential) | pmap (parallel) | Speedup |
|---|---|---|---|---|---|---|
| JIT | 1024 | 400-iter heavy loop | 10 | 120 ms | 60 ms | 2.04× |
| Interpreter | 128 | 20-iter heavy loop | 10 | 2055 ms | 582 ms | 3.53× |
Source: PM-4 stress test in docs/benchmarks.md §Concurrency.
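The speedup column is the sequential time divided by the parallel time. A small Rust sketch of that arithmetic follows; the `speedup` and `efficiency` helpers are hypothetical names, not part of any benchmark harness. Note that 120 ms / 60 ms gives exactly 2.00, so the reported 2.04× presumably comes from unrounded timings (an assumption; the table does not say).

```rust
/// Speedup of a parallel run over a sequential one.
fn speedup(sequential_ms: f64, parallel_ms: f64) -> f64 {
    sequential_ms / parallel_ms
}

/// Fraction of ideal linear scaling achieved on `cores` cores.
fn efficiency(speedup: f64, cores: u32) -> f64 {
    speedup / f64::from(cores)
}
```

With the interpreter row, `speedup(2055.0, 582.0)` is about 3.53, which on 10 cores corresponds to roughly 35% of linear scaling.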
Threshold and nested pmap
- PMAP_PAR_THRESHOLD = 64: lists shorter than 64 elements run sequentially. pmap on a short list is identical in result to map but carries a small closure-cloning overhead. Use map for short lists (PM-5).
- Nested pmap falls back to sequential: if a function passed to pmap itself calls pmap, the inner call runs sequentially. This prevents unbounded thread explosion (PM-8).
- Workers: std::thread::Builder with 64 MB stacks across available_parallelism() cores. Worker count is determined once at program start and is not configurable at runtime.
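The three rules above (threshold, nested fallback, fixed worker count) can be sketched in Rust as follows. This is a minimal illustration, not the runtime's implementation: the thread-local nesting flag and the chunk-per-worker split are assumptions, and unlike the text, this sketch queries the core count on every call.

```rust
use std::cell::Cell;
use std::thread;

const PMAP_PAR_THRESHOLD: usize = 64; // value taken from the text above
const WORKER_STACK_BYTES: usize = 64 * 1024 * 1024; // 64 MB, as described above

thread_local! {
    // Set inside pmap workers so an inner pmap can detect nesting (assumed mechanism).
    static IN_PMAP_WORKER: Cell<bool> = Cell::new(false);
}

/// Order-preserving parallel map with a sequential fallback.
fn pmap<A, B, F>(f: F, xs: &[A]) -> Vec<B>
where
    A: Sync,
    B: Send,
    F: Fn(&A) -> B + Sync,
{
    let nested = IN_PMAP_WORKER.with(|flag| flag.get());
    if nested || xs.len() < PMAP_PAR_THRESHOLD {
        return xs.iter().map(|x| f(x)).collect(); // short list or inner pmap
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_len = (xs.len() + workers - 1) / workers; // ceiling division
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = xs
            .chunks(chunk_len)
            .map(|chunk| {
                thread::Builder::new()
                    .stack_size(WORKER_STACK_BYTES)
                    .spawn_scoped(s, move || {
                        IN_PMAP_WORKER.with(|flag| flag.set(true));
                        chunk.iter().map(|x| f(x)).collect::<Vec<B>>()
                    })
                    .expect("failed to spawn pmap worker")
            })
            .collect();
        // Joining in spawn order keeps results in the original list order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("pmap worker panicked"))
            .collect::<Vec<B>>()
    })
}
```

A nested call such as `pmap(|x| pmap(|y| y + x, &xs).len(), &xs)` spawns threads only for the outer call; each inner call sees the flag and runs sequentially on its worker.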
4. race and gather
race and gather are task combinators that run a list of async tasks in parallel. They are part of the async layer (Phase H1) but serve as natural complements to the thread-based primitives.
| Combinator | Type | Semantics |
|---|---|---|
| race tasks | [Task a] -> Task a | Run all tasks in parallel; return the first to complete (others are dropped) |
| gather tasks | [Task a] -> Task [a] | Run all tasks in parallel; return a list of results in original order |
-- race and gather in action
-- Run: sno jit examples/async_race.sno
async fn fast_task = 1
async fn slow_task = _ = await (async_sleep 200); 2
async fn main =
-- race: first to complete wins; slow_task result is discarded
winner = await (race [slow_task fast_task])
-- gather: both finish, results collected in order
both = await (gather [fast_task fast_task])
winner -- → 1
race uses an AtomicBool winner flag so only the first completion is propagated. gather uses an AtomicUsize completion counter and collects results in original index order. Each sub-task gets its own OS thread. race and gather require async fn / await — see /async for the full async stack.
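The winner flag and completion counter described above can be sketched in Rust with plain threads. This is an illustration under assumptions, not the runtime's code: the `Task` alias and both functions are hypothetical, tasks are modelled as boxed closures rather than async state machines, and losing tasks here run to completion with their results discarded.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// A task modelled as a boxed closure (hypothetical stand-in for `Task a`).
type Task<T> = Box<dyn FnOnce() -> T + Send>;

/// Returns the result of whichever task finishes first.
fn race<T: Send + 'static>(tasks: Vec<Task<T>>) -> T {
    let won = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel();
    for task in tasks {
        let (won, tx) = (Arc::clone(&won), tx.clone());
        thread::spawn(move || {
            let result = task();
            // swap returns the previous value: only the first finisher sees false.
            if !won.swap(true, Ordering::SeqCst) {
                let _ = tx.send(result);
            }
        });
    }
    drop(tx); // so recv fails instead of hanging when no task can send
    rx.recv().expect("race needs at least one task that completes")
}

/// Runs every task on its own thread; results come back in input order.
fn gather<T: Send + 'static>(tasks: Vec<Task<T>>) -> Vec<T> {
    let total = tasks.len();
    if total == 0 {
        return Vec::new();
    }
    // One slot per task so each result lands at its original index.
    let slots: Arc<Mutex<Vec<Option<T>>>> =
        Arc::new(Mutex::new((0..total).map(|_| None).collect()));
    let completed = Arc::new(AtomicUsize::new(0));
    let (tx, rx) = mpsc::channel();
    for (index, task) in tasks.into_iter().enumerate() {
        let (slots, completed, tx) = (Arc::clone(&slots), Arc::clone(&completed), tx.clone());
        thread::spawn(move || {
            let result = task();
            slots.lock().unwrap()[index] = Some(result);
            // The task that bumps the counter to `total` signals completion.
            if completed.fetch_add(1, Ordering::SeqCst) + 1 == total {
                let _ = tx.send(());
            }
        });
    }
    drop(tx);
    rx.recv().expect("a task panicked before all results arrived");
    let mut filled = slots.lock().unwrap();
    let results: Vec<T> = filled
        .iter_mut()
        .map(|slot| slot.take().expect("every slot is filled"))
        .collect();
    results
}
```

The atomic swap makes the winner decision race-free without a lock, and writing each result into its own indexed slot is what lets `gather` return results in list order regardless of completion order.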
5. Channels
Channels are the primary synchronization primitive. Values sent over a channel are deep-copied, so the sender retains ownership of its copy and the receiver gets a fresh independent copy.
| Primitive | Type | Description |
|---|---|---|
| chan | Chan a | Create an unbounded channel |
| bounded_chan n | Int -> Chan a | Create a bounded channel with capacity n; send blocks when full |
| send ch v | Chan a -> a -> Unit | Send value v (deep-copied); non-blocking on unbounded, blocking on bounded when full |
| recv ch | Chan a -> a | Receive next value; blocks until one is available |
| try_send ch v | Chan a -> a -> Bool | Non-blocking send; returns false if the bounded channel is full |
| try_recv ch | Chan a -> Maybe a | Non-blocking receive; returns Nothing if the channel is empty |
| recv_timeout ms ch | Int -> Chan a -> Maybe a | Receive with timeout in milliseconds; returns Nothing on timeout |
| select chans | [Chan a] -> a | Block until any channel in the list has a value; return the first received value |
-- bounded producer/consumer
main = scope {
ch = bounded_chan 4 -- capacity 4; send blocks when full
spawn (\_ ->
send ch 1
send ch 2
send ch 3
send ch 4
send ch 5) -- 5th send blocks until consumer recvs
a = recv ch
b = recv ch
c = recv ch
d = recv ch
e = recv ch
a + b + c + d + e -- → 15
}
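The bounded-channel semantics in the table (blocking send when full, non-blocking try_send, receive with a deadline) have close counterparts in Rust's `std::sync::mpsc::sync_channel`. The sketch below uses that API as an analogy; the three helper functions are hypothetical, and Synoema's own channels may differ in detail.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

/// Fills a bounded channel that has no consumer; returns how many sends fit.
fn fill_until_full(capacity: usize) -> usize {
    let (tx, _rx) = sync_channel(capacity);
    let mut accepted = 0;
    // try_send never blocks: it reports an error instead of waiting for space.
    while let Ok(()) = tx.try_send(accepted) {
        accepted += 1;
    }
    accepted
}

/// Producer sends 1..=n through a bounded channel while the consumer sums.
/// Blocking `send` applies backpressure once the buffer is full.
fn produce_consume(capacity: usize, n: u64) -> u64 {
    let (tx, rx) = sync_channel(capacity);
    thread::scope(|s| {
        s.spawn(move || {
            for v in 1..=n {
                tx.send(v).expect("consumer is alive"); // blocks while full
            }
            // tx is dropped here, which ends the consumer's loop below.
        });
        rx.iter().sum::<u64>()
    })
}

/// Receive with a deadline on a channel nobody writes to.
fn recv_or_timeout(ms: u64) -> Option<u64> {
    let (_tx, rx) = sync_channel::<u64>(1);
    rx.recv_timeout(Duration::from_millis(ms)).ok()
}
```

With capacity 4 and five values, the fifth send waits until the consumer has taken at least one item, and the sum still comes out as 15.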
-- select: fan-in from multiple channels
main = scope {
ch1 = chan
ch2 = chan
spawn (\_ -> send ch1 "from ch1")
spawn (\_ -> send ch2 "from ch2")
first = select [ch1 ch2] -- whichever arrives first
first
}
6. Deep-copy semantics
Synoema's concurrency model has no shared mutable state. Every value that crosses a thread boundary is deep-copied. This is a deliberate design choice that enables the compiler to guarantee data-race freedom without requiring locks, atomic types, or borrow checking.
-- Deep-copy example: sender retains its own copy
main = scope {
ch = chan
xs = [1, 2, 3, 4, 5]
spawn (\_ -> send ch xs) -- xs is deep-copied into ch
-- xs is still valid here — the send did not move it
received = recv ch
-- received is an independent copy of [1, 2, 3, 4, 5]
sum received -- → 15
}
Consequences:
- No data races — by construction, not by convention
- Large values (big lists, nested records) incur a copy cost on send; use channels for results, not for streaming large data inside a tight loop
- Each thread has its own bump-allocation arena; the GC (Perceus RC) never crosses thread boundaries
- This is the same model used for WASM v2/v3 heap management
7. When to use what
| Situation | Primitive | Notes |
|---|---|---|
| Background task, result returned via channel | scope { spawn ... ; recv ch } | Standard pattern; all threads joined on scope exit |
| CPU-bound parallel transform on a list | pmap f xs | Use when the list has at least 64 elements and per-element work is significant |
| Parallel I/O or async operations | async fn / await | Stackless; no OS thread per task; see /async |
| First result wins, others discarded | race [t1 t2 ...] | Requires async fn; each sub-task gets its own OS thread |
| All results needed in parallel, ordered | gather [t1 t2 ...] | Requires async fn; results returned in original list order |
| Producer/consumer with backpressure | bounded_chan n | Blocks producer when capacity n is full |
| Fan-in from multiple channels | select [ch1 ch2 ...] | Returns the first value available across all channels |
| Non-blocking send/receive | try_send / try_recv | Returns Bool / Maybe instead of blocking |
| Receive with deadline | recv_timeout ms ch | Returns Nothing after ms milliseconds |
| Short lists (<64 elements) | map f xs | pmap overhead exceeds benefit below threshold |
8. Limitations (honest)
- spawn is only valid inside scope. Calling spawn outside a scope block is a runtime error. The structured-concurrency model requires a join point.
- pmap threshold is fixed at 64. There is no API to override PMAP_PAR_THRESHOLD. Lists shorter than 64 elements always run sequentially.
- Nested pmap is sequential. The inner pmap call inside a parallel worker falls back to map. This is intentional to prevent unbounded thread creation (PM-8).
- Deep-copy cost. Large values sent over channels incur a full deep-copy. This is not a problem for typical channel use (sending results, not streaming bulk data in a tight loop).
- race/gather require async fn. The race and gather combinators work on Task a values. Pure synchronous functions must be wrapped in async fn to use them.
- No shared memory. There are no atomics, mutexes, or shared references in user code. All cross-thread communication goes through channels.