JIT & ABI
Tagged pointers, heap layouts, Perceus refcounting, and the 198-function runtime FFI
The Synoema JIT (powered by Cranelift) represents every runtime value as a single i64. Type information lives in the lower 3 bits of each word; the upper 61 bits hold either a payload (for unboxed types) or an 8-byte-aligned heap pointer. This page documents the encoding, the heap node layouts that follow from it, the two-layer memory model (bump arena + Perceus RC), and the standard FFI pattern for adding new runtime capabilities.
Why tagged pointers?
Cranelift's native IR works with machine-word integers. Boxing every value into a heap-allocated object is slow; using separate type metadata is complex. Tagged pointers thread the needle: because all heap allocations are 8-byte aligned (guaranteed by the arena allocator), the lower 3 bits of any heap pointer are zero. Synoema steals those bits for tag information. For non-pointer values (integers, booleans, characters), the value is embedded directly in the upper bits.
The tag table
i64 value layout:
63 3 2 1 0
┌────────────────────────────────┬──┬──┬──┐
│ payload / pointer │t2│t1│t0│
└────────────────────────────────┴──┴──┴──┘
└──────┘
tag = lower 3 bits (v & 7)
| Tag | Type | Encoding |
|---|---|---|
| 0 | Int / Bool / List | Int: unboxed value; Bool: 0=false, 1=true; List: untagged arena pointer (distinguished by arena_contains_ptr) |
| 1 (CON_TAG) | ADT constructor | (v & !7) → ConNode* |
| 2 (STR_TAG) | String | (v & !7) → StrNode* |
| 3 (RATIONAL_TAG) | Rational | (v & !7) → RationalNode* |
| 4 (FLOAT_TAG) | Float | (v & !7) → FloatNode* |
| 5 (RECORD_TAG) | Record | (v & !7) → RecordNode* |
| 6 (CHAR_TAG) | Character | Unboxed: (codepoint << 3) \| 6 |
| 7 (BYTES_TAG) | Bytes buffer | (v & !7) → StrNode* (same layout, different tag) |
Tag-0 disambiguation: integers, booleans, and list pointers all share tag 0 and are told apart by value. The word 0 serves as both the nil list and false, 1 is true, and a list pointer is a value at or above 0x1000 that falls inside the arena bounds, which arena_contains_ptr() verifies.
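The masking rules in the table can be sketched in a few lines of Rust. This is a minimal illustration, not the runtime's code: `TAG_MASK`, `tag_of`, `tag_ptr`, and `untag` are hypothetical names, and only the tag values and the `(v & !7)` rule come from the table.

```rust
/// Mask covering the three tag bits (v & 7).
const TAG_MASK: i64 = 0b111;
/// Tag values taken from the table above.
const CON_TAG: i64 = 1;
const STR_TAG: i64 = 2;

/// Read the tag: the lower 3 bits of the word.
fn tag_of(v: i64) -> i64 {
    v & TAG_MASK
}

/// Attach a tag to an 8-byte-aligned heap address (hypothetical helper).
fn tag_ptr(addr: i64, tag: i64) -> i64 {
    debug_assert_eq!(addr & TAG_MASK, 0, "heap addresses must be 8-byte aligned");
    addr | tag
}

/// Strip the tag to recover the heap address: (v & !7).
fn untag(v: i64) -> i64 {
    v & !TAG_MASK
}
```

Because the low three bits of an aligned address are zero, tagging and untagging are lossless: `untag(tag_ptr(addr, STR_TAG))` gives back `addr`.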
Char encoding
Characters are unboxed — no heap allocation. 'a' (codepoint 97) is stored as (97 << 3) | 6 = 776 | 6 = 782. Extract with (v >> 3) as u32.
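As a sketch of the arithmetic above, the following pair of helpers encodes and decodes a character. The function names are assumptions made for illustration; only the `(codepoint << 3) | 6` encoding and the `(v >> 3) as u32` extraction come from this page.

```rust
/// CHAR_TAG from the tag table.
const CHAR_TAG: i64 = 6;

/// Encode a character as an unboxed i64: (codepoint << 3) | 6.
fn encode_char(c: char) -> i64 {
    ((c as i64) << 3) | CHAR_TAG
}

/// Decode a tagged word back into a character.
/// Returns None when the tag is not CHAR_TAG or the payload is not a valid scalar value.
fn decode_char(v: i64) -> Option<char> {
    if v & 7 != CHAR_TAG {
        return None;
    }
    char::from_u32((v >> 3) as u32)
}
```

With these definitions, `encode_char('a')` yields 782, matching the worked example.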
Heap node layouts
Every heap node starts with rc: i64 at offset 0 (Perceus reference count, initialized to 1). This is a critical ABI invariant — the Perceus synoema_inc/synoema_dec functions read rc at offset 0 without knowing the node type.
ListNode — cons cell
#[repr(C)]
struct ListNode {
    rc: i64,   // offset 0: Perceus refcount
    head: i64, // offset 8: the element (any i64-encoded value)
    tail: i64, // offset 16: next ListNode* or 0 (nil)
}
// size: 24 bytes
A list [1 2 3] in memory:
{rc=1, head=1, tail=→} → {rc=1, head=2, tail=→} → {rc=1, head=3, tail=0}
Nil = 0i64 (null pointer with tag=0).
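To make the cons-cell layout concrete, here is a small sketch that builds the list [1 2 3] and walks it. It is an illustration under assumptions: `cons` and `to_vec` are hypothetical helpers, and nodes are leaked `Box` allocations rather than arena allocations.

```rust
#[repr(C)]
struct ListNode {
    rc: i64,   // offset 0: refcount, initialized to 1
    head: i64, // offset 8: element
    tail: i64, // offset 16: next node or 0 (nil)
}

/// Prepend `head` to `tail` (hypothetical helper; leaks a Box instead of using the arena).
fn cons(head: i64, tail: i64) -> i64 {
    Box::into_raw(Box::new(ListNode { rc: 1, head, tail })) as i64
}

/// Collect the elements of a list by following tail pointers until nil (0).
///
/// Safety: `list` must be 0 or a value produced by `cons`.
unsafe fn to_vec(mut list: i64) -> Vec<i64> {
    let mut out = Vec::new();
    while list != 0 {
        let node = unsafe { &*(list as *const ListNode) };
        out.push(node.head);
        list = node.tail;
    }
    out
}
```

Calling `to_vec(cons(1, cons(2, cons(3, 0))))` inside an `unsafe` block returns the elements in order; the empty list is just the word 0.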
StrNode — string and bytes
#[repr(C)]
struct StrNode {
    rc: i64,  // offset 0: Perceus refcount
    len: i64, // offset 8: byte length
    // UTF-8 bytes follow immediately after (inline, not a pointer)
}
// size: 16 + len bytes (padded to 8-byte alignment)
For Bytes values, the layout is identical but the tag is BYTES_TAG=7 instead of STR_TAG=2.
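The size rule and the inline-bytes layout can be sketched as follows. This is a model, not the runtime's allocator: a `Vec<i64>` stands in for arena memory so that the buffer is 8-byte aligned, and `str_node_size`, `make_str_node`, and `read_str` are hypothetical names.

```rust
/// Header size: rc (8 bytes) + len (8 bytes).
const STR_HEADER: usize = 16;

/// Total node size: 16 + len, rounded up to a multiple of 8.
fn str_node_size(len: usize) -> usize {
    (STR_HEADER + len + 7) & !7
}

/// Build a StrNode image: [rc, len, inline UTF-8 bytes, zero padding].
fn make_str_node(s: &str) -> Vec<i64> {
    let words = str_node_size(s.len()) / 8;
    let mut buf = vec![0i64; words];
    buf[0] = 1; // rc
    buf[1] = s.len() as i64; // len
    // View the words after the header as raw bytes and copy the string in.
    let bytes = unsafe {
        std::slice::from_raw_parts_mut(buf.as_mut_ptr().add(2) as *mut u8, (words - 2) * 8)
    };
    bytes[..s.len()].copy_from_slice(s.as_bytes());
    buf
}

/// Read the string back out of a node image.
fn read_str(node: &[i64]) -> &str {
    let len = node[1] as usize;
    assert!(len <= (node.len() - 2) * 8, "len exceeds node payload");
    let bytes = unsafe { std::slice::from_raw_parts(node.as_ptr().add(2) as *const u8, len) };
    std::str::from_utf8(bytes).expect("StrNode payload must be valid UTF-8")
}
```

A 5-byte string therefore occupies 24 bytes (16 header + 5 data + 3 padding), and a 9-byte string occupies 32.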
FloatNode
#[repr(C)]
struct FloatNode {
    rc: i64,   // offset 0: Perceus refcount
    bits: i64, // offset 8: f64 bit pattern (via f64::to_bits())
}
// size: 16 bytes
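The `bits` field stores the float's raw IEEE 754 pattern reinterpreted as an i64. A minimal round-trip sketch, with hypothetical helper names, looks like this:

```rust
/// Convert an f64 into the i64 stored in FloatNode.bits.
fn float_to_bits(x: f64) -> i64 {
    x.to_bits() as i64
}

/// Recover the f64 from the stored bit pattern.
fn float_from_bits(bits: i64) -> f64 {
    f64::from_bits(bits as u64)
}
```

The casts between `u64` and `i64` only reinterpret the same 64 bits, so the round trip preserves every value, including negative zero and NaN payloads.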
ClosureNode
#[repr(C)]
struct ClosureNode {
    rc: i64,      // offset 0: Perceus refcount
    fn_ptr: i64,  // offset 8: Cranelift-compiled function pointer
    env_ptr: i64, // offset 16: pointer to array of captured variables
}
// size: 24 bytes
Environment is a separately allocated array of i64 values (one per captured variable). Lambda lifting determines which variables need to be captured.
ConNode — ADT constructor (variable size)
offset 0: rc: i64
offset 8: tag: i64 (constructor index within the ADT)
offset 16: field_0: i64
offset 24: field_1: i64
...
ConNode is not a fixed Rust struct — it's built dynamically with arena allocation proportional to the number of constructor fields.
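Since the node size depends on the field count, the offsets are easiest to see as arithmetic. The sketch below models a ConNode as a `Vec<i64>` rather than arena memory; all function names are hypothetical.

```rust
/// Words before the first field: rc and constructor tag.
const CON_HEADER_WORDS: usize = 2;

/// Size in bytes of a ConNode with `n_fields` fields: 16 + 8 * n_fields.
fn con_node_size(n_fields: usize) -> usize {
    8 * (CON_HEADER_WORDS + n_fields)
}

/// Byte offset of field `i`: 16, 24, 32, ...
fn field_offset(i: usize) -> usize {
    8 * (CON_HEADER_WORDS + i)
}

/// Build a node image: [rc = 1, constructor index, fields...].
fn make_con(ctor: i64, fields: &[i64]) -> Vec<i64> {
    let mut node = Vec::with_capacity(CON_HEADER_WORDS + fields.len());
    node.push(1);
    node.push(ctor);
    node.extend_from_slice(fields);
    node
}

/// Read field `i` from a node image.
fn con_field(node: &[i64], i: usize) -> i64 {
    node[field_offset(i) / 8]
}
```

A nullary constructor still needs the 16-byte header, and each additional field adds one 8-byte word.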
Memory management — two layers
Layer 1: Arena Allocator (bump allocation)
• 8 MB thread-local buffer
• All JIT heap allocations go here
• O(1) allocation: bump an offset pointer
• Freed in bulk by arena_reset() after each program run
Layer 2: Perceus Reference Counting
• Each heap node has rc: i64 at offset 0
• synoema_inc(val) increments rc
• synoema_dec(val) decrements rc; rc=0 → push to reuse_pool
• Reuse pool: up to 256 dead nodes recycled for future allocs
• Does NOT free to OS — arena is the backing store
Arena lifecycle
You don't manage individual deallocations in JIT runtime functions — just allocate from the arena and let arena_reset() clean up at run boundaries. Perceus is an optimization layer that enables reuse within a single run.
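The bump-and-reset lifecycle can be sketched with a toy arena. This is an assumption-laden model, not the real allocator: it hands out byte offsets instead of pointers, is not thread-local, and the `Arena` type and its methods are hypothetical.

```rust
/// Toy bump arena that hands out byte offsets into a fixed buffer.
struct Arena {
    buf: Vec<u64>, // backing store; u64 elements keep it 8-byte aligned
    next: usize,   // offset of the next free byte
}

impl Arena {
    fn new(capacity_bytes: usize) -> Self {
        Arena { buf: vec![0; capacity_bytes / 8], next: 0 }
    }

    /// O(1) allocation: round the size up to 8 bytes and bump the offset.
    /// Returns None when the arena is full.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let size = (size + 7) & !7;
        if self.next + size > self.buf.len() * 8 {
            return None;
        }
        let offset = self.next;
        self.next += size;
        Some(offset)
    }

    /// Bulk free: everything allocated so far becomes reusable.
    fn reset(&mut self) {
        self.next = 0;
    }
}
```

Rounding every request up to a multiple of 8 is what keeps the low three bits of each allocation free for tags.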
synoema_inc and synoema_dec
// Increment: called when a value is shared
pub extern "C" fn synoema_inc(val: i64) -> i64 {
    let raw = heap_ptr_of(val);
    if raw != 0 {
        unsafe { *(raw as *mut i64) += 1; }
    }
    0
}
// Decrement: called when a value is consumed
pub extern "C" fn synoema_dec(val: i64) -> i64 {
    let raw = heap_ptr_of(val);
    if raw != 0 {
        unsafe {
            let rc = *(raw as *mut i64) - 1;
            *(raw as *mut i64) = rc;
            if rc == 0 { reuse_pool_push(raw); }
        }
    }
    0
}
The Perceus pass in synoema-core/perceus.rs inserts these calls automatically based on ownership analysis. See Optimizer for how that pass works.
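The interplay between reference counts and the reuse pool can be modeled in safe Rust. The sketch below is a simulation under assumptions: nodes live in a `Vec` and are addressed by slot index rather than by pointer, and `Heap` and its methods are hypothetical. Only the rc-at-slot-0 convention, the push-on-zero rule, and the 256-entry cap come from this page.

```rust
/// Cap on recycled nodes, as described for the reuse pool.
const REUSE_POOL_CAP: usize = 256;

/// Toy heap of three-word nodes: [rc, head, tail].
struct Heap {
    nodes: Vec<[i64; 3]>,
    reuse_pool: Vec<usize>, // slots whose rc dropped to 0
}

impl Heap {
    fn new() -> Self {
        Heap { nodes: Vec::new(), reuse_pool: Vec::new() }
    }

    /// Allocate a node with rc = 1, recycling a dead slot when one is available.
    fn alloc(&mut self, head: i64, tail: i64) -> usize {
        let node = [1, head, tail];
        match self.reuse_pool.pop() {
            Some(slot) => {
                self.nodes[slot] = node;
                slot
            }
            None => {
                self.nodes.push(node);
                self.nodes.len() - 1
            }
        }
    }

    /// Called when a value is shared.
    fn inc(&mut self, slot: usize) {
        self.nodes[slot][0] += 1;
    }

    /// Called when a value is consumed; a dead node goes to the pool if there is room.
    fn dec(&mut self, slot: usize) {
        self.nodes[slot][0] -= 1;
        if self.nodes[slot][0] == 0 && self.reuse_pool.len() < REUSE_POOL_CAP {
            self.reuse_pool.push(slot);
        }
    }

    fn rc(&self, slot: usize) -> i64 {
        self.nodes[slot][0]
    }
}
```

In this model, allocating right after a node dies returns the same slot, which is the in-run reuse that Perceus is meant to enable; nothing is ever returned to the operating system.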
Runtime FFI pattern
Adding a new runtime capability to the JIT requires three steps. This is the standard pattern used for all 198 JIT FFI functions.
Step 1: Implement in runtime.rs
// lang/crates/synoema-codegen/src/runtime.rs
/// Return the byte length of a string.
#[unsafe(no_mangle)]
pub extern "C" fn synoema_str_len(val: i64) -> i64 {
    if !is_str(val) { return 0; }
    let ptr = (val & !STR_TAG) as *const StrNode;
    unsafe { (*ptr).len }
}
Rules: extern "C" + #[unsafe(no_mangle)]; all parameters and return type are i64; access heap nodes via the untagged pointer.
Step 2: Register the symbol in Compiler::new()
builder.symbol("synoema_str_len", runtime::synoema_str_len as *const u8);
Step 3: Declare the function signature
// sig1 = fn(i64) -> i64 (already declared elsewhere; reuse it)
decl(self, "synoema_str_len", "str_len", &sig1)?;
After these three steps, the function is callable from JIT-compiled code using the short name "str_len". Then add a doctest in src/lib.rs, make sure the interpreter has an equivalent in synoema-eval/src/eval.rs, and run cargo test -p synoema-codegen.
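As an illustration of the Step 1 rules, here is a self-contained sketch of a second runtime function together with a helper for exercising it. Everything in it is hypothetical: `synoema_str_is_empty` and `tagged_header` are invented for this example, `is_str` and `StrNode` are redefined locally so the snippet compiles on its own, and the `#[unsafe(no_mangle)]` attribute required in runtime.rs is left out so the snippet builds on older toolchains.

```rust
const STR_TAG: i64 = 2;

#[repr(C)]
struct StrNode {
    rc: i64,  // offset 0: refcount
    len: i64, // offset 8: byte length
}

/// Local stand-in for the runtime's tag check.
fn is_str(val: i64) -> bool {
    val & 7 == STR_TAG
}

/// Hypothetical runtime function: 1 if `val` is an empty string, 0 otherwise.
/// Follows the rules above: extern "C", i64 in, i64 out, untagged pointer access.
pub extern "C" fn synoema_str_is_empty(val: i64) -> i64 {
    if !is_str(val) {
        return 0;
    }
    let ptr = (val & !7) as *const StrNode;
    let len = unsafe { (*ptr).len };
    (len == 0) as i64
}

/// Test helper: leak a header-only StrNode and return its tagged pointer.
/// Only the header is allocated, so callers must not read the string bytes.
fn tagged_header(len: i64) -> i64 {
    Box::into_raw(Box::new(StrNode { rc: 1, len })) as i64 | STR_TAG
}
```

A real addition would still need Steps 2 and 3 (symbol registration and signature declaration) before JIT-compiled code could call it.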
TLS soft-error channel (Phase H3)
Async error propagation in the JIT cannot rely on panic! because Cranelift-generated frames have no Rust unwinding tables — a panic crossing a Cranelift frame is undefined behavior. Phase H3 (scope_result / try_await JIT fix) replaces panic-through-Cranelift with a thread-local soft-error channel:
thread_local! {
    static SCOPE_RESULT_ERROR: RefCell<Option<String>> = RefCell::new(None);
    static TASK_ERROR: RefCell<Option<String>> = RefCell::new(None);
}
Three rules govern the channel:
- synoema_error never panics. It writes the message into SCOPE_RESULT_ERROR (or TASK_ERROR) and returns a sentinel value. Cranelift frames see only normal returns.
- synoema_task_complete and the G1 reactor wake-loop intercept TASK_ERROR. When set, the parent task is marked Panicked instead of Done; await_with_timeout / try_await lower it back to Result a Error / Maybe a at the language level.
- compile_and_run checks SCOPE_RESULT_ERROR after the JIT main returns. If set, the top-level result is wrapped in Err.
scope_result : (Unit -> a) -> Result a Error and try_await : Task a -> Result a Error are the two builtins that read these channels. Regression test: lang/crates/synoema-codegen/tests/phase_e_errors.rs (E10).
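The record-then-check flow can be sketched with a single thread-local slot. This is a simplified model, not the runtime's code: `SOFT_ERROR`, `soft_error`, `checked_div`, and `run_checked` are hypothetical names, and the sentinel value 0 is an assumption.

```rust
use std::cell::RefCell;

thread_local! {
    /// Stand-in for SCOPE_RESULT_ERROR.
    static SOFT_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

/// Sentinel returned in place of a real value (assumed to be 0 here).
const ERROR_SENTINEL: i64 = 0;

/// Record the error and return normally instead of panicking.
fn soft_error(msg: &str) -> i64 {
    SOFT_ERROR.with(|slot| *slot.borrow_mut() = Some(msg.to_string()));
    ERROR_SENTINEL
}

/// Stand-in for JIT-compiled code that may fail.
fn checked_div(a: i64, b: i64) -> i64 {
    if b == 0 { soft_error("division by zero") } else { a / b }
}

/// Run `f`, then inspect the channel the way compile_and_run is described to.
fn run_checked(f: impl FnOnce() -> i64) -> Result<i64, String> {
    // Clear any stale error left over from a previous run on this thread.
    SOFT_ERROR.with(|slot| slot.borrow_mut().take());
    let value = f();
    match SOFT_ERROR.with(|slot| slot.borrow_mut().take()) {
        Some(msg) => Err(msg),
        None => Ok(value),
    }
}
```

Taking the error out of the slot with `take()` also resets the channel, so one failed run does not poison the next run on the same thread.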
Async reactor handles (Phase G)
The mio-backed reactor (Phase G) introduces three integer handle namespaces that ride alongside the tagged-pointer ABI as plain i64 (tag=0) values:
- Task IDs: i64 keys into the Mutex<HashMap<i64, PendingTask>> waker map
- TCP socket fds: returned by async_tcp_connect / async_tcp_listen / async_tcp_accept; passed back to async_tcp_read / write / close
- TLS handles (Phase 27): opaque Int values; in the interpreter they live in a thread_local! RefCell<HashMap<i64, TlsConn>>; in the JIT, in a OnceLock<Mutex<HashMap<i64, TlsConn>>> global. Returned by tls_connect / tls_listen / tls_accept.
These handles deliberately do not get heap-tag bits — they are integers from the language's perspective. The runtime side uses the value as a key to look up the actual socket/connection in its global table.
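A handle table of this shape can be sketched with the same `OnceLock<Mutex<HashMap<..>>>` structure named above. The sketch is illustrative only: `Conn` is a placeholder for the real connection type, the function names are hypothetical, and allocating handle values from an atomic counter is an assumption.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicI64, Ordering};
use std::sync::{Mutex, OnceLock};

/// Placeholder for the real connection type (e.g. TlsConn).
struct Conn {
    peer: String,
}

static TABLE: OnceLock<Mutex<HashMap<i64, Conn>>> = OnceLock::new();
/// Next handle value to hand out (assumed allocation scheme).
static NEXT_HANDLE: AtomicI64 = AtomicI64::new(1);

fn table() -> &'static Mutex<HashMap<i64, Conn>> {
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Register a connection and return its handle as a plain i64.
fn handle_open(peer: &str) -> i64 {
    let handle = NEXT_HANDLE.fetch_add(1, Ordering::Relaxed);
    table().lock().unwrap().insert(handle, Conn { peer: peer.to_string() });
    handle
}

/// Look up the connection behind a handle.
fn handle_peer(handle: i64) -> Option<String> {
    table().lock().unwrap().get(&handle).map(|c| c.peer.clone())
}

/// Remove the connection; returns false if the handle was unknown.
fn handle_close(handle: i64) -> bool {
    table().lock().unwrap().remove(&handle).is_some()
}
```

Because the handle is only a lookup key, a stale or forged value fails the table lookup instead of being dereferenced as a pointer.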
Cross-references
- Architecture overview
- Type System
- Optimizer — how Perceus annotations are inserted
- Async / Await — how the reactor handles surface in user code
- Canonical doc on GitHub