Architecture
The compilation pipeline — .sno source to native, WASM, or interpreted execution
Synoema turns a .sno source file into one of three runtime forms: tree-walking interpretation (semantically authoritative), a native binary via Cranelift (3× CPython at the median), or a WASM v3 module (browser, IoT, capability container). The same five-stage front-end feeds all three back-ends. Async, TLS, and the capability container ride on top of the JIT runtime via FFI; they are orthogonal to the pipeline below.
The pipeline
Source (.sno)
│
▼
┌───────────────────────────────────────────────────┐
│ Stage 1: Lexer (synoema-lexer) │
│ scanner.rs + layout.rs │
│ Source text → token stream + INDENT/DEDENT │
└────────────────────────┬──────────────────────────┘
▼
┌───────────────────────────────────────────────────┐
│ Stage 2: Parser (synoema-parser) │
│ parser.rs — Pratt parsing, 25 ExprKind variants │
│ Token stream → typed AST │
└────────────────────────┬──────────────────────────┘
▼
┌───────────────────────────────────────────────────┐
│ Stage 3: Type Checker (synoema-types) │
│ infer.rs — Algorithm W (Hindley-Milner) │
│ AST → type-annotated AST │
└────────────────────────┬──────────────────────────┘
▼
┌───────────────────────────────────────────────────┐
│ Stage 4: Desugaring → Core IR (synoema-core) │
│ desugar.rs + core_ir.rs │
│ Typed AST → System F-like Core IR │
└────────────────────────┬──────────────────────────┘
▼
┌───────────────────────────────────────────────────┐
│ Stage 5: Optimizer (synoema-core, 4 passes) │
│ constant fold + DCE → e-graph saturation → │
│ region annotation → Perceus RC insertion │
└────────────────┬─────────────────┬────────────────┘
▼ ▼
┌────────────────────┐ ┌─────────────────────────────────────────┐
│ synoema-eval │ │ synoema-codegen │
│ Tree-walking │ │ Cranelift JIT → native (default) │
│ interpreter │ │ AOT native (--native, x86_64 + arm64) │
│ (reference impl) │ │ WASM v3 (sno wasm — IoT, browser) │
└────────────────────┘ └─────────────────────────────────────────┘
- Async event loop (Phase G): mio reactor + timer wheel + bounded file-IO pool + async TCP server
- TLS / HTTPS (Phase 27): rustls 0.23 ring backend + tls_* builtins + http_* (H1/H2) + https:// auto-upgrade
- Capability Container (Phase 32): sno:cc-base Docker image + cc-mcp pure-Synoema MCP server + apply_change 8-step pipeline (typecheck → diff → MAJOR gate → write → commit → audit → restart → health → rollback)
Stage walkthrough — factorial.sno
A three-line program shows what each stage does:
fac 0 = 1
fac n = n * fac (n - 1)
main = fac 10
1. Lexing
The lexer (synoema-lexer) does two jobs: character-level scanning and offside-rule indentation processing.
- Scanner (`scanner.rs`) converts raw bytes to tokens. Every operator in Synoema is a single BPE token in cl100k_base, a design invariant verified by `lang/tools/bpe-verify/verify_bpe.py`. The 33 operators (including `->`, `|>`, `++`, `**`, `?`, `:`) are all 1-token by construction.
- Layout (`layout.rs`) inserts synthetic `INDENT`/`DEDENT` tokens whenever indentation increases or decreases, following Python's offside rule. This makes the grammar context-free while keeping the surface whitespace-sensitive.
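The layout step can be sketched with an indentation stack. This is a minimal illustration, not the code in `layout.rs`: the function name `layout`, the `("LINE", text)` token shape, and the space-only indentation are all assumptions made for the example.

```python
def layout(lines):
    """Turn indentation changes into synthetic INDENT/DEDENT markers.

    Hypothetical sketch: each non-blank line becomes ("LINE", text);
    a real layout pass would operate on the scanner's token stream.
    """
    stack = [0]  # indentation widths of the currently open blocks
    out = []
    for raw in lines:
        if not raw.strip():
            continue  # blank lines never open or close a block
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append(("INDENT", None))
        while width < stack[-1]:
            stack.pop()
            out.append(("DEDENT", None))
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {raw!r}")
        out.append(("LINE", raw.strip()))
    # Close every block that is still open at end of input.
    out.extend(("DEDENT", None) for _ in stack[1:])
    return out


for token in layout(["a", "  b", "    c", "  d", "e"]):
    print(token)
```

Because the parser only ever sees `INDENT` and `DEDENT` as ordinary bracket-like tokens, it does not need to track column positions itself.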
2. Parsing
The parser uses Pratt parsing (top-down operator precedence) with 13 precedence levels. Pratt parsing is ideal for expression-heavy functional languages because it handles operator precedence and associativity through a binding-power table rather than 13 levels of grammar productions.
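To make the binding-power idea concrete, here is a minimal Pratt parser for integer arithmetic. The operator set, the numeric binding powers, and the tuple-shaped AST are assumptions chosen for illustration; they are not Synoema's actual 13-level table.

```python
import re

# Hypothetical binding powers as (left, right) pairs.
# left < right gives left associativity; left > right gives right associativity.
BINDING_POWER = {
    "+": (10, 11),
    "-": (10, 11),
    "*": (20, 21),
    "**": (31, 30),
}


def tokenize(src):
    # "**" is listed before the single-character class so it wins the match.
    return re.findall(r"\d+|\*\*|[-+*()]", src)


def parse(tokens, min_bp=0):
    """Consume tokens from the front and return a nested-tuple AST."""
    tok = tokens.pop(0)
    if tok == "(":
        lhs = parse(tokens, 0)
        tokens.pop(0)  # discard the closing ")"
    else:
        lhs = int(tok)
    while tokens and tokens[0] in BINDING_POWER:
        left_bp, right_bp = BINDING_POWER[tokens[0]]
        if left_bp < min_bp:
            break  # the operator binds more loosely than our caller allows
        op = tokens.pop(0)
        rhs = parse(tokens, right_bp)
        lhs = (op, lhs, rhs)
    return lhs


print(parse(tokenize("1 + 2 * 3")))    # ('+', 1, ('*', 2, 3))
print(parse(tokenize("1 - 2 - 3")))    # ('-', ('-', 1, 2), 3)
print(parse(tokenize("2 ** 3 ** 2")))  # ('**', 2, ('**', 3, 2))
```

Adding an operator is a one-line change to the table, which is the main practical advantage over encoding each precedence level as its own grammar production.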
Function definitions with multiple equations (fac 0 = 1, fac n = ...) are parsed as separate clauses and later merged into a single FuncDef with a list of equations. Pattern matching on the LHS of = is parsed during this stage.
The AST has 25 ExprKind variants covering literals, application, binary ops, conditionals, lists, list comprehensions, ranges, records, field access, lambdas, pipes, compose, sequencing, where-blocks, case, constructors, string interpolation, record updates, bytes, naturals, rationals, plus the async surface (Await, Async) and post-Phase 27 additions.
3. Type checking
The type checker (synoema-types) runs Algorithm W — Hindley-Milner type inference. Annotations are optional; the checker infers types for every expression without them.
For factorial, inference proceeds in three steps. The first equation, fac 0 = 1, pins both the argument and the result to Int, giving fac : Int → Int. The second equation, fac n = n * fac (n - 1), unifies n : Int and confirms fac : Int → Int. Finally, main = fac 10 infers main : Int.
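The unification at the heart of Algorithm W can be sketched in a few lines. This is only the unifier, without let-generalization or instantiation, and the type encoding (`"Int"` for the base type, `"?name"` strings for type variables, `("->", arg, res)` for functions) is an assumption made for this example. The constraints for factorial are written out by hand, assuming `-` and `*` have type Int → Int → Int.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def resolve(t, subst):
    """Follow substitution links until t is not a bound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """True if variable v appears inside type t (prevents infinite types)."""
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, x, subst) for x in t[1:])


def unify(a, b, subst):
    """Extend subst in place so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} in {b}")
        subst[a] = b
    elif is_var(b):
        unify(b, a, subst)
    elif isinstance(a, tuple) and isinstance(b, tuple):
        unify(a[1], b[1], subst)
        unify(a[2], b[2], subst)
    else:
        raise TypeError(f"cannot unify {a} with {b}")


def apply(t, subst):
    """Replace every solved variable in t by its type."""
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return ("->", apply(t[1], subst), apply(t[2], subst))
    return t


subst = {}
fac = ("->", "?arg", "?res")  # fresh variables for fac's argument and result

# fac 0 = 1
unify("?arg", "Int", subst)  # the pattern 0 is an Int literal
unify("?res", "Int", subst)  # the body 1 is an Int literal

# fac n = n * fac (n - 1)
unify("?arg", "Int", subst)                # n is an operand of (-)
unify(fac, ("->", "Int", "?call"), subst)  # fac applied to (n - 1)
unify("?call", "Int", subst)               # the call is an operand of (*)
unify("?res", "Int", subst)                # n * ... is the equation's body

# main = fac 10
unify(fac, ("->", "Int", "?main"), subst)

print(apply(fac, subst))      # ('->', 'Int', 'Int')
print(apply("?main", subst))  # Int
```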
The checker also handles row polymorphism for records (a function f r = r.x accepts any record with an x field) and linear types (LinearArrow) for resources used exactly once. Type errors carry source spans and LLM-friendly hints via synoema-diagnostic. See Type System for the deep-dive.
4. Desugaring → Core IR
The desugarer translates the typed AST into a small Core IR based on System F. Core IR has fewer constructs than the surface language, making optimization and code generation simpler. Key transformations:
- Multi-equation function definitions → a single `Case` expression (decision tree)
- `? cond -> then : else` → `Case(cond, [Alt(true→then), Alt(false→else)])`
- `|>` pipe operator → nested `App`
- `>>` compose operator → `Lam` wrapper
- List comprehensions → `concatMap` + `filter` calls
- `where` blocks → nested `Let` bindings
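Three of these rewrites (conditionals, pipes, and `where` blocks) can be sketched as a recursive function over a tuple-encoded AST. The node shapes below, such as `("if", c, t, e)` and `("where", body, bindings)`, are hypothetical stand-ins for the real AST and Core IR types, and the compose and comprehension rules are left out to keep the sketch short.

```python
def desugar(e):
    """Rewrite a toy surface AST into a toy Core IR (hypothetical node shapes)."""
    if not isinstance(e, tuple):
        return e  # variables (str) and literals pass through unchanged
    tag = e[0]
    if tag == "if":  # ? c -> t : f
        _, cond, then, other = e
        return ("Case", desugar(cond),
                [("Alt", "true", desugar(then)),
                 ("Alt", "false", desugar(other))])
    if tag == "pipe":  # x |> f becomes f applied to x
        _, arg, func = e
        return ("App", desugar(func), desugar(arg))
    if tag == "app":
        _, func, arg = e
        return ("App", desugar(func), desugar(arg))
    if tag == "where":  # body where b1; b2 becomes Let b1 in (Let b2 in body)
        _, body, bindings = e
        core = desugar(body)
        for name, rhs in reversed(bindings):
            core = ("Let", name, desugar(rhs), core)
        return core
    raise ValueError(f"unknown surface node: {tag}")


# xs |> f |> g parses as (xs |> f) |> g and desugars to nested applications.
print(desugar(("pipe", ("pipe", "xs", "f"), "g")))
# ('App', 'g', ('App', 'f', 'xs'))
```

After this step the later stages only need to understand `Case`, `App`, and `Let`, which is why the Core IR can stay small.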
5. Optimization (4 passes)
Four passes in sequence. See Optimizer for full details.
- Constant folding + DCE. `2 + 3 → 5`; dead branches of `? false -> x : y` eliminated.
- E-graph equality saturation. Algebraic rewrites: `x + 0 → x`, `x * 1 → x`, `map f (map g xs) → map (f >> g) xs`. Up to 10 saturation iterations.
- Region annotation. Marks which sub-expressions allocate on the heap, feeding Perceus.
- Perceus RC insertion. Inserts `inc`/`dec` reference-count operations at ownership-transfer points, enabling memory reclamation without a GC.
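The effect of the first pass, plus the two simplest algebraic identities, can be illustrated with a single bottom-up rewriter. Note that this is a plain recursive simplifier, not an e-graph: it applies each rule once in a fixed order, whereas equality saturation explores all rewrite orders. The tuple node shapes are assumptions made for the example.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_int(e):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(e, int) and not isinstance(e, bool)


def simplify(e):
    """Bottom-up constant folding, identity removal, and dead-branch elimination."""
    if not isinstance(e, tuple):
        return e
    tag = e[0]
    if tag in FOLD:
        a, b = simplify(e[1]), simplify(e[2])
        if is_int(a) and is_int(b):
            return FOLD[tag](a, b)  # 2 + 3 becomes 5
        if tag == "+" and is_int(b) and b == 0:
            return a  # x + 0 becomes x
        if tag == "*" and is_int(b) and b == 1:
            return a  # x * 1 becomes x
        return (tag, a, b)
    if tag == "Case":
        scrut = simplify(e[1])
        if isinstance(scrut, bool):
            # The scrutinee is known, so only the matching branch survives.
            return simplify(dict(e[2])[scrut])
        return ("Case", scrut, [(pat, simplify(body)) for pat, body in e[2]])
    return e  # unknown nodes are left untouched


print(simplify(("+", 2, 3)))                                         # 5
print(simplify(("*", ("+", "x", 0), 1)))                             # x
print(simplify(("Case", False, [(True, "x"), (False, ("+", 2, 3))])))  # 5
```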
6a. Interpreter
The tree-walking interpreter (synoema-eval) evaluates Core IR directly using big-step operational semantics. It supports all language features — including those not yet in the JIT (Nat, Rational, Char, Bytes, TCP networking, fd_popen). The interpreter is the reference implementation for language semantics.
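A big-step evaluator of this kind fits in a short function: each call takes an expression and an environment and returns a final value. The sketch below assumes strict evaluation and uses hypothetical tuple-encoded Core IR nodes (`Lam`, `App`, `Prim`, `Case`); it is not the `synoema-eval` implementation.

```python
import operator

PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(e, env):
    """Big-step evaluation: reduce expression e to a value under env."""
    if isinstance(e, int):
        return e  # literals evaluate to themselves
    if isinstance(e, str):
        return env[e]  # variable lookup
    tag = e[0]
    if tag == "Lam":
        return ("Closure", e[1], e[2], env)  # capture the defining environment
    if tag == "App":
        _, param, body, closure_env = evaluate(e[1], env)
        arg = evaluate(e[2], env)  # strict: evaluate the argument first
        return evaluate(body, {**closure_env, param: arg})
    if tag == "Prim":
        return PRIMS[e[1]](evaluate(e[2], env), evaluate(e[3], env))
    if tag == "Case":
        value = evaluate(e[1], env)
        for pattern, body in e[2]:
            if isinstance(pattern, str):  # a variable pattern matches anything
                return evaluate(body, {**env, pattern: value})
            if pattern == value:  # a literal pattern must be equal
                return evaluate(body, env)
        raise ValueError("no alternative matched")
    raise ValueError(f"unknown node: {tag}")


# fac 0 = 1; fac n = n * fac (n - 1), merged into a single Case.
FAC = ("Lam", "n", ("Case", "n", [
    (0, 1),
    ("m", ("Prim", "*", "m", ("App", "fac", ("Prim", "-", "m", 1)))),
]))

env = {}
env["fac"] = evaluate(FAC, env)  # the closure shares env, so fac can see itself
print(evaluate(("App", "fac", 10), env))  # 3628800
```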
6b. JIT compiler
The JIT (synoema-codegen/compiler.rs) uses Cranelift to generate native machine code at runtime. Cranelift was chosen over LLVM for fast builds, simple distribution, and JIT-first design (JITModule compiles and links in-process). For factorial, the JIT generates a native function that loops via TCO and multiplies, running ~4.2× faster than CPython on the same benchmark. See JIT & ABI.
Why these design choices?
Why Cranelift and not LLVM?
LLVM produces excellent code but adds 200–400 MB of build dependencies and minutes of compile time. For a research-stage language, iteration speed matters more than the last 10% of performance. Cranelift gives ~80% of LLVM's codegen quality at a fraction of the dependency cost.
Why Hindley-Milner and not bidirectional typing?
HM gives full type inference with no annotations required — critical for LLM-generated code: the model doesn't need to annotate types, reducing token cost and error surface. Bidirectional typing requires more annotations for polymorphism.
Why BPE-aligned operators?
Every operator tokenizes to exactly 1 BPE token in cl100k_base (the tiktoken encoding used by GPT-4 and GPT-3.5 Turbo). Multi-token operators waste model capacity on syntactic overhead. Result: Synoema programs use 15% fewer tokens than equivalent Python on average, and up to 52% fewer on algorithmic tasks (sorting, recursion, tree traversal).
Key benchmarks
Token efficiency (cl100k_base, 16 tasks)
| Language | Avg tokens | vs Python |
|---|---|---|
| Synoema | baseline | −15% avg |
| Python | +15% | reference |
Algorithmic tasks (quicksort, fibonacci, tree traversal): up to −52%.
JIT runtime performance (vs CPython 3.12, median of 5 runs)
| Benchmark | JIT speedup |
|---|---|
| fibonacci | 28.2× |
| factorial | 4.2× |
| gcd | 3.5× |
| collatz | 3.1× |
| quicksort | 2.7× |
| matrix_mult | 2.1× |
| Median | 3.0× |
Deep-dives
- Type System — Algorithm W, ADTs, traits, contracts, row polymorphism, linear types
- JIT & ABI — Cranelift IR, calling convention, async state-machine compilation
- Optimizer — constant folding, e-graph saturation, region annotation, Perceus RC
Cross-references
- Language Reference — surface syntax, types, stdlib
- CLI Reference — `sno run`, `sno jit`, `sno wasm`
- LLM Integration — how the architecture supports LLM workflows
- Canonical overview on GitHub