Synoema

Optimizer

4-pass Core IR pipeline — linearize, fold, e-graph saturate, Perceus RC

After desugaring the typed AST into Core IR, Synoema runs a 4-pass optimization pipeline before handing the code to either the interpreter or the JIT compiler. Order matters: constant folding happens before e-graph optimization (so the e-graph sees simpler terms), and Perceus happens last (so it sees the final shape of the code).

Entry point: optimize_program() in core/src/optimize.rs.

The pipeline

Typed AST
    │
    ▼ desugar.rs
Core IR (raw)
    │
    ▼ Pass 1: Linearize tree recursion
Core IR
    │
    ▼ Pass 2: Constant folding + DCE  (optimize.rs)
Core IR (constants folded, dead code removed)
    │
    ▼ Pass 3: E-graph equality saturation  (egraph.rs)
Core IR (algebraically simplified)
    │
    ▼ Pass 4: Perceus RC insertion  (perceus.rs)
Core IR (with inc/dec annotations)
    │
    ├──▶ synoema-eval     (tree-walking interpreter)
    └──▶ synoema-codegen  (Cranelift JIT)

Core IR — a quick primer

Core IR is a small System F-like language with 15 expression variants (vs 25 in the surface). Everything is a single CoreExpr:

pub enum CoreExpr {
    Var(Name),                          // variable reference
    Lit(Lit),                           // literal: Int, Bool, Float, Str, ...
    App(Box<CoreExpr>, Box<CoreExpr>),  // function application: f x
    Lam(Name, Box<CoreExpr>),           // lambda: \x -> body
    Let(Name, Box<CoreExpr>, Box<CoreExpr>),     // let x = val in body
    LetRec(Name, Box<CoreExpr>, Box<CoreExpr>),  // let rec f = val in body
    Case(Box<CoreExpr>, Vec<Alt>),      // pattern match
    Con(Name),                          // data constructor
    PrimOp(PrimOp),                     // primitive: Add, Mul, Eq, ...
    MkList(Vec<CoreExpr>),              // list literal
    Record(Vec<(Name, CoreExpr)>),      // record literal
    // ... (FieldAccess, RecordUpdate, Seq, StringInterp)
}

The optimizer works on CoreExpr trees using recursive traversal.

Pass 1: Linearize tree recursion

What it does: transforms certain tail-recursive functions so the JIT can apply TCO (tail-call optimization). A function like sum 0 acc = acc / sum n acc = sum (n - 1) (acc + n) is already tail-recursive by construction (the recursive call is in tail position). Linearization ensures the Core IR representation is shaped so the JIT's TCO detection (TcoContext in compiler.rs) recognizes and optimizes it.

Why it's Pass 1: subsequent passes (constant folding, e-graph) may create new optimization opportunities in the linearized form.

Pass 2: Constant folding + DCE

Rewrites Core IR expressions bottom-up, replacing known-constant subexpressions with their values and eliminating unreachable branches.

Constant folding

When both arguments to a PrimOp are literals, compute the result at compile time:

Before: App(App(PrimOp(Add), Lit(Int(2))), Lit(Int(3)))
After:  Lit(Int(5))

Before: App(App(PrimOp(Mul), Lit(Float(2.5))), Lit(Float(4.0)))
After:  Lit(Float(10.0))

This happens in fold_app() in optimize.rs — it pattern-matches on the shape App(App(PrimOp(op), Lit(a)), Lit(b)) and evaluates the operator immediately.

Dead branch elimination

When the scrutinee of a Case is a literal, only the matching branch is kept:

Before:
  Case(Lit(Bool(true)),
    [Alt(PatLit(true),  body_a),
     Alt(PatLit(false), body_b)])
After:
  body_a

For factorial's first equation fac 0 = 1, this means that calling fac with a literal 0 collapses the Case immediately.

Dead let elimination

If a let x = e in body binding is never used in body, it's removed — only safe when e is a literal (side-effecting expressions are not eliminated):

Before: Let("unused", Lit(42), Var("result"))
After:  Var("result")

Pass 3: E-graph equality saturation

An e-graph (equality graph) represents equivalence classes of expressions. The optimizer adds rewrite rules, applies them until no new equivalences are found (saturation), then extracts the lowest-cost representative from each class.

Why e-graphs?

E-graphs avoid the phase ordering problem in traditional peephole optimization: applying rule A before rule B might block rule C, but applying B first enables C. E-graphs explore all rule applications simultaneously and pick the best outcome.

How it works

  1. Build an e-graph from the Core IR
  2. Add each expression as its own e-class (equivalence class)
  3. Apply rewrite rules: when a rule's LHS matches a node, merge LHS class with RHS class
  4. Repeat until no new merges happen (saturation) or 10 iterations reached
  5. Extract: walk the e-graph, choose cheapest node from each class

Rewrite rules implemented

Algebraic identities:

x + 0  →  x
x * 1  →  x
x * 0  →  0
x - x  →  0
x + x  →  x * 2

Boolean simplifications:

x && true   →  x
x || false  →  x
not (not x) →  x

List fusion:

map f (map g xs)  →  map (f >> g) xs

The fusion rule is particularly valuable for LLM-generated code, which often chains map calls. One fused map does one pass over the list instead of two.

Adding a new rewrite rule

Edit egraph.rstry_algebraic_rules() or try_boolean_rules(). Rules are expressed as pattern matches on ENode trees. Follow the pattern of apply_binop_rules().

Pass 4: Perceus RC insertion

Analyzes the Core IR for ownership and inserts inc/dec reference-count operations at ownership transfer points. Enables memory reuse without a garbage collector. The technique comes from the Koka language.

Perceus in a nutshell

  • When a value is duplicated (passed to two places), insert inc before the split
  • When a value is consumed (last use in a branch), insert dec after the use
  • When a value is reused (rc reaches 0 and a new allocation of the same shape is needed), reuse in place instead of allocating

The result: in functional programs with value semantics, Perceus achieves near-zero allocation overhead for common patterns like map (which reuses list nodes as it traverses).

What the pass produces

The Perceus pass annotates CoreExpr with ownership information and wraps expressions in inc/dec calls that the JIT then translates to synoema_inc/synoema_dec FFI calls (see JIT & ABI for the runtime side).

Region annotation (subpass)

Before Perceus runs, the annotate_regions pass marks which expressions allocate on the heap. This allows the Perceus pass to skip non-allocating expressions (e.g., integer arithmetic needs no RC management). The synoema_region_enter / synoema_region_exit entry points exist in the runtime but are no-ops — region-based memory management was superseded by Perceus. The annotation pass still runs for escape analysis.

Worked example

Let's trace double_sum xs = foldl (\acc x -> acc + x) 0 xs through the pipeline.

After desugaring (Core IR):

LetRec("double_sum",
  Lam("xs",
    App(App(App(Var("foldl"),
      Lam("acc",
        Lam("x",
          App(App(PrimOp(Add), Var("acc")), Var("x"))))),
      Lit(Int(0))),
    Var("xs"))),
  ...)

After Pass 2 (constant folding): no change — no constants to fold in this expression.

After Pass 3 (e-graph): the (\acc x -> acc + x) lambda may be recognized as equivalent to the builtin (+) and simplified. The initial Lit(Int(0)) accumulator enables x + 0 → x if the first element is later proven to be 0; otherwise no simplification.

After Pass 4 (Perceus): the list xs gets inc/dec annotations as foldl traverses it. If this is the only reference to xs, Perceus enables in-place reuse of the list nodes as the fold proceeds.

Cross-references