Synoema

Type System

Hindley-Milner inference, row polymorphism, linear types — full inference, zero annotations required

Synoema uses Hindley-Milner (HM) type inference — the same algorithm that powers Haskell and ML-family languages. Annotations are optional; the compiler infers all types automatically. HM is complete: if a program is well-typed, the algorithm always finds its principal type. The trade-off is that HM doesn't handle higher-kinded types or GADTs — Synoema stays within the HM fragment intentionally.

For LLM-generated code this matters because no annotations are required (the model doesn't spend tokens writing : Int everywhere), type errors are caught at compile time, and the type checker knows exactly what was expected and what it got — precise error messages without speculation.

Type representation

pub enum Type {
    Var(TyVarId),                          // α, β, γ — unification variables
    Con(String),                            // Int, Bool, String, Float, Char
    Arrow(Box<Type>, Box<Type>),            // a → b — function type
    LinearArrow(Box<Type>, Box<Type>),      // a -o b — linear function type
    App(Box<Type>, Box<Type>),              // List a, Maybe a, Chan a
    Record(Vec<(String, Type)>, Option<TyVarId>),  // {x: Int, y: Float | r}
}

TyVarId is a u32 — fresh variables come from a counter in the inference state. The Record variant's second field is the row tail: None means a closed record (no extra fields allowed), Some(r) means an open record (additional fields permitted, captured by the row variable).

Algorithm W — how inference works

Algorithm W (Damas-Milner 1982) is a recursive algorithm over the AST. For each expression it returns a type and a substitution (mapping from type variables to concrete types). Substitutions are composed as inference recurses.

Key inference rules

Literals: integers infer Int, booleans Bool, strings String, no substitution needed.

Variables: if the environment binds x : ∀α₁...αₙ. τ, instantiate by replacing each αᵢ with a fresh variable βᵢ. Instantiation is what lets id be used both as Int → Int and String → String — each use gets a fresh copy of the type variables.

Function application: infer the function type τ₁, then the argument type τ₂, then unify τ₁ with τ₂ → β for a fresh β. The result is β after applying the unification substitution.

Let-binding (where polymorphism happens): infer the bound expression's type, generalize by quantifying free variables not in the outer environment, extend the environment with the polymorphic scheme, then infer the body. Generalization is what makes id polymorphic ∀a. a → a in let-bindings.

Worked example: identity

-- Source:
id x = x

-- Inference:
-- 1. Enter function: x : α (fresh)
-- 2. Body = Var("x") → α, no substitution
-- 3. Function type: α → α
-- 4. Generalize (α free, not in outer env): ∀a. a → a
-- Result: id : ∀a. a → a

Worked example: factorial

-- Source:
fac 0 = 1
fac n = n * fac (n - 1)

-- Inference:
-- 1. fac 0 = 1   → pattern 0 : Int forces n : Int, result : Int
-- 2. fac n = n * fac (n - 1)   → confirms Int → Int
-- Result: fac : Int → Int   (no quantifier — Int is concrete)

Worked example: map

-- Source:
map f [] = []
map f (x:xs) = f x : map f xs

-- Inference:
-- f : a → b (fresh)
-- x : a (from list pattern)
-- f x : b
-- result : List b
-- Generalize: ∀a b. (a → b) → List a → List b

Row polymorphism for records

Synoema extends HM with row polymorphism: write functions that accept any record containing at least a certain set of fields.

-- Accepts any record with an 'x' field:
get_x r = r.x

-- Inferred type: ∀a r. {x: a | r} → a
-- The '| r' means "plus any other fields"

The type {x: a | r} is a row type: a record with at least field x : a and a row tail variable r capturing additional fields. Unification of row types uses Rémy-style row unification (Didier Rémy, 1989).

-- All type-check:
get_x {x = 3}                : Int
get_x {x = "hello"}          : String
get_x {x = 1, y = 2}         : Int    -- extra 'y' field captured by r
get_x {x = 1, y = 2, z = 3}  : Int    -- still fine

A closed record type has no row tail variable:

-- Only accepts exactly {x: Int, y: Int}:
manhattan : {x: Int, y: Int} → Int
manhattan p = p.x + p.y

Row unification works by expanding row tails to accommodate extra fields. Unifying {x: a | r₁} with {x: Int, y: Bool} sets a = Int and r₁ = {y: Bool}.

Linear types

Synoema has a linear function arrow (-o) for resources that must be used exactly once. Used for file handles, channels, and other I/O resources.

-- Linear function: consumes its argument exactly once
consume : Resource -o Result

-- Regular function: may use argument any number of times
double : Int → Int
double x = x + x

Linear types are checked by tracking usage counts during inference. Two error codes:

  • linear_unused — a linear argument was never used
  • linear_duplicate — a linear argument was used more than once

In the current implementation, LinearArrow appears mainly in I/O operations and channel operations. The Chan a type uses linear arrows to ensure messages are not accidentally consumed twice.

Unification

Unification (Robinson 1965) finds the most general substitution that makes two types equal. The key safety rule is the occurs check: when unifying α ~ τ, α must not appear in τ, otherwise we'd construct an infinite type.

-- Without occurs check, this would loop:
unify(α, List α)
-- α = List α = List (List α) = List (List (List α)) = ...

-- With occurs check: type error "infinite_type"

Synoema runs the occurs check eagerly — any attempt to construct a recursive type is a compile-time error, not an infinite loop.

Type error messages

Errors flow through synoema-diagnostic for structured rendering. Every error has three representations:

  1. Human-readable — formatted with source spans and contextual prose
  2. JSON — for tooling integration (IDE, CI, ReAct loops)
  3. LLM hint — a short, model-friendly explanation oriented toward fixing

Example — type mismatch

-- Source with error:
greet name = "Hello, " ++ name.length

-- Type error:
Error [TYPE_MISMATCH] at line 1, col 22-32:
  Expected: String
  Found: Int

  In expression: name.length
  Hint: The ++ operator concatenates strings. `str_len` returns Int.
        Use `show (str_len name)` to convert to String first.

-- JSON form:
{
  "code": "TYPE_MISMATCH",
  "span": {"start": 22, "end": 32},
  "expected": "String",
  "found": "Int",
  "llm_hint": "The ++ operator concatenates strings. str_len returns Int. Use show (str_len name) to convert to String first."
}

Levenshtein did_you_mean

When an undefined variable is referenced, the type checker runs Levenshtein distance against all in-scope names and suggests the closest match.

undefined_var
-- Error: Undefined variable 'undefined_var'
-- Did you mean: 'undeclared_var'? (distance 1)

Type-pair suggestions

GotExpectedHint
IntStringUse show to convert Int to String
StringIntUse to_int to parse
List aaYou have a list; use head or pattern match
a → bbYou have a function; apply it with an argument

All 26 error codes have llm_hint coverage (verified by tests). Hints are written for a language model: they suggest the fix, not just describe the error.

Cross-references