Closing the Math Gap: 15 Builtins, 88 Tests, and a Doctest System That Finally Works
Until last week, asking an LLM to compute a sine wave or take a logarithm in Synoema was an awkward conversation. The language had sqrt, floor, ceil, round, and abs, and that was the entire math toolbox. No sin, no cos, no exp, no ln, no pi, no mean, no stddev. For a language that markets itself on token efficiency for AI code generation, this was a problem: a benchmark that asked for a population mean would balloon from R's mean(x) to a 9-token sum xs / length xs in Synoema.
This article walks through the change — math-core-builtins-l1 and its two follow-ups — that closed the gap. The work added 15 builtins, 88 tests, and uncovered a parser bug that had silently disabled doctests on type signatures for the entire history of the project. Here is what shipped, what we deliberately did not ship, and what the process of doing it through OpenSpec actually looked like.
The Gap
Comparing the math surface area of Synoema against R and F# revealed a stark asymmetry:
| Function | R | F# | Synoema (before) |
|---|---|---|---|
| sin / cos / tan | builtin | System.Math | missing |
| exp / ln / log10 | builtin | System.Math | missing |
| pi / e | builtin | Math.PI | missing |
| mean / sd | builtin | Seq.average | missing |
| vectorized arithmetic x + y | builtin | FsLab | not planned |
| matrices, BLAS | builtin | Math.NET | not planned |
The strategic decision was easy: do not chase R or F# into scientific computing. Synoema is not numpy. Vectorized operators would break Hindley-Milner inference and the BPE-aligned operator set, and matrices belong in a skill, not the core. But the basics (the transcendental functions, two constants, two aggregations) were genuinely missing, and their absence was visibly hurting LLM token efficiency on math-flavored prompts. So we scoped a tight, additive change: 15 builtins, no new dependencies, no operator-level changes.
The Builtins
Fifteen functions, all working only on Float (or List Float for aggregations). No typeclass polymorphism, no auto-coercion from Int — the type checker rejects sin 0 and asks you to write sin 0.0 or sin (fromFloat 0).
sin : Float -> Float -- radians
cos : Float -> Float
tan : Float -> Float
asin : Float -> Float -- NaN if outside [-1, 1]
acos : Float -> Float
atan : Float -> Float
atan2 : Float -> Float -> Float -- atan2 y x; range (-pi, pi]
exp : Float -> Float
ln : Float -> Float -- natural log; -Inf for 0.0, NaN for negative
log10 : Float -> Float
log2 : Float -> Float
pi : Float -- 3.141592653589793
e : Float -- 2.718281828459045
mean : List Float -> Float -- NaN on empty
stddev : List Float -> Float -- sample stddev (n-1 denominator); matches R sd()
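The edge cases noted in the signature comments can be checked against Rust's own float methods. This is a minimal sketch, assuming the builtins delegate to Rust's f64 methods as the article describes for the JIT runtime; the function name `edge_case_values` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: evaluates a few edge cases with Rust's f64 methods.
/// Returns (ln 0.0, ln -1.0, asin 2.0, atan2 0.0 -1.0).
pub fn edge_case_values() -> (f64, f64, f64, f64) {
    (
        0.0_f64.ln(),         // negative infinity
        (-1.0_f64).ln(),      // NaN: logarithm of a negative number
        2.0_f64.asin(),       // NaN: argument outside [-1, 1]
        0.0_f64.atan2(-1.0),  // pi: the upper end of the (-pi, pi] range
    )
}
```

Because NaN never compares equal to itself, checks on these results should use `is_nan()` rather than `==`.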
Two design decisions are worth flagging:
Sample standard deviation, not population. R's sd() uses the Bessel correction (n−1 in the denominator). NumPy's default uses n. We chose Bessel because R is the language LLMs have seen the most statistics code in, and matching sd() reduces hallucination risk. stddev [1.0 2.0 3.0] returns exactly 1.0, the same as R; NumPy would return 0.816.
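To make the difference concrete, here is a small Rust sketch of both denominators. It is an illustration of the arithmetic only, not the Synoema runtime code; the function names are hypothetical and it works on a slice rather than Synoema's linked list.

```rust
/// Arithmetic mean. An empty slice divides zero by zero and yields NaN.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sum of squared deviations from the mean (the second pass).
fn sum_sq_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum()
}

/// Sample standard deviation: Bessel correction, n - 1 in the denominator.
pub fn sample_stddev(xs: &[f64]) -> f64 {
    (sum_sq_dev(xs) / (xs.len() as f64 - 1.0)).sqrt()
}

/// Population standard deviation: n in the denominator.
pub fn population_stddev(xs: &[f64]) -> f64 {
    (sum_sq_dev(xs) / xs.len() as f64).sqrt()
}
```

For [1.0, 2.0, 3.0] the mean is 2.0 and the squared deviations sum to 2.0, so the sample version computes sqrt(2/2) = 1.0 and the population version computes sqrt(2/3), roughly 0.816.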
pi and e as zero-arity builtins. Synoema already supports zero-arity builtins (the readline and chan precedent). When the evaluator sees Value::Builtin(name, 0) in identifier position, it auto-invokes. This means you write 2.0 * pi, not 2.0 * pi (). The runtime returns std::f64::consts::PI bit-for-bit — no precision loss.
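The auto-invoke rule can be sketched with a toy evaluator. Only `Value::Builtin(name, 0)` and `call_builtin` are names taken from the article; the two-variant `Value` enum, the `eval_ident` function, and the HashMap environment are assumptions made to keep the example self-contained, and the real evaluator is certainly richer.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Float(f64),
    // (name, arity), mirroring the Value::Builtin(name, 0) shape in the text
    Builtin(&'static str, usize),
}

/// Toy dispatcher covering just enough builtins for the illustration.
fn call_builtin(name: &str, args: &[Value]) -> Option<Value> {
    match (name, args) {
        ("pi", []) => Some(Value::Float(std::f64::consts::PI)),
        ("e", []) => Some(Value::Float(std::f64::consts::E)),
        ("sin", [Value::Float(x)]) => Some(Value::Float(x.sin())),
        _ => None,
    }
}

/// Hypothetical identifier lookup: a zero-arity builtin is invoked on the
/// spot, anything else is returned as a value to be applied later.
pub fn eval_ident(env: &HashMap<&str, Value>, name: &str) -> Option<Value> {
    match env.get(name)? {
        Value::Builtin(b, 0) => call_builtin(b, &[]),
        other => Some(other.clone()),
    }
}
```

Under this scheme, looking up pi yields a Float directly, while looking up sin still yields the builtin itself, waiting for its argument.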
How It Plugs In
Adding a builtin to Synoema requires touching three crates in lockstep. This is by design: the type system, the interpreter, and the JIT must agree, or the language fragments. The recipe:
1. synoema-types/src/infer.rs
   - register Float -> Float (or zero-arity) signature
2. synoema-eval/src/eval.rs
   - add (name, arity) to builtin_env
   - add "name" => ... arm in call_builtin
3. synoema-codegen/src/runtime.rs
   - #[no_mangle] extern "C" fn synoema_float_NAME(x: i64) -> i64
4. synoema-codegen/src/compiler.rs
   - register symbol in setup_jit_builder
   - declare signature in declare_runtime_functions
For the transcendentals, the JIT runtime is a one-liner that wraps the corresponding Rust f64 method and re-tags the result as a Synoema Float. The aggregation builtins (mean, stddev) walk the tagged-pointer linked list directly: one pass for mean, two passes for stddev (sum, then sum-of-squared-deviations).
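As a rough illustration of what such a one-liner might look like, here is a sketch following the synoema_float_NAME pattern from the recipe. The tagging scheme is an assumption: this version simply treats the i64 as the raw IEEE-754 bits of the f64, and the helpers `tag_float` and `untag_float` are hypothetical names. The real tagged representation may well differ.

```rust
// Hypothetical tagging scheme: the i64 carries the raw IEEE-754 bits of the
// f64. Synoema's actual tagged representation may differ.
fn untag_float(raw: i64) -> f64 {
    f64::from_bits(raw as u64)
}

fn tag_float(value: f64) -> i64 {
    value.to_bits() as i64
}

// The runtime described in the article also marks these #[no_mangle] so the
// JIT can resolve them by symbol name; the attribute is left out of this sketch.
pub extern "C" fn synoema_float_sin(x: i64) -> i64 {
    tag_float(untag_float(x).sin())
}

pub extern "C" fn synoema_float_exp(x: i64) -> i64 {
    tag_float(untag_float(x).exp())
}
```

The i64-in, i64-out signature keeps every runtime function uniform from the JIT's point of view, at the cost of an untag and re-tag on each call.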
What Happened to WASM
The WASM backend gets its own paragraph in the design doc, and that paragraph is just one sentence: it does not currently support Float literals at all. wasm_codegen.rs emits a clear "WASM v2: float literal not supported" diagnostic if you try. Adding 15 Float-only builtins to a backend that rejects Float at the front door would be theatrical. The WASM heap is integer-only by design (Tier-0 IoT targets, such as wasm3 running on an STM32, may not have an FPU); when a Float WASM target arrives, it will be its own change, and the math builtins will be ready.
Testing: 88 New Tests, One Pre-existing JIT Bug
The test plan called for at least three tests per builtin, split across two backends. We ended up with 43 interpreter tests and 45 JIT tests, 88 in total, including three bit-identity tests that round-trip the same expression through both backends and compare f64::to_bits(). All 88 pass.
The bit-identity tests are not theoretical. LLM-generated code may freely switch between sno run and sno jit, and any divergence in transcendental results would silently break programs that compare against expected values. By forcing both backends through the same std::f64 implementation, we get the libm guarantees for free.
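A bit-identity check is stricter than float equality, which is why it compares to_bits(). The sketch below shows the idea; the two closures stand in for the interpreter and JIT backends, and `bit_identical` and `backends_agree` are hypothetical helper names, not functions from the Synoema test suite.

```rust
/// True only when both floats share the exact same 64-bit pattern.
pub fn bit_identical(a: f64, b: f64) -> bool {
    a.to_bits() == b.to_bits()
}

/// Runs every input through two stand-in backends and compares bit patterns.
pub fn backends_agree(
    interp: impl Fn(f64) -> f64,
    jit: impl Fn(f64) -> f64,
    inputs: &[f64],
) -> bool {
    inputs.iter().all(|&x| bit_identical(interp(x), jit(x)))
}
```

Two cases show why `==` is not enough: 0.0 == -0.0 is true although the bit patterns differ, and a NaN never equals itself even when the bit pattern is unchanged.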
One JIT test is marked #[ignore] with an explicit reason: arithmetic on the result of any unary float builtin produces garbage. sqrt 2.0 + sqrt 3.0 returns a corrupted i64 instead of a tagged Float. This is a pre-existing bug, not introduced by the math change — sqrt has been in the language since long before this work. The interpreter computes the right answer; the JIT does not. Documenting this as an ignored test with a clear repro is the honest move; fixing it requires Cranelift IR work that is out of scope for a builtin-addition change.
The Doctest Bug We Found by Accident
The original spec required --- example: doctest lines for each builtin in lang/prelude/Number.sno. We wrote them. We ran synoema test prelude/. Output: 36 doctests collected, all from prelude.sno, none from Number.sno. The 15 examples we'd just added were silently ignored.
The cause turned out to be a forgotten branch in attach_doc:
fn attach_doc(decl: &mut Decl, doc: Vec<String>) {
    if doc.is_empty() { return; }
    match decl {
        Decl::Func { doc: d, .. } => *d = doc,
        Decl::TypeDef { doc: d, .. } => *d = doc,
        Decl::TraitDecl { doc: d, .. } => *d = doc,
        Decl::TypeSig(_) | Decl::ImplDecl { .. } | ... => {}
        // ^ silently dropped
    }
}
The parser was collecting doc comments correctly, but TypeSig — the AST node for signature-only declarations like sin : Float -> Float — had no doc field, so the comments were thrown away. The doctest extractor in synoema-repl only walked Func/TypeDef/TraitDecl, so even if the comments had been preserved, they would not have run.
The fix was a five-line patch: add pub doc: Vec<String> to the TypeSig struct, set it in attach_doc, and add a fourth case to extract_doctests. After the fix, synoema test prelude/ reports 70 doctests — 36 existing plus 19 from Number.sno wrappers plus 15 new math ones. All pass.
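The shape of that patch can be sketched on a cut-down AST. This is a sketch under stated assumptions, not the actual Synoema source: the `Decl` enum here has only two variants, the `name` fields are invented, and the extractor assumes the parser has already stripped the leading `---` so that a doctest line begins with `example:`.

```rust
#[derive(Debug, Default)]
pub struct TypeSig {
    pub name: String,     // hypothetical field, for illustration only
    pub doc: Vec<String>, // the field the fix adds
}

#[derive(Debug)]
pub enum Decl {
    Func { name: String, doc: Vec<String> },
    TypeSig(TypeSig),
}

pub fn attach_doc(decl: &mut Decl, doc: Vec<String>) {
    if doc.is_empty() {
        return;
    }
    match decl {
        Decl::Func { doc: d, .. } => *d = doc,
        // Previously this arm discarded the comments.
        Decl::TypeSig(sig) => sig.doc = doc,
    }
}

/// Collects the expression text of every `example:` line.
pub fn extract_doctests(decls: &[Decl]) -> Vec<String> {
    let mut out = Vec::new();
    for decl in decls {
        let doc = match decl {
            Decl::Func { doc, .. } => doc,
            Decl::TypeSig(sig) => &sig.doc, // the extra case for signatures
        };
        for line in doc {
            if let Some(rest) = line.trim().strip_prefix("example:") {
                out.push(rest.trim().to_string());
            }
        }
    }
    out
}
```

Both halves are needed: without the new arm in attach_doc the comments never reach the AST, and without the new case in the extractor they would be stored but never run.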
And then there was the typo. With doctests on TypeSig finally running, one of the 19 existing examples failed:
--- example: toDecimal (1/4) == 0.25 <-- typo: function is named to_decimal
toFloat : Rational -> Float
toFloat r = to_decimal r
The example referenced a non-existent identifier. It had been wrong since the file was written, and nobody noticed because the doctests were never executed. Fixing it was a one-identifier change, but the bug is a small monument to why tests that don't run are functionally identical to no tests at all.
OpenSpec Process: Three Changes, Each Smaller Than the Last
The math work shipped as three OpenSpec changes, not one:
- math-core-builtins-l1: the 15 builtins, with proposal, design, specs, tasks, and 88 tests. All tasks were marked complete and the change was archived. An honest follow-up audit then found three scenarios from the spec that no test covered.
- math-builtins-l1-spec-coverage: three tests for the uncovered scenarios, namely tan divergence near pi/2, sin 0 rejecting Int, and mean [1 2 3] rejecting List Int. The last two surfaced an interesting subtlety: eval_expr auto-coerces Int to Float for these calls, but synoema_types::typecheck, the entry point that the CLI actually uses, correctly rejects them. The tests use typecheck directly.
- math-builtins-l1-doctest-and-budgets: the parser AST patch for TypeSig docs, the extract_doctests update, the typo fix, and a spec amendment that pulls the documentation token budgets in line with reality. The original spec demanded synoema.md stay under 2000 tokens; the file has been over 7000 since long before this change. We updated the budget to ≤ 7500 with an explicit note that the aspirational ≤ 2000 target is tracked in a separate, future docs-budget-tightening change.
Splitting work this way is awkward at first — it feels like extra ceremony for small fixes — but each subsequent change has a tight, auditable scope. The first change shipped the feature. The second closed a coverage gap that would have otherwise stayed in the "we'll get to it" backlog forever. The third turned a long-broken piece of testing infrastructure into something that works for everyone, not just for math.
What Did Not Ship, and Why
Three things were deliberately left out, with the design doc explaining each:
No vectorized operators. Writing xs + ys as element-wise addition on List Float is what makes R feel like a math language. It also requires implicit operator overloading and breaks Hindley-Milner inference. Synoema's contract with the LLM is that what you read is what you get; + is two-argument numeric addition, not anything else. If you want element-wise: zip_with (+) xs ys in five tokens.
No matrix support. Linear algebra belongs in a skill (lang/skills/math/linalg/), not the core. When matrices arrive, they will probably be List of Lists with FFI wrappers to BLAS for performance — but that is a separate, larger change with its own dependency story.
No probability distributions. R's pnorm, qnorm, rnorm and friends are how R wins the statistics niche. Synoema is not trying to win that niche. If you need it, write the algorithm in Synoema or call out to R via a process boundary; the language doesn't owe you a built-in normal distribution.
The Numbers
| Metric | Before | After |
|---|---|---|
| Math builtins | 5 | 20 |
| Math tests | ~10 | 98 (10 + 88) |
| Total cargo tests | 1644 | 1858 |
| Prelude doctests | 36 | 70 |
| New dependencies | — | 0 |
| BPE-aligned operators | 33 | 33 (unchanged) |
| Lines of new Rust | — | ~340 |
What This Means for Synoema
The math builtins close a visible token-efficiency gap on math-flavored prompts. mean xs is now five tokens in Synoema; it was nine before. sin pi works; it didn't. stddev matches R's sd() exactly, so an LLM that has seen R's documentation will not be surprised by the result.
The doctest fix is more interesting. Every signature-only declaration in the prelude can now carry executable documentation, which means the typo class of bugs ("the example references a function that doesn't exist") gets caught by CI instead of by readers. Number.sno alone now contributes 34 running doctests: 19 that existed but had never executed, plus the 15 new math ones. We expect the rest of the prelude to grow doctests as the next few changes touch those areas.
The OpenSpec process kept the work honest: every claim in the proposals has a corresponding test, every uncovered spec scenario got its own follow-up change, and the unrealistic documentation budgets got an explicit amendment instead of being silently ignored. None of this is novel as software engineering practice. What is unusual is doing it for a language whose primary user is not a human.