About Synoema
The language of shared understanding
What is Synoema?
Synoema [sy-NO-e-ma] — from Greek synoema (συνόημα), "shared understanding". A BPE-aligned programming language purpose-built for LLM code generation.
The core insight: LLMs generate code token by token. Every token costs money, time, and context window. If the programming language itself is designed to align with how LLMs tokenize text, you get cheaper generation, faster inference, and more correct output — by construction, not by accident.
Synoema is not a general-purpose language competing with Python or Rust. It occupies a new niche: the language where AI writes code. It's optimized for the machine that generates it, not the human who reads it (though it's readable too).
Design Philosophy
Token Economy First
Every syntax decision is measured in BPE tokens. ? cond -> x : y costs 3 tokens; if cond then x else y costs 6. We chose the 3-token form. Every operator is verified against the cl100k_base, Llama 3, and Mistral tokenizers.
Correctness by Construction
Hindley-Milner type inference catches errors without annotations. GBNF grammar constrains LLM output to be 100% syntactically valid. Verification contracts (requires/ensures) guard runtime behavior.
Minimal Dependencies
The entire compiler has 2 dependencies: Cranelift (JIT) and pretty_assertions (tests). No tokio, no serde, no async. ~33K LOC of Rust, 9 crates. You can read the entire codebase in a day.
Immutable & Strict
All bindings are immutable. Evaluation is eager, left-to-right. No lazy evaluation surprises. Predictable for both humans and LLMs.
Equations, Not Statements
No def, no return, no semicolons between statements. Functions are equations: f x = body. The last expression is the result. Pattern matching via multiple equations.
Dual Backend
Interpreter for development (full I/O, networking, concurrency). Cranelift JIT for production speed (3x over Python). Same language, same semantics, choose your tradeoff.
Core Strengths
1. Token Efficiency — 15% Fewer Tokens Than Python
Measured with tiktoken (cl100k_base) on 16 benchmark tasks. Synoema consistently uses fewer tokens for the same algorithm:
| Task | Synoema | Python | Saving |
|---|---|---|---|
| json_build | 32 | 67 | 52% |
| pattern_match | 136 | 225 | 40% |
| quicksort | 77 | 124 | 38% |
| mergesort | 117 | 179 | 35% |
| gcd | 26 | 35 | 26% |
| fibonacci | 38 | 49 | 22% |
| factorial | 25 | 32 | 22% |
| fizzbuzz | 59 | 63 | 6% |
| Average (16 tasks) | 15% | ||
Where Synoema wins big: recursive algorithms, pattern matching, data structure operations, JSON building.
Where Python wins: string operations, imperative-style code, matrix math.
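The Saving column is straightforward arithmetic over the raw token counts. A quick check in plain Python (illustrative only, not part of the Synoema toolchain) reproduces it:

```python
def saving(synoema_tokens: int, python_tokens: int) -> int:
    """Percentage of tokens saved relative to the Python baseline."""
    return round(100 * (python_tokens - synoema_tokens) / python_tokens)

# Raw counts from the benchmark table above.
print(saving(32, 67))   # json_build → 52
print(saving(77, 124))  # quicksort  → 38
print(saving(26, 35))   # gcd        → 26
```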
2. Native Speed — 3x Median Over Python
JIT-compiled via Cranelift to native x86-64. Benchmarked against CPython 3.12, median of 5 runs:
| Benchmark | Python | Synoema JIT | Speedup |
|---|---|---|---|
| fibonacci | 144ms | 5.1ms | 28.2x |
| factorial | 24ms | 5.7ms | 4.2x |
| gcd | 17ms | 4.7ms | 3.5x |
| collatz | 18ms | 5.6ms | 3.1x |
| quicksort | 17ms | 6.2ms | 2.7x |
| matrix_mult | 16ms | 7.7ms | 2.1x |
| Median (12 tasks) | 3.0x | ||
Fibonacci shows 28x thanks to tail-call optimization. Typical sustainable speedup: 2–4x.
3. Guaranteed Syntactic Correctness
The GBNF grammar (162 lines, 48 rules) enables constrained decoding: the LLM can only generate tokens that form valid Synoema syntax. Works with llama.cpp, vLLM, SGLang, TensorRT-LLM via XGrammar (100x speedup over naive approaches).
Result: 100% of LLM-generated programs parse successfully. Compare with unconstrained generation where 24% of GitHub Copilot suggestions contain compilation errors (Nguyen & Nadi, 2022).
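Constrained decoding itself lives inside the inference engines listed above, but the core mechanism is simple: before sampling each token, mask out every candidate the grammar forbids. A toy Python sketch of that idea (the vocabulary and validity set here are hypothetical, not the real GBNF machinery):

```python
import math

def mask_logits(logits, vocab, is_valid_next):
    """Set the logit of every grammar-invalid token to -inf so the
    sampler can never pick it — the essence of constrained decoding."""
    return [logit if is_valid_next(token) else -math.inf
            for token, logit in zip(vocab, logits)]

# Toy grammar state: suppose only "x" and "?" may start the next expression.
vocab = ["x", "=", "?", "->"]
valid = {"x", "?"}  # hypothetical valid-next set derived from the grammar
masked = mask_logits([1.0, 2.0, 3.0, 0.5], vocab, valid.__contains__)
print(masked)  # [1.0, -inf, 3.0, -inf]
```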
4. Type-Guided Generation
Hindley-Milner type inference acts as a semantic constraint on LLM output. Research shows type-constrained decoding reduces compilation errors by 74.8% vs only 9.0% for syntax-only constraints (Mundler et al., PLDI 2025). Synoema's type system is not just for correctness — it's a generation quality multiplier.
Scientific Foundation
Synoema's design is grounded in 23 peer-reviewed publications. Key findings that shaped the language:
| Finding | Source | Impact on Design |
|---|---|---|
| LLM inference consumes >90% of total AI energy | TokenPowerBench, 2024 | Token efficiency = direct cost/energy reduction |
| Attention cost is O(n²) — halving tokens = 4x less compute | Vaswani et al., 2017 | 15% fewer tokens → ~28% less attention compute |
| Token efficiency varies 2.6x across languages | Alderson, 2026 | Language design can significantly impact token count |
| Type errors = 33.6% of LLM code failures | Tambon et al., 2025 | Hindley-Milner inference eliminates the dominant error class |
| Type constraints reduce errors by 74.8% | Mundler et al., PLDI 2025 | Type system as generation constraint, not just verification |
| Bridge tokens distort LLM distributions | Domino, ICML 2024 | All operators = 1 BPE token → no bridge token distortion |
| XGrammar: 100x speedup for grammar-constrained decoding | Dong et al., 2024 | GBNF grammar designed for efficient constrained decoding |
| LLM quality degrades with sequence length | Multiple sources | Fewer tokens = less context rot = better output quality |
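The attention-compute figure in the table follows directly from the quadratic cost: shortening the sequence by 15% shrinks attention compute by the square of that ratio. A one-line worked check:

```python
# Self-attention cost scales as O(n^2) in sequence length, so a 15%
# token reduction compounds quadratically.
token_ratio = 1 - 0.15             # 85% of the original sequence length
compute_ratio = token_ratio ** 2   # ~0.7225 of the original attention compute
print(f"{1 - compute_ratio:.0%} less attention compute")  # 28% less
```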
Synoema vs Other Languages
| Feature | Synoema | Python | Haskell | Rust | TypeScript |
|---|---|---|---|---|---|
| Token efficiency | Best (-15%) | Baseline | Similar | Verbose | Verbose |
| Type inference | Full (HM) | None | Full (HM) | Partial | Partial |
| Pattern matching | Full ADTs | Limited | Full | Full | None |
| Constrained decoding | GBNF | No | No | No | No |
| JIT compilation | Cranelift | No (CPython) | GHC | LLVM | V8 |
| LLM toolchain | MCP + GBNF | None | None | None | None |
| Learning curve | Medium | Easy | Hard | Hard | Easy |
| Ecosystem | Small | Huge | Medium | Large | Huge |
| Evaluation | Strict | Strict | Lazy | Strict | Strict |
| Immutability | Default | No | Default | Default | No |
Key insight: Synoema doesn't try to replace Python or Rust for human-written code. It's designed for the specific scenario where an LLM generates code — and in that scenario, token efficiency, type safety, and constrained decoding matter more than ecosystem size.
Maximum Impact Areas
Synoema gives nonlinear advantage where machines generate code and correctness is critical. The key insight: Synoema is not competing with Python for human developers — it's the language where AI writes code that other machines verify and execute.
Impact Matrix
| | Correctness: nice-to-have | Correctness: critical |
|---|---|---|
| Machine generates code | Edge AI microtools, on-device code gen, IoT rules | Verified microservices, financial logic, executable specs, agent orchestration |
| Human writes code | Python/JS are better choices | Executable specifications, formal contracts |
Maximum effect = machine generates, correctness critical. This is where all three verification layers (GBNF syntax + HM types + contracts) work simultaneously.
Six High-Impact Domains
1. LLM-Generated Microservices
User describes business logic in natural language. LLM generates a Synoema service. GBNF guarantees syntax. Types guarantee correctness. Contracts guard business invariants. The service runs immediately via TCP/HTTP builtins. Human never reads the code — code is an artifact between two machines.
2. Formally Verified AI Code
Today: LLM generates Python, human reviews, hopes it works. With Synoema: GBNF ensures syntax, HM types catch type errors (33.6% of all LLM failures), contracts enforce requires/ensures. Three layers of verification, zero human review needed for correctness. Critical for financial calculations, medical algorithms, regulatory compliance.
3. Edge AI / Small Models
Device: Raspberry Pi, phone, IoT. Model: 4B-7B parameters. Context: 2K-4K tokens. Synoema's compact reference (900 tokens) fits the context. GBNF eliminates syntax errors (a 40% failure rate on small models). 15% fewer tokens is critical when context is limited, and the JIT gives 3x speed on constrained hardware.
4. LLM Self-Improvement Loops
LLM generates code → type checker finds errors → structured JSON error with llm_hint → LLM fixes using the hint → repeat. Each iteration uses ~15% fewer tokens than the equivalent Python pipeline. Synoema's --errors json output, with fixability scores and did_you_mean suggestions, is designed for machine consumption, not humans. No other language has error messages optimized for LLMs.
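The repair step of this loop can be sketched in Python. The field names llm_hint, fixability, and did_you_mean come from the description above, but the exact payload shape and the example values are assumed stand-ins, not the real --errors json format:

```python
import json

def build_fix_prompt(source: str, error_json: str) -> str:
    """Turn a structured type-checker error into a repair prompt for the LLM.
    The payload shape here is an assumed example, not the exact format."""
    err = json.loads(error_json)
    prompt = f"The program failed to type-check.\nHint: {err.get('llm_hint', '')}\n"
    if err.get("did_you_mean"):
        prompt += f"Did you mean: {err['did_you_mean']}?\n"
    return prompt + f"Fix this code:\n{source}"

# Hypothetical error payload in the spirit of `synoema --errors json`.
error = json.dumps({"llm_hint": "argument 2 of map must be a List",
                    "did_you_mean": "map f xs",
                    "fixability": 0.9})
print(build_fix_prompt("map f 3", error))
```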
5. Executable Specifications
"Discount 10% for orders over 1000, max 500" becomes: discount : Int -> Int with requires total > 0, ensures result <= 500. The specification IS the code. Contracts are checked at runtime. synoema doc --contracts generates the spec table. Specification never diverges from implementation because they're the same thing.
6. AI Agent Orchestration
Multiple AI agents exchange Synoema programs instead of JSON or natural language. Programs are formally typed, verified by contracts, and executable. MCP server enables eval/typecheck/run in real-time. Synoema as lingua franca between AI agents — a shared language both machines understand with formal guarantees.
When NOT to Use Synoema
Critical thinking requires honesty. Synoema loses wherever a human writes code or the ecosystem matters more than correctness:
| Domain | Why not Synoema | Better choice |
|---|---|---|
| Web frontends | No DOM, no browser API | TypeScript/JavaScript |
| Data science | No numpy/pandas ecosystem | Python |
| Systems programming | No ownership, no unsafe | Rust |
| Enterprise backend | Small ecosystem, no ORM | Java/Go |
| Mobile apps | No SDK | Swift/Kotlin |
| String-heavy tasks | Python is 87% more token-efficient | Python |
| Existing codebase | Migration cost not justified | Whatever's there |
| Human writes code manually | Python is simpler and more familiar | Python |
The pattern: Synoema wins where machines generate code and correctness is critical. Synoema loses where humans write code or ecosystem size matters.
Architecture
~33,000 lines of Rust across 9 workspace crates:
| Crate | Purpose | LOC |
|---|---|---|
| synoema-lexer | Tokenization, offside rule (indentation → INDENT/DEDENT) | ~1,050 |
| synoema-parser | Pratt parser, 15 expression kinds, AST | ~3,260 |
| synoema-types | Hindley-Milner inference, row polymorphism, contracts | ~3,450 |
| synoema-core | Core IR (System F), constant folding, dead code elimination | ~2,320 |
| synoema-eval | Tree-walking interpreter, all builtins, I/O | ~5,300 |
| synoema-codegen | Cranelift JIT compiler, tagged pointer ABI, arena memory | ~5,870 |
| synoema-diagnostic | Structured errors with LLM hints, fixability scores | ~840 |
| synoema-lsp | LSP server (hover, go-to-def, diagnostics, completion) | ~620 |
| synoema-repl | CLI: run, jit, eval, test, doc, watch, init, install | ~2,670 |
Key architectural decisions:
- Tagged pointer ABI — all values fit in i64 with bit-level type tags (bit 0 = list, bit 1 = string). Zero boxing overhead for small values.
- Arena memory — 8MB bump allocator with region stack. Auto-reset in tail-recursive loops. No GC pauses.
- 2 dependencies only — Cranelift for JIT, pretty_assertions for tests. No runtime dependencies beyond std.
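The tagged-pointer layout can be illustrated with a small sketch. The two tag bits follow the description above (bit 0 = list, bit 1 = string); the helper names and the example address are simplified illustrations, not the real ABI:

```python
LIST_BIT = 0b01    # bit 0 set → value is a list pointer
STRING_BIT = 0b10  # bit 1 set → value is a string pointer

def tag_list(ptr: int) -> int:
    # Heap pointers are aligned, so the low bits are free to carry the tag.
    return ptr | LIST_BIT

def is_list(value: int) -> bool:
    return bool(value & LIST_BIT)

def untag(value: int) -> int:
    # Clear both tag bits to recover the raw aligned pointer.
    return value & ~0b11

addr = 0x1000            # a hypothetical 8-byte-aligned address
tagged = tag_list(addr)
print(is_list(tagged), hex(untag(tagged)))  # True 0x1000
```

Because the tag lives in otherwise-unused alignment bits, small values never need a heap box, which is what keeps every value in a single i64.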
Ecosystem
MCP Server
npx synoema-mcp — instant integration with Claude Desktop, Cursor, Zed. Tools: eval, typecheck, run, constrain (token masking).
VS Code Extension
Syntax highlighting, run/JIT keybindings (Cmd+Shift+R/J), eval selection. LSP server for hover types, go-to-def, diagnostics.
GBNF Grammar
162 lines, 48 rules. Works with llama.cpp, vLLM, SGLang, TensorRT-LLM. Ensures 100% syntactically valid LLM output.
Fine-Tuning Corpus
5,037 verified examples (99.9% pass rate) in JSONL format. Covers algorithms, data structures, pattern matching, error handling, I/O, and more.
Benchmark Suite
30 tasks across 5 languages. Automated token counting (tiktoken) and runtime measurement. Reproducible via Python scripts.
1,217 Tests
Unit tests, stress tests (fib(35), 100K tokens, deep nesting), corpus validation, adversarial edge cases. 0 failures, 0 warnings.
Roadmap
| Phase | Status | Description |
|---|---|---|
| Working Language | Done | Lexer, parser, type system, interpreter, REPL |
| Working Compiler | Done | Core IR, Cranelift JIT, tagged pointer ABI, arena memory |
| LLM-Native | Done | GBNF grammar, MCP server, constrained decoding, LLM error feedback |
| Production | 90% | Region inference, contract docs, benchmark suite, small model templates |
| Community | Next | Package manager, WASM playground, expanded corpus, documentation |
Current version: 0.1.0-alpha.3 (alpha — syntax and APIs may change)
Get Involved
Synoema is an open research project. Contributions welcome:
- Try it — playground, getting started guide
- Read the source — ~33K LOC, 2 dependencies, readable in a day
- Report issues — GitHub
- Contribute examples — expand the corpus, write benchmarks
- Research — token efficiency, constrained decoding, type-guided generation