Insights
Research, design decisions, and lessons from building a language for machines
Synoema exists at an unusual intersection: programming language theory, LLM research, and compiler engineering. These articles document what we've learned — including things that didn't work the way we expected. No marketing, no hype. Just the data and the reasoning behind the decisions.
All Articles
Why Build a New Programming Language in the Age of AI?
The paradox at the heart of LLM code generation: AI can write Python fluently, yet most of the code it produces doesn't run. The case for a language designed for machines, not humans.
Explainer · Why AI Writes Broken Code — and How Type Systems Can Fix It
Type errors account for 33.6% of LLM code failures. Hindley-Milner type inference reduces compilation errors by 74.8%. Here's how type-guided generation works in practice.
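To give a flavor of what that article covers, here is a minimal Python sketch of unification-based type inference, the core mechanism behind Hindley-Milner. Everything here is illustrative: the names (`TVar`, `resolve`, `unify`) are invented for this sketch and are not Synoema's actual implementation, and the occurs check and full type terms are omitted.

```python
# Minimal sketch of unification-style type inference: a type checker can
# reject ill-typed code before any of it runs. Illustrative only — not
# Synoema's real inference engine (no occurs check, atomic types only).

class TVar:
    """A type variable that unification may bind to a concrete type."""
    def __init__(self, name):
        self.name = name
        self.bound = None  # filled in by unify()

def resolve(t):
    # Follow chains of bound type variables to the representative type.
    while isinstance(t, TVar) and t.bound is not None:
        t = t.bound
    return t

def unify(a, b):
    """Make two types equal, or raise TypeError if they conflict."""
    a, b = resolve(a), resolve(b)
    if isinstance(a, TVar):
        a.bound = b
    elif isinstance(b, TVar):
        b.bound = a
    elif a != b:
        raise TypeError(f"cannot unify {a} with {b}")

# Applying an Int -> Int function forces its argument's type:
arg = TVar("a")
unify(arg, "Int")               # inference binds a := Int
assert resolve(arg) == "Int"

# Applying the same value where a Str is required is caught statically:
try:
    unify(resolve(arg), "Str")
except TypeError as e:
    print("rejected:", e)       # mismatch found before any code runs
```

Type-guided generation uses exactly this kind of signal: a unification failure tells the model precisely which expression conflicts with which expected type.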
Explainer · Introducing Synoema: A Language Machines Can Verify
A tour of the language: pattern matching, algebraic data types, verification contracts, Hindley-Milner inference, and the Cranelift JIT backend. With code examples throughout.
Results · What We Learned Teaching AI a New Language
We ran 10+ LLMs on 9 standard tasks and a 50-task corpus. H1 was disproved. H2 was confirmed with ρ=1.00 Spearman correlation. Here's what the data actually showed.
Explainer · How a Compiler Catches AI Mistakes Before They Run
Three verification layers: GBNF grammar for syntax, Hindley-Milner types for semantics, contracts for runtime behavior. And error messages designed for machine consumption, not humans.
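The layering idea can be sketched in a few lines of Python, using stand-ins for each layer: `ast.parse` for the syntax check, a toy type check, and an assert-style postcondition for the contract. Synoema's real layers are GBNF grammar, Hindley-Milner types, and verification contracts; this sketch only illustrates the principle of failing as early as possible, and all function names are hypothetical.

```python
# Three-layer verification sketch (Python analogy, not Synoema's pipeline):
# layer 1 rejects malformed text, layer 2 rejects ill-typed values,
# layer 3 checks runtime behavior against a contract.

import ast

def check_syntax(src: str) -> None:
    ast.parse(src)  # raises SyntaxError on malformed input

def check_types(value, expected: type) -> None:
    if not isinstance(value, expected):
        raise TypeError(f"expected {expected.__name__}, "
                        f"got {type(value).__name__}")

def with_contract(f, post):
    # Wrap f so its result must satisfy the postcondition at runtime.
    def wrapped(*args):
        result = f(*args)
        assert post(result), "contract violated"
        return result
    return wrapped

# Layer 1: syntax — caught before anything executes.
try:
    check_syntax("def f(: pass")
except SyntaxError:
    print("layer 1: syntax error caught")

# Layer 2: types — caught before the value is used.
try:
    check_types("not a number", int)
except TypeError:
    print("layer 2: type error caught")

# Layer 3: contracts — checked at runtime, at the call site.
safe_div = with_contract(lambda a, b: a / b, post=lambda r: r >= 0)
print(safe_div(6, 3))
```

Each layer catches a class of mistakes the previous one cannot see, which is why errors that survive all three are so much rarer than errors caught by any single check.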
Results · From Zero to 41%: Building an AI That Writes Working Code
The full journey: language design, corpus generation (5,037 validated programs), QLoRA fine-tuning on AMD hardware, and where 59% of attempts still fail. An honest account.
Explainer · How We Automated Our Entire Dev Workflow with Claude Code Skills
A directory-based skills system gives us autonomous pipelines, parallel execution, and zero-config automation. Here's how it works and what it enables.
Research · The Scientific Method Behind Synoema
12 falsifiable hypotheses. Statistical methodology with Bonferroni correction and Cohen's h effect sizes. A corpus validated by the compiler itself. How we try to do this rigorously.
Results · Intermediate Results: v6 Fine-Tuning
A 90.5% run rate on the 7B model — up from the 41% baseline. An honest look at the constructs regression (44.6% vs 52.7% in v5) and what we're doing about it, plus the ChatML format change: what it improved and what it broke.
About This Series
These articles were written for several audiences simultaneously: researchers evaluating the project's scientific rigor, developers curious about the technical decisions, and anyone who has wondered why AI-generated code fails so often. We've tried to make the technical content accessible without sacrificing precision.
All experimental data referenced in these articles is publicly available in the benchmarks/results/ and docs/research/ directories of the repository. If you find an error or want to replicate a result, the scripts are in benchmarks/scripts/.