LLM Integration
Everything Synoema offers for AI agents and code generation pipelines
Synoema is designed from the ground up for LLM code generation — not as an afterthought. Five complementary layers eliminate the gap between LLM output and running, verified code: a stateful MCP server, a local RAG index, constrained decoding via GBNF, a skills system for structured task specs, and a ReAct agent that auto-fixes broken code in a feedback loop.
On this page
- MCP Server: 27 tools, stateful session, AST cache, auto-inject
- RAG (Retrieval-Augmented Generation): build-index, 5 scopes, offline-first
- Constrained Decoding: GBNF grammar, sno constrain, ident-aware token mask
- Skills System: SKILL.md format, 11 bundled, list/install/activate
- ReAct Agent (sno fix): Thought/Action/Observation, --with-rag, audit log
MCP Server
The Synoema MCP server implements the Model Context Protocol (MCP 2024-11-05) over stdio. Connect it to Claude Desktop, Cursor, Zed, Cline, or any custom agent to get 27 tools for code evaluation, type checking, retrieval, and session management. All of them are backed by a per-connection LRU-500 AST cache, so repeated calls on an unchanged file reuse the cached AST instead of recompiling it.
Why MCP is required for production agents. The stateless CLI (sno run) recompiles from scratch on every call (50–180 ms overhead) with no session state, no retrieval, and no dev intelligence. The MCP server maintains a per-connection ULID session, LRU-500 AST cache, 50-turn transcript window, and all 27 tools across the session lifetime.
Install and connect
# Install MCP binary
sno mcp-install
# Connect to your IDE
sno setup claude --binary # Claude Desktop
sno setup cursor --binary # Cursor
sno setup cline # Cline (VS Code)
Verify: ask your agent "Use the eval tool to compute 2 + 3". If you see 5 : Int, MCP is connected.
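Under the hood, an MCP client talks to the server with newline-delimited JSON-RPC 2.0 messages over stdio: an initialize request, an initialized notification, then tools/call requests. The sketch below builds those messages for the "2 + 3" check. It assumes the eval tool reads its expression from a "code" argument (the argument name is an assumption; the server's tools/list response is authoritative), and the client name is a placeholder.

```python
import json

PROTOCOL_VERSION = "2024-11-05"

def jsonrpc(method, params=None, msg_id=None):
    """Serialize one JSON-RPC 2.0 message as a single line (stdio framing)."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id  # requests carry an id; notifications do not
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

def eval_session(expr):
    """Handshake plus one tools/call request for the eval tool."""
    return [
        jsonrpc("initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # placeholder
        }, msg_id=1),
        jsonrpc("notifications/initialized"),
        # Assumption: eval takes its source in a "code" argument.
        jsonrpc("tools/call", {"name": "eval", "arguments": {"code": expr}}, msg_id=2),
    ]

for line in eval_session("2 + 3"):
    print(line, end="")
```

Writing these lines to the server's stdin and reading one JSON line per response from its stdout is all a minimal client needs.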
Tool categories
Core language tools (3)
eval — single expression with inferred type
typecheck — full program type check
run — execute program, capture stdout
Dev intelligence tools (7)
project_overview, crate_info, file_summary, search_code, get_context_for_edit, doc_query, recipe
RAG retrieval tools (5)
search_corpus, search_docs, search_skills, search_traces, search_unified
Session & package tools
session_info — ULID, cache hit rate, call count
session_history — last N turns (max 50)
search_packages, suggest_packages
get_context, get_state, feedback_loop
Core language tools
| Tool | Input | Output |
|---|---|---|
| eval | Single Synoema expression, e.g. [1..10] \|> sum | Value + inferred type, or structured error JSON |
| typecheck | Full Synoema program (with main) | main : Type or structured error with llm_hint |
| run | Full Synoema program (with main) | stdout output + final value, or error |
Every error response follows a machine-readable schema:
{
"code": "unbound_variable",
"severity": "error",
"message": "Undefined variable: foo",
"span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
"llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
"fixability": "easy",
"did_you_mean": "bar",
"source_origin": "user"
}
source_origin distinguishes user code ("user"), imported modules ("import:<path>"), and prelude errors ("prelude"). Every error carries an llm_hint — a sentence written for the model, not the human.
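Because the schema is fixed, a client can turn any error into a compact repair instruction for the model. A minimal sketch: the field names follow the JSON above, while parse_error and repair_prompt are hypothetical helpers, not part of Synoema. It assumes only user-origin errors should be fed back to the model for the current file.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class SynoemaError:
    code: str
    message: str
    line: int
    col: int
    llm_hint: str
    fixability: str
    did_you_mean: Optional[str]
    source_origin: str

def parse_error(raw: str) -> SynoemaError:
    d = json.loads(raw)
    span = d["span"]
    return SynoemaError(
        code=d["code"], message=d["message"],
        line=span["line"], col=span["col"],
        llm_hint=d["llm_hint"], fixability=d["fixability"],
        did_you_mean=d.get("did_you_mean"),  # optional field
        source_origin=d["source_origin"],
    )

def repair_prompt(err: SynoemaError) -> Optional[str]:
    """One-line repair instruction, or None if the error is outside user code."""
    if err.source_origin != "user":
        # Import and prelude errors cannot be fixed by editing the current file.
        return None
    prompt = f"Line {err.line}, col {err.col}: {err.llm_hint}"
    if err.did_you_mean:
        prompt += f" (suggested: {err.did_you_mean})"
    return prompt
```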
Dev intelligence tools
| Tool | Input | Output (token budget) |
|---|---|---|
| project_overview | — | Crate structure, LOC, test counts (≤300 tok) |
| crate_info | crate_name | Public API: functions, types, structs (≤500 tok) |
| file_summary | file path | Function list with signatures, no bodies (≤300 tok) |
| search_code | query, optional scope | Top-5 keyword matches with context (≤400 tok) |
| get_context_for_edit | file, line | Enclosing function + ±20 lines context (≤500 tok) |
| doc_query | file path | Description, contracts (requires/ensures), examples (≤500 tok) |
| recipe | task description | Step-by-step recipe with current line numbers (≤500 tok) |
Available recipes: add_operator, add_builtin, add_type, fix_from_error. All budgets are ≤500 tokens for compatibility with 8K–32K context models.
Auto-injection for small models
Models with 32B parameters or fewer often cannot reliably emit search_* tool-use actions, but they still benefit from retrieval context. The MCP server can auto-inject a retrieval_context field into responses from typecheck, run, and feedback_loop. This is transparent to the model and requires no protocol changes.
# ~/.sno/config.toml
[rag.auto_inject]
enabled = true
scopes = ["traces", "corpus"]
top_k = 3
max_chunk_chars = 800
When enabled and the RAG pack is installed, error responses become:
{
"error": "Type mismatch: expected Int, found String",
"retrieval_context": {
"hits": [
{"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."},
{"scope": "corpus", "source": "corpus/add.sno", "score": 0.78, "text": "..."}
],
"auto_injected": true
}
}
The middleware silently skips injection when the RAG index is missing. Clients that ignore retrieval_context see no behavioral change.
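The injection logic can be pictured as a small wrapper around each tool response. This is a sketch of the behavior described above, not the server's Rust code: the search callback is hypothetical, the parameters mirror the [rag.auto_inject] keys, and it assumes top_k limits the merged hit list rather than each scope.

```python
from typing import Callable, Optional

def auto_inject(response: dict, query: str,
                search: Optional[Callable[[str, str, int], list]],
                scopes=("traces", "corpus"), top_k=3, max_chunk_chars=800) -> dict:
    """Attach retrieval_context to an error response; pass everything else through."""
    if "error" not in response or search is None:
        # No error, or no RAG index installed: skip silently.
        return response
    hits = []
    for scope in scopes:
        for hit in search(scope, query, top_k):
            # Truncate each chunk so the injected context stays small.
            hits.append({**hit, "scope": scope, "text": hit["text"][:max_chunk_chars]})
    hits.sort(key=lambda h: h["score"], reverse=True)
    return {**response, "retrieval_context": {"hits": hits[:top_k], "auto_injected": True}}
```

Returning the response unchanged on every non-error path is what keeps clients that ignore retrieval_context unaffected.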
Session state
| Capability | Details |
|---|---|
| Per-connection ULID session | Unique ID per MCP connection, persisted in SQLite |
| AstCache LRU-500 | Parsed + type-checked ASTs cached per file, 50–180 ms saved per reuse |
| IncrementalInfer | Type inference state reused across tool calls within the session |
| Transcript VecDeque-50 | Last 50 tool calls with inputs/outputs, queryable via session_history |
| SQLite persistence | Session survives MCP server restart; cache is warm on reconnect |
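The AST cache behaves like a classic LRU map. A minimal sketch under one assumption: entries are keyed by file path plus a hash of the source, so an edited file misses the cache while an unchanged one hits it. The class and compile_fn callback are illustrative stand-ins, not the server's AstCache type.

```python
import hashlib
from collections import OrderedDict

class AstCache:
    """LRU cache keyed by (path, content hash); illustrative stand-in for AstCache."""

    def __init__(self, capacity=500):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compile(self, path, source, compile_fn):
        key = (path, hashlib.sha256(source.encode()).hexdigest())
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        ast = compile_fn(source)  # the expensive parse + type-check step
        self.entries[key] = ast
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return ast
```

A hit rate like the one session_info reports would then be hits / (hits + misses).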
RAG — Retrieval-Augmented Generation
RAG gives LLMs a local knowledge base they can search before generating code. Synoema's RAG stack is Rust-native, offline-first, and opt-in. No Python, no external vector database. It ships as part of the sno CLI and the MCP server.
Install
# Install Synoema with RAG in one command
curl -fsSL https://synoema.tech/install.sh | sh -s -- --with-rag
# Add RAG to an existing install
sno rag install
# Check status
sno rag status
# → Installed: 2026-04-18, 12840 chunks across 5 scopes, ~179 MB
Build a custom index
# Build vector index from your own corpus
sno build-index \
--sources corpus/,docs/,skills/ \
--output ~/.sno/rag-index/ \
--chunk-strategy auto \
--include-scopes corpus,docs,skills
Five retrieval scopes
| Scope | Source | Default k | Typical use |
|---|---|---|---|
| corpus | Fine-tune training data (.sno + ChatML pairs) | 5 | Find idiomatic patterns for a function shape |
| docs | Language reference, guides, API docs | 5 | Answer "how does X work" questions |
| skills | Bundled skills + installed package SKILL.md files | 3 | Discover reusable patterns |
| traces | LLM failure traces with repair examples | 5 | Find how a past error was fixed |
| sno | .sno source files in the repo | 5 | Exact-match on existing definitions |
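A cross-scope query has to reconcile these per-scope defaults. One plausible approach, sketched here as an assumption rather than a description of search_unified: query each scope with its default k, then merge all hits by score. merge_scopes and the search_scope callback are hypothetical.

```python
DEFAULT_K = {"corpus": 5, "docs": 5, "skills": 3, "traces": 5, "sno": 5}

def merge_scopes(query, search_scope, scopes=None, limit=10):
    """Query each scope with its default k, then merge hits by descending score."""
    hits = []
    for scope in scopes or DEFAULT_K:
        for source, score in search_scope(scope, query, DEFAULT_K[scope]):
            hits.append((score, scope, source))
    hits.sort(reverse=True)  # highest score first
    return hits[:limit]
```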
Three retrieval modes
ReAct Agent
sno fix file.sno --with-rag — explicit Thought→Action→Observation loop. Every retrieval call is GBNF-gated and audited.
Auto-Inject
Transparent for small models. When typecheck/run returns an error, the MCP middleware appends retrieval_context automatically.
Raw MCP Tools
Power users and custom agents call search_corpus / search_unified directly via JSON-RPC from any MCP client.
Embedder: potion-base-8M
Model2Vec static embedder: 8M parameters, 256-dim output, ~29 MB. Pure Rust, no ONNX runtime, no network I/O. In RAG-enabled builds the embedder weights are bundled directly into the binary via include_bytes!, so no separate embedder download is needed. Default builds ship a deterministic StubEmbedder instead, so cargo test --all stays hermetic.
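A static embedder needs no neural forward pass at query time: each token maps to a precomputed vector, and the text embedding is their pooled, normalized mean. The toy sketch below shows that idea with a 4-dim table and a whitespace tokenizer, both assumptions for readability; potion-base-8M uses a subword tokenizer and 256-dim vectors.

```python
import math

def embed(text, table, dim):
    """Static embedding: average the per-token vectors, then L2-normalize."""
    vecs = [table[t] for t in text.lower().split() if t in table]
    if not vecs:
        return [0.0] * dim  # no known tokens
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

def cosine(a, b):
    # Both inputs are unit-length (or zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Retrieval then ranks stored chunks by cosine similarity to the query vector, which is cheap enough to run entirely on the CPU.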
Constrained Decoding
Constrained decoding ensures LLMs generate syntactically valid Synoema code by construction — not by post-hoc validation. At each generation step, a token mask is computed from the current parser state: only tokens that could appear next in a valid Synoema program are allowed. Invalid tokens get probability 0 before sampling.
How it works
GBNF grammar
Synoema syntax is expressed as a GBNF grammar (an extension of BNF used by llama.cpp and compatible inference engines). The grammar defines what token sequences are valid at each parser state.
Ident-aware masking
Beyond syntax, the mask is context-aware: only identifiers that are in scope at the current cursor position are allowed. Hallucinated variable names get probability 0.
sno constrain
# Read partial Synoema source from stdin, output JSON token mask
echo 'f x = x +' | sno constrain
# → {"valid_tokens": [42, 137, 2048, ...], "context": "expr", "scope": ["x", "f"]}
The output is a JSON object with the set of valid next-token IDs (cl100k_base vocabulary), the current parser context, and the identifiers in scope. Inference engines consume this mask directly.
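On the inference side, consuming the mask amounts to discarding every logit outside valid_tokens before the softmax, which is how invalid tokens end up with probability exactly 0. A minimal sketch over a toy 4-token vocabulary (real engines apply the same idea over the full vocabulary, usually on tensors):

```python
import math

def masked_softmax(logits, valid_tokens):
    """Assign -inf to every token outside the mask, then softmax over the rest."""
    allowed = set(valid_tokens)
    masked = [x if i in allowed else float("-inf") for i, x in enumerate(logits)]
    m = max(masked)
    if m == float("-inf"):
        raise ValueError("mask allows no token in this vocabulary")
    exps = [math.exp(x - m) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]
```

Sampling from the returned distribution can never pick a masked token, regardless of how large its original logit was.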
IoT Rules GBNF
For the IoT vertical, Synoema ships a specialized GBNF grammar that constrains generation to the IoT rule subset:
-- Example rule generated within the GBNF constraint
rule_fan_control : Int -> Bool
requires temp > -50
ensures result == (temp > 30)
rule_fan_control temp = temp > 30
Grammar: lang/tools/constrained/synoema-iot-rules.gbnf. Used by cloud_compile.py when calling the Anthropic API or a local Ollama model.
Benefits
| Without constraints | With GBNF + ident masking |
|---|---|
| LLM may hallucinate operators not in the language | Only valid BPE-aligned operators allowed |
| LLM may reference undefined variables | Only in-scope identifiers allowed |
| First-pass parse failure rate >30% on small models | Syntactically valid by construction |
| Re-generation required on parse failure | Single pass, compiler sees valid input |
Every Synoema operator is exactly 1 BPE token (cl100k_base). This is not a coincidence — the language was designed with tokenizer alignment as a first-class constraint, making the grammar-to-token-mask translation exact.
Skills System
Skills are structured LLM task specifications in SKILL.md format. A skill tells an LLM exactly how to perform a specific Synoema coding task: what inputs to expect, what output format to produce, what constraints apply, and which tools to call. Skills eliminate prompt engineering from repeated tasks.
SKILL.md format
---
name: add-builtin
version: 1.0.0
description: Add a new builtin function to the Synoema interpreter
inputs:
  - name: function_name
    type: string
  - name: signature
    type: string
outputs:
  - type: patch
    format: unified-diff
tools:
  - recipe(add_builtin)
  - search_code
  - typecheck
---
## Task
Add a new builtin `{{function_name}}` with signature `{{signature}}` to the interpreter.
## Steps
1. Call `recipe(add_builtin)` to get current line numbers
2. ...
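The {{function_name}} and {{signature}} placeholders are filled from the skill's declared inputs. The exact templating rules of activate_skill are not specified here; the sketch below assumes plain {{name}} substitution and fails loudly on a missing input. render_skill_body and the example builtin clamp are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_skill_body(body, inputs):
    """Substitute {{name}} placeholders; raise KeyError on a missing input."""
    def sub(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"skill input not provided: {name}")
        return inputs[name]
    return PLACEHOLDER.sub(sub, body)

print(render_skill_body(
    "Add a new builtin `{{function_name}}` with signature `{{signature}}` to the interpreter.",
    {"function_name": "clamp", "signature": "Int -> Int -> Int -> Int"},
))
```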
Discovery hierarchy
| Scope | Location | Priority |
|---|---|---|
| Workspace | .sno/skills/ in project root | Highest (overrides others) |
| User-global | ~/.sno/skills/ | Middle |
| Bundled | Compiled into synoema-mcp binary | Lowest (always present) |
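Resolution walks the table from highest to lowest priority and stops at the first match. A sketch assuming each installed skill lives at skills/name/SKILL.md (a layout suggested by the install example, but an assumption); resolve_skill and the bundled name set are hypothetical.

```python
from pathlib import Path

def resolve_skill(name, workspace_root, home, bundled):
    """Return (scope, location) for a skill: workspace > user-global > bundled."""
    candidates = [
        ("workspace", Path(workspace_root) / ".sno" / "skills" / name / "SKILL.md"),
        ("user-global", Path(home) / ".sno" / "skills" / name / "SKILL.md"),
    ]
    for scope, path in candidates:
        if path.is_file():
            return scope, path  # first match wins, so higher scopes override
    if name in bundled:
        return "bundled", name  # compiled into the binary, always present
    return None
```

This ordering is what lets a project override a bundled skill simply by dropping a same-named SKILL.md into .sno/skills/.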
11 bundled skills
The MCP binary ships 11 skills covering the most common agent tasks:
Compiler extension
add-builtin, add-operator, add-type — extend the language safely
Debugging
fix-from-error, explain-type-error — diagnose and repair compiler errors
Patterns
concurrency, iot-rule, tls-https, package-create and more — domain-specific task templates
CLI commands
# List all available skills (workspace + user-global + bundled)
sno skill list
# Install a skill from a local file or path
sno skill install ./my-skill/SKILL.md
# Remove a skill
sno skill remove my-skill-name
MCP tools
| Tool | When to use |
|---|---|
| list_skills | Show all available skills with description and input schema |
| activate_skill | Load a skill's full spec into the session context; returns structured task instructions |
When an agent calls activate_skill("add-builtin"), the MCP server returns the full SKILL.md content formatted for the model — inputs resolved, tool list ready, constraints highlighted. The agent then follows the structured task without improvising.
ReAct Agent (sno fix)
The ReAct agent is a Thought→Action→Observation loop that automatically repairs broken Synoema files. It combines the Synoema compiler (the ground-truth check for parse and type errors) with an LLM (which proposes repairs) and optional RAG retrieval (for context). The loop terminates when the file compiles or the max-turns limit is reached.
Basic usage
# Auto-fix a broken .sno file (default limit: 5 turns)
sno fix broken.sno
# Increase turn limit for complex fixes
sno fix broken.sno --max-turns 10
# Enable RAG retrieval (requires MCP server + RAG pack)
sno fix broken.sno --with-rag
# Preview without writing changes
sno fix broken.sno --dry-run
How the loop works
Thought: I see a type error on line 4. Let me check the signature.
Action: typecheck
Arguments: {"code": "..."}
Observation: {"error": "expected Int, found String", "llm_hint": "..."}
Thought: The variable `x` is String but the function expects Int. I'll add a conversion.
Action: apply_patch
Arguments: {"file": "broken.sno", "patch": "--- ...\n+++ ..."}
Observation: {"result": "patch applied"}
Thought: Let me verify the fix compiles.
Action: typecheck
Arguments: {"code": "..."}
Observation: {"type": "main : Unit"} ← success, loop exits
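Stripped of the LLM prompting, the control flow of that transcript is a bounded check-and-patch loop. In this sketch the compiler and the model are abstracted as two hypothetical callables (typecheck returns a dict with "error" on failure; propose_fix returns patched source); it illustrates the loop, not react_agent.py's actual code. Note the final typecheck, which verifies the last patch before giving up.

```python
def react_fix(source, typecheck, propose_fix, max_turns=5):
    """Alternate typecheck (Observation) and repair (Action) until the code compiles."""
    transcript = []
    for _ in range(max_turns):
        result = typecheck(source)              # Action: typecheck
        transcript.append(("typecheck", result))
        if "error" not in result:
            return source, transcript           # success: loop exits
        source = propose_fix(source, result)    # Thought + Action: apply_patch
        transcript.append(("apply_patch", source))
    result = typecheck(source)                  # verify the last patch
    transcript.append(("typecheck", result))
    return (source if "error" not in result else None), transcript
```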
RAG retrieval actions (--with-rag)
When --with-rag is active, four additional actions become available in the loop:
| Action | When the agent calls it |
|---|---|
| search_corpus | Looking for idiomatic patterns for the broken construct |
| search_skills | Checking if a bundled skill covers the task |
| search_docs | Unclear language semantics need reference lookup |
| search_traces | Recognizing a known error pattern with a recorded fix |
The GBNF grammar gates which actions are available: retrieval actions only appear in the mask when --with-rag is set. Without the flag, the agent cannot accidentally call them even if the LLM tries to generate the action name.
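In GBNF terms, gating means the rule for the action name lists only the permitted alternatives, so disallowed names are unreachable during decoding. A sketch that generates such a rule; the core action list is an assumption based on the transcript above, and action_rule is hypothetical.

```python
CORE_ACTIONS = ["typecheck", "run", "apply_patch"]  # assumed core action set
RAG_ACTIONS = ["search_corpus", "search_skills", "search_docs", "search_traces"]

def action_rule(with_rag):
    """Emit a GBNF rule whose alternatives are exactly the permitted action names."""
    actions = CORE_ACTIONS + (RAG_ACTIONS if with_rag else [])
    alternatives = " | ".join(f'"{name}"' for name in actions)
    return f"action ::= {alternatives}"

print(action_rule(False))
```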
Audit log
Every retrieval call is recorded to ~/.sno/audit_retrieval.jsonl:
{"ts": "2026-04-24T10:32:01Z", "session_id": "01HW...", "tool": "search_traces", "query": "exhaustiveness case Nothing", "hits": 3, "top_score": 0.82}
Use this log to understand which retrieval queries succeed and which need corpus expansion.
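Since each line is one JSON object, a short script is enough to find weak queries. A sketch using the fields shown above; the 0.5 "weak" cutoff is an arbitrary assumption, and records with zero hits are assumed to carry a null or missing top_score.

```python
import json
from collections import defaultdict

def summarize_audit(lines, min_score=0.5):
    """Per-tool call count and mean top score, plus queries with weak results."""
    stats = defaultdict(lambda: {"calls": 0, "score_sum": 0.0})
    weak = []
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        score = rec.get("top_score") or 0.0  # null/missing when nothing matched
        s = stats[rec["tool"]]
        s["calls"] += 1
        s["score_sum"] += score
        if rec["hits"] == 0 or score < min_score:
            weak.append(rec["query"])  # candidates for corpus expansion
    summary = {tool: {"calls": s["calls"], "mean_top_score": s["score_sum"] / s["calls"]}
               for tool, s in stats.items()}
    return summary, weak
```

For example, pass an open handle to ~/.sno/audit_retrieval.jsonl as lines; file iteration yields one record per line.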
HITL gate
When the agent proposes a patch, it pauses for human-in-the-loop (HITL) confirmation if the diff touches more than a configurable threshold of lines. The diff is shown in the terminal with a y/n prompt. Set SNO_FIX_HITL_THRESHOLD=0 to disable (fully autonomous mode).
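The gate reduces to counting changed lines in the proposed diff and comparing against the threshold. A sketch assuming a single-file unified diff; the helper names are hypothetical, and the fallback of 20 lines is an illustrative placeholder, not Synoema's documented default.

```python
import os

def changed_lines(diff):
    """Count +/- lines inside hunks of a single-file unified diff."""
    count = 0
    in_hunk = False
    for line in diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # file headers (---/+++) come before the first hunk
        elif in_hunk and line[:1] in ("+", "-"):
            count += 1
    return count

def hitl_threshold(default=20):
    # Assumption: 20 is a placeholder default for illustration only.
    return int(os.environ.get("SNO_FIX_HITL_THRESHOLD", default))

def needs_confirmation(diff, threshold):
    """Threshold 0 disables the gate (fully autonomous mode)."""
    if threshold == 0:
        return False
    return changed_lines(diff) > threshold
```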
Architecture
react_agent.py
Python 3.9+ script (lang/tools/llm/react_agent.py) whose only third-party dependency is the Anthropic SDK. It calls the Synoema CLI for typecheck/run and the MCP server for retrieval when --with-rag is set.
mcp_client.py
Hermetic JSON-RPC MCP stdio client (lang/tools/llm/mcp_client.py). Spawns synoema-mcp as a subprocess, sends tool calls, receives responses. Falls back gracefully with a single warning on stderr if MCP is unavailable.
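The graceful fallback can be as simple as catching the spawn failure and warning once per process. A sketch of that pattern, not mcp_client.py's actual code; spawn_mcp is hypothetical, and the binary name comes from the paragraph above.

```python
import subprocess
import sys

_warned = False

def spawn_mcp(binary="synoema-mcp"):
    """Start the MCP server over stdio, or return None with a single stderr warning."""
    global _warned
    try:
        return subprocess.Popen(
            [binary], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL, text=True,
        )
    except (FileNotFoundError, PermissionError) as exc:
        if not _warned:
            print(f"warning: MCP unavailable ({exc}); continuing without retrieval",
                  file=sys.stderr)
            _warned = True
        return None
```

Callers check for None and skip retrieval actions, so the agent degrades to plain typecheck/run repairs instead of failing.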
Further reading
- MCP, RAG & IoT — full reference including IoT pipeline and wave-2 verticals
- IoT Platform — 3-tier device support, WASM artifacts, 6 verticals
- Language Reference — syntax, types, builtins, contracts
- Skills Automation article — how skills reduce prompt engineering overhead
- Zero to 41% — fine-tuning a 3B model from scratch on Synoema IoT rules
- sno code — AI Agent — terminal-based coding agent with multi-provider profiles, sessions, and worktree isolation