LLM Integration

Everything Synoema offers for AI agents and code generation pipelines

Synoema is designed from the ground up for LLM code generation — not as an afterthought. Five complementary layers eliminate the gap between LLM output and running, verified code: a stateful MCP server, a local RAG index, constrained decoding via GBNF, a skills system for structured task specs, and a ReAct agent that auto-fixes broken code in a feedback loop.

MCP 2024-11-05 · 27 tools · RAG: 5 scopes · potion-base-8M · GBNF constrained · 11 bundled skills · ReAct loop

MCP Server — 27 tools, stateful session, AST cache, auto-inject
RAG — Retrieval-Augmented Generation — build-index, 5 scopes, offline-first
Constrained Decoding — GBNF grammar, sno constrain, ident-aware token mask
Skills System — SKILL.md format, 11 bundled, list/install/activate
ReAct Agent (sno fix) — Thought/Action/Observation, --with-rag, audit log

MCP Server

The Synoema MCP server implements the Model Context Protocol (MCP 2024-11-05) over stdio. Connect it to Claude Desktop, Cursor, Zed, Cline, or any custom agent to get 27 tools for code evaluation, type checking, retrieval, and session management. All of them are backed by a per-connection LRU-500 AST cache, so a file that is still in the cache is not recompiled.

Why MCP is required for production agents. The stateless CLI (sno run) recompiles from scratch on every call (50–180 ms overhead) with no session state, no retrieval, and no dev intelligence. The MCP server maintains a per-connection ULID session, LRU-500 AST cache, 50-turn transcript window, and all 27 tools across the session lifetime.

Install and connect

# Install MCP binary
sno mcp-install

# Connect to your IDE
sno setup claude --binary    # Claude Desktop
sno setup cursor --binary    # Cursor
sno setup cline              # Cline (VS Code)

Verify: ask your agent "Use the eval tool to compute 2 + 3". If you see 5 : Int, MCP is connected.
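
For a custom agent that is not covered by sno setup, the same check can be driven by hand. The sketch below only builds the JSON-RPC 2.0 messages an MCP client sends over stdio. The initialize, notifications/initialized, and tools/call method names come from the MCP specification; the "code" argument key for eval and the clientInfo values are assumptions made for illustration, not confirmed Synoema API.

```python
import json

PROTOCOL_VERSION = "2024-11-05"  # MCP revision named in this document


def initialize_request(request_id: int) -> dict:
    """First message of an MCP session: the client announces itself."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            # Hypothetical client name and version, for illustration only.
            "clientInfo": {"name": "example-agent", "version": "0.1.0"},
        },
    }


def initialized_notification() -> dict:
    """Sent once the server has answered `initialize`; notifications carry no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Invoke a named tool through the standard `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def frame(message: dict) -> bytes:
    """stdio transport: one JSON object per line, newline-terminated."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# ASSUMPTION: the eval tool reads its expression from the key "code".
messages = [
    initialize_request(1),
    initialized_notification(),
    tool_call(2, "eval", {"code": "2 + 3"}),
]
wire = b"".join(frame(m) for m in messages)
```

Writing wire to the stdin of a spawned synoema-mcp process and reading one JSON line per request from its stdout would complete the round trip.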

Tool categories

Core language tools (3)

eval — single expression with inferred type
typecheck — full program type check
run — execute program, capture stdout

Dev intelligence tools (7)

project_overview, crate_info, file_summary, search_code, get_context_for_edit, doc_query, recipe

RAG retrieval tools (5)

search_corpus, search_docs, search_skills, search_traces, search_unified

Session & package tools

session_info — ULID, cache hit rate, call count
session_history — last N turns (max 50)
search_packages, suggest_packages
get_context, get_state, feedback_loop

Core language tools

Tool	Input	Output
eval	Single Synoema expression, e.g. `[1..10] |> sum`	Value + inferred type, or structured error JSON
typecheck	Full Synoema program (with `main`)	`main : Type` or structured error with `llm_hint`
run	Full Synoema program (with `main`)	stdout output + final value, or error

Every error response follows a machine-readable schema:

{
  "code": "unbound_variable",
  "severity": "error",
  "message": "Undefined variable: foo",
  "span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}

source_origin distinguishes user code ("user"), imported modules ("import:<path>"), and prelude errors ("prelude"). Every error carries an llm_hint — a sentence written for the model, not the human.
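
A client can load this schema into a typed record before deciding what to show the model. The sketch below is one possible way to do that, assuming the field set shown in the example above; treating did_you_mean as optional is an assumption, and the helper names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    line: int
    col: int
    end_line: int
    end_col: int


@dataclass
class CompilerError:
    code: str
    severity: str
    message: str
    span: Span
    llm_hint: str
    fixability: str
    source_origin: str
    # ASSUMPTION: errors without a suggestion omit this field.
    did_you_mean: Optional[str] = None


def parse_error(raw: str) -> CompilerError:
    """Decode one structured error response into a CompilerError."""
    data = json.loads(raw)
    return CompilerError(
        code=data["code"],
        severity=data["severity"],
        message=data["message"],
        span=Span(**data["span"]),
        llm_hint=data["llm_hint"],
        fixability=data["fixability"],
        source_origin=data["source_origin"],
        did_you_mean=data.get("did_you_mean"),
    )


def origin_kind(err: CompilerError) -> str:
    """Collapse source_origin to 'user', 'import', or 'prelude'."""
    if err.source_origin.startswith("import:"):
        return "import"
    return err.source_origin


def repair_prompt(err: CompilerError) -> str:
    """One line for the model: error code, location, and the llm_hint."""
    return f"[{err.code}] line {err.span.line}, col {err.span.col}: {err.llm_hint}"
```

Feeding the model llm_hint rather than message follows the document's distinction between text written for the model and text written for the human.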

Dev intelligence tools

Tool	Input	Output (token budget)
project_overview	—	Crate structure, LOC, test counts (≤300 tok)
crate_info	`crate_name`	Public API: functions, types, structs (≤500 tok)
file_summary	`file` path	Function list with signatures, no bodies (≤300 tok)
search_code	`query`, optional `scope`	Top-5 keyword matches with context (≤400 tok)
get_context_for_edit	`file`, `line`	Enclosing function + ±20 lines context (≤500 tok)
doc_query	`file` path	Description, contracts (`requires`/`ensures`), examples (≤500 tok)
recipe	`task` description	Step-by-step recipe with current line numbers (≤500 tok)

Available recipes: add_operator, add_builtin, add_type, fix_from_error. All budgets are ≤500 tokens for compatibility with 8K–32K context models.

Auto-injection for small models

Models with 32B parameters or fewer often cannot reliably emit search_* tool-use actions, but still benefit from retrieval context. The MCP server can auto-inject a retrieval_context field into responses from typecheck, run, and feedback_loop. This is transparent to the model and needs no protocol changes.

# ~/.sno/config.toml
[rag.auto_inject]
enabled = true
scopes = ["traces", "corpus"]
top_k = 3
max_chunk_chars = 800

When enabled and the RAG pack is installed, error responses become:

{
  "error": "Type mismatch: expected Int, found String",
  "retrieval_context": {
    "hits": [
      {"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."},
      {"scope": "corpus", "source": "corpus/add.sno",  "score": 0.78, "text": "..."}
    ],
    "auto_injected": true
  }
}

The middleware silently skips injection when the RAG index is missing. Clients that ignore retrieval_context see no behavioral change.
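
On the client side, handling the field can stay just as tolerant. The following sketch assumes the response shape shown above and turns the hits into prompt snippets; the function name and the output format of each snippet are hypothetical.

```python
def retrieval_snippets(response: dict, max_chunk_chars: int = 800) -> list:
    """Return prompt-ready snippets, best score first.

    Yields an empty list when the server did not inject anything, so callers
    need no special case for a missing RAG index.
    """
    context = response.get("retrieval_context")
    if not context:
        return []
    hits = sorted(context.get("hits", []), key=lambda h: h["score"], reverse=True)
    return [
        f'[{hit["scope"]}] {hit["source"]}: {hit["text"][:max_chunk_chars]}'
        for hit in hits
    ]
```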

Session state

Capability	Details
Per-connection ULID session	Unique ID per MCP connection, persisted in SQLite
AstCache LRU-500	Parsed + type-checked ASTs cached per file, 50–180 ms saved per reuse
IncrementalInfer	Type inference state reused across tool calls within the session
Transcript VecDeque-50	Last 50 tool calls with inputs/outputs, queryable via `session_history`
SQLite persistence	Session survives MCP server restart; cache is warm on reconnect
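
The two bounded structures in the table can be pictured with a small Python analogue. This is a sketch of the general LRU and ring-buffer techniques, not the server's actual implementation; the cache key and the stored values are left abstract and the class name is hypothetical.

```python
from collections import OrderedDict, deque


class LruCache:
    """Least-recently-used cache with a fixed capacity (500 in the table above)."""

    def __init__(self, capacity: int = 500):
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


# A deque with maxlen drops the oldest item automatically,
# which mirrors a 50-entry transcript window.
transcript = deque(maxlen=50)
for turn in range(60):
    transcript.append({"turn": turn})
```

The hits and misses counters are the kind of data a cache hit rate, as reported by session_info, could be derived from.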

RAG — Retrieval-Augmented Generation

RAG gives LLMs a local knowledge base they can search before generating code. Synoema's RAG stack is Rust-native, offline-first, and opt-in. No Python, no external vector database. It ships as part of the sno CLI and the MCP server.

Install

# Install Synoema with RAG in one command
curl -fsSL https://synoema.tech/install.sh | sh -s -- --with-rag

# Add RAG to an existing install
sno rag install

# Check status
sno rag status
# → Installed: 2026-04-18, 12840 chunks across 5 scopes, ~179 MB

Build a custom index

# Build vector index from your own corpus
sno build-index \
  --sources corpus/,docs/,skills/ \
  --output ~/.sno/rag-index/ \
  --chunk-strategy auto \
  --include-scopes corpus,docs,skills

Five retrieval scopes

Scope	Source	Default k	Typical use
corpus	Fine-tune training data (.sno + ChatML pairs)	5	Find idiomatic patterns for a function shape
docs	Language reference, guides, API docs	5	Answer "how does X work" questions
skills	Bundled skills + installed package SKILL.md files	3	Discover reusable patterns
traces	LLM failure traces with repair examples	5	Find how a past error was fixed
sno	.sno source files in the repo	5	Exact-match on existing definitions

Three retrieval modes

ReAct Agent

sno fix file.sno --with-rag — explicit Thought→Action→Observation loop. Every retrieval call is GBNF-gated and audited.

Auto-Inject

Transparent for small models. When typecheck/run returns an error, the MCP middleware appends retrieval_context automatically.

Raw MCP Tools

Power users and custom agents call search_corpus / search_unified directly via JSON-RPC from any MCP client.

Embedder: potion-base-8M

Model2Vec static embedder — 8M parameters, 256-dim output, ~29 MB. Pure Rust, no ONNX runtime, no network I/O. Bundled directly into the binary via include_bytes! so RAG works out-of-the-box without any install step. Default builds ship a deterministic StubEmbedder so cargo test --all stays hermetic.
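
A static embedder of this kind is generally understood as a lookup table of token vectors that are averaged into one vector per text, with retrieval done by cosine similarity. The sketch below illustrates that idea only: the 3-dimensional table, the whitespace tokenizer, and all function names are toy assumptions standing in for the real 256-dimensional model.

```python
import math

# Toy 3-dimensional table standing in for the real 256-dimensional one.
TOKEN_VECTORS = {
    "type": (1.0, 0.0, 0.0),
    "error": (0.8, 0.2, 0.0),
    "list": (0.0, 1.0, 0.0),
    "sum": (0.0, 0.9, 0.1),
    "fan": (0.0, 0.0, 1.0),
}


def embed(text: str) -> tuple:
    """Mean of the vectors of all known tokens; zero vector if none are known."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(a: tuple, b: tuple) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 0.0 if norm == 0 else sum(x * y for x, y in zip(a, b)) / norm


def top_k(query: str, chunks: list, k: int) -> list:
    """Rank chunks by cosine similarity to the query and keep the best k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Because embedding is a table lookup plus an average, no neural network runs at query time, which is consistent with the embedder needing no ONNX runtime.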

Constrained Decoding

Constrained decoding ensures LLMs generate syntactically valid Synoema code by construction — not by post-hoc validation. At each generation step, a token mask is computed from the current parser state: only tokens that could appear next in a valid Synoema program are allowed. Invalid tokens get probability 0 before sampling.

How it works

GBNF grammar

Synoema syntax is expressed as a GBNF grammar (an extension of BNF used by llama.cpp and compatible inference engines). The grammar defines what token sequences are valid at each parser state.

Ident-aware masking

Beyond syntax, the mask is context-aware: only identifiers that are in scope at the current cursor position are allowed. Hallucinated variable names get probability 0.

sno constrain

# Read partial Synoema source from stdin, output JSON token mask
echo 'f x = x +' | sno constrain
# → {"valid_tokens": [42, 137, 2048, ...], "context": "expr", "scope": ["x", "f"]}

The output is a JSON object with the set of valid next-token IDs (cl100k_base vocabulary), the current parser context, and the identifiers in scope. Inference engines consume this mask directly.
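
How an inference engine might consume the mask can be sketched as a masked softmax: tokens outside valid_tokens get weight 0 before normalization. The example assumes the JSON shape shown above and uses a toy 4-token vocabulary in place of cl100k_base; the function name is hypothetical.

```python
import json
import math


def masked_probabilities(logits: list, mask_json: str) -> list:
    """Softmax over the logits with every token outside the mask forced to 0."""
    valid = set(json.loads(mask_json)["valid_tokens"])
    allowed = [i for i in range(len(logits)) if i in valid]
    if not allowed:
        raise ValueError("mask allows no token in this vocabulary")
    peak = max(logits[i] for i in allowed)  # subtract the max for numerical stability
    weights = [
        math.exp(logits[i] - peak) if i in valid else 0.0
        for i in range(len(logits))
    ]
    total = sum(weights)
    return [w / total for w in weights]
```

Token 3 in a call such as masked_probabilities([2.0, 1.0, 0.0, 5.0], mask) ends up with probability 0 when it is not in the mask, even though it has the highest raw logit.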

IoT Rules GBNF

For the IoT vertical, Synoema ships a specialized GBNF grammar that constrains generation to the IoT rule subset:

-- Example rule generated within the GBNF constraint
rule_fan_control : Int -> Bool
  requires temp > -50
  ensures result == (temp > 30)
rule_fan_control temp = temp > 30

Grammar: lang/tools/constrained/synoema-iot-rules.gbnf. Used by cloud_compile.py when calling the Anthropic API or a local Ollama model.

Benefits

Without constraints	With GBNF + ident masking
LLM may hallucinate operators not in the language	Only valid BPE-aligned operators allowed
LLM may reference undefined variables	Only in-scope identifiers allowed
First-pass parse failure rate >30% on small models	Syntactically valid by construction
Re-generation required on parse failure	Single pass, compiler sees valid input

Every Synoema operator is exactly 1 BPE token (cl100k_base). This is not a coincidence — the language was designed with tokenizer alignment as a first-class constraint, making the grammar-to-token-mask translation exact.

Skills System

Skills are structured LLM task specifications in SKILL.md format. A skill tells an LLM exactly how to perform a specific Synoema coding task: what inputs to expect, what output format to produce, what constraints apply, and which tools to call. Skills eliminate prompt engineering from repeated tasks.

SKILL.md format

---
name: add-builtin
version: 1.0.0
description: Add a new builtin function to the Synoema interpreter
inputs:
  - name: function_name
    type: string
  - name: signature
    type: string
outputs:
  - type: patch
    format: unified-diff
tools:
  - recipe(add_builtin)
  - search_code
  - typecheck
---

## Task

Add a new builtin `{{function_name}}` with signature `{{signature}}` to the interpreter.

## Steps
1. Call `recipe(add_builtin)` to get current line numbers
2. ...

Discovery hierarchy

Scope	Location	Priority
Workspace	`.sno/skills/` in project root	Highest (overrides others)
User-global	`~/.sno/skills/`	Middle
Bundled	Compiled into `synoema-mcp` binary	Lowest (always present)
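
The priority order in the table amounts to a first-match lookup. The sketch below illustrates it under two labeled assumptions: that each on-disk skill lives at <skills dir>/<name>/SKILL.md, and that bundled skills can be modeled as an in-memory mapping. Neither layout is confirmed by this document, and resolve_skill is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_skill(name: str, workspace_root: Path, home: Path,
                  bundled: dict) -> Optional[Tuple[str, str]]:
    """Return (scope, content) for the highest-priority definition of a skill."""
    on_disk = [
        ("workspace", workspace_root / ".sno" / "skills"),
        ("user-global", home / ".sno" / "skills"),
    ]
    for scope, directory in on_disk:
        # ASSUMPTION: one directory per skill, containing SKILL.md.
        path = directory / name / "SKILL.md"
        if path.is_file():
            return scope, path.read_text(encoding="utf-8")
    if name in bundled:
        return "bundled", bundled[name]
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    override = root / "project" / ".sno" / "skills" / "add-builtin"
    override.mkdir(parents=True)
    (override / "SKILL.md").write_text("workspace copy", encoding="utf-8")
    bundled = {"add-builtin": "bundled copy", "iot-rule": "bundled iot"}

    shadowed = resolve_skill("add-builtin", root / "project", root / "home", bundled)
    fallback = resolve_skill("iot-rule", root / "project", root / "home", bundled)
    missing = resolve_skill("no-such-skill", root / "project", root / "home", bundled)
```

Here the workspace copy of add-builtin shadows the bundled one, while iot-rule falls through to the bundled layer.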

11 bundled skills

The MCP binary ships 11 skills covering the most common agent tasks:

Compiler extension

add-builtin, add-operator, add-type — extend the language safely

Debugging

fix-from-error, explain-type-error — diagnose and repair compiler errors

Patterns

concurrency, iot-rule, tls-https, package-create and more — domain-specific task templates

CLI commands

# List all available skills (workspace + user-global + bundled)
sno skill list

# Install a skill from a local file or path
sno skill install ./my-skill/SKILL.md

# Remove a skill
sno skill remove my-skill-name

MCP tools

Tool	When to use
list_skills	Show all available skills with description and input schema
activate_skill	Load a skill's full spec into the session context; returns structured task instructions

When an agent calls activate_skill("add-builtin"), the MCP server returns the full SKILL.md content formatted for the model — inputs resolved, tool list ready, constraints highlighted. The agent then follows the structured task without improvising.
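
Resolving inputs can be pictured as filling the {{name}} placeholders of the skill body. This is a minimal sketch of that substitution, assuming placeholders are plain identifiers in double braces as in the SKILL.md example; how the server actually renders skills is not specified here, and render_skill is a hypothetical name.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_skill(body: str, inputs: dict) -> str:
    """Replace every {{name}} in the body, failing loudly on missing inputs."""
    wanted = {match.group(1) for match in PLACEHOLDER.finditer(body)}
    missing = sorted(wanted - set(inputs))
    if missing:
        raise KeyError("missing skill inputs: " + ", ".join(missing))
    return PLACEHOLDER.sub(lambda match: str(inputs[match.group(1)]), body)
```

Failing on a missing input, rather than leaving the placeholder in place, keeps a half-filled task description from reaching the model.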

ReAct Agent (`sno fix`)

The ReAct agent is a Thought→Action→Observation loop that automatically repairs broken Synoema files. It combines the Synoema compiler (as the verifier that decides whether a fix is accepted) with an LLM (which proposes repairs) and optional RAG retrieval (for context). The loop terminates when the file compiles or the max-turns limit is reached.

Basic usage

# Auto-fix a broken .sno file (max 5 turns)
sno fix broken.sno

# Increase turn limit for complex fixes
sno fix broken.sno --max-turns 10

# Enable RAG retrieval (requires MCP server + RAG pack)
sno fix broken.sno --with-rag

# Preview without writing changes
sno fix broken.sno --dry-run

How the loop works

Thought: I see a type error on line 4. Let me check the signature.
Action: typecheck
Arguments: {"code": "..."}
Observation: {"error": "expected Int, found String", "llm_hint": "..."}

Thought: The variable `x` is String but the function expects Int. I'll add a conversion.
Action: apply_patch
Arguments: {"file": "broken.sno", "patch": "--- ...\n+++ ..."}
Observation: {"result": "patch applied"}

Thought: Let me verify the fix compiles.
Action: typecheck
Arguments: {"code": "..."}
Observation: {"type": "main : Unit"}  ← success, loop exits
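
The control flow of the trace above can be reduced to a short skeleton. In this sketch the compiler and the model are passed in as plain callables, so it runs without Synoema or an API key; the function name, the callable signatures, and the stub behavior are illustrative assumptions, not the contents of react_agent.py.

```python
from typing import Callable, List, Optional, Tuple


def react_fix(
    source: str,
    typecheck: Callable[[str], Optional[str]],   # returns an error text, or None on success
    propose_fix: Callable[[str, str], str],      # (source, error) -> patched source
    max_turns: int = 5,
) -> Tuple[str, bool, List[str]]:
    """Alternate verification and repair until the code checks or turns run out."""
    log = []
    error = typecheck(source)
    for turn in range(1, max_turns + 1):
        if error is None:
            break
        log.append(f"turn {turn}: {error}")
        source = propose_fix(source, error)  # the model's Action
        error = typecheck(source)            # the compiler's Observation
    return source, error is None, log


# Stub verifier and stub "model" so the loop can be exercised offline.
def stub_typecheck(src: str) -> Optional[str]:
    return "expected Int, found String" if '"' in src else None


def stub_model(src: str, error: str) -> str:
    return src.replace('"3"', "3")


fixed, ok, log = react_fix('x = "3"', stub_typecheck, stub_model)
```

The verifier is consulted after every patch, so the loop can only report success on code that the checker has actually accepted.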

RAG retrieval actions (--with-rag)

When --with-rag is active, four additional actions become available in the loop:

Action	When the agent calls it
search_corpus	Looking for idiomatic patterns for the broken construct
search_skills	Checking if a bundled skill covers the task
search_docs	Unclear language semantics need reference lookup
search_traces	Recognizing a known error pattern with a recorded fix

The GBNF grammar gates which actions are available: retrieval actions only appear in the mask when --with-rag is set. Without the flag, the agent cannot accidentally call them even if the LLM tries to generate the action name.
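
One way to picture this gating is a grammar rule whose alternatives depend on the flag. The sketch below generates such a rule in GBNF notation; the rule name action and the base action list (taken from the example trace) are assumptions, and the real grammar may be structured differently.

```python
# Actions that appear in the example trace; the agent's full base set is an assumption.
BASE_ACTIONS = ["typecheck", "apply_patch"]
# The four retrieval actions listed in the table above.
RAG_ACTIONS = ["search_corpus", "search_skills", "search_docs", "search_traces"]


def action_rule(with_rag: bool) -> str:
    """Build a GBNF rule that admits retrieval actions only when RAG is enabled."""
    actions = BASE_ACTIONS + (RAG_ACTIONS if with_rag else [])
    alternatives = " | ".join(f'"{name}"' for name in actions)
    return f"action ::= {alternatives}"
```

With the flag off, the string "search_corpus" is simply not derivable from the rule, so a constrained decoder cannot emit it.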

Audit log

Every retrieval call is recorded to ~/.sno/audit_retrieval.jsonl:

{"ts": "2026-04-24T10:32:01Z", "session_id": "01HW...", "tool": "search_traces", "query": "exhaustiveness case Nothing", "hits": 3, "top_score": 0.82}

Use this log to understand which retrieval queries succeed and which need corpus expansion.
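
A small script is enough for that analysis. The sketch below assumes one JSON record per line with the fields shown above, and reports per tool how many calls were made and which queries returned nothing; the function name and the summary shape are hypothetical.

```python
import json
from collections import defaultdict


def summarize_audit(lines) -> dict:
    """Aggregate audit records: call count and zero-hit queries per tool."""
    stats = defaultdict(lambda: {"calls": 0, "empty_queries": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        record = json.loads(line)
        entry = stats[record["tool"]]
        entry["calls"] += 1
        if record["hits"] == 0:
            entry["empty_queries"].append(record["query"])
    return dict(stats)
```

Passing an open file handle for ~/.sno/audit_retrieval.jsonl as lines works, since iterating a text file yields one line at a time. Queries that repeatedly land in empty_queries are candidates for corpus expansion.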

HITL gate

When the agent proposes a patch, it pauses for human-in-the-loop (HITL) confirmation if the diff touches more lines than a configurable threshold. The diff is shown in the terminal with a y/n prompt. Set SNO_FIX_HITL_THRESHOLD=0 to disable the gate (fully autonomous mode).
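
The gate's decision can be sketched as a line count over the proposed diff. This example assumes a single-file unified diff and counts only added and removed lines inside hunks; how sno fix actually measures the lines a diff touches is not specified here, and both function names are hypothetical. Treating 0 as "disabled" follows the description above.

```python
def changed_lines(diff: str) -> int:
    """Count added and removed lines inside the hunks of a single-file unified diff."""
    in_hunk = False
    count = 0
    for line in diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True  # file headers (---/+++) come before the first hunk
        elif in_hunk and line.startswith(("+", "-")):
            count += 1
    return count


def needs_confirmation(diff: str, threshold: int) -> bool:
    """True when a human should approve the patch before it is applied."""
    if threshold == 0:
        return False  # gate disabled: fully autonomous mode
    return changed_lines(diff) > threshold
```

Counting only after the first hunk header matters for Synoema sources: a removed comment line such as -- note shows up in a diff as --- note, which a naive header filter would skip.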

Architecture

react_agent.py

Python 3.9+ script (lang/tools/llm/react_agent.py). Requires no extra deps beyond the Anthropic SDK. Calls the Synoema CLI for typecheck/run; calls the MCP server for retrieval when --with-rag is set.

mcp_client.py

Hermetic JSON-RPC MCP stdio client (lang/tools/llm/mcp_client.py). Spawns synoema-mcp as a subprocess, sends tool calls, receives responses. Falls back gracefully with a single warning on stderr if MCP is unavailable.