MCP • RAG • IoT
The Synoema LLM Toolchain — complete reference
Three interconnected layers that make Synoema an LLM-native platform: the MCP server gives any AI agent 20+ tools for code evaluation and retrieval; RAG provides a local vector index over 5 corpora so models find idiomatic examples without guessing; and the IoT platform closes the loop from LLM prompt to compiled artifact on Raspberry Pi, STM32, or nRF5340.
On this page
- MCP Server — eval, typecheck, run, dev intelligence, RAG tools, auto-inject, session
- MCP Installation & Connection — npx, binary, Claude Desktop, Cursor
- RAG — Retrieval-Augmented Generation — architecture, 5 scopes, ReAct, auto-inject
- RAG Installation & Usage — sno rag install, status, update
- IoT Platform — 3 tiers, WASM pipeline, 6 verticals
- LLM → IoT Pipeline — cloud_compile.py, GBNF, cloud vs local model
- How They Fit Together — the full integrated picture
MCP Server
The Synoema MCP server implements the Model Context Protocol (MCP 2024-11-05) over stdio. It integrates the Synoema compiler, evaluator, type checker, and RAG retrieval layer into any MCP-compatible client — Claude Desktop, Cursor, Zed, or a custom agent.
Why MCP is required for LLM agents. The stateless CLI (sno run) recompiles from scratch on every call (50–180 ms overhead) and has no access to session state, dev intelligence, or retrieval. The MCP server maintains a per-connection LRU-500 AST cache, 7 dev intelligence tools, 5 RAG retrieval tools, and a 50-turn transcript window across the session lifetime.
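The per-connection AST cache is a plain LRU keyed by the source text. A minimal sketch in Python — class and method names here are hypothetical, and `parse` stands in for the real Synoema front end:

```python
from collections import OrderedDict
import hashlib

# Hypothetical sketch of a per-connection LRU-500 AST cache. The key is a
# hash of the source text; `parse` is a caller-supplied stand-in for the
# real compiler front end.
class AstCache:
    def __init__(self, capacity=500):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_parse(self, source, parse):
        key = hashlib.sha256(source.encode()).hexdigest()
        if key in self._entries:
            # Cache hit: refresh recency so hot entries survive eviction.
            self._entries.move_to_end(key)
            return self._entries[key]
        ast = parse(source)
        self._entries[key] = ast
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return ast

cache = AstCache(capacity=2)
hits = []
parse = lambda s: hits.append(s) or ("ast", s)
cache.get_or_parse("a", parse)
cache.get_or_parse("a", parse)  # hit: no re-parse
cache.get_or_parse("b", parse)
cache.get_or_parse("c", parse)  # evicts "a"
print(len(hits))  # → 3
```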
Core Language Tools
| Tool | Input | Output |
|---|---|---|
| eval | Single Synoema expression, e.g. [1..10] |> sum | Value + inferred type, or structured error JSON |
| typecheck | Full Synoema program (with main) | main : Type or structured error with llm_hint |
| run | Full Synoema program (with main) | stdout output + final value, or error |
Error JSON shape — every error from eval, typecheck, and run follows a machine-readable schema that LLMs can parse and act on:
{
"code": "unbound_variable",
"severity": "error",
"message": "Undefined variable: foo",
"span": {"line": 4, "col": 8, "end_line": 4, "end_col": 11},
"llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
"fixability": "easy",
"did_you_mean": "bar",
"source_origin": "user"
}
source_origin distinguishes user code ("user"), imported modules ("import:<path>"), and prelude bugs ("prelude"). Every error carries an llm_hint — a sentence written for the model, not the human.
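Because the schema is stable, an agent can branch on it mechanically. A sketch of an agent-side handler for an abridged version of the payload above — the auto-apply policy is illustrative, not part of the server:

```python
import json

# Illustrative consumer of the error schema above. The payload is an
# abridged copy of the example; the repair policy (auto-applying
# did_you_mean for "easy" user-code errors) is an assumption, not the
# server's behavior.
payload = json.loads("""{
  "code": "unbound_variable",
  "message": "Undefined variable: foo",
  "llm_hint": "Variable 'foo' is not defined. Did you mean 'bar'?",
  "fixability": "easy",
  "did_you_mean": "bar",
  "source_origin": "user"
}""")

if (payload["source_origin"] == "user"
        and payload["fixability"] == "easy"
        and payload.get("did_you_mean")):
    repair = payload["did_you_mean"]   # safe to substitute directly
else:
    repair = None                      # fall back to reading llm_hint
print(repair)  # → bar
```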
Dev Intelligence Tools
Seven tools expose a live index of the Synoema compiler source, powered by syn AST parsing. Line numbers and API surfaces are always current — no stale docs.
| Tool | Input | Output (budget) |
|---|---|---|
| project_overview | — | Crate structure, LOC, test counts (≤300 tok) |
| crate_info | crate_name | Public API: functions, types, structs (≤500 tok) |
| file_summary | file path | Function list with signatures, no bodies (≤300 tok) |
| search_code | query, optional scope | Top-5 keyword matches with context (≤400 tok) |
| get_context_for_edit | file, line | Enclosing function + ±20 lines context (≤500 tok) |
| doc_query | file path | Structured docs: description, contracts (requires/ensures), examples (≤500 tok) |
| recipe | task description | Step-by-step recipe with current line numbers (≤500 tok) |
All budgets are ≤500 tokens for compatibility with small context models (8K–32K). Available recipes: add_operator, add_builtin, add_type, fix_from_error.
RAG Retrieval Tools
Five tools perform semantic retrieval over the installed RAG index. See the RAG section for index details.
| Tool | Scope | Default k / max k |
|---|---|---|
| search_corpus | Fine-tune training corpus (.sno + ChatML) | 5 / 20 |
| search_docs | Docs (LANGUAGE.md, guides, API) | 5 / 20 |
| search_skills | Bundled skills + installed packages | 3 / 10 |
| search_traces | LLM failure traces with repair examples | 5 / 20 |
| search_unified | All 5 scopes (filterable via scopes param) | 10 / 30 |
All retrieval tools degrade gracefully when the RAG index is absent — they return a structured error and the server continues serving all other tools without restart.
Auto-Injection for Small Models
Models ≤7B often cannot reliably emit structured search_* tool-use actions, but still benefit from retrieval context. The MCP server can auto-inject a retrieval_context field into responses from typecheck, run, and feedback_loop — transparent to the model, no protocol changes needed.
Enable in ~/.sno/config.toml:
[rag.auto_inject]
enabled = true
scopes = ["traces", "corpus"] # any of corpus, docs, skills, traces, sno
top_k = 3
max_chunk_chars = 800
# Per-tool override:
[rag.auto_inject.per_tool.typecheck]
top_k = 2
scopes = ["traces"]
When the tool response contains an error, the middleware appends:
{
"error": "Type mismatch: expected Int, found String",
"retrieval_context": {
"query": "Type mismatch: expected Int, found String",
"hits": [
{"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."},
{"scope": "corpus", "source": "corpus/add.sno", "score": 0.78, "text": "..."}
],
"auto_injected": true,
"top_k": 3
}
}
The middleware silently skips injection if the RAG index is missing. Clients that ignore retrieval_context see no behavioral change.
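The middleware's decision logic reduces to a few lines. A minimal sketch, assuming `search` returns scored hits shaped like the example above (the function names are hypothetical):

```python
# Sketch of the auto-inject contract described above: attach
# retrieval_context only when the response carries an error AND the index
# is available; otherwise pass the response through untouched. `search`
# is a stand-in for the real retriever; None models a missing index.
def auto_inject(response, search, top_k=3):
    if "error" not in response or search is None:
        return response  # success, or index absent: skip silently
    hits = search(response["error"])[:top_k]
    response["retrieval_context"] = {
        "query": response["error"],
        "hits": hits,
        "auto_injected": True,
        "top_k": top_k,
    }
    return response

fake_search = lambda q: [
    {"scope": "traces", "source": "trace/t42.json", "score": 0.82, "text": "..."}
]
ok = auto_inject({"value": "5 : Int"}, fake_search)         # untouched
bad = auto_inject({"error": "Type mismatch"}, fake_search)  # context attached
missing = auto_inject({"error": "Type mismatch"}, None)     # index absent
```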
Session & State Tools
| Tool | Output |
|---|---|
| get_context | Phase-appropriate documentation: full LLM ref when writing code, error context when debugging (≤1800 tok) |
| get_state | Current dev phase + last 5 state transitions (JSON) |
| session_info | Session ULID, cache hit rate, tool call count, connection age |
| session_history | Last N tool calls with inputs and outputs (transcript window, max 50 turns) |
Package Discovery Tools
| Tool | What it does |
|---|---|
| search_packages | Search registry + installed packages by keyword. Returns install command and import snippet. |
| suggest_packages | Extract unknown identifiers from code and suggest packages that provide them. |
Self-Report Tools (LLM → author feedback)
Four tools let the LLM record gaps, contradictions, or ambiguities it encounters. These feed the research/llm-failures/ channel — not for end users, but for improving the language and docs.
| Tool | When to call |
|---|---|
| flag_doc_gap | Expected documentation on a topic but didn't find it |
| flag_doc_contradiction | A doc quote contradicts observed behavior |
| flag_ambiguity | Multiple valid interpretations; records which one was chosen |
| request_clarification | Task context is unclear; records the question |
Telemetry is off by default. Enable local-only collection in ~/.sno/config.toml:
[telemetry]
llm_failures = "local-only"
MCP Installation & Connection
Install via npx (recommended)
# No installation required — downloads automatically
npx synoema-mcp
# Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json
{
"mcpServers": {
"synoema": {
"command": "npx",
"args": ["synoema-mcp"]
}
}
}
Install via sno CLI (easiest after sno is installed)
sno mcp-install # installs binary to ~/.sno/bin/
sno setup claude --binary # writes Claude Desktop config
sno setup cursor --binary # writes Cursor config
Verify connection
Open Claude Desktop and ask: "Use the eval tool to compute 2 + 3". If you see 5 : Int, MCP is connected. For Cursor, open the MCP panel and look for the synoema server in the tool list.
Connect to other clients
# Cursor — .cursor/mcp.json
{ "synoema": { "command": "synoema-mcp" } }
# Zed — settings.json
{
"context_servers": {
"synoema": { "command": { "path": "synoema-mcp", "args": [] } }
}
}
# Manual test (stdio)
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
"protocolVersion":"2024-11-05","capabilities":{},
"clientInfo":{"name":"test","version":"0"}}}' | synoema-mcp
Traffic logging
# Enable logging for one run
SYNOEMA_MCP_TRAFFIC=1 synoema-mcp
# Or in ~/.sno/config.toml
[logging]
enabled = true
level = "errors" # all | errors | tools
dir = "~/.sno/mcp-traffic"
RAG — Retrieval-Augmented Generation
RAG gives LLMs a local knowledge base they can search before generating code. Synoema's RAG stack is Rust-native, offline-first, and opt-in. No Python, no external vector database. It ships as part of the sno CLI and the MCP server.
Architecture
┌──────────────────────────────────────────────────────────────────┐
│ OFFLINE (build phase) │
│ │
│ source tree sno build-index vector index │
│ ┌───────────┐ ──────────────────────▶ ┌──────────────┐ │
│ │ corpus │ 5 chunkers │ chunks.jsonl │ │
│ │ docs/ │ jina-code-v2 │ vectors.bin │ │
│ │ skills/ │ int8 quantized │ MANIFEST.json│ │
│ │ traces/ │ └──────────────┘ │
│ │ .sno files│ │
│ └───────────┘ │
└──────────────────────────────────────────────────────────────────┘
│
sno rag install
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ RUNTIME (MCP server) │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ synoema-mcp ── search_corpus / search_docs / │ │
│ │ search_skills / search_traces / │ │
│ │ search_unified │ │
│ └─────────────────────┬─────────────────────┬────────────┘ │
│ │ │ │
│ ┌──────────────▼──┐ ┌─────────▼──────────────┐ │
│ │ sno fix │ │ auto_inject │ │
│ │ --with-rag │ │ middleware │ │
│ │ (ReAct loop) │ │ (transparent for │ │
│ │ explicit search │ │ small models) │ │
│ └──────────────────┘ └────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Five Scopes
| Scope | Source | Chunking strategy | Typical use |
|---|---|---|---|
| corpus | Fine-tune training data (.sno + ChatML pairs) | One chunk per JSONL record | Find idiomatic patterns for a function shape |
| docs | Language reference, guides, API docs | Split at H2 headings, capped at 2 KB | Answer "how does X work" questions |
| skills | Bundled skills + installed package SKILL.md files | Whole SKILL.md per chunk | Discover reusable patterns (concurrency, IoT, etc.) |
| traces | LLM failure traces with repair examples | One chunk per trace record | Find how a past error was fixed |
| sno | .sno source files in the repo | One chunk per top-level definition (parser AST) | Exact-match on existing definitions |
Embedder
Default embedder: jina-code-v2 (568M parameters, 768-dim output, int8 quantized, ~160 MB on disk). It uses a custom tokenizer that preserves Synoema operators (|>, <>, :=, etc.) as single units, matching the language's BPE-aware surface.
Default builds ship a deterministic StubEmbedder so that cargo test --all stays hermetic on machines without the ONNX runtime. Production binaries include the real embedder behind --features=synoema-embed/inference.
Index format
Brute-force cosine similarity over the full vector set. On the current corpus (~13k chunks) a single query takes ~50 ms on a 2023 M2 laptop. HNSW is deferred until the corpus exceeds ~20k chunks or query rate exceeds 50 qps — the dependency cost is not justified yet.
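Brute-force cosine top-k is short enough to sketch in full. The toy 3-dimensional float vectors below stand in for the real int8-quantized 768-dim embeddings:

```python
import math

# Sketch of brute-force cosine retrieval over a small vector set.
# Sources and vectors are illustrative, not real index contents.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query, chunks, k):
    # Score every chunk, sort descending, keep the first k.
    scored = [(cosine(query, vec), src) for src, vec in chunks]
    scored.sort(reverse=True)
    return [src for _, src in scored[:k]]

index = [
    ("docs/pipes.md",   [1.0, 0.0, 0.1]),
    ("corpus/sum.sno",  [0.9, 0.1, 0.0]),
    ("traces/t7.json",  [0.0, 1.0, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], index, 2))
```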
Three retrieval modes
ReAct Agent
sno fix file.sno --with-rag — wires 4 retrieval actions into a Thought→Action→Observation loop. The model explicitly calls search_traces, search_corpus, etc. Every call is GBNF-gated and audited in ~/.sno/audit_retrieval.jsonl.
Auto-Inject
Transparent for small models (3B, 1.5B). When typecheck or run returns an error, the middleware appends a retrieval_context field with relevant hits. The model reads context without knowing RAG exists.
Raw MCP Tools
Power users and custom agents call search_corpus / search_unified directly via JSON-RPC. Useful for IDE integrations, pre-generation context loading, or custom ReAct frameworks.
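A direct tool call is an ordinary JSON-RPC 2.0 line written to the server's stdin. A sketch of the envelope — the tool name and `k`/`scopes` values mirror the tables above, but treat the exact argument schema as an assumption:

```python
import json

# Hypothetical sketch of a raw tools/call request to search_unified.
# tools/call is the standard MCP method name; the argument shape here is
# inferred from the scope/k tables above, not taken from a schema dump.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "search_unified",
        "arguments": {
            "query": "pipeline operator examples",
            "k": 10,
            "scopes": ["corpus", "docs"],
        },
    },
}
line = json.dumps(request)  # one request per line over stdio
```

Piping `line` into a live synoema-mcp process (after `initialize`) is left out of this sketch.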
ReAct retrieval trace example
Thought: I see a non-exhaustive case error. Let me search traces for similar fixes.
Action: search_traces
Arguments: {"query": "exhaustiveness case Nothing", "k": 3}
Observation: [
{"source": "traces/t42.json", "score": 0.82, "text": "case opt of\n Just x -> …\n Nothing -> 0"}
]
Thought: Found the pattern. Adding the missing branch.
Action: apply_patch
Arguments: {"file": "broken.sno", "patch": "…"}
RAG Installation & Usage
Install
# Fastest: install Synoema with RAG in one command
curl -fsSL https://synoema.tech/install.sh | sh -s -- --with-rag
# PowerShell (Windows)
.\install.ps1 -WithRag
# Add RAG to an existing install
sno rag install
# Check what is installed
sno rag status
# → Installed: 2026-04-18, 12840 chunks across 5 scopes, ~179 MB
# Update when a new pack is published
sno rag update
# Remove everything
sno rag remove
sno rag install downloads the pack manifest, verifies SHA-256, extracts to ~/.sno/models/embed/<date>/, and flips an atomic symlink at ~/.sno/models/embed/current/ — in-flight tools never see a half-updated index.
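The atomic-flip pattern can be sketched with the standard library alone — the directory layout and function name below are illustrative, not the installer's actual code:

```python
import os
import tempfile

# Sketch of the atomic symlink flip described above: build the new link
# under a temporary name, then rename it over "current" in one step, so
# readers always see either the old or the new index in full.
def flip_current(embed_dir, new_version):
    target = os.path.join(embed_dir, new_version)
    current = os.path.join(embed_dir, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):       # clean up a stale temp link, if any
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, current)       # atomic rename on POSIX filesystems

d = tempfile.mkdtemp()
os.makedirs(os.path.join(d, "2026-04-18"))
flip_current(d, "2026-04-18")
```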
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| sno rag status says not installed | Pack never downloaded | sno rag install |
| rag_model_unavailable in MCP logs | ONNX feature missing or model not extracted | sno rag remove && sno rag install |
| Retrieval >500 ms/query | Running StubEmbedder instead of real model | Rebuild with --features=synoema-embed/inference |
| [warning] MCP unavailable in sno fix --with-rag | MCP server not reachable | Run sno mcp-install and check config |
Run sno doctor for a single-command health report across all components including the RAG pack, embedder path, and index integrity.
IoT Platform
Synoema's IoT platform enables LLM-generated automation rules to run on embedded hardware — from bare Cortex-M MCUs to Raspberry Pi — through a 3-tier device model and a WASM-first compilation strategy.
Three Deployment Tiers
Tier 0: Bare MCU Tier 1: RTOS / wasm3 Tier 2: Linux Edge
────────────────── ──────────────────────── ──────────────────
Cortex-M0/M3, <64 KB ESP32, STM32, ≥128 KB RPi, aarch64 SBC
sno wasm → .wasm sno wasm → .wasm sno build --native
│ │ │
C host + wasm3 embed wasm3 on-device Cranelift ObjectModule
integer-only rules floats + contracts full language
aot_thumbv7m.rs (partial) wasm_codegen.rs v2+v3 aot_aarch64.rs
→ blocked: Cranelift ARM32 + wasm_runtime.rs → shipped
Target hardware: Target hardware: Target hardware:
RP2040, STM32F0/F1 ESP32, STM32F4/H7 Raspberry Pi (all)
nRF5340 (Zephyr) nRF5340, Arduino Jetson, BeagleBone
<64 KiB flash ≥128 KiB flash x86/aarch64 Linux
WASM v3 Feature Matrix
| Feature | v2 | v3 records/ADT | v3 floats | v3 contracts | Deferred |
|---|---|---|---|---|---|
| Integer arithmetic | ✓ | ✓ | ✓ | ✓ | — |
| Strings, Lists, Closures | ✓ | ✓ | ✓ | ✓ | Nil/Cons deep patterns |
| Perceus reference counting | ✓ | ✓ | ✓ | ✓ | — |
| Records & ADT constructors | — | ✓ | ✓ | ✓ | Pattern nesting >1 level |
| Floats + math builtins | — | — | ✓ | ✓ | — |
| requires/ensures contracts | — | — | — | ✓ | — |
| Host imports (GPIO, I2C, SPI) | — | — | — | — | wasm-host-imports |
Contracts at compile time
requires and ensures clauses compile to WebAssembly unreachable traps — violations are caught at runtime before they propagate. This mirrors the JIT contract enforcement via Cranelift trapif:
-- Synoema rule with safety contracts
rule_overpressure : Int -> Bool
requires pressure > 0
ensures result == (pressure > 800)
rule_overpressure pressure = pressure > 800
;; WASM output (simplified)
;; requires check: trap if pressure <= 0
i64.const 0
call $__int_unbox
i64.gt_s
i64.eqz
if unreachable end
;; function body…
;; ensures check: trap if result != (pressure > 800)
…
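The same trap semantics can be mirrored in plain Python to make the evaluation order concrete — a sketch only, with ContractTrap as an illustrative stand-in for the WASM unreachable trap:

```python
# Mirror of the requires/ensures semantics above: a violated clause
# aborts before the result can propagate, like a WASM `unreachable` trap.
class ContractTrap(Exception):
    pass

def rule_overpressure(pressure):
    if not (pressure > 0):             # requires pressure > 0
        raise ContractTrap("requires")
    result = pressure > 800            # function body
    if result != (pressure > 800):     # ensures result == (pressure > 800)
        raise ContractTrap("ensures")
    return result

print(rule_overpressure(900))  # → True
```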
Six Vertical MVPs
| Vertical | Wave | Platform | Rules | sno check | sno wasm | Mean B |
|---|---|---|---|---|---|---|
| Home automation | 1 | RPi aarch64 | 5 | 5/5 | 5/5 | 74.2 |
| Industrial safety | 1 | STM32 | 5 | 5/5 | 4/5 | 87.4 |
| Wearable health | 1 | nRF5340 | 5 | 5/5 | 5/5 | 84.6 |
| Automotive | 2 | STM32 | 5 | 5/5 | 5/5 | ~110 |
| Agriculture | 2 | ESP32/RPi | 5 | 5/5 | 5/5 | ~98 |
| Healthcare | 2 | nRF5340/RPi | 5 | 5/5 | 5/5 | ~114 |
Wave 2 aggregate: 30/30 check (100%) • 29/30 WASM (96.7%) • mean 200 B. Known failure: rule_flow_min_alarm — Nil list pattern not yet registered in WASM v3 ctor_tags; deferred to wasm-host-imports.
Honest deferrals. Native Thumb2 AOT (aot_thumbv7m.rs) is scaffolded but blocked on the Cranelift ARM32 backend. GPIO host imports for WASM (gpio_* builtins inside .wasm) are deferred to the wasm-host-imports change. Real wasm3-on-MCU CI and 5 Wave-3 verticals (automotive/medical/agriculture/logistics/smart-grid) are future work.
LLM → IoT Pipeline
The complete pipeline from natural language prompt to WASM artifact on hardware:
Natural language prompt
│
▼ LLM (constrained by synoema-iot-rules.gbnf)
Synoema IoT rule (.sno)
│
├── sno check → parse + typecheck → PASS / FAIL + structured errors
│
└── sno wasm → WASM v3 codegen → .wasm artifact
│
├── Tier 0: C host + wasm3 embed (MCU)
├── Tier 1: on-device wasm3 (RTOS/ESP32)
└── sno build --native (Linux edge ELF)
--target aarch64-linux
GBNF-constrained generation
The LLM generates IoT rules constrained by lang/tools/constrained/synoema-iot-rules.gbnf — a GBNF grammar that limits output to valid Synoema rule syntax. This eliminates hallucinated syntax and reduces parse failures on first generation.
# Rule example generated within the GBNF constraint
rule_fan_control : Int -> Bool
requires temp > -50
ensures result == (temp > 30)
rule_fan_control temp = temp > 30
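For intuition, a rule-shaped GBNF constraint looks roughly like the fragment below — an illustrative sketch of the grammar style, not the shipped synoema-iot-rules.gbnf:

```gbnf
# Illustrative fragment only; production names are assumptions.
root       ::= signature contract* definition
signature  ::= ident " : " type (" -> " type)* "\n"
contract   ::= ("requires " | "ensures ") expr "\n"
definition ::= ident " " ident " = " expr "\n"
```

Because every token the model emits must extend a valid derivation of `root`, free-form prose and malformed syntax are unrepresentable in the output.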
Two LLM backends (plus a CI mock)
| Backend | Flag | Use case |
|---|---|---|
| Anthropic API (claude-opus-4-7) | default | Development, CI, highest quality |
| Local fine-tuned (Qwen2.5-Coder-3B via Ollama) | --model ollama:iot-rules-3b | Offline, air-gapped, low-latency |
| Mock (CI) | --mock | Hermetic tests, no API key required |
Reference pipeline: cloud_compile.py
# Cloud path (requires ANTHROPIC_API_KEY)
python3 lang/tools/llm/cloud_compile.py \
--prompt "turn on fan when temperature exceeds 30°C" \
--target rpi
# Mock mode — fully hermetic, no API key
python3 lang/tools/llm/cloud_compile.py \
--mock \
--prompt "turn on fan when temp > 30" \
--target rpi
# Industrial rules with contracts
python3 lang/tools/llm/cloud_compile.py \
--prompt "shut off pump when pressure > 800 PSI" \
--vertical industrial \
--target stm32
Wave-2 Training Corpus
1,177 unique (prompt, rule) pairs across 5 verticals × 7 rule kinds × 3 difficulty tiers:
Rule kinds: threshold / hysteresis / counter / timer /
interlock / safety / pattern-match
Verticals: home / industrial / wearable / automotive / agriculture
Difficulty: simple / medium / hard
Splits: 946 train / 104 val / 127 test (hash-mod-10 deterministic)
Token stats: mean 89.5 / median 87 / p95 139 (cl100k_base)
Training script: research/finetune/train_iot_rules_small.py (Qwen2.5-Coder-3B, QLoRA 4-bit, AMD RX 7900 GRE + unsloth + ROCm).
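A hash-mod-10 split can be sketched deterministically as below — the hash function and the bucket-to-split mapping are assumptions; only the deterministic, roughly 80/9/11 property comes from the corpus stats above:

```python
import hashlib

# Sketch of a deterministic hash-mod-10 split: bucket each record by a
# stable hash of its prompt, then map buckets to train/val/test. The
# exact bucket assignment here is hypothetical.
def split_of(prompt):
    bucket = int(hashlib.sha256(prompt.encode()).hexdigest(), 16) % 10
    if bucket <= 7:
        return "train"                  # 8 of 10 buckets
    return "val" if bucket == 8 else "test"

# Deterministic: the same prompt always lands in the same split,
# regardless of record order or process restarts.
assert split_of("turn on fan when temp > 30") == split_of("turn on fan when temp > 30")
```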
How MCP + RAG + IoT Fit Together
The three layers are independent but designed to compose:
Claude Desktop / Cursor / custom agent
│
│ MCP 2024-11-05 (stdio JSON-RPC)
│
┌──────▼──────────────────────────────────────────────┐
│ synoema-mcp │
│ │
│ eval / typecheck / run / feedback_loop │
│ │ │ │
│ │ auto_inject │ search_* tools │
│ │ middleware │ (when agent calls them) │
│ │ │ │ │
│ │ └────┬────┘ │
│ │ │ │
│ │ ┌──────▼──────────────────┐ │
│ │ │ RAG index │ │
│ │ │ ~/.sno/models/embed/ │ │
│ │ │ jina-code-v2 │ │
│ │ │ 5 scopes, ~13k chunks │ │
│ │ └─────────────────────────┘ │
└───────┼──────────────────────────────────────────────┘
│
│ sno check / sno wasm / sno build --native
│
┌───────▼────────────────────────────────────────────┐
│ IoT Compilation Pipeline │
│ │
│ .sno rule → WASM v3 codegen → .wasm artifact │
│ │ │
│ ┌──────────┴──┐ │
│ Tier 0 Tier 1/2 │
│ wasm3 MCU Linux ELF │
└────────────────────────────────────────────────────┘
Typical LLM agent workflow
- Agent opens session — MCP server creates a ULID session, warms the AST cache.
- Agent calls get_context — receives phase-appropriate reference docs (≤1800 tok).
- Agent generates a rule — calls typecheck with the draft. If there is an error, the auto-inject middleware attaches relevant search_traces hits automatically.
- Agent calls run — executes the rule to verify output. If it fails again, the ReAct loop (sno fix --with-rag) activates and searches the corpus for idiomatic fixes.
- Agent compiles to WASM — calls sno wasm rule.sno via the run tool or the CLI. Gets a .wasm artifact ready for the target tier.
- Agent deploys — via cloud_compile.py for the full pipeline, or manually for custom hardware.
Key measured numbers
MCP
LRU-500 AST cache • 50-turn transcript • 20+ tools • 50–180 ms faster per call vs CLI
RAG
~13k chunks • 5 scopes • ~50 ms / query • jina-code-v2 int8 (~160 MB)
IoT
30/30 sno check • 29/30 WASM • 200 B mean artifact • 6 verticals • 3 tiers
Further reading
- IoT Platform page — hero section, tier diagrams, vertical demos
- Package Manager — sno-http and the package registry
- GitHub repository — source for MCP server, RAG crate, IoT examples
- Language Reference — syntax, types, builtins, stdlib
- Zero to 41% — fine-tuning a 3B model from scratch