Multi-Agent Orchestration
From prompt to production — per-job containers, weak/strong tier execution, contracts-first verification
Overview
The Synoema orchestrator is a distributed agent fleet daemon. A human submits a single job — free-form task text plus an acceptance.toml — and the daemon drives the entire lifecycle to a verified deploy. It plans the work into a DAG of subtasks, runs each subtask in an isolated Docker container with its own short-lived virtual API key, watches the container logs for failures, classifies them, fixes the trivial ones, escalates the rest, captures a pre-deploy git snapshot so it can roll back if a fresh deploy starts crashing, and gates the whole pipeline on optional human approvals.
LLM coding agents are individually capable but operationally unsupervised. Running ten of them in parallel on the same workstation is a recipe for leaked credentials, runaway cost, container OOM kills, regressions deployed to production, and an inbox full of "I'm not sure" half-answers. The orchestrator is the substrate that turns "ten agents in parallel" into a system you can leave running overnight.
Three load-bearing constraints define the design philosophy:
- One job per container. No multiplexing. Each subtask gets its own ephemeral filesystem, its own short-lived virtual API key, and its own resource caps. Parallelism comes from spawning more containers, never from sharing one.
- Weak / strong model tiering. Planning, review, and goal-drift detection use a "strong" model (Claude Opus / GPT-4 class); implementation, tests, and docs use a "weak" model (Sonnet / GPT-4-mini / Haiku class). Configured per-tier in config::ModelRoster, weighted-sampled at dispatch.
- Contracts-first verification at every boundary. Every transition is gated: budget check before the LLM call, type/contract check before write, drift score before deploy, snapshot before deploy, approval before deploy when configured, log classifier after deploy, automatic rollback if a Critical event fires within the post-deploy window. The orchestrator does not believe an agent's "done"; it verifies it.
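As an illustration of the "weighted-sampled at dispatch" step, here is a minimal sketch of a weighted pick within one tier. The ModelEntry type, the pick_weighted function, and the model names used with it are hypothetical; the real layout of config::ModelRoster is not shown on this page. To keep the sketch deterministic, the caller supplies the random roll instead of the function drawing one itself.

```rust
/// Hypothetical roster entry; not the actual `config::ModelRoster` layout.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelEntry {
    pub name: &'static str,
    pub weight: u32,
}

/// Pick one entry with probability proportional to its weight.
/// `roll` is any integer (for example from an RNG); it is reduced modulo
/// the total weight, so every value maps to exactly one entry.
pub fn pick_weighted(entries: &[ModelEntry], roll: u64) -> Option<&ModelEntry> {
    let total: u64 = entries.iter().map(|e| u64::from(e.weight)).sum();
    if total == 0 {
        return None; // empty roster, or every weight is zero
    }
    let mut remaining = roll % total;
    for entry in entries {
        let weight = u64::from(entry.weight);
        if remaining < weight {
            return Some(entry);
        }
        remaining -= weight;
    }
    None // not reached when total > 0
}
```

With weights 3 and 1, three out of every four consecutive roll values land on the first entry.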
What the orchestrator is not. Not a CI server (no GitHub triggers, no PR comments). Not a Kubernetes scheduler (Docker subprocess driver, no orchestration of long-running services). Not a model router (delegates to LiteLLM via synoema-cred-broker). Not the agent itself — that lives in synoema-agent and runs inside the container.
The execution loop
One tokio task ticks every 10 seconds, BFS-walks queued subtasks, and dispatches anything whose dependencies are satisfied. Dispatch hands the subtask to an injectable ExecFn; in production this is default_docker_exec, which spawns a docker run, streams output via docker logs --follow, and writes a per-subtask cost report when the container exits. Status changes broadcast over an SSE channel so the dashboard updates in real time.
sno orchestrator submit
│
▼
┌──────────────────────────────────────────┐
│ HTTP server (axum) │
│ POST /jobs → row in jobs table │
│ optional Socratic clarification │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ Planner (planner.rs) │
│ strong-tier LLM call │
│ fallback: dag::plan canned 4-subtask │
│ (plan → impl ‖ tests → review) │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ ExecutorLoop (executor.rs, every 10s) │
│ BFS-walk DAG → ready set │
│ for each ready subtask: │
│ BudgetTracker.check │
│ cred-broker.issue_key │
│ ExecFn.spawn(docker run) │
│ LimitsPoller.attach │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ Container (one per subtask) │
│ synoema-agent runs inside │
│ writes telemetry JSONL → stdout │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ log_watcher (3-tier classifier) │
│ Tier 1: regex patterns │
│ Tier 2: known-bug-signatures table │
│ Tier 3: LLM stub │
│ → auto_fix or escalation │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ Drift gate + snapshot + deploy │
│ drift_score < 5 blocks deploy │
│ pre-deploy git SHA → snapshots table │
│ webhook → host-side deploy daemon │
│ rollback on Critical (3/day cap) │
└────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────┐
│ Cost report → SSE broadcast │
│ audit row → approvals / rollback / │
│ deploy / escalations tables │
└──────────────────────────────────────────┘
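The "ready set" step in the ExecutorLoop box can be sketched as follows. This is a simplified stand-in, assuming a flat list of subtasks and a plain filter rather than the BFS walk the daemon performs; the Subtask and Status types here are hypothetical and do not mirror subtask.rs.

```rust
use std::collections::HashSet;

/// Hypothetical subtask states; the real FSM may have more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Queued,
    Running,
    Done,
    Failed,
}

/// Hypothetical subtask record: an id plus the ids it depends on.
pub struct Subtask {
    pub id: &'static str,
    pub deps: Vec<&'static str>,
    pub status: Status,
}

/// Return the ids of queued subtasks whose dependencies are all done.
pub fn ready_set(subtasks: &[Subtask]) -> Vec<&'static str> {
    let done: HashSet<&str> = subtasks
        .iter()
        .filter(|s| s.status == Status::Done)
        .map(|s| s.id)
        .collect();
    subtasks
        .iter()
        .filter(|s| s.status == Status::Queued)
        .filter(|s| s.deps.iter().all(|d| done.contains(*d)))
        .map(|s| s.id)
        .collect()
}
```

Applied to the canned fallback DAG (plan → impl ‖ tests → review), only plan is ready at first; once plan is done, impl and tests become ready together, and review waits for both.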
Safety nets
Six independent gates catch the failure modes that bite multi-agent fleets in practice. Each one is a separate module, separately testable, and writes its own audit row.
Budget caps (budget.rs)
Three independent caps are enforced before every LLM call: a daily USD cap across the whole daemon, a per-job USD cap for the active job, and a concurrency semaphore on the number of in-flight container dispatches. Exceeding any cap returns a structured BudgetExceeded error, and the container is never spawned.
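A minimal sketch of the three-cap check, under stated assumptions: the field names, the use of integer cents, and the variants of BudgetExceeded are hypothetical, and a plain in-flight counter stands in for the semaphore. Only the names BudgetTracker, check, and BudgetExceeded come from this page.

```rust
/// Hypothetical shape for the structured error; one variant per cap.
#[derive(Debug, PartialEq)]
pub enum BudgetExceeded {
    DailyUsd { projected_cents: u64, cap_cents: u64 },
    JobUsd { projected_cents: u64, cap_cents: u64 },
    Concurrency { in_flight: usize, cap: usize },
}

/// Hypothetical tracker state for the daemon and one active job.
pub struct BudgetTracker {
    pub daily_cap_cents: u64,
    pub job_cap_cents: u64,
    pub max_in_flight: usize,
    pub daily_spent_cents: u64,
    pub job_spent_cents: u64,
    pub in_flight: usize,
}

impl BudgetTracker {
    /// Check all three caps before a dispatch that is estimated to cost
    /// `estimate_cents`. The first violated cap is reported.
    pub fn check(&self, estimate_cents: u64) -> Result<(), BudgetExceeded> {
        let daily = self.daily_spent_cents.saturating_add(estimate_cents);
        if daily > self.daily_cap_cents {
            return Err(BudgetExceeded::DailyUsd {
                projected_cents: daily,
                cap_cents: self.daily_cap_cents,
            });
        }
        let job = self.job_spent_cents.saturating_add(estimate_cents);
        if job > self.job_cap_cents {
            return Err(BudgetExceeded::JobUsd {
                projected_cents: job,
                cap_cents: self.job_cap_cents,
            });
        }
        if self.in_flight >= self.max_in_flight {
            return Err(BudgetExceeded::Concurrency {
                in_flight: self.in_flight,
                cap: self.max_in_flight,
            });
        }
        Ok(())
    }
}
```

Because the check returns before anything is spawned, a caller that propagates the error with ? never reaches the dispatch step.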
Approval gates (container.rs, approvals table)
Every container has an ApprovalMode field: Auto (no gate), Ask (writes pending row, waits for human), Block (always rejected, human must edit). Consulted by deploy_container_handler at the deploy gate and by auto_fix::decide at the auto-fix gate. Every decision is recorded in the approvals audit table with the decider, timestamp, and reason.
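The three modes map naturally onto an enum and a small decision function. The sketch below assumes a hypothetical GateOutcome type and ApprovalRow layout; only ApprovalMode and its three variants are named on this page, and the decider and reason strings are placeholders.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalMode {
    Auto,
    Ask,
    Block,
}

/// Hypothetical result of consulting the gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateOutcome {
    Proceed,
    PendingHuman,
    Rejected,
}

/// Hypothetical audit row carrying decider, timestamp, and reason.
#[derive(Debug, PartialEq, Eq)]
pub struct ApprovalRow {
    pub decider: &'static str,
    pub timestamp: u64,
    pub reason: &'static str,
    pub outcome: GateOutcome,
}

/// Consult the gate and build the audit row that would be recorded.
/// `now` is a caller-supplied Unix timestamp.
pub fn consult(mode: ApprovalMode, now: u64) -> ApprovalRow {
    let (outcome, decider, reason) = match mode {
        ApprovalMode::Auto => (GateOutcome::Proceed, "policy", "mode is Auto"),
        ApprovalMode::Ask => (GateOutcome::PendingHuman, "pending", "awaiting human decision"),
        ApprovalMode::Block => (GateOutcome::Rejected, "policy", "mode is Block"),
    };
    ApprovalRow { decider, timestamp: now, reason, outcome }
}
```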
Snapshots and rollback (rollback.rs, snapshots table)
Every deploy captures the pre-deploy git SHA into the snapshots table before the webhook fires. If log_watcher classifies a post-deploy event as Critical within the post-deploy observation window, the orchestrator rolls back to the captured SHA and re-deploys. Capped at 3 automatic rollbacks per day to prevent flap loops.
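The rollback rule combines three conditions: the event is Critical, it falls inside the observation window, and the daily cap has not been used up. A minimal sketch, assuming hypothetical type and function names, a caller-supplied day number, and a window length passed in as a parameter (this page does not state its value):

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Minor,
    Major,
    Critical,
}

/// Hypothetical per-day counter that enforces the rollback cap.
pub struct RollbackLimiter {
    cap: u32,
    day: u64,
    used: u32,
}

impl RollbackLimiter {
    pub fn new(cap: u32) -> Self {
        Self { cap, day: 0, used: 0 }
    }

    /// Consume one rollback slot for `day` (for example, days since epoch).
    /// The counter resets whenever the day changes.
    pub fn try_acquire(&mut self, day: u64) -> bool {
        if day != self.day {
            self.day = day;
            self.used = 0;
        }
        if self.used >= self.cap {
            return false;
        }
        self.used += 1;
        true
    }
}

/// Decide whether to roll back. Short-circuit evaluation means a slot is
/// only consumed when the event is Critical and inside the window.
pub fn should_roll_back(
    severity: Severity,
    secs_since_deploy: u64,
    window_secs: u64,
    limiter: &mut RollbackLimiter,
    day: u64,
) -> bool {
    severity == Severity::Critical
        && secs_since_deploy <= window_secs
        && limiter.try_acquire(day)
}
```

With a cap of 3, a fourth Critical event on the same day is left to a human instead of triggering another automatic rollback.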
Drift block (drift.rs)
The original prompt is hashed at submit. After the DAG completes, a strong-tier reviewer compares the resulting diff against the original prompt and emits a drift score out of 10, where a lower score means the result strayed further from the prompt. A score below 5 blocks the deploy and queues an escalation. This catches the case where an agent has technically passed every contract but quietly delivered something unrelated to what was asked.
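The threshold rule is small enough to state as code. In this sketch the constant names, the DriftVerdict type, and the handling of out-of-range scores are assumptions; only the "below 5 out of 10 blocks" rule comes from the text.

```rust
/// Scores strictly below this value block the deploy.
pub const DRIFT_BLOCK_BELOW: u8 = 5;
/// Upper end of the score range.
pub const DRIFT_MAX: u8 = 10;

/// Hypothetical verdict type for the drift gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DriftVerdict {
    /// Score is at or above the threshold; the pipeline may continue.
    Deploy,
    /// Score is below the threshold; block and queue an escalation.
    BlockAndEscalate,
    /// Score is outside the expected range (an assumption of this sketch).
    InvalidScore,
}

pub fn drift_gate(score: u8) -> DriftVerdict {
    if score > DRIFT_MAX {
        DriftVerdict::InvalidScore
    } else if score < DRIFT_BLOCK_BELOW {
        DriftVerdict::BlockAndEscalate
    } else {
        DriftVerdict::Deploy
    }
}
```

Note the boundary: a score of exactly 5 passes, since only scores below 5 block.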
Integration tests (integration_test.rs)
The [test.*] sections of the submitted acceptance.toml are parsed and enqueued as [INT-TEST] subtasks after the main DAG completes. They run in their own ephemeral containers with the deploy artifact mounted read-only. Failure blocks deploy; pass writes to the integration_tests table.
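To make the [test.*] to [INT-TEST] mapping concrete, the sketch below scans section headers and produces one subtask title per test section. It is deliberately not a TOML parser: it only looks at header lines, and the function name and the title format after the [INT-TEST] tag are assumptions.

```rust
/// Collect one "[INT-TEST] <name>" title per `[test.<name>]` header.
/// This is a header scan only; keys and values inside each section
/// are ignored, and nested or quoted table names are not handled.
pub fn int_test_subtasks(acceptance_toml: &str) -> Vec<String> {
    acceptance_toml
        .lines()
        .map(str::trim)
        .filter_map(|line| line.strip_prefix("[test.")?.strip_suffix(']'))
        .filter(|name| !name.is_empty())
        .map(|name| format!("[INT-TEST] {name}"))
        .collect()
}
```

Sections that do not start with test. are skipped, so other parts of acceptance.toml do not produce subtasks.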
Log classifier (log_watcher.rs)
A 3-tier classifier reads container stdout/stderr line by line. Tier 1 is hand-rolled regex against known panic / OOM / segfault / TLS-handshake-failure signatures, which match in microseconds. Tier 2 consults the known_bug_signatures table for project-specific patterns observed in past runs. Tier 3 is an LLM stub for unknown patterns. Events classified as (Minor, Trivial) become [AUTO-FIX] subtasks; everything else becomes an inbox row.
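The tier cascade and the routing rule can be sketched as below. Several things here are stand-ins: case-insensitive substring matching replaces the Tier 1 regexes, a slice replaces the known_bug_signatures table, and the severities attached to each pattern are illustrative guesses. Tier 3 is represented by returning None, which a caller would hand to the LLM stub.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Minor,
    Major,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Complexity {
    Trivial,
    Complex,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    AutoFix,
    Inbox,
}

/// (lowercase pattern, severity, complexity)
pub type Signature = (&'static str, Severity, Complexity);

/// Tier 1 stand-in: fixed substrings instead of regexes.
const TIER1: &[Signature] = &[
    ("panicked at", Severity::Critical, Complexity::Complex),
    ("out of memory", Severity::Critical, Complexity::Complex),
    ("segmentation fault", Severity::Critical, Complexity::Complex),
    ("tls handshake", Severity::Major, Complexity::Complex),
];

/// Find the first signature whose pattern occurs in the lowercased line.
fn lookup(lower: &str, table: &[Signature]) -> Option<(Severity, Complexity)> {
    table
        .iter()
        .find(|sig| lower.contains(sig.0))
        .map(|sig| (sig.1, sig.2))
}

/// Returns (tier, severity, complexity), or None when neither tier matches.
pub fn classify(line: &str, known_bugs: &[Signature]) -> Option<(u8, Severity, Complexity)> {
    let lower = line.to_lowercase();
    if let Some((s, c)) = lookup(&lower, TIER1) {
        return Some((1, s, c));
    }
    if let Some((s, c)) = lookup(&lower, known_bugs) {
        return Some((2, s, c));
    }
    None
}

/// Only (Minor, Trivial) events are auto-fixed; the rest go to the inbox.
pub fn route(severity: Severity, complexity: Complexity) -> Route {
    match (severity, complexity) {
        (Severity::Minor, Complexity::Trivial) => Route::AutoFix,
        _ => Route::Inbox,
    }
}
```

Ordering the tiers from cheapest to most expensive means most lines never reach the table lookup, and none reach the LLM unless both earlier tiers miss.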
Subsystem map
Nine logical subsystems. Each maps to one or more modules under lang/crates/synoema-orchestrator/src/.
| Subsystem | Modules | Role |
|---|---|---|
| HTTP server + state | server.rs, router.rs | axum listener, ~40 routes, SSE channel, model roster, scheduled tasks registry, long-lived background tasks. |
| Persistence | db.rs, error.rs | OrchestratorDb wraps a rusqlite::Connection behind Arc<Mutex<…>>. 19 tables on first open, additive ALTER TABLE migrations on reopen. |
| Job + DAG model | job.rs, subtask.rs, dag.rs, planner.rs, session.rs | Job/Subtask FSM. dag::plan canned fallback (plan → impl ‖ tests → review). planner::call_llm_planner calls strong-tier LLM for richer DAGs. |
| Container management | container.rs, vault.rs, limits.rs, docker.rs | ContainerConfig (disk, env, deploy_mode, limits, approval_mode). Vault = AES-256-GCM keyed off SNO_ORCH_VAULT_KEY. LimitsPoller polls docker stats, kills containers exceeding caps. |
| Model roster + credentials | config.rs, cred.rs | ModelRoster::strong/weak weighted-sample. cred.rs shims synoema-cred-broker for per-job virtual keys against LiteLLM. |
| Execution + scheduling | executor.rs, poller.rs, scheduled.rs, gc.rs | ExecutorLoop ticks every 10s. scheduled.rs runs cron-driven builtin tasks (nightly log audit, weekly GC, daily report). gc.rs retention-prunes old job rows. |
| Deployment + safety | deploy.rs, rollback.rs | dispatch_deploy POSTs to webhook (host-side sno orchestrator deploy-hook daemon does rsync or docker pull && docker restart). rollback.rs auto-rolls-back on Critical events with 3/day cap. |
| Observability + classification | log_watcher.rs, topology.rs, telemetry/, aggregator/ | 3-tier log classifier. topology.rs polls docker ps every 60s and renders an SVG graph. telemetry/ ingests JSONL counters; aggregator/ rolls them up daily. |
| Quality + decision gates | budget.rs, drift.rs, socratic.rs, auto_fix.rs, integration_test.rs, verifier.rs, escalation.rs | Three budget caps. Drift score < 5 blocks deploy. Socratic detects uncertainty markers. Integration tests gate deploy. escalation.rs is the kind-tagged inbox shared by all gates. |
Available CLI commands
Daemon and client live in the same binary. Run from any directory once sno is on PATH.
sno orchestrator start [--port 7777] [--detach]
Start the daemon. Writes ~/.sno/orchestrator.pid; --detach forks
and re-execs as a background process.
sno orchestrator stop
Graceful daemon shutdown via POST /shutdown. Drains in-flight
subtasks, then exits.
sno orchestrator status [--daemon | --job <id>] [--json]
Daemon-level health (default) or per-job status snapshot.
sno orchestrator submit
--task <text> --acceptance <toml>
[--policy <toml>] [--budget '$X,Ymin,Zturns'] [--workspace <path>]
Submit a job; the rest is autonomous.
sno orchestrator logs <job-id> [--follow]
Stream job logs over SSE.
sno orchestrator cancel <job-id>
Async cancel — graceful drain then docker stop.
sno orchestrator inbox [--json]
List pending escalations awaiting human resolution.
sno orchestrator resolve <job-id> <subtask-id>
--pick <option> | --abort
Resolve an escalation row.
sno orchestrator review <job-id>
Print the final acceptance report (verifier output).
sno orchestrator metrics
[--aggregate | --by-profile | --by-model | --by-day]
[--drift-map | --doc-coherence | --export <path>]
Telemetry-hub dashboards over the aggregator tables.
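The --budget flag of the submit command packs three limits into one string. The sketch below shows one plausible reading of the '$X,Ymin,Zturns' shape, assuming X is a USD amount, Y a number of minutes, and Z a number of agent turns; the JobBudget type and parse_budget function are hypothetical, and the real parser may accept other forms.

```rust
/// Hypothetical parsed form of the budget string.
#[derive(Debug, PartialEq)]
pub struct JobBudget {
    pub usd: f64,
    pub minutes: u64,
    pub turns: u64,
}

/// Parse a string shaped like "$5,30min,40turns".
/// Returns None if any part is missing, malformed, or if extra parts follow.
pub fn parse_budget(raw: &str) -> Option<JobBudget> {
    let mut parts = raw.split(',').map(str::trim);
    let usd: f64 = parts.next()?.strip_prefix('$')?.parse().ok()?;
    let minutes: u64 = parts.next()?.strip_suffix("min")?.parse().ok()?;
    let turns: u64 = parts.next()?.strip_suffix("turns")?.parse().ok()?;
    // Reject trailing parts and non-finite or negative dollar amounts.
    if parts.next().is_some() || !usd.is_finite() || usd < 0.0 {
        return None;
    }
    Some(JobBudget { usd, minutes, turns })
}
```

The single quotes in the CLI synopsis matter in a shell: without them, $X would be expanded as a shell variable before sno ever sees it.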
Status
Beta. 17 modules, ~7000 LOC, 287 unit tests + 14 integration tests = 301 tests, 0 warnings. SQLite store with 19 tables — including dedicated audit tables for approvals, rollback, deploy, and the escalations inbox.
What is production-ready today: dispatch loop, budget caps, approval gates, snapshots and rollback, drift gate, integration tests, Tier-1 + Tier-2 log classifier, HTTP API, CLI, telemetry aggregation.
What is stubbed (and the architecture doc is explicit about it): Tier-3 LLM log classifier, planner LLM call (falls back to a canned 4-subtask DAG), reviewer LLM agents. These are seams the project expects to fill in; they don't block the gates above.
Read the full architecture document
This page is the introduction. The full reference — data flow walk-throughs, state machines, full SQL schema, concurrency model, security model, extension points — lives in the Synoema repo:
- docs/architecture/orchestrator.md on GitHub: 705 lines, 9 sections.
- CLI Reference: the rest of the sno command surface.
- Architecture: the compilation pipeline the orchestrator dispatches against.
- AI Agent: the per-container worker (synoema-agent) the orchestrator runs inside each Docker container.