Bernstein — Declarative Agent Orchestration

Who it's for

Built for developers who want the work done, not the process managed

Bernstein handles parallelism, model selection, retries, and verification. You write the goal.

Solo developers

Queue up a backlog, go to sleep, wake up to passing tests and a clean git history. Bernstein runs until the queue is empty or your budget runs out.

Small teams

Multiple agents working in parallel on isolated tasks, each with its own role, model, and verification signal. No manual coordination needed.

Platform engineers

Run bernstein --evolve and watch it analyse metrics, generate improvement proposals, validate them in a sandbox, and apply safe ones automatically.

Enterprise teams

Use Claude, Codex, Gemini, or Qwen — mix models in the same run. The orchestrator is deterministic Python, not an LLM. Switch providers without rewriting anything.

How it works

Short-lived agents, file-based state, deterministic orchestration

Agents aren't long-lived workers. They spawn fresh, do 1-3 tasks, and exit. No idle loops, no context rot. All state lives in .sdd/ files — git-native, inspectable, reproducible.

1. Describe a goal. Pass a plain-English goal or a bernstein.yaml config. Multi-repo workspaces are supported.
2. Manager decomposes. An Opus manager breaks the goal into typed tasks with roles, scopes, dependencies, and completion signals.
3. Router selects models. Each task gets the right model and effort level: Opus for architecture, Sonnet for implementation, free-tier for simple fixes. Cost-aware by default.
4. Agents work in parallel. Fresh CLI agents spawn, work in isolated git worktrees, and exit. Dependency-blocked tasks wait automatically.
5. Janitor verifies, loop continues. Before marking a task done, the janitor checks its completion signals: file exists, tests pass, API responds. Failed tasks get retried or escalated.
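The steps above can be pictured with a small data model. This is a minimal sketch, not Bernstein's actual schema: the `Task` field names, the `next_action` helper, and the `max_retries` default are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical shape of a typed task; every field name is illustrative."""
    id: str
    role: str                                             # e.g. "backend", "qa"
    scope: str                                            # e.g. "small", "large"
    depends_on: list[str] = field(default_factory=list)   # ids of blocking tasks
    signals: list[str] = field(default_factory=list)      # completion signals
    attempts: int = 0                                      # failed attempts so far


def next_action(task: Task, signals_passed: bool, max_retries: int = 2) -> str:
    """Decide what happens after an agent exits (assumed retry policy)."""
    if signals_passed:
        return "done"
    task.attempts += 1
    if task.attempts <= max_retries:
        return "retry"
    return "escalate"
```

The point of the sketch is that "done" is decided from the verification result, never from what the agent reports about itself.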

architecture

bernstein run
    │
    ▼
Task Server (FastAPI :8052)
    │
    ├── POST /tasks                  create task
    ├── POST /tasks/{id}/complete
    ├── GET  /status                 dashboard data
    └── POST /bulletin               agent messages
    │
    ▼
Orchestrator (deterministic Python)
    │
    ├── Model router    opus / sonnet / haiku
    ├── CLI adapter     claude / codex / gemini / qwen
    ├── Multi-cell      parallel agent groups
    └── Evolution       self-improvement loop
    │
    ▼
Janitor (verification + metrics)
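As a rough illustration of how a script might address the Task Server routes in the diagram, the sketch below only builds request objects and sends nothing. The paths and port come from the diagram; the host, the JSON payload fields (`title`, `role`), and the helper names are assumptions, since the real request schema is not described here.

```python
import json
from urllib.request import Request

# Port 8052 is taken from the diagram; "localhost" is an assumption.
BASE_URL = "http://localhost:8052"
JSON_HEADERS = {"Content-Type": "application/json"}


def create_task_request(title: str, role: str) -> Request:
    """Build POST /tasks. The payload fields are hypothetical."""
    body = json.dumps({"title": title, "role": role}).encode("utf-8")
    return Request(f"{BASE_URL}/tasks", data=body, headers=JSON_HEADERS, method="POST")


def complete_task_request(task_id: str) -> Request:
    """Build POST /tasks/{id}/complete with an empty JSON body (assumed)."""
    return Request(
        f"{BASE_URL}/tasks/{task_id}/complete",
        data=b"{}",
        headers=JSON_HEADERS,
        method="POST",
    )


def status_request() -> Request:
    """Build GET /status, which the diagram labels as dashboard data."""
    return Request(f"{BASE_URL}/status", method="GET")
```

With a server running, each object could be passed to `urllib.request.urlopen`; check the real API schema before relying on any field name used here.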

Features

Everything a multi-agent system needs

No hand-waving. Each feature is a concrete, observable behaviour you can inspect in code.

Orchestration

Parallel agents

Multiple agents run simultaneously on independent tasks in isolated git worktrees. Dependencies resolve automatically — no manual coordination.
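One common way to resolve dependencies like this is a topological sort that releases tasks in waves. The sketch below uses Python's standard `graphlib`; it is an illustration of the idea, not Bernstein's scheduler, and the `parallel_batches` name and the dictionary input format are assumptions.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all its dependencies done.

    `dependencies` maps a task id to the set of task ids it waits for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that are unblocked right now
        batches.append(ready)
        sorter.done(*ready)                 # pretend the whole wave finished
    return batches
```

Tasks inside one wave are independent of each other, so each could go to its own agent and worktree. A real scheduler would mark tasks done one at a time as agents exit rather than waiting for the full wave.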

Smart model routing

Opus for architecture and security. Sonnet for implementation. Free-tier for formatting. The router picks based on task scope, complexity, and role — not a flat config.
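A rule-based router along these lines might look like the sketch below. The role names, the scope and complexity values, and the exact precedence of the rules are assumptions; only the tier names and their broad purposes come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingInput:
    """Hypothetical routing inputs; the allowed values are illustrative."""
    role: str        # e.g. "architect", "security", "backend", "formatter"
    scope: str       # "small" | "medium" | "large"
    complexity: str  # "low" | "medium" | "high"


def route_model(task: RoutingInput) -> str:
    """Pick a model tier from role, scope, and complexity (assumed rules)."""
    if task.role in {"architect", "security"}:
        return "opus"          # design and security work gets the strongest tier
    if task.role == "formatter" or (task.scope == "small" and task.complexity == "low"):
        return "free-tier"     # formatting and trivial fixes go to the cheapest tier
    return "sonnet"            # everything else is treated as implementation work
```

Because the function is pure and deterministic, the same task description always routes to the same tier, which keeps routing decisions easy to audit.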

Multi-cell orchestration

Organise agents into cells — self-contained teams with a manager and workers. A VP agent coordinates across cells for large-scale projects.

Reliability

Built-in verification

Tasks define completion signals (path_exists, test_passes, api_responds). The janitor verifies them — agents can't self-report done.
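To make the three signal types concrete, here is a minimal sketch of what such checks could look like using only the standard library. The signal names come from the text above; the `check_*` helpers, the `{"type": ..., "value": ...}` signal format, and the `verify` function are assumptions rather than Bernstein's implementation.

```python
import subprocess
import urllib.request
from pathlib import Path


def check_path_exists(path: str) -> bool:
    """Signal passes if the file or directory exists."""
    return Path(path).exists()


def check_test_passes(command: list[str]) -> bool:
    """Signal passes if the test command exits with status 0."""
    return subprocess.run(command, capture_output=True).returncode == 0


def check_api_responds(url: str, timeout: float = 5.0) -> bool:
    """Signal passes if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False


CHECKS = {
    "path_exists": check_path_exists,
    "test_passes": check_test_passes,
    "api_responds": check_api_responds,
}


def verify(signals: list[dict]) -> bool:
    """Return True only if every signal passes; unknown types are an error."""
    for signal in signals:
        check = CHECKS.get(signal["type"])
        if check is None:
            raise ValueError(f"unknown signal type: {signal['type']}")
        if not check(signal["value"]):
            return False
    return True
```

The verifier runs the checks itself, so an agent's claim of success carries no weight on its own.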

File-based state

All state lives in .sdd/ files. Git-native, inspectable, reproducible. No database is required for single-instance deployments. State survives crashes and restarts.
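State files survive crashes only if a write can never leave a half-written file behind. A common technique is to write to a temporary file and rename it into place, sketched below. The one-JSON-file-per-record layout and the `save_state` and `load_state` names are assumptions, not the actual .sdd/ format.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(state_dir: Path, name: str, data: dict) -> Path:
    """Atomically write `data` to <state_dir>/<name>.json (assumed layout)."""
    state_dir.mkdir(parents=True, exist_ok=True)
    target = state_dir / f"{name}.json"
    # Write to a temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=state_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)  # stable output diffs well in git
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, target)  # atomic: readers see the old file or the new one
    return target


def load_state(state_dir: Path, name: str) -> dict:
    """Read a state record back from disk."""
    return json.loads((state_dir / f"{name}.json").read_text(encoding="utf-8"))
```

Sorted keys and fixed indentation keep each change to a record down to a small, readable diff.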

Cost budgeting

Set a dollar cap with --budget 5.00. The tracker warns at 80%, alerts at 95%, and hard-stops agents at 100%. No more runaway API bills.
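The three thresholds reduce to a small pure function. This sketch uses the 80%, 95%, and 100% levels stated above; the `budget_state` name and the returned labels are assumptions.

```python
def budget_state(spent_usd: float, budget_usd: float) -> str:
    """Map spend against the cap to one of: "ok", "warn", "alert", "stop"."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    ratio = spent_usd / budget_usd
    if ratio >= 1.00:
        return "stop"   # hard stop: no further agent work is allowed
    if ratio >= 0.95:
        return "alert"
    if ratio >= 0.80:
        return "warn"
    return "ok"
```

With a 5.00 budget, spend of 4.10 falls in the warning band, 4.80 in the alert band, and anything at or above 5.00 stops the run.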

Extensibility

Any CLI agent

Claude Code, Codex, Gemini CLI, Qwen — all supported via thin adapters. The orchestrator is model-agnostic. Mix models in the same run. No vendor lock-in.

Agent catalogs

Load specialist agents from YAML/Markdown catalogs. Sync from remote marketplaces. Each agent brings its own role prompt, model preferences, and capabilities.

MCP tool access

Agents can use MCP servers (stdio/SSE) for external tools — GitHub, filesystem, web search. The orchestrator manages MCP server lifecycle automatically.

Governance & Compliance

HMAC-chained audit log

Every orchestrator decision is logged in a tamper-evident, HMAC-chained JSONL file. The log rotates daily, and bernstein audit verify-hmac validates chain integrity. Suitable as evidence for SOC 2 Type II audits.
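The general idea behind an HMAC-chained log is that each entry's MAC covers both the entry and the previous entry's MAC, so editing, deleting, or reordering any line breaks every MAC after it. The sketch below illustrates that technique with the standard library. The entry layout, the all-zero genesis value, and the function names are assumptions and do not describe Bernstein's file format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "previous MAC"


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus the canonical JSON of the event."""
    payload = prev_mac.encode("utf-8") + json.dumps(event, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[str], key: bytes, event: dict) -> None:
    """Append one JSONL line whose MAC chains to the previous line."""
    prev_mac = json.loads(log[-1])["mac"] if log else GENESIS
    entry = {"event": event, "mac": _mac(key, prev_mac, event)}
    log.append(json.dumps(entry, sort_keys=True))


def verify_chain(log: list[str], key: bytes) -> bool:
    """Recompute every MAC in order; any edit, deletion, or reorder fails."""
    prev_mac = GENESIS
    for line in log:
        entry = json.loads(line)
        expected = _mac(key, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Unlike a plain hash chain, the keyed MAC means someone who can edit the file but does not hold the key cannot recompute a valid chain.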

Execution WAL

Hash-chained write-ahead log for crash recovery and determinism proof. bernstein verify --determinism produces an execution fingerprint proving two runs made identical decisions.
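One way to fingerprint a run is to fold every recorded decision into a running hash, so two runs match only if they made the same decisions in the same order. The sketch below shows that idea; the decision record shape, the seed value, and the `execution_fingerprint` name are assumptions, not the actual WAL format.

```python
import hashlib
import json


def execution_fingerprint(decisions: list[dict]) -> str:
    """Fold an ordered list of decisions into a single SHA-256 hex digest."""
    digest = hashlib.sha256(b"genesis").hexdigest()  # assumed seed value
    for decision in decisions:
        # Canonical JSON so that dict key order does not affect the result.
        record = json.dumps(decision, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((digest + record).encode("utf-8")).hexdigest()
    return digest
```

Comparing two fingerprints is then a string equality check. Fields that legitimately differ between runs, such as timestamps, would need to be left out of the decision records for the comparison to be meaningful.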

CI autofix pipeline

bernstein ci fix <url> parses a failing GitHub Actions run, creates a fix task, and opens a PR. bernstein ci watch monitors runs continuously and fixes new failures as they appear.

Operations

Self-evolution

Run bernstein --evolve and Bernstein analyses its own metrics, generates improvements via a visionary/analyst pipeline, validates them in a sandbox, and applies the safe ones automatically.

Live dashboard

Real-time TUI shows agent status, task progress, activity feed, sparkline, and cost tracking. Chat input lets you add tasks mid-run without leaving the terminal.

Comparative benchmarks

bernstein benchmark compare runs the same tasks single-agent vs orchestrated and produces a markdown report with wall time, cost, success rate, and token usage.
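To show the shape such a report could take, the sketch below renders the four metrics named above as a markdown table. The `RunStats` fields, the column titles, and the number formats are assumptions; the real command's output may be laid out differently.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-mode totals collected from a benchmark run."""
    label: str            # e.g. "single-agent" or "orchestrated"
    wall_seconds: float
    cost_usd: float
    tasks_total: int
    tasks_passed: int
    tokens: int

    @property
    def success_rate(self) -> float:
        """Fraction of tasks that passed; 0.0 when no tasks were run."""
        return self.tasks_passed / self.tasks_total if self.tasks_total else 0.0


def render_report(runs: list[RunStats]) -> str:
    """Render one markdown table row per run."""
    lines = [
        "| Mode | Wall time (s) | Cost (USD) | Success rate | Tokens |",
        "| --- | ---: | ---: | ---: | ---: |",
    ]
    for run in runs:
        lines.append(
            f"| {run.label} | {run.wall_seconds:.1f} | {run.cost_usd:.2f} "
            f"| {run.success_rate:.0%} | {run.tokens} |"
        )
    return "\n".join(lines)
```

Keeping both modes in one table makes the trade-off visible at a glance: orchestration is worth it only if the wall-time or success-rate gain justifies any extra cost and token usage.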

Comparison

How Bernstein differs

Purpose-built for CLI coding agents and file-based codebases — not general LLM pipelines or locked-in SDKs.

Capability	Bernstein	CrewAI	Ruflo	AutoGen	LangGraph
Model-agnostic (any CLI agent)	✓	—	—	Partial	—
Short-lived agents (no idle loop)	✓	—	—	—	—
File-based state (.sdd/)	✓	—	—	—	—
Self-evolution loop	✓	—	✓	—	—
Completion signal verification	✓	—	—	—	—
HMAC-chained audit trail	✓	—	—	—	—
Execution WAL + determinism proof	✓	—	—	—	—
CI autofix (detect failure → fix → PR)	✓	—	—	—	—
Deterministic orchestrator (not LLM)	✓	—	—	—	✓
REST task server API	✓	—	—	—	—
Agent marketplace / catalog	✓	Partial	—	—	—
A2A protocol (agent federation)	✓	—	—	—	—
Per-run cost budgeting	✓	—	—	—	—
GitHub Action (CI integration)	✓	—	—	—	—
No vendor lock-in	✓	✓	—	✓	✓

Declarative agent orchestration. Ship while you sleep.