Declarative agent orchestration. Ship while you sleep.

Describe a goal. Bernstein decomposes it, spawns short-lived coding agents in parallel, verifies the output, and commits the results. Works with Claude Code, Codex, Gemini CLI, Qwen — or any future CLI agent.

Think of it as what Kubernetes did for containers, but for AI coding agents. You declare a goal. The control plane decomposes it into tasks. Short-lived agents execute them in isolated worktrees — like pods. A janitor verifies the output before anything lands.

Built for developers who want the work done, not the process managed

Bernstein handles parallelism, model selection, retries, and verification. You write the goal.

Solo developers

Queue up a backlog, go to sleep, wake up to passing tests and a clean git history. Bernstein runs until the queue is empty or your budget runs out.

Small teams

Multiple agents working in parallel on isolated tasks, each with its own role, model, and verification signal. No manual coordination needed.

Platform engineers

Run bernstein --evolve and watch it analyse metrics, generate improvement proposals, validate them in a sandbox, and apply safe ones automatically.

Enterprise teams

Use Claude, Codex, Gemini, or Qwen — mix models in the same run. The orchestrator is deterministic Python, not an LLM. Switch providers without rewriting anything.

Short-lived agents, file-based state, deterministic orchestration

Agents aren't long-lived workers. They spawn fresh, do 1-3 tasks, and exit. No idle loops, no context rot. All state lives in .sdd/ files — git-native, inspectable, reproducible.

  1. Describe a goal: Pass a plain-English goal or a bernstein.yaml config. Multi-repo workspaces supported.
  2. Manager decomposes: An Opus manager breaks the goal into typed tasks with roles, scopes, dependencies, and completion signals.
  3. Router selects models: Each task gets the right model and effort level. Opus for architecture, Sonnet for implementation, free-tier for simple fixes. Cost-aware by default.
  4. Agents work in parallel: Fresh CLI agents spawn, work in isolated git worktrees, and exit. Dependency-blocked tasks wait automatically.
  5. Janitor verifies, loop continues: Before marking done, the janitor checks completion signals: file exists, tests pass, API responds. Failed tasks get retried or escalated.
architecture
bernstein run

Task Server (FastAPI :8052)
├── POST /tasks                  create task
├── POST /tasks/{id}/complete
├── GET  /status                 dashboard data
└── POST /bulletin               agent messages

Orchestrator (deterministic Python)
├── Model router    opus / sonnet / haiku
├── CLI adapter     claude / codex / gemini / qwen
├── Multi-cell      parallel agent groups
└── Evolution       self-improvement loop

Janitor (verification + metrics)
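As a rough illustration of the task server in the diagram, here is a minimal sketch that builds (but does not send) a POST /tasks request. The port and path come from the diagram; the host `localhost` and the JSON fields `title` and `role` are assumptions made for this example, not the documented schema.

```python
import json
import urllib.request

# Port 8052 is taken from the diagram; the host is an assumption.
TASK_SERVER = "http://localhost:8052"


def build_create_task_request(title: str, role: str) -> urllib.request.Request:
    """Build a POST /tasks request without sending it.

    The payload fields are hypothetical; check the real schema before use.
    """
    payload = json.dumps({"title": title, "role": role}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TASK_SERVER}/tasks",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_task_request("Add pagination to the users API", "backend")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would submit it to a running server; the sketch stops short of that so it can run offline.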

Everything a multi-agent system needs

No hand-waving. Each feature is a concrete, observable behaviour you can inspect in code.

Orchestration

Parallel agents

Multiple agents run simultaneously on independent tasks in isolated git worktrees. Dependencies resolve automatically — no manual coordination.
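One way to picture automatic dependency resolution is to group tasks into waves, where every task in a wave has all of its dependencies finished. The sketch below uses Python's standard `graphlib` module; the task names and the `plan_waves` helper are hypothetical and are not taken from Bernstein's code.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no unmet dependencies
        waves.append(ready)
        sorter.done(*ready)  # marking them done unblocks their dependents
    return waves


# Hypothetical task graph: each key depends on the tasks in its set.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "docs": set(),
    "tests": {"api"},
}
waves = plan_waves(tasks)
print(waves)
```

Here `schema` and `docs` land in the first wave, `api` waits for `schema`, and `tests` waits for `api`.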

Smart model routing

Opus for architecture and security. Sonnet for implementation. Free-tier for formatting. The router picks based on task scope, complexity, and role — not a flat config.
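The routing idea can be sketched as a small rule function. This is an illustration under stated assumptions only: the `Task` fields, the role names, and the rules themselves are hypothetical, and the real router may weigh scope, complexity, and role differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    role: str        # hypothetical roles: "architect", "security", "backend", "formatter"
    complexity: str  # hypothetical levels: "low", "medium", "high"


def pick_model(task: Task) -> str:
    """Return a model tier for a task. The rules are illustrative only."""
    if task.role in {"architect", "security"} or task.complexity == "high":
        return "opus"
    if task.role == "formatter" and task.complexity == "low":
        return "free-tier"
    return "sonnet"


print(pick_model(Task(role="security", complexity="medium")))
print(pick_model(Task(role="backend", complexity="medium")))
print(pick_model(Task(role="formatter", complexity="low")))
```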

Multi-cell orchestration

Organise agents into cells — self-contained teams with a manager and workers. A VP agent coordinates across cells for large-scale projects.

Reliability

Built-in verification

Tasks define completion signals (path_exists, test_passes, api_responds). The janitor verifies them — agents can't self-report done.
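To make the idea concrete, here is a minimal sketch of a verifier that checks two of the named signals. Only the signal names `path_exists` and `test_passes` come from the text above; the function names, the `(name, argument)` signal format, and the rule that exit code 0 means a passing test run are assumptions for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def check_path_exists(path: str) -> bool:
    """Signal passes if the file or directory exists."""
    return Path(path).exists()


def check_test_passes(command: list[str]) -> bool:
    """Signal passes if the command exits with code 0."""
    return subprocess.run(command, capture_output=True).returncode == 0


# Hypothetical registry mapping signal names to checker functions.
CHECKS = {"path_exists": check_path_exists, "test_passes": check_test_passes}


def verify(signals: list[tuple[str, object]]) -> bool:
    """A task counts as done only if every completion signal checks out."""
    return all(CHECKS[name](arg) for name, arg in signals)


with tempfile.TemporaryDirectory() as tmp:
    report = Path(tmp) / "report.md"
    report.write_text("done")
    ok = verify([
        ("path_exists", str(report)),
        ("test_passes", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ])
print(ok)
```

The point of the design is that the verdict comes from the checks, not from the agent's own claim of completion.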

File-based state

All state lives in .sdd/ files. Git-native, inspectable, reproducible. No database is required for a single instance. State survives crashes and restarts.

Cost budgeting

Set a dollar cap with --budget 5.00. The tracker warns at 80%, alerts at 95%, and hard-stops agents at 100%. No more runaway API bills.
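The thresholds above can be expressed as a small function. This is a sketch, not the tracker's actual code; the `budget_state` name and the returned labels are hypothetical, while the 80%, 95%, and 100% cut-offs come from the text.

```python
def budget_state(spent: float, cap: float) -> str:
    """Map spend against a dollar cap to one of: ok, warn, alert, stop."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    ratio = spent / cap
    if ratio >= 1.0:
        return "stop"   # hard-stop agents at 100%
    if ratio >= 0.95:
        return "alert"  # alert at 95%
    if ratio >= 0.80:
        return "warn"   # warn at 80%
    return "ok"


for spent in (1.00, 4.00, 4.75, 5.00):
    print(f"${spent:.2f} of $5.00 -> {budget_state(spent, 5.00)}")
```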

Extensibility

Any CLI agent

Claude Code, Codex, Gemini CLI, Qwen — all supported via thin adapters. The orchestrator is model-agnostic. Mix models in the same run. No vendor lock-in.

Agent catalogs

Load specialist agents from YAML/Markdown catalogs. Sync from remote marketplaces. Each agent brings its own role prompt, model preferences, and capabilities.

MCP tool access

Agents can use MCP servers (stdio/SSE) for external tools — GitHub, filesystem, web search. The orchestrator manages MCP server lifecycle automatically.

Governance & Compliance

HMAC-chained audit log

Every orchestrator decision is logged in a tamper-evident, HMAC-chained JSONL file. The log rotates daily, and bernstein audit verify-hmac validates chain integrity. SOC2 Type II ready.
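For readers unfamiliar with HMAC chaining, the following sketch shows the general technique: each entry's MAC covers the previous entry's MAC, so editing any record invalidates everything after it. The entry layout, the genesis value, and the function names are assumptions for illustration and do not describe Bernstein's on-disk format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry


def _mac(key: bytes, prev: str, event: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus the canonical event JSON."""
    body = prev + json.dumps(event, sort_keys=True)
    return hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()


def append(log: list[dict], key: bytes, event: dict) -> None:
    """Append an event, linking it to the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    log.append({"event": event, "prev": prev, "mac": _mac(key, prev, event)})


def verify_chain(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC in order; any edit breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = _mac(key, prev, entry["event"])
        if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key"  # illustrative only; a real key would be kept secret
log: list[dict] = []
append(log, key, {"decision": "spawn", "task": "t1"})
append(log, key, {"decision": "complete", "task": "t1"})
intact = verify_chain(log, key)

log[0]["event"]["task"] = "t2"  # tamper with the first record
tampered = verify_chain(log, key)
print(intact, tampered)
```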

Execution WAL

Hash-chained write-ahead log for crash recovery and determinism proof. bernstein verify --determinism produces an execution fingerprint proving two runs made identical decisions.
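An execution fingerprint of this kind can be pictured as a running hash over the ordered list of decisions: two runs that made the same decisions in the same order end with the same digest. The sketch below is an assumption about the general approach, with a hypothetical `fingerprint` function and record shape; it is not the output format of bernstein verify --determinism.

```python
import hashlib
import json


def fingerprint(decisions: list[dict]) -> str:
    """Fold each decision into a running SHA-256 hash chain."""
    head = b"\x00" * 32  # hypothetical fixed starting value
    for decision in decisions:
        record = json.dumps(decision, sort_keys=True).encode("utf-8")
        head = hashlib.sha256(head + record).digest()
    return head.hex()


run_a = [{"task": "t1", "model": "sonnet"}, {"task": "t2", "model": "opus"}]
run_b = [{"model": "sonnet", "task": "t1"}, {"model": "opus", "task": "t2"}]
run_c = list(reversed(run_a))

print(fingerprint(run_a) == fingerprint(run_b))  # same decisions, same order
print(fingerprint(run_a) == fingerprint(run_c))  # same decisions, different order
```

Sorting keys before hashing makes the digest independent of dictionary key order, while the chaining keeps it sensitive to the order of decisions.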

CI autofix pipeline

bernstein ci fix <url> parses a failing GitHub Actions run, creates a fix task, and opens a PR. bernstein ci watch monitors continuously and auto-fixes.

Operations

Self-evolution

Run --evolve and Bernstein analyses its own metrics, generates improvements via a visionary/analyst pipeline, validates in a sandbox, and applies safe ones automatically.

Live dashboard

Real-time TUI shows agent status, task progress, activity feed, sparkline, and cost tracking. Chat input lets you add tasks mid-run without leaving the terminal.

Comparative benchmarks

bernstein benchmark compare runs the same tasks single-agent vs orchestrated and produces a markdown report with wall time, cost, success rate, and token usage.

How Bernstein differs

Purpose-built for CLI coding agents and file-based codebases — not general LLM pipelines or locked-in SDKs.

Capability Bernstein CrewAI Ruflo AutoGen LangGraph
Model-agnostic (any CLI agent) Partial
Short-lived agents (no idle loop)
File-based state (.sdd/)
Self-evolution loop
Completion signal verification
HMAC-chained audit trail
Execution WAL + determinism proof
CI autofix (detect failure → fix → PR)
Deterministic orchestrator (not LLM)
REST task server API
Agent marketplace / catalog Partial
A2A protocol (agent federation)
Per-run cost budgeting
GitHub Action (CI integration)
No vendor lock-in

Up in two commands

# Install (Python 3.12+)
pipx install bernstein   # or: uv tool install bernstein

# Run with a plain-English goal
bernstein -g "Add pagination to the users API endpoint"

# Or run in self-evolution mode
bernstein run --evolve --max-cycles 10

Requires a CLI agent installed (Claude Code, Codex, Gemini CLI, or Qwen). Bernstein auto-detects what's available.

Get started

Open-source, Apache 2.0. Works with the CLI agents you already have.