Describe a goal. Bernstein decomposes it, spawns short-lived coding agents in parallel, verifies the output, and commits the results. Works with Claude Code, Codex, Gemini CLI, Qwen — or any future CLI agent.
Think of it as what Kubernetes did for containers, but for AI coding agents. You declare a goal. The control plane decomposes it into tasks. Short-lived agents execute them in isolated worktrees — like pods. A janitor verifies the output before anything lands.
Bernstein handles parallelism, model selection, retries, and verification. You write the goal.
Queue up a backlog, go to sleep, wake up to passing tests and a clean git history. Bernstein runs until the queue is empty or your budget runs out.
Multiple agents working in parallel on isolated tasks, each with its own role, model, and verification signal. No manual coordination needed.
Run bernstein --evolve and watch it analyse metrics, generate improvement proposals, validate them in a sandbox, and apply safe ones automatically.
Use Claude, Codex, Gemini, or Qwen — mix models in the same run. The orchestrator is deterministic Python, not an LLM. Switch providers without rewriting anything.
Agents aren't long-lived workers. They spawn fresh, do 1-3 tasks, and exit. No idle loops, no context rot. All state lives in .sdd/ files — git-native, inspectable, reproducible.
Configure runs in bernstein.yaml. Multi-repo workspaces are supported.
No hand-waving. Each feature is a concrete, observable behaviour you can inspect in code.
Multiple agents run simultaneously on independent tasks in isolated git worktrees. Dependencies resolve automatically — no manual coordination.
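One way to picture automatic dependency resolution is as a topological sort that emits batches of tasks whose prerequisites are already done. The sketch below is illustrative only: the task names and the `parallel_batches` helper are hypothetical and are not taken from Bernstein's code.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches; each batch only contains tasks whose
    dependencies were completed in earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


# Hypothetical task graph: each key maps to the tasks it depends on.
tasks = {
    "schema": set(),
    "api": {"schema"},
    "docs": set(),
    "tests": {"api"},
}
print(parallel_batches(tasks))
# [['docs', 'schema'], ['api'], ['tests']]
```

Every task inside one batch is independent of the others in that batch, so each could be handed to a separate agent in its own worktree.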
Opus for architecture and security. Sonnet for implementation. Free-tier for formatting. The router picks based on task scope, complexity, and role — not a flat config.
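A rule-based router of this kind might look roughly like the following. This is a minimal sketch under stated assumptions: the `Task` fields, the role names, and the returned model labels are all hypothetical, and the real routing rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    role: str        # hypothetical roles: "architect", "security", "backend", "formatter"
    complexity: str  # "low", "medium", or "high"
    scope: str       # "small", "medium", or "large"


def pick_model(task: Task) -> str:
    """Pick a model tier from role, complexity, and scope (illustrative rules)."""
    # Architecture, security, and anything large or hard goes to the strongest tier.
    if task.role in {"architect", "security"}:
        return "opus"
    if task.complexity == "high" or task.scope == "large":
        return "opus"
    # Cheap mechanical work such as formatting goes to a free tier.
    if task.role == "formatter" and task.complexity == "low":
        return "free-tier"
    # Everything else is ordinary implementation work.
    return "sonnet"
```

Because the rules are plain Python rather than an LLM call, the same task always routes to the same tier.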
Organise agents into cells — self-contained teams with a manager and workers. A VP agent coordinates across cells for large-scale projects.
Tasks define completion signals (path_exists, test_passes, api_responds). The janitor verifies them — agents can't self-report done.
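The three signal names above come from the feature description; everything else in this sketch is an assumption, including the signal dictionary shape and the `check_*` and `verify` helpers. It shows how a verifier could check signals independently of whatever the agent reports.

```python
import subprocess
import sys
import urllib.request
from pathlib import Path


def check_path_exists(path: str) -> bool:
    return Path(path).exists()


def check_test_passes(command: list[str]) -> bool:
    # A command "passes" when it exits with status 0.
    return subprocess.run(command, capture_output=True).returncode == 0


def check_api_responds(url: str, timeout: float = 5.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):  # network errors, HTTP errors, malformed URLs
        return False


# Hypothetical mapping from signal type to checker.
CHECKS = {
    "path_exists": check_path_exists,
    "test_passes": check_test_passes,
    "api_responds": check_api_responds,
}


def verify(signals: list[dict]) -> bool:
    """A task counts as done only if every declared signal checks out."""
    return all(CHECKS[s["type"]](s["value"]) for s in signals)
```

A task could then declare, for example, `{"type": "test_passes", "value": [sys.executable, "-m", "pytest", "-q"]}`, and the verifier would run the command itself rather than trusting the agent's claim.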
All state lives in .sdd/ files. Git-native, inspectable, reproducible. No database required for single-instance. State survives crashes and restarts.
Set a dollar cap with --budget 5.00. The tracker warns at 80%, alerts at 95%, and hard-stops agents at 100%. No more runaway API bills.
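The thresholds can be read as a simple mapping from spend ratio to state. The function below is a sketch of that arithmetic only; the name `budget_state` and the returned labels are hypothetical.

```python
def budget_state(spent: float, cap: float) -> str:
    """Map current spend to a budget state using the 80% / 95% / 100% thresholds."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    ratio = spent / cap
    if ratio >= 1.00:
        return "stop"   # hard stop: no further agent work
    if ratio >= 0.95:
        return "alert"
    if ratio >= 0.80:
        return "warn"
    return "ok"
```

With `--budget 5.00`, this puts the warning at $4.00 spent, the alert at $4.75, and the hard stop at $5.00.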
Claude Code, Codex, Gemini CLI, Qwen — all supported via thin adapters. The orchestrator is model-agnostic. Mix models in the same run. No vendor lock-in.
Load specialist agents from YAML/Markdown catalogs. Sync from remote marketplaces. Each agent brings its own role prompt, model preferences, and capabilities.
Agents can use MCP servers (stdio/SSE) for external tools — GitHub, filesystem, web search. The orchestrator manages MCP server lifecycle automatically.
Every orchestrator decision is logged in a tamper-evident, HMAC-chained JSONL file. Logs rotate daily, and bernstein audit verify-hmac validates chain integrity. SOC2 Type II ready.
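To illustrate the general idea of an HMAC-chained JSONL log, here is a minimal sketch in which each entry's MAC covers the previous entry's MAC plus the current event. The record layout (`event`, `prev`, `hmac`) and the function names are assumptions for illustration, not Bernstein's actual file format.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hmac.new(key, (prev_mac + payload).encode(), hashlib.sha256).hexdigest()


def append_entry(log: list[str], key: bytes, event: dict) -> None:
    """Append one JSONL line whose MAC is chained to the previous line."""
    prev_mac = json.loads(log[-1])["hmac"] if log else ""
    entry = {"event": event, "prev": prev_mac, "hmac": _mac(key, prev_mac, event)}
    log.append(json.dumps(entry, sort_keys=True))


def verify_chain(log: list[str], key: bytes) -> bool:
    """Recompute every MAC in order; any edit, insertion, or deletion breaks the chain."""
    prev_mac = ""
    for line in log:
        entry = json.loads(line)
        expected = _mac(key, prev_mac, entry["event"])
        if entry["prev"] != prev_mac or not hmac.compare_digest(entry["hmac"], expected):
            return False
        prev_mac = entry["hmac"]
    return True
```

Since each MAC depends on the one before it, changing an early entry invalidates every later entry, and forging a consistent chain requires the secret key.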
Hash-chained write-ahead log for crash recovery and determinism proof. bernstein verify --determinism produces an execution fingerprint proving two runs made identical decisions.
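One plausible way to derive an execution fingerprint is to fold every logged decision into a running hash, so that two runs match only if they made the same decisions in the same order. This is a sketch of that general technique; the `execution_fingerprint` function and the decision record shape are hypothetical.

```python
import hashlib
import json


def execution_fingerprint(decisions: list[dict]) -> str:
    """Fold an ordered list of decisions into a single SHA-256 hex digest."""
    digest = hashlib.sha256(b"genesis").hexdigest()  # fixed starting point
    for decision in decisions:
        # Canonical JSON so key order inside a record does not affect the hash.
        record = json.dumps(decision, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256((digest + record).encode()).hexdigest()
    return digest


run_a = [{"task": "t1", "model": "sonnet"}, {"task": "t2", "model": "opus"}]
run_b = [{"model": "sonnet", "task": "t1"}, {"model": "opus", "task": "t2"}]
print(execution_fingerprint(run_a) == execution_fingerprint(run_b))  # True
```

Reordering the decisions, or changing any field in any one of them, yields a different fingerprint.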
bernstein ci fix <url> parses a failing GitHub Actions run, creates a fix task, and opens a PR. bernstein ci watch monitors continuously and auto-fixes.
Run --evolve and Bernstein analyses its own metrics, generates improvements via a visionary/analyst pipeline, validates in a sandbox, and applies safe ones automatically.
Real-time TUI shows agent status, task progress, activity feed, sparkline, and cost tracking. Chat input lets you add tasks mid-run without leaving the terminal.
bernstein benchmark compare runs the same tasks single-agent vs orchestrated and produces a markdown report with wall time, cost, success rate, and token usage.
Purpose-built for CLI coding agents and file-based codebases — not general LLM pipelines or locked-in SDKs.
| Capability | Bernstein | CrewAI | Ruflo | AutoGen | LangGraph |
|---|---|---|---|---|---|
| Model-agnostic (any CLI agent) | ✓ | — | — | Partial | — |
| Short-lived agents (no idle loop) | ✓ | — | — | — | — |
| File-based state (.sdd/) | ✓ | — | — | — | — |
| Self-evolution loop | ✓ | — | ✓ | — | — |
| Completion signal verification | ✓ | — | — | — | — |
| HMAC-chained audit trail | ✓ | — | — | — | — |
| Execution WAL + determinism proof | ✓ | — | — | — | — |
| CI autofix (detect failure → fix → PR) | ✓ | — | — | — | — |
| Deterministic orchestrator (not LLM) | ✓ | — | — | — | ✓ |
| REST task server API | ✓ | — | — | — | — |
| Agent marketplace / catalog | ✓ | Partial | — | — | — |
| A2A protocol (agent federation) | ✓ | — | — | — | — |
| Per-run cost budgeting | ✓ | — | — | — | — |
| GitHub Action (CI integration) | ✓ | — | — | — | — |
| No vendor lock-in | ✓ | ✓ | — | ✓ | ✓ |
# Install (Python 3.12+)
pipx install bernstein # or: uv tool install bernstein
# Run with a plain-English goal
bernstein -g "Add pagination to the users API endpoint"
# Or run in self-evolution mode
bernstein run --evolve --max-cycles 10
Requires a CLI agent installed (Claude Code, Codex, Gemini CLI, or Qwen). Bernstein auto-detects what's available.
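Auto-detection of installed agents can be approximated by looking for each agent's executable on `PATH`. The sketch below assumes the binary names `claude`, `codex`, `gemini`, and `qwen`; those names and the `detect_agents` helper are assumptions, and Bernstein's own detection logic may work differently.

```python
import shutil
from typing import Callable, Optional

# Assumed executable names for each supported agent (not confirmed by the source).
CANDIDATES = {
    "Claude Code": "claude",
    "Codex": "codex",
    "Gemini CLI": "gemini",
    "Qwen": "qwen",
}


def detect_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the agents whose executable can be found on PATH."""
    return [name for name, binary in CANDIDATES.items() if which(binary)]


if __name__ == "__main__":
    print(detect_agents())
```

The `which` parameter defaults to `shutil.which` and exists only so the lookup can be swapped out in tests.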
Open-source, Apache 2.0. Works with the CLI agents you already have.