Installation
Requires Python 3.12+ and at least one CLI coding agent (Claude Code, Codex CLI, Cursor, Gemini CLI, or Qwen).
# Install from PyPI
pipx install bernstein # or: uv tool install bernstein
# Verify
bernstein --version
Bernstein itself is pure Python. The CLI agents it spawns (Claude Code, Codex, etc.) must be installed and authenticated separately. Bernstein auto-detects which agents are available.
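To check your own setup before a run, you can look for the agent executables on your PATH. Below is a minimal sketch. The binary names (`claude`, `codex`, `cursor-agent`, `gemini`, `qwen`) are assumptions, and `detect_agents` is a hypothetical helper, not Bernstein's detection code:

```python
import shutil
from typing import Callable, Optional

# Assumed executable names for each supported agent CLI (illustrative only).
AGENT_BINARIES = {
    "claude": "Claude Code",
    "codex": "Codex CLI",
    "cursor-agent": "Cursor",
    "gemini": "Gemini CLI",
    "qwen": "Qwen",
}


def detect_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, bool]:
    """Return {agent display name: True if its binary is found on PATH}."""
    return {name: which(binary) is not None for binary, name in AGENT_BINARIES.items()}


if __name__ == "__main__":
    for name, found in detect_agents().items():
        print(f"{name:12} {'found' if found else 'missing'}")
```

The `which` parameter defaults to `shutil.which`. You can swap in a different lookup for testing.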
Development install
If you want to contribute or develop Bernstein itself:
git clone https://github.com/chernistry/bernstein && cd bernstein
uv venv && uv pip install -e ".[dev]"
uv run python scripts/run_tests.py -x
First run
The fastest path is to pass a goal inline:
cd my-project
bernstein init
bernstein -g "Add pagination to the users API endpoint"
Bernstein will:
- Start a task server on localhost:8052
- Inject an initial manager task with your goal
- Spawn a manager agent (Claude Code, Opus, max effort)
- The manager decomposes the goal and queues specialist tasks
- The spawner picks up each task, selects a model and effort level, and launches an agent
- Agents work, commit, report results, and exit
- The system exits when the queue is empty
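The steps above can be modeled as a toy, single-threaded queue loop. The real spawner runs up to max_workers agents in parallel and talks to the task server over HTTP, so treat this only as a sketch of the control flow. `Task` and `run_until_empty` are hypothetical names:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    role: str
    # Subtasks this task produces when it runs (only the manager has any here).
    spawns: list["Task"] = field(default_factory=list)


def run_until_empty(goal: str, plan: list[Task]) -> list[str]:
    """Simulate: manager decomposes the goal, spawner drains the queue, exit when empty."""
    queue = deque([Task(title=f"Plan: {goal}", role="manager", spawns=plan)])
    finished: list[str] = []
    while queue:                      # the system exits when the queue is empty
        task = queue.popleft()        # spawner claims the next open task
        queue.extend(task.spawns)     # manager output becomes specialist tasks
        finished.append(f"{task.role}: {task.title}")  # agent reports and exits
    return finished
```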
Init & workspace
Run bernstein init once per project. It creates .sdd/, the file-based state directory:
.sdd/
  backlog/
    open/        # task files waiting to be claimed
    closed/      # completed and failed task records
  agents/        # per-agent heartbeat and state files
  runtime/       # PID files, server.log, spawner.log
  knowledge/     # project notes injected as context
  decisions/     # architecture decision records
  config.yaml    # port, model defaults, worker limits
Nothing leaves your project directory. .sdd/ is designed to be committed, which makes it your team's shared memory.
Seed file
For repeatable orchestration sessions, use a bernstein.yaml seed file. Bernstein picks it up automatically if it exists in the project root.
goal: "Build a legal RAG system with hybrid retrieval and typed answers"
tasks:
  - title: "Implement vector store"
    role: backend
    priority: 1
    scope: medium
    complexity: medium
  - title: "Add BM25 sparse index"
    role: backend
    priority: 2
    scope: small
    complexity: low
    depends_on: ["TSK-001"]
  - title: "Write integration tests"
    role: qa
    priority: 2
    scope: medium
    complexity: medium
    depends_on: ["TSK-001", "TSK-002"]
# Auto-detects bernstein.yaml
bernstein
# Or point to a specific file
bernstein --seed-file path/to/bernstein.yaml
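The depends_on fields form a dependency graph. You can sanity-check a seed file before running it by computing one valid execution order with Python's standard `graphlib`. This is a minimal sketch. It assumes, without confirmation from these docs, that IDs TSK-001, TSK-002, ... are assigned in file order. It does not show how Bernstein itself schedules tasks:

```python
from graphlib import TopologicalSorter

# The example seed tasks, reduced to titles and dependencies.
seed_tasks = [
    {"title": "Implement vector store", "depends_on": []},
    {"title": "Add BM25 sparse index", "depends_on": ["TSK-001"]},
    {"title": "Write integration tests", "depends_on": ["TSK-001", "TSK-002"]},
]


def execution_order(tasks: list[dict]) -> list[str]:
    """Return task titles in an order that respects every depends_on edge."""
    # Assumption: IDs TSK-001, TSK-002, ... follow the order of the file.
    ids = {f"TSK-{i:03d}": task for i, task in enumerate(tasks, start=1)}
    graph = {tid: set(task.get("depends_on", [])) for tid, task in ids.items()}
    for tid, deps in graph.items():
        missing = deps - set(ids)
        if missing:
            raise ValueError(f"{tid} depends on unknown task(s): {sorted(missing)}")
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return [ids[tid]["title"] for tid in TopologicalSorter(graph).static_order()]
```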
Configuration
.sdd/config.yaml is created by bernstein init with sensible defaults:
| Key | Default | Description |
|---|---|---|
| server_port | 8052 | Task server port |
| max_workers | 6 | Max simultaneous worker agents |
| default_model | sonnet | Default model for workers |
| default_effort | high | Default effort level |
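The defaults behave like a typed record that your config file overrides. Below is a minimal sketch. It assumes the four keys sit at the top level of config.yaml and that the file has already been parsed into a dict, for example with a YAML library. `BernsteinConfig` and `load_config` are hypothetical names:

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BernsteinConfig:
    # Defaults mirror the table above; the class itself is hypothetical.
    server_port: int = 8052
    max_workers: int = 6
    default_model: str = "sonnet"
    default_effort: str = "high"


def load_config(overrides: dict) -> BernsteinConfig:
    """Apply parsed config.yaml values on top of the defaults, rejecting unknown keys."""
    known = {f.name for f in fields(BernsteinConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return BernsteinConfig(**overrides)
```

Rejecting unknown keys catches typos such as `max_worker`, which would otherwise be silently ignored.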
Model routing
The spawner selects the model and effort level automatically from task metadata, so you usually don't need to touch this:
| Task property | Model | Effort |
|---|---|---|
| role=manager | opus | max |
| role=security | opus | max |
| scope=large, complexity=high | opus | high |
| complexity=medium | sonnet | high |
| complexity=low | sonnet | normal |
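Read top to bottom, the table works as a first-match rule list. Here is a minimal sketch of that reading. It assumes rows are checked in order and that a task matching no row (for example complexity=high with a smaller scope) falls back to default_model and default_effort from config.yaml. `route` is a hypothetical function, not Bernstein's spawner code:

```python
def route(
    task: dict, default_model: str = "sonnet", default_effort: str = "high"
) -> tuple[str, str]:
    """Return (model, effort) for a task by checking the routing rows in order."""
    role = task.get("role")
    scope = task.get("scope")
    complexity = task.get("complexity")
    if role in ("manager", "security"):
        return "opus", "max"
    if scope == "large" and complexity == "high":
        return "opus", "high"
    if complexity == "medium":
        return "sonnet", "high"
    if complexity == "low":
        return "sonnet", "normal"
    # Not covered by the table: assume the config defaults apply.
    return default_model, default_effort
```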
Monitoring
Bernstein starts with a live TUI dashboard by default. You can also attach to a running session or query the server directly:
# Attach to running dashboard
bernstein live
# View recent agent logs
bernstein logs
# Dashboard summary via API
curl http://127.0.0.1:8052/status
# All tasks
curl http://127.0.0.1:8052/tasks
# Filter by status
curl "http://127.0.0.1:8052/tasks?status=open"
Inject tasks at runtime
The task server accepts new tasks while it's running:
curl -s -X POST http://127.0.0.1:8052/tasks \
-H "Content-Type: application/json" \
-d '{
"title": "Add rate limiting to the auth endpoint",
"role": "backend",
"description": "Implement token-bucket rate limiting, 100 req/min per IP.",
"priority": 1,
"scope": "small",
"complexity": "low"
}'
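From Python, you can send the same request with the standard library alone. This minimal sketch mirrors the curl call. It uses the field set shown above, the helper names are hypothetical, and the response format is not documented here:

```python
import json
import urllib.request


def build_task_request(
    task: dict, base: str = "http://127.0.0.1:8052"
) -> urllib.request.Request:
    """Build the same POST /tasks request as the curl example."""
    return urllib.request.Request(
        f"{base}/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(task: dict, timeout: float = 5.0) -> str:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_task_request(task), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```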
Stopping
# Graceful stop (waits up to 10s for in-flight agents)
bernstein stop
# Shorter timeout
bernstein stop --timeout 3
Bernstein sends a shutdown signal to the task server, waits for in-flight agents to finish, then cleans up PID files and exits.
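The semantics of a shared deadline like --timeout can be illustrated with `subprocess`. Below is a minimal sketch of the wait-then-terminate pattern. It assumes agents are child processes. `graceful_stop` is a hypothetical helper and does not reproduce how Bernstein signals its task server:

```python
import subprocess
import time


def graceful_stop(
    procs: list[subprocess.Popen], timeout: float = 10.0
) -> list[subprocess.Popen]:
    """Give all processes one shared `timeout`-second window to exit on their own,
    then terminate any still running. Returns the processes that were terminated."""
    deadline = time.monotonic() + timeout
    forced = []
    for proc in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            proc.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()  # last resort if the process ignores SIGTERM
                proc.wait()
            forced.append(proc)
    return forced
```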
Headless mode
Useful for overnight runs, CI pipelines, or remote servers:
bernstein --headless -g "Refactor the auth module"
bernstein --headless --seed-file bernstein.yaml
Output goes to .sdd/runtime/ logs instead of the terminal.
Evolve mode
Run Bernstein against itself for continuous self-improvement:
# Run indefinitely
bernstein --evolve
# Stop after 10 cycles or $5 spent
bernstein --evolve --max-cycles 10 --budget 5
# Unattended overnight run
bernstein --evolve --headless --max-cycles 20 --budget 10
| Flag | Default | Description |
|---|---|---|
| --evolve | off | Enable continuous self-improvement mode |
| --max-cycles N | 0 (unlimited) | Stop after N evolve cycles |
| --budget N | 0 (unlimited) | Stop after $N spent |
| --interval N | 300 | Seconds between cycles |
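The two limits are independent: whichever is reached first stops the run, and 0 means no limit. Here is a minimal sketch of that rule. It assumes spend is checked after each cycle. `should_continue` is hypothetical, and the real loop also waits --interval seconds between cycles:

```python
def should_continue(
    cycles_done: int, spent_usd: float, max_cycles: int = 0, budget_usd: float = 0.0
) -> bool:
    """Apply the --max-cycles / --budget stop rules, where 0 means unlimited."""
    if max_cycles and cycles_done >= max_cycles:
        return False
    if budget_usd and spent_usd >= budget_usd:
        return False
    return True
```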
Evolve proposals are risk-stratified: L0 (config tweaks) are applied automatically, L1 (templates) are tested in a sandbox first, L2 (logic) requires a PR and review, and L3 (core) changes are human-only. Critical files are SHA-locked.
# Review pending proposals
bernstein evolve review
# Approve a specific proposal
bernstein evolve approve <PROPOSAL_ID>
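One way to picture this gating is that each proposal carries a tier, and locked files are pinned to a SHA-256 digest so any change to them is detectable. The minimal sketch below follows that reading of "SHA-locked", which is an assumption because the docs don't specify the mechanism. `POLICY`, `tampered`, and `may_apply` are hypothetical names:

```python
import hashlib
from pathlib import Path

# Hypothetical mapping of the documented risk tiers to approval policies.
POLICY = {
    "L0": "auto-apply",     # config tweaks
    "L1": "sandbox-first",  # templates
    "L2": "pr-and-review",  # logic
    "L3": "human-only",     # core
}


def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def tampered(locks: dict[str, str], root: Path) -> list[str]:
    """Return locked paths whose SHA-256 differs from the pinned value, or that are missing."""
    bad = []
    for rel, pinned in locks.items():
        target = root / rel
        if not target.is_file() or file_sha256(target) != pinned:
            bad.append(rel)
    return bad


def may_apply(level: str, touched: list[str], locks: dict[str, str]) -> bool:
    """Only L0 proposals apply unattended, and never when they touch a locked file."""
    return POLICY[level] == "auto-apply" and not set(touched) & set(locks)
```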
Cost tracking
Monitor API spend in real time:
# Show cost breakdown
bernstein cost
# Set a budget cap (stops execution when reached)
bernstein run --budget 5.00 -g "Add auth"
The cost engine tracks spend per model, per role, and per task. A bandit-based model selector learns the cheapest model that meets quality thresholds.
CLI reference
| Command | Description |
|---|---|
| bernstein init | Initialize .sdd/ directory in current project |
| bernstein run / -g | Start orchestration with a goal |
| bernstein stop | Gracefully stop all agents and server |
| bernstein status | Show current task/agent status |
| bernstein live | Attach to running TUI dashboard |
| bernstein logs | View recent agent logs |
| bernstein cost | Show spend breakdown by model, role, task |
| bernstein plan | Decompose a goal into tasks without running |
| bernstein demo | Run a quick demo with a temp project |
| bernstein ideate | Run creative visionary/analyst pipeline |
| bernstein retro | Generate retrospective from last run |
| bernstein agents list | List available agent roles |
| bernstein agents sync | Sync agent catalog from remote registries |
| bernstein benchmark | Run evaluation harness |
| bernstein doctor | Pre-flight check: adapters, API keys, ports |
| bernstein recap | Post-run summary: tasks, pass/fail, cost |
| bernstein trace <ID> | Step-by-step agent decision trace |
| bernstein replay <ID> | Re-run a task from its trace |
| bernstein dashboard | Open web dashboard in browser |
| bernstein ps | Show running agent processes |
| bernstein add-task | Add a task to the backlog interactively |
| bernstein cancel <ID> | Cancel a running or queued task |
| bernstein checkpoint | Save progress for later resume |
| bernstein wrap-up | End session with summary and learnings |
| bernstein workspace | Multi-repo workspace status |
| bernstein audit | Audit tasks and code quality |
| bernstein verify | Verify task completion signals |
| bernstein mcp | Start MCP server for tool integration |
| bernstein plugins | List discovered plugins |