Getting Started

Install Bernstein, run your first orchestration, and understand the configuration options.

Installation

Requires Python 3.12+ and at least one CLI coding agent (Claude Code, Codex CLI, Cursor, Gemini CLI, or Qwen).

# Install from PyPI
pipx install bernstein   # or: uv tool install bernstein

# Verify
bernstein --version

Note: Bernstein itself is pure Python. The CLI agents it spawns (Claude Code, Codex, etc.) must be installed and authenticated separately. Bernstein auto-detects which agents are available; run bernstein doctor to check adapters, API keys, and ports before your first run.

Development install

If you want to contribute to or develop Bernstein itself:

git clone https://github.com/chernistry/bernstein && cd bernstein
uv venv && uv pip install -e ".[dev]"
uv run python scripts/run_tests.py -x

First run

The fastest path is to pass a goal inline:

cd my-project
bernstein init
bernstein -g "Add pagination to the users API endpoint"

What happens next:

  1. Bernstein starts a task server on localhost:8052
  2. It injects an initial manager task containing your goal
  3. It spawns a manager agent (Claude Code, Opus, max effort)
  4. The manager decomposes the goal and queues specialist tasks
  5. The spawner picks up each task, routes it to a model and effort level, and launches an agent
  6. Each agent does its work, commits, reports results, and exits
  7. Bernstein exits once the task queue is empty
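
The numbered flow can be modelled as a work queue that starts with one manager task and drains to empty. The toy simulation below is only an illustration, not Bernstein's code. `decompose` is a hypothetical stand-in for the manager agent with a hard-coded plan, and tasks run one at a time here, whereas the real spawner can run up to max_workers agents at once.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    role: str


def decompose(goal: str) -> list[Task]:
    """Hypothetical stand-in for the manager agent, which in reality is an
    LLM session that plans the work. Here the plan is hard-coded."""
    return [Task(f"Implement: {goal}", "backend"), Task(f"Test: {goal}", "qa")]


def orchestrate(goal: str) -> list[str]:
    queue = deque([Task(goal, "manager")])  # step 2: seed the manager task
    events = []
    while queue:  # step 7: finish when the queue is empty
        task = queue.popleft()  # step 5: claim the next task
        if task.role == "manager":
            subtasks = decompose(task.title)  # step 4: plan and queue specialist work
            queue.extend(subtasks)
            events.append(f"manager queued {len(subtasks)} tasks")
        else:
            # step 6: a real agent would edit code, commit, and report here
            events.append(f"{task.role} done: {task.title}")
    return events


for line in orchestrate("Add pagination to the users API endpoint"):
    print(line)
```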

Init & workspace

Run bernstein init once per project. It creates .sdd/, the file-based state directory:

.sdd/
  backlog/
    open/       # task files waiting to be claimed
    closed/     # completed and failed task records
  agents/       # per-agent heartbeat and state files
  runtime/      # PID files, server.log, spawner.log
  knowledge/    # project notes injected as context
  decisions/    # architecture decision records
  config.yaml   # port, model defaults, worker limits

Nothing leaves your project directory. The .sdd/ directory is designed to be committed, so it serves as your team's shared memory.

Seed file

For repeatable orchestration sessions, use a bernstein.yaml seed file. Bernstein picks it up automatically if it exists in the project root.

goal: "Build a legal RAG system with hybrid retrieval and typed answers"

tasks:
  - title: "Implement vector store"
    role: backend
    priority: 1
    scope: medium
    complexity: medium

  - title: "Add BM25 sparse index"
    role: backend
    priority: 2
    scope: small
    complexity: low
    depends_on: ["TSK-001"]

  - title: "Write integration tests"
    role: qa
    priority: 2
    scope: medium
    complexity: medium
    depends_on: ["TSK-001", "TSK-002"]

Then start Bernstein:

# Auto-detects bernstein.yaml
bernstein

# Or point to a specific file
bernstein --seed-file path/to/bernstein.yaml
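
The depends_on entries decide which tasks can start first. The sketch below uses Python's standard graphlib to compute a valid start order for the seed file above, and it fails early on unknown or circular dependencies. It assumes, as the seed file implies but this page does not state, that tasks get IDs TSK-001, TSK-002, ... in file order. It works on an already-parsed dict, so it needs no YAML library.

```python
from graphlib import TopologicalSorter

seed = {  # the seed file above as a parsed dict (only the fields used here)
    "tasks": [
        {"title": "Implement vector store"},
        {"title": "Add BM25 sparse index", "depends_on": ["TSK-001"]},
        {"title": "Write integration tests", "depends_on": ["TSK-001", "TSK-002"]},
    ]
}


def start_order(seed: dict) -> list[str]:
    # Assumption: IDs are assigned in file order as TSK-001, TSK-002, ...
    tasks = {f"TSK-{i:03d}": t for i, t in enumerate(seed["tasks"], start=1)}
    graph = {tid: set(t.get("depends_on", [])) for tid, t in tasks.items()}
    unknown = set().union(*graph.values()) - set(tasks)
    if unknown:
        raise ValueError(f"depends_on references unknown IDs: {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return [tasks[tid]["title"] for tid in TopologicalSorter(graph).static_order()]


print(start_order(seed))
```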

Configuration

.sdd/config.yaml is created by bernstein init with sensible defaults:

Key             Default   Description
server_port     8052      Task server port
max_workers     6         Max simultaneous worker agents
default_model   sonnet    Default model for workers
default_effort  high      Default effort level

Model routing

The spawner selects the model and effort level automatically from task metadata, so you usually don't need to touch this:

Task property                  Model    Effort
role=manager                   opus     max
role=security                  opus     max
scope=large, complexity=high   opus     high
complexity=medium              sonnet   high
complexity=low                 sonnet   normal
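
The routing table reads naturally as an ordered list of rules. The sketch below encodes it that way. It assumes rules are checked top to bottom with the first match winning, and that a task matching no rule falls back to default_model/default_effort from .sdd/config.yaml. This page states neither assumption, and `route` is a hypothetical name.

```python
# Each rule: (conditions that must all match, model, effort), checked in order.
RULES = [
    ({"role": "manager"}, "opus", "max"),
    ({"role": "security"}, "opus", "max"),
    ({"scope": "large", "complexity": "high"}, "opus", "high"),
    ({"complexity": "medium"}, "sonnet", "high"),
    ({"complexity": "low"}, "sonnet", "normal"),
]


def route(task: dict, default_model: str = "sonnet",
          default_effort: str = "high") -> tuple[str, str]:
    for conditions, model, effort in RULES:
        if all(task.get(key) == value for key, value in conditions.items()):
            return model, effort
    return default_model, default_effort  # assumed fallback to config defaults


print(route({"role": "backend", "scope": "small", "complexity": "low"}))  # ('sonnet', 'normal')
```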

Monitoring

Bernstein starts with a live TUI dashboard by default. You can also attach to a running session or query the server directly:

# Attach to running dashboard
bernstein live

# View recent agent logs
bernstein logs

# Dashboard summary via API
curl http://127.0.0.1:8052/status

# All tasks
curl http://127.0.0.1:8052/tasks

# Filter by status
curl "http://127.0.0.1:8052/tasks?status=open"

Inject tasks at runtime

The task server accepts new tasks while it's running:

curl -s -X POST http://127.0.0.1:8052/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Add rate limiting to the auth endpoint",
    "role": "backend",
    "description": "Implement token-bucket rate limiting, 100 req/min per IP.",
    "priority": 1,
    "scope": "small",
    "complexity": "low"
  }'
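
The same request can be sent from Python, for example from a CI script. This sketch mirrors the curl call above field for field using urllib.request from the standard library. `build_task_request` is an illustrative name, and the server's response format is not documented on this page.

```python
import json
from urllib.request import Request


def build_task_request(task: dict, base: str = "http://127.0.0.1:8052") -> Request:
    """Build (but do not send) a POST /tasks request with a JSON body."""
    return Request(
        f"{base}/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_task_request({
    "title": "Add rate limiting to the auth endpoint",
    "role": "backend",
    "description": "Implement token-bucket rate limiting, 100 req/min per IP.",
    "priority": 1,
    "scope": "small",
    "complexity": "low",
})
# To send it while the task server is running:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status, resp.read().decode())
```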

Stopping

# Graceful stop (waits up to 10s for in-flight agents)
bernstein stop

# Shorter timeout
bernstein stop --timeout 3

Bernstein sends a shutdown signal to the task server, waits for in-flight agents to finish (up to the timeout), then cleans up PID files and exits.
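
The stop sequence follows a common graceful-shutdown pattern: ask a process to exit, wait up to a timeout, and force it only if it is still running. The sketch below shows that generic pattern for one child process using subprocess. It is not Bernstein's code; `bernstein stop` also handles the task server and PID files as described above, and `stop_gracefully` is a hypothetical helper.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> int:
    """Ask proc to exit, wait up to `timeout` seconds, then kill it."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # still running after the timeout: force it
        return proc.wait()


# Demo: a stand-in "agent" that would otherwise sleep for a minute.
agent = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_gracefully(agent, timeout=3))
```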

Headless mode

Useful for overnight runs, CI pipelines, or remote servers:

bernstein --headless -g "Refactor the auth module"
bernstein --headless --seed-file bernstein.yaml

Output goes to .sdd/runtime/ logs instead of the terminal.
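
Checking on a headless run means reading those log files. This sketch prints the last lines of server.log and spawner.log, the file names listed in the .sdd/ layout above. It only reads the files and assumes nothing about their format. `tail` is a hypothetical helper.

```python
from collections import deque
from pathlib import Path


def tail(path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a text file without loading it all into memory."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


runtime = Path(".sdd/runtime")
for name in ("server.log", "spawner.log"):
    log = runtime / name
    if log.exists():
        print(f"== {name} ==")
        print("\n".join(tail(log)))
```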

Evolve mode

Run Bernstein against itself for continuous self-improvement:

# Run indefinitely
bernstein --evolve

# Stop after 10 cycles or $5 spent
bernstein --evolve --max-cycles 10 --budget 5

# Unattended overnight run
bernstein --evolve --headless --max-cycles 20 --budget 10

Flag              Default          Description
--evolve          off              Enable continuous self-improvement mode
--max-cycles N    0 (unlimited)    Stop after N evolve cycles
--budget N        0 (unlimited)    Stop after $N spent
--interval N      300              Seconds between cycles
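
The limits combine as "stop at whichever is reached first", and 0 disables a limit. This is a minimal sketch of that control loop, assuming spend is tracked in dollars. `run_cycle` is a hypothetical callable that performs one cycle and returns its cost; none of these names are Bernstein's API.

```python
import time
from typing import Callable


def should_stop(cycles_done: int, spent: float, max_cycles: int = 0,
                budget: float = 0.0) -> bool:
    """True once either limit is reached; a limit of 0 means unlimited."""
    if max_cycles and cycles_done >= max_cycles:
        return True
    if budget and spent >= budget:
        return True
    return False


def evolve_loop(run_cycle: Callable[[], float], max_cycles: int = 0,
                budget: float = 0.0, interval: float = 300.0) -> tuple[int, float]:
    cycles, spent = 0, 0.0
    while not should_stop(cycles, spent, max_cycles, budget):
        spent += run_cycle()  # hypothetical: one cycle, returns its cost in $
        cycles += 1
        if not should_stop(cycles, spent, max_cycles, budget):
            time.sleep(interval)  # --interval: pause between cycles
    return cycles, spent  # with both limits at 0 this never returns
```

Because the check runs between cycles, one expensive cycle can push total spend past the budget. This page does not say whether Bernstein enforces the cap mid-cycle.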

Evolve proposals are risk-stratified by what they touch:

  - L0 (config tweaks): applied automatically
  - L1 (templates): tested in a sandbox first
  - L2 (logic): requires a PR and review
  - L3 (core): human-only

Critical files are SHA-locked.
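
One way to read "SHA-locked" is to pin a hash for each critical file and flag any file whose current hash no longer matches. A minimal sketch of such a check with SHA-256 from hashlib follows. This page does not describe how Bernstein stores or enforces its locks, so the lock dict and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_locked_files(locks: dict[str, str], root: Path = Path(".")) -> list[str]:
    """Return locked paths that are missing or whose SHA-256 differs from the pin."""
    changed = []
    for rel_path, pinned in locks.items():
        path = root / rel_path
        if not path.exists() or sha256_of(path) != pinned:
            changed.append(rel_path)
    return changed
```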

# Review pending proposals
bernstein evolve review

# Approve a specific proposal
bernstein evolve approve <PROPOSAL_ID>

Cost tracking

Monitor API spend in real time:

# Show cost breakdown
bernstein cost

# Set a budget cap (stops execution when reached)
bernstein run --budget 5.00 -g "Add auth"

The cost engine tracks spend per model, per role, and per task. A bandit-based model selector learns the cheapest model that meets quality thresholds.

CLI reference

Command                  Description
bernstein init           Initialize .sdd/ directory in current project
bernstein run / -g       Start orchestration with a goal
bernstein stop           Gracefully stop all agents and server
bernstein status         Show current task/agent status
bernstein live           Attach to running TUI dashboard
bernstein logs           View recent agent logs
bernstein cost           Show spend breakdown by model, role, task
bernstein plan           Decompose a goal into tasks without running
bernstein demo           Run a quick demo with a temp project
bernstein ideate         Run creative visionary/analyst pipeline
bernstein retro          Generate retrospective from last run
bernstein agents list    List available agent roles
bernstein agents sync    Sync agent catalog from remote registries
bernstein benchmark      Run evaluation harness
bernstein doctor         Pre-flight check: adapters, API keys, ports
bernstein recap          Post-run summary: tasks, pass/fail, cost
bernstein trace <ID>     Step-by-step agent decision trace
bernstein replay <ID>    Re-run a task from its trace
bernstein dashboard      Open web dashboard in browser
bernstein ps             Show running agent processes
bernstein add-task       Add a task to the backlog interactively
bernstein cancel <ID>    Cancel a running or queued task
bernstein checkpoint     Save progress for later resume
bernstein wrap-up        End session with summary and learnings
bernstein workspace      Multi-repo workspace status
bernstein audit          Audit tasks and code quality
bernstein verify         Verify task completion signals
bernstein mcp            Start MCP server for tool integration
bernstein plugins        List discovered plugins