Getting Started — Bernstein

Installation

Requires Python 3.12+ and at least one CLI coding agent (Claude Code, Codex CLI, Cursor, Gemini CLI, or Qwen).

# Install from PyPI
pipx install bernstein   # or: uv tool install bernstein

# Verify
bernstein --version

Bernstein itself is pure Python. The CLI agents it spawns (Claude Code, Codex, etc.) must be installed and authenticated separately. Bernstein auto-detects which agents are available.

Development install

To contribute to or develop Bernstein itself:

git clone https://github.com/chernistry/bernstein && cd bernstein
uv venv && uv pip install -e ".[dev]"
uv run python scripts/run_tests.py -x

First run

The fastest path is to pass a goal inline:

cd my-project
bernstein init
bernstein -g "Add pagination to the users API endpoint"

What happens next:

1. Bernstein starts a task server on localhost:8052
2. It injects an initial manager task with your goal
3. It spawns a manager agent (Claude Code, Opus, max effort)
4. The manager decomposes the goal and queues specialist tasks
5. The spawner picks up each task, selects its model and effort level, and launches an agent
6. Agents work, commit, report results, and exit
7. The system exits when the queue is empty
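The lifecycle above is a queue loop at heart. This toy Python sketch is not Bernstein's code. `Task` and `run_until_empty` are hypothetical names, and `plan` stands in for the decomposition a real manager agent would produce. It also runs tasks one at a time, whereas the real spawner runs up to `max_workers` agents at once.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    role: str
    # Tasks this one enqueues when it finishes (what a manager agent would produce).
    subtasks: list["Task"] = field(default_factory=list)


def run_until_empty(goal: str, plan: list[Task]) -> list[str]:
    """Toy run loop: manager task first, then specialists; stop when the queue is empty."""
    queue = deque([Task(title=goal, role="manager", subtasks=plan)])
    finished: list[str] = []
    while queue:                 # the system exits once nothing is left to claim
        task = queue.popleft()   # the spawner claims the next task
        # A real agent would edit code, commit, and report; here we only record it.
        finished.append(f"{task.role}: {task.title}")
        queue.extend(task.subtasks)  # manager output becomes newly queued tasks
    return finished
```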

Init & workspace

Run bernstein init once per project. It creates .sdd/, the file-based state directory:

.sdd/
  backlog/
    open/       # task files waiting to be claimed
    closed/     # completed and failed task records
  agents/       # per-agent heartbeat and state files
  runtime/      # PID files, server.log, spawner.log
  knowledge/    # project notes injected as context
  decisions/    # architecture decision records
  config.yaml   # port, model defaults, worker limits

Bernstein keeps all of its state inside your project directory. The .sdd/ directory is designed to be committed; it serves as your team's shared memory.

Seed file

For repeatable orchestration sessions, use a bernstein.yaml seed file. Bernstein picks it up automatically if it exists in the project root.

goal: "Build a legal RAG system with hybrid retrieval and typed answers"

tasks:
  - title: "Implement vector store"
    role: backend
    priority: 1
    scope: medium
    complexity: medium

  - title: "Add BM25 sparse index"
    role: backend
    priority: 2
    scope: small
    complexity: low
    depends_on: ["TSK-001"]

  - title: "Write integration tests"
    role: qa
    priority: 2
    scope: medium
    complexity: medium
    depends_on: ["TSK-001", "TSK-002"]

# Auto-detects bernstein.yaml
bernstein

# Or point to a specific file
bernstein --seed-file path/to/bernstein.yaml
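The depends_on fields form a dependency graph. A topological sort shows which tasks can start together. The Python sketch below writes the seed file as plain data, since parsing the YAML would need a library such as PyYAML. It assumes that TSK-NNN IDs are assigned in list order, so the first task is TSK-001. That numbering is an assumption, so check how your Bernstein version assigns IDs. `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# The seed file above as Python data (priority/scope/complexity omitted).
# Assumption: IDs are TSK-001, TSK-002, ... in list order.
seed_tasks = [
    {"title": "Implement vector store", "depends_on": []},
    {"title": "Add BM25 sparse index", "depends_on": ["TSK-001"]},
    {"title": "Write integration tests", "depends_on": ["TSK-001", "TSK-002"]},
]


def execution_waves(tasks: list[dict]) -> list[list[str]]:
    """Group task IDs into waves; every task in a wave has its dependencies done."""
    ids = {f"TSK-{i:03d}": t for i, t in enumerate(tasks, start=1)}
    sorter = TopologicalSorter({tid: t["depends_on"] for tid, t in ids.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular depends_on
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(execution_waves(seed_tasks))
    # → [['TSK-001'], ['TSK-002'], ['TSK-003']]
```

In this seed the tasks form a chain, so each wave holds a single task. Independent tasks would land in the same wave and could run in parallel.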

Configuration

bernstein init creates .sdd/config.yaml with these defaults:

Key	Default	Description
`server_port`	`8052`	Task server port
`max_workers`	`6`	Max simultaneous worker agents
`default_model`	`sonnet`	Default model for workers
`default_effort`	`high`	Default effort level
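In Python terms, the defaults behave like a record whose fields config.yaml can override. `BernsteinConfig` and `load_config` in this sketch are hypothetical, not Bernstein's classes. Rejecting unknown keys is a design choice of the sketch, meant to stop a typo from silently falling back to a default. Bernstein's own handling of unknown keys is not documented here.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class BernsteinConfig:
    """Mirror of the documented .sdd/config.yaml defaults (hypothetical class)."""
    server_port: int = 8052
    max_workers: int = 6
    default_model: str = "sonnet"
    default_effort: str = "high"


def load_config(overrides: dict) -> BernsteinConfig:
    """Apply overrides (e.g. keys parsed from config.yaml) on top of the defaults."""
    known = {f.name for f in fields(BernsteinConfig)}
    unknown = set(overrides) - known
    if unknown:
        # Strictness is this sketch's choice, not documented Bernstein behavior.
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return replace(BernsteinConfig(), **overrides)
```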

Model routing

The spawner selects the model and effort level automatically from task metadata, so you usually don't need to change this:

Task property	Model	Effort
`role=manager`	opus	max
`role=security`	opus	max
`scope=large, complexity=high`	opus	high
`complexity=medium`	sonnet	high
`complexity=low`	sonnet	normal

Monitoring

Bernstein starts with a live TUI dashboard by default. You can also attach to a running session or query the server directly:

# Attach to running dashboard
bernstein live

# View recent agent logs
bernstein logs

# Dashboard summary via API
curl http://127.0.0.1:8052/status

# All tasks
curl http://127.0.0.1:8052/tasks

# Filter by status
curl "http://127.0.0.1:8052/tasks?status=open"
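The same endpoints can be queried from a script using only the Python standard library. This sketch assumes the default port from config.yaml and assumes the endpoints reply with JSON. `api_url` and `get_json` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8052"  # default server_port


def api_url(path: str, **params: str) -> str:
    """Build a task-server URL, e.g. api_url("tasks", status="open")."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{urllib.parse.urlencode(params)}" if params else url


def get_json(path: str, **params: str):
    """GET an endpoint and decode the body (assumes the server returns JSON)."""
    with urllib.request.urlopen(api_url(path, **params), timeout=5) as resp:
        return json.load(resp)


# Usage, with a task server running:
#   get_json("status")
#   get_json("tasks", status="open")
```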

Inject tasks at runtime

The task server accepts new tasks while it's running:

curl -s -X POST http://127.0.0.1:8052/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Add rate limiting to the auth endpoint",
    "role": "backend",
    "description": "Implement token-bucket rate limiting, 100 req/min per IP.",
    "priority": 1,
    "scope": "small",
    "complexity": "low"
  }'
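Here is the equivalent of the curl call above, built with Python's `urllib.request`. The request is constructed separately from sending it, so it can be inspected before anything goes over the wire. `build_task_request` is a hypothetical helper. The payload fields are copied from the curl example, and the sketch assumes nothing about the server's response format.

```python
import json
import urllib.request


def build_task_request(task: dict,
                       base_url: str = "http://127.0.0.1:8052") -> urllib.request.Request:
    """Build the same POST /tasks request as the curl example, without sending it."""
    return urllib.request.Request(
        f"{base_url}/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


task = {
    "title": "Add rate limiting to the auth endpoint",
    "role": "backend",
    "description": "Implement token-bucket rate limiting, 100 req/min per IP.",
    "priority": 1,
    "scope": "small",
    "complexity": "low",
}

# Sending it requires a running task server:
#   with urllib.request.urlopen(build_task_request(task), timeout=5) as resp:
#       print(resp.status, resp.read().decode())
```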

Stopping

# Graceful stop (waits up to 10s for in-flight agents)
bernstein stop

# Shorter timeout
bernstein stop --timeout 3

Bernstein sends a shutdown signal to the task server, waits up to the timeout for in-flight agents to finish, then cleans up PID files and exits.
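The "wait up to N seconds" behavior follows a common process-shutdown pattern: ask politely, wait with a timeout, then force-stop. The Python sketch below illustrates that generic pattern for a single child process. It does not reproduce Bernstein's shutdown sequence, and in particular the signal to the task server and the PID-file cleanup are omitted. `stop_gracefully` is a hypothetical helper.

```python
import subprocess


def stop_gracefully(proc: subprocess.Popen, timeout: float = 10.0) -> str:
    """Ask a child process to exit, wait up to `timeout` seconds, then force-kill it."""
    if proc.poll() is not None:
        return "already exited"
    proc.terminate()  # SIGTERM on POSIX: the child may finish in-flight work
    try:
        proc.wait(timeout=timeout)
        return "stopped"
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return "killed"
```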

Headless mode

Useful for overnight runs, CI pipelines, or remote servers:

bernstein --headless -g "Refactor the auth module"
bernstein --headless --seed-file bernstein.yaml

Output goes to .sdd/runtime/ logs instead of the terminal.

Evolve mode

Run Bernstein against itself for continuous self-improvement:

# Run indefinitely
bernstein --evolve

# Stop after 10 cycles or $5 spent
bernstein --evolve --max-cycles 10 --budget 5

# Unattended overnight run
bernstein --evolve --headless --max-cycles 20 --budget 10

Flag	Default	Description
`--evolve`	—	Enable continuous self-improvement mode
`--max-cycles N`	`0` (unlimited)	Stop after N evolve cycles
`--budget N`	`0` (unlimited)	Stop after $N spent
`--interval N`	`300`	Seconds between cycles
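The stop conditions in the table combine as "whichever limit is hit first", with 0 meaning unlimited. This Python sketch models that loop. `run_cycle` is a hypothetical callback that returns the dollars one cycle spent. Here the budget is checked after each cycle, so the final spend can overshoot the cap by up to one cycle's cost. Whether Bernstein checks mid-cycle is not documented here.

```python
import time
from typing import Callable


def evolve_loop(run_cycle: Callable[[], float], max_cycles: int = 0,
                budget: float = 0.0, interval: float = 300.0,
                sleep: Callable[[float], None] = time.sleep) -> tuple[int, float]:
    """Run cycles until --max-cycles or --budget is reached (0 = unlimited).

    Returns (cycles_run, total_spent).
    """
    cycles, spent = 0, 0.0
    while True:
        spent += run_cycle()
        cycles += 1
        if max_cycles and cycles >= max_cycles:
            break
        if budget and spent >= budget:
            break
        sleep(interval)  # --interval: pause before the next cycle
    return cycles, spent
```

With `max_cycles=10, budget=5` and cycles costing $1 each, the budget is hit first and the loop stops after 5 cycles.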

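Evolve changes are risk-stratified: L0 changes (config tweaks) are applied automatically, L1 changes (templates) are validated in a sandbox first, L2 changes (logic) require a PR and review, and L3 changes (core) are left to humans. Critical files are SHA-locked.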

# Review pending proposals
bernstein evolve review

# Approve a specific proposal
bernstein evolve approve <PROPOSAL_ID>

Cost tracking

Monitor API spend in real time:

# Show cost breakdown
bernstein cost

# Set a budget cap (stops execution when reached)
bernstein run --budget 5.00 -g "Add auth"

The cost engine tracks spend per model, per role, and per task. A bandit-based model selector learns the cheapest model that meets quality thresholds.

CLI reference

Command	Description
`bernstein init`	Initialize `.sdd/` directory in current project
`bernstein run` / `-g`	Start orchestration with a goal
`bernstein stop`	Gracefully stop all agents and server
`bernstein status`	Show current task/agent status
`bernstein live`	Attach to running TUI dashboard
`bernstein logs`	View recent agent logs
`bernstein cost`	Show spend breakdown by model, role, task
`bernstein plan`	Decompose a goal into tasks without running
`bernstein demo`	Run a quick demo with a temp project
`bernstein ideate`	Run creative visionary/analyst pipeline
`bernstein retro`	Generate retrospective from last run
`bernstein agents list`	List available agent roles
`bernstein agents sync`	Sync agent catalog from remote registries
`bernstein benchmark`	Run evaluation harness
`bernstein doctor`	Pre-flight check: adapters, API keys, ports
`bernstein recap`	Post-run summary: tasks, pass/fail, cost
`bernstein trace <ID>`	Step-by-step agent decision trace
`bernstein replay <ID>`	Re-run a task from its trace
`bernstein dashboard`	Open web dashboard in browser
`bernstein ps`	Show running agent processes
`bernstein add-task`	Add a task to the backlog interactively
`bernstein cancel <ID>`	Cancel a running or queued task
`bernstein checkpoint`	Save progress for later resume
`bernstein wrap-up`	End session with summary and learnings
`bernstein workspace`	Multi-repo workspace status
`bernstein audit`	Audit tasks and code quality
`bernstein verify`	Verify task completion signals
`bernstein mcp`	Start MCP server for tool integration
`bernstein plugins`	List discovered plugins