Architecture
Bernstein has four main components, all running as separate processes:
| Component | Role |
|---|---|
| Task Server | FastAPI HTTP server at :8052. The single source of truth for task state. Agents talk to it via REST. |
| Agent Spawner | Polls the task server, picks up open tasks, routes model/effort, and launches CLI agent sessions. |
| Janitor | Runs completion signal checks before marking tasks done. Prevents false positives. |
| CLI Adapter | A thin wrapper per agent CLI (Claude Code, Codex, Gemini). Handles subprocess launch, env setup, and output parsing. |
The orchestrator is deterministic code, not an LLM. Scheduling, routing, and lifecycle management are pure Python — fast, testable, and auditable.
bernstein CLI
│
▼
Task Server (FastAPI, :8052)
│
├── POST /tasks                  create a task
├── GET  /tasks/next/{role}      claim next open task
├── POST /tasks/{id}/complete    mark done
├── POST /tasks/{id}/fail        mark failed
└── GET  /status                 dashboard summary
│
▼
Agent Spawner
│
├── Model router     opus / sonnet
├── Effort router    max / high / normal
└── CLI adapter      claude / codex / gemini
│
▼
Janitor (completion signal verification)
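The endpoints in the diagram above can be exercised from a small client. The following is a minimal sketch that only builds the requests without sending them. The host (`localhost`), the JSON payload shape, and the helper names are assumptions for illustration; only the port and the routes come from the diagram.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumption: the task server runs locally; the source only states the port.
BASE_URL = "http://localhost:8052"


def build_request(method: str, path: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) a request for one of the task server routes."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def create_task(task: dict) -> Request:
    # POST /tasks (the body layout is hypothetical)
    return build_request("POST", "/tasks", task)


def claim_next(role: str) -> Request:
    # GET /tasks/next/{role}
    return build_request("GET", f"/tasks/next/{role}")


def complete_task(task_id: str) -> Request:
    # POST /tasks/{id}/complete
    return build_request("POST", f"/tasks/{task_id}/complete")


def fail_task(task_id: str) -> Request:
    # POST /tasks/{id}/fail
    return build_request("POST", f"/tasks/{task_id}/fail")


def status() -> Request:
    # GET /status
    return build_request("GET", "/status")
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running server.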
Task lifecycle
Every task moves through a well-defined state machine:
open → claimed → in_progress → done
                      ↘ failed
↑
blocked (dependency not yet met)
| State | Meaning |
|---|---|
| open | Waiting in the queue. No agent has picked it up yet. |
| blocked | Has unsatisfied depends_on entries. Will become open automatically when dependencies complete. |
| claimed | An agent has reserved the task but hasn't started yet. |
| in_progress | An agent is actively working. Heartbeat timeout triggers respawn if the agent goes silent. |
| done | Agent reported completion and janitor verified completion signals. |
| failed | Agent reported failure, or janitor found completion signals unmet after max retries. |
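The lifecycle above can be encoded as a transition table. This is a minimal sketch, not Bernstein's actual implementation: it includes only the arrows shown in the diagram, assumes `failed` is reached from `in_progress`, and leaves out any retry or respawn transitions because the source does not specify them. The names `TaskState`, `ALLOWED`, `transition`, and `initial_state` are hypothetical.

```python
from enum import Enum


class TaskState(str, Enum):
    OPEN = "open"
    BLOCKED = "blocked"
    CLAIMED = "claimed"
    IN_PROGRESS = "in_progress"
    DONE = "done"
    FAILED = "failed"


# Only the transitions drawn in the diagram; retries are deliberately omitted.
ALLOWED = {
    TaskState.BLOCKED: {TaskState.OPEN},
    TaskState.OPEN: {TaskState.CLAIMED},
    TaskState.CLAIMED: {TaskState.IN_PROGRESS},
    TaskState.IN_PROGRESS: {TaskState.DONE, TaskState.FAILED},  # assumption: failed follows in_progress
    TaskState.DONE: set(),
    TaskState.FAILED: set(),
}


def transition(current: TaskState, target: TaskState) -> TaskState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def initial_state(depends_on, done_ids) -> TaskState:
    """A task is blocked while any dependency is unfinished, otherwise open."""
    return TaskState.OPEN if set(depends_on) <= set(done_ids) else TaskState.BLOCKED
```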
Task schema
Tasks live as YAML files in .sdd/backlog/open/:
id: "TSK-042"
title: "Implement hybrid retrieval with BM25 fallback"
role: "backend"
priority: 1 # 1=critical, 2=normal, 3=nice-to-have
scope: "medium" # small / medium / large
complexity: "high" # low / medium / high
estimated_minutes: 30
depends_on: ["TSK-040"]
completion_signals:
  - type: "path_exists"
    path: "src/retrieval/bm25.py"
  - type: "test_passes"
    command: "pytest tests/test_bm25.py"
files:
  - "src/retrieval/bm25.py"
  - "tests/test_bm25.py"
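Once a task file has been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`), the schema above maps naturally onto a dataclass. The sketch below is illustrative only: the `Task` class, its default values, and its validation rules are assumptions, while the field names and allowed values are taken from the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

VALID_SCOPES = ("small", "medium", "large")
VALID_COMPLEXITIES = ("low", "medium", "high")


@dataclass
class Task:
    id: str
    title: str
    role: str
    priority: int = 2                       # assumption: default to "normal"
    scope: str = "medium"                   # assumption
    complexity: str = "medium"              # assumption
    estimated_minutes: Optional[int] = None
    depends_on: List[str] = field(default_factory=list)
    completion_signals: List[Dict[str, Any]] = field(default_factory=list)
    files: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the ranges listed in the YAML comments.
        if self.priority not in (1, 2, 3):
            raise ValueError(f"priority must be 1, 2 or 3, got {self.priority}")
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.complexity not in VALID_COMPLEXITIES:
            raise ValueError(f"unknown complexity: {self.complexity}")


# The dictionary a YAML parser would produce for a trimmed version of TSK-042.
raw = {
    "id": "TSK-042",
    "title": "Implement hybrid retrieval with BM25 fallback",
    "role": "backend",
    "priority": 1,
    "complexity": "high",
    "depends_on": ["TSK-040"],
}
task = Task(**raw)
```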
Agent roles
Roles are defined in templates/roles/{role}/. Each role directory contains:
- system_prompt.md: Role description, rules, and style guidelines
- task_prompt.md: Per-task prompt template with {{TASK_TITLE}}, {{TASK_DESCRIPTION}}, {{FILES}}
- config.yaml: Default model, effort, and max tasks per session
| Role | Purpose | Default model |
|---|---|---|
| manager | Goal decomposition, task planning, sprint review | opus |
| backend | API design, business logic, database work | sonnet |
| frontend | UI components, styling, accessibility | sonnet |
| qa | Test writing, coverage analysis, regression testing | sonnet |
| security | Threat modelling, vulnerability review, pen-testing scripts | opus |
| devops | CI/CD, infrastructure, deployment configuration | sonnet |
| docs | Documentation, changelogs, README updates | sonnet |
Add custom roles by creating a new directory in templates/roles/. The spawner picks them up automatically.
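The `{{TASK_TITLE}}`-style placeholders in task_prompt.md suggest a simple substitution step before the prompt reaches an agent. Here is one way that step might look. The `render_prompt` helper and its choice to raise on unknown placeholders are assumptions, not Bernstein's actual renderer.

```python
import re

# Matches placeholders such as {{TASK_TITLE}} or {{FILES}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Replace every {{NAME}} with values["NAME"]; fail loudly on a missing key."""

    def substitute(match: "re.Match") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key}")
        return str(values[key])

    return PLACEHOLDER.sub(substitute, template)


template = "Task: {{TASK_TITLE}}\nFiles: {{FILES}}"
prompt = render_prompt(
    template,
    {"TASK_TITLE": "Implement hybrid retrieval", "FILES": "src/retrieval/bm25.py"},
)
```

Failing on a missing key, rather than leaving the placeholder in place, keeps a half-rendered prompt from being sent to an agent.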
Model routing
The spawner's router is deterministic — it inspects task metadata and returns a (model, effort) pair. No LLM involved:
| Condition | Model | Effort | Reasoning |
|---|---|---|---|
| role=manager | opus | max | Planning requires deep reasoning |
| role=security | opus | max | Security review needs exhaustive analysis |
| scope=large, complexity=high | opus | high | Complex architecture tasks |
| complexity=medium | sonnet | high | Standard feature implementation |
| complexity=low | sonnet | normal | Quick fixes, docs, formatting |
You can override routing per-task with explicit model and effort fields in the task YAML.
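The routing table can be read as a first-match rule list. The sketch below assumes the rows are evaluated top to bottom and that explicit `model` and `effort` fields override the routed values independently. The table does not cover every combination (for example `complexity=high` with a scope other than `large`), so the final fallback is an assumption as well.

```python
from typing import Tuple


def route_by_table(task: dict) -> Tuple[str, str]:
    """Return (model, effort) from the routing table, first match wins."""
    role = task.get("role")
    scope = task.get("scope")
    complexity = task.get("complexity")

    if role in ("manager", "security"):
        return ("opus", "max")
    if scope == "large" and complexity == "high":
        return ("opus", "high")
    if complexity == "medium":
        return ("sonnet", "high")
    if complexity == "low":
        return ("sonnet", "normal")
    # Assumption: combinations the table does not list fall back to sonnet/high.
    return ("sonnet", "high")


def route(task: dict) -> Tuple[str, str]:
    """Apply per-task overrides on top of the table result."""
    model, effort = route_by_table(task)
    return (task.get("model", model), task.get("effort", effort))
```

Because the function depends only on task metadata, the same task always routes to the same pair.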
Completion signals
The janitor checks completion signals before a task transitions to done. This prevents agents from self-reporting success without evidence.
| Signal type | What it checks |
|---|---|
| path_exists | A file or directory exists at the given path |
| test_passes | A shell command exits with code 0 |
| file_contains | A file contains a specific string |
| api_responds | An HTTP endpoint returns the expected status code |
completion_signals:
  - type: "path_exists"
    path: "src/auth/jwt.py"
  - type: "test_passes"
    command: "pytest tests/test_auth.py -x -q"
  - type: "file_contains"
    path: "src/auth/jwt.py"
    contains: "def verify_token"
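A janitor-style check over these signals might look like the sketch below. It covers the three signal types used in the example and leaves out `api_responds`, whose field names are not shown in the source. The function names and the use of `subprocess.run` with `shell=True` are assumptions about one reasonable implementation, not a description of Bernstein's janitor.

```python
import subprocess
from pathlib import Path


def check_signal(signal: dict, root: Path) -> bool:
    """Evaluate one completion signal relative to the project root."""
    kind = signal["type"]
    if kind == "path_exists":
        return (root / signal["path"]).exists()
    if kind == "file_contains":
        target = root / signal["path"]
        return target.is_file() and signal["contains"] in target.read_text(encoding="utf-8")
    if kind == "test_passes":
        # The command passes when it exits with code 0.
        result = subprocess.run(signal["command"], shell=True, cwd=root, capture_output=True)
        return result.returncode == 0
    raise ValueError(f"unsupported signal type: {kind}")


def all_signals_met(signals: list, root: Path) -> bool:
    """A task may move to done only if every signal holds."""
    return all(check_signal(signal, root) for signal in signals)
```

Raising on an unknown signal type, instead of returning `False`, separates a misconfigured task from a genuinely unfinished one.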
Self-evolution
Bernstein's evolve mode runs a continuous improvement loop against its own codebase:
Analyse metrics
Reads .sdd/metrics/ to identify slow tasks, high-cost models, frequently failing roles, and test flakiness.
Propose improvements
A manager agent generates concrete proposals (patch + rationale) ranked by risk level (L0–L3).
Sandbox validation
Low-risk proposals (L0/L1) are applied in an isolated branch and validated with the test suite before merging.
Apply or defer
Safe proposals are applied automatically. L2+ proposals go to .sdd/evolution/deferred.jsonl for human review.
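The split between sandboxed and deferred proposals can be sketched as a small triage function. The proposal fields (`id`, `risk`) and the `triage` helper are hypothetical. Only the risk levels and the deferred.jsonl destination come from the steps above.

```python
import json
from pathlib import Path

# Risk levels that qualify for sandbox validation, per the steps above.
LOW_RISK_LEVELS = {"L0", "L1"}


def triage(proposals: list, deferred_path: Path):
    """Return (sandbox, deferred); append deferred proposals to a JSONL file."""
    sandbox = [p for p in proposals if p["risk"] in LOW_RISK_LEVELS]
    deferred = [p for p in proposals if p["risk"] not in LOW_RISK_LEVELS]

    if deferred:
        deferred_path.parent.mkdir(parents=True, exist_ok=True)
        # One JSON object per line, appended so earlier entries are kept.
        with deferred_path.open("a", encoding="utf-8") as handle:
            for proposal in deferred:
                handle.write(json.dumps(proposal) + "\n")

    return sandbox, deferred
```

Proposals returned in `sandbox` would still need to pass the test suite on an isolated branch before being merged.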
Agent catalogs
Agent catalogs let you load specialist role definitions from YAML or Markdown files — locally or from a remote marketplace.
# List available agents in loaded catalogs
bernstein agents list
# Sync from remote catalog sources
bernstein agents sync
# Validate local agent catalog files
bernstein agents validate
A catalog entry looks like this:
name: "data-engineer"
description: "Specialist in ETL pipelines, data modelling, and query optimisation"
model: sonnet
effort: high
max_tasks_per_session: 2
system_prompt: |
  You are a data engineering specialist. You write clean, tested,
  well-documented ETL code. You prefer streaming over batch where
  possible, and always profile query performance.
File-based state
All state lives in .sdd/ — structured files that can be committed, diffed, and read by any tool. There is no hidden database.
| Path | Contents |
|---|---|
| .sdd/backlog/open/ | Open task YAML files, one per task |
| .sdd/backlog/closed/ | Completed and failed task records |
| .sdd/agents/ | Per-agent heartbeat and session state files |
| .sdd/runtime/ | PID files, server.log, spawner.log |
| .sdd/metrics/ | Cost, token, and duration records per model and role |
| .sdd/knowledge/ | Project context injected into agent prompts |
| .sdd/evolution/ | Deferred evolution proposals awaiting review |
| .sdd/config.yaml | Server port, model defaults, worker limits |
Agents communicate with the task server over HTTP — they never write to .sdd/ directly (except their own heartbeat file).
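Since an agent's heartbeat file is the one piece of state it writes itself, a heartbeat writer and a staleness check can be sketched as follows. The file name (`<agent_id>.json`), the JSON layout, and the timeout parameter are all assumptions. The source states only that heartbeat files live under .sdd/agents/ and that a timeout triggers a respawn.

```python
import json
import time
from pathlib import Path
from typing import Optional


def write_heartbeat(agents_dir: Path, agent_id: str, now: Optional[float] = None) -> Path:
    """Record the current time in the agent's own heartbeat file."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    path = agents_dir / f"{agent_id}.json"  # hypothetical naming scheme
    timestamp = time.time() if now is None else now
    path.write_text(json.dumps({"agent_id": agent_id, "ts": timestamp}), encoding="utf-8")
    return path


def is_stale(path: Path, timeout_s: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat is missing or older than timeout_s seconds."""
    if not path.exists():
        return True
    current = time.time() if now is None else now
    last_beat = json.loads(path.read_text(encoding="utf-8"))["ts"]
    return current - last_beat > timeout_s
```

A spawner loop could call `is_stale` on each file in .sdd/agents/ and respawn any agent whose heartbeat has lapsed.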