Concepts

The mental model behind Bernstein — how it thinks about agents, tasks, and orchestration.

Architecture

Bernstein has four main components, all running as separate processes:

| Component | Role |
| --- | --- |
| Task Server | FastAPI HTTP server at :8052. The single source of truth for task state. Agents talk to it via REST. |
| Agent Spawner | Polls the task server, picks up open tasks, routes model/effort, and launches CLI agent sessions. |
| Janitor | Runs completion signal checks before marking tasks done. Prevents false positives. |
| CLI Adapter | A thin wrapper per agent CLI (Claude Code, Codex, Gemini). Handles subprocess launch, env setup, and output parsing. |

The orchestrator is deterministic code, not an LLM. Scheduling, routing, and lifecycle management are pure Python — fast, testable, and auditable.

```text
bernstein CLI
    │
    ▼
Task Server (FastAPI, :8052)
    │
    ├── POST /tasks                     create a task
    ├── GET  /tasks/next/{role}         claim next open task
    ├── POST /tasks/{id}/complete       mark done
    ├── POST /tasks/{id}/fail           mark failed
    └── GET  /status                    dashboard summary
    │
    ▼
Agent Spawner
    │
    ├── Model router      opus / sonnet
    ├── Effort router     max / high / normal
    └── CLI adapter       claude / codex / gemini
    │
    ▼
Janitor (completion signal verification)
```
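The spawner's polling step can be sketched as a small loop. This is an illustrative sketch, not Bernstein's internal code: the HTTP call behind `GET /tasks/next/{role}` is injected as a function so the loop can run without a live task server, and the names `poll_for_task` and `fetch_next` are assumptions.

```python
import time
from typing import Callable, Optional

def poll_for_task(fetch_next: Callable[[str], Optional[dict]],
                  role: str,
                  attempts: int = 3,
                  delay: float = 0.0) -> Optional[dict]:
    """Poll until an open task for `role` is claimed or attempts run out.

    `fetch_next` stands in for GET /tasks/next/{role}; it returns a task
    dict when one is available, or None when the queue is empty.
    """
    for _ in range(attempts):
        task = fetch_next(role)
        if task is not None:
            return task          # claimed: hand off to the CLI adapter
        time.sleep(delay)        # back off before polling again
    return None
```

Injecting the transport keeps the loop trivially testable, which matches the document's point that the orchestrator is deterministic, auditable code.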

Task lifecycle

Every task moves through a well-defined state machine:

```text
open → claimed → in_progress → done
                             ↘ failed
              ↑
         blocked (dependency not yet met)
```
| State | Meaning |
| --- | --- |
| open | Waiting in the queue. No agent has picked it up yet. |
| blocked | Has unsatisfied depends_on entries. Will become open automatically when dependencies complete. |
| claimed | An agent has reserved the task but hasn't started yet. |
| in_progress | An agent is actively working. Heartbeat timeout triggers respawn if the agent goes silent. |
| done | Agent reported completion and the janitor verified completion signals. |
| failed | Agent reported failure, or the janitor found completion signals unmet after max retries. |
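The state machine above can be encoded directly as a transition table. This is an illustrative sketch assuming only the transitions shown in the diagram; the `transition` helper is not Bernstein's API.

```python
# Legal transitions taken from the lifecycle diagram; done and failed
# are terminal states with no outgoing edges.
VALID_TRANSITIONS = {
    "open":        {"claimed"},
    "blocked":     {"open"},                 # deps satisfied -> re-queued
    "claimed":     {"in_progress"},
    "in_progress": {"done", "failed"},
    "done":        set(),
    "failed":      set(),
}

def transition(state: str, new_state: str) -> str:
    """Move a task to `new_state`, rejecting any edge not in the diagram."""
    if new_state not in VALID_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```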

Task schema

Tasks live as YAML files in .sdd/backlog/open/:

```yaml
id: "TSK-042"
title: "Implement hybrid retrieval with BM25 fallback"
role: "backend"
priority: 1            # 1=critical, 2=normal, 3=nice-to-have
scope: "medium"        # small / medium / large
complexity: "high"     # low / medium / high
estimated_minutes: 30
depends_on: ["TSK-040"]
completion_signals:
  - type: "path_exists"
    path: "src/retrieval/bm25.py"
  - type: "test_passes"
    command: "pytest tests/test_bm25.py"
files:
  - "src/retrieval/bm25.py"
  - "tests/test_bm25.py"
```
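The `depends_on` field drives the blocked-to-open promotion described in the lifecycle section: a task stays blocked until every dependency is done. A minimal sketch of that rule, with an assumed function name and task shape:

```python
def resolve_state(task: dict, done_ids: set) -> str:
    """Return 'open' if all depends_on entries are complete, else 'blocked'.

    A task with no depends_on entries is open immediately.
    """
    deps = task.get("depends_on", [])
    return "open" if all(dep in done_ids for dep in deps) else "blocked"
```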

Agent roles

Roles are defined in templates/roles/{role}/. These roles ship by default:

| Role | Purpose | Default model |
| --- | --- | --- |
| manager | Goal decomposition, task planning, sprint review | opus |
| backend | API design, business logic, database work | sonnet |
| frontend | UI components, styling, accessibility | sonnet |
| qa | Test writing, coverage analysis, regression testing | sonnet |
| security | Threat modelling, vulnerability review, pen-testing scripts | opus |
| devops | CI/CD, infrastructure, deployment configuration | sonnet |
| docs | Documentation, changelogs, README updates | sonnet |

Add custom roles by creating a new directory in templates/roles/. The spawner picks them up automatically.

Model routing

The spawner's router is deterministic — it inspects task metadata and returns a (model, effort) pair. No LLM involved:

| Condition | Model | Effort | Reasoning |
| --- | --- | --- | --- |
| role=manager | opus | max | Planning requires deep reasoning |
| role=security | opus | max | Security review needs exhaustive analysis |
| scope=large, complexity=high | opus | high | Complex architecture tasks |
| complexity=medium | sonnet | high | Standard feature implementation |
| complexity=low | sonnet | normal | Quick fixes, docs, formatting |

You can override routing per-task with explicit model and effort fields in the task YAML.
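Because the router is plain Python, the whole table fits in one function. The following is a sketch, not Bernstein's actual router: rules follow the table top to bottom, the `sonnet`/`normal` fallback for unmatched tasks is an assumption, and per-task `model`/`effort` fields win.

```python
def route(task: dict) -> tuple:
    """Return a (model, effort) pair for a task, per the routing table."""
    role = task.get("role")
    scope = task.get("scope")
    complexity = task.get("complexity")

    if role in ("manager", "security"):
        model, effort = "opus", "max"
    elif scope == "large" and complexity == "high":
        model, effort = "opus", "high"
    elif complexity == "medium":
        model, effort = "sonnet", "high"
    else:
        model, effort = "sonnet", "normal"   # complexity=low; fallback assumed

    # explicit per-task fields override the routing table
    return (task.get("model", model), task.get("effort", effort))
```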

Completion signals

The janitor checks completion signals before a task transitions to done. This prevents agents from self-reporting success without evidence.

| Signal type | What it checks |
| --- | --- |
| path_exists | A file or directory exists at the given path |
| test_passes | A shell command exits with code 0 |
| file_contains | A file contains a specific string |
| api_responds | An HTTP endpoint returns the expected status code |
```yaml
completion_signals:
  - type: "path_exists"
    path: "src/auth/jwt.py"
  - type: "test_passes"
    command: "pytest tests/test_auth.py -x -q"
  - type: "file_contains"
    path: "src/auth/jwt.py"
    contains: "def verify_token"
```
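A janitor-style checker for the first three signal types can be sketched as follows. The `check_signal` name and dict shape mirror the YAML above but are assumptions, not Bernstein's internals; `api_responds` is omitted to keep the sketch network-free.

```python
import os
import subprocess

def check_signal(signal: dict) -> bool:
    """Evaluate one completion signal; True means the evidence is present."""
    kind = signal["type"]
    if kind == "path_exists":
        return os.path.exists(signal["path"])
    if kind == "file_contains":
        try:
            with open(signal["path"]) as f:
                return signal["contains"] in f.read()
        except OSError:
            return False             # unreadable file fails the check
    if kind == "test_passes":
        # any shell command whose exit code is 0 counts as passing
        return subprocess.run(signal["command"], shell=True).returncode == 0
    raise ValueError(f"unknown signal type: {kind}")
```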

Self-evolution

Bernstein's evolve mode runs a continuous improvement loop against its own codebase:

1. Analyse metrics: reads .sdd/metrics/ to identify slow tasks, high-cost models, frequently failing roles, and test flakiness.
2. Propose improvements: a manager agent generates concrete proposals (patch + rationale) ranked by risk level (L0–L3).
3. Sandbox validation: low-risk proposals (L0/L1) are applied in an isolated branch and validated with the test suite before merging.
4. Apply or defer: safe proposals are applied automatically. L2+ proposals go to .sdd/evolution/deferred.jsonl for human review.
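The apply-or-defer decision in the final step can be sketched as a gate. This is a sketch under assumptions: the risk levels come from the text, but the proposal shape, the `sandbox_passed` flag, and the function name are illustrative.

```python
def disposition(proposal: dict) -> str:
    """Return 'apply' for sandbox-validated low-risk (L0/L1) proposals,
    'defer' for everything else (L2+ awaits human review)."""
    level = int(proposal["risk"].lstrip("L"))
    if level <= 1 and proposal.get("sandbox_passed", False):
        return "apply"
    return "defer"
```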

Agent catalogs

Agent catalogs let you load specialist role definitions from YAML or Markdown files — locally or from a remote marketplace.

```bash
# List available agents in loaded catalogs
bernstein agents list

# Sync from remote catalog sources
bernstein agents sync

# Validate local agent catalog files
bernstein agents validate
```

A catalog entry looks like this:

```yaml
name: "data-engineer"
description: "Specialist in ETL pipelines, data modelling, and query optimisation"
model: sonnet
effort: high
max_tasks_per_session: 2
system_prompt: |
  You are a data engineering specialist. You write clean, tested,
  well-documented ETL code. You prefer streaming over batch where
  possible, and always profile query performance.
```
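A validator in the spirit of `bernstein agents validate` might check that an entry carries the fields shown above. The required-field set and the allowed model names are assumptions drawn from this page, not the tool's actual schema.

```python
REQUIRED_FIELDS = {"name", "description", "model", "system_prompt"}

def validate_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is valid."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - entry.keys())]
    # model names assumed from the routing section
    if entry.get("model") not in (None, "opus", "sonnet"):
        problems.append(f"unknown model: {entry['model']}")
    return problems
```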

File-based state

All state lives in .sdd/ — structured files that can be committed, diffed, and read by any tool. There is no hidden database.

| Path | Contents |
| --- | --- |
| .sdd/backlog/open/ | Open task YAML files, one per task |
| .sdd/backlog/closed/ | Completed and failed task records |
| .sdd/agents/ | Per-agent heartbeat and session state files |
| .sdd/runtime/ | PID files, server.log, spawner.log |
| .sdd/metrics/ | Cost, token, and duration records per model and role |
| .sdd/knowledge/ | Project context injected into agent prompts |
| .sdd/evolution/ | Deferred evolution proposals awaiting review |
| .sdd/config.yaml | Server port, model defaults, worker limits |

Agents communicate with the task server over HTTP — they never write to .sdd/ directly (except their own heartbeat file).
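The heartbeat file mentioned here is what backs the heartbeat timeout in the lifecycle table. A staleness check could be as simple as comparing the file's mtime against a timeout; the mtime mechanism, the 120-second default, and the function name are all assumptions for illustration.

```python
import os
import time

def heartbeat_stale(path: str, timeout_s: float = 120.0,
                    now: float = None) -> bool:
    """True if the agent's heartbeat file is older than timeout_s seconds.

    A missing heartbeat file also counts as stale, so a crashed agent
    that never wrote one is still respawned.
    """
    now = time.time() if now is None else now
    try:
        return (now - os.path.getmtime(path)) > timeout_s
    except OSError:
        return True
```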