Concepts

The mental model behind Bernstein — how it thinks about agents, tasks, and orchestration.

Architecture

Bernstein has four main components, all running as separate processes:

| Component | Role |
| --- | --- |
| Task Server | FastAPI HTTP server at :8052. The single source of truth for task state. Agents talk to it via REST. |
| Agent Spawner | Polls the task server, picks up open tasks, routes model/effort, and launches CLI agent sessions. |
| Janitor | Runs completion signal checks before marking tasks done. Prevents false positives. |
| CLI Adapter | A thin wrapper per agent CLI (Claude Code, Codex, Gemini). Handles subprocess launch, env setup, and output parsing. |

The orchestrator is deterministic code, not an LLM. Scheduling, routing, and lifecycle management are pure Python — fast, testable, and auditable.

```text
bernstein CLI
    │
    ▼
Task Server (FastAPI, :8052)
    │
    ├── POST /tasks                     create a task
    ├── GET  /tasks/next/{role}         claim next open task
    ├── POST /tasks/{id}/complete       mark done
    ├── POST /tasks/{id}/fail           mark failed
    └── GET  /status                    dashboard summary
    │
    ▼
Agent Spawner
    │
    ├── Model router      opus / sonnet
    ├── Effort router     max / high / normal
    └── CLI adapter       claude / codex / gemini
    │
    ▼
Janitor (completion signal verification)
```
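The spawner's polling step can be sketched as a small loop. This is an illustrative sketch, not Bernstein's internal code: the HTTP call behind `GET /tasks/next/{role}` is injected as a function so the loop can run without a live task server, and the names `poll_for_task` and `fetch_next` are assumptions.

```python
import time
from typing import Callable, Optional

def poll_for_task(fetch_next: Callable[[str], Optional[dict]],
                  role: str,
                  attempts: int = 3,
                  delay: float = 0.0) -> Optional[dict]:
    """Poll until an open task for `role` is claimed or attempts run out.

    `fetch_next` stands in for GET /tasks/next/{role}; it returns a task
    dict when one is available, or None when the queue is empty.
    """
    for _ in range(attempts):
        task = fetch_next(role)
        if task is not None:
            return task          # claimed: hand off to the CLI adapter
        time.sleep(delay)        # back off before polling again
    return None
```

Injecting the transport keeps the loop trivially testable, which matches the document's point that the orchestrator is deterministic, auditable code.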

Task lifecycle

Every task moves through a well-defined state machine:

```text
open → claimed → in_progress → done
                             ↘ failed
              ↑
         blocked (dependency not yet met)
```
| State | Meaning |
| --- | --- |
| open | Waiting in the queue. No agent has picked it up yet. |
| blocked | Has unsatisfied depends_on entries. Will become open automatically when dependencies complete. |
| claimed | An agent has reserved the task but hasn't started yet. |
| in_progress | An agent is actively working. Heartbeat timeout triggers respawn if the agent goes silent. |
| done | Agent reported completion and the janitor verified completion signals. |
| failed | Agent reported failure, or the janitor found completion signals unmet after max retries. |
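The state machine above can be encoded directly as a transition table. This is an illustrative sketch assuming only the transitions shown in the diagram; the `transition` helper is not Bernstein's API.

```python
# Legal transitions taken from the lifecycle diagram; done and failed
# are terminal states with no outgoing edges.
VALID_TRANSITIONS = {
    "open":        {"claimed"},
    "blocked":     {"open"},                 # deps satisfied -> re-queued
    "claimed":     {"in_progress"},
    "in_progress": {"done", "failed"},
    "done":        set(),
    "failed":      set(),
}

def transition(state: str, new_state: str) -> str:
    """Move a task to `new_state`, rejecting any edge not in the diagram."""
    if new_state not in VALID_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```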

Task schema

Tasks live as YAML files in .sdd/backlog/open/:

```yaml
id: "TSK-042"
title: "Implement hybrid retrieval with BM25 fallback"
role: "backend"
priority: 1            # 1=critical, 2=normal, 3=nice-to-have
scope: "medium"        # small / medium / large
complexity: "high"     # low / medium / high
estimated_minutes: 30
depends_on: ["TSK-040"]
completion_signals:
  - type: "path_exists"
    path: "src/retrieval/bm25.py"
  - type: "test_passes"
    command: "pytest tests/test_bm25.py"
files:
  - "src/retrieval/bm25.py"
  - "tests/test_bm25.py"
```
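The `depends_on` field drives the blocked-to-open promotion described in the lifecycle section: a task stays blocked until every dependency is done. A minimal sketch of that rule, with an assumed function name and task shape:

```python
def resolve_state(task: dict, done_ids: set) -> str:
    """Return 'open' if all depends_on entries are complete, else 'blocked'.

    A task with no depends_on entries is open immediately.
    """
    deps = task.get("depends_on", [])
    return "open" if all(dep in done_ids for dep in deps) else "blocked"
```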

Agent roles

Roles are defined in templates/roles/{role}/. These roles ship by default:

| Role | Purpose | Default model |
| --- | --- | --- |
| manager | Goal decomposition, task planning, sprint review | opus |
| backend | API design, business logic, database work | sonnet |
| frontend | UI components, styling, accessibility | sonnet |
| qa | Test writing, coverage analysis, regression testing | sonnet |
| security | Threat modelling, vulnerability review, pen-testing scripts | opus |
| devops | CI/CD, infrastructure, deployment configuration | sonnet |
| docs | Documentation, changelogs, README updates | sonnet |

Add custom roles by creating a new directory in templates/roles/. The spawner picks them up automatically.

Model routing

The spawner's router is deterministic — it inspects task metadata and returns a (model, effort) pair. No LLM involved:

| Condition | Model | Effort | Reasoning |
| --- | --- | --- | --- |
| role=manager | opus | max | Planning requires deep reasoning |
| role=security | opus | max | Security review needs exhaustive analysis |
| scope=large, complexity=high | opus | high | Complex architecture tasks |
| complexity=medium | sonnet | high | Standard feature implementation |
| complexity=low | sonnet | normal | Quick fixes, docs, formatting |

You can override routing per-task with explicit model and effort fields in the task YAML.
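Because the router is plain Python, the whole table fits in one function. The following is a sketch, not Bernstein's actual router: rules follow the table top to bottom, the `sonnet`/`normal` fallback for unmatched tasks is an assumption, and per-task `model`/`effort` fields win.

```python
def route(task: dict) -> tuple:
    """Return a (model, effort) pair for a task, per the routing table."""
    role = task.get("role")
    scope = task.get("scope")
    complexity = task.get("complexity")

    if role in ("manager", "security"):
        model, effort = "opus", "max"
    elif scope == "large" and complexity == "high":
        model, effort = "opus", "high"
    elif complexity == "medium":
        model, effort = "sonnet", "high"
    else:
        model, effort = "sonnet", "normal"   # complexity=low; fallback assumed

    # explicit per-task fields override the routing table
    return (task.get("model", model), task.get("effort", effort))
```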

Completion signals

The janitor checks completion signals before a task transitions to done. This prevents agents from self-reporting success without evidence.

| Signal type | What it checks |
| --- | --- |
| path_exists | A file or directory exists at the given path |
| test_passes | A shell command exits with code 0 |
| file_contains | A file contains a specific string |
| api_responds | An HTTP endpoint returns the expected status code |
```yaml
completion_signals:
  - type: "path_exists"
    path: "src/auth/jwt.py"
  - type: "test_passes"
    command: "pytest tests/test_auth.py -x -q"
  - type: "file_contains"
    path: "src/auth/jwt.py"
    contains: "def verify_token"
```
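A janitor-style checker for the first three signal types can be sketched as follows. The `check_signal` name and dict shape mirror the YAML above but are assumptions, not Bernstein's internals; `api_responds` is omitted to keep the sketch network-free.

```python
import os
import subprocess

def check_signal(signal: dict) -> bool:
    """Evaluate one completion signal; True means the evidence is present."""
    kind = signal["type"]
    if kind == "path_exists":
        return os.path.exists(signal["path"])
    if kind == "file_contains":
        try:
            with open(signal["path"]) as f:
                return signal["contains"] in f.read()
        except OSError:
            return False             # unreadable file fails the check
    if kind == "test_passes":
        # any shell command whose exit code is 0 counts as passing
        return subprocess.run(signal["command"], shell=True).returncode == 0
    raise ValueError(f"unknown signal type: {kind}")
```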

Self-evolution

Bernstein's evolve mode runs a continuous improvement loop against its own codebase:

1. Analyse metrics: reads .sdd/metrics/ to identify slow tasks, high-cost models, frequently failing roles, and test flakiness.
2. Propose improvements: a manager agent generates concrete proposals (patch + rationale) ranked by risk level (L0–L3).
3. Sandbox validation: low-risk proposals (L0/L1) are applied in an isolated branch and validated with the test suite before merging.
4. Apply or defer: safe proposals are applied automatically. L2+ proposals go to .sdd/evolution/deferred.jsonl for human review.
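The apply-or-defer decision in the final step can be sketched as a gate. This is a sketch under assumptions: the risk levels come from the text, but the proposal shape, the `sandbox_passed` flag, and the function name are illustrative.

```python
def disposition(proposal: dict) -> str:
    """Return 'apply' for sandbox-validated low-risk (L0/L1) proposals,
    'defer' for everything else (L2+ awaits human review)."""
    level = int(proposal["risk"].lstrip("L"))
    if level <= 1 and proposal.get("sandbox_passed", False):
        return "apply"
    return "defer"
```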

Agent catalogs

Agent catalogs let you load specialist role definitions from YAML or Markdown files — locally or from a remote marketplace.

```bash
# List available agents in loaded catalogs
bernstein agents list

# Sync from remote catalog sources
bernstein agents sync

# Validate local agent catalog files
bernstein agents validate
```

A catalog entry looks like this:

```yaml
name: "data-engineer"
description: "Specialist in ETL pipelines, data modelling, and query optimisation"
model: sonnet
effort: high
max_tasks_per_session: 2
system_prompt: |
  You are a data engineering specialist. You write clean, tested,
  well-documented ETL code. You prefer streaming over batch where
  possible, and always profile query performance.
```
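A validator in the spirit of `bernstein agents validate` might check that an entry carries the fields shown above. The required-field set and the allowed model names are assumptions drawn from this page, not the tool's actual schema.

```python
REQUIRED_FIELDS = {"name", "description", "model", "system_prompt"}

def validate_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is valid."""
    problems = [f"missing field: {f}"
                for f in sorted(REQUIRED_FIELDS - entry.keys())]
    # model names assumed from the routing section
    if entry.get("model") not in (None, "opus", "sonnet"):
        problems.append(f"unknown model: {entry['model']}")
    return problems
```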

File-based state

All state lives in .sdd/ — structured files that can be committed, diffed, and read by any tool. There is no hidden database.

| Path | Contents |
| --- | --- |
| .sdd/backlog/open/ | Open task YAML files, one per task |
| .sdd/backlog/closed/ | Completed and failed task records |
| .sdd/agents/ | Per-agent heartbeat and session state files |
| .sdd/runtime/ | PID files, server.log, spawner.log |
| .sdd/metrics/ | Cost, token, and duration records per model and role |
| .sdd/knowledge/ | Project context injected into agent prompts |
| .sdd/evolution/ | Deferred evolution proposals awaiting review |
| .sdd/config.yaml | Server port, model defaults, worker limits |

Agents communicate with the task server over HTTP — they never write to .sdd/ directly (except their own heartbeat file).
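The heartbeat file mentioned here is what backs the heartbeat timeout in the lifecycle table. A staleness check could be as simple as comparing the file's mtime against a timeout; the mtime mechanism, the 120-second default, and the function name are all assumptions for illustration.

```python
import os
import time

def heartbeat_stale(path: str, timeout_s: float = 120.0,
                    now: float = None) -> bool:
    """True if the agent's heartbeat file is older than timeout_s seconds.

    A missing heartbeat file also counts as stale, so a crashed agent
    that never wrote one is still respawned.
    """
    now = time.time() if now is None else now
    try:
        return (now - os.path.getmtime(path)) > timeout_s
    except OSError:
        return True
```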