Phase 3

The build log for the third ship. Posture transitions, Disposition, Receipt, hook adapter, full CLI, and E2E coverage. Mirrors RECEIPTS.md with day-of-build context.

Headline

Shipped: v0.3.0a0, 2026-05-09. Six commits on claude/setup-project-structure-3YeiT ahead of main after the operator's Phase 2 ff-merge to main at e5e37bf. CI green across all 9 matrix cells.

Scope: Pure posture state machine (transition, evaluate), the closed Disposition enum, the content-addressable Receipt, the I/O hook adapter (run_hook, main), the full spine-lite CLI surface (validate-manifest, classify, hook), and an end-to-end test suite that exercises the same path Claude Code invokes when wiring the PreToolUse hook.

What's stable: Everything in spine_lite.__all__ after this phase. Phase 3 adds Disposition, Receipt, transition, evaluate to the surface. The full public API:

from spine_lite import (
    PRECEDENCE, Effect, most_restrictive,
    Manifest, ToolDefinition, parse_manifest,
    ToolCall, Decision, classify,
    Posture, Disposition, transition, evaluate,
    Receipt,
    SpineLiteError, ManifestError, ClassificationError,
    PostureError, HookError,
    __version__,
)

spine_lite.hook.run_hook and spine_lite.hook.main remain accessible via the submodule import for programmatic users; the canonical operator entry point is the spine-lite hook console script.

What's not in scope: Real-host integration (smoke against an actual Claude Code session) — the E2E tests run via subprocess against the installed spine-lite console script, which is the closest faithful test the build sandbox supports. PyPI publish remains a project-level decision.

Commit timeline

| # | SHA prefix | Subject |
|---|------------|---------|
| 1 | 6cdcde5 | chore: enable attr_list mkdocs extension |
| 2 | 29bfb63 | feat: posture state machine with Disposition and evaluate |
| 3 | 7cd329b | feat: Receipt dataclass with deterministic serialization |
| 4 | 0d92074 | feat: PreToolUse hook adapter |
| 5 | d3e6cb6 | feat: full CLI surface with integration and E2E tests |
| 6 | (this commit) | release: bump to v0.3.0a0 + phase 3 exit receipt |

Each commit independently passed the local verification gate before being staged.

Design choices recorded

  • Disposition is closed at three members. ALLOW, DENY, ESCALATE. Adding a fourth (e.g. LOG_ONLY) would require updating every consumer that exhaustively matches and the exit-code contract; same closed-taxonomy logic applies as for Effect.
  • evaluate is layered explicitly. The order of checks matters: posture allow-list first (one tool, many postures), LOCKED/DRY_RUN posture-specific rules next, require_confirmation last. Running the posture-specific checks ahead of the allow-list would let a tool that is not on the allow-list slip through under LOCKED.
  • Exit codes keep policy outcomes and errors in separate ranges. 0 ALLOW / 1 DENY / 2 ESCALATE / 64 HOOK_ERROR / 65 MANIFEST_ERROR. Codes from 64 upward are the sysexits.h range (EX_USAGE is 64, EX_DATAERR is 65), which keeps policy outcomes distinguishable from protocol errors at the host's exit-code layer.
  • Receipt fields are content-addressable. to_canonical_json uses sort_keys=True, ensure_ascii=False, and compact separators. The hash is sha256(canonical_json.encode("utf-8")). No timestamps, no UUIDs, no per-run metadata — anything that varies between runs lives in the hook's external metadata, not the receipt itself.
  • Hook contract is intentionally minimal. Top-level JSON object with tool (required, non-empty string) and arguments (optional, object). Other fields are ignored. This lets the same hook adapt to multiple host hook formats; spine-lite doesn't need to be re-released when a host's payload schema evolves.
  • Stdin ↔ stdout is the boundary, not a config file. The CLI's --manifest flag is the configuration; payload comes via stdin; decision goes to stdout. No log files written by default. Whether to capture receipts to disk is the operator's choice (a --receipt-dir flag is reserved for a future minor release).
  • CLI uses Annotated-style typer parameters. Avoids B008 cleanly and matches typer's modern recommended idiom. The one carve-out: Path stays at the top of cli.py (TC003 ignored for this file only) because typer introspects the runtime annotation to do exists=True validation.
  • E2E tests run python -m spine_lite.cli in a subprocess. A true fresh-venv install is out of scope for the build sandbox, but invoking the installed package through its module entry point exercises everything except the console-script shim.
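
The content-addressable Receipt described above can be sketched as follows. This is a minimal illustration, not the project's code: `ReceiptSketch`, its three fields, and the `digest` method name are assumptions made for the example. Only the `json.dumps` settings and the SHA-256 over UTF-8 bytes come from the design note.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass(frozen=True)
class ReceiptSketch:
    """Hypothetical stand-in for Receipt; the real field names may differ."""

    tool: str
    disposition: str
    arguments: dict[str, Any] = field(default_factory=dict)

    def to_canonical_json(self) -> str:
        # Sorted keys and compact separators make the text independent of
        # dict insertion order and whitespace. ensure_ascii=False keeps
        # non-ASCII characters literal instead of emitting \u escapes.
        return json.dumps(
            asdict(self),
            sort_keys=True,
            ensure_ascii=False,
            separators=(",", ":"),
        )

    def digest(self) -> str:
        # Hash the UTF-8 bytes of the canonical form, as the note describes.
        return hashlib.sha256(self.to_canonical_json().encode("utf-8")).hexdigest()


# Same content, different argument order: canonical form and hash agree.
a = ReceiptSketch("write_file", "DENY", {"path": "/tmp/é", "mode": "w"})
b = ReceiptSketch("write_file", "DENY", {"mode": "w", "path": "/tmp/é"})
assert a.to_canonical_json() == b.to_canonical_json()
assert a.digest() == b.digest()
```

Because no timestamp or run identifier enters the serialized form, two runs that reach the same decision on the same call produce the same digest, which is what a byte-stability test can check.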
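
The hook contract and exit-code split recorded above could look roughly like this. It is a sketch under assumptions: `ExitCode`, `PayloadError`, `parse_payload`, and `is_policy_outcome` are hypothetical names for illustration and are not taken from spine_lite.hook. The numeric codes and the payload rules are the ones listed in the design notes.

```python
import json
from enum import IntEnum
from typing import Any


class ExitCode(IntEnum):
    """Codes from the design note; the enum itself is a hypothetical helper."""

    ALLOW = 0
    DENY = 1
    ESCALATE = 2
    HOOK_ERROR = 64
    MANIFEST_ERROR = 65


class PayloadError(ValueError):
    """Hypothetical stand-in for the library's HookError."""


def parse_payload(raw: str) -> tuple[str, dict[str, Any]]:
    """Validate the minimal contract: 'tool' required, 'arguments' optional."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PayloadError(f"payload is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise PayloadError("payload must be a JSON object")
    tool = payload.get("tool")
    if not isinstance(tool, str) or not tool:
        raise PayloadError("'tool' must be a non-empty string")
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise PayloadError("'arguments' must be an object when present")
    # Any other top-level fields are ignored, so host-specific extras pass through.
    return tool, arguments


def is_policy_outcome(code: int) -> bool:
    # Codes below 64 are decisions; 64 and above are protocol or config errors.
    return code < ExitCode.HOOK_ERROR


tool, arguments = parse_payload('{"tool": "bash", "arguments": {"cmd": "ls"}, "session": "abc"}')
assert (tool, arguments) == ("bash", {"cmd": "ls"})
```

A host that only inspects the exit status can then treat anything below 64 as a decision and anything from 64 up as a failure of the hook itself.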

Verification on the green run

  • ruff check: clean
  • ruff format --check: clean
  • mypy --strict src tests: clean across 19 source files
  • pytest: 209 / 209 passing
  • Coverage: 100% on every runtime module
  • mkdocs build --strict: clean
  • Hypothesis: 9 properties × 1,000 examples each (six classifier + three receipt)
  • E2E subprocess tests: 7 cases (5 posture × tool combinations + byte-stability + version)

Phase 3 exit gate

| # | Item | State |
|---|------|-------|
| 1 | posture.py (transitions + evaluate) 100% coverage | ✓ (34 stmts, 14 branches, 0 miss) |
| 2 | receipt.py 100% coverage | ✓ (22 stmts, 0 miss) |
| 3 | hook.py 100% coverage | ✓ (54 stmts, 6 branches, 0 miss) |
| 4 | cli.py 100% coverage | ✓ (49 stmts, 0 miss) |
| 5 | Integration tests for every subcommand | |
| 6 | E2E smoke via installed entry point | ✓ (7 subprocess cases) |
| 7 | mypy --strict clean | |
| 8 | CI green on all 9 matrix cells | (pending push verification) |
| 9 | CHANGELOG entry for v0.3.0a0 | |
| 10 | All commits in Conventional Commits format | |
| 11 | Receipt appended to RECEIPTS.md | ✓ (this commit) |

Lessons for after Phase 3

  • Closed enums fold neatly into ordered transition tables. Once Posture was closed at four members, encoding the transition rules as dict[Posture, frozenset[Posture]] produced a fully-typed structure with zero special-cased code paths.
  • Hypothesis on dataclasses with st.builds(...) is cheap. The Receipt strategy at 1,000 examples per property is ~3 seconds. Tightening the strategy bounds (max sizes for arguments, text fields) keeps it that way.
  • Annotated-style typer is worth the upgrade. B008 ignore is an anti-pattern; the Annotated form is cleaner and the typer docs now lead with it.
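
The transition-table lesson can be illustrated with a small sketch. Only LOCKED and DRY_RUN are named in this log, so the other two members (`SUPERVISED`, `OPEN`) and every edge in the table are hypothetical placeholders. The real `transition` signature and `PostureError` definition may also differ.

```python
from enum import Enum


class Posture(Enum):
    LOCKED = "locked"
    DRY_RUN = "dry_run"
    SUPERVISED = "supervised"  # hypothetical member
    OPEN = "open"  # hypothetical member


class PostureError(Exception):
    """Stand-in for the library's PostureError."""


# Hypothetical edges: each posture maps to the set of postures it may move to.
TRANSITIONS: dict[Posture, frozenset[Posture]] = {
    Posture.LOCKED: frozenset({Posture.DRY_RUN}),
    Posture.DRY_RUN: frozenset({Posture.LOCKED, Posture.SUPERVISED}),
    Posture.SUPERVISED: frozenset({Posture.LOCKED, Posture.DRY_RUN, Posture.OPEN}),
    Posture.OPEN: frozenset({Posture.LOCKED, Posture.SUPERVISED}),
}


def transition(current: Posture, target: Posture) -> Posture:
    # A single table lookup replaces per-posture branching.
    if target not in TRANSITIONS[current]:
        raise PostureError(f"{current.name} -> {target.name} is not allowed")
    return target


assert transition(Posture.LOCKED, Posture.DRY_RUN) is Posture.DRY_RUN
```

Since the enum is closed, a test can assert that the table has a key for every member, so adding a posture without updating the table fails fast.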

See also