Phase 2

The build log for the second ship. Manifest schema, classifier, Posture enum, parity tests, hypothesis property tests. Mirrors RECEIPTS.md with day-of-build context.

Headline

Shipped: v0.2.0a0, 2026-05-08. Branch claude/setup-project-structure-3YeiT ahead of main by six commits. CI green across all 9 matrix cells.

Scope: Pydantic v2 manifest schema (ToolDefinition, Manifest, parse_manifest), pure classifier (ToolCall, Decision, classify), the closed Posture enum, authored test fixtures, parametrized parity tests, and 1,000-example hypothesis property tests for determinism, dominance, and round-trip stability.

What's stable: Everything in __all__ after this phase. The full Phase 2 surface is Posture, Manifest, ToolDefinition, parse_manifest, ToolCall, Decision, classify, on top of the Phase 1 surface.

What's not yet built: posture transition functions, receipt, hook, and the full cli. All of these are Phase 3 work.

The opening halt

Phase 2 opened with a §9 halt that reframed the project's relationship to its sibling repository. See Porting Notes for the full record. Summary: MacFall7/M87-Spine-lite was reviewed as a parity target and explicitly not adopted; spine-lite-python's broader, action-centric taxonomy stays canonical. The halt and operator resolution are mirrored verbatim in RECEIPTS.md as the Phase 2 Day 1 opening entry.

Commit timeline

| # | SHA prefix | Subject |
|---|------------|---------|
| 1 | 111f34c | chore: phase 2 blueprint correction — sibling, not parity target |
| 2 | 600d870 | feat: Posture state machine enum |
| 3 | 9ed313d | feat: pydantic v2 manifest schema |
| 4 | 67470ff | feat: classifier with Decision dataclass |
| 5 | ef32a5f | test: authored fixtures, parametrized parity tests, hypothesis properties |
| 6 | (this commit) | release: bump to v0.2.0a0 + phase 2 exit receipt |

Each commit independently passed the local verification gate before being staged.

Design choices recorded

Decisions made during Phase 2 that the blueprint did not pin:

  • Effects field type. tuple[Effect, ...] rather than frozenset[Effect]. The field has set semantics in spirit and list semantics on the wire: it is sorted canonically by PRECEDENCE so that a JSON round-trip is byte-stable. Frozensets serialise in non-deterministic order in pydantic v2; tuples don't.
  • Postures field shape. tuple[Posture, ...] | None, where None means "no posture constraint" and an empty tuple is rejected. Treating None, an empty tuple, and a non-empty tuple as three distinct states would have been a code smell; explicit absence is cleaner than empty-as-absence.
  • Manifest validation wrapper. parse_manifest() accepts dicts, JSON strings, and JSON bytes. ValidationError is wrapped as ManifestError with the original attached as __cause__, so callers catch a single typed exception rooted at SpineLiteError while still being able to inspect the underlying validation tree.
  • Classifier purity. Argument-aware classification is deferred. Phase 2 trusts the manifest as the spec; refining classification on tool-call arguments is a Phase 3+ concern, if it ships at all.
  • Hypothesis decorator typing. mypy --strict flags @given and @settings as untyped decorators. The override is scoped to tests.*; runtime modules stay strict with zero Any carve-outs.
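
The postures rule and the validation wrapper described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the real parse_manifest() wraps pydantic's ValidationError, whereas here a plain ValueError stands in, and the Posture members, the _validate helper, and the manifest keys are hypothetical.

```python
from __future__ import annotations

import json
from enum import Enum
from typing import Any


class SpineLiteError(Exception):
    """Root of the package's exception hierarchy (name taken from this log)."""


class ManifestError(SpineLiteError):
    """The single typed error callers catch for any manifest problem."""


class Posture(Enum):
    # Hypothetical members; the real enum's values are not shown in this log.
    OPEN = "open"
    GUARDED = "guarded"


def _validate(raw: Any) -> dict[str, Any]:
    """Hypothetical stand-in for schema validation (pydantic in the real code)."""
    if not isinstance(raw, dict):
        raise ValueError("manifest must be a JSON object")
    postures = raw.get("postures")
    if postures is None:
        # Explicit absence means "no posture constraint".
        return {**raw, "postures": None}
    if len(postures) == 0:
        # An empty tuple is rejected rather than treated as absence.
        raise ValueError("postures must be omitted or non-empty")
    return {**raw, "postures": tuple(Posture(p) for p in postures)}


def parse_manifest(source: dict[str, Any] | str | bytes) -> dict[str, Any]:
    """Accept a dict, a JSON string, or JSON bytes; raise only ManifestError."""
    try:
        raw = json.loads(source) if isinstance(source, (str, bytes)) else source
        return _validate(raw)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        # "from exc" stores the original error on __cause__ for inspection.
        raise ManifestError(f"invalid manifest: {exc}") from exc
```

A caller then needs one except clause and can still reach the underlying error:

```python
try:
    parse_manifest('{"name": "read_file", "postures": []}')
except ManifestError as err:
    print(type(err.__cause__).__name__, err.__cause__)
```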

Verification on the green run

  • ruff check: clean
  • ruff format --check: clean
  • mypy --strict src tests: clean across 16 source files
  • pytest: 99 / 99 passing
  • Coverage: 100% on every runtime module (effects, exceptions, posture, manifest, classifier, __init__, cli, plus the Phase 3 stubs)
  • mkdocs build --strict: clean
  • Hypothesis: 1,000 examples per property test, six properties, ~50s runtime

Phase 2 exit gate

# Item State
1 manifest.py 100% coverage
2 classifier.py 100% coverage
3 posture.py (enum scope) 100% coverage
4 Authored fixtures in tests/fixtures/ ✓ (4 files)
5 Parametrized parity tests against fixtures
6 Hypothesis property tests, ≥1,000 examples each ✓ (6 properties × 1,000)
7 mypy --strict clean
8 CI green (pending push verification)
9 CHANGELOG entry for v0.2.0a0
10 All commits in Conventional Commits format
11 Receipt appended to RECEIPTS.md ✓ (this commit)

Lessons for Phase 3

  • Probe before halting. WebFetch confirmed the sibling repo's actual taxonomy in two requests. Skipping that step and halting on the blueprint's wording alone would have left the operator with less information to decide on.
  • Canonicalisation belongs in the field validator, not at the call site. Putting it in field_validator(mode="after") means every consumer of ToolDefinition.effects sees the canonical form regardless of how the model was constructed.
  • Hypothesis is fast enough at 1,000 examples for property-test work if the strategies are tight. Six properties × 1,000 examples ran in ~50 seconds locally on Python 3.11.
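
The canonicalisation lesson can be illustrated with a stdlib-only sketch. It is a stand-in, not the project's code: a frozen dataclass with `__post_init__` plays the role of pydantic's after-mode field validator, a seeded `random.Random` loop plays the role of a Hypothesis property, and the Effect members, the PRECEDENCE ordering, and the deduplication step are assumptions made for illustration.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    # Hypothetical members; the real taxonomy is not listed in this log.
    READ = "read"
    WRITE = "write"
    NETWORK = "network"


# Hypothetical ordering; the log only says effects are sorted by PRECEDENCE.
PRECEDENCE: tuple[Effect, ...] = (Effect.READ, Effect.WRITE, Effect.NETWORK)
_RANK = {effect: rank for rank, effect in enumerate(PRECEDENCE)}


@dataclass(frozen=True)
class ToolDefinition:
    name: str
    effects: tuple[Effect, ...]

    def __post_init__(self) -> None:
        # Canonicalise once, at construction: dedupe (assumed set semantics),
        # then sort by precedence rank so the wire order is stable.
        canonical = tuple(sorted(set(self.effects), key=_RANK.__getitem__))
        object.__setattr__(self, "effects", canonical)


def check_order_independence(examples: int = 1_000, seed: int = 0) -> None:
    """Seeded stand-in for a property test over randomly ordered inputs."""
    rng = random.Random(seed)
    for _ in range(examples):
        picked = rng.choices(list(Effect), k=rng.randint(1, 6))
        shuffled = rng.sample(picked, k=len(picked))
        first = ToolDefinition("tool", tuple(picked))
        second = ToolDefinition("tool", tuple(shuffled))
        # Input order must not affect the canonical form.
        assert first == second
        # Re-canonicalising a canonical value must be a no-op.
        assert ToolDefinition("tool", first.effects).effects == first.effects
```

Because the normalisation lives in the constructor, no call site has to remember to sort: every ToolDefinition a consumer receives already carries the canonical tuple.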

See also