Validation Contracts

Define testable assertions upfront so every milestone has a clear definition of done

What Are Validation Contracts

A validation contract is a set of testable claims about your system that you define before any work begins. Each claim, called an assertion, describes a concrete, verifiable behavior. The contract acts as an acceptance test plan that lives alongside your task files and drives the validator agent throughout the mission.

Contracts answer a simple question: how do we know this milestone is actually done? Instead of relying on gut checks or manual review, the contract gives every agent a shared, unambiguous checklist.

Contracts are powerful because they:

Force you to think about acceptance criteria during planning, not after implementation
Give the orchestrator a gate between milestones — work does not advance until assertions pass
Let the validator agent run independently, checking claims without blocking other agents
Provide a clear audit trail of what was verified, when, and with what evidence

Creating a Contract

Validation contracts live in .tasks/validation-contract.md inside your project root. The file is plain Markdown with bold assertion IDs.

# Create the file manually or let the lead agent generate it during planning
mkdir -p .tasks
touch .tasks/validation-contract.md

Format

Each assertion is a single line starting with a bold ID followed by a colon and the claim:

# Validation Contract

## Authentication

**VAL-AUTH-001**: Login endpoint returns a signed JWT on valid credentials
**VAL-AUTH-002**: Invalid credentials return 401 with a structured error body
**VAL-AUTH-003**: Expired tokens are rejected with 401 and a `token_expired` code
**VAL-AUTH-004**: Refresh token endpoint issues a new access token

## API

**VAL-API-001**: All endpoints require a valid Bearer token except /health
**VAL-API-002**: Rate limiter returns 429 after 100 requests per minute per key

## Data

**VAL-DATA-001**: User creation persists to the database and is retrievable by ID
**VAL-DATA-002**: Deleting a user cascades to dependent records

The validator parses these IDs from the contract file. You can organize assertions under headings, add prose between them, or include notes — only lines matching the **VAL-...**: pattern are treated as assertions.
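The parsing rule above can be sketched in a few lines of Python. This is a minimal illustration, not the validator's actual implementation: the regular expression and the `parse_contract` helper are assumptions based only on the bold-ID format shown here.

```python
import re

# Assumed pattern for an assertion line: **VAL-<DOMAIN>-<NUMBER>**: claim text
ASSERTION_RE = re.compile(r"^\*\*(VAL-[A-Z]+-\d{3})\*\*:\s*(.+)$")


def parse_contract(markdown: str) -> dict[str, str]:
    """Return {assertion_id: claim} for every assertion line in the text."""
    assertions = {}
    for line in markdown.splitlines():
        match = ASSERTION_RE.match(line.strip())
        if match:
            assertions[match.group(1)] = match.group(2).strip()
    return assertions


sample = """# Validation Contract

## Authentication

Notes like this one are ignored by the parser.

**VAL-AUTH-001**: Login endpoint returns a signed JWT on valid credentials
**VAL-AUTH-002**: Invalid credentials return 401 with a structured error body
"""

for assertion_id, claim in parse_contract(sample).items():
    print(assertion_id, "->", claim)
```

Headings and prose fall through the loop untouched, which is why notes can sit freely between assertions.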

Assertion Design

Naming Conventions

Use a hierarchical ID scheme: VAL-<DOMAIN>-<NUMBER>.

Prefix	Domain
`VAL-AUTH-`	Authentication and authorization
`VAL-API-`	REST/GraphQL API behavior
`VAL-UI-`	Frontend and user interface
`VAL-DATA-`	Database and data integrity
`VAL-PERF-`	Performance and load
`VAL-SEC-`	Security hardening
`VAL-INT-`	Integration and third-party APIs
`VAL-INFRA-`	Infrastructure and deployment

Numbers are zero-padded to three digits (001, 002, ...) so that plain string sorting keeps them in numeric order.
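The effect of zero-padding is easy to check: string sorting only agrees with numeric order when every number has the same width. A small Python illustration, using made-up IDs:

```python
# IDs padded to three digits, as the naming convention recommends
padded = ["VAL-AUTH-010", "VAL-AUTH-002", "VAL-AUTH-001"]
# The same IDs without padding (hypothetical, for comparison only)
unpadded = ["VAL-AUTH-10", "VAL-AUTH-2", "VAL-AUTH-1"]

sorted_padded = sorted(padded)
sorted_unpadded = sorted(unpadded)

print(sorted_padded)    # ['VAL-AUTH-001', 'VAL-AUTH-002', 'VAL-AUTH-010']
print(sorted_unpadded)  # ['VAL-AUTH-1', 'VAL-AUTH-10', 'VAL-AUTH-2']
```

Without padding, assertion 10 sorts ahead of assertion 2.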

What Makes a Good Assertion

A good assertion is:

Specific — describes one observable behavior, not a vague quality
Testable — an agent or script can verify it with a concrete check
Independent — can be validated without relying on other assertions passing first
Evidenced — the validator can attach proof (test output, curl response, screenshot)

Good:

**VAL-AUTH-003**: Expired tokens are rejected with 401 and a `token_expired` code

Bad:

**VAL-AUTH-003**: Auth works correctly

The first assertion tells the validator exactly what to check. The second is unfalsifiable.

Examples Across Domains

## Auth

**VAL-AUTH-001**: POST /auth/login with valid credentials returns 200 and a JWT
**VAL-AUTH-002**: POST /auth/login with invalid credentials returns 401

## API

**VAL-API-001**: GET /api/users requires Authorization header; omitting it returns 401
**VAL-API-002**: GET /api/users?limit=5 returns at most 5 records

## UI

**VAL-UI-001**: Login form shows inline error on invalid credentials without page reload
**VAL-UI-002**: Dashboard renders within 2 seconds on a cold load

## Data

**VAL-DATA-001**: Creating a user with duplicate email returns 409 Conflict
**VAL-DATA-002**: Soft-deleted users are excluded from list queries by default

Linking Tasks to Assertions

When you create a task, use the --fulfills flag to declare which assertions that task is expected to satisfy:

tmux-ide task create "Implement JWT login" \
  --milestone M1 \
  --specialty backend \
  --fulfills "VAL-AUTH-001,VAL-AUTH-002"

tmux-ide task create "Add token refresh" \
  --milestone M1 \
  --specialty backend \
  --fulfills "VAL-AUTH-004"

tmux-ide task create "Build login form" \
  --milestone M2 \
  --specialty frontend \
  --fulfills "VAL-UI-001"

Coverage Invariant

Before moving from planning to execution, every assertion in the contract should be claimed by at least one task. The validate coverage command checks this:

tmux-ide validate coverage

Validation Coverage
  Covered:   10/12 assertions
  Uncovered: VAL-PERF-001, VAL-SEC-001

  ✗ Coverage incomplete — 2 assertions have no fulfilling task

If coverage is incomplete, create tasks to fill the gaps before running tmux-ide mission plan-complete. The orchestrator will not dispatch work for assertions that no task claims.

# Fill the gap
tmux-ide task create "Add rate limit tests" \
  --milestone M2 \
  --fulfills "VAL-PERF-001"

tmux-ide task create "Security headers audit" \
  --milestone M2 \
  --fulfills "VAL-SEC-001"

# Verify
tmux-ide validate coverage

Validation Coverage
  Covered:   12/12 assertions
  Uncovered: (none)

  ✓ Full coverage

Use --json for programmatic checks:

tmux-ide validate coverage --json

{
  "total": 12,
  "covered": 12,
  "uncovered": [],
  "complete": true
}
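Conceptually, the coverage check is a set difference between the assertion IDs in the contract and the IDs claimed via --fulfills. The Python sketch below reproduces the shape of the --json output under that assumption. The `coverage` function and the task dictionaries are hypothetical and say nothing about how tmux-ide stores tasks internally.

```python
def coverage(assertion_ids: list[str], tasks: list[dict]) -> dict:
    """Compute a coverage summary shaped like the --json output."""
    claimed = set()
    for task in tasks:
        # --fulfills takes a comma-separated list of assertion IDs
        for part in task["fulfills"].split(","):
            if part.strip():
                claimed.add(part.strip())
    uncovered = [a for a in assertion_ids if a not in claimed]
    return {
        "total": len(assertion_ids),
        "covered": len(assertion_ids) - len(uncovered),
        "uncovered": uncovered,
        "complete": not uncovered,
    }


assertions = ["VAL-AUTH-001", "VAL-AUTH-002", "VAL-AUTH-004", "VAL-PERF-001"]
tasks = [
    {"title": "Implement JWT login", "fulfills": "VAL-AUTH-001,VAL-AUTH-002"},
    {"title": "Add token refresh", "fulfills": "VAL-AUTH-004"},
]

summary = coverage(assertions, tasks)
print(summary)
```

Here VAL-PERF-001 has no fulfilling task, so `complete` is false until a task claims it.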

Validator Workflow

The validator is a dedicated agent pane that verifies assertions as milestones complete. In the missions template, the validator pane is preconfigured:

panes:
  - title: Validator
    command: claude
    role: validator
    skill: reviewer

How It Works

A milestone's tasks all reach done status.
The orchestrator triggers validation for that milestone.
The validator reads .tasks/validation-contract.md and identifies which assertions are linked to tasks in the completed milestone.
For each assertion, the validator runs the appropriate check — executing tests, curling endpoints, inspecting files, or reviewing code.
The validator calls tmux-ide validate assert for each assertion with a status and evidence.
Once all assertions for the milestone are checked, the validator produces a report.
If all assertions pass, the orchestrator advances to the next milestone. If any fail, remediation kicks in.

Manual Validation

You can also drive validation manually at any time:

# Check a specific assertion
tmux-ide validate assert VAL-AUTH-001 \
  --status passing \
  --evidence "POST /auth/login returns 200 with valid JWT (test suite: 3/3 passed)"

# View the full contract with current states
tmux-ide validate show

# Generate a summary report
tmux-ide validate report

Assertion States

Each assertion moves through a lifecycle:

State	Meaning
`pending`	Default state. The assertion has not been checked yet.
`passing`	The validator confirmed the assertion holds. Evidence attached.
`failing`	The validator checked and the assertion does not hold.
`blocked`	Cannot be validated because a dependency or precondition is missing.

State transitions:

pending ──→ passing
pending ──→ failing ──→ passing (after remediation)
pending ──→ blocked ──→ pending (after unblock) ──→ passing
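The arrows above can be written down as a transition table. This Python sketch encodes only the transitions drawn in the diagram; treating the diagram as exhaustive is an assumption, and the tool may well permit other moves.

```python
# Allowed next states, taken only from the arrows in the diagram.
# Assumption: anything not drawn there is rejected.
ALLOWED = {
    "pending": {"passing", "failing", "blocked"},
    "failing": {"passing"},   # after remediation
    "blocked": {"pending"},   # after unblock
    "passing": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram shows an arrow from current to target."""
    return target in ALLOWED.get(current, set())


print(can_transition("pending", "blocked"))  # True
print(can_transition("blocked", "passing"))  # False: must return to pending first
```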

View assertion states with:

tmux-ide validate show

Validation Contract — 12 assertions

  VAL-AUTH-001  passing   Login endpoint returns a signed JWT
  VAL-AUTH-002  passing   Invalid credentials return 401
  VAL-AUTH-003  pending   Expired tokens rejected with token_expired code
  VAL-AUTH-004  failing   Refresh token endpoint issues new access token
  VAL-API-001   passing   All endpoints require Bearer token
  VAL-API-002   blocked   Rate limiter returns 429 after 100 req/min
  ...

Summary: 3 passing, 1 failing, 1 blocked, 7 pending

Remediation

When assertions fail, the system does not silently move on. Failure triggers a structured remediation flow.

What Happens on Failure

Milestone blocks. The orchestrator will not advance to the next milestone while any assertion in the current one is failing.
Auto-task creation. The orchestrator can create remediation tasks automatically, tagged with the failing assertion IDs.
Milestone reset. If the milestone was marked done prematurely, it resets to active so new tasks can be dispatched.
Validator re-check. After remediation tasks complete, the validator re-runs the failing assertions.

Manual Remediation

If you prefer manual control:

# See what failed
tmux-ide validate report

# Create a fix task
tmux-ide task create "Fix refresh token rotation" \
  --milestone M1 \
  --specialty backend \
  --fulfills "VAL-AUTH-004"

# After the fix, re-assert
tmux-ide validate assert VAL-AUTH-004 \
  --status passing \
  --evidence "Refresh endpoint returns new access token (test: refresh_flow passed)"

Blocked Assertions

Mark an assertion as blocked when an external dependency prevents validation:

tmux-ide validate assert VAL-API-002 \
  --status blocked \
  --evidence "Rate limiter depends on Redis, not yet provisioned"

Blocked assertions do not prevent milestone advancement, but they will appear in reports as outstanding items.
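The gating rules in this section can be summarized as a small predicate. This Python sketch is one reading of them, not the orchestrator's code: failing assertions hold the milestone, and blocked ones are reported without holding it. It also assumes, although the text does not say so outright, that pending assertions hold the milestone because they have not been checked yet.

```python
def milestone_gate(states: dict[str, str]) -> dict:
    """Decide whether a milestone may advance, given its assertion states."""
    failing = sorted(a for a, s in states.items() if s == "failing")
    pending = sorted(a for a, s in states.items() if s == "pending")
    blocked = sorted(a for a, s in states.items() if s == "blocked")
    return {
        # Assumption: unchecked (pending) assertions also hold the gate
        "can_advance": not failing and not pending,
        "failing": failing,
        # Blocked assertions do not hold the gate but stay visible
        "outstanding": blocked,
    }


m1 = {
    "VAL-AUTH-001": "passing",
    "VAL-AUTH-004": "failing",
    "VAL-API-002": "blocked",
}
before_fix = milestone_gate(m1)

m1["VAL-AUTH-004"] = "passing"  # remediation task done, re-asserted
after_fix = milestone_gate(m1)

print(before_fix["can_advance"], after_fix["can_advance"])
```

After the fix the milestone can advance, while VAL-API-002 remains listed as outstanding.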

Example Contract

Here is a realistic contract for a todo-list API project with authentication:

# Validation Contract — Todo API v1

## Authentication

**VAL-AUTH-001**: POST /auth/register creates a user and returns 201 with a JWT
**VAL-AUTH-002**: POST /auth/register with duplicate email returns 409 Conflict
**VAL-AUTH-003**: POST /auth/login with valid credentials returns 200 with a JWT
**VAL-AUTH-004**: POST /auth/login with wrong password returns 401
**VAL-AUTH-005**: JWT expires after 15 minutes; requests with expired tokens return 401

## API — Todos

**VAL-API-001**: POST /api/todos creates a todo and returns 201
**VAL-API-002**: GET /api/todos returns only todos belonging to the authenticated user
**VAL-API-003**: PATCH /api/todos/:id updates title and completed status
**VAL-API-004**: DELETE /api/todos/:id returns 204 and the todo is no longer retrievable
**VAL-API-005**: GET /api/todos?completed=true filters to completed todos only

## Data Integrity

**VAL-DATA-001**: Deleting a user cascades to their todos
**VAL-DATA-002**: Todo titles are trimmed and limited to 255 characters; longer input returns 422

## Security

**VAL-SEC-001**: All /api/\* routes return 401 without a valid Authorization header
**VAL-SEC-002**: Users cannot access or modify todos belonging to other users

Tasks Fulfilling This Contract

tmux-ide task create "User registration endpoint" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-AUTH-001,VAL-AUTH-002"

tmux-ide task create "Login endpoint" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-AUTH-003,VAL-AUTH-004"

tmux-ide task create "JWT expiry middleware" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-AUTH-005"

tmux-ide task create "Todo CRUD endpoints" \
  --milestone M2 --specialty backend \
  --fulfills "VAL-API-001,VAL-API-002,VAL-API-003,VAL-API-004,VAL-API-005"

tmux-ide task create "Cascade delete and input validation" \
  --milestone M2 --specialty backend \
  --fulfills "VAL-DATA-001,VAL-DATA-002"

tmux-ide task create "Auth guard and ownership checks" \
  --milestone M2 --specialty backend \
  --fulfills "VAL-SEC-001,VAL-SEC-002"

# Confirm full coverage
tmux-ide validate coverage

Commands Reference

`validate show`

Display the full validation contract with current assertion states.

tmux-ide validate show
tmux-ide validate show --json

The --json output includes every assertion with its ID, description, state, and attached evidence.

`validate assert`

Set the state of a specific assertion.

tmux-ide validate assert <assertion-id> --status <state> --evidence "<text>"

Flag	Required	Description
`--status`	Yes	One of `pending`, `passing`, `failing`, `blocked`
`--evidence`	No	Free-text proof attached to the assertion

tmux-ide validate assert VAL-AUTH-001 \
  --status passing \
  --evidence "Integration test auth.login.test.ts: 4/4 passed"

tmux-ide validate assert VAL-API-002 \
  --status failing \
  --evidence "GET /api/todos returns all todos regardless of user"

`validate report`

Generate a summary report of the entire contract.

tmux-ide validate report
tmux-ide validate report --json

Validation Report
  Total:    14 assertions
  Passing:  10
  Failing:   1
  Blocked:   0
  Pending:   3

  Failing assertions:
    VAL-API-002  GET /api/todos returns only todos belonging to the authenticated user

  Pending assertions:
    VAL-DATA-001  Deleting a user cascades to their todos
    VAL-DATA-002  Todo titles trimmed and limited to 255 characters
    VAL-SEC-002   Users cannot access or modify other users' todos
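The counts in a report like the one above are a tally of per-assertion states. The Python sketch below rebuilds them for the todo-API example. The `summarize` helper is hypothetical, and the key names follow the `validate report --json` output shown elsewhere on this page.

```python
from collections import Counter

STATES = ("passing", "failing", "blocked", "pending")


def summarize(states: dict[str, str]) -> dict[str, int]:
    """Tally assertion states into report-style counts."""
    counts = Counter(states.values())
    report = {"total": len(states)}
    for state in STATES:
        report[state] = counts.get(state, 0)
    return report


# States matching the example report: everything passes except the
# one failing and three pending assertions it lists.
states = {f"VAL-AUTH-{n:03d}": "passing" for n in range(1, 6)}
states.update({f"VAL-API-{n:03d}": "passing" for n in (1, 3, 4, 5)})
states.update({
    "VAL-API-002": "failing",
    "VAL-DATA-001": "pending",
    "VAL-DATA-002": "pending",
    "VAL-SEC-001": "passing",
    "VAL-SEC-002": "pending",
})

report = summarize(states)
print(report)
```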

`validate coverage`

Check whether every assertion in the contract is claimed by at least one task via --fulfills.

tmux-ide validate coverage
tmux-ide validate coverage --json

Run this before tmux-ide mission plan-complete to catch gaps early.
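One way to turn this check into a gate is to map the --json output to an exit code. The following Python sketch assumes the JSON keys shown earlier (`uncovered`, `complete`). The `coverage_exit_code` helper is hypothetical, and the incomplete payload is inferred from the text output rather than copied from the tool.

```python
import json


def coverage_exit_code(raw_json: str) -> int:
    """Return 0 when coverage is complete, 1 otherwise."""
    report = json.loads(raw_json)
    if report["complete"]:
        return 0
    print("Uncovered assertions: " + ", ".join(report["uncovered"]))
    return 1


# Payload in the shape documented for a fully covered contract
complete = '{"total": 12, "covered": 12, "uncovered": [], "complete": true}'
# Assumed payload for the incomplete case (same keys, inferred values)
incomplete = (
    '{"total": 12, "covered": 10, '
    '"uncovered": ["VAL-PERF-001", "VAL-SEC-001"], "complete": false}'
)

ok_code = coverage_exit_code(complete)
gap_code = coverage_exit_code(incomplete)
print(ok_code, gap_code)
```

In a pipeline, the string would come from the output of tmux-ide validate coverage --json, and the return value would be passed to `sys.exit`.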

Worked Examples

The three patterns below cover the three engineering scenarios where validation contracts pay for themselves fastest: shipping a new subsystem, evolving an existing API, and refactoring without regressions. Each example pairs a realistic .tasks/validation-contract.md with the CLI flow that drives it.

Pattern 1 — Auth System Contract

Auth is the canonical case for validation contracts. The behavior is high-stakes, easy to get subtly wrong, and trivial to verify with concrete request/response checks. Define the contract before writing the first route handler.

.tasks/validation-contract.md:

# Validation Contract — Auth System

## Login & Tokens

**VAL-AUTH-001**: POST /auth/login with valid credentials returns 200 and a signed JWT in the response body
**VAL-AUTH-002**: JWT refresh rotates the refresh token; the previous refresh token is rejected on reuse
**VAL-AUTH-003**: Expired access token returns 401 with `{"code":"token_expired"}`, never 500
**VAL-AUTH-004**: POST /auth/login with wrong password returns 401 and does NOT leak whether the email exists

## Session Hygiene

**VAL-AUTH-005**: POST /auth/logout invalidates the refresh token server-side; subsequent refresh returns 401
**VAL-AUTH-006**: Concurrent refresh from two devices does not cause both to be logged out (single-rotation race)

Three tasks fulfill this contract. Note that VAL-AUTH-001 and VAL-AUTH-002 are claimed by the same task (along with VAL-AUTH-005) because they are tightly coupled: you cannot ship rotation without first issuing tokens.

tmux-ide task create "Implement login + refresh rotation" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-AUTH-001,VAL-AUTH-002,VAL-AUTH-005"

tmux-ide task create "Error handling for expired and invalid tokens" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-AUTH-003,VAL-AUTH-004"

tmux-ide task create "Refresh race-condition handling" \
  --milestone M1 --specialty backend \
  --depends "001" \
  --fulfills "VAL-AUTH-006"

tmux-ide validate coverage

Once all three tasks reach done, the orchestrator transitions M1 to validating and dispatches the validator pane:

tmux-ide validate show

Validation Contract — 6 assertions

  VAL-AUTH-001  pending   POST /auth/login returns 200 and signed JWT
  VAL-AUTH-002  pending   JWT refresh rotates the refresh token
  VAL-AUTH-003  pending   Expired access token returns 401, not 500
  VAL-AUTH-004  pending   Wrong password returns 401 without email enumeration
  VAL-AUTH-005  pending   Logout invalidates refresh token server-side
  VAL-AUTH-006  pending   Concurrent refresh handles single-rotation race

The validator agent runs the integration suite, then records evidence for each assertion:

tmux-ide validate assert VAL-AUTH-001 \
  --status passing \
  --evidence "auth.login.test.ts: 4/4 passed; curl POST /auth/login returns {access,refresh} tokens"

tmux-ide validate assert VAL-AUTH-002 \
  --status passing \
  --evidence "auth.refresh.rotation.test.ts: replay of old refresh token returns 401"

tmux-ide validate assert VAL-AUTH-003 \
  --status failing \
  --evidence "Expired token returns 500 with stack trace — middleware swallows JWT decode error"

The failure on VAL-AUTH-003 triggers the remediation flow described above:

The orchestrator creates a task titled "Remediate: assertion VAL-AUTH-003 failing", tagged remediation, and resets the assertion to pending.
M1 transitions back to active, blocking M2 from starting.
After the fix lands and the assertion is re-asserted as passing, M1 completes and M2 unblocks.

tmux-ide validate report

Validation Report
  Total:    6 assertions
  Passing:  5
  Failing:  0
  Blocked:  0
  Pending:  1

  Pending assertions:
    VAL-AUTH-003  Expired access token returns 401 with token_expired code (remediation in progress)

Pattern 2 — API Migration Contract

Migrations are the case where multiple tasks legitimately claim the same assertion. Backwards compatibility is the product of work on both the legacy adapter and the new handlers, and a regression on either side of that pair should be able to break the assertion on its own.

.tasks/validation-contract.md:

# Validation Contract — v1 → v2 API Migration

## Backwards Compatibility

**VAL-MIG-001**: v1 routes still respond 200 with the original deprecated shape (no breaking field renames, no removed keys)
**VAL-MIG-002**: v2 routes accept v1 client payloads via the request adapter (old clients keep working without redeploy)
**VAL-MIG-003**: v1 routes emit a `Deprecation` and `Sunset` header pointing at the v2 equivalent

## v2 Surface

**VAL-MIG-004**: v2 routes return the new canonical shape with snake_case → camelCase migration applied
**VAL-MIG-005**: v2 error envelope conforms to RFC 7807 (problem+json) with `type`, `title`, `status`, `detail`

## Cutover Safety

**VAL-MIG-006**: A request to a v1 route and the equivalent v2 route, with the same input, produces the same observable side effects (same row written, same event emitted)

Multiple tasks per assertion is the right call here — VAL-MIG-006 is the bridge invariant, and three different tasks each contribute a slice of the proof:

# Adapter team
tmux-ide task create "v1→v2 request adapter" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-MIG-002,VAL-MIG-006"

tmux-ide task create "v2→v1 response shape mapper" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-MIG-001,VAL-MIG-006"

# v2 surface team (parallel)
tmux-ide task create "v2 canonical shape + RFC 7807 errors" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-MIG-004,VAL-MIG-005"

# Deprecation headers
tmux-ide task create "Emit Deprecation/Sunset headers on v1" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-MIG-003"

# Side-effect parity test harness (also fulfills the bridge invariant)
tmux-ide task create "Side-effect parity test: v1 vs v2 for all mutating routes" \
  --milestone M1 --specialty backend \
  --depends "001,002,003" \
  --fulfills "VAL-MIG-006"

Because three tasks all claim VAL-MIG-006, the assertion stays pending until the validator actually exercises the parity harness — finishing the individual feature tasks is not enough on its own. The validator runs the parity suite last:

tmux-ide validate assert VAL-MIG-006 \
  --status passing \
  --evidence "parity-suite.test.ts: 24/24 routes produce identical DB writes and event payloads (v1 vs v2)"
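To see which assertions are shared, it can help to invert the task list into a map from assertion ID to the tasks that claim it. This Python sketch uses the task titles and --fulfills values from the commands above. The `claimants` helper is hypothetical and not part of the CLI.

```python
from collections import defaultdict


def claimants(tasks: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each assertion ID to the titles of the tasks that claim it."""
    index = defaultdict(list)
    for title, fulfills in tasks:
        for assertion_id in fulfills.split(","):
            index[assertion_id.strip()].append(title)
    return dict(index)


tasks = [
    ("v1→v2 request adapter", "VAL-MIG-002,VAL-MIG-006"),
    ("v2→v1 response shape mapper", "VAL-MIG-001,VAL-MIG-006"),
    ("v2 canonical shape + RFC 7807 errors", "VAL-MIG-004,VAL-MIG-005"),
    ("Emit Deprecation/Sunset headers on v1", "VAL-MIG-003"),
    ("Side-effect parity test: v1 vs v2 for all mutating routes", "VAL-MIG-006"),
]

# Keep only assertions claimed by more than one task
shared = {a: t for a, t in claimants(tasks).items() if len(t) > 1}
print(shared)
```

Only VAL-MIG-006 shows up as shared, with three claiming tasks.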

If VAL-MIG-001 fails because a v1 response key got renamed, the orchestrator will create a remediation task but will NOT roll back the v2 work; a failing assertion only gates the milestone it belongs to. Use validate coverage --json in CI to make sure every new v1 deprecation assertion has a fulfilling task before merge:

tmux-ide validate coverage --json | jq '.uncovered'

[]

Pattern 3 — Refactor Contract

Refactors are the smallest-scope contracts and the easiest to skip, which is exactly why they bite. The trick is to frame assertions as invariants: things that must remain true after the refactor lands. Don't describe the new architecture; describe what must NOT change.

.tasks/validation-contract.md:

# Validation Contract — Extract Pricing Engine

## Behavioral Invariants

**VAL-REF-001**: All existing snapshot tests still pass unchanged (no `--update-snapshots`)
**VAL-REF-002**: No new TypeScript errors in changed files (`pnpm tsc --noEmit` exit 0)
**VAL-REF-003**: Public exports from `@app/pricing` are identical pre- and post-refactor (`api-extractor` diff is empty)

## Performance Floor

**VAL-REF-004**: The `pricing.calculate.bench.ts` benchmark stays within ±5% of the pre-refactor baseline

Refactor contracts are usually fulfilled by a single task:

tmux-ide task create "Extract pricing engine to @app/pricing package" \
  --milestone M1 --specialty backend \
  --fulfills "VAL-REF-001,VAL-REF-002,VAL-REF-003,VAL-REF-004"

tmux-ide validate coverage

The validator runs each check directly:

tmux-ide validate assert VAL-REF-001 \
  --status passing \
  --evidence "pnpm vitest run: 412/412 passed, 0 snapshots updated"

tmux-ide validate assert VAL-REF-002 \
  --status passing \
  --evidence "pnpm tsc --noEmit: exit 0, 0 new errors in 14 changed files"

tmux-ide validate assert VAL-REF-003 \
  --status passing \
  --evidence "api-extractor diff: 0 changes to .api.md"

tmux-ide validate assert VAL-REF-004 \
  --status failing \
  --evidence "pricing.calculate.bench.ts: 312ms vs baseline 240ms (+30%), regression in tier resolution"
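The +30% in the VAL-REF-004 evidence is a relative change against the baseline, and the ±5% floor is a bound on that value. A minimal Python sketch of the arithmetic follows. The function names are hypothetical, and treating the tolerance as symmetric and inclusive is an assumption.

```python
def relative_change(measured_ms: float, baseline_ms: float) -> float:
    """Fractional change of the measurement relative to the baseline."""
    return (measured_ms - baseline_ms) / baseline_ms


def within_tolerance(measured_ms: float, baseline_ms: float,
                     tolerance: float = 0.05) -> bool:
    """True if the measurement is within ±tolerance of the baseline."""
    return abs(relative_change(measured_ms, baseline_ms)) <= tolerance


change = relative_change(312, 240)   # (312 - 240) / 240
print(f"{change:+.0%}")              # +30%
print(within_tolerance(312, 240))    # False: outside the ±5% band
print(within_tolerance(246, 240))    # True: +2.5%
```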

The VAL-REF-004 failure does exactly what you want: it blocks M1 from completing, surfaces a remediation task with the benchmark evidence attached, and prevents the refactor from being declared "done" with a silent 30% perf regression. Once the fix lands and VAL-REF-004 is re-asserted as passing, the report comes back clean:

tmux-ide validate report --json

{
  "total": 4,
  "passing": 4,
  "failing": 0,
  "blocked": 0,
  "pending": 0
}

Anti-patterns

These four assertion shapes look reasonable in a planning doc but make the validator's job impossible. Each one should be rewritten before mission plan-complete.

Too vague

**VAL-AUTH-001**: Auth works correctly

There is no observable behavior here. The validator cannot tell you whether this assertion holds — and worse, neither can a human reviewer. Fix: name the endpoint, the input, the expected status code, and the expected response shape.

Untestable

**VAL-CODE-001**: The code is clean and maintainable

"Clean" is a taste judgment, not a check. The validator has no script to run. Fix: translate the intent into something measurable, for example: "Cyclomatic complexity of all new functions in src/pricing/ is ≤ 10 (eslint-plugin-sonarjs)."

Tautological

**VAL-TEST-001**: All tests pass

This is always either trivially true (because you ran the suite) or trivially false (because something broke unrelated to the work). It does not constrain the change. Fix: specify the behavior the new tests must cover, for example: "Login flow has integration tests for valid credentials, wrong password, and expired token cases."

Implementation-detail-coupled

**VAL-IMPL-001**: The PricingEngine class uses the Strategy pattern with a TierResolver injected via DI

This locks in a specific design before the work starts and makes the assertion fail the moment someone finds a better factoring. The validator is now policing your architecture instead of your behavior. Fix: assert the observable contract, for example: "pricing.calculate(input) returns the same output for tiers A/B/C as the legacy implementation across the 200-case fixture."
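A rough pre-check can flag some of these shapes before a human reviews the contract. The Python sketch below is a hypothetical heuristic, not a tmux-ide feature: it marks a claim as suspect when it contains no concrete marker such as a number, an HTTP method, a route path, or a backticked identifier.

```python
import re

# Hypothetical markers of a concrete, checkable claim
CONCRETE_MARKERS = re.compile(
    r"`[^`]+`"                              # backticked identifier
    r"|\b\d+\b"                             # a number (status code, limit)
    r"|\b(?:GET|POST|PUT|PATCH|DELETE)\b"   # HTTP method
    r"|/[A-Za-z]"                           # start of a route path
)


def looks_vague(claim: str) -> bool:
    """True if the claim contains none of the concrete markers."""
    return CONCRETE_MARKERS.search(claim) is None


claims = [
    "Auth works correctly",
    "All tests pass",
    "Expired tokens are rejected with 401 and a `token_expired` code",
    "GET /api/users?limit=5 returns at most 5 records",
]
flags = [looks_vague(c) for c in claims]
print(flags)  # [True, True, False, False]
```

The heuristic only prompts a second look. It cannot tell a tautological or implementation-coupled assertion from a merely vague one, and a claim can pass it while still being untestable.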

Best Practices

Write the contract during planning, before any code exists
Keep assertions atomic — one behavior per assertion
Use validate coverage as a planning checkpoint, not an afterthought
Attach concrete evidence when asserting — test names, curl output, or log excerpts
Do not skip blocked assertions; track them so they surface in reports
Re-run validate report after every milestone to maintain an accurate audit trail
Let the validator agent own assertion state; avoid manually marking assertions as passing unless you are debugging
