

ntro workflow test is the design-time inner loop. You author a runbook locally, run the command, and get a per-scenario summary back in under a second. No Temporal cluster, no Docker, no deploy cycle. It catches the same workflow bugs the deployed e2e flow would catch — wrong @ui_step ordering, malformed activity payloads, signal handling regressions, child-workflow dispatch problems — but in seconds rather than minutes. This is what makes the coding-agent loop tight: the agent generates a change, runs the test, sees the result, iterates.

Prerequisites

pip install 'ntro[testing]'
This is installed automatically as a dependency of ntro-cli, so if you have the CLI, you have the harness.

Run a single workflow

ntro workflow test ./runbooks/document-ingest
Output:
✓  happy        (0.42s)
    [ 0.08s] submit_file       di-7a820232  signal=submit, source=invoice
    [ 0.18s] review            di-7a820232  response=approved
    [ 0.42s] done              di-7a820232
✓  reject_all   (0.31s)
summary: 2 passed, 0 failed (of 2)
The harness runs both built-in scenarios (HAPPY and REJECT_ALL) by default. HAPPY exercises the most code; REJECT_ALL verifies your runbook handles rejection cleanly.

Parent + children

Most production runbooks dispatch child workflows. Register all of them on the same harness invocation:
ntro workflow test ./runbooks/nav-monthly \
  --child ./runbooks/document-ingest \
  --child ./runbooks/nav-monthly-journals
Each --child is registered alongside the parent so the harness can dispatch them when the parent calls run_child_workflow(slug=...). Without registering a child, the dispatch fails with “child workflow slug not registered”.
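The registration requirement can be pictured as a slug-to-runbook lookup. The stand-in below is an illustration only, not the harness's actual code — only the run_child_workflow name and the error message are taken from the docs; the class and its methods are assumptions:

```python
# Minimal stand-in for the harness's child-workflow registry (assumption:
# the real harness behaves analogously; only the error text is from the docs).
class ChildRegistry:
    def __init__(self):
        self._children = {}

    def register(self, slug, runbook_path):
        # What each --child flag effectively does.
        self._children[slug] = runbook_path

    def run_child_workflow(self, slug):
        # Dispatch fails unless the slug was registered up front.
        if slug not in self._children:
            raise LookupError("child workflow slug not registered")
        return self._children[slug]

reg = ChildRegistry()
reg.register("document-ingest", "./runbooks/document-ingest")
print(reg.run_child_workflow("document-ingest"))  # -> ./runbooks/document-ingest
```

Dispatching an unregistered slug raises immediately, which is the failure mode the error message above describes.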

Specific scenarios

Run only one scenario:
ntro workflow test ./runbooks/nav-monthly --scenario happy
Run several:
ntro workflow test ./runbooks/nav-monthly --scenario happy --scenario reject_all
Custom scenario names work too — define them in runbooks/<slug>/tests/scenarios.py and reference by name:
ntro workflow test ./runbooks/nav-monthly --scenario extract-with-fixes
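The docs don't show the shape of a scenarios.py file here. As a rough, hypothetical sketch — the real Scenario class lives in ntro.testing, and every field below is an assumption — a custom scenario that scripts per-step HITL responses might look like:

```python
from dataclasses import dataclass, field

# Stand-in for ntro.testing.Scenario -- the real class and its fields are
# assumptions; consult the ntro.testing internals for the actual API.
@dataclass
class Scenario:
    name: str
    # Hypothetical field: per-step HITL responses keyed by @ui_step name.
    responses: dict = field(default_factory=dict)

# runbooks/nav-monthly/tests/scenarios.py might then define:
extract_with_fixes = Scenario(
    name="extract-with-fixes",
    responses={
        "review": "request_changes",   # first review asks for fixes
        "final_review": "approved",    # re-review approves
    },
)

print(extract_with_fixes.name)  # -> extract-with-fixes
```

The harness would pick the scenario up by its name, which is what --scenario extract-with-fixes references.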

CI / scripting — JSON output

ntro workflow test ./runbooks/nav-monthly --scenario happy --json
Returns a structured payload of scenario results that’s easy to parse from a CI script:
{
  "summary": {"passed": 2, "failed": 0, "total": 2},
  "scenarios": [
    {"name": "happy", "passed": true, "duration_s": 0.42, "events": [...]},
    {"name": "reject_all", "passed": true, "duration_s": 0.31, "events": [...]}
  ]
}
The non-zero exit code on failure is your CI gate.
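For richer reporting than the exit code alone, a CI script can parse the payload directly. A minimal sketch, using only the field names shown in the example above:

```python
import json
import sys

def gate(raw: str) -> int:
    """Parse `ntro workflow test --json` output; return a CI exit code."""
    report = json.loads(raw)
    for s in report["scenarios"]:
        if not s["passed"]:
            # Surface each failing scenario in the CI log.
            print(f"FAIL {s['name']} ({s['duration_s']}s)", file=sys.stderr)
    return 1 if report["summary"]["failed"] else 0

# Example, with the payload shape from above:
sample = (
    '{"summary": {"passed": 2, "failed": 0, "total": 2},'
    ' "scenarios": [{"name": "happy", "passed": true,'
    ' "duration_s": 0.42, "events": []}]}'
)
print(gate(sample))  # -> 0
```

In practice you would feed it the command's stdout, e.g. gate(subprocess.run([...], capture_output=True, text=True).stdout).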

What’s auto-mocked vs what’s real

The harness uses your runbook’s real code paths — your NtroWorkflow subclass, your @ui_step decorators, your Pydantic models. The bits it fakes are:
  • Activity returns — derived from the activity’s return type via Pydantic introspection
  • HITL responses — HAPPY approves, REJECT_ALL rejects, custom scenarios script per-step
  • submit_file signals — synthetic document_refs derived from the workflow’s advertised args
  • Temporal worker — runs in-memory via WorkflowEnvironment instead of a real Temporal cluster
Everything else is your code running for real. See ntro.testing for the harness internals and how to write custom scenarios.
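The first bullet — deriving a fake activity return from its return-type annotation — can be illustrated with stdlib typing introspection. The real harness uses Pydantic and its own rules; this is an illustration of the idea only, and ExtractResult is a hypothetical model:

```python
import typing
from dataclasses import dataclass, fields, is_dataclass

def fake_value(tp):
    """Build a placeholder value for a type annotation (illustration only)."""
    if is_dataclass(tp):
        # Recurse into each field of a structured return model.
        return tp(**{f.name: fake_value(f.type) for f in fields(tp)})
    if tp is str:
        return "synthetic"
    if tp is int:
        return 0
    if tp is float:
        return 0.0
    if tp is bool:
        return False
    if typing.get_origin(tp) is list:
        return []
    return None

@dataclass
class ExtractResult:        # hypothetical activity return model
    document_id: str
    page_count: int

def extract(path: str) -> ExtractResult: ...

hints = typing.get_type_hints(extract)
print(fake_value(hints["return"]))
# -> ExtractResult(document_id='synthetic', page_count=0)
```

The point is that the activity body never runs: the harness only needs the signature to produce a well-typed stand-in result.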

What this catches (and what it doesn’t)

Catches:
  • @ui_step ordering / declaration order issues
  • Activity signature mismatches
  • HITL signal handling regressions
  • Child workflow dispatch failures
  • Pydantic validation errors at activity boundaries
Doesn’t catch:
  • Real LLM call failures (those need a real provider)
  • Real database schema mismatches (use the data plane for that — see CI database fixtures)
  • Real worker-side config drift
  • Real Temporal cluster behaviour (timeouts under load, eviction edge cases)
  • Real provider rate limits
For everything in the right column, deploy to a staging tenant and run there. The local harness is the inner loop; staging is the outer loop. Both are needed.

A typical iteration

# Edit runbooks/nav-monthly/templates/activities.py
vim runbooks/nav-monthly/templates/activities.py

# Run the local harness
ntro workflow test ./runbooks/nav-monthly \
  --child ./runbooks/document-ingest \
  --child ./runbooks/nav-monthly-journals

# Failure! Read the trace, fix the bug, re-run.
ntro workflow test ./runbooks/nav-monthly \
  --child ./runbooks/document-ingest \
  --child ./runbooks/nav-monthly-journals

# All green. Commit, push, open PR.
git add . && git commit -m "Fix journal balance edge case for split rent"
Iteration cycle: under 5 seconds. That’s the value prop.

Testing capability (SDK)

Internals: WorkflowHarness, Scenario, custom mocks.

Deploy to production

Once scenarios pass, ship it.