Documentation Index
Fetch the complete documentation index at: https://docs.ntropii.com/llms.txt
Use this file to discover all available pages before exploring further.
ntro workflow test is the design-time inner loop. You author a runbook locally, run the command, and get a per-scenario summary back in under a second. No Temporal cluster, no Docker, no deploy cycle.
It catches the same workflow bugs the deployed e2e flow would catch — wrong @ui_step ordering, malformed activity payloads, signal handling regressions, child-workflow dispatch problems — but in seconds rather than minutes. This is what makes the coding-agent loop tight: the agent generates a change, runs the test, sees the result, iterates.
Prerequisites
The test harness ships with ntro-cli, so if you have the CLI, you have the harness.
Run a single workflow
The harness runs both built-in scenarios (HAPPY and REJECT_ALL) by default. HAPPY exercises the most code; REJECT_ALL verifies your runbook handles rejection cleanly.
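Running the harness is a single CLI call. A minimal sketch, using the `ntro workflow test` command named above; the runbook slug `invoice-intake` is a made-up placeholder, not an example from this page:

```shell
# Hypothetical slug; substitute your own runbook's slug.
ntro workflow test invoice-intake
```

Both default scenarios run and the per-scenario summary prints in under a second.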
Parent + children
Most production runbooks dispatch child workflows. Register all of them on the same harness invocation: each --child is registered alongside the parent so the harness can dispatch it when the parent calls run_child_workflow(slug=...). Without registering a child, the dispatch fails with "child workflow slug not registered".
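A sketch of registering children on one invocation, assuming --child is a repeatable flag (the flag name comes from the text above; the slugs are hypothetical):

```shell
# Parent plus two children on the same harness run.
# "invoice-intake", "fraud-review", "payment-release" are made-up slugs.
ntro workflow test invoice-intake \
  --child fraud-review \
  --child payment-release
```

Each registered child slug becomes dispatchable when the parent calls run_child_workflow(slug=...).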
Specific scenarios
Run only one scenario by passing its name. Custom scenarios live in runbooks/<slug>/tests/scenarios.py and are referenced the same way, by name.
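A sketch of scenario selection, assuming a --scenario flag; the flag name is a guess, and only the scenario names and the scenarios.py path come from this page:

```shell
# Built-in scenario by name (flag name is an assumption):
ntro workflow test invoice-intake --scenario HAPPY

# Custom scenario defined in runbooks/invoice-intake/tests/scenarios.py:
ntro workflow test invoice-intake --scenario my_custom_scenario
```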
CI / scripting — JSON output
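For CI, a machine-readable summary can be gated with standard tooling. A minimal sketch of that pattern: the JSON shape below is illustrative only (the real output schema and any flag that produces it are not documented here), and the gate is plain python3 over a file:

```shell
# Hypothetical summary shape -- field names are assumptions, not the real schema.
cat > /tmp/ntro_summary.json <<'EOF'
{"scenarios": [{"name": "HAPPY", "passed": true}, {"name": "REJECT_ALL", "passed": true}]}
EOF

# Gate the CI job: exit non-zero if any scenario failed.
python3 - /tmp/ntro_summary.json <<'EOF'
import json, sys
summary = json.load(open(sys.argv[1]))
failed = [s["name"] for s in summary["scenarios"] if not s["passed"]]
print("failed scenarios:", failed or "none")
sys.exit(1 if failed else 0)
EOF
```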
What’s auto-mocked vs what’s real
The harness uses your runbook's real code paths: your NtroWorkflow subclass, your @ui_step decorators, your Pydantic models. The bits it fakes are:
- Activity returns: derived from the activity's return type via Pydantic introspection
- HITL responses: HAPPY approves, REJECT_ALL rejects, and custom scenarios script per-step responses
- submit_file signals: synthetic document_refs derived from the workflow's advertised args
- Temporal worker: runs in-memory via WorkflowEnvironment instead of a real Temporal cluster
See ntro.testing for the harness internals and how to write custom scenarios.
What this catches (and what it doesn’t)
| Catches | Doesn’t catch |
|---|---|
| @ui_step ordering / declaration order issues | Real LLM call failures (those need a real provider) |
| Activity signature mismatches | Real database schema mismatches (use the data plane for that — see CI database fixtures) |
| HITL signal handling regressions | Real worker-side config drift |
| Child workflow dispatch failures | Real Temporal cluster behaviour (timeouts under load, eviction edge cases) |
| Pydantic validation errors at activity boundaries | Real provider rate limits |
A typical iteration
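The iteration can be sketched as a three-step cycle; the `ntro workflow test` command comes from this page, while the slug and editor are placeholders:

```shell
# 1. Edit the runbook ("invoice-intake" is a hypothetical slug; the path
#    follows the runbooks/<slug>/ layout used by scenarios.py above).
$EDITOR runbooks/invoice-intake/workflow.py

# 2. Run the harness: sub-second, no Temporal cluster, no Docker.
ntro workflow test invoice-intake

# 3. Read the per-scenario summary, fix, and re-run.
```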
Related
Testing capability (SDK)
Internals: WorkflowHarness, Scenario, custom mocks.
Deploy to production
Once scenarios pass, ship it.