Documentation Index
Fetch the complete documentation index at: https://docs.ntropii.com/llms.txt
Use this file to discover all available pages before exploring further.
ntro workflow test is the design-time inner loop. You author a runbook locally, run the command, and get a per-scenario summary back in under a second. No Temporal cluster, no Docker, no deploy cycle.
It catches the same workflow bugs the deployed e2e flow would catch — wrong @ui_step ordering, malformed activity payloads, signal handling regressions, child-workflow dispatch problems — but in seconds rather than minutes. This is what makes the coding-agent loop tight: the agent generates a change, runs the test, sees the result, iterates.
Prerequisites
The test harness ships with ntro-cli, so if you have the CLI, you have the harness.
Run a single workflow
The harness runs both built-in scenarios (HAPPY and REJECT_ALL) by default. HAPPY exercises the most code; REJECT_ALL verifies your runbook handles rejection cleanly.
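Running the harness is a single CLI call. A minimal sketch, using the `ntro workflow test` command named above; the runbook slug `invoice-intake` is a made-up placeholder, not an example from this page:

```shell
# Hypothetical slug; substitute your own runbook's slug.
ntro workflow test invoice-intake
```

Both default scenarios run and the per-scenario summary prints in under a second.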
Parent + children
Most production runbooks dispatch child workflows. Register all of them on the same harness invocation: each --child is registered alongside the parent so the harness can dispatch it when the parent calls run_child_workflow(slug=...). Without registering a child, the dispatch fails with "child workflow slug not registered".
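A sketch of registering children on one invocation, assuming --child is a repeatable flag (the flag name comes from the text above; the slugs are hypothetical):

```shell
# Parent plus two children on the same harness run.
# "invoice-intake", "fraud-review", "payment-release" are made-up slugs.
ntro workflow test invoice-intake \
  --child fraud-review \
  --child payment-release
```

Each registered child slug becomes dispatchable when the parent calls run_child_workflow(slug=...).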
Specific scenarios
Run only one scenario by passing its name. Custom scenarios live in runbooks/<slug>/tests/scenarios.py and are referenced the same way, by name.
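A sketch of scenario selection, assuming a --scenario flag; the flag name is a guess, and only the scenario names and the scenarios.py path come from this page:

```shell
# Built-in scenario by name (flag name is an assumption):
ntro workflow test invoice-intake --scenario HAPPY

# Custom scenario defined in runbooks/invoice-intake/tests/scenarios.py:
ntro workflow test invoice-intake --scenario my_custom_scenario
```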
CI / scripting — JSON output
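For CI, a machine-readable summary can be gated with standard tooling. A minimal sketch of that pattern: the JSON shape below is illustrative only (the real output schema and any flag that produces it are not documented here), and the gate is plain python3 over a file:

```shell
# Hypothetical summary shape -- field names are assumptions, not the real schema.
cat > /tmp/ntro_summary.json <<'EOF'
{"scenarios": [{"name": "HAPPY", "passed": true}, {"name": "REJECT_ALL", "passed": true}]}
EOF

# Gate the CI job: exit non-zero if any scenario failed.
python3 - /tmp/ntro_summary.json <<'EOF'
import json, sys
summary = json.load(open(sys.argv[1]))
failed = [s["name"] for s in summary["scenarios"] if not s["passed"]]
print("failed scenarios:", failed or "none")
sys.exit(1 if failed else 0)
EOF
```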
What’s auto-mocked vs what’s real
The harness uses your runbook's real code paths: your NtroWorkflow subclass, your @ui_step decorators, your Pydantic models. The bits it fakes are:
- Activity returns: derived from the activity's return type via Pydantic introspection
- HITL responses: HAPPY approves, REJECT_ALL rejects, and custom scenarios script per-step responses
- submit_file signals: synthetic document_refs derived from the workflow's advertised args
- Temporal worker: runs in-memory via WorkflowEnvironment instead of a real Temporal cluster
See ntro.testing for the harness internals and how to write custom scenarios.
What this catches (and what it doesn’t)
| Catches | Doesn’t catch |
|---|---|
| @ui_step ordering / declaration order issues | Real LLM call failures (those need a real provider) |
| Activity signature mismatches | Real database schema mismatches (use the data plane for that — see CI database fixtures) |
| HITL signal handling regressions | Real worker-side config drift |
| Child workflow dispatch failures | Real Temporal cluster behaviour (timeouts under load, eviction edge cases) |
| Pydantic validation errors at activity boundaries | Real provider rate limits |
A typical iteration
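The iteration can be sketched as a three-step cycle; the `ntro workflow test` command comes from this page, while the slug and editor are placeholders:

```shell
# 1. Edit the runbook ("invoice-intake" is a hypothetical slug; the path
#    follows the runbooks/<slug>/ layout used by scenarios.py above).
$EDITOR runbooks/invoice-intake/workflow.py

# 2. Run the harness: sub-second, no Temporal cluster, no Docker.
ntro workflow test invoice-intake

# 3. Read the per-scenario summary, fix, and re-run.
```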
Related
Testing capability (SDK)
Internals: WorkflowHarness, Scenario, custom mocks.
Deploy to production
Once scenarios pass, ship it.