ntro.testing

ntro.testing is the inner-loop test harness for runbook authors. It boots an in-memory Temporal (temporalio.testing.WorkflowEnvironment), registers your workflow plus any child workflows with auto-mocked activities, and drives the agent loop per scenario. The whole thing runs sub-second. No Temporal cluster, no Docker, no deploy cycle. Catches the same workflow bugs the deployed e2e flow would, but in seconds.
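Conceptually, the boot sequence is a thin layer over the Temporal Python SDK's own test machinery. A minimal sketch of what a harness like this does under the hood, using only public temporalio APIs (MyWorkflow, my_activity, and the task-queue name are illustrative stand-ins, not ntro internals):

from datetime import timedelta

from temporalio import activity, workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker


@activity.defn
async def my_activity(name: str) -> str:
    # Stand-in for an auto-mocked runbook activity
    return f"auto-mock:{name}"


@workflow.defn
class MyWorkflow:
    @workflow.run
    async def run(self, name: str) -> str:
        return await workflow.execute_activity(
            my_activity, name, start_to_close_timeout=timedelta(seconds=5)
        )


async def boot_and_run() -> str:
    # In-memory, time-skipping Temporal: no cluster, no Docker
    env = await WorkflowEnvironment.start_time_skipping()
    try:
        # Register the workflow and its activities on a throwaway worker
        async with Worker(
            env.client,
            task_queue="test",
            workflows=[MyWorkflow],
            activities=[my_activity],
        ):
            return await env.client.execute_workflow(
                MyWorkflow.run, "demo", id="wf-1", task_queue="test"
            )
    finally:
        await env.shutdown()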

Install

pip install 'ntro[testing]'
The [testing] extra adds the harness on top of [workflow]; the ntro workflow test CLI command pulls it in transitively.

The API

from ntro.testing import (
    WorkflowHarness,
    Scenario,
    HAPPY,
    REJECT_ALL,
    load_runbook,
    report,
)
Three core surfaces:
  • load_runbook(path) — discover the workflow class, activities, and skill definition under a runbook directory
  • WorkflowHarness(workflow, child_workflows=[...]) — async context manager that boots the in-memory Temporal env
  • Scenario(name, ...) — scripts the HITL responses and per-step overrides for a single test run

Canonical example

Lifted from nav-monthly/tests/test_scenarios.py:
import asyncio
from ntro.testing import (
    WorkflowHarness,
    Scenario,
    HAPPY,
    REJECT_ALL,
    load_runbook,
    report,
)


async def main():
    nav, _ = load_runbook("./runbooks/nav-monthly")
    di,  _ = load_runbook("./runbooks/document-ingest")
    nj,  _ = load_runbook("./runbooks/nav-monthly-journals")

    ctx = ...  # workflow input for nav-monthly (definition elided in the source test file)

    results = []
    for scenario in [HAPPY, REJECT_ALL]:
        async with WorkflowHarness(nav, child_workflows=[di, nj]) as h:
            results.append(await h.run(input=ctx, scenario=scenario))

    print(report.human(results))


asyncio.run(main())
Output:
✓  happy        (0.86s)
    [ 0.13s] submit_file  hly-7a820232  signal=tb_submitted, source=xero-trial-balance
    [ 0.24s] drill_down   hly-7a820232  children=[...:document-ingest]
    [ 0.36s] review       ument-ingest  response=approved
    [ 0.47s] review       hly-journals  response=approved
    [ 0.86s] done         hly-7a820232
✓  reject_all   (0.45s)
summary: 2 passed, 0 failed (of 2)

What’s auto-mocked

The harness fakes the three things that would otherwise need a real environment:

Activity results. Derived from each @activity.defn's return type via Pydantic introspection. Required fields get type-conformant fakes:

Type                  Fake
str                   "auto-mock"
int, float, Decimal   0
bool                  False
datetime, date        now() / today()
BaseModel (nested)    recursive fake

Fields with defaults are left alone — the runbook author's defaults are usually the most realistic value the harness can produce, so it doesn't override them.
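A sketch of these derivation rules in plain Pydantic v2, to make the table concrete. This is not ntro's actual implementation, just one way the rules above could be expressed:

from datetime import date, datetime
from decimal import Decimal
from pydantic import BaseModel


def _fake_value(annotation):
    # Produce a type-conformant fake for one annotation, per the table above
    if annotation is str:
        return "auto-mock"
    if annotation in (int, float, Decimal):
        return annotation(0)
    if annotation is bool:
        return False
    if annotation is datetime:
        return datetime.now()
    if annotation is date:
        return date.today()
    if isinstance(annotation, type) and issubclass(annotation, BaseModel):
        return fake_model(annotation)  # nested models recurse
    raise TypeError(f"no fake rule for {annotation!r}")


def fake_model(model_cls: type[BaseModel]) -> BaseModel:
    # Fake only the required fields; fields with defaults keep the author's value
    values = {
        name: _fake_value(field.annotation)
        for name, field in model_cls.model_fields.items()
        if field.is_required()
    }
    return model_cls(**values)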
Review responses. Controlled by the Scenario. HAPPY approves every review, REJECT_ALL rejects every review. For mixed cases:
custom = Scenario(
    name="mixed",
    approve_reviews=False,
    review_overrides={
        "extraction_review": "approved",
        "journal_proposal_review": "rejected",
    },
    corrections={
        "extraction_review": [...]   # synthetic corrections to apply on approve
    },
)
Workflow input. The harness sends a fake document_ref plus tenant_slug / entity_slug derived from the workflow's advertised args. Your runbook's parse_pdf / parse_starting_tb activities receive a real-shaped payload without any file actually being uploaded.
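For illustration, an activity on the receiving end might look like this. The ParsePdfInput model is hypothetical (the page only guarantees the document_ref, tenant_slug, and entity_slug fields), not ntro's actual type:

from pydantic import BaseModel
from temporalio import activity


class ParsePdfInput(BaseModel):
    # Hypothetical payload shape; field names follow the fakes described above
    document_ref: str   # fake reference under test, a real one in production
    tenant_slug: str
    entity_slug: str


@activity.defn
async def parse_pdf(payload: ParsePdfInput) -> dict:
    # The activity body never knows whether the payload came from the harness
    return {"document_ref": payload.document_ref}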

Built-in scenarios

Name         Behaviour
HAPPY        Auto-approves every review; uses fakes for everything else. The path that exercises the most code.
REJECT_ALL   Auto-rejects every review. Verifies your runbook handles rejection cleanly — workflows should terminate without dangling state.
Both are importable as singletons (from ntro.testing import HAPPY, REJECT_ALL).

Custom scenarios

from ntro.testing import Scenario

ctx_with_corrections = Scenario(
    name="extract-with-fixes",
    approve_reviews=True,
    review_overrides={"journal_proposal_review": "approved"},
    corrections={
        "extraction_review": [
            {"field_path": "supplier_name", "value": "Acme Holdings Ltd"},
            {"field_path": "invoice_total", "value": "£1,234.56"},
        ],
    },
)
Use these in the same for scenario in [...] loop as the built-ins, as shown below. Run a custom scenario when you want to verify a specific edge case (a rejection followed by a re-submit, a correction that shifts the GL allocation, etc.).
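For example, extending the loop from the canonical example above:

for scenario in [HAPPY, REJECT_ALL, ctx_with_corrections]:
    async with WorkflowHarness(nav, child_workflows=[di, nj]) as h:
        results.append(await h.run(input=ctx, scenario=scenario))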

Why this exists

The deployed flow boots Temporal, spawns a worker, registers workflows, listens for signals, and processes activities — for a runbook author iterating on a small change, that's seconds-to-minutes per cycle. The harness skips all of it: you change a line in activities.py, ntro workflow test runs, you see the result. The inner loop is tight enough that runbook authors actually use it. The harness uses the same code paths your runbook will use in production (Temporal's in-memory environment, real ntro.workflow machinery, real Pydantic models). It's not a separate test runtime — it's the same runtime, just spun up fast.

Testing locally (CLI)

ntro workflow test wraps everything on this page in a CLI.

Deploy to production

Once your scenarios pass locally, deploy.