Most fund-ops runbooks start the same way: a file lands in storage (an invoice, a bank statement, a rent roll, a trial balance) and the runbook needs to turn its bytes into something it can reason about.
ntro.capabilities.files does that turn.
Install
The API
One public coroutine: files.parse(). It returns a CellGrid-shaped object with two key surfaces:
| Field | Type | What’s in it |
|---|---|---|
| grid.cells | list[list[Cell]] | 2-D cell grid preserving the source’s row / column structure. Each cell knows its position, its value, and (for PDFs) its bounding box. Useful when you need tabular layout to interpret the data (rent rolls, trial balances). |
| grid.plain_text | str | The same content flattened to plain text in reading order. Useful when you only care about the prose: invoice line items, narrative summaries, anything you’ll feed straight to AI extraction. |
A common pattern feeds plain_text to the prompt and passes cells as structured_context so the model can disambiguate when layout matters.
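To make the two surfaces concrete, here is a minimal sketch of what a CellGrid-shaped object could look like. The Cell and CellGrid dataclasses, their field names (row, col, value, bbox), and the flattening rule are assumptions for illustration, not the library's actual types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the library's types; field names are assumptions.
@dataclass
class Cell:
    row: int
    col: int
    value: str
    bbox: Optional[tuple[float, float, float, float]] = None  # PDF only

@dataclass
class CellGrid:
    cells: list[list[Cell]]

    @property
    def plain_text(self) -> str:
        # Flatten in reading order: left to right within a row, rows top to bottom.
        lines = []
        for row in self.cells:
            values = [c.value for c in row if c.value]
            if values:
                lines.append(" ".join(values))
        return "\n".join(lines)

grid = CellGrid(cells=[
    [Cell(0, 0, "Account"), Cell(0, 1, "Debit"), Cell(0, 2, "Credit")],
    [Cell(1, 0, "Rent income"), Cell(1, 1, ""), Cell(1, 2, "1200.00")],
])

print(grid.plain_text)
print(grid.cells[1][2].value)
```

In this sketch the empty Debit cell disappears from plain_text but keeps its position in cells, which is the kind of information a layout-sensitive extraction would need.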
PDF parsing — format="pdf"
Backed by pdfplumber. Best for:
- Scanned-and-OCR’d documents (invoices, statements, contracts)
- Form-style documents with key-value pairs
- Documents with tables that have visible borders
In the document-ingest runbook:
- The bytes come from the tenant data plane (Postgres), not the activity payload. Signals carry only the document_ref so payloads stay small.
- Both grid.cells and grid.plain_text flow into the RawDocument so the next step (AI extraction) has both.
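The flow described in these two notes can be sketched as follows. Everything here is a stand-in so the example runs on its own: FAKE_DATA_PLANE, load_bytes, parse, and the RawDocument fields are hypothetical names. The real runbook would read from Postgres and call the library's parser instead.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the tenant data plane (Postgres in the real runbook).
FAKE_DATA_PLANE = {"doc-123": b"Invoice 42\nTotal 100.00"}

@dataclass
class RawDocument:
    document_ref: str
    cells: list[list[str]]
    plain_text: str

async def load_bytes(document_ref: str) -> bytes:
    # The signal carries only the reference; the bytes are fetched here.
    return FAKE_DATA_PLANE[document_ref]

async def parse(data: bytes) -> tuple[list[list[str]], str]:
    # Placeholder parser: one row per line, one cell per whitespace-separated token.
    text = data.decode("utf-8")
    cells = [line.split() for line in text.splitlines()]
    return cells, text

async def ingest(document_ref: str) -> RawDocument:
    data = await load_bytes(document_ref)
    cells, plain_text = await parse(data)
    # Keep both surfaces so the extraction step can use either.
    return RawDocument(document_ref, cells, plain_text)

raw = asyncio.run(ingest("doc-123"))
print(raw.cells)
```

Passing only the reference through the signal and resolving it inside the activity is what keeps the payload small, whatever the size of the underlying file.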
Excel parsing — format="xlsx"
Backed by openpyxl. Best for:
- Trial balances exported from Xero / SAP / Sage
- Investor registers, capital call schedules, NAV templates
- Anything where preserving sheet / cell coordinates matters
The nav-monthly-journals runbook uses this format.
Choosing between cells and plain_text
| You need… | Use |
|---|---|
| Free-text extraction (invoice descriptions, prose paragraphs) | plain_text |
| Tabular extraction where row / column position carries meaning | plain_text for the prompt, cells as structured_context |
| Bounding-box-based document classification (PDF only) | cells (each cell has .bbox) |
| Cheap “give me everything as one string” | plain_text |
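The rules in this table can be expressed as a small helper. This is a sketch only: build_extract_inputs and the keys of the returned dict are hypothetical, chosen to mirror the plain_text and structured_context names used on this page, and do not reflect the actual signature of ai.extract().

```python
from typing import Any

def build_extract_inputs(
    plain_text: str,
    cells: list[list[Any]],
    layout_matters: bool,
) -> dict[str, Any]:
    # plain_text always drives the prompt.
    inputs: dict[str, Any] = {"text": plain_text}
    if layout_matters:
        # Attach the grid only when row / column position carries meaning.
        inputs["structured_context"] = cells
    return inputs

# Free-text case: an invoice description needs no layout information.
invoice = build_extract_inputs(
    "Consulting services, March",
    [["Consulting services, March"]],
    layout_matters=False,
)

# Tabular case: a trial balance row keeps its column structure.
trial_balance = build_extract_inputs(
    "Cash 500.00",
    [["Cash", "500.00"]],
    layout_matters=True,
)
```

Making the choice explicit per document type keeps prompts short for prose-heavy inputs while still giving the model the grid when it needs it.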
Related
- Private AI: the natural next step. ai.extract() consumes what files.parse() produces.
- Data: where parsed documents typically come from (storage.read or the data plane).