Document Lifecycle
How current internal OCR states map to the future public document status model in the Numora Public API developer preview.
Scope
This page explains how a document moves through the Numora pipeline.
It also distinguishes between:
- The current internal OCR states already used by the existing app.
- The future normalized document.status values exposed by the public API.
Why Two Status Layers Exist
The current Numora app stores OCR-oriented internal states that reflect the extraction pipeline directly.
The public API should expose a smaller and more stable status model for external developers. That public model should be easy to understand without leaking internal implementation details.
Current Internal OCR States
The current app uses these internal OCR states:
- pending
- processing
- for_review
- reviewed
- failed
These values describe the internal extraction and review flow used by the existing OCR subsystem.
Public Document Status Model
The public API should normalize those internal states into the following document.status values:
- processing
- review_required
- reviewed
- push_pending
- push_succeeded
- push_failed
- failed
The first three values, together with failed, cover the initial public document lifecycle. The push_* states become relevant only when the public document resource also reflects downstream write-back state.
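A client could represent these values as an enumeration so that unknown statuses are rejected early. The sketch below is illustrative only: the class name DocumentStatus and the is_push_state helper are assumptions, not part of the Numora API.

```python
from enum import Enum


class DocumentStatus(str, Enum):
    """Hypothetical client-side enum of the public document.status values."""

    PROCESSING = "processing"
    REVIEW_REQUIRED = "review_required"
    REVIEWED = "reviewed"
    PUSH_PENDING = "push_pending"
    PUSH_SUCCEEDED = "push_succeeded"
    PUSH_FAILED = "push_failed"
    FAILED = "failed"

    @property
    def is_push_state(self) -> bool:
        # push_* values only matter once write-back state is exposed.
        return self.value.startswith("push_")


# Parsing a raw status string; an unknown value raises ValueError.
status = DocumentStatus("review_required")
```

Keeping the push_* check in one helper lets a client ignore write-back states until it actually needs them.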
Status Mapping
| Current internal OCR state | Future public document.status | Meaning |
|---|---|---|
| pending | processing | The document has been accepted, but extraction work has not completed yet. |
| processing | processing | Extraction or background processing is still running. |
| for_review | review_required | Extraction finished and the document is waiting for human review or approval. |
| reviewed | reviewed | Human review is complete and the extracted result has been confirmed. |
| failed | failed | Extraction or a required processing step failed. |
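The mapping in the table can be expressed as a plain lookup. This is a minimal sketch, not the actual Numora implementation; the names INTERNAL_TO_PUBLIC and to_public_status are hypothetical.

```python
# Internal OCR state -> public document.status, as listed in the table.
INTERNAL_TO_PUBLIC = {
    "pending": "processing",
    "processing": "processing",
    "for_review": "review_required",
    "reviewed": "reviewed",
    "failed": "failed",
}


def to_public_status(internal_state: str) -> str:
    """Normalize an internal OCR state into a public document.status value."""
    try:
        return INTERNAL_TO_PUBLIC[internal_state]
    except KeyError:
        # Fail loudly rather than leak an unmapped internal name.
        raise ValueError(f"unknown internal OCR state: {internal_state!r}") from None
```

Raising on an unmapped state, instead of passing it through, keeps new internal states from reaching external developers by accident.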
Future Write-Back Extension
When the public API starts exposing downstream delivery state on the document resource, the reviewed state may later expand into:
- push_pending
- push_succeeded
- push_failed
Those values should describe destination delivery state, not OCR extraction state.
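One way to keep delivery state separate from extraction state is to let a push outcome refine the status only after review is complete. The sketch below assumes a hypothetical push_state field with the values pending, succeeded, and failed; neither the field nor those names are defined by this page.

```python
from typing import Optional

# Hypothetical downstream push outcomes -> public push_* statuses.
PUSH_TO_PUBLIC = {
    "pending": "push_pending",
    "succeeded": "push_succeeded",
    "failed": "push_failed",
}


def status_with_write_back(ocr_status: str, push_state: Optional[str] = None) -> str:
    """Combine a normalized OCR status with an optional push outcome."""
    # Only a reviewed document can carry delivery state; everything else
    # (including an OCR failure) keeps its extraction-level status.
    if ocr_status != "reviewed" or push_state is None:
        return ocr_status
    return PUSH_TO_PUBLIC[push_state]
```

Under this sketch an extraction failure is always reported as failed and never as push_failed, which matches the intent that push_* values describe destination delivery only.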
Recommended Lifecycle Paths
Extraction and review
The most common lifecycle is:
pending -> processing -> for_review -> reviewed
Public API view:
processing -> review_required -> reviewed
Extraction failure
If extraction fails:
pending -> processing -> failed
Public API view:
processing -> failed
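Both public views above follow from mapping each internal state and then collapsing consecutive duplicates, since pending and processing both normalize to processing. A minimal sketch, with the hypothetical helper name public_path:

```python
from itertools import groupby
from typing import Iterable, List

INTERNAL_TO_PUBLIC = {
    "pending": "processing",
    "processing": "processing",
    "for_review": "review_required",
    "reviewed": "reviewed",
    "failed": "failed",
}


def public_path(internal_path: Iterable[str]) -> List[str]:
    """Map an internal state history to the status sequence a client would see."""
    mapped = (INTERNAL_TO_PUBLIC[state] for state in internal_path)
    # groupby merges runs of equal adjacent values into a single entry.
    return [status for status, _ in groupby(mapped)]


print(" -> ".join(public_path(["pending", "processing", "for_review", "reviewed"])))
print(" -> ".join(public_path(["pending", "processing", "failed"])))
```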
Review complete and downstream delivery
If the document is later coupled to write-back execution:
pending -> processing -> for_review -> reviewed -> push_pending -> push_succeeded
or:
pending -> processing -> for_review -> reviewed -> push_pending -> push_failed
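The lifecycle paths in this section imply a small set of allowed transitions. The sketch below encodes them using public status names and checks a status history against them. It assumes that only the transitions listed on this page are valid and that the write-back paths use review_required in the public view; the real API may allow others, such as retries after push_failed.

```python
from typing import Dict, Sequence, Set

# Transitions derived from the lifecycle paths on this page (an assumption,
# not a published state machine).
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "processing": {"review_required", "failed"},
    "review_required": {"reviewed"},
    "reviewed": {"push_pending"},
    "push_pending": {"push_succeeded", "push_failed"},
}


def is_valid_public_path(path: Sequence[str]) -> bool:
    """Return True if every consecutive pair in path is an allowed transition."""
    return all(
        nxt in ALLOWED_TRANSITIONS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )
```

A check like this can be useful in client tests, for example to flag a webhook history that jumps from processing straight to reviewed.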
Implementation Notes
- The current OCR subsystem treats for_review, reviewed, and failed as terminal OCR states.
- The public API should not expose internal OCR state names directly if a more stable normalized status already exists.
- Public docs should describe review_required as the external equivalent of the internal for_review state.
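As an illustration of the first note, a client or worker could use the terminal OCR states to decide when extraction has stopped. This is a sketch under assumptions: the helper names are hypothetical, and treating processing as the only public status worth polling is an inference from the mapping, not a documented guarantee.

```python
# Terminal OCR states named in the implementation notes.
TERMINAL_OCR_STATES = frozenset({"for_review", "reviewed", "failed"})


def ocr_finished(internal_state: str) -> bool:
    """True once the OCR subsystem will not advance the state by itself."""
    return internal_state in TERMINAL_OCR_STATES


def should_keep_polling(public_status: str) -> bool:
    """Assumed client rule: poll only while extraction is still running."""
    # pending and processing both surface as "processing" in the public model.
    return public_status == "processing"
```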